feat: enable GRPO training with logprobs from offline trajectory data #812
