jashvira/policy-gradient-experiments

Policy Gradient Experiments

A collection of policy gradient and reinforcement learning experiments with LLMs across several domains.

Experiments

| Branch | Description | Domain |
| --- | --- | --- |
| MBPP | Code generation with GRPO + vLLM + SandboxFusion | MBPP coding tasks |
| gsm8k | Mathematical reasoning with GRPO | GSM8K dataset |
| math_posttrain | Post-training experiments | MATH dataset (Hendrycks) |

Each experiment lives in its own branch with complete code, documentation, and results. The main branch serves as an index to all experiments.
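
For orientation, two of the experiments listed above use GRPO (Group Relative Policy Optimization). The snippet below is a minimal sketch of the group-relative advantage step that GRPO is built around, not code taken from any branch. The function name `group_relative_advantages`, the `eps` parameter, the use of the population standard deviation, and the example rewards are all assumptions made for illustration; the branches may compute this differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions sampled for the same prompt.

    Hypothetical helper: each reward is centered on the group mean and scaled
    by the group standard deviation, so no learned value function is needed.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps keeps the division finite when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four completions of one prompt:
# 1.0 if the completion was judged correct, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one. When every completion in a group gets the same reward, all advantages are zero and that group contributes no gradient signal.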
