Reinforcement Learning
How to Beat GRPO Without Touching Model Weights
Berkeley beat GRPO by 10 points with 35× fewer rollouts and no GPU training.
A collection of 3 posts
The era of not writing custom reward functions.
A series of technical deep dives on reinforcement learning covering fundamentals and background, classical techniques such as MDPs and Bellman equations, deep RL methods, how RL is used to train modern language models, agentic RL, and much more.