Proximal Policy Optimization
RL Part 8: Trust regions, the clipped surrogate, and the workhorse of modern RL.
RL Part 7: Learning the policy directly, from REINFORCE to actor-critic.
RL Part 6: From linear features to neural networks, and the engineering choices that make deep value-based RL possible.
RL Part 5: From tables to parameterized value functions.
A deep dive into building production-grade memory for agents.
RL Part 4: Learning value functions and policies without a model. Monte Carlo methods, TD(0), SARSA, Q-learning, and the bias-variance bridge between them.
Everything you need to understand and customize Hermes Agent.
...explained with code and tradeoffs.