Function Approximation
RL Part 5: From tables to parameterized value functions.
A collection of 6 posts
RL Part 5: From tables to parameterized value functions.
RL Part 4: Learning value functions and policies without a model. Monte Carlo methods, TD(0), SARSA, Q-learning, and the bias-variance bridge between them.
RL Part 3: Bellman expectation and optimality equations, policy iteration, value iteration, and why dynamic programming needs a model.
RL Part 2: Markov decision processes, returns, policies, and value functions.
RL Part 1: Agents, environments, rewards, and why RL is different from supervised learning.
A series of technical deep dives on reinforcement learning, covering the fundamentals, MDPs and Bellman equations, classical techniques, deep RL methods, how RL is used to train modern language models, agentic RL, and more.