Archive

Bellman Equations and Dynamic Programming

RL Part 3: Bellman expectation and optimality equations, policy iteration, value iteration, and why dynamic programming needs a model.

RL Part 2: Markov decision processes, returns, policies, and value functions.

Berkeley beat GRPO by 10 points with 35× fewer rollouts and no GPU training,

The era of not writing custom reward functions.

RL Part 1: Agents, environments, rewards, and why RL is different from supervised learning.

...using Karpathy's context engineering principles!

Diffusion LLMs Part 2: How dLLMs scale to 100B parameters, the inference stack that makes them fast, hands-on code, and when to actually use them.

...explained with usage.