Evaluation: Model Benchmarks and LLM Application Assessment
LLMOps Part 10: Understanding model benchmarks, LLM application evaluation, and tooling.
69 posts published
LLMOps Part 9: A foundational guide to the evaluation of LLM applications, covering challenges and a practical taxonomy of evaluation methods.
LLMOps Part 8: A concise overview of memory, dynamic and temporal context in LLM systems, covering short- and long-term memory, dynamic context injection, and some of the common context failure modes in agentic applications.
LLMOps Part 7: A conceptual overview of context engineering, covering context types, context construction principles, and retrieval-centric techniques for building high-signal inputs.
LLMOps Part 6: Exploring prompt versioning, defensive prompting, and techniques such as verbalized sampling, role prompting, and more.
LLMOps Part 5: An introduction to prompt engineering (a subset of context engineering), covering prompt types, the prompt development workflow, and key techniques in the field.
LLMOps Part 4: An exploration of key decoding strategies, sampling parameters, and the general lifecycle of LLM-based applications.
LLMOps Part 3: A focused look at the core ideas behind the attention mechanism, transformer and mixture-of-experts architectures, and model pretraining and fine-tuning.
LLMOps Part 2: A detailed walkthrough of tokenization, embeddings, and positional representations, building the foundational translation layer that enables LLMs to process and reason over text.
AI Agents Crash Course—Part 17 (with implementation).
AI Agents Crash Course—Part 16 (with implementation).
LLMOps Part 1: An overview of AI engineering and LLMOps, and the core dimensions that define modern AI systems.
A comprehensive guide to Opik, an open-source LLM evaluation and observability framework.
MLOps Part 18: A hands-on guide to CI/CD in MLOps with DVC, Docker, GitHub Actions, and GitOps-based Kubernetes delivery on Amazon EKS.
MLOps Part 17: ML monitoring in practice with Evidently, Prometheus and Grafana, stitched into a FastAPI inference service with drift reports, metrics scraping, and dashboards.
AI Agents Crash Course—Part 15 (with implementation).
MLOps Part 16: A comprehensive overview of drift detection using statistical techniques, and how logging and observability keep ML systems healthy.
MLOps Part 15: Understanding the EKS lifecycle, getting hands-on with AWS setup, and deploying a simple ML inference service on Amazon EKS.
MLOps Part 14: Understanding the AWS cloud platform, and zooming in on Amazon EKS.
MLOps Part 13: An overview of cloud concepts that matter, from virtualization and storage choices to VPC, load balancing, identity, and observability.
MLOps Part 12: An introduction to Kubernetes, plus a practical walkthrough of deploying a simple FastAPI inference service using Kubernetes.
MLOps Part 11: A practical guide to taking models beyond notebooks, exploring serialization formats, containerization, and serving predictions using REST and gRPC.
MLOps Part 10: A comprehensive guide to model compression covering knowledge distillation, low-rank factorization, and quantization, followed by ONNX and ONNX Runtime as the bridge from training frameworks to fast, portable production inference.
MLOps Part 9: A deep dive into model fine-tuning and compression, specifically pruning and related improvements.
MLOps Part 8: A systems-first guide to model development and performance optimization through disciplined hyperparameter tuning.
MLOps Part 7: An applied look at distributed data processing with Spark, and workflow orchestration and scheduling with Prefect.
MLOps Part 6: A deep dive into sampling, class imbalance, and data leakage; plus a hands-on Feast feature store demo.
MLOps Part 5: A detailed walkthrough of data engineering for MLOps, covering data sources, format performance trade-offs, and ETL/ELT pipelines.
MLOps Part 4: A practical walkthrough of W&B-powered reproducibility.
MLOps Part 3: A practical exploration of reproducibility and versioning, covering deterministic training, data and model versioning, and experiment tracking.