Diffusion LLMs from the Ground Up: Training, Inference, and Practical Engineering
Diffusion LLMs Part 2: How dLLMs scale to 100B parameters, the inference stack that makes them fast, hands-on code, and when to actually use them.
A collection of 37 posts
Diffusion LLMs Part 1: Understanding how diffusion language models work from first principles, the math behind masked diffusion, and why they represent a fundamentally different approach to text generation.
A guide to building robust decision-making systems in businesses with causal inference.
A deep dive into interpretability methods, why they matter, along with their intuition, considerations, how to avoid being misled, and code.
A deep dive into PDPs and ICE plots, along with their intuition, considerations, how to avoid being misled, and code.
A practical and beginner-friendly guide to building neural networks on graph data.
How to make ML models reflect true probabilities in their predictions?
Learn real-world ML model development with a primary focus on data privacy – A practical guide.
A critical step towards building and using ML models reliably.
A step-by-step demonstration of an emerging neural network architecture — KANs.
What are KANs, how are they trained, and what makes them so powerful?
Techniques that help you go from "machine learning model developer" to "machine learning engineer."
Immensely simplify deep learning model building with PyTorch Lightning.
Mathematically understanding the surprising phenomena that arise when dealing with data in high dimensions.
Diving into the mathematical motivation for using bagging.
Gaussian Mixture Models: A more robust alternative to KMeans.
A beginner-friendly introduction to HDBSCAN clustering and how it is superior to DBSCAN clustering.
Addressing major limitations of the most popular density-based clustering algorithm — DBSCAN.
The most extensive visual guide to never forget how t-SNE works.
The caveats of grid search and random search and how Bayesian optimization addresses them.
An extensive visual guide to never forget how XGBoost works.
Approaching PCA as an optimization problem.
The limitations of always using cross-entropy loss in ordinal datasets.
The lesser-known limitations of the R-squared metric.