MLOps and LLMOps: Case Studies
An exploration of real-world MLOps and LLMOps case studies, examining the importance of reliable ML and AI engineering and their significance for business outcomes.
A collection of 34 posts
An exploration of real-world MLOps and LLMOps case studies, examining the importance of reliable ML and AI engineering and their significance for business outcomes.
LLMOps Part 14: An overview of the fundamentals of LLM serving, including API-based access, inference with vLLM, and practical decisions.
LLMOps Part 13: Exploring the mechanics of LLM inference, from prefill and decode phases to KV caching, batching, and optimization techniques that improve latency and throughput.
LLMOps Part 12: Understanding LLM fine-tuning, parameter-efficient methods like LoRA and QLoRA, and alignment techniques such as RLHF, DPO, and GRPO.
LLMOps Part 11: Understanding evaluation of conversational LLM systems, tool evaluations, tracing with Langfuse, and automated red teaming.
LLMOps Part 10: Understanding model benchmarks, LLM application evaluation, and tooling.
LLMOps Part 9: A foundational guide to the evaluation of LLM applications, covering challenges and a practical taxonomy of evaluation methods.
LLMOps Part 8: A concise overview of memory, dynamic and temporal context in LLM systems, covering short and long-term memory, dynamic context injection, and some of the common context failure modes in agentic applications.
LLMOps Part 7: A conceptual overview of context engineering, covering context types, context construction principles, and retrieval-centric techniques for building high-signal inputs.
LLMOps Part 6: Exploring prompt versioning, defensive prompting, and techniques such as verbalized sampling, role prompting and more.
LLMOps Part 5: An introduction to prompt engineering (a subset of context engineering), covering prompt types, the prompt development workflow, and key techniques in the field.
LLMOps Part 4: An exploration of key decoding strategies, sampling parameters, and the general lifecycle of LLM-based applications.
LLMOps Part 3: A focused look at the core ideas behind attention mechanism, transformer and mixture-of-experts architectures, and model pretraining and fine-tuning.
LLMOps Part 2: A detailed walkthrough of tokenization, embeddings, and positional representations, building the foundational translation layer that enables LLMs to process and reason over text.
LLMOps Part 1: An overview of AI engineering and LLMOps, and the core dimensions that define modern AI systems.
MLOps Part 18: A hands-on guide to CI/CD in MLOps with DVC, Docker, GitHub Actions, and GitOps-based Kubernetes delivery on Amazon EKS.
MLOps Part 17: ML monitoring in practice with Evidently, Prometheus and Grafana, stitched into a FastAPI inference service with drift reports, metrics scraping, and dashboards.
MLOps Part 16: A comprehensive overview of drift detection using statistical techniques, and how logging and observability keep ML systems healthy.
MLOps Part 15: Understanding the EKS lifecycle, getting hands-on with AWS setup, and deploying a simple ML inference service on Amazon EKS.
MLOps Part 14: Understanding AWS cloud platform, and zooming into EKS.
MLOps Part 13: An overview of cloud concepts that matter, from virtualization and storage choices to VPC, load balancing, identity, and observability.
MLOps Part 12: An introduction to Kubernetes, plus a practical walkthrough of deploying a simple FastAPI inference service using Kubernetes.
MLOps Part 11: A practical guide to taking models beyond notebooks, exploring serialization formats, containerization, and serving predictions using REST and gRPC.
MLOps Part 10: A comprehensive guide to model compression covering knowledge distillation, low-rank factorization, and quantization, followed by ONNX and ONNX Runtime as the bridge from training frameworks to fast, portable production inference.
MLOps Part 9: A deep dive into model fine-tuning and compression, specifically pruning and related improvements.
MLOps Part 8: A systems-first guide to model development and optimizing performance with disciplined hyperparameter tuning.
MLOps Part 7: An applied look at distributed data processing with Spark and workflow orchestration and scheduling with Prefect.
A full MLOps and LLMOps blueprint that covers the foundations of MLOps, projects, and real-world insights to build production-ready system. All-in-one course where you will learn how ML meets software engineering, DevOps, and data engineering.
MLOps Part 6: A deep dive into sampling, class imbalance, and data leakage; plus a hands-on Feast feature store demo.
MLOps Part 5: A detailed walkthrough of data engineering for MLOps, covering data sources, format performance trade-offs, and ETL/ELT pipelines.