

TODAY'S ISSUE
TODAY'S DAILY DOSE OF DATA SCIENCE
Deploy a Qwen 3 Agentic RAG
Today, we'll learn how to deploy an Agentic RAG powered by Alibaba's latest Qwen 3.
Here's our tool stack:
- CrewAI for Agent orchestration.
- Firecrawl for web search.
- LightningAI's LitServe for deployment.
The diagram shows our Agentic RAG flow:

- The Retriever Agent accepts the user query.
- It invokes a relevant tool (Firecrawl web search or vector DB tool) to get context and generate insights.
- The Writer Agent generates a response.
Next, let's implement and deploy it!
Step-by-step Implementation
Deploying an Agentic RAG
Here's the entire code to serve our Agentic RAG.

- The setup method orchestrates the Agents.
- The decode_request method prepares the input.
- The predict method invokes the Crew.
- The encode_response method sends the response back.
Let's understand it step by step below.
Set up LLM
CrewAI seamlessly integrates with all popular LLMs and providers.
Here's how we set up a local Qwen 3 via Ollama.
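As a minimal sketch of that setup, assuming Ollama is running on its default local port and the model was pulled under the tag qwen3: the helper name build_llm and the two constants are illustrative, not part of CrewAI. CrewAI is imported inside the function so the snippet loads even where the package is not installed.

```python
# Assumption: the model was pulled locally, e.g. with `ollama pull qwen3`.
QWEN3_MODEL = "ollama/qwen3"
# Ollama's default local endpoint.
OLLAMA_BASE_URL = "http://localhost:11434"


def build_llm():
    """Return a CrewAI LLM handle that talks to the local Ollama server."""
    from crewai import LLM  # third-party: pip install crewai

    return LLM(model=QWEN3_MODEL, base_url=OLLAMA_BASE_URL)
```

Inside the LitServe setup method, this could be used as self.llm = build_llm().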

Define Research Agent and Task
This Agent accepts the user query and retrieves the relevant context using a vectorDB tool and a web search tool powered by Firecrawl.
Again, put this in the LitServe setup() method:
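A hedged sketch of what this Agent and Task could look like. The role, goal, backstory, and prompt strings are illustrative wording, not the original code, and the tools are passed in by the caller rather than constructed here, since the exact tool classes depend on your crewai_tools version and vector DB.

```python
# Illustrative prompt text. CrewAI fills {query} from the inputs passed to
# crew.kickoff(inputs={"query": ...}).
RESEARCH_TASK_DESCRIPTION = (
    "Retrieve the most relevant information for the user query: {query}. "
    "Use the vector DB tool or the web search tool, whichever fits best."
)


def build_researcher(llm, tools):
    """Create the research Agent and its Task.

    `tools` is a list of CrewAI tool instances supplied by the caller,
    for example a Firecrawl web search tool and a vector DB tool.
    """
    from crewai import Agent, Task  # third-party: pip install crewai

    agent = Agent(
        role="Researcher",
        goal="Find the context needed to answer the user query.",
        backstory="You pick the right tool and extract the relevant facts.",
        tools=tools,
        llm=llm,
        verbose=True,
    )
    task = Task(
        description=RESEARCH_TASK_DESCRIPTION,
        expected_output="The most relevant retrieved context with key insights.",
        agent=agent,
    )
    return agent, task
```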

Define Writer Agent and Task
Next, the Writer Agent accepts the insights from the Research Agent to generate a response.
Yet again, we add this in the LitServe setup() method:
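A minimal sketch of the writer side, under the same caveat that the prompt wording is illustrative. The helper build_writer is a hypothetical name; it receives the research Task so that the Task's output can be passed explicitly through CrewAI's context parameter.

```python
# Illustrative prompt text. {query} is filled in by CrewAI at kickoff time.
WRITER_TASK_DESCRIPTION = (
    "Using the insights gathered by the research step, write a clear and "
    "accurate answer to the user query: {query}."
)


def build_writer(llm, research_task):
    """Create the writer Agent and its Task, fed by the research Task."""
    from crewai import Agent, Task  # third-party: pip install crewai

    agent = Agent(
        role="Writer",
        goal="Turn the retrieved insights into a helpful final answer.",
        backstory="You write concise answers grounded in the given context.",
        llm=llm,
        verbose=True,
    )
    task = Task(
        description=WRITER_TASK_DESCRIPTION,
        expected_output="A concise answer grounded in the retrieved context.",
        agent=agent,
        # Make the research Task's output available to this Task.
        context=[research_task],
    )
    return agent, task
```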

Set up the Crew
Once we have defined the Agents and their Tasks, we orchestrate them into a Crew using CrewAI and put that into the setup method.
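One way this orchestration could be written, assuming the two Agents and two Tasks already exist. The helper name build_crew is hypothetical; the sequential process runs the research Task first and the writer Task second.

```python
def build_crew(research_agent, research_task, writer_agent, writer_task):
    """Bundle the Agents and Tasks into a Crew that runs them in order."""
    from crewai import Crew, Process  # third-party: pip install crewai

    return Crew(
        agents=[research_agent, writer_agent],
        tasks=[research_task, writer_task],
        process=Process.sequential,  # tasks run in the order listed above
        verbose=True,
    )
```

In the LitServe setup method, the result would typically be stored on the API object, for example as self.crew, so that later methods can reach it.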

Decode request
With that, we have orchestrated the Agentic RAG workflow, which will be executed upon an incoming request.
Next, from the incoming request body, we extract the user query.
Check the highlighted code below:
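A minimal sketch of this step, assuming the client sends a JSON body with a "query" key. The class is written as plain Python so it runs on its own; in the real server it would subclass LitServe's LitAPI.

```python
class AgenticRAGAPI:  # in the real server: class AgenticRAGAPI(ls.LitAPI)
    def decode_request(self, request):
        """Pull the user query out of the parsed JSON request body."""
        # Assumption: the client sends {"query": "<question>"}.
        return request["query"]


api = AgenticRAGAPI()
print(api.decode_request({"query": "What is Qwen 3?"}))  # prints: What is Qwen 3?
```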

Predict
We use the decoded user query and pass it to the Crew defined earlier to generate a response from the model.
Check the highlighted code below:
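The sketch below shows the shape of this step. EchoCrew is a hypothetical stand-in for the real Crew, included only so the example runs without CrewAI; the kickoff(inputs=...) call mirrors how a CrewAI Crew is invoked.

```python
class EchoCrew:
    """Hypothetical stand-in for a CrewAI Crew, used only to run this sketch."""

    def kickoff(self, inputs):
        return f"(answer for: {inputs['query']})"


class AgenticRAGAPI:  # in the real server: class AgenticRAGAPI(ls.LitAPI)
    def setup(self, device):
        # The real setup builds the LLM, Agents, Tasks, and Crew here.
        self.crew = EchoCrew()

    def predict(self, query):
        # CrewAI fills the {query} placeholders in the Task prompts from `inputs`.
        return self.crew.kickoff(inputs={"query": query})
```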

Encode response
Here, we can post-process the response and send it back to the client.
Note: LitServe internally invokes these methods in order: decode_request → predict → encode_response.
Check the highlighted code below:
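A plain-Python sketch of this step and of the per-request call order. The handle function only mimics the sequence decode_request, predict, encode_response; it is not LitServe's own code. The predict body is a placeholder that returns an object with a raw attribute, which is where a CrewAI result keeps its final text.

```python
from types import SimpleNamespace


class AgenticRAGAPI:  # in the real server: class AgenticRAGAPI(ls.LitAPI)
    def decode_request(self, request):
        return request["query"]

    def predict(self, query):
        # Placeholder for the Crew call; mimics a result object with `.raw`.
        return SimpleNamespace(raw=f"(answer for: {query})")

    def encode_response(self, output):
        # Post-process here if needed, then return a JSON-serializable dict.
        return {"output": output.raw}


def handle(api, request):
    """Run the three methods in the order the server calls them."""
    query = api.decode_request(request)
    output = api.predict(query)
    return api.encode_response(output)
```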

With that, we are done with the server code.
Next, we have the basic client code to invoke the API we created using the requests Python library:
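A hedged sketch of such a client. It assumes the server listens on port 8000 and exposes the /predict route (LitServe's defaults) and that the request body uses a "query" key; the helper names are illustrative. requests is imported inside ask so the snippet loads even where the package is missing.

```python
# Assumption: LitServe's default route and a local server on port 8000.
DEFAULT_URL = "http://127.0.0.1:8000/predict"


def build_payload(query):
    """Build the JSON body that decode_request expects."""
    return {"query": query}


def ask(query, url=DEFAULT_URL, timeout=120):
    """POST the query to the server and return the decoded JSON response."""
    import requests  # third-party: pip install requests

    response = requests.post(url, json=build_payload(query), timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of bad JSON
    return response.json()
```

With the server running, ask("What is Qwen 3?") would return the dictionary produced by encode_response. The generous timeout is a design choice, since a multi-agent run on a local model can take a while.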

Done!
We have deployed our fully private Qwen 3 Agentic RAG using LitServe. Here's a recording of our deployed Qwen 3 Agentic RAG:
That said, we started a crash course to help you implement reliable Agentic systems, understand the underlying challenges, and develop expertise in building Agentic apps on LLMs, which every industry cares about now.
Here's what we have done in the crash course (with implementation):
- In Part 1, we covered the fundamentals of Agentic systems, understanding how AI agents act autonomously to perform tasks.
- In Part 2, we extended Agent capabilities by integrating custom tools, using structured outputs, and we also built modular Crews.
- In Part 3, we focused on Flows, learning about state management, flow control, and integrating a Crew into a Flow.
- In Part 4, we extended these concepts into real-world multi-agent, multi-crew Flow projects.
- In Part 5 and Part 6, we moved into advanced techniques that make AI agents more robust, dynamic, and adaptable, like Guardrails, Async execution, Callbacks, Human-in-the-loop, Multimodal Agents, and more.
- In Part 7, we covered Knowledge of Agentic systems.
- In Part 8 and Part 9, we primarily focused on 5 types of Memory for AI agents, which help agents "remember" and utilize past information.
- In Part 10, we implemented the ReAct pattern from scratch.
- In Part 11, we implemented the Planning pattern from scratch.
- In Part 12 and Part 13, we covered 10 practical steps to improve Agentic systems.
Of course, if you have never worked with LLMs, that's okay. We cover everything in a practical and beginner-friendly way.
You can find the code in this GitHub repo.
Thanks for reading, and we'll see you next week!
ROADMAP
From local ML to production ML
Once a model has been trained, we move to productionizing and deploying it.
If ideas related to production and deployment intimidate you, here's a quick roadmap for you to upskill (assuming you know how to train a model):
- First, you would have to compress the model and productionize it. Read these guides:
  - Reduce model size with Model Compression techniques.
  - Supercharge PyTorch Models With TorchScript.
  - If you use sklearn, learn how to optimize its models with tensor operations.
- Next, you move to deployment. Here's a beginner-friendly hands-on guide that teaches you how to deploy a model, manage dependencies, set up a model registry, etc.
- Although you would have tested the model locally, it is still wise to test it in production. There are risk-free (or low-risk) methods to do that. Learn what they are and how to implement them here.
This roadmap should set you up pretty well, even if you have NEVER deployed a single model before, since everything is practical and implementation-driven.
THAT'S A WRAP
No-Fluff Industry ML resources to Succeed in DS/ML roles

At the end of the day, all businesses care about impact. That's it!
- Can you reduce costs?
- Drive revenue?
- Can you scale ML models?
- Predict trends before they happen?
We have discussed several other topics (with implementations) in the past that align with these goals.
Here are some of them:
- Learn sophisticated graph architectures and how to train them on graph data in this crash course.
- So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
- Run large models on small devices using Quantization techniques.
- Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
- Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
- Learn how to scale and implement ML model training in this practical guide.
- Learn 5 techniques with implementation to reliably test ML models in production.
- Learn how to build and implement privacy-first ML systems using Federated Learning.
- Learn 6 techniques with implementation to compress ML models.
All these resources will help you cultivate key skills that businesses and companies care about the most.