

TODAY'S ISSUE
Hands-on demo
Build a multi-agent content creation system
Lately, we have been experimenting heavily with Motia, an open-source, modern backend framework that brings together:

- Multi-agent orchestration
- APIs
- Event handling
- Background jobs

…everything under one unified system.
The video below shows a demo where we have built a multi-agent content creation system that is also exposed via APIs.
Tech stack:
- Motia as the unified backend framework.
- Firecrawl to scrape web content.
- Ollama to locally serve the DeepSeek-R1 LLM.
Here’s the workflow:

- A user submits a URL to scrape.
- Firecrawl scrapes the content and converts it to markdown.
- Twitter and LinkedIn agents run in parallel to generate platform-specific content (see the sketch after this list).
- The generated content is scheduled via Typefully.
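Outside of Motia, steps 2 and 3 boil down to something like the following sketch, assuming the firecrawl-py and ollama Python clients. The generate_post helper, the prompts, and the model tag are illustrative, and the Firecrawl return shape may vary across SDK versions:

```python
# A minimal sketch of steps 2-3, assuming the firecrawl-py and ollama
# Python packages. generate_post and the prompts are illustrative.
from concurrent.futures import ThreadPoolExecutor

import ollama
from firecrawl import FirecrawlApp

firecrawl = FirecrawlApp(api_key="fc-...")  # your Firecrawl API key

def generate_post(platform: str, markdown: str) -> str:
    """Ask the locally served DeepSeek-R1 model for a platform-specific post."""
    response = ollama.chat(
        model="deepseek-r1",
        messages=[
            {"role": "system", "content": f"You write engaging {platform} posts."},
            {"role": "user", "content": f"Draft a {platform} post from:\n{markdown}"},
        ],
    )
    return response["message"]["content"]

# Step 2: scrape the submitted URL; Firecrawl returns markdown.
doc = firecrawl.scrape_url("https://example.com/article")
markdown = doc.markdown  # field name may differ across SDK versions

# Step 3: run the Twitter and LinkedIn agents in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    twitter = pool.submit(generate_post, "Twitter", markdown)
    linkedin = pool.submit(generate_post, "LinkedIn", markdown)

print(twitter.result())
print(linkedin.result())
```

In the demo itself, these stages live as Motia steps connected by events, which is what provides the observability, retries, and state management listed below.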
It’s easier to explain everything via video, so we have added one at the top.
Just as React streamlines frontend development, Motia simplifies the AI backend: you need one unified framework instead of a dozen tools.
Key features:
- You can mix Python, JavaScript, and TypeScript in the same workflow.
- You can deploy from your laptop to prod in one click.
- It has built-in observability & state management.
- It provides automatic retries & fault tolerance.
- It supports streaming responses.
GitHub repo → (don’t forget to star it)
Hands-on
Tool calling in LLMs
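Tool calling lets an LLM respond with a structured request to run one of your functions instead of plain text. Here is a minimal sketch, assuming the ollama Python package and a toy get_weather tool (the tool, model tag, and prompt are all illustrative):

```python
# Minimal tool-calling sketch, assuming the ollama Python package.
# The get_weather tool and the model choice are illustrative.
import ollama

def get_weather(city: str) -> str:
    """Toy tool: return a canned weather string for a city."""
    return f"It is sunny and 22°C in {city}."

# Describe the tool so the model knows when and how to request it.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# A tool-calling-capable model (e.g., llama3.1) can respond with a
# structured request to call get_weather instead of plain text.
response = ollama.chat(model="llama3.1", messages=messages, tools=tools)
print(response)
```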

Printing the response, we can inspect the returned message object.

Notice that the message key in the above response object has tool_calls, which includes relevant details, such as:

- tool.function.name: the name of the tool to be called.
- tool.function.arguments: the arguments required by the tool.
Thus, we can utilize this info to produce a response as follows:
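Continuing the hedged sketch above, with a simple dispatch table (the naming is my own):

```python
# Route each requested tool call to the matching Python function.
available_tools = {"get_weather": get_weather}

message = response["message"]

for tool in message.get("tool_calls") or []:
    name = tool["function"]["name"]            # tool.function.name
    arguments = tool["function"]["arguments"]  # tool.function.arguments
    result = available_tools[name](**arguments)
    print(result)  # e.g., "It is sunny and 22°C in Paris."
```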

This produces the expected output.
Of course, the above output can also be passed back to the AI to generate a more vivid response, which we haven't shown here.
But this simple demo shows that with tool calling, the assistant can be made more flexible and powerful to handle diverse user needs.
👉 Over to you: What else would you like to learn about in LLMs?
LLM fine-tuning
Implementing DoRA (an improved LoRA) from scratch
We have covered several LLM fine-tuning approaches before.
DoRA is another promising, state-of-the-art technique that improves on LoRA and similar fine-tuning methods.
The paper's results show that even with a reduced rank (e.g., halving the LoRA rank), DoRA significantly outperforms LoRA.

But why care about efficient LLM fine-tuning techniques?
Traditional fine-tuning is practically infeasible with LLMs.

To understand, consider this:
GPT-3 has 175B parameters. That's 350 GB of memory just to store the model weights under float16 precision (175B parameters × 2 bytes each).
This means that if OpenAI used traditional fine-tuning within its fine-tuning API, it would have to maintain one model copy per user:
- If 10 users fine-tuned GPT-3 → they need 3500 GB to store model weights.
- If 1000 users fine-tuned GPT-3 → they need 350k GB to store model weights.
- If 100k users fine-tuned GPT-3 → they need 35 million GB to store model weights.
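A quick back-of-envelope check of those numbers:

```python
# Back-of-envelope: float16 = 2 bytes per parameter.
params = 175e9                   # GPT-3 parameter count
per_copy_gb = params * 2 / 1e9
print(per_copy_gb)               # 350.0 GB per full model copy

for users in (10, 1_000, 100_000):
    print(f"{users} users -> {users * per_copy_gb:,.0f} GB")
# 10 -> 3,500 GB; 1,000 -> 350,000 GB; 100,000 -> 35,000,000 GB
```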
And the problems don't end there:
- OpenAI bills solely based on usage. What if someone fine-tunes the model for fun or learning purposes but never uses it?
- Since a request can come anytime, should they always keep the fine-tuned model loaded in memory? Wouldn't that waste resources since several models may never be used?
Techniques like LoRA (and other variants) solved this key business problem: instead of storing a full fine-tuned model copy per user, you store only a small set of low-rank adapter weights per user on top of one shared base model.

DoRA further optimized this.
We did an algorithmic breakdown of how DoRA works and observations from LoRA that led to its development.
We also implemented it from scratch in PyTorch to help you build an intuitive understanding.
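As a taste of what's inside, here is a minimal, simplified sketch of the core DoRA idea in PyTorch (not the full implementation from the article): decompose the pretrained weight into a magnitude vector and a direction matrix, apply a LoRA-style low-rank update to the direction, renormalize it column-wise, and rescale by the learned magnitude.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinear(nn.Module):
    """Simplified DoRA layer: W' = m * (W0 + B @ A) / ||W0 + B @ A||_col."""

    def __init__(self, linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.weight = linear.weight            # frozen pretrained W0, shape (out, in)
        self.weight.requires_grad = False
        self.bias = linear.bias
        out_features, in_features = self.weight.shape
        # LoRA-style low-rank update applied to the *direction* of W0.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        # Trainable magnitude vector, initialized to the column norms of W0.
        self.m = nn.Parameter(self.weight.norm(p=2, dim=0, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.weight + self.B @ self.A          # updated direction V
        v = v / v.norm(p=2, dim=0, keepdim=True)   # column-wise renormalization
        return F.linear(x, self.m * v, self.bias)  # rescale by magnitude

# Usage: wrap a pretrained layer; only A, B, and m are trained.
layer = DoRALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(4, 768))
```

Because only A, B, and m are trainable, the per-user storage overhead stays tiny, which is the same business win LoRA delivered.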
RAG is powerful, but not always ideal when you want to augment LLMs with more information.
Several industry use cases rely heavily on efficient LLM fine-tuning, which you should be aware of in addition to building robust RAG solutions.
THAT'S A WRAP
No-fluff industry ML resources to succeed in DS/ML roles

At the end of the day, all businesses care about impact. That’s it!
- Can you reduce costs?
- Can you drive revenue?
- Can you scale ML models?
- Can you predict trends before they happen?
We have discussed several other topics (with implementations) in the past that align with these goals.
Here are some of them:
- Learn sophisticated graph architectures and how to train them on graph data in this crash course.
- So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
- Run large models on small devices using Quantization techniques.
- Learn how to generate prediction intervals or sets with strong statistical guarantees to increase trust, using conformal prediction.
- Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
- Learn how to scale and implement ML model training in this practical guide.
- Learn 5 techniques with implementation to reliably test ML models in production.
- Learn how to build and implement privacy-first ML systems using Federated Learning.
- Learn 6 techniques with implementation to compress ML models.
All these resources will help you cultivate key skills that businesses and companies care about the most.
SPONSOR US
Advertise to 600k+ data professionals
Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., around the world.