
TODAY'S ISSUE
TODAY’S DAILY DOSE OF DATA SCIENCE
A crash course on RAG systems - Part 2
Last week, we started a crash course on building RAG systems.
Part 2 is now available, and it builds on the foundations laid in Part 1.
Read here: A Crash Course on Building RAG Systems – Part 2 (With Implementation).
Why care?
Over the last few weeks, we have spent plenty of time understanding the key components of real-world NLP systems (like the deep dives on bi-encoders and cross-encoders for context pair similarity scoring).
RAG is another key NLP system that gained massive attention because it addresses a core limitation of LLMs: injecting up-to-date, domain-specific knowledge at query time instead of baking it into the model's weights.
More specifically, if you know how to build a reliable RAG system, you can often bypass the effort and cost of fine-tuning LLMs.
That’s a considerable cost saving for enterprises.
And at the end of the day, all businesses care about impact. That’s it!
- Can you reduce costs?
- Can you drive revenue?
- Can you scale ML models?
- Can you predict trends before they happen?
Thus, the objective of this crash course is to help you implement reliable RAG systems, understand their underlying challenges, and develop the expertise to build RAG apps on top of LLMs, a skill every industry cares about right now.
- Read the first part here →
- Read the second part here →
Of course, if you have never worked with LLMs, that’s okay. We cover everything in a practical and beginner-friendly way.
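As a preview, the core retrieve-then-generate pattern at the heart of RAG can be sketched in a few lines. This is a hypothetical, illustrative sketch only: the retriever below is a toy keyword-overlap scorer (real systems use bi-encoders and vector databases, as covered in the course), and the resulting prompt would be sent to an actual LLM.

```python
# Minimal retrieve-then-generate (RAG) sketch.
# Illustrative stubs only: the retriever is a toy keyword-overlap
# scorer, and the built prompt would go to a real LLM in practice.

def _words(text: str) -> set[str]:
    """Lowercase and strip basic punctuation for crude keyword matching."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by keyword overlap with the query; return the top k."""
    return sorted(documents, key=lambda d: len(_words(query) & _words(d)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Augment the user query with retrieved context before the LLM call."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

docs = [
    "RAG retrieves relevant documents at query time.",
    "Fine-tuning updates model weights on new data.",
    "Transformers process text with attention.",
]
query = "How does RAG retrieve relevant documents?"
print(build_prompt(query, retrieve(query, docs)))
```

The key point is that the LLM itself is never retrained; only the prompt changes, which is exactly why RAG sidesteps fine-tuning costs.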
IN CASE YOU MISSED IT
Extend the context length of LLMs
- GPT-3.5-turbo had a context window of 4,096 tokens.
- Later, GPT-4 took that to 8,192–32,768 tokens.
- Claude 2 reached 100,000 tokens.
- Llama 3.1 → 128,000 tokens.
- Gemini → 1M+ tokens.
We have been making great progress in extending the context window of LLMs.
But how?
Earlier this week, we covered techniques that help unlock larger context windows.
Read the techniques to extend the context length of LLMs here →
IN CASE YOU MISSED IT
Building a 100% local multi-agent Internet research assistant with OpenAI Swarm & Llama 3.2
Recently, OpenAI released Swarm.
It’s an open-source framework designed to manage and coordinate multiple AI agents in a highly customizable way.
AI agents are autonomous systems that can reason, plan, identify relevant sources, extract information from them when needed, take actions, and even self-correct when something goes wrong.
We published a practical and hands-on demo of this in the newsletter. We built an internet research assistant app that:
- Accepts a user query.
- Searches the web for relevant information.
- And turns it into a well-crafted article.
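The shape of that pipeline can be sketched with plain Python functions. Note this is a hedged, simplified sketch: `search_web` and `write_article` below are illustrative stubs, not the actual implementation, which coordinates agents with OpenAI Swarm and generates text locally with Llama 3.2.

```python
# Plain-Python sketch of the query -> search -> article pipeline.
# `search_web` and `write_article` are illustrative stubs; the real demo
# coordinates agents with OpenAI Swarm and uses Llama 3.2 for generation.

def search_web(query: str) -> list[str]:
    """Stub: a real research agent would call a search tool and return snippets."""
    return [f"Snippet about {query} from source {i}" for i in range(1, 4)]

def write_article(query: str, snippets: list[str]) -> str:
    """Stub: a real writer agent would prompt an LLM with the research notes."""
    notes = "\n".join(f"- {s}" for s in snippets)
    return f"# {query.title()}\n\nResearch notes:\n{notes}"

def research_assistant(query: str) -> str:
    """Coordinate the two steps: research first, then write."""
    return write_article(query, search_web(query))

print(research_assistant("local LLM agents"))
```

Splitting the work into a researcher step and a writer step mirrors the multi-agent design: each agent has one narrow job, and a coordinator hands results from one to the next.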
The demo is shown below: