RAG vs. Agentic RAG

...explained visually


TODAY’S DAILY DOSE OF DATA SCIENCE

RAG vs. Agentic RAG explained visually

These are some issues with the traditional RAG system:

  1. These systems retrieve once and generate once (a minimal sketch of this single-pass flow follows this list). This means if the retrieved context isn't enough, the LLM cannot dynamically search for more information.
  2. RAG systems may provide relevant context but don't reason through complex queries. If a query requires multiple retrieval steps, traditional RAG falls short.
  3. There's little adaptability. The LLM can't modify its strategy based on the problem at hand.
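
For contrast, the traditional pattern boils down to a single retrieve-then-generate pass. Here is a minimal sketch in Python, where embed, vector_search, and call_llm are stand-in stubs rather than any specific library's API:

```python
# Minimal single-pass RAG sketch: retrieve once, generate once.
# embed(), vector_search(), and call_llm() are illustrative stubs,
# not a specific library's API.

def embed(text: str) -> list[float]:
    ...  # produce an embedding for `text` (stub)

def vector_search(query_embedding: list[float], top_k: int = 5) -> str:
    ...  # return the top_k most similar chunks from a vector DB (stub)

def call_llm(prompt: str) -> str:
    ...  # send the prompt to an LLM and return its answer (stub)

def traditional_rag(query: str) -> str:
    # Single retrieval step, single generation step.
    # If the retrieved context is insufficient, there is no second pass:
    # the system cannot go back and fetch more information.
    context = vector_search(embed(query), top_k=5)
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```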

Due to this, Agentic RAG is becoming increasingly popular. Let's understand this today in more detail.

On a side note, we recently started a beginner-friendly crash course on RAG with implementations: Read the first six parts here →

Agentic RAG

Think of agents as someone who can actively think through a task—planning, adapting, and iterating until they arrive at the best solution, rather than just following a defined set of instructions. The powerful capabilities of LLMs make this possible.

The workflow of agentic RAG is depicted below:

As shown above, the idea is to introduce agentic behaviors at each stage of RAG.

Let's understand this step-by-step:

Steps 1-2) The user inputs the query, and an agent rewrites it (removing spelling mistakes, simplifying it for embedding, etc.)

Step 3) Another agent decides whether it needs more details to answer the query.

  • Step 4) If not, the rewritten query is sent to the LLM as a prompt.
  • Steps 5-8) If yes, another agent looks through the sources it has access to (vector database, tools & APIs, and the internet) and decides which ones are likely to be useful. The relevant context is retrieved and sent to the LLM as a prompt.

Step 9) Either of the above two paths produces a response.

Step 10) A final agent checks if the answer is relevant to the query and context.

  • Step 11) If yes, return the response.
  • Step 12) If not, go back to Step 1. This procedure continues for a few iterations until the system admits it cannot answer the query.

This makes the RAG system much more robust since, at every step, agentic behavior ensures that individual outcomes are aligned with the final goal.
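
To make this control flow concrete, here is a minimal sketch of the loop in Python. Every helper function stands in for an LLM-backed agent or tool call; the names are illustrative placeholders, not any specific framework's API, and, like the diagram, this is just one possible blueprint:

```python
# Sketch of the agentic RAG loop described above. Every helper below
# stands in for an LLM-backed agent or tool call; the names are
# illustrative placeholders.

def rewrite_query(query: str) -> str: ...                  # Steps 1-2: fix typos, simplify for embedding
def needs_more_context(query: str) -> bool: ...            # Step 3: decide whether retrieval is needed
def select_sources(query: str) -> list[str]: ...           # Step 5: pick vector DB, tools/APIs, or the web
def retrieve(query: str, sources: list[str]) -> str: ...   # Steps 6-8: fetch the relevant context
def call_llm(prompt: str) -> str: ...                      # Step 9: generate a response
def is_relevant(response: str, query: str) -> bool: ...    # Step 10: final relevance check

MAX_ITERATIONS = 3  # give up after a few rounds (Step 12)

def agentic_rag(user_query: str) -> str:
    for _ in range(MAX_ITERATIONS):
        query = rewrite_query(user_query)              # Steps 1-2
        if needs_more_context(query):                  # Step 3
            sources = select_sources(query)            # Step 5
            context = retrieve(query, sources)         # Steps 6-8
            prompt = f"Context:\n{context}\n\nQuestion: {query}"
        else:
            prompt = query                             # Step 4: send the rewritten query directly
        response = call_llm(prompt)                    # Step 9
        if is_relevant(response, query):               # Step 10
            return response                            # Step 11
        # Step 12: otherwise loop back and try again
    return "I could not find a reliable answer to this query."
```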

That said, it is also important to note that building RAG systems typically boils down to design preferences/choices.

The diagram above is one of many blueprints that an agentic RAG system may possess. You can adapt it according to your specific use case.

Going ahead, we shall cover RAG-focused agentic workflows in our ongoing RAG crash course in much more detail.

Why care about RAG?

RAG is a key NLP technique that got massive attention because of a key challenge it solves around LLMs: grounding them in external knowledge without retraining them.

More specifically, if you know how to build a reliable RAG system, you can bypass the challenge and cost of fine-tuning LLMs.

That’s a considerable cost saving for enterprises.

And at the end of the day, all businesses care about impact. That’s it!

  • Can you reduce costs?
  • Drive revenue?
  • Can you scale ML models?
  • Predict trends before they happen?

Thus, the objective of this crash course is to help you implement reliable RAG systems, understand the underlying challenges, and develop expertise in building RAG apps on LLMs, which every industry cares about now.

Of course, if you have never worked with LLMs, that’s okay. We cover everything in a practical and beginner-friendly way.

👉 Over to you: What does your system design look like for Agentic RAG?

MODEL TRAINING OPTIMIZATION

PyTorch DataLoader has two terrible default settings

Consider the model training loop in PyTorch shown below:
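
Here's a minimal, representative version of such a loop on MNIST (the model architecture and hyperparameters are illustrative):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda")

# Default DataLoader settings: pin_memory=False, num_workers=0
train_loader = DataLoader(
    datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=128,
    shuffle=True,
)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10)).to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)   # CPU -> GPU transfer (blocking by default)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)          # forward pass + loss on the GPU
        loss.backward()                      # backward pass on the GPU
        optimizer.step()                     # optimizer step on the GPU
```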

  • The first step of every iteration transfers the batch from the CPU to the GPU.
  • Everything after that transfer (forward pass, loss computation, backward pass, and optimizer step) executes on the GPU.

This means when the GPU is working, the CPU is idle, and when the CPU is working, the GPU is idle, as depicted below:

Ideally, you should be able to transfer batch 2 while the GPU is training the model on batch 1.

Enabling this is quite simple in PyTorch.

First, define the DataLoader object with pin_memory=True and a non-zero num_workers (the defaults are pin_memory=False and num_workers=0).

Next, during the data transfer step in the training loop, specify non_blocking=True:
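
Continuing the earlier sketch, the two changes look like this (the worker count is illustrative; tune it for your machine):

```python
# 1) DataLoader with pinned (page-locked) host memory and background workers
train_loader = DataLoader(
    datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=128,
    shuffle=True,
    pin_memory=True,   # allocate batches in page-locked memory for faster transfers
    num_workers=4,     # load and preprocess batches in background processes
)

for epoch in range(5):
    for x, y in train_loader:
        # 2) Asynchronous CPU -> GPU copy: the next batch can be prepared
        #    while the GPU is still busy with the current one
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```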

Done!

Here's the speed comparison on MNIST:

  • With the default settings, the model takes 43 seconds to train for 5 epochs.
  • But with the updated settings, the same model trains in 9 seconds.

Of course, this isn't the only technique to accelerate model training.

We covered 15 techniques (with implementation) to optimize model training here →

THAT'S A WRAP

No-Fluff Industry ML resources to succeed in DS/ML roles


We have discussed several other topics (with implementations) in the past that align with these goals.

Here are some of them:

  • Learn sophisticated graph architectures and how to train them on graph data in this crash course.
  • So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
  • Run large models on small devices using Quantization techniques.
  • Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
  • Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
  • Learn how to scale and implement ML model training in this practical guide.
  • Learn 5 techniques with implementation to reliably test ML models in production.
  • Learn how to build and implement privacy-first ML systems using Federated Learning.
  • Learn 6 techniques with implementation to compress ML models.

All these resources will help you cultivate key skills that businesses and companies care about the most.


