Building Trustworthy Agentic/RAG Workflows

...explained step-by-step with code.

👉
Hey! Enjoy our free data science newsletter! Subscribe below and receive a free data science PDF (530+ pages) with 150+ core data science and machine learning lessons.

TODAY'S ISSUE

Hands-on demo

​Building trustworthy Agentic/RAG workflows​

It is common for RAG systems to produce inaccurate/unhelpful responses.

Today, let’s look at how we can improve this using ​Cleanlab Codex​, which is commonly used in production systems to automatically detect and resolve RAG inaccuracies.

Tech stack:

Here's the workflow:

  • LLM processes the query to select a tool
  • Converts the query into the right format (text/SQL)
  • Executes the tool and fetch the output
  • Generates a response with enriched context
  • Validates the response using Cleanlab's Codex

The video below depicts this entire system in action:

0:00
/1:07

Now, let's see the code!

Set up LLM

We'll use the latest Qwen3, served via OpenRouter.

Ensure the LLM supports tool calling for seamless execution.

Set up SQL Query engine

A Natural Language to SQL Engine turns plain queries into SQL commands, enabling easy data interaction.

Set up RAG Query engine

  • Convert PDF, DOCX, or any document to Markdown for Vector Storage with Docling.
  • Query Engine fetches context from Milvus, combines it with the query, and sends it to LLM for a response.

Set up tools

Now, it's time to set up and use both the query engines we defined above as tools. Our Agent will then smartly route the query to it's right tools.

Cleanlab Codex Validation

Next, we integrate Cleanlab Codex to evaluate and monitor the RAG app in just a few lines of code:

Create an Agentic workflow

With everything set up, let's create our agentic routing workflow.

Kick off the workflow

With everything set, it's time to activate our workflow.

We begin by equipping LLM with two tools: Document & Text-to-SQL Query.

After that, we invoke the workflow:

Streamlit UI

To enhance user-friendliness, we present everything within a clean and interactive Streamlit UI.

Upon prompting, notice that the app displays a Trust Score on the generated response.

This is incredibly important for RAG/Agentic workflows that are quite susceptible to inaccuracies and hallucinations.

Along with this, we also get specific evaluation metrics along with detailed insights and reasoning for each test run:

​Here’s the Codex documentation →​

​And you can find the code for today’s issue in this GitHub repo →

​Beyond grid and random search​​

There are many issues with Grid search and random search.

  • They are computationally expensive due to exhaustive search.
  • The search is restricted to the specified hyperparameter range. But what if the ideal hyperparameter exists outside that range?
  • They can ONLY perform discrete searches, even if the hyperparameter is continuous.

Bayesian optimization solves this.

It’s fast, informed, and performant, as depicted below:

Learning about optimized hyperparameter tuning and utilizing it will be extremely helpful to you if you wish to build large ML models quickly.

​​Learn Bayesian Optimization from scratch here →

TRUSTWORTHY ML

​Build confidence in model's predictions with conformal predictions​

Conformal prediction has gained quite traction in recent years, which is evident from the Google trends results:

The reason is quite obvious.

ML models are becoming increasingly democratized lately. However, not everyone can inspect its predictions, like doctors or financial professionals.

Thus, it is the responsibility of the ML team to provide a handy (and layman-oriented) way to communicate the risk with the prediction.

For instance, if you are a doctor and you get this MRI, an output from the model that suggests that the person is normal and doesn’t need any treatment is likely pretty useless to you.

This is because a doctor's job is to do a differential diagnosis. Thus, what they really care about is knowing if there's a 10% percent chance that that person has cancer or 80%, based on that MRI.

​Conformal predictions​ solve this problem.

A somewhat tricky thing about conformal prediction is that it requires a slight shift in making decisions based on model outputs.

Nonetheless, this field is definitely something I would recommend keeping an eye on, no matter where you are in your ML career.

​Learn how to practically leverage conformal predictions in your model →​useless

THAT'S A WRAP

No-Fluff Industry ML resources to

Succeed in DS/ML roles

At the end of the day, all businesses care about impact. That’s it!

  • Can you reduce costs?
  • Drive revenue?
  • Can you scale ML models?
  • Predict trends before they happen?

We have discussed several other topics (with implementations) in the past that align with such topics.

Here are some of them:

  • Learn sophisticated graph architectures and how to train them on graph data in this crash course.
  • So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
  • Run large models on small devices using Quantization techniques.
  • Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
  • Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
  • Learn how to scale and implement ML model training in this practical guide.
  • Learn 5 techniques with implementation to reliably test ML models in production.
  • Learn how to build and implement privacy-first ML systems using Federated Learning.
  • Learn 6 techniques with implementation to compress ML models.

All these resources will help you cultivate key skills that businesses and companies care about the most.

Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., around the world.

Get in touch today →


Join the Daily Dose of Data Science Today!

A daily column with insights, observations, tutorials, and best practices on data science.

Get Started!
Join the Daily Dose of Data Science Today!

Great! You’ve successfully signed up. Please check your email.

Welcome back! You've successfully signed in.

You've successfully subscribed to Daily Dose of Data Science.

Success! Check your email for magic link to sign-in.

Success! Your billing info has been updated.

Your billing was not updated.