Building a Real-time Voice RAG Agent

TODAY’S DAILY DOSE OF DATA SCIENCE

Typing to interact with AI applications can be a bit tedious and boring.

That is why real-time voice interactions are likely to become increasingly popular.

Today, let us show you how we built a real-time Voice RAG Agent, step by step.

Here’s an overview of what the app does:

Listens to real-time audio.
Transcribes it via AssemblyAI—a leading speech-to-text platform.
Uses your docs (via LlamaIndex) to craft an answer.
Speaks that answer back with Cartesia—a platform to generate seamless speech, power voice apps, and fine-tune your own voice models in near real-time.

The code is provided later in the article, and we have added a video below if you want to see the agent in action.

Now let's jump into the code!

Implementation

Real-time Voice RAG Agent

Set up environment and logging

This lets us load configuration (such as API keys) from a .env file and log what the agent is doing in real time.
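Projects commonly do this step with python-dotenv's `load_dotenv()`. To keep the example dependency-free, here is a minimal stand-in reader plus a logging setup. The function names and the `voice-agent` logger name are hypothetical, and the reader deliberately ignores advanced .env features.

```python
import logging
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Minimal .env reader: KEY=VALUE lines, '#' comments, optional quotes.

    Hypothetical stand-in for python-dotenv's load_dotenv(); it does not
    handle multi-line values, variable expansion, or 'export' prefixes.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded  # no .env file: rely on variables already in the environment
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        # Like load_dotenv's default, do not override variables that are already set.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def setup_logging(name: str = "voice-agent", level: int = logging.INFO) -> logging.Logger:
    """Configure timestamped log lines so each pipeline step can be traced live."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger(name)
```

Call `load_env_file()` once at startup, before any client that needs an API key is created.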

Set up RAG

This is where LlamaIndex indexes your documents for search and retrieval.

The agent’s answers are grounded in this knowledge base.
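LlamaIndex handles chunking, embedding, and vector search for you. To show the retrieve-then-ground idea in isolation, the sketch below swaps learned embeddings for a much simpler bag-of-words cosine similarity: index the documents, retrieve the ones most similar to a question, and place them in the prompt so the answer stays grounded. `TinyIndex` and `build_grounded_prompt` are hypothetical names, not LlamaIndex APIs.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy in-memory index: bag-of-words vectors plus cosine similarity.

    Stand-in for an embedding-based vector index; real systems use
    learned embeddings rather than raw word counts.
    """

    def __init__(self, docs: list[str]):
        self.docs = docs
        self.vectors = [Counter(tokenize(d)) for d in docs]

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def retrieve(self, query: str, top_k: int = 2) -> list[str]:
        """Return the top_k documents most similar to the query."""
        q = Counter(tokenize(query))
        ranked = sorted(
            zip(self.docs, self.vectors),
            key=lambda pair: self._cosine(q, pair[1]),
            reverse=True,
        )
        return [doc for doc, _ in ranked[:top_k]]


def build_grounded_prompt(index: TinyIndex, question: str, top_k: int = 2) -> str:
    """Prepend retrieved context so the LLM answers from the knowledge base."""
    context = "\n".join(f"- {chunk}" for chunk in index.retrieve(question, top_k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The prompt string is what would be sent to the LLM; swapping `TinyIndex` for a real vector index changes retrieval quality, not the overall flow.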

Set up Voice Activity Detection

For a smooth real-time experience, we also want Voice Activity Detection (VAD), so we “prewarm” the Silero VAD model, loading it once before any conversation starts rather than on the first request.

This helps us detect when someone is actually speaking.
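Silero VAD is a neural model. The sketch below swaps in a much simpler technique, an energy-threshold detector, purely to show what a VAD produces: a speech/silence label per audio frame, collapsed into speech segments. The threshold, the hangover length, and the frame format (lists of float samples in [-1, 1]) are illustrative assumptions.

```python
import math

# Energy-threshold VAD: a simple stand-in for illustration, not how Silero works.
Frame = list[float]  # one short chunk of audio samples, assumed in [-1.0, 1.0]


def frame_rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def detect_speech(frames: list[Frame], threshold: float = 0.02, hangover: int = 2) -> list[bool]:
    """Label each frame as speech (True) or silence (False).

    A frame counts as speech if its RMS exceeds `threshold`. `hangover` keeps
    the speech flag on for a few frames after energy drops, so short pauses
    between words do not split one utterance into several.
    """
    labels: list[bool] = []
    frames_since_speech = hangover + 1  # start in the "silent" state
    for frame in frames:
        if frame_rms(frame) > threshold:
            frames_since_speech = 0
        else:
            frames_since_speech += 1
        labels.append(frames_since_speech <= hangover)
    return labels


def speech_segments(labels: list[bool]) -> list[tuple[int, int]]:
    """Collapse per-frame labels into (start, end_exclusive) frame ranges."""
    segments: list[tuple[int, int]] = []
    start = None
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments
```

The end of a segment is the signal a voice agent cares about most: it marks when the user has stopped talking and the transcript can be answered.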

The VoicePipelineAgent and Entry Point

This is where we bring it all together. The agent:

Listens to real-time audio.
Transcribes it using AssemblyAI.
Crafts an answer with your documents via LlamaIndex.
Speaks that answer back using Cartesia.
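VoicePipelineAgent (from the LiveKit Agents framework) streams audio through these stages concurrently. The synchronous sketch below only shows the order of operations for a single turn. `VoiceTurnPipeline` and the component signatures are hypothetical; any callables with these shapes can stand in for AssemblyAI, the LlamaIndex-backed answerer, and Cartesia.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical component signatures for one conversational turn.
SpeechToText = Callable[[bytes], str]   # e.g. an AssemblyAI client wrapper
Answerer = Callable[[str], str]         # e.g. a RAG query over your docs
TextToSpeech = Callable[[str], bytes]   # e.g. a Cartesia client wrapper


@dataclass
class VoiceTurnPipeline:
    stt: SpeechToText
    answer: Answerer
    tts: TextToSpeech
    history: list[tuple[str, str]] = field(default_factory=list)

    def handle_utterance(self, audio: bytes) -> bytes:
        """Transcribe -> answer from the docs -> synthesize speech."""
        transcript = self.stt(audio)
        if not transcript.strip():
            return b""  # nothing intelligible was said, so stay quiet
        reply = self.answer(transcript)
        self.history.append((transcript, reply))  # keep context for later turns
        return self.tts(reply)
```

For a dry run without any services, plain functions work, for example `stt=lambda audio: audio.decode()` and `tts=lambda text: text.encode()`.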

Run the app

Finally, we run the agent, specifying the prewarm function and the main entry point.

That’s it—your Real-Time Voice RAG Agent is ready to roll!

We added a video at the top if you want to see this in action!

The entire code is 100% open-source and available in this GitHub repo →

PRIVACY-PRESERVING ML

Train models on private data with federated learning

There’s so much data on your mobile phone right now — images, text messages, etc.

And this is just about one user—you.

But applications can have millions of users. The amount of data we can train ML models on is unfathomable.

The problem?

This data is private.

So we cannot simply consolidate this data in a single place to train a model.

The solution?

Federated learning is a smart way to address this challenge.

The core idea is to ship the model to devices, train it on each device, and retrieve only the updates, so the raw data never leaves the device.
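A common way to implement this loop is Federated Averaging (FedAvg): the server averages the clients' updated parameters, weighted by how much data each client holds. The toy sketch below trains a one-feature linear model y ≈ w*x + b with plain gradient descent; the function names, learning rate, and round/epoch counts are illustrative assumptions, not taken from the linked material.

```python
# Toy FedAvg: each client fits y ≈ w*x + b on its own private points.
Point = tuple[float, float]
Model = tuple[float, float]  # (w, b)


def local_train(model: Model, data: list[Point], lr: float = 0.1, epochs: int = 20) -> Model:
    """Client side: full-batch gradient descent on mean squared error.

    The raw (x, y) pairs never leave this function; only the updated (w, b) does.
    """
    w, b = model
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def fed_avg(updates: list[Model], sizes: list[int]) -> Model:
    """Server side: average client models, weighted by local dataset size."""
    total = sum(sizes)
    w = sum(m[0] * n for m, n in zip(updates, sizes)) / total
    b = sum(m[1] * n for m, n in zip(updates, sizes)) / total
    return w, b


def federated_training(clients: list[list[Point]], rounds: int = 30) -> Model:
    model: Model = (0.0, 0.0)  # global model held by the server
    for _ in range(rounds):
        # 1) ship the global model to every device and train locally
        updates = [local_train(model, data) for data in clients]
        # 2) retrieve only the updated parameters and aggregate them
        model = fed_avg(updates, [len(data) for data in clients])
    return model
```

With three clients holding points from y = 2x + 1 over x ∈ {-1, 0}, {0, 1}, and {1, 2}, the averaged model converges to w ≈ 2, b ≈ 1 even though no client ever shares its points.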

But this isn't as simple as it sounds.

1) Since the model is trained on the client side, how do we reduce its size?

2) How do we aggregate different models received from the client side?

3) [IMPORTANT] Privacy-sensitive datasets are typically skewed by personal preferences and beliefs. For instance, in an image-related task:

Some devices may only have pet images.
Some devices may only have car images.
Some people may love to travel, and may primarily have travel-related images.
How do we handle such skew in the data distribution?
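A common way to study this problem is to simulate non-IID clients by partitioning a labeled dataset by class, then measure how far each client's label distribution drifts from the global one. The sketch below uses total variation distance (0 means identical distributions, 1 means disjoint ones); the helper names and the pet/car/travel labels are hypothetical.

```python
from collections import Counter


def partition_by_label(labels: list[str], client_classes: dict[int, set[str]]) -> dict[int, list[int]]:
    """Give each client only the samples whose label it 'owns'.

    Samples of a class owned by several clients are dealt out round-robin.
    """
    parts: dict[int, list[int]] = {c: [] for c in client_classes}
    turn: Counter = Counter()
    for idx, label in enumerate(labels):
        owners = [c for c, owned in client_classes.items() if label in owned]
        if not owners:
            continue  # no client sees this class at all
        client = owners[turn[label] % len(owners)]
        turn[label] += 1
        parts[client].append(idx)
    return parts


def label_distribution(labels: list[str]) -> dict[str, float]:
    """Fraction of samples per label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance between two label distributions."""
    keys = p.keys() | q.keys()
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

A client that only has pet images ends up far from the global mix, while a client that owns two of three classes sits closer; skew-aware methods are typically evaluated on partitions like this.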

Learn how to implement federated learning systems (beginner-friendly) →

No-Fluff Industry ML resources to Succeed in DS/ML roles

At the end of the day, all businesses care about impact. That’s it!

Can you reduce costs?
Drive revenue?
Can you scale ML models?
Predict trends before they happen?

We have covered several topics (with implementations) in the past that align with these goals.

Develop Industry ML skills

Here are some of them:

Learn sophisticated graph architectures and how to train them on graph data in this crash course.
So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
Run large models on small devices using Quantization techniques.
Learn how to use Conformal Prediction to generate prediction intervals or sets with strong statistical guarantees, increasing trust in model outputs.
Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
Learn how to scale and implement ML model training in this practical guide.
Learn 5 techniques with implementation to reliably test ML models in production.
Learn how to build and implement privacy-first ML systems using Federated Learning.
Learn 6 techniques with implementation to compress ML models.

All these resources will help you cultivate key skills that businesses and companies care about the most.

Advertise to 600k+ data professionals

Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., around the world.

Get in touch today →
