

TODAY’S DAILY DOSE OF DATA SCIENCE
[Hands-on] RAG over audio files using AssemblyAI and DeepSeek-R1
While most RAG systems are built over text, plenty of data exists as speech (audio), and we need reliable ways to do RAG over it.
So today, let’s build a RAG app over audio files with DeepSeek-R1.
Here’s an overview of our app:

- Step 1) Takes an audio file and transcribes it using ​AssemblyAI​.
- Steps 2-3) Embeds the transcript and stores it in a Qdrant vector database.
- Steps 4-6) Queries the database to get context.
- Steps 7-8) Uses DeepSeek-R1 as the LLM to generate a response.

​AssemblyAI​ has always been my go-to for building speech-driven AI applications.
It’s an AI transcription platform that provides state-of-the-art AI models for any task related to speech & audio understanding.
Now let's jump into code!
The GitHub repo is linked towards the end of this issue.
​Implementation​
To transcribe audio files, get an API key from ​AssemblyAI​ and store it in the `.env` file. ​Get the API key here →​
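As a quick illustration (not necessarily how the repo wires it up), you can load the key with python-dotenv; the variable name `ASSEMBLYAI_API_KEY` is an assumption, so match it to whatever you put in your `.env` file:

```python
import os

import assemblyai as aai
from dotenv import load_dotenv

# Read key-value pairs from .env into the environment.
load_dotenv()

# ASSEMBLYAI_API_KEY is an assumed variable name; use whatever you stored in .env.
aai.settings.api_key = os.getenv("ASSEMBLYAI_API_KEY")
```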

Next, we use ​AssemblyAI​ to transcribe the audio with speaker labels (see the sketch after this list). To do this:

- We set up the transcriber object.
- We enable speaker label detection in the config.
- We transcribe the audio using ​AssemblyAI​.
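Here is a minimal sketch of those three steps with the AssemblyAI Python SDK; the file name `meeting.mp3` is just a placeholder:

```python
import assemblyai as aai

# Enable speaker diarization so each utterance is tagged with its speaker.
config = aai.TranscriptionConfig(speaker_labels=True)

# Set up the transcriber object and transcribe the audio file.
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("meeting.mp3", config=config)  # placeholder path

# Each utterance carries the speaker label and the spoken text.
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```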
Moving on, we embed the transcript and store it in a vector database (sketched after the list). To do this, we:

- Load the embedding model and generate embeddings.
- Connect to Qdrant and create a collection.
- Store the embeddings.
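A sketch of these steps, continuing from the transcription snippet above. The embedding model (`all-MiniLM-L6-v2` via sentence-transformers), the collection name, and the in-memory Qdrant instance are assumptions for illustration, not necessarily what the repo uses:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

# Assumed embedding model (384-dim vectors); swap in your model of choice.
model = SentenceTransformer("all-MiniLM-L6-v2")

# One sentence per utterance, keeping the speaker label in the text.
sentences = [f"Speaker {u.speaker}: {u.text}" for u in transcript.utterances]
embeddings = model.encode(sentences)

# Connect to Qdrant (in-memory here for simplicity) and create a collection.
client = QdrantClient(":memory:")  # use QdrantClient(url=...) for a real server
client.create_collection(
    collection_name="audio_rag",  # assumed collection name
    vectors_config=VectorParams(size=int(embeddings.shape[1]), distance=Distance.COSINE),
)

# Store one point per sentence, keeping the raw text as payload for retrieval.
client.upsert(
    collection_name="audio_rag",
    points=[
        PointStruct(id=i, vector=emb.tolist(), payload={"text": sent})
        for i, (emb, sent) in enumerate(zip(embeddings, sentences))
    ],
)
```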
Now comes retrieval, where we query the vector database for transcript sentences similar to the user's query (see the sketch after this list):

- Convert the query into an embedding.
- Search the vector database.
- Retrieve the top results.
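Continuing the sketch, retrieval could look like the following; the example question and `limit=3` are arbitrary choices:

```python
# Embed the user's question with the same model used for the transcript.
query = "What did the speakers decide about the launch date?"  # example query
query_vector = model.encode(query).tolist()

# Search the collection and keep the top matches as context.
hits = client.search(
    collection_name="audio_rag",
    query_vector=query_vector,
    limit=3,
)

# Join the matched transcript sentences into a single context string.
context = "\n".join(hit.payload["text"] for hit in hits)
```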
Finally, after retrieving the context (sketched after this list):

- We construct a prompt.
- We use DeepSeek-R1 through Ollama to generate a response.
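A sketch of the generation step using the `ollama` Python package; it assumes the `deepseek-r1` model has already been pulled locally (`ollama pull deepseek-r1`) and reuses `context` and `query` from the retrieval sketch:

```python
import ollama

# Assemble a grounded prompt from the retrieved transcript snippets.
prompt = (
    "Answer the question using only the transcript context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)

# Generate the answer locally with DeepSeek-R1 via Ollama.
# Note: R1's output may include its reasoning inside <think>...</think> tags.
response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": prompt}],
)
print(response["message"]["content"])
```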
To make this accessible, we wrap the entire app in a Streamlit interface: a simple UI where you can upload an audio file and chat with it directly.
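Here is a minimal sketch of what such a Streamlit wrapper could look like; the widgets and placeholder answer are illustrative, and the actual app in the repo wires in the full pipeline above:

```python
import streamlit as st

st.title("Chat with your audio")

# Upload an audio file and ask questions about it.
audio_file = st.file_uploader("Upload an audio file", type=["mp3", "wav", "m4a"])

if audio_file is not None:
    st.audio(audio_file)
    question = st.chat_input("Ask something about this audio")
    if question:
        with st.chat_message("user"):
            st.write(question)
        with st.chat_message("assistant"):
            # Plug in the pipeline from the previous snippets here:
            # transcribe -> embed -> retrieve -> generate with DeepSeek-R1.
            st.write("(answer from the RAG pipeline goes here)")
```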
That was simple, wasn’t it?
The code is available here: ​RAG over audio files​.
A departing note
​Why AssemblyAI?
We first used ​AssemblyAI​ over two years ago, and in our experience, it has the most developer-friendly and intuitive SDKs for integrating speech AI into applications.
​AssemblyAI​ first trained Universal-1 on 12.5 million hours of audio, outperforming every other model in the industry (from Google, OpenAI, etc.) across 15+ languages.
Recently, they released ​Universal-2​, their most advanced speech-to-text model yet.
Here’s how ​Universal-2​ compares with Universal-1:
- 24% improvement in proper noun recognition
- 21% improvement in alphanumeric accuracy
- 15% better text formatting
Its performance compared to other popular models in the industry is shown below:

Isn’t that impressive?
We love ​AssemblyAI’s​ mission of supporting developers in building next-gen voice applications in the simplest and most effective way possible.
They have already made a big dent in speech technology, and we're eager to see how they continue from here.
Get started with their API docs if you want to explore their services: ​AssemblyAI API docs​.
🙌 A big thanks to ​AssemblyAI​, who very kindly partnered with us on this demo and allowed us to showcase their industry-leading AI transcription services.
👉 Over to you: What would you use ​AssemblyAI​ for?
Thanks for reading!
THAT'S A WRAP
No-fluff industry ML resources to succeed in DS/ML roles

At the end of the day, all businesses care about impact. That’s it!
- Can you reduce costs?
- Can you drive revenue?
- Can you scale ML models?
- Can you predict trends before they happen?
We have covered several topics (with implementations) in the past that align with these goals.
Here are some of them:
- Learn sophisticated graph architectures and how to train them on graph data in this crash course.
- Many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
- Run large models on small devices using Quantization techniques.
- Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
- Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
- Learn how to scale and implement ML model training in this practical guide.
- Learn 5 techniques with implementation to reliably test ML models in production.
- Learn how to build and implement privacy-first ML systems using Federated Learning.
- Learn 6 techniques with implementation to compress ML models.
All these resources will help you cultivate key skills that businesses and companies care about the most.
SPONSOR US
Advertise to 600k+ data professionals
Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., around the world.