MCP-powered RAG Over Complex Docs
...with hands-on implementation.
In this chapter, we'll show you how we used MCP to power a RAG application over complex documents.
To give you more perspective, here’s our document:

Here's our tech stack:
- Cursor IDE as the MCP client.
- EyeLevel AI's GroundX to build an MCP server that can process complex docs.
Here's how it works:

- User interacts with the MCP client (Cursor IDE).
- Client connects to the MCP server and selects a tool.
- Tools leverage GroundX to do an advanced search over docs.
- Search results are used by the client to generate responses.
If you prefer to watch, we have added a video below:
Implementation details
Now, let's dive into the code! The GitHub repo with the code is linked later in the issue.
1) Set up the server
First, we set up a local MCP server using FastMCP and provide a name.
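Here's a minimal sketch, assuming the official MCP Python SDK's FastMCP helper (the server name `groundx-rag` is our placeholder):

```python
# Minimal FastMCP server setup from the official MCP Python SDK.
# "groundx-rag" is a placeholder name; use anything descriptive.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("groundx-rag")
```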

2) Create GroundX Client
GroundX offers document search and retrieval capabilities for complex, real-world documents. You need to get an API key here and store it in a .env file.
Once done, here's how to set up a client:
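A sketch assuming the `groundx` Python SDK and `python-dotenv`; the environment variable name `GROUNDX_API_KEY` is our assumption:

```python
# Load the API key from .env and create a GroundX client.
import os

from dotenv import load_dotenv
from groundx import GroundX

load_dotenv()  # reads GROUNDX_API_KEY from the .env file
client = GroundX(api_key=os.environ["GROUNDX_API_KEY"])
```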

3) Create Ingestion tool
This tool is used to ingest new documents into the knowledge base.
The user just needs to provide a path to the document to be ingested:
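Here's a sketch of what that tool can look like. It assumes documents are ingested into a pre-created GroundX bucket (the `BUCKET_ID` below is a placeholder), and the `Document` fields follow the SDK's ingestion examples; double-check the exact signature for your SDK version:

```python
from pathlib import Path

from groundx import Document

BUCKET_ID = 12345  # placeholder: the ID of your pre-created GroundX bucket


@mcp.tool()
def ingest_document(file_path: str) -> str:
    """Ingest a local document into the GroundX knowledge base."""
    path = Path(file_path)
    client.ingest(
        documents=[
            Document(
                bucket_id=BUCKET_ID,
                file_name=path.name,
                file_path=str(path),
                file_type=path.suffix.lstrip("."),
            )
        ]
    )
    return f"Queued {path.name} for ingestion into bucket {BUCKET_ID}."
```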

4) Create Search tool
This tool leverages GroundX's advanced retrieval capabilities to search over complex, real-world documents.
Here's how to implement it:
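A sketch assuming the SDK's `client.search.content` call, which returns ranked chunks along with a combined, LLM-ready context string; the tool name and parameters are our choices:

```python
@mcp.tool()
def search_documents(query: str) -> str:
    """Search the ingested documents and return relevant context."""
    response = client.search.content(
        id=BUCKET_ID,  # search the same bucket we ingest into
        query=query,
        n=10,  # number of chunks to retrieve
    )
    # `search.text` concatenates the retrieved chunks into a single
    # context string the client can pass to the LLM.
    return response.search.text
```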

5) Start the server
Finally, we start the MCP server using standard input/output (stdio) as the transport mechanism:
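Assuming everything above lives in one script (say, `server.py`), the entry point looks like this:

```python
# Run the server over stdio so Cursor can spawn it as a
# local subprocess and talk to it over stdin/stdout.
if __name__ == "__main__":
    mcp.run(transport="stdio")
```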

6) Connect to Cursor
Inside your Cursor IDE, follow this:
- Cursor → Settings → Cursor Settings → MCP
Then add and start your server like this:
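In the MCP settings, you register the server in Cursor's `mcp.json`. Here's a sketch, assuming the script above is saved as `server.py` (the path is a placeholder):

```json
{
  "mcpServers": {
    "groundx-rag": {
      "command": "python",
      "args": ["/absolute/path/to/server.py"]
    }
  }
}
```

Once saved, enable the server in the MCP settings; the tools defined above should appear in Cursor's tool list.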

Done!
Now, you can interact with these documents directly through your Cursor IDE.
The video below gives a walk-through of what it looks like:
You can test EyeLevel on your complex docs here →
We use EyeLevel for all complex use cases because they have built powerful, enterprise-grade parsing systems that can intuitively chunk relevant content and understand what's inside each chunk, whether it's text, images, or diagrams, as shown below:

As depicted above, the system takes unstructured input (text, tables, images, flow charts) and parses it into a JSON format that LLMs can easily process when building RAG applications.
Also, find the code for this demo in this GitHub repo →
Let's move to the next project now!