
Building a Context Engineering Workflow

...explained step-by-step with code.

Avi Chawla

TODAY'S ISSUE

hands-on

Building a Context Engineering Workflow

A few days back, we promised a hands-on demo on context engineering.

So today, we'll build a multi-agent research assistant using context engineering principles.

The agent will gather its context from four sources: documents, memory, web search, and arXiv.

Here’s our workflow:

  • User submits query.
  • Fetch context from docs, web, arxiv API, and memory.
  • Pass the aggregated context to an agent for filtering.
  • Pass the filtered context to another agent to generate a response.
  • Save the final response to memory.

Tech stack:

  • Tensorlake for document parsing
  • Milvus as the vector database
  • Zep as the memory layer
  • Firecrawl for web search
  • The arXiv API for research papers
  • CrewAI for agent orchestration

The code is linked later in the issue.

Let's go!


First, what is context engineering (CE)?

LLMs aren’t mind readers. They can only work with what you give them.

Prompt engineering focuses on finding the right "magic words" in the hope of getting a better response.

CE involves creating dynamic systems that offer:

  • The right info
  • The right tools
  • In the right format

This ensures the LLM can effectively complete the task.

Crew flow

We'll follow a top-down approach to understand the code.

Here's an outline of what our flow looks like:

Note that this is one of many blueprints to implement a context engineering workflow. Your pipeline will likely vary based on the use case.
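Since the flow diagram doesn't survive in text form, here's a minimal sketch of how such a flow could be wired with CrewAI Flows. The step names and the stubbed helpers are placeholders we introduce for illustration, not the original implementation:

```python
# A minimal sketch of the flow shape using CrewAI Flows.
# The stubbed helpers below stand in for the real retrievers
# (Tensorlake/Milvus, Zep, Firecrawl, arXiv) covered in the next sections.
from crewai.flow.flow import Flow, listen, start


def retrieve_docs(q):  return [f"doc chunk about {q}"]
def fetch_memory(q):   return [f"past interaction about {q}"]
def search_web(q):     return [f"web result about {q}"]
def search_arxiv(q):   return [f"arXiv abstract about {q}"]


class ContextEngineeringFlow(Flow):

    @start()
    def gather_context(self):
        q = self.state["query"]
        # Step 2: fetch context from all four sources.
        self.state["context"] = (retrieve_docs(q) + fetch_memory(q)
                                 + search_web(q) + search_arxiv(q))

    @listen(gather_context)
    def filter_context(self):
        # Step 3: in the real flow, a context-evaluation agent drops irrelevant chunks.
        self.state["filtered"] = [c for c in self.state["context"] if c]

    @listen(filter_context)
    def synthesize(self):
        # Steps 4-5: a synthesizer agent writes the answer, which is saved to memory.
        return " | ".join(self.state["filtered"])


flow = ContextEngineeringFlow()
result = flow.kickoff(inputs={"query": "context engineering"})
print(result)
```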

Prepare data for RAG

We use Tensorlake to convert the document into RAG-ready markdown chunks for each section.

The extracted data can be directly embedded and stored in a vector DB without further processing.
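The parsing code itself lives in the linked repo, so we don't reproduce the exact Tensorlake SDK calls here. As a rough illustration, the output we assume it gives us is section-wise markdown plus metadata, which we normalize into records ready for embedding (field names and the sample document are ours):

```python
# Illustration only: the shape we assume Tensorlake's section-wise markdown
# output takes, normalized into records that can go straight into a vector DB.
raw_chunks = [
    {"section": "Introduction", "page": 1,
     "markdown": "# Introduction\nContext engineering is ..."},
    {"section": "Methodology", "page": 3,
     "markdown": "# Methodology\nWe gather context from four sources ..."},
]

records = [
    {
        "text": chunk["markdown"],
        "metadata": {
            "section": chunk["section"],
            "page": chunk["page"],
            "source": "report.pdf",
        },
    }
    for chunk in raw_chunks
]
print(records[0]["metadata"])
```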

Indexing and retrieval

Now that we have RAG-ready chunks along with the metadata, it's time to store them in a self-hosted Milvus vector database.

We retrieve the top-k most similar chunks to our query:
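A minimal sketch of that indexing and top-k retrieval with pymilvus' `MilvusClient`; the collection name, embedding model, and field names are our choices, not necessarily the original setup:

```python
# Sketch: index the parsed chunks in a self-hosted Milvus instance
# and retrieve the top-k most similar chunks for a query.
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed model, 384-dim
client = MilvusClient(uri="http://localhost:19530")  # self-hosted Milvus

client.create_collection(collection_name="report_chunks", dimension=384)

# `records` comes from the parsing sketch above.
client.insert(
    collection_name="report_chunks",
    data=[
        {"id": i,
         "vector": embedder.encode(r["text"]).tolist(),
         "text": r["text"],
         "section": r["metadata"]["section"]}
        for i, r in enumerate(records)
    ],
)

query = "How does the report define context engineering?"
hits = client.search(
    collection_name="report_chunks",
    data=[embedder.encode(query).tolist()],
    limit=5,                                  # top-k
    output_fields=["text", "section"],
)
for hit in hits[0]:
    print(hit["distance"], hit["entity"]["section"])
```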

Build memory layer

Zep acts as the core memory layer of our workflow. It creates temporal knowledge graphs to organize and retrieve context for each interaction.

We use it to store and retrieve context from chat history and user data.
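A sketch of that store-and-retrieve loop with the zep-cloud SDK; treat the exact method names and message fields as assumptions based on Zep's docs (user/session creation is omitted for brevity):

```python
# Sketch of the memory layer with Zep: store the latest exchange so Zep can
# build its temporal knowledge graph, then pull the assembled context back.
import uuid
from zep_cloud.client import Zep
from zep_cloud.types import Message

zep = Zep(api_key="YOUR_ZEP_API_KEY")   # hypothetical key
session_id = uuid.uuid4().hex           # one session per conversation

zep.memory.add(
    session_id=session_id,
    messages=[
        Message(role_type="user", content="What's new in context engineering?"),
        Message(role_type="assistant", content="Here is a summary of ..."),
    ],
)

# Later turns: retrieve the context Zep has organized for this session.
memory = zep.memory.get(session_id=session_id)
print(memory.context)
```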

We use Firecrawl web search to fetch the latest news and developments related to the user query.

Firecrawl's v2 endpoint provides 10x faster scraping, semantic crawling, and image search, turning any website into LLM-ready data.
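A rough sketch of the web-search step with Firecrawl's Python SDK; the parameters and response shape below are illustrative assumptions, so check the SDK version you install for the exact signature:

```python
# Sketch: search the web for fresh, query-relevant pages with Firecrawl.
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="YOUR_FIRECRAWL_API_KEY")  # hypothetical key

results = app.search("latest developments in context engineering", limit=5)

# We assume a response with a "data" list of {url, title, description} entries;
# newer SDK versions may return a typed response object instead.
for item in results["data"]:
    print(item["title"], "-", item["url"])
```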

To further support research queries, we use the arXiv API to retrieve relevant results from their data repository based on the user query.
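For instance, with the `arxiv` Python package, fetching the top papers for the user query looks like this (the query string is just an example):

```python
# Sketch: fetch the most relevant arXiv papers for the user query.
import arxiv

search = arxiv.Search(
    query="context engineering for large language models",
    max_results=5,
    sort_by=arxiv.SortCriterion.Relevance,
)

for paper in arxiv.Client().results(search):
    print(paper.title, paper.entry_id)
    print(paper.summary[:200], "...")
```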

Filter context

Now, we pass our combined context to the context evaluation agent that filters out irrelevant context.

This filtered context is then passed to the synthesizer agent that generates the final response.
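Here's a sketch of how those two agents could be defined with CrewAI; the roles, goals, and task wording are ours, not copied from the project:

```python
# Sketch: a context-evaluation agent followed by a synthesizer agent.
from crewai import Agent, Task, Crew

context_evaluator = Agent(
    role="Context evaluator",
    goal="Keep only the context chunks that are relevant to the user query",
    backstory="You ruthlessly filter noise out of retrieved context.",
)

synthesizer = Agent(
    role="Synthesizer",
    goal="Write a grounded answer to the query using only the filtered context",
    backstory="You cite the sources you were given and never invent facts.",
)

filter_task = Task(
    description="Query: {query}\n\nContext:\n{context}\n\nReturn only the relevant chunks.",
    expected_output="A bullet list of relevant context chunks.",
    agent=context_evaluator,
)

answer_task = Task(
    description="Answer the query '{query}' using the filtered context from the previous task.",
    expected_output="A concise, well-cited answer.",
    agent=synthesizer,
    context=[filter_task],   # feed the filtered context forward
)

crew = Crew(agents=[context_evaluator, synthesizer],
            tasks=[filter_task, answer_task])
```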

Kick off the workflow

Finally, we kick off our context engineering workflow with a query.
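Following the sketches above (where `crew` is the two-agent crew and `combined_context` is the text aggregated from the four sources), that kickoff could look like:

```python
# Kick off the workflow with a sample query.
query = "Summarize the key findings of the report and any related recent work."

result = crew.kickoff(inputs={"query": query, "context": combined_context})
print(result.raw)  # the final synthesized answer
```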

Based on the query, we notice that the RAG tool, powered by Tensorlake, was the most relevant source for the LLM to generate a response.

We also translated this workflow into a Streamlit app that:

  • Provides citations with links and metadata.
  • Provides insights into relevant sources.

That said, the workflow explained in this issue is just one of many blueprints. Your implementation can vary.

In the project, we used Tensorlake, Milvus, Zep, Firecrawl, the arXiv API, and CrewAI.

You can find the code here →

Thanks for reading!

IN CASE YOU MISSED IT

Function calling & MCP for LLMs

Before MCP became as mainstream as it is right now, most AI workflows relied on traditional function calling.

Now, MCP (Model Context Protocol) is introducing a shift in how developers structure tool access and orchestration for Agents.

Here’s a visual that explains Function calling & MCP:

Learn more about it with visual and code in our recent issue here →

PRIVACY-PRESERVING ML

Train models on private data with federated learning

There’s so much data on your mobile phone right now — images, text messages, etc.

And this is just about one user—you.

But applications can have millions of users. The amount of data we can train ML models on is unfathomable.

The problem?

This data is private.

So we cannot consolidate this data into a single place to train a model.

The solution?

Federated learning is a smart way to address this challenge.

The core idea is to ship models to devices, train the model on the device, and retrieve the updates:
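As a toy sketch of that idea (plain NumPy, a tiny linear model per client, and simple federated averaging; this is our illustration, not code from the linked article):

```python
# Toy federated-averaging round: each client trains locally on its own private
# data, and only the weight updates travel back to the server.
import numpy as np

rng = np.random.default_rng(0)
global_weights = np.zeros(3)
true_w = np.array([1.0, -2.0, 0.5])

# Each client's private dataset (features X, labels y) stays on the device.
clients = []
for n in (20, 50, 10):
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

def local_train(w, X, y, lr=0.05, epochs=20):
    """Gradient descent on the client's device; raw data never leaves it."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w -= lr * grad
    return w

for _ in range(5):  # communication rounds
    local_models = [local_train(global_weights, X, y) for X, y in clients]
    sizes = np.array([len(X) for X, _ in clients])
    # FedAvg: average the client models, weighted by dataset size.
    global_weights = np.average(local_models, axis=0, weights=sizes)

print("learned:", global_weights.round(2), "| true:", true_w)
```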

But this isn't as simple as it sounds.

1) Since the model is trained on the client side, how do we reduce its size so it can run on-device?

2) How do we aggregate different models received from the client side?

3) [IMPORTANT] Privacy-sensitive datasets are inherently biased by personal preferences and beliefs. For instance, in an image-related task:

  • Some devices may only have pet images.
  • Some devices may only have car images.
  • Some people may love to travel and primarily have travel-related images.

How do we handle such skewness in the data distribution?

Learn how to implement federated learning systems (beginner-friendly) →

Published on Sep 13, 2025