A Free Mini Crash Course on AI Agents!

...powered by MCP + Tools + Memory + Observability.


We just released a free mini crash course on building AI Agents, which is a good starting point for anyone to learn about Agents and use them in real-world projects.

(Video walkthrough: 33:08)

Here's what it covers:

  • What is an AI Agent
  • Connecting Agents to tools
  • Overview of MCP
  • Replacing tools with MCP servers
  • Setting up observability and tracing

Everything is done with a 100% open-source tool stack.

It builds Agents based on the following definition:

An AI agent uses an LLM as its brain, has memory to retain context, and can take real-world actions through tools, like browsing the web, running code, etc.

In short, it thinks, remembers, and acts.

Here's an overview of the system we're building (see the sketch after this list):

  • User sends a query
  • Assistant runs a web search via MCP
  • Query + results go to Memory Manager
  • Memory Manager stores context in Graphiti
  • Response agent crafts the final answer
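To make the flow concrete, here's a rough, framework-agnostic sketch of that loop in Python. Every helper in it is a hypothetical placeholder rather than code from the repo; it only mirrors the five steps above:

```python
# Hypothetical sketch of the flow above. All helpers are placeholder stubs,
# mirroring: query -> MCP web search -> memory manager (Graphiti in the
# course) -> response agent.

def web_search_via_mcp(query: str) -> list[str]:
    """Stand-in for a web-search tool exposed by an MCP server."""
    return [f"(search result for: {query})"]


class MemoryManager:
    """Stand-in for the Graphiti-backed memory layer."""

    def __init__(self) -> None:
        self._facts: list[str] = []

    def store(self, query: str, results: list[str]) -> None:
        self._facts.append(query)
        self._facts.extend(results)

    def retrieve(self, query: str) -> str:
        return "\n".join(self._facts)


def respond(query: str, context: str) -> str:
    """Stand-in for the response agent's LLM call."""
    return f"Answer to '{query}' grounded in:\n{context}"


def handle_query(query: str, memory: MemoryManager) -> str:
    results = web_search_via_mcp(query)   # assistant calls the MCP search tool
    memory.store(query, results)          # memory manager persists the context
    context = memory.retrieve(query)      # ...and retrieves it for answering
    return respond(query, context)        # response agent crafts the final reply


print(handle_query("What's new in MCP?", MemoryManager()))
```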

The video above explains everything in detail.

You can find the entire code in this GitHub repo →

Let's learn more about LLM app evaluation below.

LLM evaluation

Create eval metrics for LLM apps in plain English

Standard metrics are usually not that helpful since LLMs can produce varying outputs while conveying the same message.

In fact, in many cases, it is also difficult to formalize an evaluation metric as deterministic code.

G-Eval is a task-agnostic LLM-as-a-judge metric in Opik that solves this.

The concept of LLM-as-a-judge involves using LLMs to evaluate and score various tasks and applications.

It lets you specify the criteria for your metric in plain English, after which it uses chain-of-thought prompting to create evaluation steps and return a score.

Let’s look at a demo below.

First, import the GEval class and define a metric in natural language:
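For illustration, here's a minimal sketch of that step. It assumes Opik's GEval accepts task_introduction and evaluation_criteria arguments (as shown in its docs), so verify the exact signature against the version you have installed:

```python
from opik.evaluation.metrics import GEval

# The metric is defined entirely in natural language.
faithfulness = GEval(
    task_introduction=(
        "You are an expert judge evaluating whether an AI-generated answer "
        "is faithful to the provided context."
    ),
    evaluation_criteria=(
        "The OUTPUT must not introduce any information that is not "
        "supported by the CONTEXT."
    ),
)
```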

Done!

Next, invoke the score method to generate a score and a reason for that score. Below, we have a related context and output, which leads to a high score:
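A sketch of what that call might look like, continuing the metric above (the context and output strings are made up for illustration, and the .value/.reason fields follow Opik's documented score-result object):

```python
# Related context and output -> we expect a high score.
result = faithfulness.score(
    output=(
        "CONTEXT: France is a country in Western Europe. Its capital is Paris, "
        "known for landmarks like the Eiffel Tower.\n"
        "OUTPUT: Paris is the capital of France."
    )
)

print(result.value)   # numeric score
print(result.reason)  # the judge's explanation for that score
```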

However, with unrelated context and output, we get a low score as expected:
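The mirror-image case, again with made-up strings:

```python
# Unrelated context and output -> we expect a low score.
result = faithfulness.score(
    output=(
        "CONTEXT: France is a country in Western Europe. Its capital is Paris.\n"
        "OUTPUT: The mitochondria is the powerhouse of the cell."
    )
)

print(result.value, result.reason)
```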

Under the hood, G-Eval first uses the task introduction and evaluation criteria to outline a set of evaluation steps.

Next, these evaluation steps are combined with the task to return a single score.

What's more, you can easily self-host Opik, so your data stays where you want it.

It integrates with nearly all popular frameworks, including CrewAI, LlamaIndex, LangChain, and Haystack.

If you want to dive deeper, we also published a practical guide on Opik to help you integrate evaluation and observability into your LLM apps (with implementation).

It has open access to all readers.

Start here: A Practical Guide to Integrate Evaluation and Observability into LLM Apps.

MCP

An MCP server to control Jupyter Notebooks in real time

(Video demo: 0:23)

Jupyter MCP Server lets you interact with Jupyter notebooks.

As shown in the video above, it lets you:

  • Create code cells
  • Execute code cells
  • Create markdown cells

The server currently offers 2 tools:

  • add_execute_code_cell: Add and execute a code cell in a Jupyter notebook.
  • add_markdown_cell: Add a markdown cell in a Jupyter notebook.

You can use this for complex use cases, like analyzing entire datasets just by giving Claude Desktop the file path.

It will use the MCP server to control the Jupyter Notebook and analyze the dataset.

A big advantage is that the cells are executed immediately during this interaction.

Thus, Claude knows if a cell ran successfully. If not, it can self-adjust.
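If you'd rather drive these tools programmatically than through Claude Desktop, a minimal sketch with the official MCP Python SDK could look like this; the launch command and the tool's argument name below are assumptions, so check the repo's README for the exact values:

```python
# Sketch: calling the Jupyter MCP Server's tools with the MCP Python SDK.
# The server command and the "cell_content" argument name are assumptions;
# see the repo's README for the real launch instructions.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    server = StdioServerParameters(
        command="jupyter-mcp-server",  # hypothetical launch command
        args=[],
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Add and immediately execute a code cell in the connected notebook
            result = await session.call_tool(
                "add_execute_code_cell",
                arguments={"cell_content": "import pandas as pd; print(pd.__version__)"},
            )
            print(result)


asyncio.run(main())
```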

GitHub repo →

THAT'S A WRAP

No-Fluff Industry ML Resources to Succeed in DS/ML Roles

At the end of the day, all businesses care about impact. That’s it!

  • Can you reduce costs?
  • Drive revenue?
  • Can you scale ML models?
  • Predict trends before they happen?

We have covered several other topics (with implementations) in the past that align with these goals.

Here are some of them:

  • Learn sophisticated graph architectures and how to train them on graph data in this crash course.
  • So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
  • Run large models on small devices using Quantization techniques.
  • Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
  • Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
  • Learn how to scale and implement ML model training in this practical guide.
  • Learn 5 techniques with implementation to reliably test ML models in production.
  • Learn how to build and implement privacy-first ML systems using Federated Learning.
  • Learn 6 techniques with implementation to compress ML models.

All these resources will help you cultivate key skills that businesses and companies care about the most.

Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., around the world.

Get in touch today →


Join the Daily Dose of Data Science Today!

A daily column with insights, observations, tutorials, and best practices on data science.

Get Started!