TODAY'S ISSUE
TODAY'S DAILY DOSE OF DATA SCIENCE
Building a 100% local mini-ChatGPT using Llama 3.2 Vision
We built a mini-ChatGPT that runs locally on your computer and is powered by the open-source Llama3.2-vision model.
Here's a demo before we show you how we built it:
You can chat with it just like you would chat with ChatGPT, and provide multimodal prompts.
Here's what we used:
- Ollama for serving the open-source Llama3.2-vision model locally.
- Chainlit, an open-source tool that lets you build production-ready conversational AI apps in minutes.
The code is available on GitHub here: Local ChatGPT.
Let's build it.
We'll assume you are familiar with multimodal prompting. If you are not, we covered it in Part 5 of our RAG crash course (open-access).
We begin with the import statements and define the start_chat method, which is invoked as soon as a new chat session starts:
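Here's a minimal sketch of this step (the "interaction" session key and the system prompt are illustrative choices; see the repo for the exact code):

```python
import chainlit as cl   # conversational UI framework
import ollama           # client for the locally served Llama 3.2 Vision model


@cl.on_chat_start
async def start_chat():
    # Initialize this session's interaction history with a system prompt
    cl.user_session.set(
        "interaction",
        [{"role": "system", "content": "You are a helpful assistant."}],
    )
```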
We use the @cl.on_chat_start decorator in the above method.
Next, we define another method, which will be invoked to generate a response from the LLM (see the sketch after these steps):
- The user inputs a prompt.
- We add it to the interaction history.
- We generate a response from the LLM.
- We store the LLM response in the interaction history.
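A sketch of this method is shown below; generate_response is an illustrative name, and the repo's exact history handling may differ:

```python
def generate_response(user_input, image_paths=None):
    # 1) Fetch this session's interaction history
    interaction = cl.user_session.get("interaction")

    # 2) Add the user prompt (and any attached images) to the history
    user_message = {"role": "user", "content": user_input}
    if image_paths:
        user_message["images"] = image_paths  # file paths enable multimodal prompts
    interaction.append(user_message)

    # 3) Generate a response from the locally served model (blocking call, kept simple here)
    response = ollama.chat(model="llama3.2-vision", messages=interaction)

    # 4) Store the LLM response in the interaction history
    interaction.append({"role": "assistant", "content": response["message"]["content"]})
    return response
```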
Finally, we define the main method:
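Here's a sketch of the handler, assuming the generate_response helper from the previous step:

```python
@cl.on_message
async def main(message: cl.Message):
    # Collect file paths of any images the user attached to the message
    image_paths = [el.path for el in message.elements if el.mime and "image" in el.mime]

    # Generate a reply (multimodal if images were attached)
    response = generate_response(message.content, image_paths or None)

    # Send the assistant's reply back to the chat UI
    await cl.Message(content=response["message"]["content"]).send()
```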
Done!
Run the app as follows:
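Assuming the code above is saved as app.py:

```bash
chainlit run app.py -w   # -w auto-reloads the app when the code changes
```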
This launches the app shown below:
The code is available on GitHub here: Local ChatGPT.
We recently launched this repo, where we'll publish the code for hands-on AI engineering newsletter issues like this one.
This repository will be dedicated to:
- In-depth tutorials on LLMs and RAGs.
- Real-world AI agent applications.
- Examples to implement, adapt, and scale in your projects.
Find it here: AI Engineering Hub (and do star it).
Over to you: What other functionality would you like to see in the above app?
IN CASE YOU MISSED IT
LoRA/QLoRA Explained From a Business Lens
Consider the size difference between BERT-large and GPT-3:
I have fine-tuned BERT-large several times on a single GPU using traditional fine-tuning:
But this is impossible with GPT-3, which has 175B parameters. That's 350GB of memory just to store model weights under float16 precision.
This means that if OpenAI used traditional fine-tuning within its fine-tuning API, it would have to maintain one model copy per user:
- If 10 users fine-tuned GPT-3 → it would need 3,500 GB to store model weights.
- If 1,000 users fine-tuned GPT-3 → it would need 350,000 GB to store model weights.
- If 100,000 users fine-tuned GPT-3 → it would need 35 million GB to store model weights.
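These figures follow directly from the 350 GB per copy above; here is a quick back-of-the-envelope check in Python:

```python
# 175B parameters at float16 (2 bytes each) ≈ 350 GB per fine-tuned copy
params = 175e9
weights_gb = params * 2 / 1e9   # ≈ 350 GB

for users in (10, 1_000, 100_000):
    print(f"{users:>7,} users -> {users * weights_gb:,.0f} GB of model weights")
```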
And the problems don't end there:
- OpenAI bills solely based on usage. What if someone fine-tunes the model for fun or learning purposes but never uses it?
- Since a request can come anytime, should they always keep the fine-tuned model loaded in memory? Wouldn't that waste resources since several models may never be used?
LoRA (+ QLoRA and other variants) neatly solved this critical business problem.
THAT'S A WRAP
No-Fluff Industry ML resources to succeed in DS/ML roles
At the end of the day, all businesses care about impact. That's it!
- Can you reduce costs?
- Drive revenue?
- Can you scale ML models?
- Predict trends before they happen?
We have covered several other topics (with implementations) in the past that align with these goals.
Here are some of them:
- Learn sophisticated graph architectures and how to train them on graph data in this crash course.
- So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
- Run large models on small devices using Quantization techniques.
- Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
- Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
- Learn how to scale and implement ML model training in this practical guide.
- Learn 5 techniques with implementation to reliably test ML models in production.
- Learn how to build and implement privacy-first ML systems using Federated Learning.
- Learn 6 techniques with implementation to compress ML models.
All these resources will help you cultivate key skills that businesses and companies care about the most.
SPONSOR US
Advertise to 450k+ data professionals
Our newsletter puts your products and services directly in front of an audience that matters: thousands of leaders, senior data scientists, machine learning engineers, and data analysts around the world.