Skip to main content
LLMs

4 Ways to Run LLMs Locally

Llama-3, DeepSeek, Phi, and many many more.

Avi Chawla
Avi Chawla
👉

Being able to run LLMs also has many upsides:

  • Privacy since your data never leaves your machine.
  • Testing things locally before moving to the cloud and more.

Here are four ways to run LLMs locally.


#1) Ollama

Running a model through Ollama is as simple as executing this command:

To get started, install Ollama with a single command:

Done!

Now, you can download any of the supported models using these commands:

For programmatic usage, you can also install the Python package of Ollama or its integration with orchestration frameworks like Llama Index or CrewAI:

We heavily used Ollama in our RAG crash course if you want to dive deeper.

The video below shows the usage of ollama run deepseek-r1 command:

0:00
/0:20

#2) LMStudio

LMStudio can be installed as an app on your computer.

The app does not collect data or monitor your actions. Your data stays local on your machine. It’s free for personal use.

It offers a ChatGPT-like interface, allowing you to load and eject models as you chat. This video shows its usage:

0:00
/1:59

Just like Ollama, LMStudio supports several LLMs as well.


#3) vLLM

vLLM is a fast and easy-to-use library for LLM inference and serving.

With just a few lines of code, you can locally run LLMs (like DeepSeek) in an OpenAI-compatible format:


#4) LlamaCPP

LlamaCPP enables LLM inference with minimal setup and good performance.

Here’s DeepSeek-R1 running on a Mac Studio:

0:00
/1:59

And these were four ways to run LLMs locally on your computer.

If you don’t want to get into the hassle of local setups, â€‹SambaNova’s fastest inference can be integrated into your existing LLM apps in just three lines of code:

Also, if you want to dive into building LLM apps, our full RAG crash course discusses RAG from basics to beyond:

👉 Over to you: Which method do you find the most useful?

Thanks for reading!

Published on Feb 10, 2025