

TODAY'S ISSUE
Time Complexity of 10 ML Algorithms
This visual depicts the run-time complexity of the 10 most popular ML algorithms.

Why care?
Everyone is a big fan of sklearn implementations.
It takes just two (max three) lines of code to run any ML algorithm with sklearn.
However, this simplicity often leads people to overlook how an algorithm actually works and the data-specific conditions under which it can (or cannot) be used.
For instance, you cannot use SVM or t-SNE on a big dataset:
- SVM’s run-time grows cubically with the total number of samples.
- t-SNE’s run-time grows quadratically with the total number of samples.
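To make the SVM point concrete, here is a minimal timing sketch. The synthetic dataset, subset sizes, and RBF kernel are illustrative assumptions (not taken from the visual above); the idea is simply to watch training time grow super-linearly as the number of samples doubles.

```python
# Minimal sketch (assumed setup): time sklearn's SVC on growing subsets
# to observe how kernel-SVM training time grows super-linearly with n.
import time
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(16_000, 20))           # synthetic features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # synthetic binary labels

for n in (2_000, 4_000, 8_000, 16_000):
    start = time.perf_counter()
    SVC(kernel="rbf").fit(X[:n], y[:n])     # kernel SVM training
    print(f"n={n:>6}  fit time: {time.perf_counter() - start:.2f}s")
```

Doubling n should increase the fit time by noticeably more than 2x, which is exactly why kernel SVMs become impractical on large datasets.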
Another benefit of knowing an algorithm's run-time is that it helps us understand how the algorithm works end-to-end.
That said, we made a few assumptions in the above table:
- In a random forest, all decision trees may have different depths. We have assumed them to be equal.
- During inference in kNN, we first find the distance to all data points. This gives a list of distances of size n (the total number of samples).
  - Then, we find the k smallest distances from this list.
  - The run-time to determine the k smallest values depends on the implementation:
    - Sorting the list and selecting the k smallest values takes O(n log n).
    - But if we use a priority queue, it takes O(n log k) (see the sketch after this list).
- In t-SNE, there’s a learning step. However, the major run-time comes from computing the pairwise similarities in the high-dimensional space. You can learn how t-SNE works here: tSNE article.
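To illustrate the kNN point, here is a minimal sketch of the inference step; the synthetic data, value of k, and Euclidean distance are illustrative assumptions. It contrasts the full-sort approach with a heap-based selection of the k smallest distances.

```python
# Minimal sketch (assumed setup): two ways to pick the k nearest neighbours
# from the n distances computed during kNN inference.
import heapq
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100_000, 8))    # n stored samples
query = rng.normal(size=8)                 # one query point

# Step 1: distances to all n training points -> O(n * d)
distances = np.linalg.norm(X_train - query, axis=1)

k = 5

# Option A: sort everything, keep the first k -> O(n log n)
nearest_sorted = np.sort(distances)[:k]

# Option B: heap-based selection of the k smallest -> O(n log k)
nearest_heap = heapq.nsmallest(k, distances)

print(nearest_sorted, nearest_heap)
```

Both options return the same k distances; only the selection cost differs, which is what the table's assumption is about.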

Today, as an exercise, I would encourage you to derive these run-time complexities yourself.
This activity will give you confidence in algorithmic understanding.
👉 Over to you: Can you tell the inference run-time of KMeans Clustering?
Thanks for reading!
IN CASE YOU MISSED IT
Build a Reasoning Model Like DeepSeek-R1
If you have used DeepSeek-R1 (or any other reasoning model), you must have noticed that it autonomously allocates thinking time before producing a response.
Last week, we shared how to embed reasoning capabilities into any LLM.
We trained our own reasoning model like DeepSeek-R1 (with code).

To do this, we used:
- UnslothAI for efficient fine-tuning.
- Llama 3.1-8B as the LLM to add reasoning capabilities to.
Find the implementation and a detailed walkthrough in the newsletter here →
ROADMAP
From local ML to production ML
Once a model has been trained, we move to productionizing and deploying it.
If ideas related to production and deployment intimidate you, here’s a quick roadmap for you to upskill (assuming you know how to train a model):
- First, you would have to compress the model and productionize it. Read these guides:
  - Reduce the model's size with Model Compression techniques.
  - Supercharge PyTorch models with TorchScript.
  - If you use sklearn, learn how to optimize sklearn models with tensor operations.
- Next, you move to deployment. Here's a beginner-friendly hands-on guide that teaches you how to deploy a model, manage dependencies, set up a model registry, etc.
- Although you would have tested the model locally, it is still wise to test it in production. There are risk-free (or low-risk) methods to do that. Learn what they are and how to implement them here.
This roadmap should set you up pretty well, even if you have NEVER deployed a single model before, since everything is practical and implementation-driven.
THAT'S A WRAP
No-Fluff Industry ML Resources to Succeed in DS/ML Roles

At the end of the day, all businesses care about impact. That’s it!
- Can you reduce costs?
- Drive revenue?
- Can you scale ML models?
- Predict trends before they happen?
We have covered several topics (with implementations) in the past that align with these goals.
Here are some of them:
- Learn sophisticated graph architectures and how to train them on graph data in this crash course.
- Many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
- Run large models on small devices using Quantization techniques.
- Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
- Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
- Learn how to scale and implement ML model training in this practical guide.
- Learn 5 techniques with implementation to reliably test ML models in production.
- Learn how to build and implement privacy-first ML systems using Federated Learning.
- Learn 6 techniques with implementation to compress ML models.
All these resources will help you cultivate key skills that businesses and companies care about the most.
SPONSOR US
Advertise to 600k+ data professionals
Our newsletter puts your products and services directly in front of an audience that matters — thousands of leaders, senior data scientists, machine learning engineers, data analysts, etc., around the world.