Categorization of Clustering Algorithms

...explained in a single frame.


TODAY'S ISSUE

TOGETHER WITH DYNAMIQ

🤖 Develop Agentic AI/LLM apps 10x faster [open-source]

Dynamiq is a completely open-source, low-code, all-in-one Gen AI framework for developing LLM applications with AI agents and RAG.

Here’s what stood out for me about Dynamiq:

  • It seamlessly orchestrates multiple AI agents.
  • It facilitates RAG applications.
  • It easily manages complex LLM workflows.
  • It has a highly intuitive API.

All this makes it 10x easier to build production-ready AI applications.

If you're an AI engineer, Dynamiq will save you hours of tedious orchestration work!

Start building agentic AI/LLM apps today →

TODAY’S DAILY DOSE OF DATA SCIENCE

Categorization of clustering algorithms

There is a whole world of clustering algorithms beyond KMeans that every data scientist should be familiar with.

In the following visual, we have summarized 6 different types of clustering algorithms:

1) Centroid-based: Cluster data points based on proximity to centroids.

2) Connectivity-based: Cluster points by successively merging (or splitting) clusters based on the distance between them, as in hierarchical clustering.

3) Density-based: Cluster points based on the density of their neighborhoods. It is typically more robust than centroid-based clustering to arbitrarily shaped clusters and to outliers.

  • DBSCAN is a popular algorithm here, but its run-time can be high on large datasets.
  • DBSCAN++ addresses this.
  • It is a faster and more scalable alternative to DBSCAN.
  • We covered both DBSCAN and DBSCAN++ in detail here.

4) Graph-based: Represent points as nodes in a graph and cluster them based on graph distance or connectivity.

5) Distribution-based: Cluster points based on their likelihood of belonging to the same distribution.

6) Compression-based: Transform data to a lower-dimensional space and then perform clustering.
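The centroid-based category can be made concrete with a minimal from-scratch sketch of KMeans (Lloyd's algorithm) in plain Python. It alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The function name `kmeans` is illustrative. Initializing with the first `k` points is a simplifying assumption; library implementations typically use smarter seeding such as k-means++.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, n_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    # Simplifying assumption: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    (sum(x for x, _ in members) / len(members),
                     sum(y for _, y in members) / len(members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids
```

Consider two well-separated groups, such as `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)`. The first three points end up in one cluster and the last three in the other.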
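The connectivity-based category corresponds to hierarchical (agglomerative) clustering. The sketch below merges the two closest clusters until `n_clusters` remain, using a hypothetical `single_linkage` helper. It assumes single linkage, where the cluster distance is the distance between the closest pair of members. It runs in O(n³) time and is only meant to show the idea. Other linkages (complete, average, Ward) would change only the inner `linkage` function.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def single_linkage(points: List[Point], n_clusters: int) -> List[int]:
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]  # start: every point is its own cluster

    def linkage(a: List[int], b: List[int]) -> float:
        # Single linkage: distance between the closest members of the two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters[y])  # merge cluster y into cluster x
        del clusters[y]                   # y > x, so index x stays valid

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```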
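For the density-based category, here is a brute-force sketch of DBSCAN.

- A point with at least `min_pts` neighbors within distance `eps` (counting itself) is a core point.
- Clusters grow outward from core points.
- Border points join the first cluster that reaches them.
- Everything else is labeled noise (`-1`).

The O(n²) neighbor search illustrates where DBSCAN's run-time cost comes from. Real implementations usually speed it up with spatial indexes. The function name and the brute-force search are illustrative choices.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
NOISE = -1


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Density-based clustering: grow clusters outward from dense 'core' points."""
    n = len(points)
    # eps-neighborhood of every point (includes the point itself), brute force O(n^2).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points keep expanding
        cluster_id += 1
    return labels
```

Two tight groups of four points plus one isolated point, clustered with `eps=1.0` and `min_pts=3`, yield two clusters. The isolated point is labeled `-1`.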
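For the graph-based category, the sketch below uses one of the simplest schemes. It is chosen for illustration and is not necessarily what the visual depicts. Each point is connected to its `k` nearest neighbors, and each connected component of the resulting graph is treated as a cluster. More sophisticated graph methods, such as spectral clustering, start from the same kind of neighbor graph. The helper name is hypothetical.

```python
import math
from collections import deque
from typing import List, Set, Tuple

Point = Tuple[float, float]


def knn_graph_clusters(points: List[Point], k: int) -> List[int]:
    """Graph-based clustering: connected components of a symmetric k-NN graph."""
    n = len(points)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        # k nearest neighbors of i, excluding i itself.
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize: keep the edge if either endpoint picks the other
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Breadth-first search gives every node reachable from `start` the same label.
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = current
                    queue.append(v)
        current += 1
    return labels
```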
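For the distribution-based category, a Gaussian mixture model is the standard example. The sketch below fits a one-dimensional mixture with the expectation-maximization (EM) algorithm:

- E-step: compute each point's probability of belonging to each Gaussian.
- M-step: re-estimate the weights, means, and variances from those soft assignments.

Several simplifications are assumptions for illustration: 1-D data, a fixed iteration count, caller-supplied initial means, and a small variance floor.

```python
import math
from typing import List


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_1d(xs: List[float], init_means: List[float], n_iter: int = 50):
    """Fit a 1-D Gaussian mixture with EM; return hard labels and fitted parameters."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: responsibility resp[i][c] = P(component c | x_i).
        resp = []
        for x in xs:
            p = [weights[c] * normal_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(p)
            resp.append([pc / total for pc in p])
        # M-step: re-estimate weights, means and variances from the soft assignments.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(xs)
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            variances[c] = max(
                sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, xs)) / nc,
                1e-6,  # floor to keep a component from collapsing to zero variance
            )
    # Hard assignment: the component with the highest responsibility.
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, means, variances, weights
```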
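The compression-based category is a two-step pipeline: project the data to fewer dimensions, then cluster in that space. As an illustrative assumption, the sketch below uses PCA to reduce 2-D points to their first principal component. It computes the closed-form eigenvector of the 2×2 covariance matrix, then runs a 1-D two-means step on the projections. In practice the reducer could be PCA over many dimensions or an autoencoder, and any clustering algorithm can follow. All helper names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def pca_1d(points: List[Point]) -> List[float]:
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in points) / n
    c = sum((y - my) ** 2 for _, y in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / n
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Matching eigenvector; fall back to a coordinate axis if the matrix is diagonal.
    if abs(b) > 1e-12:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    return [(x - mx) * vx + (y - my) * vy for x, y in points]


def two_means_1d(zs: List[float], n_iter: int = 100) -> List[int]:
    """Cluster 1-D values into two groups with k-means (k=2)."""
    centers = [min(zs), max(zs)]
    labels = [0] * len(zs)
    for _ in range(n_iter):
        labels = [0 if abs(z - centers[0]) <= abs(z - centers[1]) else 1 for z in zs]
        for c in (0, 1):
            members = [z for z, lab in zip(zs, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def compress_then_cluster(points: List[Point]) -> List[int]:
    """Compression-based clustering: reduce dimensionality, then cluster."""
    return two_means_1d(pca_1d(points))
```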

👉 Over to you: What other clustering algorithms would you include here?

CRASH COURSE (30 MINS)

Build robust decision-making systems with causal inference

“Because” is possibly one of the most powerful words in business decision-making.

  • Our customer satisfaction improved because we introduced personalized recommendations.
  • Energy consumption dropped because we implemented new efficiency standards.

Backing observations and insights with causal evidence lets you confidently use the word “because” in business and everyday discussions.

Identifying causal relationships is vital, but it typically requires additional inspection and statistical analysis that goes beyond typical correlation analysis (which anyone can do).
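To see why correlation alone can mislead, consider a small synthetic dataset, made up purely for illustration. A confounder `z` drives both the treatment `x` and the outcome `y`, while `x` has no effect on `y` at all. A naive comparison of treated and untreated outcomes shows a large gap. Stratifying on `z` recovers the true effect of zero. This is a simple form of backdoor adjustment, and it is valid only under the assumption that `z` is the sole confounder. The helper names are hypothetical.

```python
from typing import List, Tuple

# Each record is (z, x, y) = (confounder, treatment, outcome).
Record = Tuple[int, int, float]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def naive_effect(data: List[Record]) -> float:
    """Difference in mean outcome between treated and untreated (pure correlation)."""
    return (mean([y for _, x, y in data if x == 1])
            - mean([y for _, x, y in data if x == 0]))


def adjusted_effect(data: List[Record]) -> float:
    """Stratify on z, then average the per-stratum effects weighted by stratum size."""
    total = 0.0
    for z in sorted({z for z, _, _ in data}):
        stratum = [r for r in data if r[0] == z]
        total += naive_effect(stratum) * len(stratum) / len(data)
    return total


# Synthetic data: z=1 makes treatment more likely AND raises y by 10; x does nothing.
toy_data: List[Record] = (
    [(0, 0, 0.0)] * 80 + [(0, 1, 0.0)] * 20 + [(1, 0, 10.0)] * 20 + [(1, 1, 10.0)] * 80
)
```

On `toy_data`, the naive difference is 8.0 − 2.0 = 6.0, while the stratified estimate is 0.0.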

Learn how to develop causal inference-driven systems →

It covers:

  • the details of causality
  • why it is difficult
  • what counterfactual learning is
  • four common techniques to determine causal impact
  • some of the most widely used techniques in causal inference
  • my personal experience using it in my projects, and more

MODEL OPTIMIZATION

Model compression to optimize models for production

Model accuracy alone (or an equivalent performance metric) rarely determines which model will be deployed.

Much of the engineering effort goes into making the model production-friendly.

This is because the model that gets shipped is NEVER determined solely by performance, contrary to a common misconception.

Instead, we also consider several operational and feasibility metrics, such as:

  • Inference Latency: Time taken by the model to return a prediction.
  • Model size: The memory occupied by the model.
  • Ease of scalability, etc.
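The first two metrics above can be measured with a few lines of standard-library Python. This sketch reports the median latency over repeated calls after a warm-up. It estimates model size from the parameter count, assuming 4 bytes per float32 parameter. The helper names are hypothetical. Real measurements would also account for batch size, hardware, and serialization overhead.

```python
import statistics
import time
from typing import Any, Callable


def measure_latency_ms(predict: Callable[[Any], Any], x: Any,
                       n_runs: int = 100, warmup: int = 10) -> float:
    """Median wall-clock latency of a single prediction call, in milliseconds."""
    for _ in range(warmup):
        predict(x)  # warm-up calls (caches, lazy initialization) are not timed
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict(x)
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


def model_size_mb(n_params: int, bytes_per_param: int = 4) -> float:
    """Approximate in-memory size of the weights (default: 4 bytes per float32)."""
    return n_params * bytes_per_param / (1024 ** 2)
```

For example, `model_size_mb(262_144)` returns `1.0`, since 262,144 float32 parameters occupy exactly 1 MiB.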

For instance, consider the image below. It compares the accuracy and size of a large neural network I developed to its pruned (or reduced/compressed) version:

Looking at these results, wouldn't you strongly prefer deploying the model that is 72% smaller but still (almost) as accurate as the large model?

Of course, this depends on the task, but in most cases it makes little sense to deploy the large model when a heavily pruned version performs nearly as well.
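The comparison above involves a pruned network. Pruning can be illustrated with unstructured magnitude pruning, which is one common variant and is assumed here rather than taken from the original experiment. It zeroes out the fraction of weights with the smallest absolute values. The sketch works on a flat list of weights, and the helper name is hypothetical. Zeros only save memory once the weights are stored in a sparse format or removed structurally. Pruned networks are usually fine-tuned afterwards to recover accuracy.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]
```

With `sparsity=0.5` on ten weights, the five smallest-magnitude entries become `0.0` and the rest are left untouched.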

In the article here, we discussed and implemented 6 model compression techniques that ML teams regularly use to save thousands of dollars when running ML models in production.

Learn how to compress models before deployment, with implementation →

THAT'S A WRAP

No-Fluff Industry ML resources to succeed in DS/ML roles

At the end of the day, all businesses care about impact. That’s it!

  • Can you reduce costs?
  • Can you drive revenue?
  • Can you scale ML models?
  • Can you predict trends before they happen?

In the past, we have covered several topics (with implementations) that align with these goals.

Here are some of them:

  • Learn sophisticated graph architectures and how to train them on graph data in this crash course.
  • So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
  • Run large models on small devices using quantization techniques.
  • Learn how to use conformal prediction to generate prediction intervals or sets with strong statistical guarantees and increase trust in your models.
  • Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
  • Learn how to scale and implement ML model training in this practical guide.
  • Learn 5 techniques with implementation to reliably test ML models in production.
  • Learn how to build and implement privacy-first ML systems using Federated Learning.
  • Learn 6 techniques with implementation to compress ML models.

All these resources will help you cultivate the key skills that businesses care about most.
