The Mathematics Behind RBF Kernel

...and why it is so powerful

A couple of days back, I wrote about kernels and why the kernel trick is called a "trick."

To recap, the kernel provides a way to compute the dot product between two vectors, X and Y, in some high-dimensional space without projecting the vectors to that space.

In that post, we looked at the polynomial kernel and saw that it computes the dot product of two 2-dimensional vectors in a 6-dimensional space without explicitly visiting that space.

Today, I want to continue that discussion and talk about the RBF kernel, another insanely powerful kernel, which is also the default kernel in sklearn's support vector classifier (SVC) class:
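A quick way to check this, using sklearn's SVC (the variable names below are arbitrary and only for illustration):

```python
from sklearn.svm import SVC

# kernel="rbf" is the default, so these two classifiers are configured identically.
clf_default = SVC()
clf_explicit = SVC(kernel="rbf")

print(clf_default.kernel, clf_explicit.kernel)  # rbf rbf
```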


To begin, the mathematical expression of the RBF kernel is depicted below (and consider that we have just a 1-dimensional feature vector):
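In standard notation, with scalar inputs x and y and a width parameter γ > 0, that expression is:

$$ K(x, y) = e^{-\gamma (x - y)^{2}} $$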

You may remember from high school mathematics that the exponential function has the following power series expansion:
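For any real z:

$$ e^{z} = \sum_{n=0}^{\infty} \frac{z^{n}}{n!} = 1 + z + \frac{z^{2}}{2!} + \frac{z^{3}}{3!} + \dots $$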

Expanding the square term in the RBF kernel expression, we get:
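$$ K(x, y) = e^{-\gamma (x^{2} + y^{2} - 2xy)} $$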

Distributing the gamma term and expanding the exponential term using the exponent rule, we get:
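$$ K(x, y) = e^{-\gamma x^{2}} \cdot e^{-\gamma y^{2}} \cdot e^{2\gamma xy} $$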

Next, we apply the exponential expansion to the last term and get the following:
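$$ K(x, y) = e^{-\gamma x^{2}} \cdot e^{-\gamma y^{2}} \cdot \sum_{n=0}^{\infty} \frac{(2\gamma)^{n} \, x^{n} y^{n}}{n!} $$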

Almost done.

Look closely: the expression above can be rewritten as the dot product of the following two vectors:
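$$ \phi(x) = e^{-\gamma x^{2}} \left[\, 1, \;\; \sqrt{\frac{2\gamma}{1!}} \, x, \;\; \sqrt{\frac{(2\gamma)^{2}}{2!}} \, x^{2}, \;\; \sqrt{\frac{(2\gamma)^{3}}{3!}} \, x^{3}, \;\; \dots \,\right] $$

$$ \phi(y) = e^{-\gamma y^{2}} \left[\, 1, \;\; \sqrt{\frac{2\gamma}{1!}} \, y, \;\; \sqrt{\frac{(2\gamma)^{2}}{2!}} \, y^{2}, \;\; \sqrt{\frac{(2\gamma)^{3}}{3!}} \, y^{3}, \;\; \dots \,\right] $$

so that K(x, y) = φ(x) · φ(y).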

And there you go.

We get our projection function:
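$$ \phi(x) = e^{-\gamma x^{2}} \left[\, \sqrt{\frac{(2\gamma)^{n}}{n!}} \; x^{n} \,\right]_{n=0}^{\infty} $$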

It is evident that this function maps the 1-dimensional input to an infinite-dimensional feature space.

This shows that the RBF kernel function we chose earlier computes the dot product in an infinite-dimensional space without explicitly visiting that space.

This is why the RBF kernel is considered so powerful: it can easily model highly complex decision boundaries.

Here, I want to remind you that even though the kernel is equivalent to the dot product of two infinite-dimensional vectors, we NEVER compute that dot product, so computational efficiency is never compromised.

That is why the kernel trick is called a "trick." In other words, it allows us to operate in high-dimensional spaces without explicitly computing the coordinates of the data in that space.
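To make this concrete, here is a small sketch (the inputs x, y and the value of gamma are arbitrary, chosen only for illustration). It evaluates the RBF kernel directly in the input space, and, purely as a sanity check, compares it against a truncated version of the infinite-dimensional dot product:

```python
import numpy as np
from math import exp, factorial

# Arbitrary 1-D inputs and kernel parameter, chosen only for illustration.
x, y, gamma = 0.7, -1.3, 0.5

# Kernel trick: evaluate K(x, y) directly in the original input space.
k_direct = exp(-gamma * (x - y) ** 2)

# For comparison only: truncate the infinite-dimensional feature map after
# N terms and take an ordinary dot product. It converges to k_direct.
N = 30
def phi(t):
    return np.array([
        exp(-gamma * t ** 2) * np.sqrt((2 * gamma) ** n / factorial(n)) * t ** n
        for n in range(N)
    ])

print(k_direct)          # direct kernel evaluation on the raw inputs
print(phi(x) @ phi(y))   # truncated feature-space dot product (≈ k_direct)
```

The direct evaluation never touches the feature space; the truncated dot product exists only to verify that the two computations agree.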

Isn't that cool?

Did you like the mathematical details here? If yes, we have covered the mathematical foundations of many other concepts in a similarly intuitive and beginner-friendly way here:

👉 Over to you: Can you name a major pain point of algorithms that rely on the kernel trick?

