Why Do We Use log-loss To Train Logistic Regression?

The origin of log-loss.


Do you remember the time you first learned about logistic regression?

In most cases, its loss function, log-loss, is introduced out of nowhere, with no intuition behind it and, most importantly, without answering the question: why log-loss?

$$ \text{log-loss} = - \sum_{i=1}^{N} \left[ y_{i} \cdot \log(\hat y_{i}) + (1-y_{i}) \cdot \log(1 - \hat y_{i}) \right] $$

In fact, while drafting this blog, I did a quick Google search, and disappointingly, NONE of the top-ranked results discussed this.

Figure: Google search results for “Logistic Regression Guide”

The question is: Why do we specifically minimize the log-loss to train logistic regression? What is so special about it? Where does it come from?

$$ \text{log-loss} = - \sum_{i=1}^{N} \left[ y_{i} \cdot \log(\hat y_{i}) + (1-y_{i}) \cdot \log(1 - \hat y_{i}) \right] $$
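
To make the formula concrete, here is a minimal sketch that evaluates it by hand in plain Python. The function name `log_loss`, the `eps` clipping, and the toy labels and predictions are all assumptions made for illustration; none of them come from a particular library.

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Summed log-loss over all samples, as in the formula (no averaging)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip the prediction away from 0 and 1 so log() stays finite.
        # This is a numerical safeguard, not part of the formula itself.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total

# Toy data (hypothetical): true labels and two sets of predicted probabilities.
y_true = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]   # high probability on the correct class
mostly_wrong = [0.1, 0.9, 0.2, 0.8]   # high probability on the wrong class

loss_right = log_loss(y_true, mostly_right)
loss_wrong = log_loss(y_true, mostly_wrong)
print(loss_right, loss_wrong)
```

The loss stays small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong. That is what the formula computes, but it does not yet explain why we pick this particular function.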

Let’s dive in!

Background

Before we get to the origin and utility of this loss function, it is crucial to know how we model data with logistic regression.

In other words, let’s understand how we frame its modeling mathematically.

