What is Temperature in LLMs?

Predictable ↔ Random.


TODAY'S ISSUE

TOGETHER WITH FIRECRAWL

🔥 Turn ANY website into LLM-ready data [Open-source]

AI systems love neatly formatted data: Markdown, structured data, HTML, etc.

And now it is easier than ever to produce LLM-digestible data!

Firecrawl is an open-source framework that takes a URL, crawls it, and converts it into clean markdown or a structured format. Star the repo below:

FireCrawl GitHub

Why Firecrawl?

  • LLM-ready formats → Markdown, HTML, structured data, metadata.
  • Handles the hard stuff → proxies, anti-bots, dynamic content.
  • Customizable → exclude tags, custom headers, max depth.
  • Reliable → gets the data you need, no matter what.
  • Batching → scrape thousands of URLs at once.
  • Media parsing → PDFs, DOCX, images.
  • Actions → click, scroll, input, wait.
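
To get a feel for the workflow, here's a minimal sketch assuming the firecrawl-py Python SDK; exact method names and parameters vary across SDK versions, so treat it as illustrative:

```python
# Minimal sketch, assuming the firecrawl-py SDK (pip install firecrawl-py).
# Method names/parameters vary across SDK versions; check the repo for specifics.
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_KEY")  # placeholder key

# Scrape a single URL and get back LLM-ready markdown
result = app.scrape_url("https://example.com", params={"formats": ["markdown"]})
print(result["markdown"])
```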

If you prefer Firecrawl's managed service, you can use the code "DDODS" for a 10% discount here →

Thanks to Firecrawl for partnering with us today.

TODAY'S DAILY DOSE OF DATA SCIENCE

What is temperature in LLMs?

A low temperature value produces (nearly) identical responses from the LLM (shown below):

But a high temperature value produces gibberish.

What exactly is temperature in LLMs?

Let’s understand this today!


Traditional classification models use softmax to generate the final prediction from logits over all classes. In LLMs, the output layer spans the entire vocabulary.

The difference is that a traditional classification model predicts the class with the highest softmax score, which makes it deterministic.

But LLMs sample the prediction from these softmax probabilities:

Thus, even though β€œToken 1” has the highest probability of being selected (0.86), it may not be chosen as the next token since we are sampling.
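
In code, the difference boils down to argmax vs. sampling. Here's a quick NumPy sketch (0.86 matches the figure; the remaining probabilities are made up for illustration):

```python
import numpy as np

# Hypothetical softmax output over a 4-token vocabulary
probs = np.array([0.86, 0.08, 0.04, 0.02])

# Classification: deterministic, always pick the argmax
print(probs.argmax())  # -> 0 ("Token 1"), every single time

# LLM decoding: stochastic, sample according to the probabilities
rng = np.random.default_rng(42)
print(rng.choice(len(probs), size=10, p=probs))  # mostly 0, but not always
```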

Temperature (T) introduces the following tweak in the softmax function, which, in turn, influences the sampling process: every logit z_i is divided by T before exponentiating, i.e., p_i = exp(z_i / T) / Σ_j exp(z_j / T).

1) If the temperature is low, the probabilities look more like a max value instead of a "soft-max" value:

  • This means the sampling process will almost certainly choose the token with the highest probability.
  • This makes the generation process look greedy and (almost) deterministic.

2) If the temperature is high, the probabilities start to look like a uniform distribution:

  • This means the sampling process may select any token.
  • This makes the generation process random and heavily stochastic.
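
To see both extremes, here's a minimal NumPy sketch with made-up logits:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    # Divide logits by T, then apply a numerically stable softmax
    scaled = np.asarray(logits) / temperature
    exps = np.exp(scaled - scaled.max())
    return exps / exps.sum()

logits = [4.0, 2.0, 1.0]  # hypothetical logits for a 3-token vocabulary

for t in (0.1, 1.0, 10.0):
    print(f"T={t:>4}: {np.round(softmax_with_temperature(logits, t), 3)}")

# T= 0.1: [1. 0. 0.]           -> near-deterministic (greedy-like)
# T= 1.0: [0.844 0.114 0.042]  -> standard softmax
# T=10.0: [0.391 0.32  0.289]  -> approaching uniform (heavily stochastic)
```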

A quick note: in practice, the model can generate different outputs even at temperature=0. Other sources of randomness remain, such as race conditions in multithreaded code, which can change the order of floating-point operations (and floating-point addition is not associative).

Here are some best practices for using temperature:

  • Set a low temperature value to generate predictable responses.
  • Set a high temperature value to generate more random and creative responses.
  • An extremely high temperature value rarely has any real utility, as we saw at the top.
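
For instance, here's how the knob is set with the OpenAI Python SDK (a sketch; the model name is just an example, and most LLM APIs expose a similar parameter):

```python
# Sketch assuming the OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY set in the environment; the model name is an example.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain softmax in one line."}],
    temperature=0.2,  # low temperature -> predictable, focused output
)
print(response.choices[0].message.content)
```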

And this explains the objective behind temperature in LLMs.

That said, any AI system will only be as good as the data going in.

Firecrawl helps you ensure that your AI systems always receive neatly formatted data: Markdown, structured data, HTML, etc.

FireCrawl GitHub

If you prefer Firecrawl's managed service, you can use the code "DDODS" for a 10% discount here →

👉 Over to you: How do you determine an ideal value of temperature?

IN CASE YOU MISSED IT

RAG crash course

RAG is a key NLP technique that got massive attention because it addresses one of the core challenges of LLMs: grounding them in knowledge they were never trained on.

More specifically, if you know how to build a reliable RAG system, you can bypass the challenge and cost of fine-tuning LLMs.

That’s a considerable cost saving for enterprises.


Thus, the objective of this crash course is to help you implement reliable RAG systems, understand the underlying challenges, and develop expertise in building RAG apps on LLMs, which every industry cares about now.

  • In Part 1, we explored the foundational components of RAG systems, the typical RAG workflow, and the tool stack, and also learned the implementation.
  • In Part 2, we understood how to evaluate RAG systems (with implementation).
  • In Part 3, we learned techniques to optimize RAG systems and handle millions/billions of vectors (with implementation).
  • In Part 4, we understood multimodality and covered techniques to build RAG systems on complex docs (ones that have images, tables, and text), with implementation.
  • In Part 5, we understood the fundamental building blocks of multimodal RAG systems that will help us improve what we built in Part 4.
  • In Part 6, we utilized the learnings from Part 5 to build a more extensive and capable multimodal RAG system.

Of course, if you have never worked with LLMs, that’s okay. We cover everything in a practical and beginner-friendly way.

THAT'S A WRAP

No-Fluff Industry ML resources to succeed in DS/ML roles

At the end of the day, all businesses care about impact. That’s it!

  • Can you reduce costs?
  • Drive revenue?
  • Can you scale ML models?
  • Predict trends before they happen?

We have discussed several other topics (with implementations) in the past that align with these goals.

Here are some of them:

  • Learn sophisticated graph architectures and how to train them on graph data in this crash course.
  • So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
  • Run large models on small devices using Quantization techniques.
  • Learn how to generate prediction intervals or sets with strong statistical guarantees (and increase trust) using Conformal Predictions.
  • Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
  • Learn how to scale and implement ML model training in this practical guide.
  • Learn 5 techniques with implementation to reliably test ML models in production.
  • Learn how to build and implement privacy-first ML systems using Federated Learning.
  • Learn 6 techniques with implementation to compress ML models.

All these resources will help you cultivate key skills that businesses and companies care about the most.

Our newsletter puts your products and services directly in front of an audience that matters: thousands of leaders, senior data scientists, machine learning engineers, and data analysts around the world.

Get in touch today →

