What is Temperature in LLMs?
Predictable ↔ Random.
A low temperature value produces near-identical responses from the LLM across repeated runs.

But a high temperature value produces gibberish.

What exactly is temperature in LLMs?
Let’s understand this today!
Traditional classification models use softmax to generate the final prediction from logits over all classes. In LLMs, the output layer spans the entire vocabulary.

The difference is that a traditional classification model predicts the class with the highest softmax score, which makes it deterministic.
But LLMs sample the prediction from these softmax probabilities:

Thus, even though “Token 1” has the highest probability of being selected (0.86), it may not be chosen as the next token since we are sampling.
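To make the contrast concrete, here is a minimal sketch of the two decision rules. Only the 0.86 for “Token 1” comes from the text above; the other probabilities and the token names are made up for illustration.

```python
import random

# Hypothetical next-token distribution; only 0.86 for "Token 1" is from the text.
tokens = ["Token 1", "Token 2", "Token 3", "Token 4"]
probs = [0.86, 0.08, 0.04, 0.02]

# Classifier-style rule: always take the highest-probability entry (deterministic).
greedy_choice = tokens[probs.index(max(probs))]

# LLM-style rule: draw the next token at random, weighted by its probability.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sampled = rng.choices(tokens, weights=probs, k=1000)
share_of_token_1 = sampled.count("Token 1") / len(sampled)

print("greedy pick:", greedy_choice)
print("share of 'Token 1' among 1000 samples:", share_of_token_1)
```

The greedy rule returns “Token 1” every time, while the sampled draws land on it most of the time but not always.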
Temperature introduces the following tweak in the softmax function, which, in turn, influences the sampling process:
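The tweak is usually written as dividing every logit by the temperature before applying softmax. The symbols here are notation chosen for this sketch: z_i is the logit of token i, V is the vocabulary size, and T > 0 is the temperature.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_{j=1}^{V} \exp(z_j / T)}
```

Setting T = 1 recovers the ordinary softmax.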

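1) If the temperature is low, the distribution sharpens toward a hard max: nearly all of the probability mass lands on the highest-logit token, so sampling almost always picks it.

2) If the temperature is high, the probabilities flatten toward a uniform distribution, so unlikely tokens get sampled far more often.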
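Both regimes can be checked numerically. The sketch below assumes the common formulation in which each logit is divided by the temperature before softmax. The function name and the logit values are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits after dividing each one by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical logits for four tokens

low = softmax_with_temperature(logits, 0.1)     # close to one-hot
default = softmax_with_temperature(logits, 1.0) # ordinary softmax
high = softmax_with_temperature(logits, 100.0)  # close to uniform

for name, dist in [("T=0.1", low), ("T=1", default), ("T=100", high)]:
    print(name, [round(p, 4) for p in dist])
```

At T=0.1 the first token takes essentially all of the mass, and at T=100 all four tokens sit near 0.25 each.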

A quick note: In practice, the model can generate different outputs even if temperature=0. This is because there are still several other sources of randomness, such as race conditions in multithreaded code.
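One commonly cited mechanism behind this, offered here as an assumption rather than something the note above spells out: floating-point addition is not associative. If parallel threads accumulate partial sums in a different order from run to run, the resulting logits can differ in their last bits, which may be enough to flip a near-tie between two tokens. A tiny sketch of the underlying effect:

```python
# The same three numbers, summed in two different orders.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c
right_to_left = a + (b + c)

# The two results differ in the last bit even though the math is "the same".
order_matters = left_to_right != right_to_left

print(left_to_right, right_to_left, order_matters)
```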
Here are some best practices for using temperature:
And this explains the objective behind temperature in LLMs.
👉 Over to you: How do you determine an ideal value of temperature?
Thanks for reading!