We often say that LLMs are stochastic parrots, but where does the stochasticity come from? It comes from the way the text is generated: most closed-source LLMs like ChatGPT or Claude do not always output the most probable word. Instead, they sample each word according to the probabilities predicted by the model.
Have you heard of the concept of temperature for large language models? This is the parameter that allows us to adjust how that sampling is performed.
The method to sample text is called Multinomial Sampling Generation. At each step, the LLM outputs logits, one raw score per word in the vocabulary, and we apply the Softmax transformation to turn those scores into probabilities.
The Softmax transformation ensures that the values are bounded within [0, 1] and sum to 1. It also accentuates the largest value while reducing the other values. That is why it is called the “soft maximum” function.
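As a minimal sketch of the Softmax step described above, the snippet below converts a list of logits into probabilities using only the Python standard library. The logits are made-up numbers for a hypothetical four-word vocabulary, and the function name `softmax` is chosen here for illustration.

```python
import math

def softmax(logits):
    """Convert a list of raw scores (logits) into probabilities."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-word vocabulary (made-up numbers).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)

# Each value lies in [0, 1] and the values sum to 1.
print(probs, sum(probs))
```

Note how the exponential widens the gaps: the top two logits differ by 1.0, yet the first probability is about 2.7 times the second.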
Once we have probabilities, we can sample the next word in proportion to them instead of always taking the most probable one.
Because the choice is random, different words may be selected at each iteration, even for the same prompt.
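To make the difference concrete, here is a small sketch that compares always picking the most probable word (the greedy approach) with multinomial sampling. The vocabulary, the probabilities, and the helper names `greedy_pick` and `multinomial_pick` are assumptions made up for this example.

```python
import random

# Hypothetical next-word probabilities (made-up numbers that sum to 1).
vocab = ["cat", "dog", "bird", "fish"]
probs = [0.6, 0.25, 0.1, 0.05]

def greedy_pick(vocab, probs):
    """Always return the most probable word."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return vocab[best]

def multinomial_pick(vocab, probs, rng):
    """Draw one word at random, weighted by its probability."""
    return rng.choices(vocab, weights=probs, k=1)[0]

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
greedy_draws = [greedy_pick(vocab, probs) for _ in range(1000)]
sampled_draws = [multinomial_pick(vocab, probs, rng) for _ in range(1000)]

# Greedy always yields the same word; sampling yields a mix of words.
print(set(greedy_draws))
print({word: sampled_draws.count(word) for word in vocab})
```

Over many draws, the sampled counts should roughly follow the probabilities, while the greedy approach never leaves the top word.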
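The problem with the Softmax transformation is that we are very dependent on the specific analytical form of that function. To induce more flexibility, we can introduce the temperature parameter: the logits are divided by the temperature before the Softmax is applied. A temperature of 1 leaves the probabilities unchanged.
A low temperature sharpens the distribution and induces a behavior close to the greedy approach, where the most probable word is always chosen, whereas a high temperature flattens the distribution and leads toward uniformly random sampling.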
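The effect of temperature can be sketched as follows, assuming the common formulation in which the logits are divided by the temperature before the Softmax. The logits and the function name `softmax_with_temperature` are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax applied to logits scaled by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtracting the max avoids overflow without changing the result.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-word vocabulary (made-up numbers).
logits = [2.0, 1.0, 0.5, -1.0]

cold = softmax_with_temperature(logits, 0.1)     # nearly all mass on the top word
neutral = softmax_with_temperature(logits, 1.0)  # the plain Softmax
hot = softmax_with_temperature(logits, 100.0)    # close to uniform

for name, p in [("T=0.1", cold), ("T=1", neutral), ("T=100", hot)]:
    print(name, [round(x, 3) for x in p])
```

A temperature of exactly 0 would require dividing by zero, so implementations typically treat it as a special case and fall back to the greedy choice; this sketch simply rejects it.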
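The term “temperature” is used because this Softmax function is known in physics as the Boltzmann or Gibbs distribution. There, it gives the probability that a system of particles occupies a state with a given energy, and the physical temperature controls how spread out that distribution is across the energy levels.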
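For readers who want to see the analogy written out, the two formulas can be placed side by side. The symbols are assumptions introduced here: E_i is the energy of state i, k_B is the Boltzmann constant, z_i is the logit of word i, and T is the temperature.

```latex
% Boltzmann (Gibbs) distribution: probability of a state with energy E_i
% at temperature T.
p_i = \frac{\exp\left(-E_i / (k_B T)\right)}{\sum_j \exp\left(-E_j / (k_B T)\right)}

% Softmax with temperature T applied to the logits z_i.
p_i = \frac{\exp\left(z_i / T\right)}{\sum_j \exp\left(z_j / T\right)}
```

The two expressions coincide if we identify z_i with -E_i / k_B, so a high logit plays the role of a low-energy state.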
Multinomial sampling has a few advantages:
Diversity and Creativity
Reduced Repetitiveness
Better Exploration of the Model's Capabilities
Useful for Certain Applications
But it also has a few problems:
Reduced Coherence
Unpredictability
Quality Control
Difficulty in Controlling Output
Dependency on Temperature Setting
Less Suitable for Certain Tasks