The Token Sampling Pipeline

How an LLM turns raw logits into one next token: temperature scaling, top-k clipping, top-p nucleus filtering, softmax renormalization, and a multinomial draw.

Read this as: which tokens are still eligible right before the draw?
Failure Trap
Treating temperature, top-k, and top-p as independent knobs instead of an ordered filter chain.
Decision Rule
Apply temperature first, filter the tail, renormalize, then sample or argmax deliberately.
[Figure: The token sampling pipeline. A six-frame walkthrough in which the same ranked tokens go from raw logits to one sampled next token: 1. Logits (raw scores, not probabilities), 2. Divide by T, 3. Top-k clip, 4. Top-p nucleus, 5. Softmax (renormalize survivors to sum to 1), 6. Multinomial draw.]

Start with Logits

The model does not output probabilities. Its final vocabulary projection produces logits: raw scores, one for each possible next token.

  • A higher logit means the model favors that token more
  • Logits can be any real number, including negative values; they are not 0-to-1 probabilities
  • The sampling pipeline turns this score vector into one token
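To make the logit/probability distinction concrete, here is a minimal Python sketch (standard library only) that converts a small vector of illustrative logits into probabilities with a numerically stable softmax. The token names and scores are made up; a real model emits one logit per vocabulary entry, usually tens of thousands of them.

```python
import math

def softmax(scores):
    """Turn real-valued scores into positive probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for five candidate next tokens (not from a real model).
tokens = ["Paris", "Lyon", "Berlin", "Rome", "Mars"]
logits = [4.1, 3.6, 2.2, 1.0, 0.4]

probs = softmax(logits)
for tok, z, p in zip(tokens, logits, probs):
    print(f"{tok:<7} logit={z:5.1f}  prob={p:.3f}")

# Only the gaps between logits matter: shifting every score by a constant,
# even into all-negative values, yields the same probabilities.
shifted = softmax([z - 100.0 for z in logits])
print(all(abs(a - b) < 1e-12 for a, b in zip(probs, shifted)))
```

The shift check illustrates why raw logit values carry no absolute meaning: only their differences survive the conversion to probabilities.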

Divide by Temperature

Temperature rescales the logits before they are converted to probabilities: each logit is divided by T. Lower temperatures sharpen the distribution; higher temperatures flatten it.

  • T < 1 makes likely tokens dominate more strongly
  • T > 1 gives lower-ranked tokens more chance
  • T = 0 cannot be used as a divisor, so it is handled as a special greedy branch: take the argmax, with no sampling
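A minimal sketch of the temperature step, assuming logits arrive as a plain Python list. `apply_temperature` and `greedy_argmax` are hypothetical helper names for illustration, not a specific library's API. The loop prints how the top token's probability moves as T goes below and above 1.

```python
import math

def softmax(scores):
    m = max(scores)  # stable softmax: shift by the max before exponentiating
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    """Divide every logit by T (T > 0). T == 0 must use the greedy branch instead."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy_argmax for T == 0")
    return [z / temperature for z in logits]

def greedy_argmax(logits):
    """The T == 0 branch: return the index of the highest logit, with no randomness."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [4.1, 3.6, 2.2, 1.0, 0.4]  # illustrative scores, highest first

for t in (0.7, 1.0, 1.5):
    probs = softmax(apply_temperature(logits, t))
    print(f"T={t}: top-token probability {probs[0]:.3f}")

print("T=0 picks index", greedy_argmax(logits))
```

Dividing by a positive T never changes the ranking of tokens; it only stretches or compresses the gaps between them, which is what sharpens or flattens the resulting distribution.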

Clip with Top-k

Top-k applies a fixed cap: keep only the k highest-scoring tokens and remove the rest from consideration.

  • It cuts off the long tail of weak candidates
  • The cap is fixed even when the distribution shape changes
  • It is often used as a coarse guard against sampling very unlikely tokens
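A sketch of top-k as a masking step, assuming the common convention of setting dropped logits to negative infinity so that a later softmax gives them zero probability. `top_k_filter` is a hypothetical name, and treating `k <= 0` as "disabled" is an assumption of this sketch.

```python
import math

def top_k_filter(logits, k):
    """Keep the k highest logits and mask the rest to -inf.

    Assumption: k <= 0 or k >= len(logits) means "no clipping".
    Ties at the cutoff are broken by position, so exactly k survive.
    """
    if k <= 0 or k >= len(logits):
        return list(logits)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(ranked[:k])
    return [z if i in keep else -math.inf for i, z in enumerate(logits)]

scaled = [z / 0.7 for z in [4.1, 3.6, 2.2, 1.0, 0.4]]  # temperature already applied
print(top_k_filter(scaled, 4))  # the lowest-ranked token is masked to -inf

# The cap is fixed: a peaked and a flat distribution both keep exactly k tokens.
peaked = [9.0, 1.0, 0.5, 0.2, 0.1]
flat = [1.0, 0.9, 0.8, 0.7, 0.6]
print(sum(z != -math.inf for z in top_k_filter(peaked, 3)),
      sum(z != -math.inf for z in top_k_filter(flat, 3)))
```

The peaked/flat comparison shows the limitation noted above: top-k keeps the same number of candidates whether the model is confident or not.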

Clip with Top-p

Top-p, or nucleus sampling, converts the current scores to probabilities, sorts candidates from most to least likely, and keeps the smallest top-ranked set whose cumulative probability reaches the chosen threshold p.

  • It keeps fewer tokens when one candidate dominates
  • It keeps more tokens when the model is uncertain
  • This adaptive behavior is why top-p is a widely used default
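A minimal nucleus-filter sketch under the same negative-infinity masking assumption. Implementations differ slightly at the boundary; this one includes the token whose probability pushes the running total to p or beyond, so at least one token always survives. `top_p_filter` and `count_kept` are hypothetical names.

```python
import math

def softmax(scores):
    m = max(scores)  # assumes at least one finite score
    exps = [math.exp(s - m) for s in scores]  # math.exp(-inf) is 0.0 for masked tokens
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(logits, p):
    """Nucleus filter: keep the smallest top-ranked set whose probability mass reaches p."""
    probs = softmax(logits)  # top-p works on probabilities, so convert the scores first
    ranked = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in ranked:
        keep.add(i)  # the crossing token is included
        cumulative += probs[i]
        if cumulative >= p:
            break
    return [z if i in keep else -math.inf for i, z in enumerate(logits)]

def count_kept(scores):
    return sum(z != -math.inf for z in scores)

peaked = [8.0, 2.0, 1.0, 0.5]  # one token dominates
flat = [1.0, 0.9, 0.8, 0.7]    # the model is unsure
print(count_kept(top_p_filter(peaked, 0.9)), count_kept(top_p_filter(flat, 0.9)))
```

With the same p = 0.9, the peaked distribution shrinks to a single candidate while the flat one keeps all four, which is the adaptive behavior described in the bullets.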

Softmax Renormalizes

After clipping, softmax converts the surviving scores into probabilities and renormalizes them so the remaining options sum to 1.

  • Dropped tokens get no probability mass
  • Survivors split the full probability budget
  • The result is a valid categorical distribution
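The sketch below, assuming filtered tokens are masked to negative infinity, checks a useful identity: a softmax over the survivors equals each survivor's original probability divided by the total mass the survivors held before filtering. The scores are illustrative temperature-scaled values.

```python
import math

def softmax(scores):
    """Stable softmax; masked (-inf) scores get exactly zero probability."""
    m = max(scores)  # assumes at least one score survived filtering
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scaled = [5.86, 5.14, 3.14, 1.43, 0.57]         # illustrative temperature-scaled logits
filtered = scaled[:3] + [-math.inf, -math.inf]  # suppose the filters dropped the last two

p_before = softmax(scaled)   # distribution over all five tokens
p_after = softmax(filtered)  # renormalized distribution over the three survivors

survivor_mass = sum(p_before[:3])
for i in range(3):
    print(f"token {i}: {p_before[i]:.3f} -> {p_after[i]:.3f}"
          f" (= {p_before[i]:.3f} / {survivor_mass:.3f})")
print("dropped:", p_after[3:], "total:", round(sum(p_after), 12))
```

Every survivor's probability rises by the same factor, 1 / survivor_mass, so the relative ordering among survivors is unchanged.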

Draw One Token

A multinomial (categorical) draw samples one token from the final distribution. The highest-probability token is the single most likely outcome, but it is not guaranteed to be chosen unless only one token survives the filters or the greedy argmax branch is used.

  • The selected token is appended to the context
  • The model runs again to produce the next logit vector
  • This repeats one token at a time for the whole response
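Putting the stages together, here is an end-to-end sketch of one sampling step plus the append-and-repeat loop. It is a standard-library illustration, not any particular framework's implementation: `sample_next`, `generate`, the parameter names, and the "0 disables top-k" convention are assumptions, and `toy_model` is a hypothetical stand-in for a real model forward pass.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mask_except(scores, keep):
    """Set every score whose index is not in `keep` to -inf."""
    return [s if i in keep else -math.inf for i, s in enumerate(scores)]

def sample_next(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """One pass: temperature -> top-k -> top-p -> softmax -> multinomial draw."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])  # greedy argmax branch
    scores = [z / temperature for z in logits]
    if 0 < top_k < len(scores):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        scores = mask_except(scores, set(ranked[:top_k]))
    if top_p < 1.0:
        probs = softmax(scores)
        ranked = sorted(range(len(scores)), key=lambda i: probs[i], reverse=True)
        keep, cumulative = set(), 0.0
        for i in ranked:
            keep.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        scores = mask_except(scores, keep)
    probs = softmax(scores)  # renormalize the survivors
    # Multinomial draw: pick one index with probability proportional to its weight.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

VOCAB = ["Paris", "Lyon", "Berlin", "Rome", "Mars", "<eos>"]

def toy_model(context):
    """Hypothetical stand-in for a model forward pass: one logit per VOCAB entry."""
    eos = 5.0 if len(context) >= 3 else -5.0  # make <eos> likely once the context grows
    return [4.1, 3.6, 2.2, 1.0, 0.4, eos]

def generate(context, max_new_tokens=10, seed=0, **sampling):
    rng = random.Random(seed)  # seeded so the run is reproducible
    for _ in range(max_new_tokens):
        idx = sample_next(toy_model(context), rng=rng, **sampling)
        context = context + [VOCAB[idx]]  # append the drawn token, then run the model again
        if VOCAB[idx] == "<eos>":
            break
    return context

print(generate(["The"], temperature=0.8, top_k=4, top_p=0.9))
print(generate(["The"], temperature=0))
```

Because the draw is random, the highest-probability token wins most often but not every time. Only the T = 0 branch, or a filter that leaves a single survivor (for example `top_k=1`), makes the choice deterministic.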