Transformer Forward Pass

01 / The one-sentence essence

The whole transformer forward pass in one timeline: a chef hears a customer say "light and tangy" and replies "shrimp ceviche with lime". The six phases mark where in the chef's brain we currently are: trained knowledge, tokenizing the order, embedding + position, attention, decoding the first reply word, and KV-cached generation of the rest.
Phases: 01 trained · 02 tokenize · 03 positional · 04 attend · 05 decode · 06 KV cache
Walkthrough: customer "light and tangy" → chef "shrimp ceviche with lime"
01 · vocabulary: the words the chef knows
02 · token embeddings: his feel for each word
Tokens shown: light, and, tangy, shrimp, ceviche, with, lime (vector values not recoverable)
03 · personal style: top 5 first reply tokens after "light and tangy"
Candidates, in the order shown: shrimp, fish, ceviche, try, our (probabilities not recoverable)
Picture the model as a chef who has spent years training. Before any customer walks in, three things already sit in his head: a vocabulary of food concepts, a feel for each one, and his personal style. In model terms these are the token vocabulary, the token embeddings, and the learned weights that decide which reply tokens are likely. These won't change during service; everything in Phases 02–06 is just the chef using them to answer one specific customer.
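To make the "three things already in his head" concrete, here is a minimal Python sketch of trained state that is fixed before any customer arrives. Everything in it is an illustrative assumption rather than the walkthrough's actual model: the `TrainedChef` class, the embedding width of 8, the random embedding values, and the `style_logits` numbers are all made up. In a real transformer the "personal style" lives in the attention and feed-forward weights; the single table of logits below only stands in for their effect on this one order.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical toy vocabulary: the words from the walkthrough panels.
VOCAB = ("light", "and", "tangy", "shrimp", "ceviche",
         "with", "lime", "fish", "try", "our")
EMBED_DIM = 8  # assumed width; real models use hundreds or thousands


def make_embeddings(vocab, dim, seed=0):
    """Give each token a fixed vector (random stand-ins for learned values)."""
    rng = random.Random(seed)
    return {tok: tuple(rng.uniform(-1.0, 1.0) for _ in range(dim))
            for tok in vocab}


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


@dataclass(frozen=True)  # frozen: fields cannot be reassigned during "service"
class TrainedChef:
    vocab: tuple         # 01 vocabulary: the words the chef knows
    embeddings: dict     # 02 token embeddings: his feel for each word
    style_logits: dict   # 03 personal style: stand-in for the learned weights

    def top_first_replies(self, k=5):
        """Return the k most likely first reply tokens with probabilities."""
        probs = softmax(self.style_logits)
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


chef = TrainedChef(
    vocab=VOCAB,
    embeddings=make_embeddings(VOCAB, EMBED_DIM),
    # Made-up scores for the first token after "light and tangy".
    style_logits={"shrimp": 2.0, "fish": 1.5, "ceviche": 1.0,
                  "try": 0.5, "our": 0.0, "lime": -1.0},
)

top5 = chef.top_first_replies()
for token, prob in top5:
    print(f"{token:8s} {prob:.3f}")
```

Marking the dataclass as `frozen` mirrors the point that nothing here is updated while the chef serves a customer: later phases only read from `chef`, they never retrain it.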