Transformer Forward Pass
01 / The one-sentence essence
The whole transformer forward pass in one timeline — a chef hearing a customer say "light and tangy" and replying "shrimp ceviche with lime". The 6 phases up top mark where in the chef's brain we currently are: trained knowledge, tokenizing the order, embedding + position, attention, decoding the first reply word, and KV-cached generation of the rest.
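The six phases can be sketched end to end as a toy, standard-library-only Python program. Everything in it is an assumption made for illustration: the seven-word vocabulary, the width `D`, the random weights standing in for trained ones, word-level tokenization, a single attention head with no MLP or normalization, tied output embeddings, and greedy decoding. Because the weights are random, the words it produces are arbitrary rather than "shrimp ceviche with lime"; the point is only the order of operations.

```python
import math
import random

VOCAB = ["light", "and", "tangy", "shrimp", "ceviche", "with", "lime"]
D = 8  # toy model width (assumed)
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

# Phase 01, trained knowledge: random stand-ins for learned weights.
E = rand_matrix(len(VOCAB), D)   # token embeddings
P = rand_matrix(16, D)           # position embeddings (up to 16 positions)
Wq, Wk, Wv = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(x, W):
    """Row vector x times matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def step(token_id, cache):
    """Process one token and return the id of the predicted next token."""
    pos = len(cache["k"])  # position = number of tokens already cached
    # Phase 03, embedding + position.
    x = [e + p for e, p in zip(E[token_id], P[pos])]
    # Phase 04, attention: append this token's key/value, attend over the cache.
    q = matvec(x, Wq)
    cache["k"].append(matvec(x, Wk))
    cache["v"].append(matvec(x, Wv))
    weights = softmax([dot(q, k) / math.sqrt(D) for k in cache["k"]])
    context = [sum(w * v[j] for w, v in zip(weights, cache["v"])) for j in range(D)]
    h = [a + b for a, b in zip(x, context)]  # residual connection
    # Phase 05, decoding: score every vocabulary entry, pick the highest.
    logits = [dot(h, e) for e in E]
    return max(range(len(VOCAB)), key=logits.__getitem__)

def generate(prompt, n_new):
    """Greedy generation of n_new (>= 1) words after the prompt."""
    ids = [VOCAB.index(word) for word in prompt.split()]  # Phase 02, tokenize
    cache = {"k": [], "v": []}
    for token_id in ids:              # read the prompt, filling the cache
        nxt = step(token_id, cache)   # only the last prediction is kept
    out = [nxt]
    # Phase 06, KV-cached generation: each new token is processed once and
    # reuses the cached keys/values of everything before it.
    while len(out) < n_new:
        out.append(step(out[-1], cache))
    return [VOCAB[i] for i in out], cache

words, cache = generate("light and tangy", 4)
print(words)
```

Note that `step` never recomputes keys or values for earlier tokens: each call appends one entry to the cache and attends over what is already there, which is also why no explicit causal mask is needed in this sketch.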
Walkthrough: customer "light and tangy" → chef "shrimp ceviche with lime"
Tokens shown: light, and, tangy (customer); shrimp, ceviche, with, lime (chef).
Picture the model as a chef who has spent years training. Before any customer walks in, three things already sit in his head: a vocabulary of food concepts, a feel for each one, and his personal style. These do not change during service; everything in Phases 02–06 is the chef using them to answer one specific customer.
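The idea that the chef's knowledge is fixed during service can be sketched with an immutable container. This is a minimal illustration, not any library's API: `TrainedChef`, `make_chef`, and `tokenize` are hypothetical names, the vocabulary is the seven words from the walkthrough, and random numbers stand in for learned values.

```python
import random
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class TrainedChef:
    vocab: tuple       # the vocabulary of food concepts (token strings)
    embeddings: tuple  # "a feel for each one": one vector per vocabulary entry
    style: tuple       # "personal style": stand-in for the remaining learned weights

def make_chef(seed=0, d_model=4):
    """Build a toy chef; random numbers stand in for learned values."""
    rng = random.Random(seed)
    vocab = ("light", "and", "tangy", "shrimp", "ceviche", "with", "lime")
    embeddings = tuple(
        tuple(rng.uniform(-1, 1) for _ in range(d_model)) for _ in vocab
    )
    style = tuple(
        tuple(rng.uniform(-1, 1) for _ in range(d_model)) for _ in range(d_model)
    )
    return TrainedChef(vocab, embeddings, style)

def tokenize(chef, order):
    """Look each word up in the fixed vocabulary (word-level, for illustration)."""
    return [chef.vocab.index(word) for word in order.split()]

chef = make_chef()
token_ids = tokenize(chef, "light and tangy")  # reads the chef, changes nothing

try:
    chef.vocab = ()  # attempting to alter trained knowledge during service
    modified = True
except FrozenInstanceError:
    modified = False  # the frozen dataclass rejects the assignment
```

Answering a customer only reads from `chef`; any attempt to overwrite a field raises `FrozenInstanceError`, mirroring the fact that inference leaves the trained parameters untouched.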
02 / Further Reading
video: Let's build GPT: from scratch, in code, spelled out. Two-hour walkthrough that codes everything you just watched, in PyTorch.
post: The Illustrated Transformer. Diagrams covering the same five-stage path. Read after watching.
code: nanoGPT, the canonical reference implementation. ~300 lines of PyTorch implementing what you just animated.