Transformer Forward Pass

01 / In One Sentence

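A single timeline runs the entire transformer forward pass: a chef hears a customer say "light and tangy" and answers "shrimp ceviche with lime". The six stages across the top show where the chef's mind is at each moment: learned knowledge, tokenizing the order, positional encoding, attention, decoding the first word of the reply, and generating the remaining words quickly with the KV cache.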
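The middle and last stages (positional encoding, attention, KV cache) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation behind the animation: the whitespace tokenizer, the sinusoidal encoding, the width `DIM`, the random weight matrices `W_Q`, `W_K`, `W_V`, and the `KVCache` class are all hypothetical stand-ins chosen to keep the example small.

```python
import math
import random

rng = random.Random(0)
DIM = 4  # assumed toy model width

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    # matrix is rows x cols; returns a vector of length rows
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def tokenize(text):
    # Whitespace split: a simplification of real subword tokenizers.
    return text.lower().split()

def positional_encoding(pos, dim=DIM):
    # Sinusoidal encoding, one common choice (an assumption here).
    return [math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / dim))
            for i in range(dim)]

def attend(query, keys, values):
    # Scaled dot-product attention for a single query vector.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]

# Hypothetical frozen projection weights (random here, learned in a real model).
W_Q, W_K, W_V = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)

class KVCache:
    """Stores keys and values of earlier positions so they are computed once."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, x):
        # Only the new position is projected; older keys/values are reused.
        self.keys.append(matvec(W_K, x))
        self.values.append(matvec(W_V, x))
        return attend(matvec(W_Q, x), self.keys, self.values)

def attend_last_from_scratch(inputs):
    # Reference path: recompute every key and value, then attend from the last position.
    keys = [matvec(W_K, x) for x in inputs]
    values = [matvec(W_V, x) for x in inputs]
    return attend(matvec(W_Q, inputs[-1]), keys, values)

tokens = tokenize("light and tangy")
token_vectors = {tok: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for tok in tokens}
# Input to attention = token vector + position vector.
inputs = [[e + p for e, p in zip(token_vectors[tok], positional_encoding(pos))]
          for pos, tok in enumerate(tokens)]

cache = KVCache()
cached_outputs = [cache.step(x) for x in inputs]
```

Feeding the tokens one at a time through `cache.step` gives the same attention output at each position as `attend_last_from_scratch` on the corresponding prefix. The cache changes how much work is done per step, not the result.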

[Interactive figure: full run, customer "light and tangy" → chef "shrimp ceviche with lime". Current phase: Learned, followed by Tokenization.]

01 · Vocabulary: the words the chef knows

02 · Embeddings: his feel for each word (light, and, tangy, shrimp, ceviche, with, lime)

03 · Personal style: top 5 candidates for the first token after "light and tangy" (shrimp, fish, ceviche, try, our)

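Think of the model as a chef with many years of training. Before any customer walks in, three things are already in his head: the words he knows (ingredients, flavors, dish names), his feel for each word, and his own personal style. None of these change while he is serving; Phases 02–06 show how he uses the three to respond to a specific customer.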
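In code terms, the three things map roughly onto a vocabulary, an embedding table, and the weights that turn context into next-token scores. The sketch below is a toy illustration only: the word list, `EMBED_DIM`, the random values, and the averaging step that stands in for the transformer layers are all assumptions, and the resulting ranking is arbitrary rather than the one shown in the animation.

```python
import math
import random

rng = random.Random(0)

# 1. Vocabulary: the words the chef knows (hypothetical toy list).
VOCAB = ["light", "and", "tangy", "shrimp", "ceviche",
         "with", "lime", "fish", "try", "our"]
TOKEN_TO_ID = {word: i for i, word in enumerate(VOCAB)}

# 2. Embeddings: one vector per word. Random here; learned in a real model.
EMBED_DIM = 8  # assumed width
EMBEDDINGS = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]

# 3. "Personal style": weights that score every vocabulary word as the next token.
OUTPUT_WEIGHTS = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]

def embed(words):
    # Look up the stored vector for each word; nothing is modified.
    return [EMBEDDINGS[TOKEN_TO_ID[w]] for w in words]

def next_token_logits(words):
    vectors = embed(words)
    # Stand-in for the transformer layers: average the word vectors.
    context = [sum(col) / len(vectors) for col in zip(*vectors)]
    return [sum(w * c for w, c in zip(row, context)) for row in OUTPUT_WEIGHTS]

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, k=5):
    # Rank vocabulary words by probability and keep the k most likely.
    probs = softmax(logits)
    ranked = sorted(zip(VOCAB, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

for word, prob in top_k(next_token_logits(["light", "and", "tangy"])):
    print(f"{word}: {prob:.3f}")
```

All three tables are created once and only read afterwards, which mirrors the point that nothing the chef has learned changes while he serves a customer.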


02 / Further Reading

video: Let's build GPT: from scratch, in code, spelled out (Andrej Karpathy). Two-hour walkthrough that codes everything you just watched, in PyTorch.

post: The Illustrated Transformer (Jay Alammar). Diagrams covering the same five-stage path. Read after watching.

code: nanoGPT, the canonical reference implementation (Karpathy). ~300 lines of PyTorch implementing what you just animated.