Study Notes: Stanford CS336 Language Modeling from Scratch [11]
The complete end-to-end journey — building a Transformer language model from scratch and training it on TinyStories, covering BPE tokenization, multi-head attention, training loop design, and text generation.