Study Notes: Stanford CS336 Language Modeling from Scratch
A survey of popular Transformer architectures — comparing encoder-decoder, decoder-only, and encoder-only designs across GPT, BERT, T5, LLaMA, and modern trends in the field.
Building a Transformer language model from scratch in PyTorch — covering every component from multi-head attention and RoPE embeddings to SwiGLU activations and the complete model assembly.
Demystifying GPT-2's pre-tokenization regex pattern — how one carefully crafted regex handles text from multiple languages, scripts, and symbol sets for BPE tokenization.
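The pre-tokenizer in question splits text into contractions, letter runs, digit runs, symbol runs, and whitespace before BPE is applied. A minimal sketch of that idea, using a simplified ASCII-only stand-in pattern (GPT-2's actual pattern uses Unicode classes like `\p{L}` and `\p{N}`, which require the third-party `regex` module rather than the stdlib `re` shown here):

```python
import re

# Simplified ASCII-only approximation of GPT-2's pre-tokenization regex:
# contractions | optional-space + letters | optional-space + digits |
# optional-space + symbols | trailing whitespace | whitespace.
PAT = re.compile(
    r"'s|'t|'re|'ve|'m|'ll|'d"      # common English contractions
    r"| ?[A-Za-z]+"                  # a word, keeping its leading space
    r"| ?[0-9]+"                     # a number, keeping its leading space
    r"| ?[^\sA-Za-z0-9]+"            # punctuation/symbol runs
    r"|\s+(?!\S)|\s+"                # whitespace handling
)

chunks = PAT.findall("Hello world, it's 2024!")
# Each chunk is pre-tokenized independently, so BPE merges never
# cross a word/punctuation boundary.
```

Note how alternation order matters: contractions are tried before the general letter branch, so `it's` splits into ` it` and `'s`.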
Building a BPE tokenizer from scratch and training it on the TinyStories dataset, demonstrating how BPE achieves impressive compression ratios through iterative byte-pair merging.
A concise walkthrough of Byte Pair Encoding (BPE) tokenization, covering Unicode, UTF-8 encoding, and how BPE builds a vocabulary by iteratively merging the most frequent byte pairs.
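The merge loop described above can be sketched in a few lines of plain Python (function names and the toy corpus are mine, not from the post; real trainers count pairs over word frequencies rather than one flat byte sequence):

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent token pairs and return the most common one.
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge_pair(tokens, pair, new_token):
    # Replace every occurrence of `pair` with the new vocabulary id.
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Start from raw UTF-8 bytes (ids 0-255); each merge adds one new id.
tokens = list("low lower lowest".encode("utf-8"))
vocab_size = 256
for _ in range(3):
    pair = most_frequent_pair(tokens)
    tokens = merge_pair(tokens, pair, vocab_size)
    vocab_size += 1
```

After three merges the 16-byte sequence has shrunk noticeably, which is exactly the compression effect the TinyStories post measures at scale.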
Getting started with Stanford CS336 — setting up the local development environment and kicking off a self-paced journey through language modeling from scratch with one hour of study per day.

Using QLoRA and HuggingFace PEFT to fine-tune Falcon-7B for message summarization on the SamSum dataset, with GPT-3.5-turbo as an automated evaluator for generated summaries.

Building a kitchen companion using OpenAI's Assistant API with text-to-speech and image generation, showcasing a holistic approach to conversational AI with just a few lines of code.
A conversational simulation environment powered by LLM agents, inspired by ChatArena, where AI agents engage in task-oriented or casual conversations.
Running a quantized Llama2 model on an Apple M1 Pro for on-device question answering, with LangChain and FAISS for local RAG without relying on cloud APIs.