Study Notes: Stanford CS336 Language Modeling from Scratch
A survey of popular Transformer architectures — comparing encoder-decoder, decoder-only, and encoder-only designs across GPT, BERT, T5, LLaMA, and modern trends in the field.
Building a Transformer language model from scratch in PyTorch — covering every component from multi-head attention and RoPE embeddings to SwiGLU activations and the complete model assembly.
Demystifying GPT-2's pre-tokenization regex pattern — how one carefully crafted regex handles text from multiple languages, scripts, and symbol sets for BPE tokenization.
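The pre-tokenizer in question splits text into contractions, letter runs, digit runs, symbol runs, and whitespace before BPE is applied. A minimal sketch of that idea, using a simplified ASCII-only stand-in pattern (GPT-2's actual pattern uses Unicode classes like `\p{L}` and `\p{N}`, which require the third-party `regex` module rather than the stdlib `re` shown here):

```python
import re

# Simplified ASCII-only approximation of GPT-2's pre-tokenization regex:
# contractions | optional-space + letters | optional-space + digits |
# optional-space + symbols | trailing whitespace | whitespace.
PAT = re.compile(
    r"'s|'t|'re|'ve|'m|'ll|'d"      # common English contractions
    r"| ?[A-Za-z]+"                  # a word, keeping its leading space
    r"| ?[0-9]+"                     # a number, keeping its leading space
    r"| ?[^\sA-Za-z0-9]+"            # punctuation/symbol runs
    r"|\s+(?!\S)|\s+"                # whitespace handling
)

chunks = PAT.findall("Hello world, it's 2024!")
# Each chunk is pre-tokenized independently, so BPE merges never
# cross a word/punctuation boundary.
```

Note how alternation order matters: contractions are tried before the general letter branch, so `it's` splits into ` it` and `'s`.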
Building a BPE tokenizer from scratch and training it on the TinyStories dataset, demonstrating how BPE achieves impressive compression ratios through iterative byte-pair merging.
A concise walkthrough of Byte Pair Encoding (BPE) tokenization, covering Unicode, UTF-8 encoding, and how BPE builds a vocabulary by iteratively merging the most frequent byte pairs.
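The merge loop described above can be sketched in a few lines of plain Python (function names and the toy corpus are mine, not from the post; real trainers count pairs over word frequencies rather than one flat byte sequence):

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent token pairs and return the most common one.
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge_pair(tokens, pair, new_token):
    # Replace every occurrence of `pair` with the new vocabulary id.
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Start from raw UTF-8 bytes (ids 0-255); each merge adds one new id.
tokens = list("low lower lowest".encode("utf-8"))
vocab_size = 256
for _ in range(3):
    pair = most_frequent_pair(tokens)
    tokens = merge_pair(tokens, pair, vocab_size)
    vocab_size += 1
```

After three merges the 16-byte sequence has shrunk noticeably, which is exactly the compression effect the TinyStories post measures at scale.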
Getting started with Stanford CS336 — setting up the local development environment and kicking off a self-paced journey through language modeling from scratch with one hour of study per day.

Using QLoRA and HuggingFace PEFT to fine-tune Falcon-7B for message summarization on the SamSum dataset, with GPT-3.5-turbo as an automated evaluator for generated summaries.

Building a kitchen companion using OpenAI's Assistant API with text-to-speech and image generation, showcasing a holistic approach to conversational AI with just a few lines of code.
A conversational simulation environment powered by LLM agents, inspired by ChatArena, where AI agents engage in task-oriented or casual conversations.
Running a quantized Llama2 model on an Apple M1 Pro for on-device question answering, with LangChain and FAISS for local RAG without relying on cloud APIs.