Han's Generative AI Quest

Streaming Chat in the Browser: SSE, React, and Schema-Constrained Suggestion Chips

Post 7 of the Pepper & Carrot AI flipbook series. Post 6 left a spoiler-safe chat pipeline you could only reach with curl. Now we put it in the browser: tokens stream over Server-Sent Events into a React chat panel, the user picks page or wiki mode per message, and two follow-up suggestion chips render below each answer — generated by a second model call, constrained to a JSON schema, and validated server-side before a single chip reaches the DOM. Plus a light wiki ingestion path so wiki mode has something to say.
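
As a rough illustration of the two server-side pieces named above, here is a minimal sketch using only the standard library. The helper names `sse_event` and `validate_chips`, the `suggestions` key, and the length limit are assumptions made for this example, not the code from the post. The first function frames a payload in the Server-Sent Events wire format; the second rejects any model output that is not exactly two non-empty strings.

```python
import json
from typing import List, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Frame one payload as a Server-Sent Events message.

    Each message is a set of "field: value" lines ended by a blank line.
    json.dumps escapes newlines, so the payload stays on one data line.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def validate_chips(raw: str, expected: int = 2, max_len: int = 80) -> List[str]:
    """Parse model output and return the chips, or raise ValueError.

    Assumed shape (hypothetical): {"suggestions": ["...", "..."]}
    """
    payload = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    chips = payload.get("suggestions")
    if not isinstance(chips, list) or len(chips) != expected:
        raise ValueError(f"expected exactly {expected} suggestions")
    for chip in chips:
        if not isinstance(chip, str) or not chip.strip() or len(chip) > max_len:
            raise ValueError("each suggestion must be a short non-empty string")
    return [chip.strip() for chip in chips]
```

If validation fails, the server can simply send no chips; the answer itself has already streamed.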

The RAG Layer: Spoiler-Safe Retrieval Without Trusting the Prompt

Post 6 of the Pepper & Carrot AI flipbook series. The flipbook from Post 5 knows which page you're on. Now we build the chat pipeline that answers questions about that page — and we make spoiler safety a property of the database query, not a line in the prompt. Build a RetrievalService whose Chroma filter is derived from server-side reading progress, wire it into a FastAPI chat endpoint, drive it with curl, and prove the boundary holds even when the user tries to jailbreak it. No chat UI yet — that's Post 7.
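
To make "a property of the database query" concrete, here is a minimal sketch of how such a filter could be built. It assumes each stored chunk carries integer `episode` and `page` metadata fields and that reading progress is a single (episode, page) pair; both the field names and the function name `spoiler_safe_where` are hypothetical. The returned dict uses Chroma's metadata filter operators (`$or`, `$and`, `$lt`, `$lte`, `$eq`) and would be passed as the `where` argument of a collection query.

```python
from typing import Any, Dict


def spoiler_safe_where(episode: int, page: int) -> Dict[str, Any]:
    """Build a metadata filter that admits only content already read.

    A chunk passes if it belongs to an earlier episode, or to the
    current episode at or before the current page. The arguments must
    come from server-side progress, never from the request body.
    """
    return {
        "$or": [
            {"episode": {"$lt": episode}},
            {
                "$and": [
                    {"episode": {"$eq": episode}},
                    {"page": {"$lte": page}},
                ]
            },
        ]
    }
```

Because the filter is applied before any text reaches the model, a jailbreak prompt has nothing beyond the boundary to leak.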

From Database to Browser: A REST API and a Real Flipbook

Post 5 of the Pepper & Carrot AI flipbook series. With one episode sitting in Postgres + LocalStorage from Post 4, it's time to surface it. Build two typed FastAPI routes that resolve relative storage keys into absolute URLs at response time, and wire up a real page-flipping flipbook with React + StPageFlip — single page in portrait, two-page spread in landscape. By the end you have an episode picker plus a flipbook rendering real data from your local backend.
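
Resolving storage keys at response time could look roughly like the sketch below. The function name `resolve_storage_url` and the example base URL are assumptions for illustration; the idea is only that the database keeps a relative key and the API joins it to a configurable base when it builds the response.

```python
from urllib.parse import quote


def resolve_storage_url(base_url: str, key: str) -> str:
    """Join a relative storage key onto a public base URL.

    Slashes in the key are kept as path separators; other unsafe
    characters (spaces, for example) are percent-encoded.
    """
    return f"{base_url.rstrip('/')}/{quote(key.lstrip('/'))}"


# Example with a hypothetical local media base URL.
print(resolve_storage_url("http://localhost:8000/media/", "ep01/page 03.jpg"))
```

Keeping keys relative in the database means the same rows work unchanged if the files later move to a different host or bucket.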

Claude Skills as an Ingestion Tool: When the Best Vision Model Is the One Driving Your Editor

Post 4 of the Pepper & Carrot AI flipbook series. The comic is images, not text, so before any RAG can happen, every page needs a description. This post walks through using a Claude Code skill as the vision provider for this portfolio project's ingestion pipeline. The reasons are specific to this project: no per-call API cost beyond the Claude Code subscription, auditable JSON artifacts on disk, and the same Claude model as Anthropic's hosted vision API. By the end, one full episode is ingested into Postgres + ChromaDB + local storage. The right vision provider is context-specific: local VLM, hosted API, and Claude Code each win under different constraints (budget, whether the pipeline runs unattended, throughput), and the post includes a decision matrix mapping each constraint to the right choice.