Han's Generative AI Quest

A World Graph Built by a Second Skill: Spoiler-Aware Knowledge Graph Overlay

Post 9 of the Pepper & Carrot AI flipbook series. The chat layer answers questions about pages and the wiki; now we add a third affordance, a spoiler-aware knowledge graph of the comic's world rendered as an in-reader overlay, plus a third Claude Code skill that closes the loop back to the chat. A second skill walks the wiki sources and the per-page description JSONs and writes a durable YAML pair. A FastAPI route filters the graph with a Postgres row-value comparison, so an edge whose own debut is past the reader's cursor doesn't leak even when both of its endpoints are visible. Two response modes, focus (on-page characters + 1-hop structural neighbors) and full (the whole spoiler-safe world), share the same boundary. A React + xyflow overlay renders circular avatar nodes with kind-based SVG fallbacks, a kind-filter bar, kind-colored edges that brighten on the selected node, soft fade-in for newly revealed entities, and an "Ask in wiki mode" click that round-trips back through the chat panel. The third skill, summarize-wiki, authors one tight ~150-word .md per entity so that "Ask in wiki mode" for a minor character like Truffel or a coven like Magmah actually works against qwen2.5:7b, instead of the 30 KB multi-entity articles blowing past the prompt-hardening guarantees from Post 8.
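To make the row-value idea concrete, here is a minimal sketch rather than the post's actual code. The table name, column names, and the Edge class are assumptions for illustration. Python tuples compare lexicographically, just like a Postgres row comparison, so the same rule can be checked in plain Python.

```python
from dataclasses import dataclass

# Hypothetical SQL shape; graph_edges and its columns are assumed names.
EDGE_FILTER_SQL = """
SELECT source_id, target_id, kind
FROM graph_edges
WHERE (debut_episode, debut_page) <= (%(episode)s, %(page)s)
"""


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    debut: tuple[int, int]  # (episode, page) where the relationship first appears


def visible_edges(edges, node_debuts, cursor):
    """Keep an edge only if both endpoints AND the edge itself have debuted."""
    return [
        e for e in edges
        if node_debuts[e.source] <= cursor
        and node_debuts[e.target] <= cursor
        and e.debut <= cursor  # lexicographic, like the SQL row comparison
    ]


node_debuts = {"pepper": (1, 1), "carrot": (1, 1)}
edges = [
    Edge("pepper", "carrot", (1, 2)),  # already revealed at the cursor
    Edge("pepper", "carrot", (3, 4)),  # both endpoints visible, edge is not
]
safe = visible_edges(edges, node_debuts, cursor=(2, 9))
```

With the reader's cursor at episode 2, page 9, only the first edge survives, even though both endpoints of the second edge are already visible.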

Making Small Models Behave: Wiki Mode and the Long Road to Concise Answers

Post 8 of the Pepper & Carrot AI flipbook series. Post 7 left a streaming chat panel and an honest admission: the engineering guarantees structure and safety, but it doesn't guarantee taste. This post is the prompt-engineering pass that closes that gap on a 7B local model — a markdown stripper on every piece of text entering the prompt, a closed-world grounding contract, a page-mode anti-recitation block, a strict response-format cap, a much sharper suggestion-chip prompt with bad/good examples, and react-markdown in the chat panel as the belt-and-suspenders safety net.
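As a rough illustration of the markdown stripper, the sketch below uses a handful of regular expressions. It is an assumption about the general shape, not the post's implementation, and the function name strip_markdown is hypothetical. It deliberately leaves single underscores alone so identifiers such as snake_case names survive.

```python
import re


def strip_markdown(text: str) -> str:
    """Remove common markdown syntax so the model sees plain prose."""
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)    # images -> alt text
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)     # links -> label
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)  # headings
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)       # bullets
    text = re.sub(r"\*\*|__|\*|`", "", text)                 # emphasis and inline code
    return re.sub(r"[ \t]+", " ", text).strip()              # collapse runs of spaces


sample = "## Magmah\n**Magmah** is a [coven](https://example.org/wiki) in the *comic*."
plain = strip_markdown(sample)
```

Running every retrieved snippet through a function like this before it enters the prompt removes the formatting cues a small model tends to imitate.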

Streaming Chat in the Browser: SSE, React, and Schema-Constrained Suggestion Chips

Post 7 of the Pepper & Carrot AI flipbook series. Post 6 left a spoiler-safe chat pipeline you could only reach with curl. Now we put it in the browser: tokens stream over Server-Sent Events into a React chat panel, the user picks page or wiki mode per message, and two follow-up suggestion chips render below each answer — generated by a second model call, constrained to a JSON schema, and validated server-side before a single chip reaches the DOM. Plus a light wiki ingestion path so wiki mode has something to say.
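The server-side check on suggestion chips might look like the sketch below. The payload key "suggestions", the 80-character cap, and the function name parse_chips are assumptions for illustration; the post's real schema may differ. The point is that anything malformed yields no chips at all rather than a partial render.

```python
import json


def parse_chips(raw: str, max_len: int = 80) -> list[str]:
    """Return exactly two clean chip strings, or [] if the payload is unusable."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    chips = payload.get("suggestions") if isinstance(payload, dict) else None
    if not isinstance(chips, list) or len(chips) != 2:
        return []
    cleaned = []
    for chip in chips:
        if not isinstance(chip, str):
            return []
        chip = chip.strip()
        if not chip or len(chip) > max_len:
            return []
        cleaned.append(chip)
    return cleaned


good = parse_chips('{"suggestions": ["Who is Carrot?", "Where is Pepper?"]}')
bad = parse_chips('{"suggestions": ["only one"]}')
```

Failing closed keeps the UI simple: the chat panel either receives two valid chips or renders none.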

The RAG Layer: Spoiler-Safe Retrieval Without Trusting the Prompt

Post 6 of the Pepper & Carrot AI flipbook series. The flipbook from Post 5 knows which page you're on. Now we build the chat pipeline that answers questions about that page — and we make spoiler safety a property of the database query, not a line in the prompt. Build a RetrievalService whose Chroma filter is derived from server-side reading progress, wire it into a FastAPI chat endpoint, drive it with curl, and prove the boundary holds even when the user tries to jailbreak it. No chat UI yet — that's Post 7.
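A sketch of how such a filter could be derived is shown below. It assumes each stored chunk carries integer "episode" and "page" metadata fields, which are hypothetical names, and it uses Chroma's where operators ($or, $and, $lt, $lte, $eq). The progress values come from the server, never from the request body.

```python
def spoiler_filter(episode: int, page: int) -> dict:
    """Build a Chroma `where` clause admitting only content at or before the cursor."""
    return {
        "$or": [
            # Any page of an earlier episode is safe.
            {"episode": {"$lt": episode}},
            # Within the current episode, only pages up to the cursor.
            {"$and": [
                {"episode": {"$eq": episode}},
                {"page": {"$lte": page}},
            ]},
        ]
    }


def is_visible(meta: dict, episode: int, page: int) -> bool:
    """Plain-Python reference for the same rule, handy in unit tests."""
    return (meta["episode"], meta["page"]) <= (episode, page)


where = spoiler_filter(episode=3, page=5)
# Assumed usage: collection.query(query_texts=[question], n_results=4, where=where)
```

Because the filter is applied by the vector store, a jailbreak in the user's message cannot widen what gets retrieved.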

From Database to Browser: A REST API and a Real Flipbook

Post 5 of the Pepper & Carrot AI flipbook series. With one episode sitting in Postgres + LocalStorage from Post 4, it's time to surface it. Build two typed FastAPI routes that resolve relative storage keys into absolute URLs at response time, and wire up a real page-flipping flipbook with React + StPageFlip — single page in portrait, two-page spread in landscape. By the end you have an episode picker plus a flipbook rendering real data from your local backend.
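Resolving a relative storage key at response time can be as small as the sketch below. The base URL, the key layout, and the function name resolve_url are assumptions for illustration, not the post's actual values.

```python
from urllib.parse import quote


def resolve_url(base_url: str, storage_key: str) -> str:
    """Join a base URL and a relative storage key, percent-encoding the key."""
    # quote() leaves "/" untouched by default, so the key's path structure survives.
    return f"{base_url.rstrip('/')}/{quote(storage_key.lstrip('/'))}"


url = resolve_url("http://localhost:8000/media/", "episodes/ep01/page 01.jpg")
```

Storing only the relative key in the database means the same rows keep working if the files later move behind a different host or a CDN.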

Claude Skills as an Ingestion Tool: When the Best Vision Model Is the One Driving Your Editor

Post 4 of the Pepper & Carrot AI flipbook series. The comic is images, not text, so before any RAG can happen, every page needs a description. This post walks through using a Claude Code skill as the vision provider for this portfolio project's ingestion pipeline: no per-call API cost beyond the Claude Code subscription, auditable JSON artifacts on disk, and the same Claude model as Anthropic's hosted vision API. By the end, one full episode is ingested into Postgres + ChromaDB + local storage. The right vision provider is context-specific: local VLM, hosted API, and Claude Code each win under different constraints (budget, whether the pipeline runs unattended, throughput), and the post includes a decision matrix mapping each constraint to the right choice.