Han's Generative AI Quest

Pepper & Carrot AI-powered flipbook · Part 16 of 16 — Skip the GPU: A Managed-API Deploy on Anthropic + Voyage

Post 16 of the Pepper & Carrot AI flipbook series — the alternative deploy. Post 14 put the reading companion on five clouds with a Modal GPU serving Ollama, because the series is about local-first inference. This post takes the same app and ships it without a GPU at all: chat on the Anthropic Messages API, embeddings on Voyage AI. The whole point is that it's a *configuration* change, not a code change — the provider abstraction from Post 4 was built for exactly this, and the only new code in the repo is documentation. The interesting parts are the trade the swap makes (cost and latency for the local-first thesis) and the one real gotcha nobody warns you about: a managed embeddings model lives in a different vector space, so the search index has to be rebuilt before the first deploy or retrieval silently returns garbage.

Pepper & Carrot AI-powered flipbook · Part 15 of 16 — Shipping It: Containerize, Deploy to Fly + Pages, and Verify

Post 15 of the Pepper & Carrot AI flipbook series: the deploy itself. Post 14 provisioned the three stateful backing services and built the container; this post turns that container into a public URL. The FastAPI backend ships to Fly.io behind a scale-to-zero machine, the React frontend ships to Cloudflare Pages with a single build-time env var, and a layer-by-layer verification walkthrough confirms that Modal, Neon, R2, Fly, and Pages all talk to each other end to end. The catch is the cold start: the first answer after idle takes 15–30 seconds, the price the architecture pays for $0 idle. The post is candid about where that cost sits and how a fire-and-forget warmup would hide it.

Pepper & Carrot AI-powered flipbook · Part 14 of 16 — Going to Production: Provisioning Modal, Neon, and R2

Post 14 of the Pepper & Carrot AI flipbook series — the provisioning half of the deploy. The flipbook, the spoiler-safe RAG, the world graph all run beautifully on the developer laptop the first twelve posts built around. This one stands up the three stateful backing services the cloud build needs — Modal for the GPU-served Ollama, Neon for managed Postgres, Cloudflare R2 for the image bytes — and builds the two-stage container that bakes the small data and streams the big data. The provider abstractions from Post 4 finally cash in: the backend doesn't notice that Ollama moved off localhost, the storage swap is one env var, the database URL is one secret. The new code is small (a boto3-backed R2Storage finally lands behind the Post 4 Protocol, a Dockerfile, three short infra scripts) — the harder work is the architectural judgement about which seams to draw and which five services to fan out across. Post 15 takes the container public.

Pepper & Carrot AI-powered flipbook · Part 13 of 16 — Rendering the World Graph: A React-Flow Overlay and Summary-First Wiki

Post 13 of the Pepper & Carrot AI flipbook series. Post 12 produced a spoiler-safe world-graph API from the extract-world-graph skill; this post renders it. A React + @xyflow/react overlay draws circular avatar nodes with kind-based SVG fallbacks, a kind-filter bar, kind-colored edges that brighten on the selected node, a focus-vs-full mode toggle, and a soft fade-in for entities revealed by the latest page flip. An "Ask in wiki mode" click round-trips back through the chat panel — and a third skill, summarize-wiki, authors one tight ~150-word summary per entity so the small local model answers cleanly instead of drowning in 30 KB of multi-entity articles.

Pepper & Carrot AI-powered flipbook · Part 12 of 16 — A World Graph Built by a Skill: Extraction and a Spoiler-Safe API

Post 12 of the Pepper & Carrot AI flipbook series. A second Claude Code skill — extract-world-graph — walks the wiki sources and the per-page description JSONs and writes a durable YAML pair (entities + relationships) that a pydantic loader upserts into Postgres. Then a FastAPI route serves the graph through a spoiler filter expressed as a Postgres row-value comparison — (episode_debut, page_debut) <= (current_episode, current_page) — so an edge whose own debut is past the reader's cursor can't leak even when both of its endpoints are visible. Ten hermetic tests against in-memory SQLite pin the boundary.
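The row-value comparison can be tried in isolation. The sketch below is illustrative only: it uses the standard-library `sqlite3` module in memory (row values need SQLite 3.15 or newer), and the `relationships` table, its `source`/`target` columns, the `visible_edges` helper, and the sample rows are all assumptions, not the series' actual schema. Only the `episode_debut` and `page_debut` names come from the summary above.

```python
import sqlite3


def visible_edges(conn, current_episode, current_page):
    """Return edges whose own debut is at or before the reader's cursor."""
    rows = conn.execute(
        """
        SELECT source, target
        FROM relationships
        WHERE (episode_debut, page_debut) <= (?, ?)
        ORDER BY episode_debut, page_debut
        """,
        (current_episode, current_page),
    )
    return rows.fetchall()


# Hypothetical sample data: three edges debuting at different points.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relationships ("
    "source TEXT, target TEXT, episode_debut INTEGER, page_debut INTEGER)"
)
conn.executemany(
    "INSERT INTO relationships VALUES (?, ?, ?, ?)",
    [
        ("entity_a", "entity_b", 1, 1),
        ("entity_a", "entity_c", 2, 4),
        ("entity_a", "entity_d", 3, 1),
    ],
)
```

The comparison is lexicographic: the episode decides first, and the page only breaks ties within the same episode. An edge that debuts on episode 3, page 1 therefore stays hidden from a reader on episode 2, page 9, even though its page number is smaller.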

Pepper & Carrot AI-powered flipbook · Part 11 of 16 — Making Small Models Behave: Wiki Mode and the Long Road to Concise Answers

Post 11 of the Pepper & Carrot AI flipbook series. Post 10 left a streaming chat panel and an honest admission: the engineering guarantees structure and safety, but it doesn't guarantee taste. This post is the prompt-engineering pass that closes that gap on a 7B local model — a markdown stripper on every piece of text entering the prompt, a closed-world grounding contract, a page-mode anti-recitation block, a strict response-format cap, a much sharper suggestion-chip prompt with bad/good examples, and react-markdown in the chat panel as the belt-and-suspenders safety net.

Pepper & Carrot AI-powered flipbook · Part 10 of 16 — Streaming Chat in the Browser: SSE, React, and Schema-Constrained Suggestion Chips

Post 10 of the Pepper & Carrot AI flipbook series. Post 9 left a spoiler-safe chat pipeline you could only reach with curl. Now we put it in the browser: tokens stream over Server-Sent Events into a React chat panel, the user picks page or wiki mode per message, and two follow-up suggestion chips render below each answer — generated by a second model call, constrained to a JSON schema, and validated server-side before a single chip reaches the DOM. Plus a light wiki ingestion path so wiki mode has something to say.
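Server-side chip validation might look roughly like the sketch below. It is a guess at the shape, not the post's code: the `suggestions` key, the `parse_chips` name, and the length cap are hypothetical, and a real implementation could validate against a full JSON schema instead of these hand-written checks.

```python
import json


def parse_chips(raw: str, expected: int = 2, max_len: int = 120) -> list[str]:
    """Return the chips if the model output is well-formed, else an empty list."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # Not JSON at all: show no chips.

    chips = data.get("suggestions") if isinstance(data, dict) else None
    if not isinstance(chips, list) or len(chips) != expected:
        return []  # Wrong shape or wrong number of chips.

    cleaned = []
    for chip in chips:
        if not isinstance(chip, str):
            return []
        text = chip.strip()
        if not text or len(text) > max_len:
            return []  # Empty or overlong chip: reject the whole batch.
        cleaned.append(text)
    return cleaned
```

Rejecting the whole batch on any failure keeps the rule simple: either both chips are safe to render or none reach the DOM.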

Pepper & Carrot AI-powered flipbook · Part 9 of 16 — The RAG Layer: Spoiler-Safe Retrieval Without Trusting the Prompt

Post 9 of the Pepper & Carrot AI flipbook series. The flipbook from Post 8 knows which page you're on. Now we build the chat pipeline that answers questions about that page — and we make spoiler safety a property of the database query, not a line in the prompt. Build a RetrievalService whose Chroma filter is derived from server-side reading progress, wire it into a FastAPI chat endpoint, drive it with curl, and prove the boundary holds even when the user tries to jailbreak it. No chat UI yet — that's Post 10.
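One way to derive such a filter is a small pure function that turns the reader's progress into a Chroma `where` clause. This is a sketch under assumptions: the `episode` and `page` metadata keys and the `spoiler_filter` name are hypothetical, and the real RetrievalService may structure its filter differently.

```python
def spoiler_filter(current_episode: int, current_page: int) -> dict:
    """Build a Chroma metadata filter admitting only already-read pages."""
    return {
        "$or": [
            # Every page of an earlier episode is safe.
            {"episode": {"$lt": current_episode}},
            # Within the current episode, only pages up to the cursor.
            {
                "$and": [
                    {"episode": {"$eq": current_episode}},
                    {"page": {"$lte": current_page}},
                ]
            },
        ]
    }
```

The result would be passed as the `where` argument of a Chroma collection query. Because the two integers come from server-side reading progress and never from the user's message, nothing typed into the chat can widen the filter.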

Pepper & Carrot AI-powered flipbook · Part 8 of 16 — A Real Flipbook in the Browser: React + StPageFlip

Post 8 of the Pepper & Carrot AI flipbook series. Post 7 built two typed REST routes that return episode JSON with absolute image URLs; this post renders that data as a real page-flipping flipbook in the browser. It covers a Vite + React + TypeScript frontend, a hand-rolled types.ts that mirrors the Pydantic models, an episode picker, and a wrapper around StPageFlip that shows a single page in portrait and a two-page spread in landscape. The post ends with appendices on the SQLAlchemy idioms behind the handlers and on how Settings reads your .env.

Pepper & Carrot AI-powered flipbook · Part 7 of 16 — From Database to JSON: A Typed REST API

Post 7 of the Pepper & Carrot AI flipbook series. With one episode sitting in Postgres + LocalStorage from Post 5, it's time to surface it. Build two typed FastAPI routes — list episodes and episode detail — that resolve relative storage keys into absolute URLs at response time via the Storage Protocol, with the OpenAPI spec as the wire-format contract. By the end you have two endpoints returning episode JSON your browser can consume.
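The key-to-URL step can be pictured with a minimal sketch. Everything named here is an assumption for illustration: the `url_for` method, the `base_url` constructor argument, and the `page_image_urls` helper are hypothetical, and the series' actual Storage Protocol may expose a different interface.

```python
from typing import Protocol


class Storage(Protocol):
    """Anything that can turn a relative storage key into an absolute URL."""

    def url_for(self, key: str) -> str: ...


class LocalStorage:
    """Hypothetical local backend that serves files under a base URL."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def url_for(self, key: str) -> str:
        # Join without doubling or dropping the slash between base and key.
        return f"{self.base_url}/{key.lstrip('/')}"


def page_image_urls(storage: Storage, keys: list[str]) -> list[str]:
    """Resolve stored relative keys at response time, not at write time."""
    return [storage.url_for(key) for key in keys]
```

Keeping only relative keys in the database means a later backend swap changes the URLs in every response without touching any stored rows.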