Han's Generative AI Quest

Pepper & Carrot AI-powered flipbook · Part 19 — Agentic Red-Teaming: How an AI Agent Hunts Prompt Injection, Hallucination, and Spoiler Leaks

Part 19 of the Pepper & Carrot AI flipbook series, and the discovery half of evaluation. Post 18 built a deterministic evaluator that grades the reading companion against a frozen test set; its blind spot is that it can only catch failures someone already wrote a test for. This post builds the complement: an agentic red-teamer, an AI agent handed the same two MCP tools and a mission ("make it spoil," "get it to invent lore," "talk it out of its rules") that decides its own attacks, adapts across a multi-turn conversation, and reports what broke. It's written for someone brand-new to agentic workflows: every term (agent, tool call, oracle, prompt injection, red-teaming) is defined from zero. The throughline is one inherited rule, "explore agentically, judge structurally": the agent decides what to try, but a separate, checkable oracle, never the attacker model, decides whether an attack succeeded. Every confirmed failure is written back as candidate gold for the deterministic harness. Find once, guard forever.
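To make "judge structurally" concrete, here is a minimal sketch of what a checkable spoiler oracle could look like. It assumes the answer comes back with a list of cited (episode, page) positions; the `Citation` type and the `spoiler_oracle` name are hypothetical and not taken from the series' repo.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Citation:
    """Hypothetical shape: where a cited chunk sits in the comic."""
    episode: int
    page: int


def spoiler_oracle(cursor: Tuple[int, int], citations: List[Citation]) -> List[Citation]:
    """Return every citation that lies past the reader's (episode, page) cursor.

    An empty list means the attack failed; a non-empty list is a confirmed leak.
    The check is a plain tuple comparison, so no model is asked for an opinion.
    """
    return [c for c in citations if (c.episode, c.page) > cursor]
```

The attacker agent is free to improvise its prompts, but the verdict comes from this comparison alone, which is what makes a confirmed failure reproducible enough to write back as gold.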

Pepper & Carrot AI-powered flipbook · Part 18 — Evaluating a RAG App: An Agentic LLM-as-Judge Evaluator over MCP

Part 18 of the Pepper & Carrot AI flipbook series — the other half of the MCP story. Post 17 built an MCP *server* that exposed the deployed reading companion as two tools (search, ask). This post builds an *MCP client* that consumes them to actually grade the app: a deterministic retrieval harness (recall@k, nDCG, MRR, plus an end-to-end spoiler-boundary check) and an LLM-as-judge answer layer (correctness, faithfulness, relevance, completeness) with explicit variance guards — joined by the one thing a single-number eval can't give you: failure attribution, telling a retrieval miss apart from a generation miss. It's written for someone new to RAG evaluation. The throughline is a hard line between what stays deterministic (the metrics) and what's allowed to be agentic (inventing test cases, judging open prose) — including a self-verifying gold generator that drafts candidates and auto-discards the ones the live index can't actually surface. Everything is reproducible from the repo.
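For readers new to the retrieval metrics named above, the following is a small sketch of recall@k, MRR, and nDCG with binary relevance. The function names and the chunk-ID strings are illustrative assumptions, not the harness's actual code.

```python
import math
from typing import List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(ranked, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MRR: the reciprocal rank averaged over all queries."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """nDCG with binary gains: DCG of this ranking over DCG of an ideal one."""
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, chunk_id in enumerate(ranked[:k], start=1)
        if chunk_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

All three are pure functions of a ranked list and a gold set, which is why they can stay on the deterministic side of the line.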

Pepper & Carrot AI-powered flipbook · Part 17 — Building an MCP Server: Wrapping an App in Two Tools Claude Can Call

Part 17 of the Pepper & Carrot AI flipbook series — an encore beyond the 16-post arc. The series shipped a deployed reading companion: a spoiler-safe RAG app with a flipbook UI and a streaming chat panel. This post wraps that live app in a Model Context Protocol (MCP) server so any MCP client — Claude itself, as a custom connector — can use the companion's two superpowers as tools: `search` (retrieval) and `ask` (the full, real answer pipeline). It's written for someone who has never touched MCP: what the protocol actually is, what tools/resources/prompts mean, and why a *thin adapter* is the right shape. The design is "1 + 1": `ask` reuses the chat endpoints the browser already hits (zero new app code), while `search` needs exactly one small new read-only endpoint. Plus a deliberate deploy choice: a Streamable-HTTP MCP server meant to run on more than one machine has to be stateless, or sessions break across replicas. Everything is reproducible from two public repos.

Pepper & Carrot AI-powered flipbook · Part 16 — Deploying an LLM App Without a GPU: A Managed-API Stack on Anthropic + Voyage

Post 16 of the Pepper & Carrot AI flipbook series — the alternative deploy. Post 14 put the reading companion on five clouds with a Modal GPU serving Ollama, because the series is about local-first inference. This post takes the same app and ships it without a GPU at all: chat on the Anthropic Messages API, embeddings on Voyage AI. The whole point is that it's a *configuration* change, not a code change — the provider abstraction from Post 4 was built for exactly this, and the only new code in the repo is documentation. The interesting parts are the trade the swap makes (cost and latency for the local-first thesis) and the one real gotcha nobody warns you about: a managed embeddings model lives in a different vector space, so the search index has to be rebuilt before the first deploy or retrieval silently returns garbage.

Pepper & Carrot AI-powered flipbook · Part 15 — Containerize and Deploy: Shipping to Fly.io + Cloudflare Pages, Then Verifying

Post 15 of the Pepper & Carrot AI flipbook series: the deploy itself. Post 14 provisioned the five backing services and built the container; this post turns that container into a public URL. The FastAPI backend ships to Fly.io behind a scale-to-zero machine, the React frontend ships to Cloudflare Pages with a single build-time env var, and a layer-by-layer verification walkthrough confirms Modal, Neon, R2, Fly, and Pages all talk to each other end to end. The honest part is the cold start: the first answer after idle takes 15–30 seconds, the price the architecture pays for $0 idle. The post spells out where that cost sits and how a fire-and-forget warmup would hide it.

Pepper & Carrot AI-powered flipbook · Part 14 — Provisioning the Cloud: Taking an AI App to Production on Modal, Neon, and R2

Post 14 of the Pepper & Carrot AI flipbook series: the provisioning half of the deploy. The flipbook, the spoiler-safe RAG, and the world graph all run beautifully on the developer laptop that the first twelve posts were built around. This one stands up the three stateful backing services the cloud build needs (Modal for the GPU-served Ollama, Neon for managed Postgres, Cloudflare R2 for the image bytes) and builds the two-stage container that bakes the small data and streams the big data. The provider abstractions from Post 4 finally cash in: the backend doesn't notice that Ollama moved off localhost, the storage swap is one env var, the database URL is one secret. The new code is small (a boto3-backed R2Storage finally lands behind the Post 4 Protocol, a Dockerfile, three short infra scripts); the harder work is the architectural judgement about which seams to draw and which five services to fan out across. Post 15 takes the container public.

Pepper & Carrot AI-powered flipbook · Part 13 — Visualizing a Knowledge Graph: A React-Flow Overlay and Summary-First Wiki

Post 13 of the Pepper & Carrot AI flipbook series. Post 12 produced a spoiler-safe world-graph API from the extract-world-graph skill; this post renders it. A React + @xyflow/react overlay draws circular avatar nodes with kind-based SVG fallbacks, a kind-filter bar, kind-colored edges that brighten on the selected node, a focus-vs-full mode toggle, and a soft fade-in for entities revealed by the latest page flip. An "Ask in wiki mode" click round-trips back through the chat panel — and a third skill, summarize-wiki, authors one tight ~150-word summary per entity so the small local model answers cleanly instead of drowning in 30 KB of multi-entity articles.

Pepper & Carrot AI-powered flipbook · Part 12 — Building a Knowledge Graph with an LLM: Entity Extraction and a Spoiler-Safe API

Post 12 of the Pepper & Carrot AI flipbook series. A second Claude Code skill — extract-world-graph — walks the wiki sources and the per-page description JSONs and writes a durable YAML pair (entities + relationships) that a pydantic loader upserts into Postgres. Then a FastAPI route serves the graph through a spoiler filter expressed as a Postgres row-value comparison — (episode_debut, page_debut) <= (current_episode, current_page) — so an edge whose own debut is past the reader's cursor can't leak even when both of its endpoints are visible. Ten hermetic tests against in-memory SQLite pin the boundary.
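The row-value comparison is worth seeing next to the naive alternative. The sketch below runs the same `(episode_debut, page_debut) <= (?, ?)` predicate against in-memory SQLite (which has supported row values since 3.15); the `relationships` table layout, the entity IDs, and the sample rows are invented for illustration.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a few made-up edges."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE relationships ("
        "source TEXT, target TEXT, episode_debut INTEGER, page_debut INTEGER)"
    )
    conn.executemany(
        "INSERT INTO relationships VALUES (?, ?, ?, ?)",
        [
            ("a", "b", 1, 1),
            ("a", "d", 1, 7),  # earlier episode, but a later page number
            ("a", "c", 2, 3),
            ("a", "e", 3, 1),
        ],
    )
    return conn


def visible_edges(conn: sqlite3.Connection, episode: int, page: int) -> List[Tuple[str, str]]:
    """Edges whose own debut is at or before the reader's cursor.

    The row-value comparison is lexicographic: episode first, then page.
    """
    rows = conn.execute(
        "SELECT source, target FROM relationships "
        "WHERE (episode_debut, page_debut) <= (?, ?) "
        "ORDER BY episode_debut, page_debut",
        (episode, page),
    ).fetchall()
    return [(source, target) for source, target in rows]
```

With the cursor at episode 2, page 2, the edge that debuted on episode 1, page 7 is correctly visible. A naive `episode_debut <= 2 AND page_debut <= 2` would hide it, because it compares the page numbers without regard to the episode.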

Pepper & Carrot AI-powered flipbook · Part 11 — Making Small LLMs Behave: Prompt Hardening, Wiki Mode, and Concise Answers

Post 11 of the Pepper & Carrot AI flipbook series. Post 10 left a streaming chat panel and an honest admission: the engineering guarantees structure and safety, but it doesn't guarantee taste. This post is the prompt-engineering pass that closes that gap on a 7B local model — a markdown stripper on every piece of text entering the prompt, a closed-world grounding contract, a page-mode anti-recitation block, a strict response-format cap, a much sharper suggestion-chip prompt with bad/good examples, and react-markdown in the chat panel as the belt-and-suspenders safety net.
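As a rough illustration of the markdown-stripper idea, here is a regex-based sketch that flattens links, headings, bullets, and emphasis markers before text enters a prompt. The patterns and the `strip_markdown` name are assumptions; the stripper in the post may cover different syntax.

```python
import re

# Each pattern targets one common markdown construct.
_LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")             # [text](url) -> text
_HEADING = re.compile(r"^[ \t]{0,3}#{1,6}[ \t]+", re.MULTILINE)
_BULLET = re.compile(r"^[ \t]*[-*+][ \t]+", re.MULTILINE)
_MARKERS = re.compile(r"\*{1,3}|`+")                     # bold, italic, inline code


def strip_markdown(text: str) -> str:
    """Remove common markdown syntax while keeping the readable words."""
    text = _LINK.sub(r"\1", text)
    text = _HEADING.sub("", text)
    text = _BULLET.sub("", text)   # must run before the asterisk markers go
    text = _MARKERS.sub("", text)
    return text.strip()
```

Underscore emphasis is deliberately left alone in this sketch so identifiers like `page_debut` survive intact.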

Pepper & Carrot AI-powered flipbook · Part 10 — Streaming LLM Chat in the Browser: SSE, React, and Schema-Constrained Suggestions

Post 10 of the Pepper & Carrot AI flipbook series. Post 9 left a spoiler-safe chat pipeline you could only reach with curl. Now we put it in the browser: tokens stream over Server-Sent Events into a React chat panel, the user picks page or wiki mode per message, and two follow-up suggestion chips render below each answer — generated by a second model call, constrained to a JSON schema, and validated server-side before a single chip reaches the DOM. Plus a light wiki ingestion path so wiki mode has something to say.
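The server-side validation step can be sketched with the standard library alone. This version assumes the second model call returns JSON shaped like `{"suggestions": ["...", "..."]}`; that shape, the 80-character cap, and the `parse_suggestions` name are all hypothetical.

```python
import json
from typing import List

MAX_CHIP_CHARS = 80  # assumed cap, chosen only for this example


def parse_suggestions(raw: str) -> List[str]:
    """Return exactly two clean chip strings, or [] if anything is off.

    Failing closed means a malformed model reply renders no chips at all
    instead of leaking junk into the DOM.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    items = payload.get("suggestions")
    if not isinstance(items, list) or len(items) != 2:
        return []
    chips: List[str] = []
    for item in items:
        if not isinstance(item, str):
            return []
        text = item.strip()
        if not text or len(text) > MAX_CHIP_CHARS:
            return []
        chips.append(text)
    return chips
```

Constraining the model to a schema reduces bad output, but it is the server-side check that guarantees what reaches the browser.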