Skip the GPU: A Managed-API Deploy on Anthropic + Voyage
Post 11 of the Pepper & Carrot AI flipbook series: the alternative deploy. Post 10 put the reading companion on five clouds with a Modal GPU serving Ollama, because the series is about local-first inference. This post takes the same app and ships it without a GPU at all: chat on the Anthropic Messages API, embeddings on Voyage AI. The whole point is that it's a *configuration* change, not a code change. The provider abstraction from Post 3 was built for exactly this, and the only new code in the repo is documentation. The interesting parts are the trade-off the swap makes (cost and latency weighed against the local-first thesis) and the one real gotcha nobody warns you about: a managed embeddings model lives in a different vector space from the local one, so the search index has to be rebuilt before the first deploy or retrieval silently returns garbage.
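To make the vector-space gotcha concrete, here is a minimal sketch of a guard that refuses to serve queries against an index built by a different embeddings model. It is not code from the series repo: `IndexMeta`, `needs_rebuild`, and `check_index` are hypothetical names, and it assumes the index stores the provider and model it was built with alongside the vectors. The model names and dimensions in the usage note are placeholders, not the values either provider actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexMeta:
    """Hypothetical metadata saved next to the search index at build time."""
    embed_provider: str
    embed_model: str
    dimensions: int


def needs_rebuild(stored: IndexMeta, active: IndexMeta) -> bool:
    # A different provider or model means a different vector space,
    # even when the dimensions happen to match, so compare everything.
    return stored != active


def check_index(stored: IndexMeta, active: IndexMeta) -> None:
    # Fail loudly at startup instead of letting retrieval quietly degrade.
    if needs_rebuild(stored, active):
        raise RuntimeError(
            f"index was built with {stored.embed_provider}/{stored.embed_model} "
            f"but the app is configured for "
            f"{active.embed_provider}/{active.embed_model}; "
            "rebuild the index before deploying"
        )
```

Calling `check_index(IndexMeta("ollama", "local-embed-placeholder", 768), IndexMeta("voyage", "managed-embed-placeholder", 1024))` at startup would raise, which turns the silent failure into a deploy-time error. Comparing the model name as well as the dimension matters because two models can share a dimension and still produce incompatible vectors.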