Storybook Library API
AI-powered photorealistic illustration pipeline for children with visual disabilities
Description
Cortical Visual Impairment (CVI) is a visual processing disability in which the brain can process realistic photographs far more easily than illustrations; it is the leading cause of visual impairment in children in the US. Children with CVI struggle to turn abstract shapes like drawings and cartoons into meaningful object associations, making it difficult to learn to read and limiting their ability to thrive. Storybook Library addresses this by generating photorealistic images from literature, with a limited number of characters on a pure black background.
This is achieved as a multi-step pipeline: an LLM parses the text to identify characters, clothing, scenes, and themes; an AI image model renders these photorealistically; prompts are monitored and adjusted for safety; and a database record maintains character consistency and continuity from scene to scene. Designed, architected, and developed from concept to delivery for educators working across 12 school districts in central Connecticut.
Visual Demo
Key Technical Decisions
LLM text preprocessing before illustration
Raw story text is preprocessed through GPT-4.1-mini before image generation. The LLM extracts scene boundaries, identifies characters present, and generates detailed visual descriptions — converting narrative prose into structured illustration prompts. This separation ensures the image model receives consistent, visually-oriented instructions rather than raw literary text.
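The structured output described above can be sketched as follows. The `SceneExtract` shape and `buildImagePrompt` helper are hypothetical illustrations of the idea, not the project's actual schema; the constraints baked into the prompt (photorealism, few subjects, pure black background) come from the CVI requirements stated earlier.

```typescript
// Hypothetical shape for the structured data the LLM is asked to return
// for each scene it extracts from the raw story text.
interface SceneExtract {
  sceneSummary: string;        // one-sentence visual summary of the scene
  charactersPresent: string[]; // names of characters appearing in the scene
  setting: string;             // physical setting, described visually
  mood: string;                // lighting and emotional tone
}

// Convert the structured extract into a CVI-friendly image prompt:
// photorealistic rendering, limited subjects, pure black background.
function buildImagePrompt(scene: SceneExtract): string {
  return [
    `Photorealistic photograph: ${scene.sceneSummary}`,
    `Setting: ${scene.setting}. Mood: ${scene.mood}.`,
    `Show only: ${scene.charactersPresent.join(", ")}.`,
    "Pure black background, no other objects, high contrast.",
  ].join(" ");
}

const prompt = buildImagePrompt({
  sceneSummary: "a girl in a red cloak walks along a forest path",
  charactersPresent: ["Red Riding Hood"],
  setting: "dark forest path",
  mood: "calm, soft light",
});
```

Separating extraction from rendering this way means the image model never sees literary prose, only visually concrete instructions.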
Character consistency through context propagation
A significant challenge with AI illustration is that each render is independent, so a character can look completely different from one page to the next. The pipeline solves this by maintaining a persistent character registry per book. Character descriptions are stored in the database and evolve — when a character's appearance is approved after regeneration, the reference description updates to match. Every image prompt includes the full character context for all characters in that scene, plus reference images from previously approved renders.
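A minimal sketch of the registry idea, assuming a per-book character table with one evolving description per character. The `CharacterRecord` shape and function names are illustrative, not the project's actual schema.

```typescript
// Hypothetical registry row, mirroring a per-book character table.
interface CharacterRecord {
  name: string;
  description: string;        // current approved visual description
  referenceImageUrl?: string; // last approved render, if any
}

// Build the character context injected into every image prompt for a scene:
// the full description of each character present.
function characterContext(
  registry: Map<string, CharacterRecord>,
  present: string[],
): string {
  return present
    .map((name) => registry.get(name))
    .filter((c): c is CharacterRecord => c !== undefined)
    .map((c) => `${c.name}: ${c.description}`)
    .join("\n");
}

// When a regenerated image is approved, the reference description evolves
// so future renders match the newly approved appearance.
function approveAppearance(
  registry: Map<string, CharacterRecord>,
  name: string,
  newDescription: string,
  imageUrl: string,
): void {
  const rec = registry.get(name);
  if (rec) {
    rec.description = newDescription;
    rec.referenceImageUrl = imageUrl;
  }
}

const registry = new Map<string, CharacterRecord>([
  ["Gretel", { name: "Gretel", description: "a young girl with braided hair" }],
]);
approveAppearance(registry, "Gretel", "a young girl with blonde braided hair", "r2://gretel-v2.png");
const ctx = characterContext(registry, ["Gretel"]);
```

Because the context is rebuilt from the registry on every render, an approved change propagates to all later scenes automatically.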
Separate API and Worker services
A cost-control pattern: the deployed API is intentionally read-only. The AI pipeline runs locally, where token and image-generation spend can be monitored, and the public endpoint serves only pre-processed content.
Upstash Redis polling with sorted-set delayed jobs
Upstash's REST-based Redis has no blocking pop (BLPOP), so the worker polls on a 5-second interval. Jobs that hit a rate limit go into a sorted set scored by their retry timestamp, and each poll promotes delayed jobs back onto the queue once their wait has expired.
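The promotion step can be sketched with an in-memory stand-in for the two Redis structures involved. `FakeRedis` and `pollOnce` are illustrative names; the method signatures loosely mirror list and sorted-set commands, but this is a sketch of the polling logic, not the worker's actual code.

```typescript
// Minimal in-memory stand-in for the two structures the worker uses:
// a list as the ready queue, and a sorted set (score = wake-up timestamp)
// holding jobs that are backing off after a rate limit.
class FakeRedis {
  private lists = new Map<string, string[]>();
  private zsets = new Map<string, Map<string, number>>();

  lpush(key: string, v: string): void {
    const l = this.lists.get(key) ?? [];
    l.unshift(v);
    this.lists.set(key, l);
  }
  rpop(key: string): string | null {
    return this.lists.get(key)?.pop() ?? null;
  }
  zadd(key: string, score: number, member: string): void {
    const z = this.zsets.get(key) ?? new Map<string, number>();
    z.set(member, score);
    this.zsets.set(key, z);
  }
  // Members with score <= max: delayed jobs whose wait has expired.
  zrangeByScore(key: string, max: number): string[] {
    const z = this.zsets.get(key) ?? new Map<string, number>();
    return [...z.entries()].filter(([, s]) => s <= max).map(([m]) => m);
  }
  zrem(key: string, member: string): void {
    this.zsets.get(key)?.delete(member);
  }
}

// One polling tick: promote every due delayed job back onto the ready
// queue, then pop the next ready job (or null if there is none).
function pollOnce(redis: FakeRedis, now: number): string | null {
  for (const job of redis.zrangeByScore("jobs:delayed", now)) {
    redis.zrem("jobs:delayed", job);
    redis.lpush("jobs:ready", job);
  }
  return redis.rpop("jobs:ready");
}

const redis = new FakeRedis();
redis.zadd("jobs:delayed", 100, "job-1"); // retry no earlier than t=100
const early = pollOnce(redis, 50);        // too soon: nothing promoted
const later = pollOnce(redis, 150);       // wait expired: job promoted
```

Using a timestamp score means the same `ZRANGEBYSCORE` call both finds due jobs and naturally ignores ones still backing off.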
Hand-rolled image regeneration UX
Admins can click any thumbnail to open a lightbox, edit the AI prompt, regenerate, then approve or reject the result in a side-by-side comparison view. Rejected images are deleted from R2; approved ones replace the original.
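The approve/reject branch reduces to a small decision function. The `ReviewOp` type and `resolveReview` name are hypothetical; this sketches only the storage outcome of the review step, not the actual backend.

```typescript
// Hypothetical outcome of the side-by-side review: the storage operation
// the backend would perform against R2 after the admin decides.
type ReviewOp =
  | { kind: "delete"; key: string }                      // drop the rejected render
  | { kind: "replace"; oldKey: string; newKey: string }; // promote the approved one

function resolveReview(
  decision: "approve" | "reject",
  originalKey: string,
  candidateKey: string,
): ReviewOp {
  return decision === "approve"
    ? { kind: "replace", oldKey: originalKey, newKey: candidateKey }
    : { kind: "delete", key: candidateKey };
}

const approved = resolveReview("approve", "page-3.png", "page-3-v2.png");
const rejected = resolveReview("reject", "page-3.png", "page-3-v2.png");
```

Keeping the decision pure makes the lightbox UI trivially testable apart from the R2 client.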
Architecture
Frontend (Vercel, static HTML) → API (Fly.io, Hono/TypeScript) → Supabase (Postgres) + Upstash Redis (job queue) → Worker (Fly.io) → OpenAI API + Cloudflare R2
Tech Stack
By the Numbers
15+ fairy tales catalogued across 2 public domain collections
4 classic novels with chapter-by-chapter processing plans
3-attempt safety fallback with semantic prompt escalation
9 database migrations tracking schema evolution
Full E2E test suite with Playwright (14 test flows)
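The 3-attempt safety fallback listed above might look like the following sketch. The escalation prompts and the synchronous `generate` callback are assumptions for illustration; in practice the image-model call is asynchronous and the rewrites are produced semantically, with each retry swapping in a softer phrasing of the same scene.

```typescript
// Sketch of a 3-attempt safety fallback: the original prompt plus two
// pre-computed, semantically softened rewrites. `generate` stands in for
// the image-model call and is assumed to throw on a safety rejection.
function generateWithFallback(
  prompts: [string, string, string],
  generate: (prompt: string) => string,
): string {
  let lastError: unknown;
  for (const prompt of prompts) {
    try {
      return generate(prompt); // success: stop escalating
    } catch (err) {
      lastError = err;         // safety rejection: try the next rewrite
    }
  }
  throw lastError;             // all three attempts rejected
}

// Illustrative stub: rejects any prompt mentioning a flagged word.
const stubGenerate = (p: string): string => {
  if (p.includes("sword fight")) throw new Error("safety rejection");
  return `image-for:${p}`;
};

const result = generateWithFallback(
  [
    "knights in a sword fight",            // rejected
    "knights in a sword fight, distant",   // rejected
    "two knights facing each other",       // accepted
  ],
  stubGenerate,
);
```

Ordering the rewrites from most to least literal keeps the illustration as faithful to the text as the safety filter allows.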