Shorts Factory
2025 · active
7-step AI pipeline that turns a topic into a polished educational video using Gemini (Deep Research, Interactions API) and Veo 3.1.
// GitHub
// Problem
Making one Wendover-style educational video takes a full day: research, narrative, script, voiceover, storyboard, visuals, assembly. I wanted high-production content without the 'AI slop' flooding YouTube.
// Solution
A structured pipeline mirroring professional production: research, narrative, script, voiceover, storyboard, images, clips. Each step has approval gates. AI assists; humans refine. Five videos now take the time one used to.
// What I Built
A Next.js 16 app with seven pipeline steps. Research uses Gemini's Deep Research with the Interactions API to produce cited briefs. A custom script classifier trained on viral patterns drives the beat mapping and narrative structure. The Script step generates voiceover text and clones style from swipe files. Voiceover provides a teleprompter with real-time audio levels. Storyboard breaks the script into timed frames. Images are generated with Gemini 2.5 Flash Image (Nano Banana). Clips animates those frames with Veo 3.1. Copilot agents guide each transition, and approved steps become immutable.
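The approval gates and immutable steps described above could be modeled roughly as follows. This is a minimal sketch, not the project's code: the step order comes from the description, but `createPipeline`, `saveDraft`, `approve`, and the state shape are hypothetical names chosen for illustration.

```typescript
// Step order taken from the pipeline description; all other names are hypothetical.
const STEPS = [
  "research", "narrative", "script", "voiceover", "storyboard", "images", "clips",
] as const;
type Step = (typeof STEPS)[number];

type StepState =
  | { status: "draft"; output: unknown }
  | { status: "approved"; output: unknown };

type Pipeline = Readonly<Record<Step, StepState>>;

function createPipeline(): Pipeline {
  const p = {} as Record<Step, StepState>;
  for (const s of STEPS) p[s] = { status: "draft", output: null };
  return p;
}

// Save work-in-progress output. Fails if the step is locked or the previous
// step has not passed its approval gate yet.
function saveDraft(p: Pipeline, step: Step, output: unknown): Pipeline {
  if (p[step].status === "approved") {
    throw new Error(`${step} is approved and immutable`);
  }
  const prev = STEPS[STEPS.indexOf(step) - 1];
  if (prev !== undefined && p[prev].status !== "approved") {
    throw new Error(`approve ${prev} before editing ${step}`);
  }
  return { ...p, [step]: { status: "draft", output } };
}

// Approving a step freezes its record; later saveDraft calls on it throw.
function approve(p: Pipeline, step: Step): Pipeline {
  const s = p[step];
  if (s.status === "approved") return p;
  if (s.output === null) throw new Error(`${step} has no output to approve`);
  return { ...p, [step]: Object.freeze({ status: "approved", output: s.output }) };
}
```

In this sketch every transition returns a new pipeline object, so an approved step can only change if the whole record is replaced on purpose, which is one way to keep downstream steps from drifting out of sync with what was approved.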
// Screenshots
Pipeline overview
Research step
Script editor
Video generation
// Technologies
Google Gemini (Deep Research + Interactions API)
Stateful orchestration via Interactions API with Deep Research for topic ideation and cited briefs.
Google Veo 3.1 + Nano Banana
Image-to-video generation with motion prompts, tuned for quality/cost tradeoffs across 5-8 second clips.
Next.js 16 + React 19 + TanStack Query
App Router with RSC, REST mutations, and client-side caching via query key factories.
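A query key factory in this style might look like the sketch below. The key layout (`projects`, `steps`) and the `projectKeys` name are assumptions made for illustration, not the project's actual keys.

```typescript
// Hypothetical key layout: each key extends its parent, so keys form a hierarchy.
const projectKeys = {
  all: ["projects"] as const,
  detail: (projectId: string) => [...projectKeys.all, projectId] as const,
  step: (projectId: string, step: string) =>
    [...projectKeys.detail(projectId), "steps", step] as const,
};

// Example: the key for one project's script step.
const scriptKey = projectKeys.step("p1", "script");
// scriptKey is ["projects", "p1", "steps", "script"]
```

Because TanStack Query matches keys by prefix when invalidating, a hierarchy like this lets a mutation invalidate `projectKeys.detail("p1")` and refresh every step query under that project without listing them one by one.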
PostgreSQL + Drizzle ORM
Type-safe database access with atomic JSONB updates, so concurrent generation jobs do not overwrite each other's results.
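One way an atomic JSONB update can work is to patch a single path inside the column with PostgreSQL's `jsonb_set` in one `UPDATE`, instead of reading the document, editing it in application code, and writing it back. The sketch below only builds the parameterized statement. The `storyboards` table, its `frames` column, and the `setFrameField` helper are hypothetical, and the `$1` placeholders assume a node-postgres-style driver rather than the project's actual Drizzle queries.

```typescript
// Hypothetical schema: storyboards(id, frames jsonb), where frames is a JSON array of frame objects.
interface ParamQuery {
  text: string;
  values: unknown[];
}

// Build an UPDATE that sets frames[frameIndex][field] in place.
// Other keys in the JSONB document are left untouched by jsonb_set.
function setFrameField(
  storyboardId: string,
  frameIndex: number,
  field: string,
  value: unknown,
): ParamQuery {
  if (!Number.isInteger(frameIndex) || frameIndex < 0) {
    throw new Error("frameIndex must be a non-negative integer");
  }
  return {
    text:
      "UPDATE storyboards " +
      "SET frames = jsonb_set(frames, $1::text[], $2::jsonb, true) " +
      "WHERE id = $3",
    // jsonb_set takes the path as text[]; array indexes are passed as strings.
    values: [[String(frameIndex), field], JSON.stringify(value), storyboardId],
  };
}
```

Since each statement locks the row and applies its patch to the latest committed version, two jobs finishing images for different frames at the same time should both land, whereas a read-modify-write in application code could drop one of them.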
Google Cloud Storage
Stores audio, images, and video clips with public URL generation.
// Lessons Learned
- 01 Made three videos manually before writing code. The workflow became the spec.
- 02 Style transfer via example swipe files beats generic prompts every time.
- 03 Locking approved steps prevents cascading inconsistencies through the pipeline.
- 04 Veo 3.1 understands motion prompts like 'camera pulls back to reveal'; the result feels directed, not procedural.
- 05 AI handles tedious work; humans make creative decisions. The 7-step structure enforces this separation.