Shorts Factory

2025 · active

7-step AI pipeline that turns a topic into a polished educational video using Gemini (Deep Research, Interactions API) and Veo 3.1.

// GitHub

1364 commits
Last commit 2 days ago
TypeScript

// Problem

Making one Wendover-style educational video takes a full day: research, narrative, script, voiceover, storyboard, visuals, assembly. I wanted high-production content without the 'AI slop' flooding YouTube.

// Solution

A structured pipeline mirroring professional production: research, narrative, script, voiceover, storyboard, images, clips. Each step has approval gates. AI assists; humans refine. Five videos now take the time one used to.
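
A minimal sketch of the approval-gate idea, assuming an in-memory state object. The type and function names (`Pipeline`, `canEdit`, `saveOutput`, `approve`) are illustrative and not taken from the repository; only the seven step names come from the description above.

```typescript
// The seven steps, in the order listed above.
const STEPS = [
  "research", "narrative", "script", "voiceover",
  "storyboard", "images", "clips",
] as const;

type Step = (typeof STEPS)[number];
type Status = "draft" | "approved";

// Hypothetical in-memory shape; a real app would persist this.
type Pipeline = Record<Step, { status: Status; output: string | null }>;

function createPipeline(): Pipeline {
  const pipeline = {} as Pipeline;
  for (const step of STEPS) pipeline[step] = { status: "draft", output: null };
  return pipeline;
}

// A step is editable only while it is a draft and every earlier step is approved.
function canEdit(p: Pipeline, step: Step): boolean {
  const idx = STEPS.indexOf(step);
  const upstreamApproved = STEPS.slice(0, idx).every((s) => p[s].status === "approved");
  return upstreamApproved && p[step].status === "draft";
}

function saveOutput(p: Pipeline, step: Step, output: string): Pipeline {
  if (!canEdit(p, step)) throw new Error(`${step} is locked or blocked by an unapproved step`);
  const next = { ...p };
  next[step] = { status: "draft", output };
  return next;
}

// Approval is the gate: it needs an output, and it freezes the step afterwards.
function approve(p: Pipeline, step: Step): Pipeline {
  if (!canEdit(p, step) || p[step].output === null) {
    throw new Error(`${step} cannot be approved yet`);
  }
  const next = { ...p };
  next[step] = { ...p[step], status: "approved" };
  return next;
}
```

Because `canEdit` returns false for an approved step, any later `saveOutput` on it throws, which is one simple way to make approved work immutable.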

// What I Built

Next.js 16 app with 7 pipeline steps. Research uses Gemini's Deep Research + Interactions API for cited briefs. A custom script classifier trained on viral patterns drives the beat mapping and narrative structure. The script step generates voiceover text, cloning style from swipe files. Voiceover has a teleprompter with real-time audio levels. Storyboard breaks the script into timed frames. Images use Gemini 2.5 Flash Image (Nano Banana). Clips animate those frames with Veo 3.1. Copilot agents guide each transition; approved steps become immutable.
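
To make "timed frames" concrete, here is a rough sketch that splits a script into sentence-level frames and estimates each duration from word count. The 150 words-per-minute pace, the `Frame` shape, and the sentence-per-frame rule are all assumptions for illustration; a real pipeline would more likely take timings from the recorded voiceover.

```typescript
interface Frame {
  index: number;
  text: string;
  startSec: number;
  durationSec: number;
}

// Assumed narration pace, not a value from the project.
const WORDS_PER_MINUTE = 150;

function toFrames(script: string, wpm: number = WORDS_PER_MINUTE): Frame[] {
  // Naive sentence split: a run of text followed by its terminators.
  const sentences = (script.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  let cursor = 0;
  return sentences.map((text, index) => {
    const words = text.split(/\s+/).length;
    const durationSec = (words * 60) / wpm;
    const frame: Frame = { index, text, startSec: cursor, durationSec };
    cursor += durationSec; // the next frame starts where this one ends
    return frame;
  });
}
```

The split is deliberately simple and will break on decimals or abbreviations; it only shows how start times accumulate from per-frame durations.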

// Screenshots

Pipeline overview

Research step

Script editor

Video generation

// Technologies

Google Gemini (Deep Research + Interactions API)

Stateful orchestration via Interactions API with Deep Research for topic ideation and cited briefs.

Google Veo 3.1 + Nano Banana

Nano Banana (Gemini 2.5 Flash Image) generates the frame images; Veo 3.1 animates them image-to-video from motion prompts, tuned for quality/cost tradeoffs across 5-8 second clips.
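
A small sketch of how a clip request might be assembled before it is sent to a video model. It makes no API calls. `ClipRequest`, `buildClipRequest`, and `estimateCost` are hypothetical names, the price per second is supplied by the caller, and the clamp only mirrors the 5-8 second range mentioned above (the model's actually supported durations may differ).

```typescript
interface ClipRequest {
  imageUrl: string;    // the still frame to animate
  prompt: string;      // scene description plus a camera/motion instruction
  durationSec: number; // clamped to the 5-8 second range
}

const MIN_CLIP_SEC = 5;
const MAX_CLIP_SEC = 8;

function buildClipRequest(
  imageUrl: string,
  scene: string,
  motion: string,
  frameDurationSec: number,
): ClipRequest {
  // Round up so the clip covers the narration, then clamp to the allowed range.
  const durationSec = Math.min(MAX_CLIP_SEC, Math.max(MIN_CLIP_SEC, Math.ceil(frameDurationSec)));
  return { imageUrl, prompt: `${scene}. Motion: ${motion}.`, durationSec };
}

// Cost scales with total seconds generated; the rate is an input, not a real price.
function estimateCost(clips: ClipRequest[], pricePerSecond: number): number {
  const totalSec = clips.reduce((sum, c) => sum + c.durationSec, 0);
  return totalSec * pricePerSecond;
}
```

Estimating cost before generation is one way to reason about the quality/cost tradeoff: shorter clips are cheaper, but each must still cover its frame's narration.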

Next.js 16 + React 19 + TanStack Query

App Router with RSC, REST mutations, and client-side caching via query key factories.
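
A query key factory is just an object that builds cache keys hierarchically, so related queries share a prefix. The sketch below is a generic version of the pattern; the resource names (`projects`, `detail`, `step`) are assumptions, not the app's real keys, and `isPrefix` only illustrates prefix matching rather than reproducing the library's matching logic.

```typescript
// Hypothetical resource names; the factory pattern itself is the point.
const projectKeys = {
  all: ["projects"] as const,
  lists: () => [...projectKeys.all, "list"] as const,
  detail: (projectId: string) => [...projectKeys.all, "detail", projectId] as const,
  step: (projectId: string, step: string) =>
    [...projectKeys.detail(projectId), "step", step] as const,
};

// Illustration only: a key "belongs to" a prefix if it starts with the same parts.
function isPrefix(prefix: readonly unknown[], key: readonly unknown[]): boolean {
  return prefix.length <= key.length && prefix.every((part, i) => part === key[i]);
}

const scriptKey = projectKeys.step("p1", "script");
console.log(scriptKey);
console.log(isPrefix(projectKeys.detail("p1"), scriptKey));
```

With TanStack Query, a key like `projectKeys.step(id, "script")` would be passed as `queryKey` to `useQuery`, and a mutation could call `queryClient.invalidateQueries({ queryKey: projectKeys.detail(id) })` to refresh every step query under that project in one call.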

PostgreSQL + Drizzle ORM

Type-safe queries, with atomic JSONB updates so concurrent generation jobs do not overwrite each other's results.
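
One common way to get an atomic JSONB update is a single `UPDATE` that uses PostgreSQL's `jsonb_set`, instead of reading the document, changing it in application code, and writing it back. The sketch below only builds the parameterized statement. The table and column names (`storyboards`, `frames`) are hypothetical, and the values are shaped for a driver such as node-postgres, which serializes a JavaScript array as a Postgres array.

```typescript
interface SqlStatement {
  text: string;
  values: unknown[];
}

// Hypothetical schema: storyboards(id, frames jsonb), where frames is a JSON array of objects.
function setFrameField(
  storyboardId: string,
  frameIndex: number,
  field: string,
  value: unknown,
): SqlStatement {
  return {
    // jsonb_set(target, path text[], new_value jsonb, create_if_missing)
    text:
      "UPDATE storyboards " +
      "SET frames = jsonb_set(frames, $1::text[], $2::jsonb, true) " +
      "WHERE id = $3",
    // Path ["3", "imageUrl"] addresses frames[3].imageUrl.
    values: [[String(frameIndex), field], JSON.stringify(value), storyboardId],
  };
}

const stmt = setFrameField("sb_1", 3, "imageUrl", "https://example.com/frame-3.png");
console.log(stmt.text);
```

Because the change is one statement, two workers writing different frames of the same row are serialized by the row lock and each applies its change to the latest version of the document. In Drizzle, the same expression would typically go through its `sql` template tag inside an `update().set()` call.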

Google Cloud Storage

Stores audio, images, and video clips with public URL generation.
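
For objects that are publicly readable, Cloud Storage serves them at `https://storage.googleapis.com/<bucket>/<object>`. A sketch of building such URLs follows; the bucket name and the `<projectId>/<kind>/<fileName>` object layout are assumptions for illustration.

```typescript
type AssetKind = "audio" | "images" | "clips";

// Hypothetical object layout: <projectId>/<kind>/<fileName>.
function objectPath(projectId: string, kind: AssetKind, fileName: string): string {
  return [projectId, kind, fileName].join("/");
}

// Encode each path segment but keep the slashes that separate them.
function publicUrl(bucket: string, path: string): string {
  const encoded = path.split("/").map((segment) => encodeURIComponent(segment)).join("/");
  return `https://storage.googleapis.com/${bucket}/${encoded}`;
}

const url = publicUrl("example-bucket", objectPath("p1", "clips", "frame 01.mp4"));
console.log(url);
```

The URL only resolves if the object or bucket actually grants public read access; otherwise a signed URL would be needed instead.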

// Lessons Learned

  • 01 Made three videos manually before writing code. The workflow became the spec.
  • 02 Style transfer via example swipe files beats generic prompts every time.
  • 03 Locking approved steps prevents cascading inconsistencies through the pipeline.
  • 04 Veo 3.1 understands motion prompts like 'camera pulls back to reveal', so clips feel directed, not procedural.
  • 05 AI handles tedious work; humans make creative decisions. The 7-step structure enforces this separation.