
Shorts Factory

2025 · Active

An autonomous media pipeline that transforms a topic into a polished educational short—research to final render—using Gemini, VEO 3.1, and a 7-step production workflow.

// GitHub

View Repository
425 commits
TypeScript

// Problem

I wanted to make YouTube videos like Wendover Productions and PolyMatter—content that takes complex topics and turns them into compelling, educational entertainment. The kind of videos where you learn how the universe works while being genuinely entertained. But making even one of these videos takes a full day: deep research, writing a narrative arc, scripting, recording voiceover, creating a storyboard, generating visuals, and assembling everything. And I didn't want to produce the kind of low-quality, faceless 'AI slop' flooding YouTube. I wanted high-production-value content that leveraged the actual breakthroughs in image and video generation—not just threw prompts at a model and called it a day.

// Solution

A structured production pipeline that codifies the entire video creation workflow into seven discrete steps, each powered by specialized AI models. Instead of treating video generation as a black box, I broke down the process into the same stages a professional production team would use: research, narrative design, scriptwriting, voiceover recording, storyboarding, image generation, and video clip assembly. Each step can be AI-assisted or human-refined, with approval gates that prevent the pipeline from advancing until the content meets quality standards. The system can now produce five videos in the time it used to take me to make one.

// What I Built

A full-stack Next.js 16 application with a 7-step pipeline:

  • 01 Research uses Google's Interactions API with web search to produce comprehensive topic briefs with citations.
  • 02 Narrative takes that research and structures it into a story arc—identifying the hero object, pop culture bridges, and scene transitions (the beat mapping is trained on classifications of viral video scripts).
  • 03 Script generates word-for-word voiceover text, with a 'swipe file' system that can clone the style of specific creators (archetypes like 'systems_engineer' or 'map_nerd').
  • 04 Voiceover provides a teleprompter interface for recording, with real-time audio levels and Gemini transcription.
  • 05 Storyboard breaks the script into visual sequences—frames with timing, visual notes, and shot descriptions (inspired by ripping apart Stranger Things storyboards).
  • 06 Images generates each frame using Gemini 2.5 Flash Image with style presets.
  • 07 Clips uses VEO 3.1 to animate each image into 5-8 second video segments.

A copilot system with specialized agents (director, researcher, narrative_writer, script_writer, storyboard_generator, video_generator) guides you through clarifying questions at each transition, and approved steps become immutable to prevent corruption.
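A minimal sketch of how that step sequencing and the approval gates could be modeled; the names (PIPELINE_STEPS, StepState, canAdvanceTo) are illustrative, not the repository's actual types:

```typescript
// Illustrative sketch of the 7-step pipeline with approval gates.
// Names here are hypothetical, not the repo's actual schema.

const PIPELINE_STEPS = [
  "research",
  "narrative",
  "script",
  "voiceover",
  "storyboard",
  "images",
  "clips",
] as const;

type PipelineStep = (typeof PIPELINE_STEPS)[number];

interface StepState {
  step: PipelineStep;
  // Once set, the step is treated as immutable by the UI and API.
  approvedAt: Date | null;
}

// A step may only be worked on when every earlier step has been approved.
function canAdvanceTo(target: PipelineStep, states: StepState[]): boolean {
  const targetIndex = PIPELINE_STEPS.indexOf(target);
  return PIPELINE_STEPS.slice(0, targetIndex).every(
    (step) => states.find((s) => s.step === step)?.approvedAt != null,
  );
}
```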

// Screenshots

Pipeline overview

Research step

Script editor

Video generation

// Technologies

Google Gemini 2.5/3 + Interactions API

Text generation for research, narrative, and script steps—using Interactions API with google_search tool for web research, plus thinking mode (configurable token budgets) for complex reasoning tasks
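A hedged sketch of what a grounded research call can look like with the @google/genai SDK: Google Search as a tool plus a thinking budget. The model id, prompt, and budget are placeholders, and the project's actual Interactions API wiring may differ:

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Produce a cited research brief for a topic, grounded in web search,
// with a configurable reasoning-token budget.
async function researchTopic(topic: string) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash", // placeholder model id
    contents: `Produce a cited research brief on: ${topic}`,
    config: {
      tools: [{ googleSearch: {} }],            // web search grounding
      thinkingConfig: { thinkingBudget: 4096 }, // thinking-mode token budget
    },
  });
  return response.text;
}
```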

Google VEO 3.1

Video generation from images with audio prompts—5-8 second clips in 9:16, 16:9, or 1:1 aspect ratios. Includes retry logic for rate limiting and polling for long-running jobs
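A sketch of the image-to-video call and polling loop using the @google/genai SDK; the model id is assumed and the rate-limit retry wrapper is omitted for brevity:

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Animate one storyboard frame into a short clip, polling the
// long-running operation until it completes.
async function animateFrame(imageBytes: string, motionPrompt: string) {
  let operation = await ai.models.generateVideos({
    model: "veo-3.1-generate-preview", // assumed model id
    prompt: motionPrompt,              // e.g. "camera slowly pulls back to reveal"
    image: { imageBytes, mimeType: "image/png" },
    config: { aspectRatio: "9:16" },
  });

  // VEO jobs run for a while; poll until the operation is done.
  while (!operation.done) {
    await new Promise((resolve) => setTimeout(resolve, 10_000));
    operation = await ai.operations.getVideosOperation({ operation });
  }
  return operation.response?.generatedVideos?.[0];
}
```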

Next.js 16 + React 19 + TanStack Query

App Router with RSC for secure data fetching, REST-only mutations (no server actions), and TanStack Query for client-side caching with query key factories
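A small sketch of the query-key-factory pattern against a hypothetical REST route (videoKeys and /api/videos/... are illustrative names):

```typescript
import { useQuery } from "@tanstack/react-query";

// Query key factory: every cache key for a video's data derives from one place.
export const videoKeys = {
  all: ["videos"] as const,
  detail: (videoId: string) => [...videoKeys.all, videoId] as const,
  steps: (videoId: string) => [...videoKeys.detail(videoId), "steps"] as const,
};

// Client-side hook: reads go through REST endpoints, never server actions.
export function usePipelineSteps(videoId: string) {
  return useQuery({
    queryKey: videoKeys.steps(videoId),
    queryFn: async () => {
      const res = await fetch(`/api/videos/${videoId}/steps`);
      if (!res.ok) throw new Error(`Failed to load steps: ${res.status}`);
      return res.json();
    },
  });
}
```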

PostgreSQL + Drizzle ORM

Type-safe database with atomic JSONB operations to prevent race conditions during concurrent image/clip generation. Separate tables for videos, runs, copilot threads, and swipe files
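A sketch of the atomic-JSONB idea with Drizzle: issue a single jsonb_set UPDATE instead of reading, mutating, and writing the whole document back. The table and column names are hypothetical:

```typescript
import { eq, sql } from "drizzle-orm";
import { drizzle } from "drizzle-orm/node-postgres";
import { jsonb, pgTable, uuid } from "drizzle-orm/pg-core";
import { Pool } from "pg";

// Hypothetical table shape: a pipeline run keeps its generated frames in a JSONB column.
const runs = pgTable("runs", {
  id: uuid("id").primaryKey().defaultRandom(),
  frames: jsonb("frames").notNull(),
});

const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const db = drizzle(pool);

// Record one frame's image URL with a single jsonb_set UPDATE, so two
// concurrent generations can't clobber each other's writes.
async function saveFrameImage(runId: string, frameId: string, imageUrl: string) {
  await db
    .update(runs)
    .set({
      frames: sql`jsonb_set(${runs.frames}, ARRAY[${frameId}]::text[], to_jsonb(${imageUrl}::text), true)`,
    })
    .where(eq(runs.id, runId));
}
```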

Google Cloud Storage

Persistent storage for audio recordings, generated images, and video clips—with public URL generation and GCS URI proxy for playback
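A sketch of the upload path with @google-cloud/storage; the bucket and object names are placeholders:

```typescript
import { Storage } from "@google-cloud/storage";

const storage = new Storage();
const bucket = storage.bucket("shorts-factory-assets"); // placeholder bucket name

// Persist a generated frame image and hand back a playable URL.
async function uploadFrameImage(runId: string, frameId: string, png: Buffer) {
  const file = bucket.file(`runs/${runId}/frames/${frameId}.png`);
  await file.save(png, { contentType: "image/png" });
  // publicUrl() works when the bucket allows public reads; otherwise the app
  // proxies the gs:// URI through its own route for playback.
  return file.publicUrl();
}
```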

// Lessons Learned

  • 01 Doing it manually first was non-negotiable. I made three videos by hand before writing a single line of code—not to validate the market, but to understand the actual workflow. The research-to-narrative transition, the storyboard timing calculations, the way a good script references earlier beats—none of that would have been obvious from theorizing. The manual process became the product spec.
  • 02 The swipe file system is more powerful than I expected. Training the script generator on classified examples of viral videos (what makes a 'systems_engineer' voice different from a 'map_nerd') produces dramatically better output than generic prompts. Style transfer through examples beats style description every time.
  • 03 State hardening prevents catastrophic errors. Once a step is approved (marked with an ApprovedAt timestamp), the UI becomes read-only (see the sketch after this list). This seems restrictive, but it prevents the nightmare scenario where you tweak your research after generating images, creating inconsistencies that cascade through the pipeline.
  • 04 VEO 3.1's audio prompts are game-changing. Previous image-to-video models would just zoom and pan. VEO actually understands motion prompts like 'camera slowly pulls back to reveal' and 'subject turns toward camera.' The results feel directed, not procedurally generated.
  • 05 This project crystallized how I think about AI-assisted creative work: the AI handles the tedious parts (research synthesis, visual generation, timing calculations) while humans make the creative decisions (what angle to take, which style to use, when to break the rules). The 7-step pipeline enforces that separation at the architecture level.
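A sketch of the server-side half of that read-only rule, assuming a Next.js route handler and an approvedAt column; function and field names are illustrative:

```typescript
import { NextResponse } from "next/server";

interface StepRecord {
  step: string;
  approvedAt: Date | null;
}

// Reject any mutation against a step that already carries an approvedAt
// timestamp; the route handler returns this response instead of writing.
export function assertEditable(record: StepRecord): NextResponse | null {
  if (record.approvedAt !== null) {
    return NextResponse.json(
      { error: `"${record.step}" was approved and is now immutable` },
      { status: 409 },
    );
  }
  return null; // safe to apply the mutation
}
```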