A concrete production plan for creating a 75-second launch video for Agent Orchestrator using the video-use Claude Code skill.
Target deliverable: final.mp4, 1920×1080 @ 24fps, ~75 seconds, social-ready audio (-14 LUFS / -1 dBTP / LRA 11).
Pipeline: record raw takes → drop in folder → invoke Claude → skill transcribes, packs, cuts, grades, animates, subtitles, and self-evaluates.
Write this on paper first. Memorize it roughly. Don't read verbatim — speak naturally. The skill will cut the umms and false starts at word-boundary precision, so don't worry about perfection.
"You've been running one coding agent at a time. Claude Code, Codex, Aider — one window, one task, one wait. You're leaving ten-x on the table."
"Every time CI fails, you context-switch back. Every time review comments land, you context-switch back. Your agent is blocked, and you're the bottleneck. That's not how this scales."
"Agent Orchestrator spawns any coding agent — Claude Code, Codex, Aider, OpenCode — in its own git worktree, with its own PR, running in parallel. Eight agents at once. One dashboard. They fix CI failures themselves. They address review comments themselves. They manage their own PRs."
"Every abstraction is a plugin slot. Runtime, agent, workspace, tracker, SCM, notifier — swap any of them. tmux or process. GitHub or Linear or GitLab. Desktop notifications or Slack. It's your stack."
"One config file.
ao spawn. Watch eight agents ship eight PRs while you drink coffee. CI is green by the time you're back."
"Agent Orchestrator. MIT licensed. github.com/ComposioHQ/agent-orchestrator. Ship more in parallel."
Total: ~75s. Record 3–5 takes per beat. Short beats = more takes (they're cheap). Long beats = 3 takes is fine.
Shoot 3–5 takes of each beat at 24 fps (matching render.py's -r 24 output). Slate each take ("Beat 1, take 1" — clap hands on camera for a sync point).
| Clip name | Beat | # takes | What to capture |
|---|---|---|---|
| TH_hook_01..05 | Hook | 5 | Beat 1 script, direct to camera, confident tone |
| TH_problem_01..05 | Problem | 5 | Beat 2 script, slightly frustrated tone |
| TH_solution_01..05 | Solution | 5 | Beat 3 script, energetic, confident |
| TH_benefit_01..03 | Benefit | 3 | Beat 4 script, matter-of-fact |
| TH_example_01..03 | Example | 3 | Beat 5 script, playful ("drink coffee") |
| TH_cta_01..05 | CTA | 5 | Beat 6 script, clean delivery, hit the URL |
Before you stop recording, record 30 seconds of room tone (total silence, camera rolling). The skill may not need it, but if it does, you'll have it.
File names become the source IDs in takes_packed.md. Use short unique names:
TH_hook_01.mp4
TH_hook_02.mp4
...
TH_cta_05.mp4
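Since file names become source IDs, it is worth sanity-checking them before invoking the skill. A small sketch (the regex is my guess at the convention shown above, not something the skill enforces — adjust to taste):

```python
import re

# Hypothetical convention: PREFIX_lowercase_name[_NN].ext, flat directory
CLIP_RE = re.compile(r"^(TH|SC|BR)_[a-z0-9_]+\.(mp4|mov)$")

def bad_clip_names(names):
    """Return the names that don't match the naming convention."""
    return [n for n in names if not CLIP_RE.match(n)]
```

Run it over os.listdir() of your footage folder before starting; a name like "Take 1 final.mp4" would be flagged, while TH_hook_01.mp4 passes.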
Record these separately in a clean OS session: no notifications, no chat apps, no cluttered browser tabs.
Set up a test repo with 2–3 real issues on GitHub you can spawn agents against. This gives you genuine dashboard activity, not mocks.
# In your AO workspace, open the dashboard:
pnpm dev
# Dashboard at http://localhost:3000
| Clip name | Duration | Content |
|---|---|---|
| SC_dashboard_wide_01 | 30s | Dashboard with 4–8 active sessions, cards across the kanban columns. Slowly scroll/pan. |
| SC_dashboard_wide_02 | 30s | Second take, different state distribution (more in pr_open, ci_failed) |
| SC_session_card_zoom | 15s | Close crop of one session card updating its status badge |
| SC_spawn_cli_01 | 20s | Terminal. ao spawn --agent claude-code --issue 42. Let it run, show logs. |
| SC_spawn_cli_02 | 20s | Same but with a different agent (--agent codex) |
| SC_terminal_attach | 25s | Click a session card → terminal view (DirectTerminal.tsx) → agent typing code |
| SC_state_transition_01 | 20s | Session card in working → pr_open transition on screen |
| SC_state_transition_02 | 20s | Session card in ci_failed → mergeable → merged |
| SC_pr_on_github | 15s | Actual GitHub PR page opened by an agent. Scroll through description + commits. |
| SC_ci_green | 10s | GitHub Actions checks turning green on the PR |
| SC_config_yaml | 15s | VS Code open on agent-orchestrator.yaml. Zoomed. Scroll through plugin slots. |
| SC_plugin_tree | 10s | VS Code sidebar showing packages/plugins/ directory, 20+ plugins visible |
Tip: Record 30% more than you think you need. Short B-roll clips (2–4s) are what you'll actually use.
Only if you want a cinematic feel:
| Clip name | Duration | Content |
|---|---|---|
| BR_laptop_typing_01 | 15s | Overhead or side shot of hands typing |
| BR_coffee_cup_01 | 10s | Coffee cup near keyboard (matches "drink coffee" line) |
| BR_desk_wide_01 | 20s | Wide shot of workspace, laptop screen visible but illegible |
Shoot these with shallow depth of field if you can.
These five animations will be built by parallel sub-agents during the edit (Hard Rule 10 — never sequential). You just tell Claude to build them.
| Slot | What it shows | Where it plays | Duration | Tool |
|---|---|---|---|---|
| 1 Parallel counter | "1 agent" → "8 agents" with ease-out-cubic, orange accent | Over HOOK beat end | 3s | PIL |
| 2 Plugin slot grid | 8 plugin cards reveal one-by-one (Runtime/Agent/Workspace/Tracker/SCM/Notifier/Terminal/Lifecycle) | Over BENEFIT beat | 7s | PIL |
| 3 State machine | Manim rendering of spawning → working → pr_open → ci_failed/review → merged | Over SOLUTION beat mid | 10s | Manim |
| 4 Parallel vs serial timeline | Two bar timelines, one sequential, one 8-parallel. Same total work. | Over SOLUTION beat end | 5s | PIL |
| 5 CTA tail card | Logo, URL, MIT badge | Over CTA beat | 5s | PIL |
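As a concrete example of what one of those sub-agents might produce, here is a minimal sketch of the slot-1 counter (hypothetical code, not the skill's own; assumes Pillow is installed for the rendering step):

```python
def ease_out_cubic(t: float) -> float:
    """Ease-out-cubic: fast start, gentle landing. t in [0, 1]."""
    return 1 - (1 - t) ** 3

def counter_value(t_sec: float, dur: float = 3.0, start: int = 1, end: int = 8) -> int:
    """Agent count shown at time t_sec of a dur-second animation."""
    t = min(max(t_sec / dur, 0.0), 1.0)
    return start + round((end - start) * ease_out_cubic(t))

def render_frames(out_dir: str = "animations/slot_1", fps: int = 24, dur: float = 3.0):
    """Write one PNG per frame: '<N> agents' in orange on near-black."""
    from PIL import Image, ImageDraw, ImageFont  # lazy import: only needed to render
    font = ImageFont.truetype("/System/Library/Fonts/Menlo.ttc", 120, index=1)
    for i in range(int(fps * dur)):
        n = counter_value(i / fps, dur)
        img = Image.new("RGB", (1920, 1080), "#0A0A0A")
        draw = ImageDraw.Draw(img)
        label = f"{n} agent" + ("s" if n > 1 else "")
        draw.text((960, 540), label, font=font, fill="#FF5A00", anchor="mm")
        img.save(f"{out_dir}/frame_{i:04d}.png")
```

ffmpeg then assembles the PNG sequence into a 3-second clip at -r 24 for the overlay stage.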
Palette (reused from the video-use shipped launch video — matches AO's dev-tool aesthetic):
- #0A0A0A near-black (background)
- #FF5A00 orange (accent)
- #6E6E6E dim gray (labels)
- Menlo Bold (/System/Library/Fonts/Menlo.ttc, index 1)

Create this exact structure on disk. Put <videos_dir> anywhere — desktop, Dropbox, wherever.
~/ao-launch-video/
├── TH_hook_01.mp4
├── TH_hook_02.mp4
├── TH_hook_03.mp4
├── ... (all talking head takes)
├── SC_dashboard_wide_01.mov
├── SC_dashboard_wide_02.mov
├── ... (all screen captures)
├── BR_laptop_typing_01.mp4
└── (B-roll, optional)
Flat. No subfolders. The skill puts its output in ~/ao-launch-video/edit/.
Before invoking Claude, verify:
# ffmpeg installed
ffmpeg -version
# yt-dlp installed (optional, for online sources)
yt-dlp --version
# ElevenLabs API key set
cat ~/.agents/skills/video-use/.env
# Should contain: ELEVENLABS_API_KEY=...
# Python deps installed
cd ~/.agents/skills/video-use
pip install -e .
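Those checks can also be scripted. A hedged sketch (helper names are my own, not part of the skill):

```python
import os
import shutil

def missing_tools(tools):
    """Return the subset of command-line tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

def env_has_key(env_path, key="ELEVENLABS_API_KEY"):
    """Check a dotenv-style file for a non-empty KEY=value line."""
    if not os.path.exists(env_path):
        return False
    with open(env_path) as f:
        for line in f:
            name, _, value = line.strip().partition("=")
            if name == key and value:
                return True
    return False
```

For example, missing_tools(["ffmpeg", "yt-dlp"]) returning a non-empty list means stop and install before invoking Claude.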
Cost note: transcribing ~40 minutes of raw takes via ElevenLabs Scribe will cost a few dollars. Not free.
cd ~/ao-launch-video
claude
Then paste this prompt:
Use the video-use skill. Edit these raw takes into a launch video for
Agent Orchestrator (AO), an open-source platform for spawning parallel
AI coding agents.
Target: ~75 seconds, 1920×1080 @ 24fps, YouTube/landing-page hero.
Archetype: Tech launch. HOOK → PROBLEM → SOLUTION → BENEFIT → EXAMPLE → CTA.
Sources:
- TH_*.mp4 files are talking-head takes of the script (multiple takes
per beat — pick the best)
- SC_*.mov files are screen captures of the dashboard, CLI, and GitHub
- BR_*.mp4 files are optional B-roll
Grade: warm_cinematic preset.
Subtitles: 2-word UPPERCASE chunks, bold-overlay style. Burn them LAST.
Animations (build 5 in parallel sub-agents, palette: near-black bg,
#FF5A00 accent, dim gray labels, Menlo Bold):
1. PIL counter: "1 agent" → "8 agents", ease-out-cubic, 3s.
Plays at the end of HOOK.
2. PIL plugin grid: 8 cards revealing sequentially (Runtime, Agent,
Workspace, Tracker, SCM, Notifier, Terminal, Lifecycle). 7s.
Plays over BENEFIT.
3. Manim state machine: nodes for spawning, working, pr_open,
ci_failed, review_pending, mergeable, merged. Edges animating
through a happy-path traversal. 10s. Plays mid-SOLUTION.
4. PIL timeline comparison: two horizontal bars — "sequential: 8
tasks back-to-back" vs "parallel: 8 tasks stacked". Same total
work, 1/8 the wall time. 5s. End of SOLUTION.
5. PIL CTA card: "Agent Orchestrator",
"github.com/ComposioHQ/agent-orchestrator", "MIT", orange accent,
Menlo Bold. 5s. Plays over CTA.
Must-preserve: the URL at the end must be fully legible for 2s+.
Pacing: tight on hooks, breathing room between beats, 400–600ms
speaker handoffs.
Inventory the sources, pack the transcripts, propose a strategy in
plain English, and wait for my OK before cutting.
After the first preview, you'll want to iterate. Natural-language feedback works: the skill re-plans, re-renders, and re-evaluates. It never re-transcribes (transcripts are cached per source) and persists its decisions to project.md, so the next session picks up where you left off.
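The per-source cache can be pictured like this (a hypothetical sketch, not the skill's actual transcribe.py):

```python
import json
from pathlib import Path

def transcribe_cached(source: Path, cache_dir: Path, transcribe_fn):
    """Return the word-level transcript for source, calling transcribe_fn
    (e.g. an ElevenLabs Scribe request) only on a cache miss.
    One JSON file per source name, so re-edits never re-pay for transcription."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / (source.stem + ".json")
    if cached.exists():
        return json.loads(cached.read_text())
    result = transcribe_fn(source)
    cached.write_text(json.dumps(result))
    return result
```

The design choice matters for cost: transcription is the only paid step, so keying the cache on the source file makes every later iteration free.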
video-use Works Under the Hood

raw takes ─► transcribe.py (ElevenLabs Scribe, word-level) ─► transcripts/*.json
│
▼
pack_transcripts.py ─► takes_packed.md
│
▼
LLM reasons + asks + proposes strategy
│
(you confirm the plain-English plan)
│
▼
editor sub-agent ─► edl.json
│
▼
render.py pipeline
│
▼
per-segment extract (grade + 30ms fades baked in)
│
lossless concat ─► base.mp4
│
overlays (PTS-shifted) then subs LAST
│
▼
loudnorm 2-pass (-14 LUFS / -1 dBTP / LRA 11)
│
▼
final.mp4
│
▼
self-eval loop (timeline_view at every cut boundary,
max 3 retries, flag to user if still broken)
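The loudnorm stage in the diagram is the standard ffmpeg two-pass pattern. A hedged sketch of the command construction (the loudnorm filter options are real ffmpeg options; the helper itself is illustrative):

```python
def loudnorm_cmds(src, dst, i=-14, tp=-1, lra=11):
    """Build the two ffmpeg invocations for 2-pass loudnorm.
    Pass 1 measures; pass 2 applies, with the measured values from
    pass 1's JSON spliced into the {mi}/{mtp}/{mlra}/{mth} placeholders."""
    measure = f"loudnorm=I={i}:TP={tp}:LRA={lra}:print_format=json"
    pass1 = ["ffmpeg", "-i", src, "-af", measure, "-f", "null", "-"]
    apply_filter = (f"loudnorm=I={i}:TP={tp}:LRA={lra}"
                    ":measured_I={mi}:measured_TP={mtp}:measured_LRA={mlra}"
                    ":measured_thresh={mth}:linear=true")
    pass2 = ["ffmpeg", "-i", src, "-af", apply_filter, "-c:v", "copy", dst]
    return pass1, pass2
```

Two passes matter: single-pass loudnorm normalizes dynamically and can pump; measuring first lets pass 2 apply a linear gain that hits -14 LUFS without artifacts.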
The LLM never watches the video — it reads a packed text transcript (takes_packed.md, ~12KB per hour of footage) and only peeks at visuals (timeline_view PNGs) at decision points. Same idea as browser-use giving the LLM a structured DOM instead of a screenshot, applied to video.
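The packing idea is easy to sketch (hypothetical code; the real pack_transcripts.py is the skill's own): collapse word-level timestamps into phrases split at pauses, which is what makes an hour of footage fit in ~12KB of context.

```python
def pack_words(words, gap=0.5):
    """Group word-level timestamps into phrases, splitting at pauses
    longer than gap seconds. words is a list of
    {"word", "start", "end"} dicts, as a Scribe-style transcript gives."""
    phrases, current = [], []
    for w in words:
        if current and w["start"] - current[-1]["end"] > gap:
            phrases.append(current)
            current = []
        current.append(w)
    if current:
        phrases.append(current)
    return [
        {
            "start": p[0]["start"],
            "end": p[-1]["end"],
            "text": " ".join(w["word"] for w in p),
        }
        for p in phrases
    ]
```

Each phrase keeps just enough timing for the editor sub-agent to place word-boundary cuts without ever seeing a frame.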
Two implementation details worth knowing: the concat step uses -c copy (stream-copy concat), not a single-pass filtergraph, and overlays are PTS-shifted with setpts=PTS-STARTPTS+T/TB to align frame 0 to the window start.

Everything the skill writes lands in <videos_dir>/edit/, which here is ~/ao-launch-video/edit/:
├── project.md ← memory; appended every session
├── takes_packed.md ← phrase-level transcripts
├── edl.json ← cut decisions
├── transcripts/<name>.json ← cached raw Scribe JSON
├── animations/slot_<id>/ ← per-animation source + render
├── clips_graded/ ← per-segment extracts with grade + fades
├── master.srt ← output-timeline subtitles
├── verify/ ← self-eval frames / timeline PNGs
├── preview.mp4
└── final.mp4
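One last sanity check before publishing: confirm final.mp4 actually meets the target spec. A sketch built around ffprobe (the flags are standard ffprobe options; the helper names are mine):

```python
def probe_cmd(path):
    """ffprobe invocation printing width, height, and frame rate as bare CSV."""
    return ["ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "stream=width,height,avg_frame_rate",
            "-of", "csv=p=0", path]

def meets_spec(width, height, fps):
    """True if the probed values match the 1920×1080 @ 24fps deliverable."""
    return (width, height) == (1920, 1080) and abs(fps - 24) < 0.01
```

Run probe_cmd("final.mp4") through subprocess.run, parse the CSV line, and feed the numbers to meets_spec before you call the edit done.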