Agent Orchestrator — Launch Video Production Guide

A concrete production plan for creating a 75-second launch video for Agent Orchestrator using the video-use Claude Code skill.

Target deliverable: final.mp4, 1920×1080 @ 24fps, ~75 seconds, social-ready audio (-14 LUFS / -1 dBTP / LRA 11).

Pipeline: record raw takes → drop in folder → invoke Claude → skill transcribes, packs, cuts, grades, animates, subtitles, and self-evaluates.


Table of Contents

  1. The Script
  2. Recording Session A — Talking Head
  3. Recording Session B — Screen Captures
  4. Recording Session C — Optional Extras
  5. What the Skill Will Generate
  6. Folder Setup Before Editing
  7. Pre-flight Checks
  8. Invoke the Edit
  9. Iteration Loop
  10. Critical Don'ts
  11. How video-use Works Under the Hood

Part 1 — The Script (75s target)

Write this on paper first. Memorize roughly. Don't read verbatim — speak naturally. The skill will cut the umms and false starts at word-boundary precision, so don't worry about perfection.

Beat 1 — HOOK (0:00–0:08, ~8s)

"You've been running one coding agent at a time. Claude Code, Codex, Aider — one window, one task, one wait. You're leaving ten-x on the table."

Beat 2 — PROBLEM (0:08–0:22, ~14s)

"Every time CI fails, you context-switch back. Every time review comments land, you context-switch back. Your agent is blocked, and you're the bottleneck. That's not how this scales."

Beat 3 — SOLUTION (0:22–0:40, ~18s)

"Agent Orchestrator spawns any coding agent — Claude Code, Codex, Aider, OpenCode — in its own git worktree, with its own PR, running in parallel. Eight agents at once. One dashboard. They fix CI failures themselves. They address review comments themselves. They manage their own PRs."

Beat 4 — BENEFIT (0:40–0:52, ~12s)

"Every abstraction is a plugin slot. Runtime, agent, workspace, tracker, SCM, notifier — swap any of them. tmux or process. GitHub or Linear or GitLab. Desktop notifications or Slack. It's your stack."

Beat 5 — EXAMPLE (0:52–1:08, ~16s)

"One config file. ao spawn. Watch eight agents ship eight PRs while you drink coffee. CI is green by the time you're back."

Beat 6 — CTA (1:08–1:15, ~7s)

"Agent Orchestrator. MIT licensed. github.com/ComposioHQ/agent-orchestrator. Ship more in parallel."

Total: ~75s. Record 3–5 takes per beat. Short beats = more takes (they're cheap). Long beats = 3 takes is fine.


Part 2 — Recording Session A: Talking Head

Gear checklist

Room setup

Camera settings

Audio settings

Shot list — Talking Head

Shoot 3–5 takes of each beat. Slate each take ("Beat 1, take 1" — clap hands on camera for a sync point).

| Clip name | Beat | Takes | What to capture |
| --- | --- | --- | --- |
| TH_hook_01..05 | Hook | 5 | Beat 1 script, direct to camera, confident tone |
| TH_problem_01..05 | Problem | 5 | Beat 2 script, slightly frustrated tone |
| TH_solution_01..05 | Solution | 5 | Beat 3 script, energetic, confident |
| TH_benefit_01..03 | Benefit | 3 | Beat 4 script, matter-of-fact |
| TH_example_01..03 | Example | 3 | Beat 5 script, playful ("drink coffee") |
| TH_cta_01..05 | CTA | 5 | Beat 6 script, clean delivery, hit the URL |

Before you stop recording, record 30 seconds of room tone (total silence, camera rolling). The skill may not need it, but if it does, you'll have it.

Naming convention

File names become the source IDs in takes_packed.md. Use short unique names:

TH_hook_01.mp4
TH_hook_02.mp4
...
TH_cta_05.mp4
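Since these file names become source IDs, a ten-line sanity check before the shoot wraps can save a re-slate later. A hypothetical helper (not part of the skill) that flags anything straying from the TH_/SC_/BR_ convention:

```python
import re
from pathlib import Path

# Hypothetical pre-flight helper, not part of the video-use skill:
# flag any clip whose name breaks the TH_/SC_/BR_ prefix convention,
# so the source IDs that land in takes_packed.md stay short and unique.
PATTERN = re.compile(r"^(TH|SC|BR)_[A-Za-z0-9_]+\.(mp4|mov)$")

def bad_names(folder: str) -> list[str]:
    """Return the file names that break the convention, sorted."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and not PATTERN.match(p.name)
    )
```

Run it against `~/ao-launch-video/` right after copying the takes off the camera.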

Part 3 — Recording Session B: Screen Captures

Record these separately in a clean OS session. No notifications, no chat apps, no cluttered browser tabs.

Setup

Preparation

Set up a test repo with 2–3 real issues on GitHub you can spawn agents against. This gives you genuine dashboard activity, not mocks.

# In your AO workspace, open the dashboard:
pnpm dev
# Dashboard at http://localhost:3000

Shot list — Screen Captures

| Clip name | Duration | Content |
| --- | --- | --- |
| SC_dashboard_wide_01 | 30s | Dashboard with 4–8 active sessions, cards across the kanban columns. Slowly scroll/pan. |
| SC_dashboard_wide_02 | 30s | Second take, different state distribution (more in pr_open, ci_failed) |
| SC_session_card_zoom | 15s | Close crop of one session card updating its status badge |
| SC_spawn_cli_01 | 20s | Terminal. ao spawn --agent claude-code --issue 42. Let it run, show logs. |
| SC_spawn_cli_02 | 20s | Same, but with a different agent (--agent codex) |
| SC_terminal_attach | 25s | Click a session card → terminal view (DirectTerminal.tsx) → agent typing code |
| SC_state_transition_01 | 20s | Session card making the working → pr_open transition on screen |
| SC_state_transition_02 | 20s | Session card going ci_failed → mergeable → merged |
| SC_pr_on_github | 15s | Actual GitHub PR page opened by an agent. Scroll through description + commits. |
| SC_ci_green | 10s | GitHub Actions checks turning green on the PR |
| SC_config_yaml | 15s | VS Code open on agent-orchestrator.yaml. Zoomed. Scroll through plugin slots. |
| SC_plugin_tree | 10s | VS Code sidebar showing the packages/plugins/ directory, 20+ plugins visible |

Tip: Record 30% more than you think you need. Short B-roll clips (2–4s) are what you'll actually use.


Part 4 — Recording Session C: Optional Extras

Only if you want a cinematic feel:

| Clip name | Duration | Content |
| --- | --- | --- |
| BR_laptop_typing_01 | 15s | Overhead or side shot of hands typing |
| BR_coffee_cup_01 | 10s | Coffee cup near keyboard (matches the "drink coffee" line) |
| BR_desk_wide_01 | 20s | Wide shot of workspace, laptop screen visible but illegible |

Shoot these with shallow depth of field if you can.


Part 5 — What the Skill Will Generate (you don't record these)

These five animations will be built by parallel sub-agents during the edit (Hard Rule 10 — never sequential). You just tell Claude to build them.

| Slot | What it shows | Where it plays | Duration | Tool |
| --- | --- | --- | --- | --- |
| 1 | Parallel counter: "1 agent" → "8 agents" with ease-out-cubic, orange accent | End of HOOK beat | 3s | PIL |
| 2 | Plugin slot grid: 8 plugin cards reveal one by one (Runtime/Agent/Workspace/Tracker/SCM/Notifier/Terminal/Lifecycle) | Over BENEFIT beat | 7s | PIL |
| 3 | State machine: Manim rendering of spawning → working → pr_open → ci_failed/review → merged | Mid-SOLUTION beat | 10s | Manim |
| 4 | Parallel vs. serial timeline: two bar timelines, one sequential, one 8-parallel. Same total work. | End of SOLUTION beat | 5s | PIL |
| 5 | CTA tail card: logo, URL, MIT badge | Over CTA beat | 5s | PIL |
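The motion in Slot 1 is plain interpolation math. A minimal sketch of the ease-out-cubic value curve driving the "1 agent" → "8 agents" counter (the PIL text rendering itself is omitted):

```python
def ease_out_cubic(t: float) -> float:
    """Ease-out-cubic: fast start, gentle settle. t in [0, 1]."""
    return 1 - (1 - t) ** 3

def counter_values(start: int, end: int, fps: int = 24, seconds: float = 3.0) -> list[int]:
    """Per-frame integer values for a 3s counter at 24fps (72 frames)."""
    frames = int(fps * seconds)
    return [
        round(start + (end - start) * ease_out_cubic(i / (frames - 1)))
        for i in range(frames)
    ]

values = counter_values(1, 8)
# values[0] == 1 and values[-1] == 8; most of the motion lands early,
# which is what makes the settle on "8 agents" feel deliberate.
```

Each frame's value would then be drawn onto a near-black canvas in the orange accent color.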

Palette (reused from the video-use shipped launch video — matches AO's dev-tool aesthetic): near-black background, #FF5A00 orange accent, dim gray labels, Menlo Bold type.


Part 6 — Folder Setup Before Editing

Create this exact structure on disk. Put <videos_dir> anywhere — desktop, Dropbox, wherever.

~/ao-launch-video/
├── TH_hook_01.mp4
├── TH_hook_02.mp4
├── TH_hook_03.mp4
├── ... (all talking head takes)
├── SC_dashboard_wide_01.mov
├── SC_dashboard_wide_02.mov
├── ... (all screen captures)
├── BR_laptop_typing_01.mp4
└── (B-roll, optional)

Flat. No subfolders. The skill puts its output in ~/ao-launch-video/edit/.


Part 7 — Pre-flight Checks

Before invoking Claude, verify:

# ffmpeg installed
ffmpeg -version

# yt-dlp installed (optional, for online sources)
yt-dlp --version

# ElevenLabs API key set
cat ~/.agents/skills/video-use/.env
# Should contain: ELEVENLABS_API_KEY=...

# Python deps installed
cd ~/.agents/skills/video-use
pip install -e .

Cost note: transcribing ~40 minutes of raw takes via ElevenLabs Scribe will cost a few dollars. Not free.


Part 8 — Invoke the Edit

cd ~/ao-launch-video
claude

Then paste this prompt:

Use the video-use skill. Edit these raw takes into a launch video for
Agent Orchestrator (AO), an open-source platform for spawning parallel
AI coding agents.

Target: ~75 seconds, 1920×1080 @ 24fps, YouTube/landing-page hero.

Archetype: Tech launch. HOOK → PROBLEM → SOLUTION → BENEFIT → EXAMPLE → CTA.

Sources:
- TH_*.mp4 files are talking-head takes of the script (multiple takes
  per beat — pick the best)
- SC_*.mov files are screen captures of the dashboard, CLI, and GitHub
- BR_*.mp4 files are optional B-roll

Grade: warm_cinematic preset.

Subtitles: 2-word UPPERCASE chunks, bold-overlay style. Burn them LAST.

Animations (build 5 in parallel sub-agents, palette: near-black bg,
#FF5A00 accent, dim gray labels, Menlo Bold):

  1. PIL counter: "1 agent" → "8 agents", ease-out-cubic, 3s.
     Plays at the end of HOOK.

  2. PIL plugin grid: 8 cards revealing sequentially (Runtime, Agent,
     Workspace, Tracker, SCM, Notifier, Terminal, Lifecycle). 7s.
     Plays over BENEFIT.

  3. Manim state machine: nodes for spawning, working, pr_open,
     ci_failed, review_pending, mergeable, merged. Edges animating
     through a happy-path traversal. 10s. Plays mid-SOLUTION.

  4. PIL timeline comparison: two horizontal bars — "sequential: 8
     tasks back-to-back" vs "parallel: 8 tasks stacked". Same total
     work, 1/8 the wall time. 5s. End of SOLUTION.

  5. PIL CTA card: "Agent Orchestrator",
     "github.com/ComposioHQ/agent-orchestrator", "MIT", orange accent,
     Menlo Bold. 5s. Plays over CTA.

Must-preserve: the URL at the end must be fully legible for 2s+.

Pacing: tight on hooks, breathing room between beats, 400–600ms
speaker handoffs.

Inventory the sources, pack the transcripts, propose a strategy in
plain English, and wait for my OK before cutting.

Part 9 — The Iteration Loop

After the first preview, you'll want to iterate. Natural-language feedback works.

The skill re-plans, re-renders, and re-evaluates. It never re-transcribes (transcripts are cached per source), and it persists decisions to project.md so the next session picks up where you left off.
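The caching behavior is worth understanding because it's what makes iteration cheap. An illustrative sketch of per-source transcript caching (not the skill's actual code; keying by content hash means a renamed file still hits the cache, while a re-exported file misses it):

```python
import hashlib
import json
from pathlib import Path

# Illustrative sketch of per-source transcript caching, not the skill's
# actual implementation. The expensive ASR call runs once per unique
# source file; every later session reads the cached JSON instead.
def transcript_path(source: Path, cache_dir: Path) -> Path:
    digest = hashlib.sha256(source.read_bytes()).hexdigest()[:16]
    return cache_dir / f"{source.stem}.{digest}.json"

def load_or_transcribe(source: Path, cache_dir: Path, transcribe) -> dict:
    cached = transcript_path(source, cache_dir)
    if cached.exists():
        return json.loads(cached.read_text())
    result = transcribe(source)  # e.g. the expensive ElevenLabs call
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_text(json.dumps(result))
    return result
```

This is why feedback rounds cost seconds, not the few dollars the first transcription pass does.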


Part 10 — Critical Don'ts


Appendix A — How video-use Works Under the Hood

raw takes ─► transcribe.py (ElevenLabs Scribe, word-level) ─► transcripts/*.json
                                                                   │
                                                                   ▼
                                       pack_transcripts.py ─► takes_packed.md
                                                                   │
                                                                   ▼
                                              LLM reasons + asks + proposes strategy
                                                                   │
                                                  (you confirm the plain-English plan)
                                                                   │
                                                                   ▼
                                                  editor sub-agent ─► edl.json
                                                                   │
                                                                   ▼
                                                         render.py pipeline
                                                                   │
                                                                   ▼
                                   per-segment extract (grade + 30ms fades baked in)
                                                                   │
                                                         lossless concat ─► base.mp4
                                                                   │
                                                     overlays (PTS-shifted) then subs LAST
                                                                   │
                                                                   ▼
                                         loudnorm 2-pass (-14 LUFS / -1 dBTP / LRA 11)
                                                                   │
                                                                   ▼
                                                            final.mp4
                                                                   │
                                                                   ▼
                                 self-eval loop (timeline_view at every cut boundary,
                                 max 3 retries, flag to user if still broken)

Key principle

The LLM never watches the video — it reads a packed text transcript (takes_packed.md, ~12KB per hour of footage) and only peeks at visuals (timeline_view PNGs) at decision points. Same idea as browser-use giving the LLM a structured DOM instead of a screenshot, applied to video.
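A minimal sketch of the packing idea (the real pack_transcripts.py format may differ): collapse word-level ASR into one phrase line per pause, keeping timestamps so the LLM can still address exact cut points.

```python
# Illustrative only: assumes word entries shaped like
# {"text": "hello", "start": 0.0, "end": 0.4}, which is an assumption
# about the Scribe JSON, not a documented schema.
def pack(words: list[dict], gap: float = 0.6) -> list[str]:
    """Group words into phrase lines wherever the pause exceeds `gap` seconds."""
    lines, phrase = [], []
    for w in words:
        if phrase and w["start"] - phrase[-1]["end"] > gap:
            lines.append(_flush(phrase))
            phrase = []
        phrase.append(w)
    if phrase:
        lines.append(_flush(phrase))
    return lines

def _flush(phrase: list[dict]) -> str:
    text = " ".join(w["text"] for w in phrase)
    return f"[{phrase[0]['start']:.2f}-{phrase[-1]['end']:.2f}] {text}"
```

One short line per phrase, instead of thousands of word objects, is how an hour of footage compresses to roughly 12KB of text.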

Hard rules (non-negotiable, from SKILL.md)

  1. Subtitles are applied LAST in the filter chain, after every overlay.
  2. Per-segment extract → lossless -c copy concat, not single-pass filtergraph.
  3. 30ms audio fades at every segment boundary.
  4. Overlays use setpts=PTS-STARTPTS+T/TB to align frame 0 to window start.
  5. Master SRT uses output-timeline offsets.
  6. Never cut inside a word.
  7. Pad every cut edge (30–200ms working window).
  8. Word-level verbatim ASR only, never SRT/phrase mode.
  9. Cache transcripts per source.
  10. Parallel sub-agents for multiple animations, never sequential.
  11. Strategy confirmation before execution.
  12. All session outputs in <videos_dir>/edit/.
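Rules 6 and 7 together amount to a tiny snapping function. An illustrative sketch only (the skill's edl.json logic is its own): if a requested cut lands inside a word, move to the nearer word edge and pad past it.

```python
def snap_cut(words: list[tuple[float, float]], t: float, pad: float = 0.03) -> float:
    """words: sorted (start, end) times in seconds. Return a safe cut near t.

    Illustrative sketch of hard rules 6-7: never cut inside a word, and
    pad the cut edge (30ms here, the low end of the working window).
    """
    for start, end in words:
        if start <= t < end:  # requested cut falls inside a word
            # snap to the nearer word edge, then pad 30ms past it
            if t - start < end - t:
                return max(start - pad, 0.0)
            return end + pad
    return t  # already in a gap between words: keep as requested
```

The 30ms fade from rule 3 then sits entirely inside that padded silence, so no word onset is clipped.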

Output directory structure

~/ao-launch-video/edit/
├── project.md               ← memory; appended every session
├── takes_packed.md          ← phrase-level transcripts
├── edl.json                 ← cut decisions
├── transcripts/<name>.json  ← cached raw Scribe JSON
├── animations/slot_<id>/    ← per-animation source + render
├── clips_graded/            ← per-segment extracts with grade + fades
├── master.srt               ← output-timeline subtitles
├── verify/                  ← self-eval frames / timeline PNGs
├── preview.mp4
└── final.mp4