Agent Orchestrator — Launch Video Production Guide

A concrete production plan for creating a 75-second launch video for Agent Orchestrator using the video-use Claude Code skill.

Target deliverable: final.mp4, 1920×1080 @ 24fps, ~75 seconds, social-ready audio (-14 LUFS / -1 dBTP / LRA 11).

Pipeline: record raw takes → drop in folder → invoke Claude → skill transcribes, packs, cuts, grades, animates, subtitles, and self-evaluates.

The Script
Recording Session A — Talking Head
Recording Session B — Screen Captures
Recording Session C — Optional Extras
What the Skill Will Generate
Folder Setup Before Editing
Pre-flight Checks
Invoke the Edit
Iteration Loop
Critical Don'ts
How video-use Works Under the Hood

Part 1 — The Script (75s target)

Write this on paper first. Memorize roughly. Don't read verbatim — speak naturally. The skill will cut the umms and false starts at word-boundary precision, so don't worry about perfection.

Beat 1 — HOOK (0:00–0:08, ~8s)

"You've been running one coding agent at a time. Claude Code, Codex, Aider — one window, one task, one wait. You're leaving ten-x on the table."

Beat 2 — PROBLEM (0:08–0:22, ~14s)

"Every time CI fails, you context-switch back. Every time review comments land, you context-switch back. Your agent is blocked, and you're the bottleneck. That's not how this scales."

Beat 3 — SOLUTION (0:22–0:40, ~18s)

"Agent Orchestrator spawns any coding agent — Claude Code, Codex, Aider, OpenCode — in its own git worktree, with its own PR, running in parallel. Eight agents at once. One dashboard. They fix CI failures themselves. They address review comments themselves. They manage their own PRs."

Beat 4 — BENEFIT (0:40–0:52, ~12s)

"Every abstraction is a plugin slot. Runtime, agent, workspace, tracker, SCM, notifier — swap any of them. tmux or process. GitHub or Linear or GitLab. Desktop notifications or Slack. It's your stack."

Beat 5 — EXAMPLE (0:52–1:08, ~16s)

"One config file. ao spawn. Watch eight agents ship eight PRs while you drink coffee. CI is green by the time you're back."

Beat 6 — CTA (1:08–1:15, ~7s)

"Agent Orchestrator. MIT licensed. github.com/ComposioHQ/agent-orchestrator. Ship more in parallel."

Total: ~75s. Record 3–5 takes per beat. Short beats = more takes (they're cheap). Long beats = 3 takes is fine.

Part 2 — Recording Session A: Talking Head

Gear checklist

Camera: phone on tripod is fine (iPhone 14+ or equivalent). 4K@24fps if possible.
Mic: lav clipped to shirt, OR shotgun ~60cm overhead out of frame. Avoid built-in camera mic.
Headphones: during recording so you catch your own pops/sibilance.
Lighting: one key light 45° to your left, one fill light opposite or a white wall bounce. No overhead kitchen light (ugly shadows under eyes).

Room setup

Background 2m behind you, slightly out of focus.
Phone/computer away from you (no fan noise, no notifications).
Close windows (traffic, birds).
Test record 10s, play back in headphones. Listen for hum, hiss, echo. Fix before proceeding.

Camera settings

Resolution: 3840×2160 (4K) or 1920×1080
Frame rate: 24fps (cinematic) — match render.py's -r 24 output
Shutter: 1/48 for 24fps (180° shutter rule)
ISO: lowest that gets clean exposure
White balance: manually set (not auto — it drifts between takes)
Focus: manual, locked on your eyes
Framing: upper third, eyes on the top-third line, 30% headroom. Centered OR rule-of-thirds.

Audio settings

Sample rate: 48kHz
Bit depth: 24-bit
Level: peaks at −12dB, average around −18dB. Never clip.
Monitor with headphones while recording.

Shot list — Talking Head

Shoot 3–5 takes of each beat. Slate each take ("Beat 1, take 1" — clap hands on camera for a sync point).

Clip name	Beat	# takes	What to capture
`TH_hook_01`..`05`	Hook	5	Beat 1 script, direct to camera, confident tone
`TH_problem_01`..`05`	Problem	5	Beat 2 script, slightly frustrated tone
`TH_solution_01`..`05`	Solution	5	Beat 3 script, energetic, confident
`TH_benefit_01`..`03`	Benefit	3	Beat 4 script, matter-of-fact
`TH_example_01`..`03`	Example	3	Beat 5 script, playful ("drink coffee")
`TH_cta_01`..`05`	CTA	5	Beat 6 script, clean delivery, hit the URL

Before you stop recording, record 30 seconds of room tone (total silence, camera rolling). The skill may not need it, but if it does, you'll have it.

Naming convention

File names become the source IDs in takes_packed.md. Use short unique names:

TH_hook_01.mp4
TH_hook_02.mp4
...
TH_cta_05.mp4

Part 3 — Recording Session B: Screen Captures

Record these separately in a clean OS session. No notifications, no chat apps, no browser tabs cluttered.

Setup

Tool: macOS screen recorder (Cmd+Shift+5) OR OBS at 60fps for smooth motion
Resolution: record at native display (2560×1600, 2880×1800, whatever your Mac is) — we downscale to 1080p
Framerate: 60fps if available (enables clean 24fps or slow-mo in the edit)
Cursor: clean, no highlighting software unless you want a subtle ring
Audio: NO audio on these — we use talking-head audio throughout
Clean the UI: close every extra tab, clear notifications, empty the dock if possible

Preparation

Set up a test repo with 2–3 real issues on GitHub you can spawn agents against. This gives you genuine dashboard activity, not mocks.

# In your AO workspace, open the dashboard:
pnpm dev
# Dashboard at http://localhost:3000

Shot list — Screen Captures

Clip name	Duration	Content
`SC_dashboard_wide_01`	30s	Dashboard with 4–8 active sessions, cards across the kanban columns. Slowly scroll/pan.
`SC_dashboard_wide_02`	30s	Second take, different state distribution (more in `pr_open`, `ci_failed`)
`SC_session_card_zoom`	15s	Close crop of one session card updating its status badge
`SC_spawn_cli_01`	20s	Terminal. `ao spawn --agent claude-code --issue 42`. Let it run, show logs.
`SC_spawn_cli_02`	20s	Same but with a different agent (`--agent codex`)
`SC_terminal_attach`	25s	Click a session card → terminal view (`DirectTerminal.tsx`) → agent typing code
`SC_state_transition_01`	20s	Session card in `working` → `pr_open` transition on screen
`SC_state_transition_02`	20s	Session card in `ci_failed` → `mergeable` → `merged`
`SC_pr_on_github`	15s	Actual GitHub PR page opened by an agent. Scroll through description + commits.
`SC_ci_green`	10s	GitHub Actions checks turning green on the PR
`SC_config_yaml`	15s	VS Code open on `agent-orchestrator.yaml`. Zoomed. Scroll through plugin slots.
`SC_plugin_tree`	10s	VS Code sidebar showing `packages/plugins/` directory, 20+ plugins visible

Tip: Record 30% more than you think you need. Short B-roll clips (2–4s) are what you'll actually use.

Part 4 — Recording Session C: Optional Extras

Only if you want a cinematic feel:

Clip name	Duration	Content
`BR_laptop_typing_01`	15s	Overhead or side shot of hands typing
`BR_coffee_cup_01`	10s	Coffee cup near keyboard (matches "drink coffee" line)
`BR_desk_wide_01`	20s	Wide shot of workspace, laptop screen visible but illegible

Shoot these with shallow depth of field if you can.

Part 5 — What the Skill Will Generate (you don't record these)

These five animations will be built by parallel sub-agents during the edit (Hard Rule 10 — never sequential). You just tell Claude to build them.

Slot	What it shows	Where it plays	Duration	Tool
1 Parallel counter	"1 agent" → "8 agents" with ease-out-cubic, orange accent	Over HOOK beat end	3s	PIL
2 Plugin slot grid	8 plugin cards reveal one-by-one (Runtime/Agent/Workspace/Tracker/SCM/Notifier/Terminal/Lifecycle)	Over BENEFIT beat	7s	PIL
3 State machine	Manim rendering of `spawning → working → pr_open → ci_failed/review → merged`	Over SOLUTION beat mid	10s	Manim
4 Parallel vs serial timeline	Two bar timelines, one sequential, one 8-parallel. Same total work.	Over SOLUTION beat end	5s	PIL
5 CTA tail card	Logo, URL, MIT badge	Over CTA beat	5s	PIL

Palette (reused from the video-use shipped launch video — matches AO's dev-tool aesthetic):

Background #0A0A0A near-black
Accent #FF5A00 orange
Labels #6E6E6E dim gray
Font: Menlo Bold (/System/Library/Fonts/Menlo.ttc index 1)

Part 6 — Folder Setup Before Editing

Create this exact structure on disk. Put <videos_dir> anywhere — desktop, Dropbox, wherever.

~/ao-launch-video/
├── TH_hook_01.mp4
├── TH_hook_02.mp4
├── TH_hook_03.mp4
├── ... (all talking head takes)
├── SC_dashboard_wide_01.mov
├── SC_dashboard_wide_02.mov
├── ... (all screen captures)
├── BR_laptop_typing_01.mp4
└── (B-roll, optional)

Flat. No subfolders. The skill puts its output in ~/ao-launch-video/edit/.

Part 7 — Pre-flight Checks

Before invoking Claude, verify:

# ffmpeg installed
ffmpeg -version

# yt-dlp installed (optional, for online sources)
yt-dlp --version

# ElevenLabs API key set
cat ~/.agents/skills/video-use/.env
# Should contain: ELEVENLABS_API_KEY=...

# Python deps installed
cd ~/.agents/skills/video-use
pip install -e .

Cost note: transcribing ~40 minutes of raw takes via ElevenLabs Scribe will cost a few dollars. Not free.

Part 8 — Invoke the Edit

cd ~/ao-launch-video
claude

Then paste this prompt:

Use the video-use skill. Edit these raw takes into a launch video for
Agent Orchestrator (AO), an open-source platform for spawning parallel
AI coding agents.

Target: ~75 seconds, 1920×1080 @ 24fps, YouTube/landing-page hero.

Archetype: Tech launch. HOOK → PROBLEM → SOLUTION → BENEFIT → EXAMPLE → CTA.

Sources:
- TH_*.mp4 files are talking-head takes of the script (multiple takes
  per beat — pick the best)
- SC_*.mov files are screen captures of the dashboard, CLI, and GitHub
- BR_*.mp4 files are optional B-roll

Grade: warm_cinematic preset.

Subtitles: 2-word UPPERCASE chunks, bold-overlay style. Burn them LAST.

Animations (build 5 in parallel sub-agents, palette: near-black bg,
#FF5A00 accent, dim gray labels, Menlo Bold):

  1. PIL counter: "1 agent" → "8 agents", ease-out-cubic, 3s.
     Plays at the end of HOOK.

  2. PIL plugin grid: 8 cards revealing sequentially (Runtime, Agent,
     Workspace, Tracker, SCM, Notifier, Terminal, Lifecycle). 7s.
     Plays over BENEFIT.

  3. Manim state machine: nodes for spawning, working, pr_open,
     ci_failed, review_pending, mergeable, merged. Edges animating
     through a happy-path traversal. 10s. Plays mid-SOLUTION.

  4. PIL timeline comparison: two horizontal bars — "sequential: 8
     tasks back-to-back" vs "parallel: 8 tasks stacked". Same total
     work, 1/8 the wall time. 5s. End of SOLUTION.

  5. PIL CTA card: "Agent Orchestrator",
     "github.com/ComposioHQ/agent-orchestrator", "MIT", orange accent,
     Menlo Bold. 5s. Plays over CTA.

Must-preserve: the URL at the end must be fully legible for 2s+.

Pacing: tight on hooks, breathing room between beats, 400–600ms
speaker handoffs.

Inventory the sources, pack the transcripts, propose a strategy in
plain English, and wait for my OK before cutting.

Part 9 — The Iteration Loop

After the first preview, you'll want to iterate. Natural-language feedback works:

"Hook feels slow — tighten by 2 seconds."
"Use Take 3 for the SOLUTION beat, not Take 1 — more energy."
"Make the state-machine animation 12s, not 10s, it's hard to follow."
"The plugin grid reveals too fast — 1.2s per card instead of 0.8s."
"Swap in the coffee cup B-roll over 'drink coffee'."
"Subtitles are covering the plugin grid — lift them."

The skill re-plans, re-renders, and re-evaluates. Never re-transcribes (cached per source). Persists decisions to project.md so next session picks up where you left off.

Part 10 — Critical Don'ts

Don't change frame rate mid-session. Pick 24 or 30, stick with it.
Don't use the built-in camera mic. You'll regret it.
Don't record in a room with a fridge/AC running — the hum destroys the transcript.
Don't skip the 10s of room tone at the end.
Don't edit footage yourself before feeding it to the skill. Hand over raw takes.
Don't re-record just because you stumbled — that's what the skill is for.
Don't try to match takes visually before editing. Consistent lighting + background does all the work.

Appendix A — How `video-use` Works Under the Hood

raw takes ─► transcribe.py (ElevenLabs Scribe, word-level) ─► transcripts/*.json
                                                                   │
                                                                   ▼
                                       pack_transcripts.py ─► takes_packed.md
                                                                   │
                                                                   ▼
                                              LLM reasons + asks + proposes strategy
                                                                   │
                                                  (you confirm the plain-English plan)
                                                                   │
                                                                   ▼
                                                  editor sub-agent ─► edl.json
                                                                   │
                                                                   ▼
                                                         render.py pipeline
                                                                   │
                                                                   ▼
                                   per-segment extract (grade + 30ms fades baked in)
                                                                   │
                                                         lossless concat ─► base.mp4
                                                                   │
                                                     overlays (PTS-shifted) then subs LAST
                                                                   │
                                                                   ▼
                                         loudnorm 2-pass (-14 LUFS / -1 dBTP / LRA 11)
                                                                   │
                                                                   ▼
                                                            final.mp4
                                                                   │
                                                                   ▼
                                 self-eval loop (timeline_view at every cut boundary,
                                 max 3 retries, flag to user if still broken)

Key principle

The LLM never watches the video — it reads a packed text transcript (takes_packed.md, ~12KB per hour of footage) and only peeks at visuals (timeline_view PNGs) at decision points. Same idea as browser-use giving the LLM a structured DOM instead of a screenshot, applied to video.

Hard rules (non-negotiable, from SKILL.md)

Subtitles are applied LAST in the filter chain, after every overlay.
Per-segment extract → lossless -c copy concat, not single-pass filtergraph.
30ms audio fades at every segment boundary.
Overlays use setpts=PTS-STARTPTS+T/TB to align frame 0 to window start.
Master SRT uses output-timeline offsets.
Never cut inside a word.
Pad every cut edge (30–200ms working window).
Word-level verbatim ASR only, never SRT/phrase mode.
Cache transcripts per source.
Parallel sub-agents for multiple animations, never sequential.
Strategy confirmation before execution.
All session outputs in <videos_dir>/edit/.

Output directory structure

~/ao-launch-video/edit/
├── project.md               ← memory; appended every session
├── takes_packed.md          ← phrase-level transcripts
├── edl.json                 ← cut decisions
├── transcripts/<name>.json  ← cached raw Scribe JSON
├── animations/slot_<id>/    ← per-animation source + render
├── clips_graded/            ← per-segment extracts with grade + fades
├── master.srt               ← output-timeline subtitles
├── verify/                  ← self-eval frames / timeline PNGs
├── preview.mp4
└── final.mp4