PR #1466 — Session Report

Storage Redesign

A complete overhaul of Agent Orchestrator's storage layer — migrating from hash-based directories to project-scoped JSON metadata, fixing critical data-loss bugs in migration and rollback, and patching lifecycle edge cases discovered along the way.

Branch storage-redesign
Files Changed 84
Diff +4,669 / −1,786
Tests 869 passing
Date 2026-04-22
01 — Overview

What changed

The old storage layout scattered session data across hash-based directories (~/.agent-orchestrator/{12-hex-hash}-{projectId}/) with key=value flat-file metadata and a storageKey indirection in global config. Four competing hash methods made collision resolution fragile.

The new layout uses projects/{projectId}/ with JSON metadata files, a single orchestrator record per project, and status computed from lifecycle state rather than stored as a separate field.

20
Commits
14
Issues Fixed
3
Review Rounds
84
Files Touched
869
Tests Passing
BeforeAfter
~/.agent-orchestrator/{hash}-{name}/sessions/{id} ~/.agent-orchestrator/projects/{name}/sessions/{id}.json
Key=value flat files (status=working\nbranch=main) JSON metadata with typed fields
storageKey in global config, 4 hash algorithms Direct projectId, no hashing
status stored and read independently status derived from lifecycle via deriveLegacyStatus()
Tmux names: {hash}-{prefix}-{num} Tmux names: {prefix}-{num}
02 — Critical Fixes

Migration & rollback safety

Three rounds of code review surfaced critical data-loss risks in the migration path. Each was verified against the actual codebase before fixing.

Critical
Stray worktrees not recursed
Default workspace plugin stores worktrees at ~/.worktrees/{projectId}/{sessionId}/ (two levels deep). moveStrayWorktrees() only scanned top-level entries, treating {projectId} dirs as session names. All nested worktrees were orphaned.
storage-v2.ts → moveStrayWorktrees()
Critical
Rollback destroys post-migration sessions
rmSync(projectDir, { recursive: true }) deleted the entire projects/{id}/ directory during rollback — including sessions created after migration that had no .migrated counterpart.
storage-v2.ts → rollbackStorage()
Critical
Status dual truth
Both a stored status field and lifecycle-derived status existed. readMetadata preferred the stored value, meaning stale statuses overrode the canonical lifecycle state machine.
metadata.ts → readMetadata()
Critical
Orchestrator extraction broke runtime reads
Migration extracted orchestrator sessions to orchestrator.json, but the runtime reads all sessions from sessions/. Orchestrator sessions became invisible after migration.
storage-v2.ts → migrateProject()
High
Worktree path rewrite without existence check
Session metadata worktree paths were blindly rewritten to the new V2 location without verifying the directory was actually moved there. Paths pointed nowhere.
storage-v2.ts → migrateProject()
High
Pre-lifecycle sessions lost status
Sessions created before the lifecycle system had no statePayload. Migration dropped their status field, causing readMetadata to fall through to "unknown".
storage-v2.ts → convertKeyValueToJson()

How rollback safety works now

// Before renaming .migrated dirs back, count sessions that // exist only in projects/{id}/ (created after migration) const postMigrationSessions = countPostMigrationSessions( projectDir, migratedDirs.filter(d => d.projectId === projectId), ); if (postMigrationSessions > 0) { log(`Warning: ${postMigrationSessions} session(s) created after migration`); log(` Skipping deletion. Remove manually after verifying.`); } else { rmSync(projectDir, { recursive: true }); }
03 — Discovered Bug

Restore → instant re-kill

While investigating a live session (ao-90), we discovered that restoring a session whose PR was already merged would immediately re-terminate it. The terminal would disappear from the dashboard but the tmux process stayed alive.

The broken flow

PR merged Session killed User clicks Restore spawning (1 poll) merged ×
restore() kept old lifecycle with pr.state = "merged" → first poll re-detected it → immediate termination

The fix

User clicks Restore Reset PR to "none" working New PR auto-detected
On restore, if pr.state is terminal (merged/closed), reset to "none" with reason "cleared_on_restore"
// Reset terminal PR state so lifecycle manager doesn't // immediately re-terminate the session if (restoredLifecycle.pr.state === "merged" || restoredLifecycle.pr.state === "closed") { restoredLifecycle.pr.state = "none"; restoredLifecycle.pr.reason = "cleared_on_restore"; restoredLifecycle.pr.number = null; restoredLifecycle.pr.url = null; }
04 — Discovered Bug

Restored workers lose permissionless mode

getRestoreCommand() in the Claude Code plugin only added --dangerously-skip-permissions for orchestrator sessions. Worker sessions with permissionless config silently lost the flag on restore, causing agents to stall on permission prompts mid-session.

// Before (broken): only orchestrators got the flag const isOrchestrator = session.metadata?.["role"] === "orchestrator"; if (isOrchestrator && (permissionMode === "permissionless" || ...)) // After (fixed): matches getLaunchCommand behavior if (permissionMode === "permissionless" || permissionMode === "auto-edit") parts.push("--dangerously-skip-permissions");

This was Claude Code-only. Other agent plugins (Codex, Aider, OpenCode) handled permissions correctly in their restore commands.

05 — Review Comments

Everything else we fixed

Review
parseTmuxNameV2 too restrictive
Regex /^[a-zA-Z][a-zA-Z0-9_]*$/ didn't allow hyphens in prefix, but sessionPrefix validation allows [a-zA-Z0-9_-]+. Names like my-app-1 failed to parse.
paths.ts:263
Review
Stale SessionMetadata docs
Comments still described "flat key=value files" format and tmuxName "includes hash". Updated to reflect JSON format and hash-free naming.
types.ts:1618-1636
Review
DELETE returns wrong removedStorageDir
Always returned true even when no directory existed. Now checks existsSync() before deletion.
api/projects/[id]/route.ts:271
Review
registerFlatConfig unsafe prefix
Derived sessionPrefix from raw directory basename without sanitizing. Folder names like my.app produced invalid prefixes. Added same sanitization as config-generator.ts.
cli/commands/start.ts:130
CI
Unused import & stale args
SessionStatus import left unused in metadata.ts. Two call sites in lifecycle-manager.ts still passed the removed previousStatus parameter.
metadata.ts, lifecycle-manager.ts
Review
Archive path alignment
Migration initially wrote archives to projects/{id}/archive/ but runtime uses sessions/archive/. Aligned to sessions/archive/ everywhere.
storage-v2.ts, paths.ts
06 — Commit Log

Full timeline

Every commit on the storage-redesign branch, from the initial implementation through three rounds of review.

ad6c9f4b fix(agent-claude-code): add --dangerously-skip-permissions for all restored sessions plugin
896468a2 fix(core): address PR review comments — parser, docs, delete route, prefix sanitization core
e9d9c762 fix(core): remove stale previousStatus args and unused SessionStatus import core
fc6fd88b fix(core): reset terminal PR state on session restore core
ff61ee97 fix(core): fix stray worktree recursion, rollback data loss, and worktree path rewrite core
b8867941 fix(core): eliminate status dual truth, fix jsonFields whitelist, add rollback dry-run and tests core
3d6f70eb fix(core): fix 3 critical migration issues core
1b404d2f fix(core): remove unused readMetadata import in lifecycle test core
7484c001 fix(core): address storage redesign review findings core
d22f0c6f fix(core): reset lifecycle on restore and keep killed sessions in active metadata core
721a8726 fix(core): update metadata hooks for JSON format and .json extension core
8df05b7a fix(core): address review findings — worktree paths, archive location, recovery log core
1972fa30 fix(cli): auto-register flat local config in ao start cli
cc75471e fix(core): address PR review comments core
07 — For Reviewers

How to review this PR

Quick verification

Migration dry run

Critical files to focus on

FileWhy it matters
metadata.tsAll storage flows through here. JSON serialization, status derivation on read.
storage-v2.ts1065-line migration module. Conversion, rollback, stray worktree handling.
session-manager.ts~20 path/metadata call sites updated. Restore lifecycle reset.
lifecycle-state.tsStatus derivation from lifecycle state+reason. No more previousStatus.
global-config.ts305 lines removed. storageKey system completely stripped.
paths.tsNew V2 path functions alongside deprecated old ones.
types.tsSessionMetadata restructured. CanonicalPRReason extended.

What to test manually

08 — Design Decisions

Why we did it this way

Status is computed, never stored

deriveLegacyStatus(lifecycle) is the single source of truth. writeMetadata still includes status for initial writes (sessions without a lifecycle yet), but readMetadata always prefers lifecycle-derived status when a lifecycle exists.

Migration is one-way with escape hatch

New code reads JSON only. No lazy dual-format reading. The migration command converts old key=value to JSON. --rollback restores .migrated directories but now checks for post-migration sessions before deleting V2 data.

PR state reset on restore, not grace period

We considered a time-based grace period after restore, but it was a band-aid. Instead, terminal PR states are cleared on restore: the old PR is done, and if the agent creates a new one, auto-detect picks it up.

detectingAttempts stays as string

A reviewer flagged this as a bug, but it's intentional. buildTransitionMetadataPatch returns Record<string, string>, and all consumers parse with Number.parseInt(). The string format is consistent across the entire read/write pipeline.