TRINITY

AO Canonical Lifecycle Architecture v2

WHY THIS MATTERS

+11,564 / -1,415 lines · 109 files · 40% tests

  • Three-axis lifecycle: Session, PR, and runtime are now independent truth domains
  • No more silent failures: Probe errors → detecting state, not silent masking
  • Agent self-reporting: ao acknowledge + ao report commands
  • Zero-migration compat: v1 sessions synthesized automatically, no data changes
00

How to Review This PR

This is a +11,564 / -1,415 line diff touching 109 files. Here's the reading order that makes it manageable.

1. Start: packages/core/src/types.ts +142

   The contract. Defines CanonicalSessionLifecycle, all states/reasons for the three axes, and ActivitySignal. Everything else implements this.

   5 min · foundation

2. Then: packages/core/src/lifecycle-state.ts +480 new

   The migration story. parseCanonicalLifecycle() handles v1→v2 synthesis. deriveLegacyStatus() handles v2→v1. This is backward compat in one file.

   10 min · critical path

3. Then: packages/core/src/lifecycle-status-decisions.ts +398 new

   Pure decision functions. resolveProbeDecision() handles signal disagreement. createDetectingDecision() handles retry budgets. All testable in isolation.

   10 min · pure logic

4. Then: packages/core/src/lifecycle-manager.ts +694 / -156

   The biggest single change. Focus on determineStatus(), which now uses a commit() closure. Skip the imports; the meat is in the polling loop changes.

   15 min · behavioral core

5. Then: packages/core/src/agent-report.ts +606 new

   Agent self-reporting (ao acknowledge, ao report). Review if you care about agents declaring their phase explicitly. New CLI commands too.

   8 min · new feature

6. Web: packages/web/src/components/SessionDetail.tsx +102 / -789

   Massive refactor. Extracted into SessionTruthPanel, SessionReportAuditPanel, SessionDetailPRCard. Net deletion, which is good.

   10 min · UI changes

Skip: Mechanical changes (~70 files)

   Plugin tests, changesets, helper updates. All add lifecycle and activitySignal to satisfy the type checker. Trust CI here.

   ~70 files · CI-verified

01

Risk Assessment

📊 Why 11,000+ Lines?

Breakdown by category

| Category | Additions | Files | Share |
| --- | --- | --- | --- |
| Tests | +4,377 | 43 | 38% |
| Core | +3,894 | 22 | 34% |
| Web | +1,330 | 14 | 12% |
| Docs | +801 | 2 | 7% |
| Other | +349 | 19 | (not shown) |

The "scary" number is actually healthy. Roughly 40% of the added lines are tests. The core implementation is 6 new modules (additive, not rewrites). Deletions are small (-1,415), so this is mostly net-new behavior rather than risky rewrites of existing code. The web changes are dashboard updates that surface the new state model.

Top 10 files by additions

  • lifecycle-manager.ts +694
  • lifecycle-manager.test.ts +680
  • agent-report.ts +606
  • agent-report.test.ts +566
  • recovery-validator.test.ts +520
  • lifecycle-state.ts +480
  • lifecycle-transition.test.ts +398
  • lifecycle-status-decisions.ts +398
  • SessionDetailPRCard.tsx +349
  • lifecycle-transition.ts +302

HIGH IMPACT Blast Radius

Every session load path is affected.

The Session interface now requires lifecycle and activitySignal. Every call to sessionFromMetadata(), refreshSession(), or direct Session construction must provide these.

Files touched: session-manager.ts, lifecycle-manager.ts, all plugin tests

MEDIUM IMPACT Behavioral Change

Probe failures are no longer silent.

Old behavior: isAlive() errors were caught and returned true. Sessions with probe failures appeared "alive."

New behavior: Probe failures enter detecting state. After 3 attempts or 5 minutes, they escalate to stuck.

Impact: Sessions that were silently "working" may now show as "detecting" or "stuck."
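
The difference between the two behaviors can be sketched as follows. This is a simplified illustration rather than the code in this PR: `probeOld`, `probeNew`, and the synchronous `isAlive` callback are hypothetical names, and mapping a clean `false` result to `exited` is an assumption.

```typescript
// Hypothetical sketch of old vs. new probe handling (all names are illustrative).
type RuntimeState = "alive" | "exited" | "probe_failed";

interface ProbeResult {
  runtimeState: RuntimeState;
  // Set only when the probe itself failed and the session should enter detecting.
  sessionState?: "detecting";
}

// Old behavior: a throwing probe is swallowed and reported as alive.
function probeOld(isAlive: () => boolean): boolean {
  try {
    return isAlive();
  } catch {
    return true; // failure silently masked
  }
}

// New behavior: a throwing probe surfaces as probe_failed plus detecting.
function probeNew(isAlive: () => boolean): ProbeResult {
  try {
    // Assumption: a clean "false" maps to "exited"; the PR may distinguish more cases.
    return { runtimeState: isAlive() ? "alive" : "exited" };
  } catch {
    return { runtimeState: "probe_failed", sessionState: "detecting" };
  }
}

// A probe that always fails, standing in for an unreachable runtime.
const failingProbe = (): boolean => {
  throw new Error("probe error");
};

const maskedAsAlive = probeOld(failingProbe); // true: the old path hides the error
const surfaced = probeNew(failingProbe); // runtimeState "probe_failed", sessionState "detecting"
```

The point of the second function is that the caller can now tell "the process is alive" apart from "we could not find out".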

LOW IMPACT New Events

New pr.closed event added.

When a PR transitions from "open" to "closed", a new event fires. The default reaction is notify.

Impact: Users will see new notifications for closed PRs. No action required.
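
As a rough sketch of the edge-triggered behavior described here, the event would fire only when the observed PR state moves from "open" to "closed". The event shape and the `prClosedEvents` helper are hypothetical and are not taken from the PR.

```typescript
// Hypothetical sketch: event shape and function name are illustrative.
type PRState = "none" | "open" | "merged" | "closed";

interface PRClosedEvent {
  type: "pr.closed";
  sessionId: string;
}

// Emit pr.closed only on the open -> closed edge, not on every poll
// that observes an already-closed PR.
function prClosedEvents(previous: PRState, current: PRState, sessionId: string): PRClosedEvent[] {
  if (previous === "open" && current === "closed") {
    return [{ type: "pr.closed", sessionId }];
  }
  return [];
}
```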

🔄 Rollback Plan

If metadata corruption:

The system always writes both v1 flat keys AND v2 statePayload. Rolling back to pre-Trinity code will ignore statePayload and read the flat keys. No data loss.

If detecting state causes issues:

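The detecting → stuck escalation can be effectively disabled by setting DETECTING_MAX_ATTEMPTS to 999. Because escalation also triggers on the 5-minute time budget, DETECTING_MAX_DURATION_MS likely needs to be raised as well. This is a code change, not a config change.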

If dashboard breaks:

Web components have fallbacks for missing lifecycle data. If session.lifecycle is undefined, they fall back to session.status.
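
A minimal sketch of that fallback, assuming a trimmed-down session shape (`SessionLike` and `displaySessionState` are hypothetical names, not the actual component code):

```typescript
// Hypothetical, trimmed-down session shape for illustration only.
interface SessionLike {
  status: string;
  lifecycle?: { session: { state: string } };
}

// Prefer the v2 lifecycle axis; fall back to the flat v1 status when it is missing.
function displaySessionState(session: SessionLike): string {
  return session.lifecycle?.session.state ?? session.status;
}
```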

✅ What Won't Break

  • Existing sessions: v1 metadata is synthesized into v2 lifecycle automatically
  • Existing reactions: deriveLegacyStatus() maps new lifecycle to old status strings
  • Existing integrations: The API still returns status field alongside new lifecycle
  • Spawn/kill/restore: All tested, all work with new lifecycle writes
02

Problem Statement

AO's session lifecycle was built on a single SessionStatus enum that conflated three distinct concerns: what the agent session is doing, what state the PR is in, and whether the runtime process is alive.

A session could be ci_failed — but that tells you the PR state, not what the agent is doing (probably fixing CI). A session could be stuck — but that doesn't tell you why (probe failed? agent idle? process dead?).

Before
  • Single status string
  • No persisted reason
  • No signal confidence
  • Probe failures silently masked
  • No agent self-reporting
After
  • Three independent axes
  • Explicit state + reason + timestamps
  • Confidence-bearing ActivitySignal
  • New detecting state
  • ao acknowledge + ao report
03

The Three-Axis Canonical Lifecycle

The redesign introduces a CanonicalSessionLifecycle object with three independent truth domains:

Session

What the agent is doing

Tracks workflow state independently of PR or runtime. Includes kind (worker/orchestrator), state, reason, and lifecycle timestamps.

not_started working idle needs_input stuck detecting done terminated

PR

Pull request lifecycle

Tracks PR state from GitHub/GitLab. Includes state, reason (ci_failing, approved, etc.), PR number, URL, and observation timestamp.

none open merged closed

Runtime

Process liveness

Tracks what is known about the agent process. Includes state, reason, observation timestamp, RuntimeHandle, and tmux session name.

unknown alive exited missing probe_failed
CanonicalSessionLifecycle
CanonicalSessionLifecycle (version: 2)
├── session          // What the agent session is doing
│   ├── kind         // "worker" | "orchestrator"
│   ├── state        // "not_started" | "working" | "idle" | "needs_input" | "stuck" | "detecting" | "done" | "terminated"
│   ├── reason       // e.g. "task_in_progress", "fixing_ci", "probe_failure"
│   ├── startedAt    // ISO timestamp
│   ├── completedAt  // ISO timestamp
│   ├── terminatedAt // ISO timestamp
│   └── lastTransitionAt // ISO timestamp
├── pr               // What state the pull request is in
│   ├── state        // "none" | "open" | "merged" | "closed"
│   ├── reason       // e.g. "in_progress", "ci_failing", "approved", "merge_ready"
│   ├── number       // PR number
│   ├── url          // PR URL
│   └── lastObservedAt // ISO timestamp
└── runtime          // What is known about the process
    ├── state        // "unknown" | "alive" | "exited" | "missing" | "probe_failed"
    ├── reason       // e.g. "process_running", "tmux_missing", "probe_error"
    ├── lastObservedAt // ISO timestamp
    ├── handle       // RuntimeHandle object
    └── tmuxName     // tmux session name
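
The tree above can be read as a TypeScript type. The following is a sketch under assumptions: which fields are optional and the shape of `RuntimeHandle` are not stated in this description, and the authoritative definitions are in packages/core/src/types.ts. The PR number and URL in the example value are made up.

```typescript
// Sketch only: optionality and the RuntimeHandle shape are assumptions.
type SessionKind = "worker" | "orchestrator";
type SessionState =
  | "not_started" | "working" | "idle" | "needs_input"
  | "stuck" | "detecting" | "done" | "terminated";
type PRState = "none" | "open" | "merged" | "closed";
type RuntimeState = "unknown" | "alive" | "exited" | "missing" | "probe_failed";

// Placeholder: the real RuntimeHandle is defined elsewhere in the codebase.
type RuntimeHandle = Record<string, unknown>;

interface CanonicalSessionLifecycle {
  version: 2;
  session: {
    kind: SessionKind;
    state: SessionState;
    reason: string;
    startedAt?: string; // ISO timestamps
    completedAt?: string;
    terminatedAt?: string;
    lastTransitionAt?: string;
  };
  pr: {
    state: PRState;
    reason: string;
    number?: number;
    url?: string;
    lastObservedAt?: string;
  };
  runtime: {
    state: RuntimeState;
    reason: string;
    lastObservedAt?: string;
    handle?: RuntimeHandle;
    tmuxName?: string;
  };
}

// Example value: a worker fixing CI on an open PR while its process is alive.
// The three axes are set independently of one another.
const example: CanonicalSessionLifecycle = {
  version: 2,
  session: { kind: "worker", state: "working", reason: "fixing_ci" },
  pr: { state: "open", reason: "ci_failing", number: 123, url: "https://example.com/pr/123" },
  runtime: { state: "alive", reason: "process_running" },
};
```

Under the old single-status model this session would have been reported as ci_failed only; here the PR problem and the agent's response to it are recorded separately.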

Session State Transitions

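[State diagram. Transitions recoverable from the figure: spawn → not_started; not_started → working (agent starts); working → done (completed); working → needs_input (blocked); detecting → stuck (3 attempts / 5 min budget); → terminated (killed). The figure also carries the edge labels "waiting", "probe fail", and "idle timeout" around the idle and detecting states. Legend: Active, Attention, Error, Terminal.]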
04

New Files & Responsibilities

lifecycle-state.ts 333 lines

The core of the migration path. Handles parsing, synthesis, and persistence of the canonical lifecycle.

  • createInitialCanonicalLifecycle(kind, now) — Creates a fresh v2 lifecycle for new sessions
  • parseCanonicalLifecycle(meta, options) — Reads metadata and returns a CanonicalSessionLifecycle
  • synthesizeCanonicalLifecycle(meta, options) — Bridges v1 → v2 by reverse-engineering from flat keys
  • deriveLegacyStatus(lifecycle, fallback?) — Maps three-axis lifecycle back to flat SessionStatus

This is the entire migration story. Any session, whether created before or after this PR, gets a valid CanonicalSessionLifecycle when loaded. No data migration step required. No breaking change to existing metadata files.

lifecycle-status-decisions.ts 388 lines

Pure decision logic extracted into testable functions with explicit inputs and outputs.

  • DETECTING_MAX_ATTEMPTS (3) and DETECTING_MAX_DURATION_MS (5 minutes)
  • hashEvidence(evidence) — SHA-256 hash for change detection
  • createDetectingDecision(input) — Stay in detecting or escalate to stuck
  • resolveProbeDecision(input) — Handle runtime/process disagreement
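
The retry budget can be sketched as a pure function. The two constants and their values come from the list above; the input and output shapes, the `detectingDecisionSketch` name, and the choice to escalate on the third counted attempt are assumptions. The real createDetectingDecision(input) may also use hashEvidence, which is not modelled here.

```typescript
const DETECTING_MAX_ATTEMPTS = 3;
const DETECTING_MAX_DURATION_MS = 5 * 60 * 1000;

// Hypothetical input shape for illustration.
interface DetectingInput {
  previousAttempts: number; // failed probes counted before this one
  detectingSinceMs: number; // when the session entered detecting
  nowMs: number;
}

interface DetectingDecision {
  state: "detecting" | "stuck";
  attempts: number;
}

// Stay in detecting while both budgets hold; escalate to stuck once either is used up.
function detectingDecisionSketch(input: DetectingInput): DetectingDecision {
  const attempts = input.previousAttempts + 1;
  const elapsedMs = input.nowMs - input.detectingSinceMs;
  const budgetExhausted =
    attempts >= DETECTING_MAX_ATTEMPTS || elapsedMs >= DETECTING_MAX_DURATION_MS;
  return { state: budgetExhausted ? "stuck" : "detecting", attempts };
}
```

Because the function takes the clock as an input instead of reading it, each branch can be exercised in a unit test without timers.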
lifecycle-transition.ts 312 lines

Centralized transition boundary defining the interface for all lifecycle mutations.

  • TransitionSource — "poll", "agent_report", "spawn", "restore", "kill", etc.
  • TransitionResult — Captures before/after lifecycle and status
  • applyDecisionToLifecycle(lifecycle, decision, nowIso) — Mutates lifecycle in place
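
One way to picture an in-place mutation that still reports before and after is sketched below. Everything here is hypothetical: the reduced lifecycle shape, the extra `source` parameter, and the `changed` flag are illustration only, and the real TransitionSource union has more members than the ones listed.

```typescript
// Subset of the sources named above; the real union is larger.
type TransitionSource = "poll" | "agent_report" | "spawn" | "restore" | "kill";

interface SessionAxis {
  state: string;
  reason: string;
  lastTransitionAt?: string;
}

interface LifecycleSketch {
  session: SessionAxis;
}

interface TransitionResultSketch {
  source: TransitionSource;
  before: SessionAxis;
  after: SessionAxis;
  changed: boolean;
}

// Mutates lifecycle.session in place and returns snapshots taken before and after.
function applyDecisionSketch(
  lifecycle: LifecycleSketch,
  decision: { state: string; reason: string },
  nowIso: string,
  source: TransitionSource,
): TransitionResultSketch {
  const before: SessionAxis = { ...lifecycle.session };
  const changed = before.state !== decision.state || before.reason !== decision.reason;
  if (changed) {
    lifecycle.session.state = decision.state;
    lifecycle.session.reason = decision.reason;
    lifecycle.session.lastTransitionAt = nowIso; // only stamped on a real transition
  }
  return { source, before, after: { ...lifecycle.session }, changed };
}
```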
activity-signal.ts 125 lines

Activity confidence model wrapping activity state with metadata.

  • Signal states: valid, stale, null, unavailable, probe_failure
  • Freshness windows: Strong (<60s), Weak (<5min), Stale (>5min)
  • Signal sources: native, terminal, runtime, none
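
The freshness windows can be expressed as a small classifier. This is a sketch: the `classifyFreshness` name is hypothetical, and treating a signal that is exactly 5 minutes old as stale is an assumption, since the list above leaves that boundary open.

```typescript
type Freshness = "strong" | "weak" | "stale";

const STRONG_WINDOW_MS = 60 * 1000; // under 60 s
const WEAK_WINDOW_MS = 5 * 60 * 1000; // under 5 min

// Classify a signal by its age in milliseconds.
function classifyFreshness(ageMs: number): Freshness {
  if (ageMs < STRONG_WINDOW_MS) return "strong";
  if (ageMs < WEAK_WINDOW_MS) return "weak";
  return "stale"; // assumption: exactly 5 min counts as stale
}
```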
agent-report.ts 559 lines

Complete agent self-reporting system with CLI commands and validation.

| Reportable State | Maps To | Description |
| --- | --- | --- |
| started | session: working/agent_acknowledged | Agent picked up the task |
| working | session: working/task_in_progress | Generic progress signal |
| fixing_ci | session: working/fixing_ci | Responding to CI failure |
| addressing_reviews | session: working/resolving_review_comments | Responding to review comments |
| needs_input | session: needs_input/awaiting_user_input | Needs human decision |
| pr_created | session: idle/pr_created | Agent created a PR |
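
The mapping above is a fixed lookup, which could be written as a typed record. The state and reason values are taken from the table; the `REPORT_TO_SESSION` name and the record layout are hypothetical.

```typescript
type ReportableState =
  | "started" | "working" | "fixing_ci"
  | "addressing_reviews" | "needs_input" | "pr_created";

interface SessionTarget {
  state: "working" | "needs_input" | "idle";
  reason: string;
}

// Using Record<ReportableState, ...> makes the compiler flag any missing state.
const REPORT_TO_SESSION: Record<ReportableState, SessionTarget> = {
  started: { state: "working", reason: "agent_acknowledged" },
  working: { state: "working", reason: "task_in_progress" },
  fixing_ci: { state: "working", reason: "fixing_ci" },
  addressing_reviews: { state: "working", reason: "resolving_review_comments" },
  needs_input: { state: "needs_input", reason: "awaiting_user_input" },
  pr_created: { state: "idle", reason: "pr_created" },
};
```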
report-watcher.ts 254 lines

Background trigger system monitoring agent reports for anomalies.

| Trigger | Condition | Threshold |
| --- | --- | --- |
| no_acknowledge | Agent never ran ao acknowledge | 10 minutes |
| stale_report | Agent hasn't reported in a while | 30 minutes |
| agent_needs_input | Agent reported needs_input | N/A |
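
A sketch of how these three triggers might be evaluated on each pass. The thresholds come from the table; the `ReportSnapshot` shape, the `evaluateTriggers` name, and the choice to measure no_acknowledge from spawn time and stale_report from the last report are assumptions.

```typescript
type TriggerName = "no_acknowledge" | "stale_report" | "agent_needs_input";

const NO_ACKNOWLEDGE_MS = 10 * 60 * 1000;
const STALE_REPORT_MS = 30 * 60 * 1000;

// Hypothetical view of what the watcher knows about one session.
interface ReportSnapshot {
  spawnedAtMs: number;
  acknowledgedAtMs?: number;
  lastReportAtMs?: number;
  lastReportedState?: string;
}

function evaluateTriggers(s: ReportSnapshot, nowMs: number): TriggerName[] {
  const fired: TriggerName[] = [];
  // Assumption: the acknowledge clock starts at spawn.
  if (s.acknowledgedAtMs === undefined && nowMs - s.spawnedAtMs >= NO_ACKNOWLEDGE_MS) {
    fired.push("no_acknowledge");
  }
  // Assumption: staleness is measured from the most recent report.
  if (s.lastReportAtMs !== undefined && nowMs - s.lastReportAtMs >= STALE_REPORT_MS) {
    fired.push("stale_report");
  }
  // No threshold: fires as soon as the agent reports needs_input.
  if (s.lastReportedState === "needs_input") {
    fired.push("agent_needs_input");
  }
  return fired;
}
```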
05

Behavioral Changes

| Before | After | Impact |
| --- | --- | --- |
| status: "pr_open" | pr.state: "open", pr.reason: "in_progress" | PR Axis |
| status: "ci_failed" | pr.reason: "ci_failing", session.reason: "fixing_ci" | Session, PR |
| status: "stuck" | session.state: "stuck" OR "detecting" | New State |
| Probe failure → silently "alive" | Probe failure → detecting state, 3-attempt budget | Breaking |
| isAlive() error → returns true | runtime.state: "probe_failed" | Runtime |
| No agent self-report | ao acknowledge + ao report | New Feature |
| activity: "idle" \| null | ActivitySignal with state + confidence + source | Enhanced |
| PR closed → status: "killed" | pr.state: "closed", pr.reason: "closed_unmerged" | PR Axis |

Major File Changes

lifecycle-manager.ts +688 / -153

The main polling loop now uses a commit() closure that applies decisions to the lifecycle object. Evidence tracking on every transition. PR state changes fire dedicated events.

session-manager.ts +181 / -27

Spawn creates CanonicalSessionLifecycle at spawn time. Restore checks lifecycle PR state. Kill writes terminated lifecycle to metadata.

packages/web +700 lines

New SessionTruthPanel displaying three-axis lifecycle. New SessionReportAuditPanel for agent report trail. Attention zones now check lifecycle axes first.
06

Migration & Compatibility

No data migration step is required. The system handles both v1 and v2 metadata transparently through synthesis and dual-write.

parseCanonicalLifecycle() checks for statePayload in metadata:

  • If present (v2): parses and normalizes it
  • If absent (v1): calls synthesizeCanonicalLifecycle() to reverse-engineer from flat keys

Every write produces both:

  • statePayload (v2 JSON blob)
  • Flat keys like status, pr, tmuxName (v1 compatibility)
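
The read and write paths described above can be sketched together. This is a reduced illustration: the `Metadata` shape, the function names, and the default used for an unrecognized v1 status are assumptions, and only the session axis and four rows of the v1 mapping are modelled.

```typescript
// Metadata is treated as a flat string map for this sketch.
type Metadata = { [key: string]: string | undefined };

interface LifecycleSketch {
  version: 2;
  session: { state: string; reason: string };
}

// Subset of the v1 -> v2 mapping listed in this document.
const V1_TO_SESSION: Partial<Record<string, { state: string; reason: string }>> = {
  spawning: { state: "not_started", reason: "spawn_requested" },
  working: { state: "working", reason: "task_in_progress" },
  needs_input: { state: "needs_input", reason: "awaiting_user_input" },
  stuck: { state: "stuck", reason: "probe_failure" },
};

// Dual-write: always emit the v2 blob and the v1 flat key together.
function writeMetadata(meta: Metadata, lifecycle: LifecycleSketch, legacyStatus: string): Metadata {
  return { ...meta, status: legacyStatus, statePayload: JSON.stringify(lifecycle) };
}

// Read: prefer statePayload (v2); otherwise synthesize from the flat status (v1).
function readLifecycle(meta: Metadata): LifecycleSketch {
  const payload = meta["statePayload"];
  if (payload !== undefined) {
    return JSON.parse(payload) as LifecycleSketch;
  }
  const status = meta["status"] ?? "";
  // Assumption: unknown statuses default to working/task_in_progress.
  const session = V1_TO_SESSION[status] ?? { state: "working", reason: "task_in_progress" };
  return { version: 2, session };
}
```

Code that predates this PR would keep reading `status` and never look at `statePayload`, which is why rolling back does not require touching stored metadata.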

deriveLegacyStatus() maps three-axis lifecycle back to flat SessionStatus:

  1. PR state (merged → "merged", closed → "idle")
  2. Session state (stuck → "stuck", detecting → "detecting")
  3. PR reasons (ci_failing → "ci_failed")
  4. Session reasons (fixing_ci → "ci_failed")
  5. Fallback to previous status

Status Mapping Tables

| v1 Status | session.state | session.reason | pr.state |
| --- | --- | --- | --- |
| spawning | not_started | spawn_requested | none |
| working | working | task_in_progress | from meta["pr"] |
| needs_input | needs_input | awaiting_user_input | from meta["pr"] |
| stuck | stuck | probe_failure | from meta["pr"] |
| errored | terminated | error_in_process | from meta["pr"] |
| killed | terminated | manually_killed | from meta["pr"] |
| done | done | research_complete | from meta["pr"] |
| merged | idle | merged_waiting_decision | from meta["pr"] |

| Lifecycle State | Derived SessionStatus |
| --- | --- |
| pr.state = merged | merged |
| pr.state = closed | idle |
| session.state = stuck | stuck |
| session.state = needs_input | needs_input |
| session.state = detecting | detecting |
| session.state = terminated | killed |
| session.state = done | done |
| pr.reason = ci_failing | ci_failed |
| pr.reason = changes_requested | changes_requested |
| pr.reason = merge_ready | mergeable |
| session.reason = fixing_ci | ci_failed |
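
Read together, the five-step precedence list and the lifecycle-to-status mapping above suggest a lookup chain like the one below. This is a sketch, not the actual deriveLegacyStatus(): the reduced `LifecycleAxes` shape and the required `fallback` parameter are assumptions.

```typescript
interface LifecycleAxes {
  session: { state: string; reason: string };
  pr: { state: string; reason: string };
}

const BY_SESSION_STATE: Partial<Record<string, string>> = {
  stuck: "stuck",
  needs_input: "needs_input",
  detecting: "detecting",
  terminated: "killed",
  done: "done",
};

const BY_PR_REASON: Partial<Record<string, string>> = {
  ci_failing: "ci_failed",
  changes_requested: "changes_requested",
  merge_ready: "mergeable",
};

const BY_SESSION_REASON: Partial<Record<string, string>> = {
  fixing_ci: "ci_failed",
};

// Precedence: PR state, session state, PR reason, session reason, then fallback.
function deriveLegacyStatusSketch(lc: LifecycleAxes, fallback: string): string {
  if (lc.pr.state === "merged") return "merged";
  if (lc.pr.state === "closed") return "idle";
  return (
    BY_SESSION_STATE[lc.session.state] ??
    BY_PR_REASON[lc.pr.reason] ??
    BY_SESSION_REASON[lc.session.reason] ??
    fallback
  );
}
```

The ordering matters when axes disagree: a stuck session on a PR with failing CI derives "stuck", while a merged PR wins over any session state.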
07

Test Coverage

Here are scenarios that were impossible to test before — and now have dedicated test cases.

"Probe failure should not silently appear alive" (BEFORE: Untestable · AFTER: Tested)

Old behavior: isAlive() throws → caught → returns true. Session looks alive when it might be dead.

lifecycle-status-decisions.test.ts:142
expect(result.sessionState).toBe("detecting")

"Runtime alive + process dead should not immediately kill" (BEFORE: Untestable · AFTER: Tested)

Old behavior: Binary decision. New behavior: Enter detecting, retry 3 times over 5 minutes before escalating.

lifecycle-status-decisions.test.ts:89
expect(resolveProbeDecision({...}).decision).toBe("detecting")

"Agent can report fixing_ci and lifecycle updates correctly" (BEFORE: N/A · AFTER: New Feature)

New feature: ao report fixing_ci sets session.state: "working", session.reason: "fixing_ci".

agent-report.test.ts:234
expect(result.lifecycle.session.reason).toBe("fixing_ci")

"v1 sessions synthesize into valid v2 lifecycle" (BEFORE: Implicit · AFTER: Explicit)

Backward compat: A session with only status=ci_failed in metadata gets a full lifecycle with PR and session axes.

lifecycle-state.test.ts:45
expect(synthesized.pr.reason).toBe("in_progress")

"PR closed fires dedicated event" (BEFORE: Silent · AFTER: Evented)

New event: When pr.state transitions from "open" to "closed", a pr.closed event fires.

lifecycle-manager.test.ts:512
expect(events).toContainEqual({type: "pr.closed", ...})

3,500+ new test lines · 27 test files touched · 8 new test suites