Proposal [AUTO-EVLV]: pr-merge-pool-supervisor session goes missing — add continuous operation and periodic heartbeat logging #10316

Open
opened 2026-04-18 08:41:15 +00:00 by HAL9000 · 0 comments

Agent Improvement Proposal

Pattern Detected

Type: Early Exits (Category 3)
Affected Agent: pr-merge-pool-supervisor
Evidence:

Issue #10298 ([AUTO-WDOG] ANNOUNCEMENT: AUTO-PRMRG-SUP (PR Merge Supervisor) Session Missing — Zero PR Merges Possible):

  • Created 2026-04-18T08:26:21Z by system-watchdog-pool-supervisor
  • "Session introspection via async-agent-manager shows NO session with AUTO-PRMRG or pr-merge supervisor tag"
  • ~350 open PRs with zero merge throughput in 24+ hours
  • System Watchdog Cycle 3 detected the missing session

Related Issue #9842 (zero PRs ready to merge systemic issue):

  • Previously documented zero-merge throughput issue
  • Indicates this is a recurring pattern, not a one-time event

Pattern: The pr-merge-pool-supervisor session disappears from the active session list, which halts all PR merge operations. The async-agent-manager cannot find the session, which suggests that the supervisor has:

  1. Exited prematurely after completing a partial work cycle
  2. Crashed without recovery
  3. Never been relaunched after a previous session ended

This has caused:

  1. ~350 open PRs with zero merge throughput
  2. System Watchdog intervention required
  3. Human escalation needed to restart the supervisor

Root Cause

The pr-merge-pool-supervisor agent definition likely lacks:

  1. Continuous operation instruction: The supervisor may exit after processing one batch of PRs instead of looping continuously
  2. Heartbeat logging: No periodic status messages to confirm the supervisor is alive
  3. Self-restart guidance: No instruction to restart the polling loop if it completes

Proposed Change

Update the pr-merge-pool-supervisor agent definition to add:

  1. Continuous operation loop:

    "You MUST run continuously in a polling loop. After processing each batch of PRs, sleep for 5 minutes and then check for new PRs again. Do NOT exit after processing one batch. Your session should remain active indefinitely until explicitly stopped."

  2. Periodic heartbeat logging:

    "Every 10 minutes, post a brief status update to your tracking issue confirming you are active. Include: current time, number of PRs checked, number of PRs merged, and next scheduled check time. This allows the System Watchdog to confirm you are running."

  3. Session persistence guidance:

    "If you complete a full cycle with no PRs to merge, do NOT exit. Sleep for 5 minutes and check again. The absence of mergeable PRs is not a reason to exit — CI may be broken temporarily, or new PRs may arrive."
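The three instructions above can be sketched as a single loop. This is a minimal illustration in Python, not the supervisor's actual implementation: `process_batch`, `post_heartbeat`, and `should_stop` are hypothetical callables standing in for whatever the agent runtime provides, and the heartbeat message format is invented for the example. Only the 5-minute and 10-minute intervals come from the proposal.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

POLL_INTERVAL_S = 5 * 60        # "sleep for 5 minutes" (from the proposal)
HEARTBEAT_INTERVAL_S = 10 * 60  # "every 10 minutes" (from the proposal)


@dataclass
class Stats:
    checked: int = 0
    merged: int = 0


def run_supervisor(
    process_batch: Callable[[], Tuple[int, int]],  # hypothetical: returns (checked, merged)
    post_heartbeat: Callable[[str], None],         # hypothetical: posts to the tracking issue
    should_stop: Callable[[], bool],               # hypothetical: the explicit stop signal
    now: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> Stats:
    """Poll until explicitly stopped; an empty batch never ends the loop."""
    stats = Stats()
    last_heartbeat: Optional[float] = None
    while not should_stop():
        checked, merged = process_batch()
        stats.checked += checked
        stats.merged += merged

        # Post a heartbeat on the first cycle and whenever 10 minutes have passed.
        t = now()
        if last_heartbeat is None or t - last_heartbeat >= HEARTBEAT_INTERVAL_S:
            post_heartbeat(
                f"time={t:.0f} checked={stats.checked} "
                f"merged={stats.merged} next_check={t + POLL_INTERVAL_S:.0f}"
            )
            last_heartbeat = t

        # Sleep whether or not anything was mergeable in this cycle.
        sleep(POLL_INTERVAL_S)
    return stats
```

Because the heartbeat check runs once per cycle, a batch that takes longer than 10 minutes would delay the next heartbeat. A real implementation might post heartbeats from a separate timer instead.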

Expected Impact

  • Should eliminate session-missing incidents caused by early exits (crashes and missed relaunches would still need watchdog recovery)
  • Ensures continuous PR merge throughput when CI is healthy
  • Enables System Watchdog to detect genuine session failures vs. normal operation
  • Reduces human intervention needed to restart the supervisor
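On the watchdog side, the heartbeat makes it possible to tell a quiet but healthy supervisor from a dead one. The following is a rough sketch of such a check, assuming the watchdog can read the timestamp of the supervisor's most recent heartbeat. The `session_status` function and the two-missed-beats tolerance are illustrative assumptions, not part of the existing System Watchdog.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HEARTBEAT_INTERVAL = timedelta(minutes=10)  # from the proposal
MISSED_BEATS_ALLOWED = 2                    # assumption: tolerance before flagging


def session_status(last_heartbeat: Optional[datetime], now: datetime) -> str:
    """Classify the supervisor session from its most recent heartbeat time."""
    if last_heartbeat is None:
        return "missing"  # no heartbeat was ever recorded
    age = now - last_heartbeat
    if age <= HEARTBEAT_INTERVAL * MISSED_BEATS_ALLOWED:
        return "alive"    # idle cycles with nothing to merge still land here
    return "stale"        # heartbeats stopped: likely a genuine failure


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(session_status(now - timedelta(minutes=5), now))
```

With this split, "alive" with zero merges points at CI or PR state, while "stale" or "missing" points at the session itself.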

Risk Assessment

  • Risk: Very low. The change only adds continuous-operation and logging instructions to the agent definition.
  • Potential downside: The supervisor uses more resources by running continuously. This is the intended behavior, since the PR merge supervisor should always be running.

This is a proposal from the Agent Evolution Supervisor. A human must approve this issue before the change will be implemented. To approve: remove the needs feedback label, add State/Verified, or comment with approval.


Automated by CleverAgents Bot
Supervisor: Agent Evolution | Agent: agent-evolution-pool-supervisor


Reference
cleveragents/cleveragents-core#10316