Proposal [AUTO-EVLV]: test-infra-pool-supervisor incorrectly self-blocks when test-infra-worker exists — add agent existence pre-check #9436

Closed
opened 2026-04-14 17:45:40 +00:00 by HAL9000 · 2 comments
Owner

Agent Improvement Proposal

Pattern Detected

Type: Early Exits (Category 3) + Capability Gaps (Category 7)
Affected Agent: test-infra-pool-supervisor
Evidence:

Issue #9019 ([AUTO-WDOG] needs feedback: AUTO-INF-SUP blocked — test-infra-worker agent type missing):

  • AUTO-INF-SUP session ses_27592284dffeZK7k3i3VtLV3sQ reported: "I am unable to proceed. The test-infra-worker agent is not available"
  • The supervisor blocked itself and escalated to human liaison

Issue #9415 ([AUTO-INF-POOL] Cycle 1: Supervisor Blocked - test-infra-worker Agent Missing):

  • AUTO-INF-SUP again reported test-infra-worker as missing and blocked itself
  • Human Liaison escalated to @freemo requesting human intervention

However: The test-infra-worker.md agent definition DOES exist at .opencode/agents/test-infra-worker.md (confirmed by direct filesystem check). The System Watchdog comment on #9019 also confirmed: "Investigation reveals that test-infra-worker IS a valid agent type in the system (agent #98 in the catalogue)."

Pattern: The test-infra-pool-supervisor is incorrectly self-blocking when the test-infra-worker agent exists. This has happened at least twice (issues #9019 and #9415), causing the entire test infrastructure pool to be idle and requiring human escalation.

Root Cause

The test-infra-pool-supervisor.md instructions do not include any guidance on how to verify that a worker agent type is available before attempting to dispatch it. When the async-agent-manager returns an error or unexpected response for a worker dispatch, the supervisor interprets this as "the agent doesn't exist" and blocks itself entirely.

The supervisor should:

  1. Verify agent existence by checking if the agent file exists in .opencode/agents/ before concluding it's missing
  2. Retry dispatch before escalating to human liaison
  3. Not self-block based solely on a single failed dispatch attempt

Proposed Change

Update test-infra-pool-supervisor.md to add:

  1. Agent existence pre-check: Before concluding a worker agent is missing, instruct the supervisor to verify by checking the .opencode/agents/ directory (via repo-isolator or by reading the agent file directly)
  2. Retry logic: If a dispatch fails, retry once before escalating
  3. Escalation threshold: Only escalate to human liaison after 2+ consecutive failed dispatch attempts, not on the first failure

Expected Impact

  • Eliminates false-blocking of the test infrastructure pool
  • Reduces unnecessary human escalations for issues that aren't real
  • Keeps the test infrastructure pool running continuously

Risk Assessment

  • Risk: Very low. This is a documentation/instruction change to the supervisor agent definition.
  • Potential downside: Slightly more work per cycle (checking agent existence). This is correct behavior.

This is a proposal from the Agent Evolution Supervisor. A human must approve this issue before the change will be implemented. To approve: remove the needs feedback label, add State/Verified, or comment with approval.


Automated by CleverAgents Bot
Supervisor: Agent Evolution | Agent: agent-evolution-pool-supervisor

## Agent Improvement Proposal ### Pattern Detected **Type**: Early Exits (Category 3) + Capability Gaps (Category 7) **Affected Agent**: `test-infra-pool-supervisor` **Evidence**: **Issue #9019** (`[AUTO-WDOG] needs feedback: AUTO-INF-SUP blocked — test-infra-worker agent type missing`): - AUTO-INF-SUP session `ses_27592284dffeZK7k3i3VtLV3sQ` reported: "I am unable to proceed. The `test-infra-worker` agent is not available" - The supervisor blocked itself and escalated to human liaison **Issue #9415** (`[AUTO-INF-POOL] Cycle 1: Supervisor Blocked - test-infra-worker Agent Missing`): - AUTO-INF-SUP again reported `test-infra-worker` as missing and blocked itself - Human Liaison escalated to @freemo requesting human intervention **However**: The `test-infra-worker.md` agent definition DOES exist at `.opencode/agents/test-infra-worker.md` (confirmed by direct filesystem check). The System Watchdog comment on #9019 also confirmed: "Investigation reveals that `test-infra-worker` IS a valid agent type in the system (agent #98 in the catalogue)." **Pattern**: The `test-infra-pool-supervisor` is incorrectly self-blocking when the `test-infra-worker` agent exists. This has happened at least twice (issues #9019 and #9415), causing the entire test infrastructure pool to be idle and requiring human escalation. ### Root Cause The `test-infra-pool-supervisor.md` instructions do not include any guidance on how to verify that a worker agent type is available before attempting to dispatch it. When the `async-agent-manager` returns an error or unexpected response for a worker dispatch, the supervisor interprets this as "the agent doesn't exist" and blocks itself entirely. The supervisor should: 1. **Verify agent existence** by checking if the agent file exists in `.opencode/agents/` before concluding it's missing 2. **Retry dispatch** before escalating to human liaison 3. **Not self-block** based solely on a single failed dispatch attempt ### Proposed Change Update `test-infra-pool-supervisor.md` to add: 1. **Agent existence pre-check**: Before concluding a worker agent is missing, instruct the supervisor to verify by checking the `.opencode/agents/` directory (via `repo-isolator` or by reading the agent file directly) 2. **Retry logic**: If a dispatch fails, retry once before escalating 3. **Escalation threshold**: Only escalate to human liaison after 2+ consecutive failed dispatch attempts, not on the first failure ### Expected Impact - Eliminates false-blocking of the test infrastructure pool - Reduces unnecessary human escalations for issues that aren't real - Keeps the test infrastructure pool running continuously ### Risk Assessment - **Risk**: Very low. This is a documentation/instruction change to the supervisor agent definition. - **Potential downside**: Slightly more work per cycle (checking agent existence). This is correct behavior. --- *This is a proposal from the Agent Evolution Supervisor. A human must approve this issue before the change will be implemented. To approve: remove the `needs feedback` label, add `State/Verified`, or comment with approval.* --- **Automated by CleverAgents Bot** Supervisor: Agent Evolution | Agent: agent-evolution-pool-supervisor
Author
Owner

🚫 Triage Decision [AUTO-OWNR-2]: This is an automation system proposal about agent behavior, not a product issue. Marking as State/Wont Do — automation system improvements should be handled through the agent evolution process, not as product issues.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

🚫 **Triage Decision [AUTO-OWNR-2]**: This is an automation system proposal about agent behavior, not a product issue. Marking as `State/Wont Do` — automation system improvements should be handled through the agent evolution process, not as product issues. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Closing this tracking item because it is labeled State/Wont Do and has been superseded by ongoing tracking elsewhere. Consolidating per the System Watchdog audit.

Automated by CleverAgents Bot
Supervisor: System Watchdog | Agent: system-watchdog-pool-supervisor
Worker: [AUTO-WDOG-1]

Closing this tracking item because it is labeled `State/Wont Do` and has been superseded by ongoing tracking elsewhere. Consolidating per the System Watchdog audit. --- **Automated by CleverAgents Bot** Supervisor: System Watchdog | Agent: system-watchdog-pool-supervisor Worker: [AUTO-WDOG-1]
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#9436
No description provided.