Proposal: improve ca-uat-tester — add worker completion reporting and supervisor timeout handling #2030

Open
opened 2026-04-03 02:43:34 +00:00 by freemo · 1 comment
Owner

Agent Improvement Proposal

Pattern Detected

Type: Workflow fix — UAT workers never report completion, supervisor shows 0% indefinitely
Affected Agent: ca-uat-tester (Pool Supervisor + Worker Mode)
Evidence: Across multiple UAT supervisor sessions during v3.7.0, the supervisor consistently reports 0% area completion despite workers being active and filing bugs:

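| Report | Cycle | Active Workers | Areas Completed | Bugs Filed |
|---|---|---|---|---|
| #1459 | 61 | 16/16 | 0/16 | 39 |
| #1478 | 141 | 16/16 | 0/16 | 30 |
| #1578 | 361 | 16/16 | 0/16 | 9 |
| #1585 | 391 | 16/16 | 0/16 | 9 |
| #1586 | 401 | 16/16 | 0/16 | 9 |
| #1600 | 411 | 16/16 | 0/16 | 3 |
| #1675 | 431 | 16/16 | 0/16 | 2 |
| #2017 | 611 | 16/16 | 0/16 | 8 |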

Key observations:

  1. Workers ARE dispatched (16/16 active consistently)
  2. Workers ARE filing bugs (39 bugs at cycle 61, generally decreasing over time as duplicates are caught)
  3. But zero areas are ever marked as completed across 600+ cycles spanning ~8 hours
  4. The latest session (#2007) shows 8 workers posted "Starting" comments but none posted "Completed" comments
  5. All feature areas remain in "🔄" (in-progress) state indefinitely

Root Cause Analysis: The disconnect appears to be between worker execution and supervisor completion detection:

  • Workers start, post a "Starting" comment, begin analysis, and file bugs, but then one or more of the following happens:
    (a) They hit context limits before they can exit cleanly with a result summary
    (b) They never explicitly post a "Completed" comment that the supervisor can detect
    (c) The supervisor's session status polling fails to detect that the worker has finished
  • The agent definition instructs workers to "exit" after testing their area, but doesn't provide explicit guidance on HOW to signal completion back to the supervisor
  • The supervisor's monitoring loop checks if session is completed or errored but the pseudocode doesn't specify what constitutes "completed" — if workers exhaust their context window, they may just stop without a clean exit signal

Proposed Change

Modify ca-uat-tester.md to address the completion gap:

  1. Add explicit worker completion protocol — Workers MUST post a structured "Completed" comment on the tracking issue before exiting, with a specific format the supervisor can parse:

    ## UAT Worker COMPLETED — <feature_area>
    BUGS_FILED: <N>
    ISSUE_NUMBERS: [#X, #Y, ...]
    STATUS: completed|failed|timeout
    
  2. Add worker time budget — Workers should be instructed to budget their analysis time. After completing their analysis passes, they MUST post the completion comment and exit. The instruction should emphasize: "Do NOT continue analyzing indefinitely. Complete your passes, file bugs, post completion, and exit."

  3. Add supervisor timeout handling — The supervisor should implement a maximum worker lifetime (e.g., if a worker has been active for more than 30 minutes without posting a completion comment, mark the area as "tested with timeout" and move on). This prevents the supervisor from waiting indefinitely for workers that hit context limits.

  4. Add supervisor completion detection fallback — If the supervisor can't detect session completion via the API, it should also check for the structured "Completed" comment on the tracking issue as a fallback signal.
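
The structured comment in item 1 is only useful if both sides agree on it exactly, so a minimal sketch of the worker-side rendering and supervisor-side parsing may help. Everything here is illustrative: `WorkerReport`, `render_report`, and `parse_report` are hypothetical names, not part of `ca-uat-tester`, and how comments are actually posted to or fetched from the tracking issue is left out.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Status values taken from the proposed format: completed|failed|timeout
VALID_STATUSES = {"completed", "failed", "timeout"}


@dataclass(frozen=True)
class WorkerReport:
    """Hypothetical container for the fields in the proposed comment."""
    feature_area: str
    bugs_filed: int
    issue_numbers: Tuple[int, ...]
    status: str


def render_report(report: WorkerReport) -> str:
    """Worker side: build the comment body in the proposed format."""
    issues = ", ".join(f"#{n}" for n in report.issue_numbers)
    # The dash in the header is copied verbatim from the proposed format.
    return (
        f"## UAT Worker COMPLETED — {report.feature_area}\n"
        f"BUGS_FILED: {report.bugs_filed}\n"
        f"ISSUE_NUMBERS: [{issues}]\n"
        f"STATUS: {report.status}\n"
    )


_HEADER = re.compile(r"^## UAT Worker COMPLETED — (?P<area>.+?)\s*$", re.MULTILINE)
_BUGS = re.compile(r"^BUGS_FILED:\s*(?P<n>\d+)\s*$", re.MULTILINE)
_ISSUES = re.compile(r"^ISSUE_NUMBERS:\s*\[(?P<list>[^\]]*)\]\s*$", re.MULTILINE)
_STATUS = re.compile(r"^STATUS:\s*(?P<status>\w+)\s*$", re.MULTILINE)


def parse_report(comment_body: str) -> Optional[WorkerReport]:
    """Supervisor side: return a WorkerReport, or None when the comment
    does not match the format (for example, a "Starting" comment)."""
    header = _HEADER.search(comment_body)
    bugs = _BUGS.search(comment_body)
    issues = _ISSUES.search(comment_body)
    status = _STATUS.search(comment_body)
    if not (header and bugs and issues and status):
        return None
    if status["status"] not in VALID_STATUSES:
        return None
    numbers = tuple(int(n) for n in re.findall(r"#(\d+)", issues["list"]))
    return WorkerReport(header["area"], int(bugs["n"]), numbers, status["status"])
```

Returning `None` for anything that does not match, rather than raising, lets the supervisor scan every comment on the tracking issue and simply skip the ones that are not completion reports.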

Expected Impact

  • Areas will actually be marked as completed, giving accurate progress tracking
  • The supervisor can detect and recover from worker timeouts
  • Progress reports will show meaningful completion percentages instead of perpetual 0%
  • Workers that hit context limits won't block the entire testing pipeline
  • Human observers can see actual UAT progress

Risk Assessment

  • Low risk: These changes add completion signaling and timeout handling without modifying the analysis logic itself.
  • Potential concern: The 30-minute timeout might be too short for complex feature areas. However, a worker that hasn't completed in 30 minutes is likely stuck or has hit context limits, so timing out and re-dispatching is better than waiting indefinitely.
  • Potential concern: Adding the completion comment protocol adds a small amount of overhead to each worker, but this is negligible compared to the analysis work.
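
One way to soften the timeout concern above is to make the limit configurable per feature area instead of fixing it at 30 minutes. The sketch below combines that with the completion fallback: it checks the structured comment first, then the session status, then the elapsed time. All names (`AreaState`, `TIMEOUT_OVERRIDES`, `resolve_area_state`) and the session status strings are assumptions for illustration; the real session API may report different values, and the order of the checks is a design choice rather than something the proposal specifies.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Dict, Optional

DEFAULT_TIMEOUT = timedelta(minutes=30)  # default from the proposal

# Hypothetical per-area overrides for areas known to need longer.
TIMEOUT_OVERRIDES: Dict[str, timedelta] = {
    "complex-area-example": timedelta(minutes=60),
}


class AreaState(Enum):
    IN_PROGRESS = "in progress"
    COMPLETED = "completed"
    FAILED = "failed"
    TIMED_OUT = "tested with timeout"


def resolve_area_state(
    area: str,
    started_at: datetime,
    now: datetime,
    session_status: str,            # assumed values: "running", "completed", "errored"
    reported_status: Optional[str], # STATUS field of the completion comment, if any
) -> AreaState:
    """Decide an area's state from the three available signals."""
    # 1. A structured completion comment is an explicit signal from the worker.
    if reported_status == "completed":
        return AreaState.COMPLETED
    if reported_status == "failed":
        return AreaState.FAILED
    if reported_status == "timeout":
        return AreaState.TIMED_OUT
    # 2. Otherwise fall back to what the session API reports.
    if session_status == "completed":
        return AreaState.COMPLETED
    if session_status == "errored":
        return AreaState.FAILED
    # 3. No signal at all: apply the maximum worker lifetime.
    limit = TIMEOUT_OVERRIDES.get(area, DEFAULT_TIMEOUT)
    if now - started_at > limit:
        return AreaState.TIMED_OUT
    return AreaState.IN_PROGRESS
```

Passing `now` in as an argument, instead of reading the clock inside the function, keeps the decision easy to test and lets the supervisor use one consistent timestamp for all 16 areas in a monitoring cycle.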

This is a proposal from the agent evolver. A human must approve this issue before the change will be implemented. To approve: remove the needs feedback label, add State/Verified, or comment with approval.


Automated by CleverAgents Bot
Supervisor: Agent Evolver | Agent: ca-agent-evolver

Author
Owner

approved

Reference
cleveragents/cleveragents-core#2030