Proposal: improve ca-uat-tester and ca-bug-hunter — add backlog pressure throttling to prevent issue creation from outpacing implementation #3689

Closed
opened 2026-04-05 22:10:59 +00:00 by freemo · 1 comment
Owner

Agent Improvement Proposal

Pattern Detected

Type: workflow_fix
Affected Agents: ca-uat-tester, ca-bug-hunter
Evidence:

During the current build session (#3654) and the previous session (#3377), the UAT tester and bug hunter agents created issues at a rate that consistently outpaced the implementation pool's ability to close them:

  • Open bugs: 536 → 809 (+273 new bugs in a single session) — from timeline PR #3446
  • Milestone completion percentages DECLINED across all milestones despite active implementation work:
    • M3 (v3.2.0): 67% → 63%
    • M4 (v3.3.0): 63% → 60%
    • M5 (v3.4.0): 70% → 68%
    • M6 (v3.5.0): 67% → 55%
    • M7 (v3.6.0): 55% → 43%
    • M8 (v3.7.0): 52% → 43%
  • Open PRs: 92 → 170 (+78 new PRs from agent-driven activity) — further overwhelming the review pipeline
  • The timeline updater explicitly noted: "Completion percentages declined across M3 and M4 because agent-driven issue creation outpacing closures"

Neither ca-uat-tester.md nor ca-bug-hunter.md currently contains any mechanism to check the size of the existing bug backlog before dispatching new workers. Both agents file issues regardless of how many are already open.

Proposed Change

Add a backlog pressure check to both ca-uat-tester and ca-bug-hunter pool supervisor modes. Before dispatching new worker instances in each cycle:

  1. Query Forgejo for the count of open issues with Type/Bug label (or UAT-filed issues)
  2. Query Forgejo for the count of open PRs (indicating implementation pipeline saturation)
  3. If open bugs exceed a configurable threshold (suggested: 200) OR open PRs exceed a threshold (suggested: 100), pause new worker dispatches for that cycle
  4. Log a health signal explaining the pause: "Backlog pressure: {N} open bugs, {M} open PRs — pausing new worker dispatches until implementation catches up"
  5. Continue monitoring in subsequent cycles and resume dispatching when the backlog drops below the threshold

This change applies only to pool supervisor mode: individual workers already assigned to a feature area would continue their current work, and only new worker dispatches would be throttled.
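The check described in the steps above could be sketched roughly as follows. This is a minimal illustration, not the supervisors' actual code: the names `BacklogThresholds`, `backlog_pressure_message`, and `workers_to_dispatch` are hypothetical, the open-bug and open-PR counts are assumed to have been fetched from Forgejo already, and "exceed" is read as strictly greater than the threshold. The log wording is adapted from step 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BacklogThresholds:
    """Hypothetical config holder; defaults are the values suggested in the proposal."""
    max_open_bugs: int = 200
    max_open_prs: int = 100


def backlog_pressure_message(
    open_bugs: int,
    open_prs: int,
    thresholds: BacklogThresholds = BacklogThresholds(),
) -> Optional[str]:
    """Return a health-signal message if dispatch should pause, else None.

    The counts are assumed to come from Forgejo queries made by the caller.
    """
    over_bugs = open_bugs > thresholds.max_open_bugs
    over_prs = open_prs > thresholds.max_open_prs
    if over_bugs or over_prs:
        return (
            f"Backlog pressure: {open_bugs} open bugs, {open_prs} open PRs; "
            "pausing new worker dispatches until implementation catches up"
        )
    return None


def workers_to_dispatch(
    requested: int,
    open_bugs: int,
    open_prs: int,
    thresholds: BacklogThresholds = BacklogThresholds(),
) -> int:
    """Return how many NEW workers to dispatch this cycle.

    Workers that are already running are not affected; only new dispatches
    are throttled. The check is stateless, so dispatching resumes on its own
    in the first cycle where both counts are back at or under the thresholds.
    """
    message = backlog_pressure_message(open_bugs, open_prs, thresholds)
    if message is not None:
        print(message)  # stand-in for whatever health-signal logger is used
        return 0
    return requested


if __name__ == "__main__":
    # Figures reported in the evidence section: 809 open bugs, 170 open PRs.
    print(workers_to_dispatch(requested=3, open_bugs=809, open_prs=170))
```

With the figures from the evidence section (809 open bugs, 170 open PRs), both suggested thresholds are exceeded, so this sketch would dispatch no new workers for that cycle.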

Expected Impact

  • Milestone completion percentages should stabilize or increase instead of declining
  • The implementation pool will have time to process existing bugs before new ones are filed
  • The PR review pipeline will not be overwhelmed with more PRs than reviewers can handle
  • Overall system throughput should improve as resources are not wasted on filing issues that can't be addressed

Risk Assessment

  • Low risk: The change only affects the dispatch rate, not the quality of testing. All feature areas will still eventually be tested.
  • Potential downside: If the threshold is set too low, legitimate bugs may be discovered later than they otherwise would be. The suggested threshold of 200 open bugs is conservative — it still allows a substantial backlog.
  • Mitigation: The threshold should be configurable and the pause should be logged clearly so it's visible in health signals.

This is a proposal from the agent evolver. A human must approve this issue before the change will be implemented. To approve: remove the needs feedback label, add State/Verified, or comment with approval.


Automated by CleverAgents Bot
Supervisor: Agent Evolver | Agent: ca-agent-evolver

Author
Owner

@HAL9000 closing this ticket as "won't do". As long as it finds legitimate issues, it's OK if the issue count explodes, since these aren't feature suggestions. The bigger issue is that fixing one issue may also fix several others, so with a large backlog we may end up spending most of our time rerunning tests to see whether each regression still exists. But we can address that once it becomes a problem.

Reference
cleveragents/cleveragents-core#3689