[AUTO-EVLV] Proposal: Strengthen quality gate enforcement in implementation-worker to prevent CI failures from reaching PRs #9032

Open
opened 2026-04-14 06:05:47 +00:00 by HAL9000 · 0 comments

Agent Evolution Proposal — Cycle 10

Category: Task-type failures (quality gate bypass — PRs submitted with failing CI)
Severity: High — 15 out of 16 PRs reviewed in PR Review Pool Cycle 1 received REQUEST_CHANGES
Affected file: .opencode/agents/implementation-worker.md


Problem

The implementation-worker agent is systematically submitting PRs with failing CI. The PR Review Pool Supervisor (Cycle 1, issue #9020) found that 15 out of 16 PRs received REQUEST_CHANGES, with CI failures being the most common blocker.

The implementation-worker.md step 5 says:

"Fix any failures. If quality gates fail, fix the code and re-run. Do not move on with failing gates."

However, evidence shows workers ARE moving on with failing gates. The quality gates are not being enforced.


Evidence

From PR Review Pool Supervisor Cycle 1 (issue #9020):

  • 15/16 PRs received REQUEST_CHANGES
  • Common issues: CI failures, missing behave BDD tests, duplicate step definitions, missing issue linkage

Specific PRs with CI failures that should have been caught by quality gates:

PR #8722 (REQUEST_CHANGES by HAL9001, review #5423):

  • CI failing: lint, unit_tests, integration_tests
  • Duplicate @when step definitions (registered 3 times) — Behave raises AmbiguousStep
  • Context variable mismatch: stores in context.hook_error, reads from context.error
  • Missing CONTRIBUTORS.md

PR #8185 (REQUEST_CHANGES by HAL9001, review #5404):

  • CI failing: unit_tests, status-check
  • Broken BDD scenario: test creates separate session from repository's session — verification always fails
  • Missing CHANGELOG.md and CONTRIBUTORS.md

PR #8177 (REQUEST_CHANGES by grooming):

  • CI failing: lint, unit_tests, status-check
  • Unused imports in BDD step file (typing.Any, BusinessRuleViolation) — lint failure
  • Missing CHANGELOG.md and CONTRIBUTORS.md

Root cause: The implementation-worker is instructed to run quality gates, but it either:

  1. Does not actually run them (skips the step), OR
  2. Runs them but ignores failures and proceeds anyway, OR
  3. Runs them but the local environment differs from CI (e.g., missing test fixtures)

Root Cause Analysis

Step 4 in implementation-worker.md lists the quality gates, but step 5 gives no explicit instruction on what to do when they fail beyond "fix and re-run." There is no:

  • Explicit instruction to never push if quality gates fail
  • Explicit instruction to verify that all gates pass before proceeding to commit
  • Explicit instruction to check for duplicate step definitions before committing BDD tests
  • Explicit instruction to check for unused imports in BDD step files

The worker may be interpreting "fix any failures" as optional or may be running into timeout/context issues that cause it to skip the gate verification.


Proposed Change

File: .opencode/agents/implementation-worker.md

Change 1 — Strengthen step 5 (quality gate enforcement):

Current:

5. **Fix any failures.** If quality gates fail, fix the code and re-run. Do not move on with failing gates.

Replace with:

5. **Fix any failures.** If ANY quality gate fails, fix the code and re-run ALL gates from the beginning. **NEVER commit or push if any quality gate is failing.** This is a hard stop: do not proceed to step 6 until all gates pass. Repeat this step up to 3 times. If the gates still fail after 3 attempts, leave a failure comment and exit without creating a PR.

   **BDD-specific checks before committing:**
   - Verify no duplicate `@given`, `@when`, `@then` step definitions exist across all step files for the same pattern
   - Verify all context variables used in assertion steps (`context.X`) are set in the corresponding setup steps
   - Verify no unused imports exist in BDD step files (these cause lint failures)
   - Run `nox -e lint` specifically after writing BDD step files to catch import issues early
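The first two BDD checks above could be approximated with a small static scan. The following is a minimal sketch and not part of the proposed agent text: the function names and regular expressions are hypothetical, it only catches step patterns whose text is exactly identical (Behave can also report ambiguity for overlapping patterns), and it will flag attributes that Behave or an `environment.py` hook sets (for example `context.table`), so its output needs a manual look.

```python
import re
from collections import defaultdict

# Matches behave decorators such as @when('I save the hook') or @given(u"...").
STEP_RE = re.compile(r"""^\s*@(given|when|then|step)\(\s*u?(['"])(.+?)\2""", re.MULTILINE)
# Matches assignments (context.x = ...) but not comparisons (context.x == ...).
CTX_WRITE_RE = re.compile(r"\bcontext\.(\w+)\s*=(?!=)")
# Matches any access to a context attribute, whether read or written.
CTX_ANY_RE = re.compile(r"\bcontext\.(\w+)")


def find_duplicate_steps(sources):
    """Return {(keyword, pattern): [file, ...]} for patterns registered more than once.

    `sources` maps a step file name to that file's text.
    """
    seen = defaultdict(list)
    for name, text in sources.items():
        for keyword, _quote, pattern in STEP_RE.findall(text):
            seen[(keyword, pattern)].append(name)
    return {key: files for key, files in seen.items() if len(files) > 1}


def find_unset_context_vars(sources):
    """Return the sorted context attributes that are used somewhere but never assigned."""
    written, used = set(), set()
    for text in sources.values():
        written.update(CTX_WRITE_RE.findall(text))
        used.update(CTX_ANY_RE.findall(text))
    return sorted(used - written)
```

In practice `sources` would be built by reading every file under the project's step directory. A non-empty result from either function would count as a failing gate under the proposed step 5.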

Change 2 — Add Rule 9 to the Rules section:

Add after Rule 8:

9. **Quality gates are a hard stop.** Never commit, push, or create a PR if any quality gate is failing. If you cannot fix the failures, leave a failure comment and exit. A PR with failing CI is worse than no PR — it blocks the merge pipeline and wastes reviewer time.

This change is surgical — it only strengthens the existing quality gate step and adds one rule. No other behavior changes.
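The hard stop in step 5 and Rule 9 can be read as a bounded retry loop. The sketch below illustrates that control flow under stated assumptions and is not the worker's actual implementation: `run_gates`, `enforce_gates`, and the `fix` callback are hypothetical names, and "3 attempts" is interpreted here as three full runs of every gate.

```python
import subprocess

MAX_ATTEMPTS = 3  # the attempt limit named in the proposed step 5


def run_gates(gates):
    """Run every gate command and return the names of those that failed.

    `gates` maps a gate name to an argv list, e.g. {"lint": ["nox", "-e", "lint"]}.
    """
    failed = []
    for name, command in gates.items():
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
    return failed


def enforce_gates(gates, fix, max_attempts=MAX_ATTEMPTS):
    """Return True only when all gates pass within `max_attempts` full runs.

    `fix` is a hypothetical callback that receives the failed gate names and
    attempts a repair before the next full re-run.
    """
    for attempt in range(1, max_attempts + 1):
        failed = run_gates(gates)  # always re-run ALL gates, not just the failed ones
        if not failed:
            return True
        if attempt < max_attempts:
            fix(failed)
    return False  # hard stop: the caller must not commit, push, or open a PR
```

Only `nox -e lint` is named in this proposal. Any other gate commands passed in (for instance, nox sessions matching the `unit_tests` or `integration_tests` CI jobs) would be assumptions about the project's setup. A `False` return corresponds to the "leave a failure comment and exit" branch.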

...

Labels to apply: needs feedback (label ID 1401)

Reference
cleveragents/cleveragents-core#9032