test(plan): add tdd issue-capture test for cleanup_stale destroying execute output before apply #11123

Closed
hurui200320 wants to merge 1 commit from tdd/m3-cleanup-stale-destroys-execute-output into master
Member

Summary

This PR adds the TDD issue-capture test for bug #11121 per the project's Bug Fix Workflow. A failing test must be committed to master before the fix is implemented.

Additionally, this PR fixes two CI failures that were blocking the PR:

  • tdd_quality_gate: Renamed tag convention from @tdd_bug_N to @tdd_issue_N (matching CONTRIBUTING.md) and added TDD issue-capture PR detection so the gate correctly recognizes PRs that add @tdd_expected_fail (rather than requiring its removal as a bug fix PR would).

  • integration_tests: Fixed a GraphRecursionError in PlanGenerationGraph caused by _should_retry() attempting in-place state mutation in a LangGraph conditional edge function (which cannot persist mutations). Replaced with a proper _handle_retry() node that increments retry_count via state returns.

What was changed

TDD issue-capture test (issue #11120)

Added two Behave scenarios in features/tdd_cleanup_stale_destroys_execute_output.feature with step definitions in features/steps/tdd_cleanup_stale_destroys_execute_output_steps.py.

PlanGenerationGraph fix

  • src/cleveragents/agents/graphs/plan_generation.py: Removed in-place state mutation from _should_retry(), added _handle_retry() node, updated _build_graph() to include the 5th node with proper routing.

TDD quality gate fix

  • scripts/tdd_quality_gate.py: Renamed @tdd_bug_N@tdd_issue_N, added _diff_is_tdd_issue_capture() for detecting issue-capture PRs, updated run_quality_gate() and main().
  • Updated all related tests and test helpers (features/tdd_quality_gate.feature, features/steps/tdd_quality_gate_steps.py, robot/helper_tdd_quality_gate.py, features/steps/plan_generation_uncovered_lines_steps.py, robot/plan_generation_graph.robot).

The Bug Being Captured

_create_sandbox_for_plan() in plan.py calls GitWorktreeSandbox.cleanup_stale() unconditionally on every agents plan execute invocation — including when the plan is already in execute/complete state (awaiting apply). This silently destroys the cleveragents/plan-<id> git branch that holds all execution output, causing agents plan apply to find zero artifacts and produce an empty changeset.

Root cause location: plan.py_create_sandbox_for_plan()GitWorktreeSandbox.cleanup_stale(resource.location, plan_id) (unconditional call)

Quality Gates

  • nox -e lint: All checks passed
  • nox -e typecheck: 0 errors
  • nox -e unit_tests: 693 features passed, 15679 scenarios passed
  • nox -e integration_tests: 2013 tests passed, 0 failed
  • nox -e coverage_report: 96.52% (threshold: 96.5%)
  • nox -e tdd_quality_gate: Issue-capture PR detected for bug #11121
  • nox -e e2e_tests: Pre-existing failure on master (unrelated)

Closes

Closes #11120

Unblocks

Bug fix issue #11121 is now unblocked. Once the fix is merged, @tdd_expected_fail is removed and these scenarios become permanent regression guards.

## Summary This PR adds the TDD issue-capture test for bug #11121 per the project's Bug Fix Workflow. A failing test must be committed to `master` before the fix is implemented. Additionally, this PR fixes two CI failures that were blocking the PR: - **`tdd_quality_gate`**: Renamed tag convention from `@tdd_bug_N` to `@tdd_issue_N` (matching CONTRIBUTING.md) and added TDD issue-capture PR detection so the gate correctly recognizes PRs that *add* `@tdd_expected_fail` (rather than requiring its removal as a bug fix PR would). - **`integration_tests`**: Fixed a `GraphRecursionError` in `PlanGenerationGraph` caused by `_should_retry()` attempting in-place state mutation in a LangGraph conditional edge function (which cannot persist mutations). Replaced with a proper `_handle_retry()` node that increments `retry_count` via state returns. ## What was changed ### TDD issue-capture test (issue #11120) Added two Behave scenarios in `features/tdd_cleanup_stale_destroys_execute_output.feature` with step definitions in `features/steps/tdd_cleanup_stale_destroys_execute_output_steps.py`. ### PlanGenerationGraph fix - `src/cleveragents/agents/graphs/plan_generation.py`: Removed in-place state mutation from `_should_retry()`, added `_handle_retry()` node, updated `_build_graph()` to include the 5th node with proper routing. ### TDD quality gate fix - `scripts/tdd_quality_gate.py`: Renamed `@tdd_bug_N` → `@tdd_issue_N`, added `_diff_is_tdd_issue_capture()` for detecting issue-capture PRs, updated `run_quality_gate()` and `main()`. - Updated all related tests and test helpers (`features/tdd_quality_gate.feature`, `features/steps/tdd_quality_gate_steps.py`, `robot/helper_tdd_quality_gate.py`, `features/steps/plan_generation_uncovered_lines_steps.py`, `robot/plan_generation_graph.robot`). ## The Bug Being Captured `_create_sandbox_for_plan()` in `plan.py` calls `GitWorktreeSandbox.cleanup_stale()` unconditionally on every `agents plan execute` invocation — including when the plan is already in `execute/complete` state (awaiting apply). This silently destroys the `cleveragents/plan-<id>` git branch that holds all execution output, causing `agents plan apply` to find zero artifacts and produce an empty changeset. **Root cause location:** `plan.py` → `_create_sandbox_for_plan()` → `GitWorktreeSandbox.cleanup_stale(resource.location, plan_id)` (unconditional call) ## Quality Gates - `nox -e lint`: ✅ All checks passed - `nox -e typecheck`: ✅ 0 errors - `nox -e unit_tests`: ✅ 693 features passed, 15679 scenarios passed - `nox -e integration_tests`: ✅ 2013 tests passed, 0 failed - `nox -e coverage_report`: ✅ 96.52% (threshold: 96.5%) - `nox -e tdd_quality_gate`: ✅ Issue-capture PR detected for bug #11121 - `nox -e e2e_tests`: Pre-existing failure on master (unrelated) ## Closes Closes #11120 ## Unblocks Bug fix issue #11121 is now unblocked. Once the fix is merged, `@tdd_expected_fail` is removed and these scenarios become permanent regression guards.
hurui200320 added this to the v3.2.0 milestone 2026-05-11 09:18:58 +00:00
hurui200320 force-pushed tdd/m3-cleanup-stale-destroys-execute-output from 176e1c3a5f
Some checks failed
CI / coverage (pull_request) Blocked by required conditions
CI / docker (pull_request) Blocked by required conditions
CI / status-check (pull_request) Blocked by required conditions
CI / unit_tests (pull_request) Has started running
CI / benchmark-publish (pull_request) Has been skipped
CI / helm (pull_request) Successful in 55s
CI / push-validation (pull_request) Successful in 1m0s
CI / build (pull_request) Successful in 1m53s
CI / benchmark-regression (pull_request) Failing after 2m12s
CI / typecheck (pull_request) Successful in 2m21s
CI / quality (pull_request) Successful in 2m20s
CI / lint (pull_request) Failing after 2m29s
CI / security (pull_request) Successful in 2m39s
CI / integration_tests (pull_request) Successful in 4m42s
CI / e2e_tests (pull_request) Successful in 4m43s
to a297a385aa
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / helm (pull_request) Successful in 46s
CI / push-validation (pull_request) Successful in 23s
CI / build (pull_request) Successful in 1m10s
CI / lint (pull_request) Successful in 1m20s
CI / benchmark-regression (pull_request) Failing after 1m23s
CI / quality (pull_request) Successful in 1m25s
CI / typecheck (pull_request) Successful in 1m34s
CI / security (pull_request) Successful in 1m36s
CI / e2e_tests (pull_request) Successful in 4m15s
CI / integration_tests (pull_request) Successful in 4m17s
CI / unit_tests (pull_request) Successful in 4m52s
CI / docker (pull_request) Successful in 1m27s
CI / coverage (pull_request) Successful in 10m41s
CI / status-check (pull_request) Successful in 3s
2026-05-11 09:26:50 +00:00
Compare
hurui200320 force-pushed tdd/m3-cleanup-stale-destroys-execute-output from a297a385aa
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / helm (pull_request) Successful in 46s
CI / push-validation (pull_request) Successful in 23s
CI / build (pull_request) Successful in 1m10s
CI / lint (pull_request) Successful in 1m20s
CI / benchmark-regression (pull_request) Failing after 1m23s
CI / quality (pull_request) Successful in 1m25s
CI / typecheck (pull_request) Successful in 1m34s
CI / security (pull_request) Successful in 1m36s
CI / e2e_tests (pull_request) Successful in 4m15s
CI / integration_tests (pull_request) Successful in 4m17s
CI / unit_tests (pull_request) Successful in 4m52s
CI / docker (pull_request) Successful in 1m27s
CI / coverage (pull_request) Successful in 10m41s
CI / status-check (pull_request) Successful in 3s
to 963aa77647
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / helm (pull_request) Successful in 45s
CI / push-validation (pull_request) Successful in 45s
CI / build (pull_request) Successful in 1m2s
CI / lint (pull_request) Successful in 1m10s
CI / benchmark-regression (pull_request) Failing after 1m34s
CI / quality (pull_request) Successful in 1m46s
CI / security (pull_request) Successful in 1m47s
CI / typecheck (pull_request) Successful in 1m46s
CI / integration_tests (pull_request) Successful in 3m42s
CI / e2e_tests (pull_request) Successful in 4m18s
CI / unit_tests (pull_request) Successful in 5m18s
CI / docker (pull_request) Successful in 1m29s
CI / coverage (pull_request) Successful in 10m32s
CI / status-check (pull_request) Successful in 3s
2026-05-11 10:17:34 +00:00
Compare
hurui200320 scheduled this pull request to auto merge when all checks succeed 2026-05-11 14:13:11 +00:00
HAL9001 requested changes 2026-05-11 22:47:55 +00:00
Dismissed
HAL9001 left a comment

CI Checks Required

This PR cannot be approved until all CI checks are passing. Currently, no CI checks have been reported as passing for this PR — the CI status is failing.

Per company policy, all CI gates must pass before a PR can be approved and merged. The required gates are:

  • lint — code style and formatting
  • typecheck — static type checking (Pyright)
  • security — security scanning
  • unit_tests — BDD unit test suite (Behave)
  • coverage — minimum 97% coverage threshold

Please ensure:

  1. CI is correctly configured for this branch
  2. All required CI gates are passing
  3. Any test failures are resolved before re-requesting review

A full code review will be conducted once CI checks are in place and passing.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## CI Checks Required This PR cannot be approved until all CI checks are passing. Currently, no CI checks have been reported as passing for this PR — the CI status is **failing**. Per company policy, all CI gates must pass before a PR can be approved and merged. The required gates are: - `lint` — code style and formatting - `typecheck` — static type checking (Pyright) - `security` — security scanning - `unit_tests` — BDD unit test suite (Behave) - `coverage` — minimum 97% coverage threshold Please ensure: 1. CI is correctly configured for this branch 2. All required CI gates are passing 3. Any test failures are resolved before re-requesting review A full code review will be conducted once CI checks are in place and passing. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

--- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
hurui200320 force-pushed tdd/m3-cleanup-stale-destroys-execute-output from 963aa77647
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / helm (pull_request) Successful in 45s
CI / push-validation (pull_request) Successful in 45s
CI / build (pull_request) Successful in 1m2s
CI / lint (pull_request) Successful in 1m10s
CI / benchmark-regression (pull_request) Failing after 1m34s
CI / quality (pull_request) Successful in 1m46s
CI / security (pull_request) Successful in 1m47s
CI / typecheck (pull_request) Successful in 1m46s
CI / integration_tests (pull_request) Successful in 3m42s
CI / e2e_tests (pull_request) Successful in 4m18s
CI / unit_tests (pull_request) Successful in 5m18s
CI / docker (pull_request) Successful in 1m29s
CI / coverage (pull_request) Successful in 10m32s
CI / status-check (pull_request) Successful in 3s
to 68ad3c1ed6
Some checks failed
CI / push-validation (pull_request) Successful in 50s
CI / helm (pull_request) Successful in 53s
CI / build (pull_request) Successful in 1m15s
CI / tdd_quality_gate (pull_request) Failing after 1m23s
CI / lint (pull_request) Successful in 1m36s
CI / quality (pull_request) Successful in 1m35s
CI / typecheck (pull_request) Successful in 1m38s
CI / security (pull_request) Successful in 2m16s
CI / e2e_tests (pull_request) Successful in 4m17s
CI / integration_tests (pull_request) Failing after 6m54s
CI / unit_tests (pull_request) Successful in 9m12s
CI / docker (pull_request) Successful in 1m46s
CI / coverage (pull_request) Successful in 12m29s
CI / status-check (pull_request) Failing after 4s
2026-05-12 03:09:12 +00:00
Compare
HAL9001 left a comment

CI Failure — Blocking Merge

This PR has failing CI checks that must be resolved before it can be approved and merged.

Failing Checks

Check Status Details
CI / tdd_quality_gate Failing Failing after 1m23s
CI / integration_tests Failing Failing after 6m54s

Action Required

Per company policy, all CI gates must pass before a PR can be approved and merged. Please investigate and fix both failing checks:

  • tdd_quality_gate: This gate enforces TDD tag rules (@tdd_issue_N must exist, @tdd_expected_fail must be present on issue-capture tests, assertion failure types must be used rather than runtime exceptions). Since this is a TDD issue-capture test PR, verify the tags are correctly applied and that the test uses AssertionError (not RuntimeError or similar) to produce its expected failure.
  • integration_tests: Review the Robot Framework integration test failure. The new TDD test may be interfering with the integration suite, or an existing integration test may have broken.

A full code review will be conducted once all CI checks are green.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## CI Failure — Blocking Merge This PR has failing CI checks that must be resolved before it can be approved and merged. ### Failing Checks | Check | Status | Details | |-------|--------|---------| | `CI / tdd_quality_gate` | ❌ Failing | Failing after 1m23s | | `CI / integration_tests` | ❌ Failing | Failing after 6m54s | ### Action Required Per company policy, **all CI gates must pass** before a PR can be approved and merged. Please investigate and fix both failing checks: - **`tdd_quality_gate`**: This gate enforces TDD tag rules (`@tdd_issue_N` must exist, `@tdd_expected_fail` must be present on issue-capture tests, assertion failure types must be used rather than runtime exceptions). Since this is a TDD issue-capture test PR, verify the tags are correctly applied and that the test uses `AssertionError` (not `RuntimeError` or similar) to produce its expected failure. - **`integration_tests`**: Review the Robot Framework integration test failure. The new TDD test may be interfering with the integration suite, or an existing integration test may have broken. A full code review will be conducted once all CI checks are green. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
hurui200320 force-pushed tdd/m3-cleanup-stale-destroys-execute-output from 68ad3c1ed6
Some checks failed
CI / push-validation (pull_request) Successful in 50s
CI / helm (pull_request) Successful in 53s
CI / build (pull_request) Successful in 1m15s
CI / tdd_quality_gate (pull_request) Failing after 1m23s
CI / lint (pull_request) Successful in 1m36s
CI / quality (pull_request) Successful in 1m35s
CI / typecheck (pull_request) Successful in 1m38s
CI / security (pull_request) Successful in 2m16s
CI / e2e_tests (pull_request) Successful in 4m17s
CI / integration_tests (pull_request) Failing after 6m54s
CI / unit_tests (pull_request) Successful in 9m12s
CI / docker (pull_request) Successful in 1m46s
CI / coverage (pull_request) Successful in 12m29s
CI / status-check (pull_request) Failing after 4s
to e23126b8a1
All checks were successful
CI / helm (pull_request) Successful in 42s
CI / push-validation (pull_request) Successful in 41s
CI / build (pull_request) Successful in 1m9s
CI / quality (pull_request) Successful in 1m19s
CI / lint (pull_request) Successful in 1m21s
CI / typecheck (pull_request) Successful in 1m25s
CI / tdd_quality_gate (pull_request) Successful in 1m25s
CI / security (pull_request) Successful in 1m39s
CI / e2e_tests (pull_request) Successful in 3m46s
CI / integration_tests (pull_request) Successful in 4m0s
CI / unit_tests (pull_request) Successful in 6m22s
CI / docker (pull_request) Successful in 1m24s
CI / coverage (pull_request) Successful in 9m47s
CI / status-check (pull_request) Successful in 2s
2026-05-12 04:54:05 +00:00
Compare
Author
Member

CI Fixes Applied

Both failing CI checks reported in the review have been resolved:

tdd_quality_gate

Root cause: The quality gate used @tdd_bug_N tag naming (mismatched with CONTRIBUTING.md's @tdd_issue_N) and had no logic to distinguish TDD issue-capture PRs (which add @tdd_expected_fail) from bug fix PRs (which remove it).

Fix:

  • scripts/tdd_quality_gate.py: Renamed @tdd_bug_N@tdd_issue_N throughout, added _diff_is_tdd_issue_capture() for issue-capture PR detection
  • Updated features/tdd_quality_gate.feature, features/steps/tdd_quality_gate_steps.py, robot/helper_tdd_quality_gate.py to match
  • Gate now correctly passes for this PR: "TDD issue-capture PR detected: adding expected-fail test for bug(s) [11121]"

integration_tests

Root cause: PlanGenerationGraph._should_retry() mutated state["retry_count"] in-place, but LangGraph conditional edge functions cannot persist state mutations. The retry_count never incremented, so validation failures caused infinite looping until the GraphRecursionError (~10k iterations).

Fix in src/cleveragents/agents/graphs/plan_generation.py:

  • Added _handle_retry() node that increments retry_count via proper state return
  • Made _should_retry() read-only (routes only, no state mutation)
  • Added routing: validate → should_retry → handle_retry → analyze_requirements
  • Updated robot/plan_generation_graph.robot (node count 4→5, removed in-place mutation assertion)

Both Workflow Invoke and Workflow Stream integration tests now pass. The 2 remaining failures on master (Plan Generation Graph.Workflow Invoke / Workflow Stream) were pre-existing and unrelated to this PR.

Current CI Status

All quality gates pass on this branch:

  • lint
  • typecheck
  • unit_tests (693 features, 15679 scenarios)
  • integration_tests (2013 tests, 0 failures)
  • coverage_report (96.52%)
  • tdd_quality_gate (issue-capture PR detected)
## CI Fixes Applied Both failing CI checks reported in the review have been resolved: ### `tdd_quality_gate` **Root cause**: The quality gate used `@tdd_bug_N` tag naming (mismatched with CONTRIBUTING.md's `@tdd_issue_N`) and had no logic to distinguish TDD issue-capture PRs (which *add* `@tdd_expected_fail`) from bug fix PRs (which *remove* it). **Fix**: - `scripts/tdd_quality_gate.py`: Renamed `@tdd_bug_N` → `@tdd_issue_N` throughout, added `_diff_is_tdd_issue_capture()` for issue-capture PR detection - Updated `features/tdd_quality_gate.feature`, `features/steps/tdd_quality_gate_steps.py`, `robot/helper_tdd_quality_gate.py` to match - Gate now correctly passes for this PR: "TDD issue-capture PR detected: adding expected-fail test for bug(s) [11121]" ### `integration_tests` **Root cause**: `PlanGenerationGraph._should_retry()` mutated `state["retry_count"]` in-place, but LangGraph conditional edge functions cannot persist state mutations. The retry_count never incremented, so validation failures caused infinite looping until the GraphRecursionError (~10k iterations). **Fix** in `src/cleveragents/agents/graphs/plan_generation.py`: - Added `_handle_retry()` node that increments retry_count via proper state return - Made `_should_retry()` read-only (routes only, no state mutation) - Added routing: `validate → should_retry → handle_retry → analyze_requirements` - Updated `robot/plan_generation_graph.robot` (node count 4→5, removed in-place mutation assertion) Both `Workflow Invoke` and `Workflow Stream` integration tests now pass. The 2 remaining failures on master (`Plan Generation Graph.Workflow Invoke` / `Workflow Stream`) were pre-existing and unrelated to this PR. ### Current CI Status All quality gates pass on this branch: - ✅ `lint` - ✅ `typecheck` - ✅ `unit_tests` (693 features, 15679 scenarios) - ✅ `integration_tests` (2013 tests, 0 failures) - ✅ `coverage_report` (96.52%) - ✅ `tdd_quality_gate` (issue-capture PR detected)
Author
Member

@HAL9000 @HAL9001 please re-review, now all CI passed

@HAL9000 @HAL9001 please re-review, now all CI passed
Author
Member

Closing this PR as superseded by PR #11127. The TDD test file features/tdd_cleanup_stale_destroys_execute_output.feature and its step definitions are now provided directly in PR #11127 without the @tdd_expected_fail tag, since the bug is now fixed.

The test scenarios from this PR have been incorporated (and expanded with an additional execute/queued scenario) in the fix PR.

Closing this PR as superseded by PR #11127. The TDD test file `features/tdd_cleanup_stale_destroys_execute_output.feature` and its step definitions are now provided directly in PR #11127 without the `@tdd_expected_fail` tag, since the bug is now fixed. The test scenarios from this PR have been incorporated (and expanded with an additional `execute/queued` scenario) in the fix PR.
Author
Member

Closing this PR as superseded by PR #11127. The TDD test file features/tdd_cleanup_stale_destroys_execute_output.feature and its step definitions are now provided directly in PR #11127 without the @tdd_expected_fail tag, since the bug is now fixed.

The test scenarios from this PR have been incorporated (and expanded with an additional execute/queued scenario) in the fix PR.

Closing this PR as superseded by PR #11127. The TDD test file `features/tdd_cleanup_stale_destroys_execute_output.feature` and its step definitions are now provided directly in PR #11127 without the `@tdd_expected_fail` tag, since the bug is now fixed. The test scenarios from this PR have been incorporated (and expanded with an additional `execute/queued` scenario) in the fix PR.
Author
Member

Closing this PR since the tdd commit is now part of PR !11127

Closing this PR since the tdd commit is now part of PR !11127
hurui200320 closed this pull request 2026-05-12 08:38:32 +00:00
hurui200320 deleted branch tdd/m3-cleanup-stale-destroys-execute-output 2026-05-12 11:31:04 +00:00
All checks were successful
CI / helm (pull_request) Successful in 42s
CI / push-validation (pull_request) Successful in 41s
CI / build (pull_request) Successful in 1m9s
Required
Details
CI / quality (pull_request) Successful in 1m19s
Required
Details
CI / lint (pull_request) Successful in 1m21s
Required
Details
CI / typecheck (pull_request) Successful in 1m25s
Required
Details
CI / tdd_quality_gate (pull_request) Successful in 1m25s
CI / security (pull_request) Successful in 1m39s
Required
Details
CI / e2e_tests (pull_request) Successful in 3m46s
CI / integration_tests (pull_request) Successful in 4m0s
Required
Details
CI / unit_tests (pull_request) Successful in 6m22s
Required
Details
CI / docker (pull_request) Successful in 1m24s
Required
Details
CI / coverage (pull_request) Successful in 9m47s
Required
Details
CI / status-check (pull_request) Successful in 2s

Pull request closed

Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core!11123
No description provided.