test(plan): add tdd issue-capture test for cleanup_stale destroying execute output before apply #11123

2026-05-11T09:18:45Z

hurui200320 commented

2026-05-11 09:18:45 +00:00

Summary

This PR adds the TDD issue-capture test for bug #11121 per the project's Bug Fix Workflow. A failing test must be committed to master before the fix is implemented.

This PR also fixes two CI failures that were blocking it:

tdd_quality_gate: Renamed tag convention from @tdd_bug_N to @tdd_issue_N (matching CONTRIBUTING.md) and added TDD issue-capture PR detection so the gate correctly recognizes PRs that add @tdd_expected_fail (rather than requiring its removal as a bug fix PR would).
integration_tests: Fixed a GraphRecursionError in PlanGenerationGraph caused by _should_retry() attempting in-place state mutation in a LangGraph conditional edge function (which cannot persist mutations). Replaced with a proper _handle_retry() node that increments retry_count via state returns.

What was changed

TDD issue-capture test (issue #11120)

Added two Behave scenarios in features/tdd_cleanup_stale_destroys_execute_output.feature with step definitions in features/steps/tdd_cleanup_stale_destroys_execute_output_steps.py.

PlanGenerationGraph fix

src/cleveragents/agents/graphs/plan_generation.py: Removed in-place state mutation from _should_retry(), added _handle_retry() node, updated _build_graph() to include the 5th node with proper routing.

TDD quality gate fix

scripts/tdd_quality_gate.py: Renamed @tdd_bug_N → @tdd_issue_N, added _diff_is_tdd_issue_capture() for detecting issue-capture PRs, updated run_quality_gate() and main().
Updated all related tests and test helpers (features/tdd_quality_gate.feature, features/steps/tdd_quality_gate_steps.py, robot/helper_tdd_quality_gate.py, features/steps/plan_generation_uncovered_lines_steps.py, robot/plan_generation_graph.robot).

The Bug Being Captured

_create_sandbox_for_plan() in plan.py calls GitWorktreeSandbox.cleanup_stale() unconditionally on every agents plan execute invocation — including when the plan is already in execute/complete state (awaiting apply). This silently destroys the cleveragents/plan-<id> git branch that holds all execution output, causing agents plan apply to find zero artifacts and produce an empty changeset.

Root cause location: plan.py → _create_sandbox_for_plan() → GitWorktreeSandbox.cleanup_stale(resource.location, plan_id) (unconditional call)
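For reviewers who want the shape of the problem in code, here is a minimal sketch of a state guard around the cleanup call. It is an illustration only: `Plan`, `PROTECTED_STATES`, and `maybe_cleanup_stale()` are hypothetical names, the state labels are taken from the title of #11127, and the real fix in that PR may look different.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: these two states are the ones whose branch holds execution output
# (labels borrowed from the title of the fix PR, not from the actual code).
PROTECTED_STATES = frozenset({"execute/processing", "execute/complete"})


@dataclass
class Plan:
    """Hypothetical stand-in for the project's plan record."""
    plan_id: str
    state: str


def maybe_cleanup_stale(
    plan: Plan,
    location: str,
    cleanup_stale: Callable[[str, str], None],
) -> bool:
    """Run cleanup_stale only when the plan has no execution output to lose.

    Returns True if cleanup ran, False if it was skipped.
    """
    if plan.state in PROTECTED_STATES:
        # The plan branch still carries output awaiting apply: leave it alone.
        return False
    cleanup_stale(location, plan.plan_id)
    return True


if __name__ == "__main__":
    seen = []
    record = lambda loc, pid: seen.append((loc, pid))
    print(maybe_cleanup_stale(Plan("p1", "execute/complete"), "/repo", record))
    print(maybe_cleanup_stale(Plan("p2", "execute/queued"), "/repo", record))
    print(seen)
```

Passing the cleanup function in as a parameter keeps the sketch independent of `GitWorktreeSandbox` and makes the skip path easy to assert on in a test.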

Quality Gates

nox -e lint: ✅ All checks passed
nox -e typecheck: ✅ 0 errors
nox -e unit_tests: ✅ 693 features passed, 15679 scenarios passed
nox -e integration_tests: ✅ 2013 tests passed, 0 failed
nox -e coverage_report: ✅ 96.52% (threshold: 96.5%)
nox -e tdd_quality_gate: ✅ Issue-capture PR detected for bug #11121
nox -e e2e_tests: Pre-existing failure on master (unrelated)

Closes

Closes #11120

Unblocks

Bug fix issue #11121 is now unblocked. Once the fix is merged, @tdd_expected_fail is removed and these scenarios become permanent regression guards.

hurui200320 added this to the v3.2.0 milestone 2026-05-11 09:18:58 +00:00

hurui200320 added the Type / Testing label 2026-05-11 09:19:27 +00:00

hurui200320 added a new dependency 2026-05-11 09:24:33 +00:00

#11120 TDD: cleanup_stale destroys git worktree branch on re-invoked execute, causing plan apply to find zero artifacts

hurui200320 force-pushed tdd/m3-cleanup-stale-destroys-execute-output from 176e1c3a5f to a297a385aa

2026-05-11 09:26:50 +00:00

hurui200320 requested review from HAL9000 2026-05-11 09:26:57 +00:00

hurui200320 requested review from HAL9001 2026-05-11 09:26:57 +00:00

hurui200320 added labels 2026-05-11 09:27:35 +00:00

hurui200320 force-pushed tdd/m3-cleanup-stale-destroys-execute-output from a297a385aa to 963aa77647

2026-05-11 10:17:34 +00:00

hurui200320 added a new dependency 2026-05-11 14:05:01 +00:00

#11127 fix(plan): guard cleanup_stale against execute/processing and execute/complete plans

hurui200320 scheduled this pull request to auto merge when all checks succeed 2026-05-11 14:13:11 +00:00

HAL9001 requested changes 2026-05-11 22:47:55 +00:00

Dismissed

HAL9001 left a comment

CI Checks Required

This PR cannot be approved until all CI checks are passing. Currently, no CI checks have been reported as passing for this PR — the CI status is failing.

Per company policy, all CI gates must pass before a PR can be approved and merged. The required gates are:

lint — code style and formatting
typecheck — static type checking (Pyright)
security — security scanning
unit_tests — BDD unit test suite (Behave)
coverage — minimum 97% coverage threshold

Please ensure:

CI is correctly configured for this branch
All required CI gates are passing
Any test failures are resolved before re-requesting review

A full code review will be conducted once CI checks are in place and passing.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

HAL9001 commented

2026-05-11 22:47:59 +00:00

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

HAL9001 referenced this pull request

2026-05-12 02:45:32 +00:00

fix(plan): guard cleanup_stale against execute/processing and execute/complete plans #11127

hurui200320 force-pushed tdd/m3-cleanup-stale-destroys-execute-output from 963aa77647 to 68ad3c1ed6

2026-05-12 03:09:12 +00:00

hurui200320 requested review from HAL9001 2026-05-12 03:09:25 +00:00

HAL9001 requested changes 2026-05-12 03:26:45 +00:00

HAL9001 left a comment

CI Failure — Blocking Merge

This PR has failing CI checks that must be resolved before it can be approved and merged.

Failing Checks

| Check | Status | Details |
|-------|--------|---------|
| `CI / tdd_quality_gate` | ❌ Failing | Failing after 1m23s |
| `CI / integration_tests` | ❌ Failing | Failing after 6m54s |

Action Required

Per company policy, all CI gates must pass before a PR can be approved and merged. Please investigate and fix both failing checks:

tdd_quality_gate: This gate enforces TDD tag rules (@tdd_issue_N must exist, @tdd_expected_fail must be present on issue-capture tests, assertion failure types must be used rather than runtime exceptions). Since this is a TDD issue-capture test PR, verify the tags are correctly applied and that the test uses AssertionError (not RuntimeError or similar) to produce its expected failure.
integration_tests: Review the Robot Framework integration test failure. The new TDD test may be interfering with the integration suite, or an existing integration test may have broken.

A full code review will be conducted once all CI checks are green.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

hurui200320 force-pushed tdd/m3-cleanup-stale-destroys-execute-output from 68ad3c1ed6 to e23126b8a1

2026-05-12 04:54:05 +00:00

hurui200320 referenced this pull request

2026-05-12 04:55:11 +00:00

TDD: cleanup_stale destroys git worktree branch on re-invoked execute, causing plan apply to find zero artifacts #11120

hurui200320 commented

2026-05-12 04:55:38 +00:00

CI Fixes Applied

Both failing CI checks reported in the review have been resolved:

`tdd_quality_gate`

Root cause: The quality gate used @tdd_bug_N tag naming (mismatched with CONTRIBUTING.md's @tdd_issue_N) and had no logic to distinguish TDD issue-capture PRs (which add @tdd_expected_fail) from bug fix PRs (which remove it).

Fix:

scripts/tdd_quality_gate.py: Renamed @tdd_bug_N → @tdd_issue_N throughout, added _diff_is_tdd_issue_capture() for issue-capture PR detection
Updated features/tdd_quality_gate.feature, features/steps/tdd_quality_gate_steps.py, robot/helper_tdd_quality_gate.py to match
Gate now correctly passes for this PR: "TDD issue-capture PR detected: adding expected-fail test for bug(s) [11121]"
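As a rough illustration of the detection idea (not the code in `scripts/tdd_quality_gate.py`), the sketch below classifies a unified diff by whether it adds or removes `@tdd_expected_fail`. The function names `classify_diff()` and `_changed_lines()` and the return labels are hypothetical; only the two tag names come from this PR.

```python
import re

ISSUE_TAG = re.compile(r"@tdd_issue_(\d+)")
EXPECTED_FAIL = "@tdd_expected_fail"


def _changed_lines(diff_text: str, sign: str) -> list[str]:
    """Return added ('+') or removed ('-') lines, skipping '+++'/'---' file headers."""
    return [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith(sign) and not line.startswith(sign * 3)
    ]


def classify_diff(diff_text: str) -> tuple[str, list[int]]:
    """Classify a diff as 'issue_capture', 'bug_fix', or 'other'.

    Also returns the sorted issue numbers found in changed @tdd_issue_N tags.
    """
    added = _changed_lines(diff_text, "+")
    removed = _changed_lines(diff_text, "-")
    adds_fail = any(EXPECTED_FAIL in line for line in added)
    removes_fail = any(EXPECTED_FAIL in line for line in removed)
    issues = sorted(
        {int(n) for line in added + removed for n in ISSUE_TAG.findall(line)}
    )
    if adds_fail and not removes_fail:
        return "issue_capture", issues  # new failing test for a known bug
    if removes_fail and not adds_fail:
        return "bug_fix", issues  # tag dropped because the bug is fixed
    return "other", issues


if __name__ == "__main__":
    capture_diff = (
        "+++ b/features/example.feature\n"
        "+@tdd_issue_11121 @tdd_expected_fail\n"
        "+Scenario: execute output survives a second execute\n"
    )
    print(classify_diff(capture_diff))
```

A diff that both adds and removes the tag falls through to `other` in this sketch; how the real gate treats that case is not stated in this thread.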

`integration_tests`

Root cause: PlanGenerationGraph._should_retry() mutated state["retry_count"] in-place, but LangGraph conditional edge functions cannot persist state mutations. As a result, retry_count never incremented, and each validation failure re-entered the retry loop until LangGraph raised GraphRecursionError (~10k iterations).

Fix in src/cleveragents/agents/graphs/plan_generation.py:

Added _handle_retry() node that increments retry_count via proper state return
Made _should_retry() read-only (routes only, no state mutation)
Added routing: validate → should_retry → handle_retry → analyze_requirements
Updated robot/plan_generation_graph.robot (node count 4→5, removed in-place mutation assertion)

Both the Workflow Invoke and Workflow Stream integration tests now pass on this branch. The two failures still present on master (Plan Generation Graph.Workflow Invoke / Workflow Stream) are pre-existing and were not introduced by this PR.
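The difference between the two approaches can be shown with a toy runner in plain Python. This is not LangGraph and not the project's graph: the runner, `RECURSION_LIMIT`, and the function names are made up for illustration. It only mimics the behavior described above, where a routing function's mutations are discarded while a node's returned update is merged into the state.

```python
import copy

MAX_RETRIES = 3
RECURSION_LIMIT = 50  # toy stand-in for a graph recursion limit


def validate(state: dict) -> dict:
    """Node: validation always fails in this demo; returns a partial update."""
    return {"valid": False}


def handle_retry(state: dict) -> dict:
    """Node: the increment is returned, so the runner merges it into the state."""
    return {"retry_count": state["retry_count"] + 1}


def should_retry_mutating(state: dict) -> str:
    """Router with the bug: the increment lands on a copy that is thrown away."""
    state["retry_count"] += 1
    return "retry" if state["retry_count"] <= MAX_RETRIES else "end"


def should_retry_readonly(state: dict) -> str:
    """Router after the fix: reads the state and picks a route, nothing else."""
    return "retry" if state["retry_count"] < MAX_RETRIES else "end"


def run(router, retry_node=None) -> tuple[int, int]:
    """Loop validate -> router (-> retry_node); return (retry_count, steps)."""
    state = {"retry_count": 0, "valid": False}
    for step in range(1, RECURSION_LIMIT + 1):
        state.update(validate(state))
        # Routers only see a copy, so anything they mutate is lost.
        if router(copy.deepcopy(state)) == "end":
            return state["retry_count"], step
        if retry_node is not None:
            state.update(retry_node(state))
    raise RecursionError("recursion limit reached without a stop condition")


if __name__ == "__main__":
    print(run(should_retry_readonly, handle_retry))
    try:
        run(should_retry_mutating)
    except RecursionError as exc:
        print("mutating router:", exc)
```

With the read-only router plus the retry node, the loop stops after three retries; with the mutating router, the counter stays at zero and the toy runner hits its limit, which is the same failure mode as the GraphRecursionError reported by CI.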

Current CI Status

All quality gates pass on this branch:

✅ lint
✅ typecheck
✅ unit_tests (693 features, 15679 scenarios)
✅ integration_tests (2013 tests, 0 failures)
✅ coverage_report (96.52%)
✅ tdd_quality_gate (issue-capture PR detected)

hurui200320 requested review from HAL9001 2026-05-12 05:16:42 +00:00

hurui200320 referenced this pull request

2026-05-12 05:51:08 +00:00

fix(sandbox): preserve LLM-plan artifacts when cleanup_stale encounters committed branches (#11120) #11137

hurui200320 commented

2026-05-12 06:24:38 +00:00

@HAL9000 @HAL9001 please re-review; all CI checks now pass.

hurui200320 commented

2026-05-12 06:58:18 +00:00

Closing this PR as superseded by PR #11127. The TDD test file features/tdd_cleanup_stale_destroys_execute_output.feature and its step definitions are now provided directly in PR #11127 without the @tdd_expected_fail tag, since the bug is now fixed.

The test scenarios from this PR have been incorporated (and expanded with an additional execute/queued scenario) in the fix PR.

hurui200320 closed this pull request

2026-05-12 06:58:25 +00:00

hurui200320 reopened this pull request

2026-05-12 07:53:42 +00:00

hurui200320 commented

2026-05-12 08:38:23 +00:00

Closing this PR since the TDD commit is now part of PR !11127.

hurui200320 closed this pull request

2026-05-12 08:38:32 +00:00

hurui200320 deleted branch tdd/m3-cleanup-stale-destroys-execute-output

2026-05-12 11:31:04 +00:00

CI / helm (pull_request) Successful in 42s
CI / push-validation (pull_request) Successful in 41s
CI / build (pull_request) Successful in 1m9s (Required)
CI / quality (pull_request) Successful in 1m19s (Required)
CI / lint (pull_request) Successful in 1m21s (Required)
CI / typecheck (pull_request) Successful in 1m25s (Required)
CI / tdd_quality_gate (pull_request) Successful in 1m25s
CI / security (pull_request) Successful in 1m39s (Required)
CI / e2e_tests (pull_request) Successful in 3m46s
CI / integration_tests (pull_request) Successful in 4m0s (Required)
CI / unit_tests (pull_request) Successful in 6m22s (Required)
CI / docker (pull_request) Successful in 1m24s (Required)
CI / coverage (pull_request) Successful in 9m47s (Required)
CI / status-check (pull_request) Successful in 2s

Pull request closed

This pull request cannot be reopened because the branch was deleted.

Blocks

#11120 TDD: cleanup_stale destroys git worktree branch on re-invoked execute, causing plan apply to find zero artifacts

cleveragents/cleveragents-core

#11127 fix(plan): guard cleanup_stale against execute/processing and execute/complete plans

cleveragents/cleveragents-core

Reference: cleveragents/cleveragents-core#11123