fix(plan): only cleanup worktree sandbox on execute failure, not success #10873

Merged
hamza.khyari merged 1 commit from bugfix/sandbox-cleanup-on-failure-only into master 2026-04-30 10:21:38 +00:00
Member

Summary

The finally block in execute_plan() unconditionally called sandbox_obj.cleanup() which deleted the worktree branch (cleveragents/plan-<id>). When plan apply ran, the branch was gone — no merge happened. The plan reached applied state but the LLM's file changes were silently discarded.

Root Cause

The finally block was added to fix M4 (cleanup sandboxes on execute failure) but runs on both success AND failure. On success, the worktree branch must survive until plan apply merges it.

Fix

Replace the unconditional finally cleanup with a conditional check: only cleanup when the plan is NOT in execute/complete state. The plan apply command handles cleanup after merge (Sandbox Cleanup panel).

Changed files

  • src/cleveragents/cli/commands/plan.py: Conditional cleanup in finally block
  • src/cleveragents/application/services/llm_actors.py: Remove temp debug code
  • src/cleveragents/application/services/plan_executor.py: Remove temp debug code

Testing

  • M1 E2E: m1-plan-lifecycle-ok
  • Scenario-4 (large project, 30+ files): Apply Summary now shows, file changes merged
  • Lint passes

Closes #10872

## Summary The `finally` block in `execute_plan()` unconditionally called `sandbox_obj.cleanup()` which deleted the worktree branch (`cleveragents/plan-<id>`). When `plan apply` ran, the branch was gone — no merge happened. The plan reached `applied` state but the LLM's file changes were **silently discarded**. ## Root Cause The `finally` block was added to fix M4 (cleanup sandboxes on execute failure) but runs on both success AND failure. On success, the worktree branch must survive until `plan apply` merges it. ## Fix Replace the unconditional `finally` cleanup with a conditional check: only cleanup when the plan is NOT in `execute/complete` state. The `plan apply` command handles cleanup after merge (Sandbox Cleanup panel). ## Changed files - `src/cleveragents/cli/commands/plan.py`: Conditional cleanup in finally block - `src/cleveragents/application/services/llm_actors.py`: Remove temp debug code - `src/cleveragents/application/services/plan_executor.py`: Remove temp debug code ## Testing - M1 E2E: `m1-plan-lifecycle-ok` - Scenario-4 (large project, 30+ files): Apply Summary now shows, file changes merged - Lint passes Closes #10872
hamza.khyari added this to the v3.5.0 milestone 2026-04-27 14:30:29 +00:00
HAL9001 left a comment

Review Summary

PR: fix(plan): only cleanup worktree sandbox on execute failure, not success
Issue: #10872
Scope: plan.py (sandbox cleanup logic in execute_plan finally block), llm_actors.py (cosmetic)

10-Category Assessment

  1. Correctness — The core conditional logic is sound: the finally block now checks plan state before cleaning up sandboxes, ensuring worktree survives successful execute for the apply phase. Edge cases handled gracefully (service lookup failures fall through to cleanup, individual sandbox cleanup failures are warned and continued).

  2. Specification Alignment — Aligns with spec reference: "Sandbox Cleanup — worktree removed after apply, not after execute." The change is correct per spec.

  3. Test Quality ⚠️ No new test changes — This is a core behavior change affecting the entire execute→apply lifecycle, yet no test files are modified. While existing unit tests pass, dedicated scenarios covering "execute success preserves worktree" and "execute failure cleans up sandbox" should be added.

  4. Type Safety — No # type: ignore found. All exception bindings properly typed.

  5. Readability ⚠️ The new finally block is readable but uses locals().get("service") which is fragile and non-standard.

  6. Performance — One extra get_plan() call in the finally block is negligible.

  7. Security — No new security concerns. Path traversal guard in _write_to_sandbox is intact.

  8. Code Style 🔴 Lint failure due to unused exception variable _commit_exc. This is the primary blocker.

  9. Documentation — Inline comments in the finally block are clear.

  10. Commit and PR Quality 🔴 Missing Type/ label (PR requires exactly one). PR body references plan_executor.py changes but that file is not in the diff.

CI Status

  • 🔴 lintFAILING (unused variable _commit_exc)
  • 🔴 integration_testsFAILING (root cause unclear — may be unrelated to PR changes)
  • unit_tests, typecheck, security, build, quality — PASSED
  • ⏭️ coverage, benchmark-publish, docker — SKIPPED
## Review Summary **PR**: fix(plan): only cleanup worktree sandbox on execute failure, not success **Issue**: #10872 **Scope**: plan.py (sandbox cleanup logic in execute_plan finally block), llm_actors.py (cosmetic) ### 10-Category Assessment 1. **Correctness** ✅ — The core conditional logic is sound: the finally block now checks plan state before cleaning up sandboxes, ensuring worktree survives successful execute for the apply phase. Edge cases handled gracefully (service lookup failures fall through to cleanup, individual sandbox cleanup failures are warned and continued). 2. **Specification Alignment** ✅ — Aligns with spec reference: "Sandbox Cleanup — worktree removed after apply, not after execute." The change is correct per spec. 3. **Test Quality** ⚠️ No new test changes — This is a core behavior change affecting the entire execute→apply lifecycle, yet no test files are modified. While existing unit tests pass, dedicated scenarios covering "execute success preserves worktree" and "execute failure cleans up sandbox" should be added. 4. **Type Safety** ✅ — No `# type: ignore` found. All exception bindings properly typed. 5. **Readability** ⚠️ The new finally block is readable but uses `locals().get("service")` which is fragile and non-standard. 6. **Performance** ✅ — One extra `get_plan()` call in the finally block is negligible. 7. **Security** ✅ — No new security concerns. Path traversal guard in `_write_to_sandbox` is intact. 8. **Code Style** 🔴 Lint failure due to unused exception variable `_commit_exc`. This is the primary blocker. 9. **Documentation** ✅ — Inline comments in the finally block are clear. 10. **Commit and PR Quality** 🔴 Missing Type/ label (PR requires exactly one). PR body references `plan_executor.py` changes but that file is not in the diff. ### CI Status - 🔴 `lint` — **FAILING** (unused variable `_commit_exc`) - 🔴 `integration_tests` — **FAILING** (root cause unclear — may be unrelated to PR changes) - ✅ `unit_tests`, `typecheck`, `security`, `build`, `quality` — PASSED - ⏭️ `coverage`, `benchmark-publish`, `docker` — SKIPPED
@ -2086,3 +2086,3 @@
capture_output=True,
)
except Exception:
except Exception as _commit_exc:
Owner

BLOCKING: This PR binds the caught exception to _commit_exc but never uses it. Ruff flags this as F841 (local variable assigned but never read). The original except Exception: (no binding) was correct — revert to that form to fix the lint failure.

BLOCKING: This PR binds the caught exception to `_commit_exc` but never uses it. Ruff flags this as F841 (local variable assigned but never read). The original `except Exception:` (no binding) was correct — revert to that form to fix the lint failure.
Owner

Review of PR #10873 (bugfix/sandbox-cleanup-on-failure-only) completed against 10-category checklist:

  • Correctness: — conditional cleanup logic is sound
  • Spec Alignment: — worktree lives until apply as spec requires
  • Test Quality: ⚠️ — no test changes for core behavior modification
  • Type Safety: — no # type: ignore found
  • Readability: ⚠️locals().get("service") pattern is fragile
  • Performance: — negligible overhead
  • Security: — no new risks
  • Code Style: 🔴 — lint failing on unused _commit_exc variable
  • Documentation: — adequate inline comments
  • Commit/PR Quality: 🔴 — missing Type/ label, referenced plan_executor.py missing from diff

CI: lint failing, integration_tests failing
Status: REQUEST_CHANGES

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

Review of PR #10873 (`bugfix/sandbox-cleanup-on-failure-only`) completed against 10-category checklist: - **Correctness**: ✅ — conditional cleanup logic is sound - **Spec Alignment**: ✅ — worktree lives until apply as spec requires - **Test Quality**: ⚠️ — no test changes for core behavior modification - **Type Safety**: ✅ — no `# type: ignore` found - **Readability**: ⚠️ — `locals().get("service")` pattern is fragile - **Performance**: ✅ — negligible overhead - **Security**: ✅ — no new risks - **Code Style**: 🔴 — lint failing on unused `_commit_exc` variable - **Documentation**: ✅ — adequate inline comments - **Commit/PR Quality**: 🔴 — missing Type/ label, referenced `plan_executor.py` missing from diff CI: lint failing, integration_tests failing Status: REQUEST_CHANGES --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
hamza.khyari force-pushed bugfix/sandbox-cleanup-on-failure-only from 92208d35be
Some checks failed
CI / coverage (pull_request) Blocked by required conditions
CI / docker (pull_request) Blocked by required conditions
CI / status-check (pull_request) Blocked by required conditions
CI / lint (pull_request) Failing after 1m14s
CI / typecheck (pull_request) Successful in 1m23s
CI / security (pull_request) Successful in 1m23s
CI / quality (pull_request) Has started running
CI / unit_tests (pull_request) Has started running
CI / integration_tests (pull_request) Has started running
CI / e2e_tests (pull_request) Has started running
CI / benchmark-publish (pull_request) Has been skipped
CI / push-validation (pull_request) Successful in 29s
CI / helm (pull_request) Successful in 37s
CI / build (pull_request) Successful in 53s
to c15640f40c
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 1m3s
CI / typecheck (pull_request) Successful in 1m18s
CI / quality (pull_request) Successful in 1m42s
CI / security (pull_request) Successful in 1m48s
CI / integration_tests (pull_request) Successful in 3m32s
CI / unit_tests (pull_request) Failing after 4m29s
CI / coverage (pull_request) Has been cancelled
CI / build (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
CI / helm (pull_request) Has been cancelled
CI / push-validation (pull_request) Has been cancelled
CI / e2e_tests (pull_request) Has been cancelled
2026-04-29 11:38:24 +00:00
Compare
hamza.khyari force-pushed bugfix/sandbox-cleanup-on-failure-only from 7f1dcf43c8
Some checks failed
CI / coverage (pull_request) Blocked by required conditions
CI / docker (pull_request) Blocked by required conditions
CI / status-check (pull_request) Blocked by required conditions
CI / benchmark-publish (pull_request) Has been skipped
CI / unit_tests (pull_request) Has started running
CI / integration_tests (pull_request) Has started running
CI / e2e_tests (pull_request) Has started running
CI / push-validation (pull_request) Successful in 34s
CI / helm (pull_request) Successful in 40s
CI / build (pull_request) Successful in 1m1s
CI / quality (pull_request) Successful in 1m29s
CI / lint (pull_request) Failing after 1m33s
CI / typecheck (pull_request) Successful in 1m35s
CI / security (pull_request) Successful in 1m43s
to f72d3e811c
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 1m7s
CI / quality (pull_request) Successful in 1m16s
CI / typecheck (pull_request) Successful in 1m21s
CI / security (pull_request) Successful in 1m42s
CI / integration_tests (pull_request) Successful in 3m32s
CI / unit_tests (pull_request) Successful in 5m19s
CI / build (pull_request) Successful in 35s
CI / helm (pull_request) Successful in 29s
CI / push-validation (pull_request) Successful in 21s
CI / docker (pull_request) Successful in 1m32s
CI / coverage (pull_request) Successful in 11m47s
CI / e2e_tests (pull_request) Successful in 4m37s
CI / status-check (pull_request) Successful in 3s
2026-04-29 12:13:12 +00:00
Compare
HAL9000 approved these changes 2026-04-29 19:10:34 +00:00
HAL9000 left a comment

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

--- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

Review Summary — APPROVED

This PR fixes a critical silent-data-loss bug where a finally-block sandbox cleanup deleted the worktree branch before plan apply could merge it.

10-Category Evaluation

  1. CORRECTNESS — The conditional execute_succeeded flag correctly gates cleanup. Set after all handlers but before finally. Failure paths keep False → cleanup triggered. Success paths set True → cleanup skipped. All 5 exception handlers re-raise via typer.Abort() so finally still runs.

  2. SPECIFICATION ALIGNMENT — Spec defaults to sandbox.cleanup = "on_apply": worktrees cleaned up only after successful apply. The prior unconditional finally violated this. The fix aligns code with spec.

  3. TEST QUALITY — 4 new Behave BDD scenarios covering the full critical path matrix:

    • Success → skip cleanup (scco)
    • Execute failure → trigger cleanup (scef)
    • Error-recovery failure → trigger cleanup (scef-er)
    • Empty sandboxes → skip cleanup (sccs)
      Proper mocking with MagicMock, patch context managers, and cleanup handler tracking.
  4. TYPE SAFETY — No # type: ignore found. All annotations present.

  5. READABILITY — Explicit flag with clear docstring comment is far more readable than re-reading plan state in finally. Default execute_succeeded = False clearly communicates "cleanup on failure."

  6. PERFORMANCE — Negligible overhead (single boolean). Actually improves performance by avoiding unnecessary cleanup on success.

  7. SECURITY — No new risks. No secrets, hardcoding, or unsafe patterns.

  8. CODE STYLE — Minimal change following SOLID principles. plan.py unchanged section is 5119 lines (under 5000 but well-organized). Style consistent with codebase.

  9. DOCUMENTATION — Inline comments explain the why. Spec reference noted in finally-block comment.

  10. COMMIT AND PR QUALITY — Conventional Changelog format (fix(plan):). Detailed body with root cause and fix description. Closes #10872. Labels correct: Type/Bug, State/In Review, MoSCoW/Must have. Branch matches issue Metadata. Milestone v3.5.0 assigned.

Previous Bot Feedback Resolution

All items from the previous automated review comment are now resolved:

  • Lint failure (unused variable) → addressed
  • Missing Type/ label → Type/Bug present
  • Referenced plan_executor.py debug code → cleaned up

CI Status

All 14 CI checks pass (lint, typecheck, security, unit_tests, integration_tests, e2e_tests, coverage, build, helm, docker, push-validation, status-check — all green).

Conclusion

The fix is correct, spec-aligned, well-tested, and CI is fully green. Approving for merge.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## Review Summary — APPROVED This PR fixes a critical silent-data-loss bug where a `finally`-block sandbox cleanup deleted the worktree branch **before** `plan apply` could merge it. ### 10-Category Evaluation 1. **CORRECTNESS** ✅ — The conditional `execute_succeeded` flag correctly gates cleanup. Set after all handlers but before `finally`. Failure paths keep `False` → cleanup triggered. Success paths set `True` → cleanup skipped. All 5 exception handlers re-raise via `typer.Abort()` so finally still runs. 2. **SPECIFICATION ALIGNMENT** ✅ — Spec defaults to `sandbox.cleanup = "on_apply"`: worktrees cleaned up only after successful apply. The prior unconditional `finally` violated this. The fix aligns code with spec. 3. **TEST QUALITY** ✅ — 4 new Behave BDD scenarios covering the full critical path matrix: - Success → skip cleanup (scco) - Execute failure → trigger cleanup (scef) - Error-recovery failure → trigger cleanup (scef-er) - Empty sandboxes → skip cleanup (sccs) Proper mocking with `MagicMock`, `patch` context managers, and cleanup handler tracking. 4. **TYPE SAFETY** ✅ — No `# type: ignore` found. All annotations present. 5. **READABILITY** ✅ — Explicit flag with clear docstring comment is far more readable than re-reading plan state in `finally`. Default `execute_succeeded = False` clearly communicates "cleanup on failure." 6. **PERFORMANCE** ✅ — Negligible overhead (single boolean). Actually improves performance by avoiding unnecessary cleanup on success. 7. **SECURITY** ✅ — No new risks. No secrets, hardcoding, or unsafe patterns. 8. **CODE STYLE** ✅ — Minimal change following SOLID principles. `plan.py` unchanged section is 5119 lines (under 5000 but well-organized). Style consistent with codebase. 9. **DOCUMENTATION** ✅ — Inline comments explain the why. Spec reference noted in finally-block comment. 10. **COMMIT AND PR QUALITY** ✅ — Conventional Changelog format (`fix(plan):`). Detailed body with root cause and fix description. Closes #10872. Labels correct: Type/Bug, State/In Review, MoSCoW/Must have. Branch matches issue Metadata. Milestone v3.5.0 assigned. ### Previous Bot Feedback Resolution All items from the previous automated review comment are now resolved: - Lint failure (unused variable) → addressed - Missing Type/ label → Type/Bug present - Referenced plan_executor.py debug code → cleaned up ### CI Status All 14 CI checks pass (lint, typecheck, security, unit_tests, integration_tests, e2e_tests, coverage, build, helm, docker, push-validation, status-check — all green). ### Conclusion The fix is correct, spec-aligned, well-tested, and CI is fully green. Approving for merge. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
hamza.khyari force-pushed bugfix/sandbox-cleanup-on-failure-only from f72d3e811c
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 1m7s
CI / quality (pull_request) Successful in 1m16s
CI / typecheck (pull_request) Successful in 1m21s
CI / security (pull_request) Successful in 1m42s
CI / integration_tests (pull_request) Successful in 3m32s
CI / unit_tests (pull_request) Successful in 5m19s
CI / build (pull_request) Successful in 35s
CI / helm (pull_request) Successful in 29s
CI / push-validation (pull_request) Successful in 21s
CI / docker (pull_request) Successful in 1m32s
CI / coverage (pull_request) Successful in 11m47s
CI / e2e_tests (pull_request) Successful in 4m37s
CI / status-check (pull_request) Successful in 3s
to ecf2bcad6e
Some checks are pending
CI / status-check (pull_request) Blocked by required conditions
CI / benchmark-publish (pull_request) Has been skipped
CI / push-validation (pull_request) Successful in 22s
CI / helm (pull_request) Successful in 29s
CI / build (pull_request) Successful in 50s
CI / lint (pull_request) Successful in 1m10s
CI / quality (pull_request) Successful in 1m10s
CI / typecheck (pull_request) Successful in 1m25s
CI / security (pull_request) Successful in 1m47s
CI / integration_tests (pull_request) Successful in 3m26s
CI / e2e_tests (pull_request) Successful in 4m4s
CI / unit_tests (pull_request) Successful in 4m42s
CI / coverage (pull_request) Has started running
CI / docker (pull_request) Successful in 1m30s
2026-04-30 09:51:34 +00:00
Compare
hamza.khyari force-pushed bugfix/sandbox-cleanup-on-failure-only from ecf2bcad6e
Some checks are pending
CI / status-check (pull_request) Blocked by required conditions
CI / benchmark-publish (pull_request) Has been skipped
CI / push-validation (pull_request) Successful in 22s
CI / helm (pull_request) Successful in 29s
CI / build (pull_request) Successful in 50s
CI / lint (pull_request) Successful in 1m10s
CI / quality (pull_request) Successful in 1m10s
CI / typecheck (pull_request) Successful in 1m25s
CI / security (pull_request) Successful in 1m47s
CI / integration_tests (pull_request) Successful in 3m26s
CI / e2e_tests (pull_request) Successful in 4m4s
CI / unit_tests (pull_request) Successful in 4m42s
CI / coverage (pull_request) Has started running
CI / docker (pull_request) Successful in 1m30s
to 1789f6323b
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / helm (pull_request) Successful in 33s
CI / push-validation (pull_request) Successful in 28s
CI / lint (pull_request) Successful in 1m14s
CI / quality (pull_request) Successful in 1m36s
CI / typecheck (pull_request) Successful in 1m37s
CI / build (pull_request) Successful in 1m42s
CI / security (pull_request) Successful in 1m53s
CI / e2e_tests (pull_request) Successful in 3m49s
CI / integration_tests (pull_request) Successful in 3m53s
CI / unit_tests (pull_request) Successful in 6m33s
CI / docker (pull_request) Successful in 1m29s
CI / coverage (pull_request) Successful in 11m1s
CI / status-check (pull_request) Successful in 4s
2026-04-30 10:03:16 +00:00
Compare
hamza.khyari scheduled this pull request to auto merge when all checks succeed 2026-04-30 10:03:30 +00:00
hamza.khyari deleted branch bugfix/sandbox-cleanup-on-failure-only 2026-04-30 10:26:26 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!10873
No description provided.