feat(plan): implement git worktree sandbox for execute and merge-based apply #5998

Merged
hamza.khyari merged 1 commit from feature/git-worktree-apply into master 2026-04-09 14:33:34 +00:00
Member

Summary

Implement spec-aligned git worktree sandbox for the plan execute/apply lifecycle (specification.md §13225-13276).

Execute phase: creates an isolated git worktree via GitWorktreeSandbox for the plan's linked git-checkout resource. LLM file output is written to the worktree and committed on branch cleveragents/plan-<plan_id> — no merge yet.

Apply phase: merges the worktree branch into the project's current branch via git merge. Prints spec-aligned panels:

  • Apply Summary: Plan ID, artifacts count, insertions/deletions, project name, applied-at timestamp
  • Sandbox Cleanup: worktree removed, branch merged to main
  • Next Steps: review git diff, commit changes
  • Footer: ✓ OK Changes applied

Non-git projects fall back to the original flat directory sandbox with shutil.copy2.
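
For readers unfamiliar with the worktree flow, the execute and apply phases described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual `GitWorktreeSandbox` implementation: `run_git`, `plan_branch`, `execute_phase`, `apply_phase`, and the commit message are hypothetical names chosen for this example, and only the branch naming scheme comes from this PR.

```python
import subprocess
from pathlib import Path
from typing import Callable, Mapping, Sequence

# A runner takes git arguments and a working directory (hypothetical type).
Runner = Callable[[Sequence[str], Path], None]


def run_git(args: Sequence[str], cwd: Path) -> None:
    """Run a git command in cwd; raises CalledProcessError on failure."""
    subprocess.run(["git", *args], cwd=cwd, check=True)


def plan_branch(plan_id: str) -> str:
    """Branch name used for a plan's sandbox (scheme taken from this PR)."""
    return f"cleveragents/plan-{plan_id}"


def execute_phase(
    repo: Path,
    worktree: Path,
    plan_id: str,
    files: Mapping[str, str],
    run: Runner = run_git,
) -> str:
    """Create a worktree on a new branch, write files, commit. No merge yet."""
    branch = plan_branch(plan_id)
    run(["worktree", "add", "-b", branch, str(worktree)], repo)
    for rel_path, content in files.items():
        target = worktree / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
    run(["add", "--all"], worktree)
    # Commit message is a placeholder, not the project's real message.
    run(["commit", "-m", f"plan {plan_id}: LLM output"], worktree)
    return branch


def apply_phase(repo: Path, worktree: Path, branch: str, run: Runner = run_git) -> None:
    """Merge the plan branch into the current branch, then remove the worktree."""
    run(["merge", "--no-edit", branch], repo)
    run(["worktree", "remove", str(worktree)], repo)
```

The `run` parameter exists only so the flow can be exercised without a real repository; with the default runner, `repo` must be an existing git checkout with at least one commit.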

Also fixes

  • Context assembly failure (#4454): ContextFragment metadata values (detail_depth, relevance_score) must be strings, but were being passed as int/float. The resulting Pydantic validation errors crashed the context assembler, leaving the LLM with zero file context. The values are now passed as strings.
  • Duplicate execute dispatch (#2265): A2A facade _handle_plan_execute now checks if the plan has already reached execute/apply phase before attempting a transition, eliminating the noisy "Invalid phase transition from execute to execute" error.
  • Assembly error logging: execute_context_assembly_failed warning now includes the actual error string.
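
As an illustration of the metadata type fix (#4454) above, the conversion amounts to something like the sketch below. It assumes the metadata is a flat mapping; `stringify_metadata` is a hypothetical helper, and the real `ContextFragment` model and hydrator code are not reproduced here.

```python
from typing import Dict, Mapping


def stringify_metadata(metadata: Mapping[str, object]) -> Dict[str, str]:
    """Convert every metadata value to str so a str-only model accepts it."""
    return {key: str(value) for key, value in metadata.items()}


# Example values are made up for illustration.
raw = {"detail_depth": 2, "relevance_score": 0.87}
print(stringify_metadata(raw))
# prints {'detail_depth': '2', 'relevance_score': '0.87'}
```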

Changed files

  • src/cleveragents/cli/commands/plan.py — sandbox creation, worktree apply, spec panels, facade notify reorder
  • src/cleveragents/a2a/facade.py — idempotent _handle_plan_execute
  • src/cleveragents/application/services/context_tier_hydrator.py — metadata type fix
  • src/cleveragents/application/services/llm_actors.py — error string in assembly warning
  • features/git_worktree_apply.feature + step file — 6 Behave scenarios
  • CHANGELOG.md — updated
  • 4 existing step files — added mocks for new sandbox functions
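
The idempotent `_handle_plan_execute` change listed above follows a common guard pattern, sketched below under stated assumptions. The `Phase` enum, its `DRAFT` member, and `handle_plan_execute` are hypothetical stand-ins; only the execute and apply phase names come from this PR, and the real facade handler is not shown.

```python
from enum import Enum
from typing import Tuple


class Phase(Enum):
    DRAFT = "draft"      # hypothetical pre-execute phase
    EXECUTE = "execute"
    APPLY = "apply"


# Phases in which an execute request has already been honoured.
ALREADY_DISPATCHED = frozenset({Phase.EXECUTE, Phase.APPLY})


def handle_plan_execute(current: Phase) -> Tuple[Phase, bool]:
    """Return (new_phase, transitioned).

    A repeated execute request is treated as a no-op instead of attempting
    an invalid execute-to-execute transition.
    """
    if current in ALREADY_DISPATCHED:
        return current, False
    return Phase.EXECUTE, True
```

With this guard, a duplicate dispatch returns the current phase unchanged, so no transition error is raised or logged.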

Testing

  • M1 E2E: m1-plan-lifecycle-ok
  • Scenario-1: full end-to-end with real LLM — calculator fixed, spec panels displayed, zero warnings
  • 6 new Behave scenarios for git worktree apply lifecycle

Closes #4454
Closes #2265

Owner

Thank you for this contribution, @hamza.khyari. I have reviewed PR #5998 and the linked issue #4454.

Summary of Changes

This PR wires sandbox_root into PlanExecutor via _get_plan_executor() so that the execute phase writes LLM-generated file output to .cleveragents/sandbox/. Without this wiring, PlanExecutor received sandbox_root=None, causing LLM code changes to be discarded after execute completed.

Assessment

The change is targeted and addresses a clear gap: the sandbox wiring was missing, preventing the execute phase from persisting LLM output. The M1 E2E test confirmation (Applied 2 file(s) from sandbox to project.) demonstrates the fix works end-to-end.

Observations

  • The PR currently has no labels or milestone assigned. I will flag this for the project owner to assign appropriate labels.
  • The linked issue #4454 has a Needs Feedback label. Per project policy, issues with Needs Feedback are proposals awaiting human review. The PR closes this issue, which suggests the feedback has been addressed. Please confirm whether the Needs Feedback label on #4454 can be removed.
  • The PR description does not include quality gate results (lint, typecheck, unit_tests, integration_tests, coverage). Please confirm these pass before requesting review.

Next Steps

This PR will be reviewed by the implementation team once CI checks complete. I will monitor the CI status and provide updates.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison

hamza.khyari force-pushed feature/git-worktree-apply from 9a20c91928
Some checks failed
CI / push-validation (pull_request) Successful in 18s
CI / helm (pull_request) Successful in 23s
CI / quality (pull_request) Successful in 32s
CI / lint (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 48s
CI / security (pull_request) Successful in 1m0s
CI / e2e_tests (pull_request) Successful in 3m6s
CI / build (pull_request) Successful in 3m17s
CI / integration_tests (pull_request) Successful in 4m3s
CI / unit_tests (pull_request) Failing after 5m1s
CI / docker (pull_request) Has been skipped
CI / coverage (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
CI / benchmark-publish (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
to 185bebe323
Some checks failed
CI / push-validation (pull_request) Successful in 16s
CI / build (pull_request) Successful in 35s
CI / lint (pull_request) Successful in 39s
CI / helm (pull_request) Successful in 39s
CI / quality (pull_request) Successful in 48s
CI / e2e_tests (pull_request) Successful in 3m23s
CI / integration_tests (pull_request) Successful in 3m56s
CI / security (pull_request) Successful in 4m4s
CI / typecheck (pull_request) Successful in 4m18s
CI / unit_tests (pull_request) Failing after 5m30s
CI / docker (pull_request) Has been skipped
CI / coverage (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
CI / benchmark-publish (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
2026-04-09 14:04:22 +00:00
Owner

Thank you for this contribution, @hamza.khyari. I am reviewing PR #5998.

Summary: This PR wires sandbox_root into PlanExecutor via _get_plan_executor() so that the execute phase writes LLM-generated file output to .cleveragents/sandbox/. This addresses a critical gap where PlanExecutor received sandbox_root=None, causing LLM code changes to be discarded after execute completed.

Linked Issue: #4454 — Note that issue #4454 has a "Needs Feedback" label, indicating it is a proposal awaiting human review. The PR can proceed through CI and review, but the linked issue's "Needs Feedback" label should be addressed by a human reviewer before the issue is considered fully verified.

Current CI Status:

| Gate | Status |
|---|---|
| lint | Successful |
| typecheck | Successful |
| quality | Successful |
| security | Successful |
| integration_tests | Successful |
| e2e_tests | Successful |
| build | Successful |
| helm | Successful |
| unit_tests | Failing |
| coverage | Running |

The unit_tests job is failing. I will fetch the CI logs to determine the root cause and follow up.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison

hamza.khyari force-pushed feature/git-worktree-apply from 185bebe323
to 751261213d
Some checks failed
CI / push-validation (pull_request) Successful in 21s
CI / lint (pull_request) Successful in 28s
CI / helm (pull_request) Successful in 34s
CI / quality (pull_request) Successful in 37s
CI / security (pull_request) Successful in 1m20s
CI / e2e_tests (pull_request) Successful in 3m10s
CI / build (pull_request) Successful in 3m25s
CI / typecheck (pull_request) Successful in 4m6s
CI / integration_tests (pull_request) Successful in 4m22s
CI / unit_tests (pull_request) Failing after 5m11s
CI / docker (pull_request) Has been skipped
CI / coverage (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
CI / benchmark-publish (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
2026-04-09 14:12:34 +00:00
hamza.khyari force-pushed feature/git-worktree-apply from 751261213d
to 06428a5db6
All checks were successful
CI / push-validation (pull_request) Successful in 16s
CI / helm (pull_request) Successful in 23s
CI / build (pull_request) Successful in 24s
CI / lint (pull_request) Successful in 48s
CI / typecheck (pull_request) Successful in 53s
CI / security (pull_request) Successful in 53s
CI / quality (pull_request) Successful in 51s
CI / e2e_tests (pull_request) Successful in 3m27s
CI / integration_tests (pull_request) Successful in 7m14s
CI / unit_tests (pull_request) Successful in 8m34s
CI / docker (pull_request) Successful in 1m20s
CI / coverage (pull_request) Successful in 14m36s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 57m55s
2026-04-09 14:17:59 +00:00
hamza.khyari scheduled this pull request to auto merge when all checks succeed 2026-04-09 14:23:49 +00:00
hamza.khyari added this to the v3.5.0 milestone 2026-04-09 14:27:11 +00:00
Reference
cleveragents/cleveragents-core!5998