feat(plan): run tests, lint, and typecheck in sandbox during execute phase #6055

Open
opened 2026-04-09 14:12:21 +00:00 by hamza.khyari · 2 comments
Member

Summary

The specification (§13249-13254) requires the execute phase to run validation checks (tests, lint, typecheck) against the LLM-generated code inside the git worktree sandbox before the plan reaches execute/complete. The results are then displayed in a Validation (from Execute) panel during plan apply.

Metadata

  • Commit Message: feat(plan): run tests, lint, and typecheck in sandbox during execute phase
  • Branch: feature/sandbox-execute-validation

Current Behavior

The execute phase writes LLM output to the worktree and commits it. No validation is run. The apply phase shows Apply Summary and Sandbox Cleanup panels but no Validation panel.

Expected Behavior (from spec)

╭─ Validation (from Execute) ────╮
│ Tests: passed (24/24)          │
│ Lint: passed (0 warnings)      │
│ Type Check: passed (0 errors)  │
│ Duration: 12.4s                │
╰────────────────────────────────╯
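A rough sketch of how the apply phase might render this panel from stored results. The result schema (`checks`, `status`, `detail`, `duration_s`) and the function name are assumptions made for illustration; the spec excerpt above only fixes the visual layout.

```python
# Hypothetical mapping from stored check keys to the labels in the mock-up.
LABELS = {"tests": "Tests", "lint": "Lint", "typecheck": "Type Check"}


def render_validation_panel(results: dict) -> str:
    """Build the Validation (from Execute) panel as plain text.

    `results` is assumed to look like:
    {"checks": {"tests": {"status": "passed", "detail": "24/24"}, ...},
     "duration_s": 12.4}
    """
    rows = []
    for key, label in LABELS.items():
        check = results.get("checks", {}).get(key)
        if check is None:
            continue  # a check that never ran is simply omitted
        detail = f" ({check['detail']})" if check.get("detail") else ""
        rows.append(f"{label}: {check['status']}{detail}")
    rows.append(f"Duration: {results['duration_s']:.1f}s")

    title = "─ Validation (from Execute) "
    # One leading space and two trailing spaces around the longest row,
    # chosen so the output lines up with the mock-up in this issue.
    inner = max(max(len(r) for r in rows) + 3, len(title) + 1)
    top = "╭" + title + "─" * (inner - len(title)) + "╮"
    body = ["│ " + r.ljust(inner - 1) + "│" for r in rows]
    bottom = "╰" + "─" * inner + "╯"
    return "\n".join([top, *body, bottom])
```

Calling `print(render_validation_panel(stored))` with the values from the mock-up should reproduce the box above; failed or timed-out checks would show their status in the same position.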

Subtasks

  • After LLM output is committed to the worktree, detect and run the project test runner (pytest, nox, etc.) inside the worktree
  • Run lint check (ruff/flake8) inside the worktree
  • Run typecheck (pyright/mypy) inside the worktree
  • Store validation results in plan metadata (pass/fail, counts, duration)
  • Display Validation panel during plan apply from stored results
  • Handle timeouts and failures gracefully: a validation failure should not block apply, but should be surfaced to the user
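The subtasks above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names, the result schema, the default commands, and the 300-second timeout are all assumptions, and runner detection (pytest vs. nox, ruff vs. flake8, pyright vs. mypy) is left out.

```python
import subprocess
import sys
import time

# Hypothetical default commands; the issue names the tools but not the flags.
DEFAULT_CHECKS = {
    "tests": [sys.executable, "-m", "pytest", "-q"],
    "lint": ["ruff", "check", "."],
    "typecheck": ["mypy", "."],
}


def run_check(cmd, worktree, timeout_s):
    """Run one command inside the worktree; report problems instead of raising."""
    started = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, cwd=worktree, capture_output=True, text=True, timeout=timeout_s
        )
        status = "passed" if proc.returncode == 0 else "failed"
        output = proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        status, output = "timeout", ""
    except OSError as exc:  # e.g. the tool is not installed
        status, output = "unavailable", str(exc)
    return {
        "status": status,
        "duration_s": round(time.monotonic() - started, 3),
        "output_tail": output[-2000:],  # keep stored metadata small
    }


def run_validation(worktree, checks=None, timeout_s=300.0):
    """Run every check in sequence and collect advisory results."""
    checks = DEFAULT_CHECKS if checks is None else checks
    started = time.monotonic()
    results = {
        name: run_check(cmd, worktree, timeout_s) for name, cmd in checks.items()
    }
    return {"checks": results, "duration_s": round(time.monotonic() - started, 3)}
```

Because `run_check` converts timeouts and missing tools into a status value, a broken check cannot abort the execute phase; the caller decides how to surface it. Extracting counts such as `24/24` would require parsing each tool's output and is not attempted here.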

Definition of Done

  • Execute phase runs tests + lint + typecheck in sandbox worktree
  • Results stored in plan metadata and retrievable at apply time
  • Apply displays Validation (from Execute) panel matching spec §13249-13254
  • Validation failure does not crash the pipeline — results are advisory
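One way to read the "stored and retrievable" and "advisory" criteria, as a sketch only. The `validation` metadata key, the dictionary-based plan metadata, and the helper names are hypothetical; the actual plan metadata format is not described in this issue.

```python
import json

VALIDATION_KEY = "validation"  # hypothetical key inside the plan metadata


def store_validation(plan_metadata: dict, results: dict) -> dict:
    """Return a copy of the plan metadata with validation results attached."""
    updated = dict(plan_metadata)
    # Round-trip through JSON so only serializable data ends up in metadata.
    updated[VALIDATION_KEY] = json.loads(json.dumps(results))
    return updated


def load_validation(plan_metadata: dict):
    """Return stored results, or None when they are absent or malformed."""
    stored = plan_metadata.get(VALIDATION_KEY)
    if not isinstance(stored, dict) or not isinstance(stored.get("checks"), dict):
        return None
    return stored


def has_failures(results) -> bool:
    """True if any stored check did not pass; used for a warning, not a gate."""
    if results is None:
        return False
    return any(c.get("status") != "passed" for c in results["checks"].values())
```

Under this reading, apply would call `load_validation`, show the panel if results exist, print a warning when `has_failures` is true, and continue either way.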

Spec Reference

specification.md §13249-13254

hamza.khyari added this to the v3.5.0 milestone 2026-04-09 14:23:04 +00:00
Owner

Thank you for filing this issue, @hamza.khyari. I am reviewing it now and will provide triage feedback shortly.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison

Owner

Issue reviewed and triaged:

Completeness assessment (per CONTRIBUTING.md):

  • Clear title: Yes
  • Background/context with spec reference (§13249-13254): Yes
  • Current vs expected behavior: Yes
  • Metadata (commit message, branch name): Yes
  • Subtasks checklist: Yes
  • Definition of Done: Yes

This is a well-formed feature issue. The automated project owner has already applied the appropriate labels.

Triage summary:

  • Type: Feature
  • Priority: Medium — this is a spec-required feature for the execute phase validation panel, but not a blocker for core functionality
  • Story Points: 5 (L) — involves multiple subsystems (execute phase, plan metadata storage, apply display), requires careful integration with the sandbox worktree
  • Milestone: v3.5.0 (Autonomy Hardening) — appropriate, as this is part of the validation-gated apply acceptance criteria
  • MoSCoW: Should Have — the spec requires this, but the execute phase can function without it (validation is advisory per your DoD)

One item to address: The issue does not have a parent Epic link. Per CONTRIBUTING.md, all issues (except Epics and Legendaries) must be linked to a parent Epic. This issue should be linked to an Epic covering the plan execute phase or sandbox validation. If you know the relevant Epic number, please add the dependency link. If not, I can search for the appropriate parent Epic.

Next step: This issue is now ready for implementation and will be picked up by the implementation team.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison

Reference
cleveragents/cleveragents-core#6055