cleveragents/cleveragents-core

feat(plan): run tests, lint, and typecheck in sandbox during execute phase #6055

Open

opened 2026-04-09 14:12:21 +00:00 by hamza.khyari · 2 comments

hamza.khyari commented

2026-04-09 14:12:21 +00:00

Member

Summary

The specification (§13249-13254) requires the execute phase to run validation checks (tests, lint, typecheck) against the LLM-generated code inside the git worktree sandbox before the plan reaches execute/complete. The results are then displayed in a Validation (from Execute) panel during plan apply.

Metadata

Commit Message: feat(plan): run tests, lint, and typecheck in sandbox during execute phase
Branch: feature/sandbox-execute-validation

Current Behavior

The execute phase writes LLM output to the worktree and commits it. No validation is run. The apply phase shows Apply Summary and Sandbox Cleanup panels but no Validation panel.

Expected Behavior (from spec)

╭─ Validation (from Execute) ────╮
│ Tests: passed (24/24)          │
│ Lint: passed (0 warnings)      │
│ Type Check: passed (0 errors)  │
│ Duration: 12.4s                │
╰────────────────────────────────╯
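As a rough illustration of the panel above, the following Python sketch renders it from stored counts. `ValidationSummary` and `render_validation_panel` are hypothetical names, not taken from the specification or the codebase, and the box layout is inferred only from the example panel.

```python
from dataclasses import dataclass


@dataclass
class ValidationSummary:
    # Hypothetical container; field names are illustrative assumptions.
    tests_passed: int
    tests_total: int
    lint_warnings: int
    type_errors: int
    duration_s: float


def _status(ok: bool) -> str:
    return "passed" if ok else "failed"


def render_validation_panel(s: ValidationSummary) -> str:
    """Render a rounded-corner text panel from a validation summary."""
    title = "Validation (from Execute)"
    rows = [
        f"Tests: {_status(s.tests_passed == s.tests_total)} "
        f"({s.tests_passed}/{s.tests_total})",
        f"Lint: {_status(s.lint_warnings == 0)} ({s.lint_warnings} warnings)",
        f"Type Check: {_status(s.type_errors == 0)} ({s.type_errors} errors)",
        f"Duration: {s.duration_s:.1f}s",
    ]
    # Inner width: one leading space, the longest row, two trailing spaces,
    # but never narrower than the decorated title.
    inner = max(max(len(r) for r in rows) + 3, len(title) + 4)
    top = "╭─ " + title + " " + "─" * (inner - len(title) - 3) + "╮"
    body = ["│ " + r.ljust(inner - 1) + "│" for r in rows]
    bottom = "╰" + "─" * inner + "╯"
    return "\n".join([top, *body, bottom])


if __name__ == "__main__":
    print(render_validation_panel(ValidationSummary(24, 24, 0, 0, 12.4)))
```

A real implementation would presumably reuse whatever panel renderer the Apply Summary and Sandbox Cleanup panels already use; plain string formatting is used here only to keep the sketch dependency-free.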

Subtasks

After LLM output is committed to the worktree, detect and run the project test runner (pytest, nox, etc.) inside the worktree
Run lint check (ruff/flake8) inside the worktree
Run typecheck (pyright/mypy) inside the worktree
Store validation results in plan metadata (pass/fail, counts, duration)
Display Validation panel during plan apply from stored results
Handle timeouts and failures gracefully: a validation failure should not block apply, but it should be surfaced to the user
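The subtasks above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `CheckResult`, `detect_test_command`, `run_check`, and `run_validation` are hypothetical names, the noxfile heuristic and the default timeout are guesses, and `ruff` and `mypy` are picked arbitrarily from the tools the issue lists.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Sequence


@dataclass
class CheckResult:
    # Hypothetical record for one check; not a type from the codebase.
    name: str
    status: str  # "passed", "failed", "timeout", or "skipped"
    returncode: Optional[int]
    duration_s: float


def detect_test_command(worktree: Path) -> List[str]:
    # Assumed heuristic: prefer nox when a noxfile exists, else pytest.
    if (worktree / "noxfile.py").is_file():
        return ["nox"]
    return [sys.executable, "-m", "pytest", "-q"]


def run_check(name: str, cmd: Sequence[str], cwd: Path,
              timeout_s: float) -> CheckResult:
    """Run one command and never raise: every outcome becomes a status."""
    start = time.monotonic()
    code: Optional[int] = None
    try:
        proc = subprocess.run(list(cmd), cwd=cwd, capture_output=True,
                              text=True, timeout=timeout_s)
        code = proc.returncode
        status = "passed" if code == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "timeout"
    except OSError:
        # Tool not installed, or the working directory is missing.
        status = "skipped"
    return CheckResult(name, status, code, time.monotonic() - start)


def run_validation(worktree: Path,
                   timeout_s: float = 300.0) -> List[CheckResult]:
    """Run tests, lint, and typecheck inside the worktree (advisory only)."""
    checks = [
        ("tests", detect_test_command(worktree)),
        ("lint", ["ruff", "check", "."]),
        ("typecheck", ["mypy", "."]),
    ]
    return [run_check(n, c, worktree, timeout_s) for n, c in checks]
```

Because `run_check` converts timeouts and missing tools into statuses instead of exceptions, the caller can store whatever came back and let apply proceed, which matches the advisory behavior the last subtask asks for.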

Definition of Done

Execute phase runs tests + lint + typecheck in sandbox worktree
Results stored in plan metadata and retrievable at apply time
Apply displays Validation (from Execute) panel matching spec §13249-13254
Validation failure does not crash the pipeline — results are advisory
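One way to read the "stored in plan metadata and retrievable at apply time" criterion is a JSON-serializable entry on the plan's metadata. The sketch below assumes the metadata is a plain dictionary and uses a hypothetical `"validation"` key; the actual schema of plan metadata is not described in this issue.

```python
import json
from typing import Any, Dict, List, Optional


def store_validation(metadata: Dict[str, Any],
                     checks: List[Dict[str, Any]],
                     duration_s: float) -> Dict[str, Any]:
    """Return a copy of the metadata with validation results attached."""
    updated = dict(metadata)
    updated["validation"] = {  # hypothetical key name
        "checks": checks,
        "duration_s": round(duration_s, 1),
        "advisory": True,  # results inform the user; they do not gate apply
    }
    return updated


def load_validation(metadata: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Fetch stored results at apply time; None if execute stored nothing."""
    return metadata.get("validation")


def failed_checks(validation: Optional[Dict[str, Any]]) -> List[str]:
    """Names of checks that did not pass, for surfacing to the user."""
    if not validation:
        return []
    return [c["name"] for c in validation["checks"]
            if c["status"] != "passed"]


if __name__ == "__main__":
    meta = store_validation(
        {"plan_id": "example"},
        [{"name": "tests", "status": "passed"},
         {"name": "lint", "status": "timeout"}],
        12.44,
    )
    # Round-trip through JSON to mimic persistence between phases.
    restored = json.loads(json.dumps(meta))
    print(failed_checks(load_validation(restored)))
```

Returning `None` when nothing was stored lets apply distinguish "validation never ran" from "validation ran and failed", so neither case has to crash the pipeline.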

Spec Reference

specification.md §13249-13254

hamza.khyari added this to the v3.5.0 milestone

2026-04-09 14:23:04 +00:00

hamza.khyari added labels

2026-04-09 14:23:06 +00:00

HAL9000 commented

2026-04-09 14:27:27 +00:00

Owner

Thank you for filing this issue, @hamza.khyari. I am reviewing it now and will provide triage feedback shortly.

Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison

HAL9000 commented

2026-04-09 14:54:19 +00:00

Owner

Issue reviewed and triaged:

Completeness assessment (per CONTRIBUTING.md):

Clear title: Yes
Background/context with spec reference (§13249-13254): Yes
Current vs expected behavior: Yes
Metadata (commit message, branch name): Yes
Subtasks checklist: Yes
Definition of Done: Yes

This is a well-formed feature issue. The automated project owner has already applied the appropriate labels.

Triage summary:

Type: Feature
Priority: Medium — this is a spec-required feature for the execute phase validation panel, but not a blocker for core functionality
Story Points: 5 (L). The work spans multiple subsystems (execute phase, plan metadata storage, apply display) and requires careful integration with the sandbox worktree
Milestone: v3.5.0 (Autonomy Hardening) — appropriate, as this is part of the validation-gated apply acceptance criteria
MoSCoW: Should Have — the spec requires this, but the execute phase can function without it (validation is advisory per your DoD)

One item to address: The issue does not have a parent Epic link. Per CONTRIBUTING.md, all issues (except Epics and Legendaries) must be linked to a parent Epic. This issue should be linked to an Epic covering the plan execute phase or sandbox validation. If you know the relevant Epic number, please add the dependency link. If not, I can search for the appropriate parent Epic.

Next step: This issue is now ready for implementation and will be picked up by the implementation team.

Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison

HAL9000 added a new dependency

2026-04-09 15:07:58 +00:00

#4959 EPIC: Correction Engine — Revert & Append Modes with plan.attempt Tracking

HAL9000 referenced this issue

2026-04-09 15:13:55 +00:00

[AUTO-EPIC] Epic Planning Health Report (Cycle 21) #5883

HAL9000 referenced this issue

2026-04-09 15:24:14 +00:00

[AUTO-LIAISON] Human Liaison Status (Cycle 20) #6136

HAL9000 referenced this issue

2026-04-09 15:34:21 +00:00

[AUTO-WATCHDOG] System Health Report (Cycle 21) #5915