fix(cli): wrap plan apply --format json output in spec-required JSON envelope #11018

2026-05-08T05:38:55Z

HAL9000 commented

2026-05-08 05:38:55 +00:00

Summary

Fix the agents plan apply --format json command to return a spec-compliant JSON envelope instead of a raw plan dictionary. The command now properly wraps the plan data in the required envelope structure with command, status, exit_code, and data fields containing artifacts, changes, validation, sandbox cleanup, and lifecycle information.

Changes

Modified lifecycle_apply_plan function in src/cleveragents/cli/commands/plan.py to wrap non-rich format output in spec-required JSON envelope
Added _apply_output_dict helper function exclusively for plan apply call sites
Envelope includes command: "plan apply", status: "ok", exit_code: 0, and timing fields
data field contains:
- artifacts: count of files changed
- changes: insertions and deletions from git diff
- project: project identifier
- applied_at: timestamp of application
- validation: test, lint, and type_check results with duration
- sandbox_cleanup: worktree, branch, and checkpoint status
- lifecycle: phase, state, total_duration, total_cost, decisions_made, child_plans
Added messages: ["Changes applied"] to response envelope
Moved Plan as LifecyclePlan import to module top-level

Testing

16 Behave scenarios in features/plan_apply_json_envelope.feature covering:
- All required top-level envelope fields
- Data field structure (artifacts, changes, project, etc.)
- Sandbox cleanup derivation from actual plan terminal state
- Command isolation (status/cancel use _plan_spec_dict, not apply envelope)
- Legacy plan fallback handling
- Cost metadata in lifecycle total_cost
15 Robot Framework integration tests in robot/plan_apply_json_envelope.robot with helper script

Issue Reference

Closes #9449
Epic: M3 Decisions + Validations + Invariants (v3.2.0)

## Summary Fix the `agents plan apply --format json` command to return a spec-compliant JSON envelope instead of a raw plan dictionary. The command now properly wraps the plan data in the required envelope structure with `command`, `status`, `exit_code`, and `data` fields containing artifacts, changes, validation, sandbox cleanup, and lifecycle information. ## Changes - Modified `lifecycle_apply_plan` function in `src/cleveragents/cli/commands/plan.py` to wrap non-rich format output in spec-required JSON envelope - Added `_apply_output_dict` helper function exclusively for `plan apply` call sites - Envelope includes `command: "plan apply"`, `status: "ok"`, `exit_code: 0`, and `timing` fields - `data` field contains: - `artifacts`: count of files changed - `changes`: insertions and deletions from git diff - `project`: project identifier - `applied_at`: timestamp of application - `validation`: test, lint, and type_check results with duration - `sandbox_cleanup`: worktree, branch, and checkpoint status - `lifecycle`: phase, state, total_duration, total_cost, decisions_made, child_plans - Added `messages: ["Changes applied"]` to response envelope - Moved `Plan as LifecyclePlan` import to module top-level ## Testing - **16 Behave scenarios** in `features/plan_apply_json_envelope.feature` covering: - All required top-level envelope fields - Data field structure (artifacts, changes, project, etc.) - Sandbox cleanup derivation from actual plan terminal state - Command isolation (status/cancel use `_plan_spec_dict`, not apply envelope) - Legacy plan fallback handling - Cost metadata in lifecycle total_cost - **15 Robot Framework integration tests** in `robot/plan_apply_json_envelope.robot` with helper script ## Issue Reference Closes #9449 Epic: M3 Decisions + Validations + Invariants (v3.2.0)

HAL9000 added 1 commit 2026-05-08 05:38:55 +00:00

bug(cli): plan apply --format json returns raw plan dict instead of spec-required JSON envelope

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Failing after 1m9s

Details

CI / benchmark-regression (pull_request) Failing after 1m20s

Details

CI / quality (pull_request) Successful in 1m41s

Details

CI / build (pull_request) Successful in 30s

Details

CI / typecheck (pull_request) Successful in 2m1s

Details

CI / security (pull_request) Successful in 1m52s

Details

CI / helm (pull_request) Successful in 29s

Details

CI / push-validation (pull_request) Successful in 20s

Details

CI / integration_tests (pull_request) Failing after 5m51s

Details

CI / e2e_tests (pull_request) Failing after 5m12s

Details

CI / unit_tests (pull_request) Failing after 6m24s

Details

CI / coverage (pull_request) Has been skipped

Details

CI / docker (pull_request) Has been skipped

Details

CI / status-check (pull_request) Failing after 3s

Details

8d586dd714

The `agents plan apply --format json` command was returning a raw
plan dictionary instead of the spec-required JSON envelope. This fix
introduces a dedicated `_apply_output_dict()` helper that wraps the
non-rich format output in the proper envelope structure with
`command`, `status`, `exit_code`, `data`, `timing`, and `messages`
fields.

The `data` field contains structured information about artifacts,
changes, project, applied_at, validation (test/lint/type_check),
sandbox_cleanup, and lifecycle metrics. Other commands (plan status,
plan cancel, plan use) remain unaffected — they continue using
`_plan_spec_dict`.

Tests: 16 Behave scenarios + 15 Robot Framework integration tests
added covering envelope structure, field presence, sandbox cleanup
state derivation from actual plan state, legacy fallback, cost
metadata, and command isolation.

ISSUES CLOSED: #9449

HAL9000 force-pushed pr-9817-plan-apply-json from 8d586dd714 to da048a826d

2026-05-08 07:40:11 +00:00

Compare

HAL9000 force-pushed pr-9817-plan-apply-json from da048a826d to 312932fdfc

2026-05-08 09:26:48 +00:00

Compare

HAL9001 requested changes 2026-05-09 01:59:02 +00:00

HAL9001 left a comment

Code Review — PR #11018: `fix(cli): wrap plan apply --format json output in spec-required JSON envelope`

Verdict: REQUEST_CHANGES — There are multiple blocking issues across correctness, spec alignment, CI, test quality, and PR process compliance.

CI Status

CI is failing on 5 of the 5 required-for-merge checks:

❌ lint — failing after 1m21s
❌ unit_tests — failing after 4m56s
❌ integration_tests — failing after 5m1s
❌ e2e_tests — failing after 4m20s
❌ status-check — failing (aggregates all above)

All CI gates must be green before a PR can be merged. Please fix all CI failures before re-requesting review.

Blocking Issues Found

1. Spec Deviation — `timing` Field Structure

The spec (§agents plan apply, line ~13465) requires:

"timing": { "started": "2026-02-09T14:30:00Z", "duration_ms": 1250 }

The PR produces:

"timing": { "applied_at": "2026-02-08T13:04:00Z" }

This is a structural deviation from the spec. The applied_at timestamp belongs in data.applied_at (which is correctly present), not in timing. The timing block must contain started and duration_ms. Fix _apply_output_dict to accept a started_at timestamp and duration_ms from the call site, and pass them into timing.

2. Spec Deviation — `data.validation` Field Structure

The spec defines:

"validation": {
  "tests": { "status": "passed", "passed": 24, "total": 24 },
  "lint": { "status": "passed", "warnings": 0 },
  "type_check": { "status": "passed", "errors": 0 },
  "duration_s": 12.4
}

The PR produces:

"validation": {
  "test": "passed",
  "lint": "passed",
  "type_check": "passed",
  "duration": "0s"
}

Multiple deviations: key is test (singular) instead of tests, values are flat strings instead of structured objects, and duration_s is a numeric field (not string "0s"). Both the implementation and the Behave scenarios (which test against the "test" key) need to be corrected to match the spec.

3. Spec Deviation — Missing `data.plan_id` Field

The spec's plan apply JSON output includes "plan_id" in the data block:

"data": {
  "plan_id": "01HXM8C2ZK4Q7C2B3F2R4VYV6J",
  "artifacts": 6,
  ...
}

The PR omits plan_id from the data dict. This field is required by spec and must be added (plan.identity.plan_id).

4. Robot Integration Tests Use Undefined Keyword

The .robot file calls Run Plan Apply Json Envelope Test <scenario> but this keyword is never defined — it is not in common.resource, not in the robot file's own *** Keywords *** section (which doesn't exist), and not in any imported resource. This is the root cause of the integration_tests CI failure.

All other robot files in this project use the pattern:

*** Variables ***
${HELPER}    ${CURDIR}/helper_plan_apply_json_envelope.py

*** Test Cases ***
Envelope Has All Required Top-Level Fields
    ${result}=    Run Process    ${PYTHON}    ${HELPER}    verify_top_level_fields    cwd=${WORKSPACE}
    Should Be Equal As Integers    ${result.rc}    0

The .robot file must be rewritten to use Run Process ${PYTHON} ${HELPER} <scenario> cwd=${WORKSPACE} consistently with project conventions.

5. Robot Helper Uses Plain Dicts Instead of LifecyclePlan Domain Objects

The helper_plan_apply_json_envelope.py constructs a plain dict and sets it as service.apply_plan.return_value. When lifecycle_apply_plan calls _apply_output_dict(plan), the plan is a dict, not a LifecyclePlan instance. The isinstance(plan, LifecyclePlan) guard in _apply_output_dict will be False, so all robot tests exercise the legacy fallback path only — they never test the real implementation.

The robot helper must construct a real LifecyclePlan domain object (as the Behave steps correctly do). Import and use the actual domain model classes, or the tests are effectively testing dead code.

6. Commit Message Type Is Invalid (`bug` is not a Conventional Changelog type)

bug(cli): ... is not a valid Conventional Changelog type. Valid types are: feat, fix, docs, style, refactor, test, chore, perf, ci, build, revert. A bug fix must use fix. This appears to be an error in the issue's Metadata section that was carried through verbatim. The issue's Metadata should be corrected to prescribe fix(cli): ..., and the commit message must be amended to use fix.

7. PR Missing Milestone

Linked issue #9449 is assigned to milestone v3.2.0. Per CONTRIBUTING.md §12, PRs must be assigned the same milestone as their linked issue(s). The PR has no milestone assigned — please set it to v3.2.0.

8. PR Missing `Type/` Label

The PR has no Type/ label. Exactly one Type/ label is required. This is a bug fix, so Type/Bug must be applied.

9. Forgejo Dependency Direction Not Set

Per CONTRIBUTING.md §2 (CRITICAL), the PR must block the linked issue — not the other way around. Set the dependency: PR #11018 → blocks → Issue #9449. Verify that issue #9449 shows this PR under "depends on". Currently no dependency relationship is set at all.

10. Branch Name Does Not Match Issue Metadata

Issue #9449's ## Metadata section specifies Branch: fix/plan-apply-json-envelope. The actual PR branch is pr-9817-plan-apply-json. Per CONTRIBUTING.md, the branch used must match the Branch field in the issue Metadata exactly. This is a policy violation.

Minor (Non-Blocking) Observations

Docstring indent regression in _execute_output_dict: The diff changes Returns: to Returns: (adds 2-space indent), which is invalid Google-style docstring format and is likely the root cause of the lint CI failure. Revert this unintended change.
changes hardcoded to zeros: data.changes always reports {"insertions": 0, "deletions": 0}. The spec shows real diff data here. This is an acceptable incremental limitation but should be tracked in a follow-up issue.
sandbox_refs as proxy for artifacts: Using len(plan.sandbox_refs) as a proxy for files changed is a reasonable fallback, but the relationship between sandbox refs and files changed may not be 1:1. Consider tracking in a follow-up.

Summary

The core implementation concept is correct — wrapping lifecycle_apply_plan non-rich output in a dedicated _apply_output_dict helper is the right approach. However, the PR has multiple blocking issues that must be resolved:

Fix 3 spec deviations (timing structure, validation structure, missing data.plan_id)
Fix broken robot integration tests (undefined keyword + wrong Plan type in helper)
Fix the commit message type (bug → fix)
Assign milestone v3.2.0 and label Type/Bug to the PR
Set Forgejo dependency: PR #11018 blocks issue bug(cli): plan apply --format json returns raw plan dict instead of spec-required JSON envelope (#9449)
Revert the unintended docstring indent change in _execute_output_dict

Once CI is green and all blocking issues are resolved, re-request review.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## Code Review — PR #11018: `fix(cli): wrap plan apply --format json output in spec-required JSON envelope` **Verdict: REQUEST_CHANGES** — There are multiple blocking issues across correctness, spec alignment, CI, test quality, and PR process compliance. --- ### CI Status CI is **failing** on 5 of the 5 required-for-merge checks: - ❌ `lint` — failing after 1m21s - ❌ `unit_tests` — failing after 4m56s - ❌ `integration_tests` — failing after 5m1s - ❌ `e2e_tests` — failing after 4m20s - ❌ `status-check` — failing (aggregates all above) All CI gates must be green before a PR can be merged. Please fix all CI failures before re-requesting review. --- ### Blocking Issues Found #### 1. Spec Deviation — `timing` Field Structure The spec (§agents plan apply, line ~13465) requires: ```json "timing": { "started": "2026-02-09T14:30:00Z", "duration_ms": 1250 } ``` The PR produces: ```json "timing": { "applied_at": "2026-02-08T13:04:00Z" } ``` This is a structural deviation from the spec. The `applied_at` timestamp belongs in `data.applied_at` (which is correctly present), not in `timing`. The `timing` block must contain `started` and `duration_ms`. Fix `_apply_output_dict` to accept a `started_at` timestamp and `duration_ms` from the call site, and pass them into `timing`. #### 2. Spec Deviation — `data.validation` Field Structure The spec defines: ```json "validation": { "tests": { "status": "passed", "passed": 24, "total": 24 }, "lint": { "status": "passed", "warnings": 0 }, "type_check": { "status": "passed", "errors": 0 }, "duration_s": 12.4 } ``` The PR produces: ```json "validation": { "test": "passed", "lint": "passed", "type_check": "passed", "duration": "0s" } ``` Multiple deviations: key is `test` (singular) instead of `tests`, values are flat strings instead of structured objects, and `duration_s` is a numeric field (not string `"0s"`). Both the implementation and the Behave scenarios (which test against the `"test"` key) need to be corrected to match the spec. #### 3. Spec Deviation — Missing `data.plan_id` Field The spec's plan apply JSON output includes `"plan_id"` in the `data` block: ```json "data": { "plan_id": "01HXM8C2ZK4Q7C2B3F2R4VYV6J", "artifacts": 6, ... } ``` The PR omits `plan_id` from the `data` dict. This field is required by spec and must be added (`plan.identity.plan_id`). #### 4. Robot Integration Tests Use Undefined Keyword The `.robot` file calls `Run Plan Apply Json Envelope Test <scenario>` but this keyword is never defined — it is not in `common.resource`, not in the robot file's own `*** Keywords ***` section (which doesn't exist), and not in any imported resource. This is the root cause of the `integration_tests` CI failure. All other robot files in this project use the pattern: ```robot *** Variables *** ${HELPER} ${CURDIR}/helper_plan_apply_json_envelope.py *** Test Cases *** Envelope Has All Required Top-Level Fields ${result}= Run Process ${PYTHON} ${HELPER} verify_top_level_fields cwd=${WORKSPACE} Should Be Equal As Integers ${result.rc} 0 ``` The `.robot` file must be rewritten to use `Run Process ${PYTHON} ${HELPER} <scenario> cwd=${WORKSPACE}` consistently with project conventions. #### 5. Robot Helper Uses Plain Dicts Instead of LifecyclePlan Domain Objects The `helper_plan_apply_json_envelope.py` constructs a plain `dict` and sets it as `service.apply_plan.return_value`. When `lifecycle_apply_plan` calls `_apply_output_dict(plan)`, the plan is a dict, not a `LifecyclePlan` instance. The `isinstance(plan, LifecyclePlan)` guard in `_apply_output_dict` will be `False`, so all robot tests exercise the **legacy fallback path** only — they never test the real implementation. The robot helper must construct a real `LifecyclePlan` domain object (as the Behave steps correctly do). Import and use the actual domain model classes, or the tests are effectively testing dead code. #### 6. Commit Message Type Is Invalid (`bug` is not a Conventional Changelog type) `bug(cli): ...` is not a valid Conventional Changelog type. Valid types are: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`, `perf`, `ci`, `build`, `revert`. A bug fix must use `fix`. This appears to be an error in the issue's Metadata section that was carried through verbatim. The issue's Metadata should be corrected to prescribe `fix(cli): ...`, and the commit message must be amended to use `fix`. #### 7. PR Missing Milestone Linked issue #9449 is assigned to milestone `v3.2.0`. Per CONTRIBUTING.md §12, PRs must be assigned the same milestone as their linked issue(s). The PR has no milestone assigned — please set it to `v3.2.0`. #### 8. PR Missing `Type/` Label The PR has no `Type/` label. Exactly one `Type/` label is required. This is a bug fix, so `Type/Bug` must be applied. #### 9. Forgejo Dependency Direction Not Set Per CONTRIBUTING.md §2 (CRITICAL), the PR must **block** the linked issue — not the other way around. Set the dependency: PR #11018 → blocks → Issue #9449. Verify that issue #9449 shows this PR under "depends on". Currently no dependency relationship is set at all. #### 10. Branch Name Does Not Match Issue Metadata Issue #9449's `## Metadata` section specifies `Branch: fix/plan-apply-json-envelope`. The actual PR branch is `pr-9817-plan-apply-json`. Per CONTRIBUTING.md, the branch used must match the `Branch` field in the issue Metadata exactly. This is a policy violation. --- ### Minor (Non-Blocking) Observations - **Docstring indent regression in `_execute_output_dict`**: The diff changes ` Returns:` to ` Returns:` (adds 2-space indent), which is invalid Google-style docstring format and is likely the root cause of the `lint` CI failure. Revert this unintended change. - **`changes` hardcoded to zeros**: `data.changes` always reports `{"insertions": 0, "deletions": 0}`. The spec shows real diff data here. This is an acceptable incremental limitation but should be tracked in a follow-up issue. - **`sandbox_refs` as proxy for `artifacts`**: Using `len(plan.sandbox_refs)` as a proxy for files changed is a reasonable fallback, but the relationship between sandbox refs and files changed may not be 1:1. Consider tracking in a follow-up. --- ### Summary The core implementation concept is correct — wrapping `lifecycle_apply_plan` non-rich output in a dedicated `_apply_output_dict` helper is the right approach. However, the PR has multiple blocking issues that must be resolved: 1. Fix 3 spec deviations (`timing` structure, `validation` structure, missing `data.plan_id`) 2. Fix broken robot integration tests (undefined keyword + wrong Plan type in helper) 3. Fix the commit message type (`bug` → `fix`) 4. Assign milestone `v3.2.0` and label `Type/Bug` to the PR 5. Set Forgejo dependency: PR #11018 blocks issue #9449 6. Revert the unintended docstring indent change in `_execute_output_dict` Once CI is green and all blocking issues are resolved, re-request review. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

robot/helper_plan_apply_json_envelope.py Outdated

						
				@@ -0,0 +3,4 @@

				from __future__ import annotations

				import json

				import sys

HAL9001 commented

2026-05-09 01:59:02 +00:00

BLOCKING — Robot helper uses plain dicts; all tests exercise the legacy fallback path, not the real implementation

_make_plan() builds a plain Python dict and sets it as svc.apply_plan.return_value. When lifecycle_apply_plan calls _apply_output_dict(plan), the isinstance(plan, LifecyclePlan) check returns False (dict is not a LifecyclePlan). The function falls into the legacy fallback:

return {
    "command": "plan apply",
    "data": {"plan": str(plan)},  # ← minimal fallback, NOT the real envelope
    ...
}

This means every robot test is asserting against the legacy minimal envelope, not the spec-required structured envelope. The tests appear to pass (or would pass) even if the real implementation is completely broken.

Fix: construct a real LifecyclePlan object in _make_plan(), exactly as _make_plan() does in the Behave step definitions (plan_apply_json_envelope_steps.py):

from cleveragents.domain.models.core.plan import (
    Plan as LifecyclePlan, PlanIdentity, NamespacedName,
    PlanPhase, ProcessingState, PlanTimestamps, ProjectLink,
)
plan = LifecyclePlan(
    identity=PlanIdentity(plan_id=pid),
    namespaced_name=NamespacedName(server=None, namespace='local', name='test-plan'),
    ...
)

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — Robot helper uses plain dicts; all tests exercise the legacy fallback path, not the real implementation** `_make_plan()` builds a plain Python `dict` and sets it as `svc.apply_plan.return_value`. When `lifecycle_apply_plan` calls `_apply_output_dict(plan)`, the `isinstance(plan, LifecyclePlan)` check returns `False` (dict is not a LifecyclePlan). The function falls into the legacy fallback: ```python return { "command": "plan apply", "data": {"plan": str(plan)}, # ← minimal fallback, NOT the real envelope ... } ``` This means every robot test is asserting against the legacy minimal envelope, not the spec-required structured envelope. The tests appear to pass (or would pass) even if the real implementation is completely broken. Fix: construct a real `LifecyclePlan` object in `_make_plan()`, exactly as `_make_plan()` does in the Behave step definitions (`plan_apply_json_envelope_steps.py`): ```python from cleveragents.domain.models.core.plan import ( Plan as LifecyclePlan, PlanIdentity, NamespacedName, PlanPhase, ProcessingState, PlanTimestamps, ProjectLink, ) plan = LifecyclePlan( identity=PlanIdentity(plan_id=pid), namespaced_name=NamespacedName(server=None, namespace='local', name='test-plan'), ... ) ``` --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

robot/plan_apply_json_envelope.robot Outdated

						
				@@ -0,0 +10,4 @@

				Envelope Has All Required Top-Level Fields

				    [Documentation]    Verify that the plan apply JSON envelope contains all required top-level keys.

				    ${result}=    Run Plan Apply Json Envelope Test    verify_top_level_fields

				    Should Be Equal As Integers    ${result.rc}    0

HAL9001 commented

2026-05-09 01:59:02 +00:00

BLOCKING — Undefined Robot keyword causes integration_tests CI failure

Run Plan Apply Json Envelope Test is not defined anywhere in the project. The robot file has no *** Keywords *** section, and the keyword is absent from common.resource and all imported resources. Every test case in this file will fail with No keyword with name 'Run Plan Apply Json Envelope Test' found.

All other robot files in this project follow this established convention:

*** Variables ***
${HELPER}    ${CURDIR}/helper_plan_apply_json_envelope.py

*** Test Cases ***
Envelope Has All Required Top-Level Fields
    ${result}=    Run Process    ${PYTHON}    ${HELPER}    verify_top_level_fields    cwd=${WORKSPACE}
    Log    ${result.stdout}
    Log    ${result.stderr}
    Should Be Equal As Integers    ${result.rc}    0

Rewrite all test cases to use Run Process ${PYTHON} ${HELPER} <scenario_name> cwd=${WORKSPACE}, matching how helper_cli_formats.py, helper_apply_pipeline.py, and other helpers are invoked.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — Undefined Robot keyword causes integration_tests CI failure** `Run Plan Apply Json Envelope Test` is not defined anywhere in the project. The robot file has no `*** Keywords ***` section, and the keyword is absent from `common.resource` and all imported resources. Every test case in this file will fail with `No keyword with name 'Run Plan Apply Json Envelope Test' found`. All other robot files in this project follow this established convention: ```robot *** Variables *** ${HELPER} ${CURDIR}/helper_plan_apply_json_envelope.py *** Test Cases *** Envelope Has All Required Top-Level Fields ${result}= Run Process ${PYTHON} ${HELPER} verify_top_level_fields cwd=${WORKSPACE} Log ${result.stdout} Log ${result.stderr} Should Be Equal As Integers ${result.rc} 0 ``` Rewrite all test cases to use `Run Process ${PYTHON} ${HELPER} <scenario_name> cwd=${WORKSPACE}`, matching how `helper_cli_formats.py`, `helper_apply_pipeline.py`, and other helpers are invoked. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

src/cleveragents/cli/commands/plan.py Outdated

HAL9001 commented

2026-05-09 01:59:00 +00:00

BLOCKING — Docstring indent regression causes lint failure

This diff changes Returns: (4-space indent, correct Google-style docstring) to Returns: (6-space indent, incorrect). This appears to be an unintended edit and is likely the root cause of the CI / lint failure. Please revert this specific change in _execute_output_dict.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — Docstring indent regression causes lint failure** This diff changes ` Returns:` (4-space indent, correct Google-style docstring) to ` Returns:` (6-space indent, incorrect). This appears to be an unintended edit and is likely the root cause of the `CI / lint` failure. Please revert this specific change in `_execute_output_dict`. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

src/cleveragents/cli/commands/plan.py Outdated

HAL9001 commented

2026-05-09 01:59:01 +00:00

BLOCKING — Spec Deviation: validation key names and value structure are wrong

The spec (§agents plan apply) defines the validation sub-object as:

"validation": {
  "tests": { "status": "passed", "passed": 24, "total": 24 },
  "lint": { "status": "passed", "warnings": 0 },
  "type_check": { "status": "passed", "errors": 0 },
  "duration_s": 12.4
}

This implementation has three deviations:

Key is "test" (singular) — must be "tests" (plural)
Values are flat strings ("passed") — must be structured objects per spec
"duration" as a string "0s" — must be "duration_s" as a numeric float

The Behave feature file also checks against "test" — both the implementation and the test assertions must be updated to match the spec.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — Spec Deviation: `validation` key names and value structure are wrong** The spec (§agents plan apply) defines the validation sub-object as: ```json "validation": { "tests": { "status": "passed", "passed": 24, "total": 24 }, "lint": { "status": "passed", "warnings": 0 }, "type_check": { "status": "passed", "errors": 0 }, "duration_s": 12.4 } ``` This implementation has three deviations: 1. Key is `"test"` (singular) — must be `"tests"` (plural) 2. Values are flat strings (`"passed"`) — must be structured objects per spec 3. `"duration"` as a string `"0s"` — must be `"duration_s"` as a numeric float The Behave feature file also checks against `"test"` — both the implementation and the test assertions must be updated to match the spec. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

src/cleveragents/cli/commands/plan.py Outdated

HAL9001 commented

2026-05-09 01:59:01 +00:00

BLOCKING — Spec Deviation: timing field structure is wrong

The spec (§agents plan apply, line ~13465) requires:

"timing": { "started": "2026-02-09T14:30:00Z", "duration_ms": 1250 }

This implementation produces:

"timing": { "applied_at": "2026-02-08T13:04:00Z" }

applied_at is already present in data.applied_at. The timing block must contain started (ISO timestamp of when the command began) and duration_ms (elapsed milliseconds). Update _apply_output_dict to accept these as parameters from lifecycle_apply_plan, which should measure wall-clock time around the service call.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — Spec Deviation: `timing` field structure is wrong** The spec (§agents plan apply, line ~13465) requires: ```json "timing": { "started": "2026-02-09T14:30:00Z", "duration_ms": 1250 } ``` This implementation produces: ```json "timing": { "applied_at": "2026-02-08T13:04:00Z" } ``` `applied_at` is already present in `data.applied_at`. The `timing` block must contain `started` (ISO timestamp of when the command began) and `duration_ms` (elapsed milliseconds). Update `_apply_output_dict` to accept these as parameters from `lifecycle_apply_plan`, which should measure wall-clock time around the service call. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

HAL9001 commented

2026-05-09 01:59:15 +00:00

Review complete. REQUEST_CHANGES submitted (Review ID: 8301).

10 blocking issues identified — see the review for full details. Summary of required changes:

Fix spec deviations: timing must have {started, duration_ms}, validation must use tests key with structured objects, data.plan_id must be included
Fix robot integration tests: rewrite .robot file to use Run Process ${PYTHON} ${HELPER} pattern; fix helper to use real LifecyclePlan objects
Fix commit message type: bug(cli) → fix(cli)
Assign milestone v3.2.0 and label Type/Bug
Set Forgejo dependency: PR #11018 blocks issue bug(cli): plan apply --format json returns raw plan dict instead of spec-required JSON envelope (#9449)
Revert unintended docstring indent change in _execute_output_dict

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

Review complete. **REQUEST_CHANGES** submitted (Review ID: 8301). 10 blocking issues identified — see the review for full details. Summary of required changes: 1. Fix spec deviations: `timing` must have `{started, duration_ms}`, `validation` must use `tests` key with structured objects, `data.plan_id` must be included 2. Fix robot integration tests: rewrite `.robot` file to use `Run Process ${PYTHON} ${HELPER}` pattern; fix helper to use real `LifecyclePlan` objects 3. Fix commit message type: `bug(cli)` → `fix(cli)` 4. Assign milestone `v3.2.0` and label `Type/Bug` 5. Set Forgejo dependency: PR #11018 blocks issue #9449 6. Revert unintended docstring indent change in `_execute_output_dict` --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

HAL9000 referenced this pull request

2026-05-31 16:12:10 +00:00

fix(cli): fix JSON/YAML envelope messages[].text to show actual content #6739

HAL9000 referenced this pull request

2026-06-03 11:27:16 +00:00

test(cli): add cli_lifecycle_e2e full plan lifecycle integration test #9820

HAL9000 referenced this pull request

2026-06-03 11:39:55 +00:00

fix(cli): wrap plan apply --format json output in spec-required JSON envelope #9817

HAL9000 referenced this pull request

2026-06-03 20:53:37 +00:00

fix(cli): fix project context set JSON/YAML output structure #6626

HAL9000 added the controller-managed label 2026-06-10 14:03:25 +00:00

HAL9000 added the

labels 2026-06-10 14:07:12 +00:00

HAL9000 added this to the v3.2.0 milestone 2026-06-10 14:08:21 +00:00

HAL9000 commented

2026-06-10 14:45:52 +00:00

[CONTROLLER-DEFER:Gate 1:full_duplicate]

This PR has been deferred for re-evaluation. The controller has stepped back
from processing it. To resume, a human or scope-evaluator must clear the
deferral flag AND re-add the auto/sentinel label.

Decision:

Gate: Gate 1
Reason category: full_duplicate
Canonical: fix(cli): wrap plan apply --format json output in spec-required JSON envelope (#9817)
LLM confidence: high
LLM reasoning: Anchor PR #11018 is a full duplicate of #9817. Both have identical titles: "fix(cli): wrap plan apply --format json output in spec-required JSON envelope". The anchor's head branch "pr-9817-plan-apply-json" explicitly references the original PR #9817. Both solve the same problem (wrapping plan apply JSON output in spec-required envelope) targeting the same base (master). PR #9817 is the canonical: it is older (lower PR number), has a larger diff (1278/287 vs 1134/8), and has had more opportunity to be reviewed. The anchor appears to be a rebased or reworked version of the same solution.

To clear the deferral (SQL):
UPDATE workflows SET deferred_reason=NULL,
deferred_at=NULL,
deferred_target_workflow_id=NULL
WHERE workflow_id = 420;

INSERT INTO controller_events
  (workflow_id, ts, event_type, payload, cause, forgejo_write_pending, replay_attempts)
VALUES (420, datetime('now'), 'deferral_cleared',
        json_object('cleared_by', 'operator', 'reason', '<your reason>'),
        'operator', 0, 0);

Audit ID: 155571

Automated by the CleverAgents controller pipeline.
Identity: HAL9000 (pipeline action)

[CONTROLLER-DEFER:Gate 1:full_duplicate] This PR has been deferred for re-evaluation. The controller has stepped back from processing it. To resume, a human or scope-evaluator must clear the deferral flag AND re-add the auto/sentinel label. Decision: - Gate: Gate 1 - Reason category: full_duplicate - Canonical: #9817 - LLM confidence: high - LLM reasoning: Anchor PR #11018 is a full duplicate of #9817. Both have identical titles: "fix(cli): wrap plan apply --format json output in spec-required JSON envelope". The anchor's head branch "pr-9817-plan-apply-json" explicitly references the original PR #9817. Both solve the same problem (wrapping plan apply JSON output in spec-required envelope) targeting the same base (master). PR #9817 is the canonical: it is older (lower PR number), has a larger diff (1278/287 vs 1134/8), and has had more opportunity to be reviewed. The anchor appears to be a rebased or reworked version of the same solution. To clear the deferral (SQL): UPDATE workflows SET deferred_reason=NULL, deferred_at=NULL, deferred_target_workflow_id=NULL WHERE workflow_id = 420; INSERT INTO controller_events (workflow_id, ts, event_type, payload, cause, forgejo_write_pending, replay_attempts) VALUES (420, datetime('now'), 'deferral_cleared', json_object('cleared_by', 'operator', 'reason', '<your reason>'), 'operator', 0, 0); Audit ID: 155571 --- Automated by the CleverAgents controller pipeline. Identity: HAL9000 (pipeline action)

HAL9000 added the auto/needs-reevaluation

State

Paused

labels 2026-06-10 14:46:35 +00:00

drew referenced this issue from a commit

2026-06-11 00:20:27 +00:00

ci: stop master workflow on PR updates

drew added 1 commit 2026-06-11 00:20:27 +00:00

ci: stop master workflow on PR updates

CI / lint (pull_request) Has been cancelled

Details

CI / typecheck (pull_request) Has been cancelled

Details

CI / security (pull_request) Has been cancelled

Details

CI / quality (pull_request) Has been cancelled

Details

CI / unit_tests (pull_request) Has been cancelled

Details

CI / integration_tests (pull_request) Has been cancelled

Details

CI / e2e_tests (pull_request) Has been cancelled

Details

CI / coverage (pull_request) Has been cancelled

Details

CI / build (pull_request) Has been cancelled

Details

CI / docker (pull_request) Has been cancelled

Details

CI / helm (pull_request) Has been cancelled

Details

CI / push-validation (pull_request) Has been cancelled

Details

CI / status-check (pull_request) Has been cancelled

Details

4d0c20d0a5

Remove the stale pull_request trigger from master.yml so PR branch commits do not launch the master workflow.

Maintenance patch for PR #11018.

HAL9000 added 1 commit 2026-06-11 09:18:17 +00:00

chore: re-trigger CI [controller]

CI / build (pull_request) Successful in 28s

Details

CI / push-validation (pull_request) Successful in 22s

Details

CI / lint (pull_request) Failing after 37s

Details

CI / helm (pull_request) Successful in 29s

Details

CI / quality (pull_request) Successful in 53s

Details

CI / typecheck (pull_request) Successful in 1m0s

Details

CI / unit_tests (pull_request) Failing after 4m17s

Details

CI / e2e_tests (pull_request) Failing after 15m21s

Details

CI / integration_tests (pull_request) Failing after 15m23s

Details

CI / security (pull_request) Failing after 15m25s

Details

CI / status-check (pull_request) Has been cancelled

Details

CI / coverage (pull_request) Has been cancelled

Details

CI / docker (pull_request) Has been cancelled

Details

a24ceb61e1

HAL9000 removed the

State

Paused

label 2026-06-11 09:18:56 +00:00

HAL9000 commented

2026-06-11 19:54:51 +00:00

📋 Estimate: tier 1.

8-file PR (+1134/-10) adding a JSON envelope wrapper to plan apply CLI output plus 31 new tests (16 Behave + 15 Robot). Core logic change is bounded to one function in plan.py, but CI has 5 failing gates requiring diagnosis: lint (ruff format on one steps file — trivial), unit_tests (12 failed/8 errored spanning both new and apparently pre-existing feature files), e2e/integration (truncated logs), security (bandit nosec warnings). The unrelated feature file failures (actor_run_signature, plan_service_coverage, tdd_memory_service_entity_persistence) require cross-file investigation to determine if caused by this PR or pre-existing. Multi-file scope with new test infrastructure and multiple CI failures to triage and fix is standard Tier 1 work.

**📋 Estimate: tier 1.** 8-file PR (+1134/-10) adding a JSON envelope wrapper to plan apply CLI output plus 31 new tests (16 Behave + 15 Robot). Core logic change is bounded to one function in plan.py, but CI has 5 failing gates requiring diagnosis: lint (ruff format on one steps file — trivial), unit_tests (12 failed/8 errored spanning both new and apparently pre-existing feature files), e2e/integration (truncated logs), security (bandit nosec warnings). The unrelated feature file failures (actor_run_signature, plan_service_coverage, tdd_memory_service_entity_persistence) require cross-file investigation to determine if caused by this PR or pre-existing. Multi-file scope with new test infrastructure and multiple CI failures to triage and fix is standard Tier 1 work.

HAL9000 removed the controller-managed label 2026-06-12 18:49:31 +00:00

HAL9000 added the

State

Paused

label 2026-06-12 19:11:02 +00:00

HAL9000 added the controller-managed label 2026-06-13 03:34:41 +00:00

HAL9000 removed the

State

Paused

label 2026-06-13 03:43:48 +00:00

HAL9000 added 1 commit 2026-06-13 13:35:32 +00:00

chore: re-trigger CI [controller]

CI / lint (pull_request) Failing after 1m1s

Details

CI / quality (pull_request) Successful in 1m14s

Details

CI / typecheck (pull_request) Successful in 1m36s

Details

CI / helm (pull_request) Successful in 29s

Details

CI / security (pull_request) Successful in 1m36s

Details

CI / push-validation (pull_request) Successful in 41s

Details

CI / build (pull_request) Successful in 53s

Details

CI / integration_tests (pull_request) Failing after 3m59s

Details

CI / e2e_tests (pull_request) Failing after 5m6s

Details

CI / unit_tests (pull_request) Failing after 10m5s

Details

CI / coverage (pull_request) Has been skipped

Details

CI / docker (pull_request) Has been skipped

Details

CI / status-check (pull_request) Failing after 3s

Details

9260b962a5

HAL9000 commented

2026-06-14 06:25:07 +00:00

📋 Estimate: tier 1.

Focused CLI change (one function + helper in plan.py) but implementation is demonstrably incomplete: all new plan_apply_json_envelope Behave scenarios fail, existing plan_cli_coverage_boost tests regress, all new Robot integration tests fail, and an E2E test (M6 Full Autonomy) fails suggesting a downstream contract regression. Fix requires debugging the envelope construction logic, reconciling existing test expectations, and investigating the E2E regression — multi-file test context required. Lint failure (ruff format on one step file) is trivial. Actor/plan_service/memory-service failures appear pre-existing. Standard tier-1 debugging and test-fix work.

**📋 Estimate: tier 1.** Focused CLI change (one function + helper in plan.py) but implementation is demonstrably incomplete: all new plan_apply_json_envelope Behave scenarios fail, existing plan_cli_coverage_boost tests regress, all new Robot integration tests fail, and an E2E test (M6 Full Autonomy) fails suggesting a downstream contract regression. Fix requires debugging the envelope construction logic, reconciling existing test expectations, and investigating the E2E regression — multi-file test context required. Lint failure (ruff format on one step file) is trivial. Actor/plan_service/memory-service failures appear pre-existing. Standard tier-1 debugging and test-fix work.

HAL9000 force-pushed pr-9817-plan-apply-json from 9260b962a5 to ff2b627772

2026-06-14 09:48:37 +00:00

Compare

HAL9000 added 1 commit 2026-06-14 18:24:50 +00:00

chore: re-trigger CI [controller]

CI / lint (pull_request) Failing after 36s

Details

CI / build (pull_request) Successful in 34s

Details

CI / typecheck (pull_request) Successful in 58s

Details

CI / helm (pull_request) Successful in 51s

Details

CI / security (pull_request) Successful in 1m13s

Details

CI / quality (pull_request) Successful in 57s

Details

CI / unit_tests (pull_request) Failing after 1m51s

Details

CI / coverage (pull_request) Has been skipped

Details

CI / docker (pull_request) Has been skipped

Details

CI / push-validation (pull_request) Successful in 25s

Details

CI / integration_tests (pull_request) Failing after 8m17s

Details

CI / status-check (pull_request) Failing after 2s

Details

5e25a20de9

HAL9000 commented

2026-06-14 21:33:05 +00:00

📋 Estimate: tier 1.

Multi-file change (8 files, +1019/-512) with three active CI failures: 4 Pyright errors accessing a nonexistent private method _commit_plan on PlanLifecycleService at 4 call sites in plan.py, 29 failing Behave unit test scenarios across multiple feature files, and 1 failing Robot integration test. The implementer must locate the correct PlanLifecycleService API surface, fix all 4 type errors, repair the failing test scenarios, and pass the integration test — standard cross-file, test-heavy tier-1 work with clear but non-trivial debugging scope.

**📋 Estimate: tier 1.** Multi-file change (8 files, +1019/-512) with three active CI failures: 4 Pyright errors accessing a nonexistent private method `_commit_plan` on `PlanLifecycleService` at 4 call sites in plan.py, 29 failing Behave unit test scenarios across multiple feature files, and 1 failing Robot integration test. The implementer must locate the correct PlanLifecycleService API surface, fix all 4 type errors, repair the failing test scenarios, and pass the integration test — standard cross-file, test-heavy tier-1 work with clear but non-trivial debugging scope.

HAL9000 force-pushed pr-9817-plan-apply-json from 5e25a20de9 to 2305758965

2026-06-14 22:21:39 +00:00

Compare

HAL9000 force-pushed pr-9817-plan-apply-json from 2305758965 to 208988cdde

2026-06-15 00:23:28 +00:00

Compare

HAL9001 approved these changes 2026-06-15 01:06:42 +00:00

HAL9001 left a comment

✅ Approved

Reviewed at commit 208988c.

Confidence: high.

**✅ Approved** Reviewed at commit `208988c`. Confidence: high.

HAL9000 added the auto/claimed-merge label 2026-06-15 01:07:18 +00:00

HAL9000 commented

2026-06-15 01:07:18 +00:00

Claimed by merge_drive.py (pid 2329255) until 2026-06-15T02:37:18.190690+00:00.

This claim is advisory and will be released when the cycle ends, or after the TTL by a sibling driver's expired-claim sweep.

Claimed by `merge_drive.py` (pid 2329255) until `2026-06-15T02:37:18.190690+00:00`. This claim is advisory and will be released when the cycle ends, or after the TTL by a sibling driver's expired-claim sweep.

HAL9000 force-pushed pr-9817-plan-apply-json from 208988cdde to 372ef09153

2026-06-15 01:07:21 +00:00

Compare

HAL9001 approved these changes 2026-06-15 01:23:45 +00:00

HAL9001 left a comment

Approved by the controller reviewer stage (workflow 420).

HAL9000 merged commit 39921a5868 into master

2026-06-15 01:23:46 +00:00

HAL9000 removed the auto/claimed-merge label 2026-06-15 01:23:47 +00:00

HAL9000 referenced this issue from a commit

2026-06-15 01:23:48 +00:00

Merge pull request 'fix(cli): wrap plan apply --format json output in spec-required JSON envelope' (#11018) from pr-9817-plan-apply-json into master

Sign in to join this conversation.

3 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: cleveragents/cleveragents-core#11018

fix(cli): wrap plan apply --format json output in spec-required JSON envelope #11018

Summary

Changes

Testing

Issue Reference

Code Review — PR #11018: fix(cli): wrap plan apply --format json output in spec-required JSON envelope

CI Status

Blocking Issues Found

1. Spec Deviation — timing Field Structure

2. Spec Deviation — data.validation Field Structure

3. Spec Deviation — Missing data.plan_id Field

4. Robot Integration Tests Use Undefined Keyword

5. Robot Helper Uses Plain Dicts Instead of LifecyclePlan Domain Objects

6. Commit Message Type Is Invalid (bug is not a Conventional Changelog type)

7. PR Missing Milestone

8. PR Missing Type/ Label

9. Forgejo Dependency Direction Not Set

10. Branch Name Does Not Match Issue Metadata

Minor (Non-Blocking) Observations

Summary

Code Review — PR #11018: `fix(cli): wrap plan apply --format json output in spec-required JSON envelope`

1. Spec Deviation — `timing` Field Structure

2. Spec Deviation — `data.validation` Field Structure

3. Spec Deviation — Missing `data.plan_id` Field

6. Commit Message Type Is Invalid (`bug` is not a Conventional Changelog type)

8. PR Missing `Type/` Label