test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure #495

Closed
opened 2026-03-02 01:56:33 +00:00 by freemo · 8 comments
Owner

Metadata

  • Commit Message: test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure
  • Branch: test/m4-acceptance-gate

Parent

Epic: #401 (E2E Integration Testing)
Related: #405 (M4 E2E test creation)

Description

Run the existing M4 E2E verification suite (robot/m4_e2e_verification.robot) against the complete v3.3.0 implementation. Update any tests that do not pass against the final implementation. Confirm all milestone acceptance criteria from the v3.3.0 milestone description are satisfied. This issue is the final gate before closing milestone v3.3.0.

The existing E2E test suite was created proactively via #405 while the milestone was still in progress. Once all remaining feature work in v3.3.0 is complete, this issue verifies the full acceptance criteria end-to-end and serves as the last issue closed before the milestone itself is closed.

Acceptance Criteria

Success Criteria Verification (from milestone description)

  • agents plan use local/refactor-action local/monorepo and agents plan execute <plan_id> spawn multiple subplans during Execute
  • agents plan tree <plan_id> displays the subplan tree correctly
  • agents plan diff <plan_id> shows merged results

Technical Criteria (from milestone description)

  • Subplans spawned during Execute
  • Parallel subplan execution works with max_parallel
  • Three-way merge combines non-conflicting changes; conflicts surfaced
  • Parent plan tracks all subplan statuses

Quality Gates

  • nox -s integration_tests passes with m4_e2e_verification.robot suite green
  • nox -s coverage_report confirms coverage >=97%
  • nox (all default sessions) passes

Subtasks

  • Run nox -s integration_tests and verify m4_e2e_verification.robot passes
  • Update tests in m4_e2e_verification.robot / helper_m4_e2e_verification.py if any fail against the final v3.3.0 implementation
  • Verify all acceptance criteria above are satisfied
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors
  • Close milestone v3.3.0 after this issue is merged

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • Milestone v3.3.0 is closed after this issue is merged.
## Metadata - **Commit Message**: `test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure` - **Branch**: `test/m4-acceptance-gate` ## Parent Epic: #401 (E2E Integration Testing) Related: #405 (M4 E2E test creation) ## Description Run the existing M4 E2E verification suite (`robot/m4_e2e_verification.robot`) against the complete v3.3.0 implementation. Update any tests that do not pass against the final implementation. Confirm all milestone acceptance criteria from the v3.3.0 milestone description are satisfied. This issue is the final gate before closing milestone v3.3.0. The existing E2E test suite was created proactively via #405 while the milestone was still in progress. Once all remaining feature work in v3.3.0 is complete, this issue verifies the full acceptance criteria end-to-end and serves as the last issue closed before the milestone itself is closed. ## Acceptance Criteria ### Success Criteria Verification (from milestone description) - [x] `agents plan use local/refactor-action local/monorepo` and `agents plan execute <plan_id>` spawn multiple subplans during Execute - [x] `agents plan tree <plan_id>` displays the subplan tree correctly - [x] `agents plan diff <plan_id>` shows merged results ### Technical Criteria (from milestone description) - [x] Subplans spawned during Execute - [x] Parallel subplan execution works with max_parallel - [x] Three-way merge combines non-conflicting changes; conflicts surfaced - [x] Parent plan tracks all subplan statuses ### Quality Gates - [x] `nox -s integration_tests` passes with `m4_e2e_verification.robot` suite green - [x] `nox -s coverage_report` confirms coverage >=97% - [x] `nox` (all default sessions) passes ## Subtasks - [x] Run `nox -s integration_tests` and verify `m4_e2e_verification.robot` passes - [x] Update tests in `m4_e2e_verification.robot` / `helper_m4_e2e_verification.py` if any fail against the final v3.3.0 implementation - [x] Verify all acceptance criteria above are satisfied - [x] Verify coverage >=97% via `nox -s coverage_report` - [x] Run `nox` (all default sessions), fix any errors - [ ] Close milestone v3.3.0 after this issue is merged ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. - Milestone v3.3.0 is closed after this issue is merged.
freemo added this to the v3.3.0 milestone 2026-03-02 01:57:09 +00:00
Member

Implementation Notes — Initial Verification

M4 E2E Verification Suite Results

All 7 test cases in robot/m4_e2e_verification.robot pass against the current master implementation:

  1. Plan Spawns Multiple Subplans During Execute — PASS

    • Plan model correctly creates subplan hierarchy via SubplanConfig with ExecutionMode.PARALLEL
    • PlanIdentity correctly maintains parent_plan_id and root_plan_id links
    • as_cli_dict() correctly reports subplan_count
  2. View Subplan Tree Via Plan Tree — PASS

    • Parent-child hierarchy correctly represented via SubplanStatus entries
    • Completed subplans include changeset_summary and files_changed
    • Queued subplans correctly have no changeset data
  3. Verify Merged Results Via Plan Diff — PASS

    • CLI plan diff command correctly invokes the apply service diff method
    • Returns unified diff output showing changes from multiple subplans
  4. Parallel Subplan Execution With Max Parallel — PASS

    • SubplanConfig validation enforces max_parallel bounds (1-50)
    • SubplanFailureHandler correctly evaluates fail_fast and retry logic
    • All ExecutionMode enum values validated
  5. Three-Way Merge Combines Non-Conflicting Changes — PASS

    • GitMergeStrategy.merge() correctly combines non-conflicting edits
    • SequentialMergeStrategy correctly returns last-wins result
    • MergeResult dataclass fields validated
  6. Merge Conflicts Are Surfaced Correctly — PASS

    • GitMergeStrategy correctly detects conflicting edits on same line
    • Standard git conflict markers (<<<<<<<, =======, >>>>>>>) present
    • conflict_markers list correctly reports conflict regions
    • SubplanMergeStrategy enum values validated
  7. Parent Plan Tracks All Subplan Statuses — PASS

    • Parent tracks COMPLETE, ERRORED, and PROCESSING subplan statuses
    • Retry tracking via SubplanAttempt and previous_attempts works correctly
    • SubplanFailureHandler correctly distinguishes retriable vs non-retriable errors
    • Exhausted retries correctly block further retry attempts

M4 Correction + Subplan Smoke Tests

All 8 test cases in robot/m4_correction_subplan_smoke.robot also pass:

  • Correction revert/append/dry-run via CLI
  • Sequential and parallel subplan status tracking
  • SubplanFailureHandler evaluation
  • Fixture loading
  • Full correction + subplan flow

Design Decision

No test updates were required — all existing M4 E2E tests pass against the final v3.3.0 implementation without modification. The test suite created proactively via #405 correctly anticipated the final API surface.

Next Steps

  • Running nox -s coverage_report to verify >=97% coverage
  • Running full nox suite to verify all quality gates pass
  • Verifying all milestone acceptance criteria from v3.3.0 description
## Implementation Notes — Initial Verification ### M4 E2E Verification Suite Results All 7 test cases in `robot/m4_e2e_verification.robot` pass against the current `master` implementation: 1. **Plan Spawns Multiple Subplans During Execute** — PASS - `Plan` model correctly creates subplan hierarchy via `SubplanConfig` with `ExecutionMode.PARALLEL` - `PlanIdentity` correctly maintains `parent_plan_id` and `root_plan_id` links - `as_cli_dict()` correctly reports `subplan_count` 2. **View Subplan Tree Via Plan Tree** — PASS - Parent-child hierarchy correctly represented via `SubplanStatus` entries - Completed subplans include `changeset_summary` and `files_changed` - Queued subplans correctly have no changeset data 3. **Verify Merged Results Via Plan Diff** — PASS - CLI `plan diff` command correctly invokes the apply service diff method - Returns unified diff output showing changes from multiple subplans 4. **Parallel Subplan Execution With Max Parallel** — PASS - `SubplanConfig` validation enforces `max_parallel` bounds (1-50) - `SubplanFailureHandler` correctly evaluates `fail_fast` and retry logic - All `ExecutionMode` enum values validated 5. **Three-Way Merge Combines Non-Conflicting Changes** — PASS - `GitMergeStrategy.merge()` correctly combines non-conflicting edits - `SequentialMergeStrategy` correctly returns last-wins result - `MergeResult` dataclass fields validated 6. **Merge Conflicts Are Surfaced Correctly** — PASS - `GitMergeStrategy` correctly detects conflicting edits on same line - Standard git conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) present - `conflict_markers` list correctly reports conflict regions - `SubplanMergeStrategy` enum values validated 7. **Parent Plan Tracks All Subplan Statuses** — PASS - Parent tracks COMPLETE, ERRORED, and PROCESSING subplan statuses - Retry tracking via `SubplanAttempt` and `previous_attempts` works correctly - `SubplanFailureHandler` correctly distinguishes retriable vs non-retriable errors - Exhausted retries correctly block further retry attempts ### M4 Correction + Subplan Smoke Tests All 8 test cases in `robot/m4_correction_subplan_smoke.robot` also pass: - Correction revert/append/dry-run via CLI - Sequential and parallel subplan status tracking - SubplanFailureHandler evaluation - Fixture loading - Full correction + subplan flow ### Design Decision No test updates were required — all existing M4 E2E tests pass against the final v3.3.0 implementation without modification. The test suite created proactively via #405 correctly anticipated the final API surface. ### Next Steps - Running `nox -s coverage_report` to verify >=97% coverage - Running full `nox` suite to verify all quality gates pass - Verifying all milestone acceptance criteria from v3.3.0 description
Member

Final M4 Acceptance Gate Results

PR #529 submitted. All M4 acceptance criteria validated against the complete v3.3.0 implementation.

Test Results

Suite Tests Passed Failed
M4 E2E Verification (robot) 7 7 0
M4 Correction/Subplan Smoke (robot) 8 8 0
Full Unit Tests (behave) 7817 7817 0
Full Integration Tests (robot via pabot) 1082 1082 0

Quality Gates

Gate Result
lint PASS
format PASS
typecheck PASS (0 errors)
coverage PASS (97%, threshold 97%)
security_scan PASS
dead_code PASS
docs PASS
build PASS
benchmark PASS

Changes Made

Updated server_mode test expectations from "disabled" to "stubbed" in 4 test files (Behave + Robot Framework) to align with the server stubs configuration introduced in earlier v3.3.0 work. No M4-specific test changes were needed — all M4 tests passed on first run.

Subtask Checklist

  • Run M4 E2E verification suite against final implementation
  • Update failing tests (server_mode expectations only)
  • Verify all M4 acceptance criteria (subplan spawning, plan tree, plan diff, parallel execution, three-way merge, conflict surfacing, parent tracking)
  • Coverage >= 97%
  • All nox quality gates pass
  • CHANGELOG updated
  • PR created with milestone v3.3.0, Type/Testing label
## Final M4 Acceptance Gate Results PR #529 submitted. All M4 acceptance criteria validated against the complete v3.3.0 implementation. ### Test Results | Suite | Tests | Passed | Failed | |---|---|---|---| | M4 E2E Verification (robot) | 7 | 7 | 0 | | M4 Correction/Subplan Smoke (robot) | 8 | 8 | 0 | | Full Unit Tests (behave) | 7817 | 7817 | 0 | | Full Integration Tests (robot via pabot) | 1082 | 1082 | 0 | ### Quality Gates | Gate | Result | |---|---| | lint | PASS | | format | PASS | | typecheck | PASS (0 errors) | | coverage | PASS (97%, threshold 97%) | | security_scan | PASS | | dead_code | PASS | | docs | PASS | | build | PASS | | benchmark | PASS | ### Changes Made Updated `server_mode` test expectations from `"disabled"` to `"stubbed"` in 4 test files (Behave + Robot Framework) to align with the server stubs configuration introduced in earlier v3.3.0 work. No M4-specific test changes were needed — all M4 tests passed on first run. ### Subtask Checklist - [x] Run M4 E2E verification suite against final implementation - [x] Update failing tests (server_mode expectations only) - [x] Verify all M4 acceptance criteria (subplan spawning, plan tree, plan diff, parallel execution, three-way merge, conflict surfacing, parent tracking) - [x] Coverage >= 97% - [x] All nox quality gates pass - [x] CHANGELOG updated - [x] PR created with milestone v3.3.0, Type/Testing label
Member

Implementation Notes — Rebase and Re-verification (Day 24)

Context

PR #529 was closed without merging because the test/m4-acceptance-gate branch was based on an older master. Several dependency PRs needed to merge first. As of today, all dependencies are merged into master.

Rebase Process

Rebased test/m4-acceptance-gate (commit a75920e0) onto latest origin/master (10d5d983). Four files had merge conflicts:

  1. CHANGELOG.md — Master gained ~15 new changelog entries since the branch was created. Resolved by keeping all of master's entries and adding the #495 entry at the top of the Unreleased section.

  2. features/cli_core.feature — Two scenarios checking server_mode. The original commit changed expectations from "disabled" to "stubbed", but master has since updated these to check only for key existence (not specific value). Resolution: kept master's version (key existence check), which is more robust since the value depends on whether a server.url is configured.

  3. robot/cli_core.robot — Same pattern as cli_core.feature. Two test cases for server_mode in JSON and plain formats. Resolution: kept master's version.

  4. robot/helper_server_stubs.py — The server_mode() helper function. Master expects "disabled" (correct for a clean temp dir with no server URL). The old branch expected "stubbed" and had a duplicate resolve_server_mode() call outside the temp dir context. Resolution: kept master's version which correctly tests in a clean environment.

Additionally fixed an inconsistency in robot/server_stubs.robot where the documentation string from the old commit said "stubbed when stub URL is configured" but the actual test checks for "disabled when no URL". Reverted to master's documentation.

Post-Rebase Diff

After conflict resolution, the rebased commit contains only one change: the CHANGELOG.md entry documenting the M4 acceptance gate validation. All server_mode test changes from the original commit are no longer needed — master already has the correct expectations.

Design Decision: server_mode Test Expectations

The resolve_server_mode() function (in cleveragents.cli.commands.server) returns:

  • "disabled" when no server.url is configured
  • "stubbed" when a URL is present but the client is not yet implemented

In test environments using a clean temp dir (no config), the correct expected value is "disabled". Master's tests correctly use this expectation. The original branch's change to "stubbed" was incorrect for the test scenario (though the tests may have passed due to a bug where a second resolve_server_mode() call was made outside the temp dir context).

Full Verification Results

Suite Tests Passed Failed
Unit Tests (Behave) 8134 8134 0
Integration Tests (Robot via pabot) 1137 1137 0
Quality Gate Result
lint PASS
format PASS
typecheck PASS (0 errors)
coverage_report PASS (97%, threshold 97%)
security_scan PASS
dead_code PASS
docs PASS
build PASS
benchmark PASS

M4 Acceptance Criteria Verification

All 7 M4 E2E verification tests in robot/m4_e2e_verification.robot and all 8 tests in robot/m4_correction_subplan_smoke.robot pass against the final v3.3.0 implementation without any modifications.

Verified criteria:

  • Plan model correctly spawns subplans via SubplanConfig with ExecutionMode.PARALLEL
  • PlanIdentity maintains parent/root plan links
  • SubplanConfig validation enforces max_parallel bounds (1-50)
  • GitMergeStrategy.merge() combines non-conflicting changes correctly
  • GitMergeStrategy detects conflicts and produces standard git conflict markers
  • SubplanStatus tracks COMPLETE, ERRORED, PROCESSING subplan states
  • SubplanFailureHandler correctly evaluates fail_fast and retry logic

PR

New PR #560 submitted (replaces closed PR #529). Branch: test/m4-acceptance-gate, commit 9f5430fc.

## Implementation Notes — Rebase and Re-verification (Day 24) ### Context PR #529 was closed without merging because the `test/m4-acceptance-gate` branch was based on an older master. Several dependency PRs needed to merge first. As of today, all dependencies are merged into master. ### Rebase Process Rebased `test/m4-acceptance-gate` (commit `a75920e0`) onto latest `origin/master` (`10d5d983`). Four files had merge conflicts: 1. **`CHANGELOG.md`** — Master gained ~15 new changelog entries since the branch was created. Resolved by keeping all of master's entries and adding the #495 entry at the top of the Unreleased section. 2. **`features/cli_core.feature`** — Two scenarios checking `server_mode`. The original commit changed expectations from `"disabled"` to `"stubbed"`, but master has since updated these to check only for key existence (not specific value). Resolution: kept master's version (key existence check), which is more robust since the value depends on whether a `server.url` is configured. 3. **`robot/cli_core.robot`** — Same pattern as `cli_core.feature`. Two test cases for `server_mode` in JSON and plain formats. Resolution: kept master's version. 4. **`robot/helper_server_stubs.py`** — The `server_mode()` helper function. Master expects `"disabled"` (correct for a clean temp dir with no server URL). The old branch expected `"stubbed"` and had a duplicate `resolve_server_mode()` call outside the temp dir context. Resolution: kept master's version which correctly tests in a clean environment. Additionally fixed an inconsistency in `robot/server_stubs.robot` where the documentation string from the old commit said "stubbed when stub URL is configured" but the actual test checks for "disabled when no URL". Reverted to master's documentation. ### Post-Rebase Diff After conflict resolution, the rebased commit contains only one change: the CHANGELOG.md entry documenting the M4 acceptance gate validation. All `server_mode` test changes from the original commit are no longer needed — master already has the correct expectations. ### Design Decision: server_mode Test Expectations The `resolve_server_mode()` function (in `cleveragents.cli.commands.server`) returns: - `"disabled"` when no `server.url` is configured - `"stubbed"` when a URL is present but the client is not yet implemented In test environments using a clean temp dir (no config), the correct expected value is `"disabled"`. Master's tests correctly use this expectation. The original branch's change to `"stubbed"` was incorrect for the test scenario (though the tests may have passed due to a bug where a second `resolve_server_mode()` call was made outside the temp dir context). ### Full Verification Results | Suite | Tests | Passed | Failed | |---|---|---|---| | Unit Tests (Behave) | 8134 | 8134 | 0 | | Integration Tests (Robot via pabot) | 1137 | 1137 | 0 | | Quality Gate | Result | |---|---| | lint | PASS | | format | PASS | | typecheck | PASS (0 errors) | | coverage_report | PASS (97%, threshold 97%) | | security_scan | PASS | | dead_code | PASS | | docs | PASS | | build | PASS | | benchmark | PASS | ### M4 Acceptance Criteria Verification All 7 M4 E2E verification tests in `robot/m4_e2e_verification.robot` and all 8 tests in `robot/m4_correction_subplan_smoke.robot` pass against the final v3.3.0 implementation without any modifications. Verified criteria: - `Plan` model correctly spawns subplans via `SubplanConfig` with `ExecutionMode.PARALLEL` - `PlanIdentity` maintains parent/root plan links - `SubplanConfig` validation enforces `max_parallel` bounds (1-50) - `GitMergeStrategy.merge()` combines non-conflicting changes correctly - `GitMergeStrategy` detects conflicts and produces standard git conflict markers - `SubplanStatus` tracks COMPLETE, ERRORED, PROCESSING subplan states - `SubplanFailureHandler` correctly evaluates fail_fast and retry logic ### PR New PR #560 submitted (replaces closed PR #529). Branch: `test/m4-acceptance-gate`, commit `9f5430fc`.
Member

Implementation Notes — Code Review Findings Addressed

PR #560 has been updated with the following changes to address the code review findings:

CLI Coverage Gaps Closed

Three of the seven M4 acceptance criteria (plan use, plan execute, plan tree) previously lacked actual CLI invocation in their integration tests. Added Robot Framework test cases that invoke the Typer CLI via CliRunner.invoke():

  1. CLI Plan Use Creates Plan With Subplan Config — Invokes plan use local/refactor-action local/monorepo with patched _get_lifecycle_service. Verifies get_action_by_name and use_action are called with correct arguments.

  2. CLI Plan Execute Transitions With Subplans — Invokes plan execute <plan_id> with patched lifecycle service. Verifies plan transitions to Execute phase and CLI output contains the plan ID.

  3. CLI Plan Tree Displays Subplan Hierarchy — Invokes plan tree <plan_id> --format json with patched get_container. Uses real Decision objects (PROMPT_DEFINITION, STRATEGY_CHOICE, SUBPLAN_PARALLEL_SPAWN, SUBPLAN_SPAWN). Verifies JSON output structure.

The existing plan diff test (plan-diff subcommand) already used runner.invoke(plan_app, ["diff", ...]) and was deemed adequate.

Pre-existing Bug Fix

Fixed features/repositories_uncovered_branches.feature:110 (repo branch cov upsert profile with schema version mismatch). Root cause: plain sessionmaker returns a new session per call, causing session isolation between Given/When steps in in-memory SQLite. Fix: scoped_session(sessionmaker(...)) for thread-local session sharing.

Minor Fixes

  • CONTRIBUTORS.md: corrected alphabetical ordering of "Rui Hu" entry
  • CHANGELOG.md: updated #495 entry to reflect CLI test additions

Quality Gate Results

All nox sessions pass. Coverage at 97%. 8524 unit test scenarios pass. 1110/1118 integration tests pass (8 pre-existing failures in cli_plan_context_commands.robot — identical on master, tracked as follow-up).

## Implementation Notes — Code Review Findings Addressed PR #560 has been updated with the following changes to address the code review findings: ### CLI Coverage Gaps Closed Three of the seven M4 acceptance criteria (`plan use`, `plan execute`, `plan tree`) previously lacked actual CLI invocation in their integration tests. Added Robot Framework test cases that invoke the Typer CLI via `CliRunner.invoke()`: 1. **`CLI Plan Use Creates Plan With Subplan Config`** — Invokes `plan use local/refactor-action local/monorepo` with patched `_get_lifecycle_service`. Verifies `get_action_by_name` and `use_action` are called with correct arguments. 2. **`CLI Plan Execute Transitions With Subplans`** — Invokes `plan execute <plan_id>` with patched lifecycle service. Verifies plan transitions to Execute phase and CLI output contains the plan ID. 3. **`CLI Plan Tree Displays Subplan Hierarchy`** — Invokes `plan tree <plan_id> --format json` with patched `get_container`. Uses real `Decision` objects (`PROMPT_DEFINITION`, `STRATEGY_CHOICE`, `SUBPLAN_PARALLEL_SPAWN`, `SUBPLAN_SPAWN`). Verifies JSON output structure. The existing `plan diff` test (`plan-diff` subcommand) already used `runner.invoke(plan_app, ["diff", ...])` and was deemed adequate. ### Pre-existing Bug Fix Fixed `features/repositories_uncovered_branches.feature:110` (`repo branch cov upsert profile with schema version mismatch`). Root cause: plain `sessionmaker` returns a new session per call, causing session isolation between Given/When steps in in-memory SQLite. Fix: `scoped_session(sessionmaker(...))` for thread-local session sharing. ### Minor Fixes - CONTRIBUTORS.md: corrected alphabetical ordering of "Rui Hu" entry - CHANGELOG.md: updated #495 entry to reflect CLI test additions ### Quality Gate Results All nox sessions pass. Coverage at 97%. 8524 unit test scenarios pass. 1110/1118 integration tests pass (8 pre-existing failures in `cli_plan_context_commands.robot` — identical on master, tracked as follow-up).
Author
Owner

PM Acknowledgment (Day 31):

Thank you @hurui200320 for addressing the review fixes.

Status: M4 is now feature-complete. Bug #563 (race condition) fix merged Day 30 (PR #619). Bug #606 closed Day 29. TDD #635 closed as Wont Do (Day 31).

Action needed: Please verify that the M4 E2E acceptance criteria still pass against the current develop branch (which has received 17 new merges since your last push). Rebase if needed.

Priority: HIGH — M4 closure depends on this verification.

**PM Acknowledgment (Day 31)**: Thank you @hurui200320 for addressing the review fixes. **Status**: M4 is now feature-complete. Bug #563 (race condition) fix merged Day 30 (PR #619). Bug #606 closed Day 29. TDD #635 closed as Wont Do (Day 31). **Action needed**: Please verify that the M4 E2E acceptance criteria still pass against the current `develop` branch (which has received 17 new merges since your last push). Rebase if needed. **Priority**: HIGH — M4 closure depends on this verification.
Member

Implementation Notes — Re-verification and CLI Coverage Gap Closure (Day 31)

Context

PM requested re-verification of M4 E2E acceptance criteria against the current master branch, which has received 17 new merges since PR #560 was merged on Day 25. Additionally, the code review of PR #560 by @CoreRasurae identified 4 CLI test coverage gaps (C1–C4) that needed to be addressed.

Approach

Rebased on latest master (82def111) and strengthened the 4 CLI integration tests in robot/helper_m4_e2e_verification.py to close the review-identified gaps. No new test cases were added — only existing test assertions were strengthened.

Changes

1. cli_plan_use — C1 gap closed

Module: robot/helper_m4_e2e_verification.py, function cli_plan_use

The test previously only verified that get_action_by_name("local/refactor-action") was called and that use_action was invoked once. It did not verify the project argument local/monorepo was passed through.

Fix: Added assertion that use_action receives project_links keyword argument containing a ProjectLink with project_name="local/monorepo". This confirms the CLI correctly converts the positional projects argument (line 1420 of plan.py) to ProjectLink objects and passes them to the service.

Also simplified the fragile multi-strategy action_name extraction code (which tried 3 different approaches to find the argument) to use the direct kwargs.get("action_name") path, since the CLI always calls use_action with keyword arguments.

2. cli_plan_execute — C2 gap closed

Module: robot/helper_m4_e2e_verification.py, function cli_plan_execute

The test previously only verified execute_plan was called and the plan ID appeared in output. It did not verify that the Execute phase transition or subplan data was reflected.

Fix: Added two categories of assertions:

  • CLI output level: Verifies the output contains "execute" (case-insensitive), confirming the CLI reflects the phase transition.
  • Domain model level: Outside the patch context, verifies post_plan.phase == PlanPhase.EXECUTE, len(post_plan.subplan_statuses) == 3, and post_plan.has_subplans == True. This confirms the M4 criterion "spawn multiple subplans during Execute" is satisfied at the domain level.

Note: The CLI's _print_lifecycle_plan and _plan_spec_dict functions do not render subplan_count in their output, so subplan information cannot be directly verified from CLI output. The domain-level check compensates for this.

3. cli_plan_tree — C3 gap closed

Module: robot/helper_m4_e2e_verification.py, function cli_plan_tree

The test previously did shallow substring matching ("subplan_parallel_spawn" in json.dumps(tree_data)) which could pass even with a malformed tree.

Fix: Replaced with full structural hierarchy verification:

  1. Tree is a list with exactly 1 root node
  2. Root node type is prompt_definition with correct decision_id
  3. Root has 1 child of type strategy_choice
  4. Strategy node has 1 child of type subplan_parallel_spawn
  5. Parallel spawn has exactly 2 children, both of type subplan_spawn
  6. Total node count verified as 5 via recursive _count_nodes helper

This validates the M4 criterion "plan tree displays the subplan tree correctly" with structural precision matching the decision hierarchy defined in the specification.

4. plan_diff — C4 gap closed

Module: robot/helper_m4_e2e_verification.py, function plan_diff

The test previously only verified diff was called once with the correct plan_id. It did not verify the diff content appeared in the CLI output.

Fix: Added assertions that the mock diff content ("api.py" and "ui.py" file paths) appears in result.output. This confirms the M4 criterion "plan diff shows merged results" — the CLI actually renders the diff output (via console.print(output) at line 2267 of plan.py).

Quality Gate Results

Stage Result
lint PASS
format PASS (1401 files unchanged)
typecheck PASS (0 errors)
unit_tests PASS (10,431 scenarios, 0 failures)
integration_tests PASS (1,470 tests, 0 failures)
coverage_report PASS (98%, threshold 97%)
security_scan PASS
dead_code PASS
docs PASS
build PASS

PR

New PR #681 submitted. Branch: test/m4-acceptance-gate, commit 97c5dfbc.

## Implementation Notes — Re-verification and CLI Coverage Gap Closure (Day 31) ### Context PM requested re-verification of M4 E2E acceptance criteria against the current master branch, which has received 17 new merges since PR #560 was merged on Day 25. Additionally, the code review of PR #560 by @CoreRasurae identified 4 CLI test coverage gaps (C1–C4) that needed to be addressed. ### Approach Rebased on latest master (`82def111`) and strengthened the 4 CLI integration tests in `robot/helper_m4_e2e_verification.py` to close the review-identified gaps. No new test cases were added — only existing test assertions were strengthened. ### Changes #### 1. `cli_plan_use` — C1 gap closed **Module:** `robot/helper_m4_e2e_verification.py`, function `cli_plan_use` The test previously only verified that `get_action_by_name("local/refactor-action")` was called and that `use_action` was invoked once. It did not verify the **project argument** `local/monorepo` was passed through. **Fix:** Added assertion that `use_action` receives `project_links` keyword argument containing a `ProjectLink` with `project_name="local/monorepo"`. This confirms the CLI correctly converts the positional `projects` argument (line 1420 of `plan.py`) to `ProjectLink` objects and passes them to the service. Also simplified the fragile multi-strategy `action_name` extraction code (which tried 3 different approaches to find the argument) to use the direct `kwargs.get("action_name")` path, since the CLI always calls `use_action` with keyword arguments. #### 2. `cli_plan_execute` — C2 gap closed **Module:** `robot/helper_m4_e2e_verification.py`, function `cli_plan_execute` The test previously only verified `execute_plan` was called and the plan ID appeared in output. It did not verify that the Execute phase transition or subplan data was reflected. **Fix:** Added two categories of assertions: - **CLI output level:** Verifies the output contains "execute" (case-insensitive), confirming the CLI reflects the phase transition. - **Domain model level:** Outside the `patch` context, verifies `post_plan.phase == PlanPhase.EXECUTE`, `len(post_plan.subplan_statuses) == 3`, and `post_plan.has_subplans == True`. This confirms the M4 criterion "spawn multiple subplans during Execute" is satisfied at the domain level. Note: The CLI's `_print_lifecycle_plan` and `_plan_spec_dict` functions do not render `subplan_count` in their output, so subplan information cannot be directly verified from CLI output. The domain-level check compensates for this. #### 3. `cli_plan_tree` — C3 gap closed **Module:** `robot/helper_m4_e2e_verification.py`, function `cli_plan_tree` The test previously did shallow substring matching (`"subplan_parallel_spawn" in json.dumps(tree_data)`) which could pass even with a malformed tree. **Fix:** Replaced with full structural hierarchy verification: 1. Tree is a list with exactly 1 root node 2. Root node type is `prompt_definition` with correct `decision_id` 3. Root has 1 child of type `strategy_choice` 4. Strategy node has 1 child of type `subplan_parallel_spawn` 5. Parallel spawn has exactly 2 children, both of type `subplan_spawn` 6. Total node count verified as 5 via recursive `_count_nodes` helper This validates the M4 criterion "plan tree displays the subplan tree correctly" with structural precision matching the decision hierarchy defined in the specification. #### 4. `plan_diff` — C4 gap closed **Module:** `robot/helper_m4_e2e_verification.py`, function `plan_diff` The test previously only verified `diff` was called once with the correct `plan_id`. It did not verify the diff content appeared in the CLI output. **Fix:** Added assertions that the mock diff content (`"api.py"` and `"ui.py"` file paths) appears in `result.output`. This confirms the M4 criterion "plan diff shows merged results" — the CLI actually renders the diff output (via `console.print(output)` at line 2267 of `plan.py`). ### Quality Gate Results | Stage | Result | |---|---| | lint | PASS | | format | PASS (1401 files unchanged) | | typecheck | PASS (0 errors) | | unit_tests | PASS (10,431 scenarios, 0 failures) | | integration_tests | PASS (1,470 tests, 0 failures) | | coverage_report | PASS (98%, threshold 97%) | | security_scan | PASS | | dead_code | PASS | | docs | PASS | | build | PASS | ### PR New PR #681 submitted. Branch: `test/m4-acceptance-gate`, commit `97c5dfbc`.
Member

Implementation Notes: Review Fix Pass (PR !681, commit 2e8ea557)

Context

The self-review on PR !681 identified 3 blocking and 7 non-blocking issues. This comment documents the design decisions and rationale for each fix.


Blocking Fix #1: Test Tautology in cli_plan_execute()

Problem: The assertions at helper_m4_e2e_verification.py lines 849-858 (old file) checked post_plan.phase, post_plan.subplan_statuses, and post_plan.has_subplans outside the with patch(...) block. These values were set by the test itself via _mock_parent_plan(phase=PlanPhase.EXECUTE, subplan_statuses=statuses) — they could never fail, making them tautological.

Decision: Removed the tautological assertions entirely and replaced with a comment in helper_m4_e2e_cli.cli_plan_execute() explaining:

  • The CLI does not render subplan_count or has_subplans in its output.
  • Domain-level coverage for these properties is already provided by the spawn-subplans and parent-tracking domain tests in helper_m4_e2e_domain.py.
  • If a future CLI change renders subplan info, assertions on CLI output would become meaningful.

Risk: None. The assertions provided no real coverage; removing them eliminates a misleading signal.


Blocking Fix #2: Unguarded Positional Arg Access in plan_diff()

Problem: call_args[0][0] would raise IndexError if the production CLI called diff(plan_id=_ROOT_ULID) as a keyword argument (making call_args.args an empty tuple).

Decision: Applied the kwargs-with-positional-fallback pattern throughout all CLI tests:

plan_id_arg = call_args.kwargs.get("plan_id")
if plan_id_arg is None and call_args.args:
    plan_id_arg = call_args.args[0]

This pattern is now used consistently in plan_diff(), cli_plan_use() (for both action_name and project_links), making it robust against future refactors of the CLI command signatures.


Blocking Fix #3: File Split (500-Line Compliance)

Problem: helper_m4_e2e_verification.py was 1074 lines, more than 2x the CONTRIBUTING.md 500-line limit. The PR added 84 net lines, widening the gap.

Decision: Split into four modules along a natural domain/CLI boundary:

Module Lines Content
helper_m4_e2e_common.py 237 Constants, _fail(), mock assertion wrappers, _make_subplan_status() factory, _make_subplan_statuses(), _mock_parent_plan(), _mock_child_plan(), _FROZEN_NOW
helper_m4_e2e_domain.py 495 6 domain tests (spawn-subplans, plan-tree, parallel-max, merge-clean, merge-conflict, parent-tracking)
helper_m4_e2e_cli.py 426 4 CLI tests (plan-diff, cli-plan-use, cli-plan-execute, cli-plan-tree)
helper_m4_e2e_verification.py 81 Thin dispatcher — entry point unchanged for Robot Framework

The Robot Framework .robot file (m4_e2e_verification.robot) references helper_m4_e2e_verification.py as ${HELPER} and did not require any changes. The sys.path manipulation for src/ is in common.py; each sub-module re-includes it for safety (idempotent if _SRC not in sys.path guard).


Non-Blocking Fixes (4-10)

  • #4 Type guard: Added isinstance(pl, ProjectLink) check before iterating project_links in cli_plan_use(). Produces a clear _fail() diagnostic listing actual types if the contract is violated.

  • #5 _fail() wrappers: Extracted _assert_mock_called_once() and _assert_mock_called_once_with() into helper_m4_e2e_common. These catch AssertionError from unittest.mock and re-raise via _fail() to produce the standardised FAIL: prefix expected by the Robot Framework evaluation pattern. Applied to all 5 call sites (plan_diff, cli_plan_use x2, cli_plan_execute, cli_plan_tree).

  • #6 Symmetric fallback: project_links extraction in cli_plan_use() now uses call_kwargs.kwargs.get("project_links") with positional fallback at index 1, symmetric with the action_name extraction at index 0.

  • #7 Word-boundary regex: Replaced "execute" not in output_lower with re.search(r"\bexecute\b", result.output, re.IGNORECASE). This matches "execute" as a distinct word, avoiding false positives from incidental substrings while still matching the phase label in CLI output.

  • #8 DRY extractions: _make_subplan_status() factory (common module) replaces 3 separate inline SubplanStatus(...) construction blocks. _assert_exit_code() replaces 4 identical if result.exit_code != 0: _fail(...) blocks.

  • #9 Frozen timestamp: Replaced all datetime.now() calls (5 occurrences across mock factories) with _FROZEN_NOW = datetime(2026, 3, 1, 12, 0, 0). This eliminates subtly inconsistent timestamps within a single test run.

  • #10 Descriptive ULIDs: Renamed _DECISION_ULID_1 through _DECISION_ULID_5 to _DECISION_PROMPT_DEF, _DECISION_STRATEGY, _DECISION_PARALLEL_SPAWN, _DECISION_SPAWN_API, _DECISION_SPAWN_UI — self-documenting which decision type each constant represents.


Quality Gate Results

All nox stages pass across the entire codebase:

  • lint: PASS
  • format: PASS (1404 files unchanged)
  • typecheck: PASS (0 errors)
  • unit_tests: 10,431 scenarios, 0 failures
  • integration_tests: 1,452 tests (including all 10 M4 E2E), 0 failures
  • coverage_report: 98% (threshold 97%)
  • security_scan, dead_code, docs, build: all PASS
## Implementation Notes: Review Fix Pass (PR !681, commit 2e8ea557) ### Context The self-review on PR !681 identified 3 blocking and 7 non-blocking issues. This comment documents the design decisions and rationale for each fix. --- ### Blocking Fix #1: Test Tautology in `cli_plan_execute()` **Problem:** The assertions at `helper_m4_e2e_verification.py` lines 849-858 (old file) checked `post_plan.phase`, `post_plan.subplan_statuses`, and `post_plan.has_subplans` _outside_ the `with patch(...)` block. These values were set by the test itself via `_mock_parent_plan(phase=PlanPhase.EXECUTE, subplan_statuses=statuses)` — they could never fail, making them tautological. **Decision:** Removed the tautological assertions entirely and replaced with a comment in `helper_m4_e2e_cli.cli_plan_execute()` explaining: - The CLI does not render `subplan_count` or `has_subplans` in its output. - Domain-level coverage for these properties is already provided by the `spawn-subplans` and `parent-tracking` domain tests in `helper_m4_e2e_domain.py`. - If a future CLI change renders subplan info, assertions on CLI output would become meaningful. **Risk:** None. The assertions provided no real coverage; removing them eliminates a misleading signal. --- ### Blocking Fix #2: Unguarded Positional Arg Access in `plan_diff()` **Problem:** `call_args[0][0]` would raise `IndexError` if the production CLI called `diff(plan_id=_ROOT_ULID)` as a keyword argument (making `call_args.args` an empty tuple). **Decision:** Applied the kwargs-with-positional-fallback pattern throughout all CLI tests: ```python plan_id_arg = call_args.kwargs.get("plan_id") if plan_id_arg is None and call_args.args: plan_id_arg = call_args.args[0] ``` This pattern is now used consistently in `plan_diff()`, `cli_plan_use()` (for both `action_name` and `project_links`), making it robust against future refactors of the CLI command signatures. --- ### Blocking Fix #3: File Split (500-Line Compliance) **Problem:** `helper_m4_e2e_verification.py` was 1074 lines, more than 2x the CONTRIBUTING.md 500-line limit. The PR added 84 net lines, widening the gap. **Decision:** Split into four modules along a natural domain/CLI boundary: | Module | Lines | Content | |--------|-------|---------| | `helper_m4_e2e_common.py` | 237 | Constants, `_fail()`, mock assertion wrappers, `_make_subplan_status()` factory, `_make_subplan_statuses()`, `_mock_parent_plan()`, `_mock_child_plan()`, `_FROZEN_NOW` | | `helper_m4_e2e_domain.py` | 495 | 6 domain tests (spawn-subplans, plan-tree, parallel-max, merge-clean, merge-conflict, parent-tracking) | | `helper_m4_e2e_cli.py` | 426 | 4 CLI tests (plan-diff, cli-plan-use, cli-plan-execute, cli-plan-tree) | | `helper_m4_e2e_verification.py` | 81 | Thin dispatcher — entry point unchanged for Robot Framework | The Robot Framework `.robot` file (`m4_e2e_verification.robot`) references `helper_m4_e2e_verification.py` as `${HELPER}` and did not require any changes. The `sys.path` manipulation for `src/` is in `common.py`; each sub-module re-includes it for safety (idempotent `if _SRC not in sys.path` guard). --- ### Non-Blocking Fixes (4-10) - **#4 Type guard:** Added `isinstance(pl, ProjectLink)` check before iterating `project_links` in `cli_plan_use()`. Produces a clear `_fail()` diagnostic listing actual types if the contract is violated. - **#5 `_fail()` wrappers:** Extracted `_assert_mock_called_once()` and `_assert_mock_called_once_with()` into `helper_m4_e2e_common`. These catch `AssertionError` from `unittest.mock` and re-raise via `_fail()` to produce the standardised `FAIL:` prefix expected by the Robot Framework evaluation pattern. Applied to all 5 call sites (plan_diff, cli_plan_use x2, cli_plan_execute, cli_plan_tree). - **#6 Symmetric fallback:** `project_links` extraction in `cli_plan_use()` now uses `call_kwargs.kwargs.get("project_links")` with positional fallback at index 1, symmetric with the `action_name` extraction at index 0. - **#7 Word-boundary regex:** Replaced `"execute" not in output_lower` with `re.search(r"\bexecute\b", result.output, re.IGNORECASE)`. This matches "execute" as a distinct word, avoiding false positives from incidental substrings while still matching the phase label in CLI output. - **#8 DRY extractions:** `_make_subplan_status()` factory (common module) replaces 3 separate inline `SubplanStatus(...)` construction blocks. `_assert_exit_code()` replaces 4 identical `if result.exit_code != 0: _fail(...)` blocks. - **#9 Frozen timestamp:** Replaced all `datetime.now()` calls (5 occurrences across mock factories) with `_FROZEN_NOW = datetime(2026, 3, 1, 12, 0, 0)`. This eliminates subtly inconsistent timestamps within a single test run. - **#10 Descriptive ULIDs:** Renamed `_DECISION_ULID_1` through `_DECISION_ULID_5` to `_DECISION_PROMPT_DEF`, `_DECISION_STRATEGY`, `_DECISION_PARALLEL_SPAWN`, `_DECISION_SPAWN_API`, `_DECISION_SPAWN_UI` — self-documenting which decision type each constant represents. --- ### Quality Gate Results All nox stages pass across the entire codebase: - lint: PASS - format: PASS (1404 files unchanged) - typecheck: PASS (0 errors) - unit_tests: 10,431 scenarios, 0 failures - integration_tests: 1,452 tests (including all 10 M4 E2E), 0 failures - coverage_report: 98% (threshold 97%) - security_scan, dead_code, docs, build: all PASS
Member

Implementation Notes — Second Review Fix Pass (PR !681, commit 5b0d934)

Context

The second self-review on PR !681 (posted as PR comment #59640) identified 3 major, 7 minor, and 5 nit issues. This comment documents the design decisions and rationale for each fix.


Major Fix M1: Tautological plan_tree() Domain Test

Problem: helper_m4_e2e_domain.plan_tree() built a tree dict from the SubplanStatus data it had itself injected into the parent plan, then asserted that dict contained those same IDs. This could never fail — the test was checking its own construction, not any system-under-test function.

Decision: Removed the tautological tree dict construction (old lines 120-136). Redocumented the test to clarify it validates two things:

  1. SubplanStatus field completenesssubplan_id, action_name, changeset_summary, files_changed are populated on completed subplans; changeset_summary is None on queued subplans.
  2. Parent→child PlanIdentity linkage — child plans' parent_plan_id matches the parent's plan_id.

Tree construction logic (build_decision_tree()) is exercised by the CLI-level cli_plan_tree() test, which invokes the real function and verifies the full 5-node structural hierarchy via JSON output.

Risk: None. The removed assertions provided no real coverage. The legitimate SubplanStatus field checks and identity linkage checks remain.


Major Fix M2: CLI Error-Path Tests

Problem: All four CLI test functions (plan_diff, cli_plan_use, cli_plan_execute, cli_plan_tree) only exercised happy paths. Error conditions (read-only plans, unavailable actions, missing changesets, empty decisions) were untested.

Decision: Created helper_m4_e2e_cli_errors.py (182 lines) with four new test functions:

  1. cli_plan_execute_readonly: Mocks get_plan() to return a plan with read_only=True. Verifies the CLI aborts with "read-only" in output and execute_plan is never called. Tests the fail-fast guard at plan.py._execute_plan_command().

  2. cli_plan_use_not_found: Mocks get_action_by_name to raise ActionNotAvailableError("nonexistent/action", ActionState.ARCHIVED). Verifies the CLI catches the exception and prints "not available". Tests the except ActionNotAvailableError handler.

  3. cli_plan_diff_no_changeset: Mocks diff() to raise PlanError("Plan has no ChangeSet"). Verifies the CLI catches PlanError and displays "error" or "changeset". Tests the except PlanError handler.

  4. cli_plan_tree_empty: Mocks list_decisions() to return []. Verifies the CLI prints "No decisions" and exits cleanly (exit code 0). Tests the soft/informational exit path.

Trade-off: Used ActionNotAvailableError with ActionState.ARCHIVED rather than ResourceNotFoundError for the plan use test. Both are valid error paths; ActionNotAvailableError was chosen because it has a dedicated except clause in the CLI, providing more specific assertion targets.


Major Fix M3: Duplicate CHANGELOG Entries

Problem: Issue #495 had two changelog entries: one at lines 5-17 (under ## Unreleased without a ### subsection) and one at lines 205-211 (under ### Added).

Decision: Removed the newer entry at lines 5-17. Updated the existing entry under ### Added to include the review-fix details (six-way file split, error-path tests, DRY extractions, tautology removals, git pre-check).


Minor Fixes

  • m1 (git pre-check): Created _assert_git_available() in helper_m4_e2e_merge.py using shutil.which("git"). Called at the start of both merge_clean() and merge_conflict(). Without this, a missing git causes GitMergeStrategy to silently fall back to SequentialMergeStrategy, producing misleading assertion failures.

  • m3 (file split): Moved merge_clean() and merge_conflict() to helper_m4_e2e_merge.py (146 lines). Domain module dropped from 494 to 385 lines, providing 115 lines of headroom.

  • m4 (lazy imports): Replaced eager module-level imports with importlib.import_module() dispatch. The _COMMAND_MAP dict maps command names to (module_name, func_name) tuples. _resolve_handler() imports only the module needed for the specific command. Running merge-clean no longer loads typer.testing or cleveragents.cli.commands.plan.

  • m5 (fmt kwarg): Added assertion in plan_diff() verifying call_args.kwargs.get("fmt"). Uses the same kwargs-with-positional-fallback pattern as plan_id. Falls through silently if fmt is None (not all CLI invocations pass it explicitly).

  • m6 (get_plan guard): Added _assert_mock_called_once_with(mock_service.get_plan, "get_plan", _ROOT_ULID) in cli_plan_execute(). This verifies the read-only guard is actually exercised before execute_plan.

  • m7 (sys.path.append): Changed sys.path.insert(0, _SRC) to sys.path.append(_SRC) in all 6 modules. Prevents potential stdlib shadowing if a file in src/ were ever named identically to a stdlib module.

Nit Fixes

  • n1: Removed config.execution_mode != PARALLEL and config.max_parallel != 3 readback assertions in parallel_max(). These verified the config object constructed 2 lines above — tautological.
  • n2: Added docstring note in cli_plan_tree() explaining why it patches get_container (the production tree command uses container.resolve(DecisionService)) rather than _get_lifecycle_service.
  • n4: Fixed import ordering to match ruff isort: third-party (helper_m4_e2e_common, typer) then first-party (cleveragents.*). The robot/ directory is not in the ruff src config so helper_m4_e2e_common is classified as third-party.
  • n5: Added TYPE_CHECKING comment in common.py explaining that click.testing.Result is the canonical import since Typer re-exports CliRunner but not Result.

Quality Gate Results

All 11 nox sessions pass:

  • lint: PASS
  • format: PASS
  • typecheck: PASS (0 errors)
  • unit_tests: PASS (0 failures)
  • integration_tests: PASS (14/14 M4 E2E tests)
  • coverage_report: 98% (threshold 97%)
  • security_scan, dead_code, docs, build, benchmark: all PASS
## Implementation Notes — Second Review Fix Pass (PR !681, commit 5b0d934) ### Context The second self-review on PR !681 (posted as PR comment #59640) identified 3 major, 7 minor, and 5 nit issues. This comment documents the design decisions and rationale for each fix. --- ### Major Fix M1: Tautological `plan_tree()` Domain Test **Problem:** `helper_m4_e2e_domain.plan_tree()` built a tree dict from the SubplanStatus data it had itself injected into the parent plan, then asserted that dict contained those same IDs. This could never fail — the test was checking its own construction, not any system-under-test function. **Decision:** Removed the tautological tree dict construction (old lines 120-136). Redocumented the test to clarify it validates two things: 1. **SubplanStatus field completeness** — `subplan_id`, `action_name`, `changeset_summary`, `files_changed` are populated on completed subplans; `changeset_summary` is `None` on queued subplans. 2. **Parent→child `PlanIdentity` linkage** — child plans' `parent_plan_id` matches the parent's `plan_id`. Tree *construction* logic (`build_decision_tree()`) is exercised by the CLI-level `cli_plan_tree()` test, which invokes the real function and verifies the full 5-node structural hierarchy via JSON output. **Risk:** None. The removed assertions provided no real coverage. The legitimate SubplanStatus field checks and identity linkage checks remain. --- ### Major Fix M2: CLI Error-Path Tests **Problem:** All four CLI test functions (`plan_diff`, `cli_plan_use`, `cli_plan_execute`, `cli_plan_tree`) only exercised happy paths. Error conditions (read-only plans, unavailable actions, missing changesets, empty decisions) were untested. **Decision:** Created `helper_m4_e2e_cli_errors.py` (182 lines) with four new test functions: 1. **`cli_plan_execute_readonly`**: Mocks `get_plan()` to return a plan with `read_only=True`. Verifies the CLI aborts with "read-only" in output and `execute_plan` is never called. Tests the fail-fast guard at `plan.py._execute_plan_command()`. 2. **`cli_plan_use_not_found`**: Mocks `get_action_by_name` to raise `ActionNotAvailableError("nonexistent/action", ActionState.ARCHIVED)`. Verifies the CLI catches the exception and prints "not available". Tests the `except ActionNotAvailableError` handler. 3. **`cli_plan_diff_no_changeset`**: Mocks `diff()` to raise `PlanError("Plan has no ChangeSet")`. Verifies the CLI catches `PlanError` and displays "error" or "changeset". Tests the `except PlanError` handler. 4. **`cli_plan_tree_empty`**: Mocks `list_decisions()` to return `[]`. Verifies the CLI prints "No decisions" and exits cleanly (exit code 0). Tests the soft/informational exit path. **Trade-off:** Used `ActionNotAvailableError` with `ActionState.ARCHIVED` rather than `ResourceNotFoundError` for the `plan use` test. Both are valid error paths; `ActionNotAvailableError` was chosen because it has a dedicated `except` clause in the CLI, providing more specific assertion targets. --- ### Major Fix M3: Duplicate CHANGELOG Entries **Problem:** Issue #495 had two changelog entries: one at lines 5-17 (under `## Unreleased` without a `###` subsection) and one at lines 205-211 (under `### Added`). **Decision:** Removed the newer entry at lines 5-17. Updated the existing entry under `### Added` to include the review-fix details (six-way file split, error-path tests, DRY extractions, tautology removals, git pre-check). --- ### Minor Fixes - **m1 (git pre-check):** Created `_assert_git_available()` in `helper_m4_e2e_merge.py` using `shutil.which("git")`. Called at the start of both `merge_clean()` and `merge_conflict()`. Without this, a missing `git` causes `GitMergeStrategy` to silently fall back to `SequentialMergeStrategy`, producing misleading assertion failures. - **m3 (file split):** Moved `merge_clean()` and `merge_conflict()` to `helper_m4_e2e_merge.py` (146 lines). Domain module dropped from 494 to 385 lines, providing 115 lines of headroom. - **m4 (lazy imports):** Replaced eager module-level imports with `importlib.import_module()` dispatch. The `_COMMAND_MAP` dict maps command names to `(module_name, func_name)` tuples. `_resolve_handler()` imports only the module needed for the specific command. Running `merge-clean` no longer loads `typer.testing` or `cleveragents.cli.commands.plan`. - **m5 (fmt kwarg):** Added assertion in `plan_diff()` verifying `call_args.kwargs.get("fmt")`. Uses the same kwargs-with-positional-fallback pattern as `plan_id`. Falls through silently if `fmt` is `None` (not all CLI invocations pass it explicitly). - **m6 (get_plan guard):** Added `_assert_mock_called_once_with(mock_service.get_plan, "get_plan", _ROOT_ULID)` in `cli_plan_execute()`. This verifies the read-only guard is actually exercised before `execute_plan`. - **m7 (sys.path.append):** Changed `sys.path.insert(0, _SRC)` to `sys.path.append(_SRC)` in all 6 modules. Prevents potential stdlib shadowing if a file in `src/` were ever named identically to a stdlib module. ### Nit Fixes - **n1:** Removed `config.execution_mode != PARALLEL` and `config.max_parallel != 3` readback assertions in `parallel_max()`. These verified the config object constructed 2 lines above — tautological. - **n2:** Added docstring note in `cli_plan_tree()` explaining why it patches `get_container` (the production tree command uses `container.resolve(DecisionService)`) rather than `_get_lifecycle_service`. - **n4:** Fixed import ordering to match ruff isort: third-party (`helper_m4_e2e_common`, `typer`) then first-party (`cleveragents.*`). The `robot/` directory is not in the ruff `src` config so `helper_m4_e2e_common` is classified as third-party. - **n5:** Added TYPE_CHECKING comment in common.py explaining that `click.testing.Result` is the canonical import since Typer re-exports `CliRunner` but not `Result`. --- ### Quality Gate Results All 11 nox sessions pass: - lint: PASS - format: PASS - typecheck: PASS (0 errors) - unit_tests: PASS (0 failures) - integration_tests: PASS (14/14 M4 E2E tests) - coverage_report: 98% (threshold 97%) - security_scan, dead_code, docs, build, benchmark: all PASS
hurui200320 2026-03-13 06:10:24 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core#495
No description provided.