test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure #681

Merged
hurui200320 merged 1 commit from test/m4-acceptance-gate into master 2026-03-13 06:10:23 +00:00
Member

Summary

Re-verifies all M4 acceptance criteria against the current master branch and closes CLI test coverage gaps identified in code review. Addresses all 3 blocking, 7 non-blocking, and 5 nit issues from the second self-review.

Changes

File Split (CONTRIBUTING.md 500-line compliance)

Split the 1074-line helper_m4_e2e_verification.py into six focused modules:

Module Lines Responsibility
helper_m4_e2e_common.py 244 Constants, _fail(), mock assertion wrappers, _make_subplan_status factory, plan mock factories, frozen timestamp
helper_m4_e2e_domain.py 385 Domain model tests: spawn-subplans, plan-tree, parallel-max, parent-tracking
helper_m4_e2e_merge.py 146 Merge tests: merge-clean, merge-conflict (with shutil.which("git") pre-check)
helper_m4_e2e_cli.py 445 CLI happy-path tests: plan-diff, cli-plan-use, cli-plan-execute, cli-plan-tree
helper_m4_e2e_cli_errors.py 182 CLI error-path tests: execute-readonly, use-not-found, diff-no-changeset, tree-empty
helper_m4_e2e_verification.py 105 Thin dispatcher with lazy importlib imports (entry point unchanged for Robot Framework)

Major Fixes (M1–M3)

# Issue Fix
M1 Tautological plan_tree() domain test built a tree dict from its own injected data Removed tautological tree construction; redocumented test as SubplanStatus field completeness + parent linkage validation
M2 No CLI error-path tests — all four CLI functions only exercised happy paths Added 4 error-path tests: execute on read-only plan, use with ActionNotAvailableError, diff with PlanError (no changeset), tree with zero decisions
M3 Duplicate CHANGELOG entries for #495 (lines 5-17 and 205-211) Consolidated into a single entry under ### Added

Minor Fixes (m1–m7)

# Issue Fix
m1 Environment-dependent merge test failure if git not on $PATH Added _assert_git_available() pre-check using shutil.which("git")
m2 CHANGELOG entry placement without subsection Consolidated into the existing ### Added entry
m3 helper_m4_e2e_domain.py at 494 lines — tight headroom Moved merge functions to helper_m4_e2e_merge.py; domain dropped to 385 lines
m4 Eager imports in dispatcher negate split benefit Replaced with importlib.import_module() lazy dispatch
m5 plan_diff() does not verify fmt kwarg Added assertion checking call_args.kwargs.get("fmt")
m6 cli_plan_execute() does not verify get_plan() guard call Added _assert_mock_called_once_with(mock_service.get_plan, ...)
m7 sys.path.insert(0, ...) could shadow stdlib modules Changed to sys.path.append(_SRC) in all modules

Nit Fixes (n1–n5)

# Issue Fix
n1 parallel_max() value readback assertions are noise Removed tautological readback assertions
n2 cli_plan_tree() patches at module level inconsistently Added comment explaining the tree command resolves DecisionService via container, not lifecycle service
n3 Duplicated sys.path block across modules Kept for defensive standalone invocation (documented as idempotent)
n4 Import ordering after sys.path manipulation Fixed: third-party (helper_m4_e2e_common, typer) before first-party (cleveragents)
n5 CliResult type annotation sourced from click.testing vs typer.testing Added clarifying comment: click.testing.Result is canonical since Typer re-exports CliRunner but not Result

CHANGELOG

Consolidated into a single ### Added entry documenting the complete set of review-fix improvements.

M4 Acceptance Criteria Verification

All 14 M4 E2E verification tests pass (10 happy-path + 4 error-path):

# Criterion Test(s) Status
1 plan use creates plan cli-plan-use + cli-plan-use-not-found (error) PASS
2 plan execute spawns subplans cli-plan-execute + cli-plan-execute-readonly (error) PASS
3 plan tree displays tree cli-plan-tree + cli-plan-tree-empty (error) PASS
4 plan diff shows merged results plan-diff + cli-plan-diff-no-changeset (error) PASS
5 Parallel execution with max_parallel parallel-max PASS
6 Three-way merge non-conflicting merge-clean PASS
7 Merge conflicts surfaced merge-conflict PASS
8 Parent tracks subplan statuses parent-tracking PASS

Quality Gates

Stage Result
lint PASS
format PASS
typecheck PASS (0 errors)
unit_tests PASS (0 failures)
integration_tests PASS (14 M4 E2E tests, 0 failures)
coverage_report PASS (98%, threshold 97%)
security_scan PASS
dead_code PASS
docs PASS
build PASS
benchmark PASS

Closes #495

## Summary Re-verifies all M4 acceptance criteria against the current master branch and closes CLI test coverage gaps identified in code review. Addresses all 3 blocking, 7 non-blocking, and 5 nit issues from the second self-review. ## Changes ### File Split (CONTRIBUTING.md 500-line compliance) Split the 1074-line `helper_m4_e2e_verification.py` into six focused modules: | Module | Lines | Responsibility | |--------|-------|----------------| | `helper_m4_e2e_common.py` | 244 | Constants, `_fail()`, mock assertion wrappers, `_make_subplan_status` factory, plan mock factories, frozen timestamp | | `helper_m4_e2e_domain.py` | 385 | Domain model tests: spawn-subplans, plan-tree, parallel-max, parent-tracking | | `helper_m4_e2e_merge.py` | 146 | Merge tests: merge-clean, merge-conflict (with `shutil.which("git")` pre-check) | | `helper_m4_e2e_cli.py` | 445 | CLI happy-path tests: plan-diff, cli-plan-use, cli-plan-execute, cli-plan-tree | | `helper_m4_e2e_cli_errors.py` | 182 | CLI error-path tests: execute-readonly, use-not-found, diff-no-changeset, tree-empty | | `helper_m4_e2e_verification.py` | 105 | Thin dispatcher with lazy `importlib` imports (entry point unchanged for Robot Framework) | ### Major Fixes (M1–M3) | # | Issue | Fix | |---|-------|-----| | M1 | Tautological `plan_tree()` domain test built a tree dict from its own injected data | Removed tautological tree construction; redocumented test as SubplanStatus field completeness + parent linkage validation | | M2 | No CLI error-path tests — all four CLI functions only exercised happy paths | Added 4 error-path tests: `execute` on read-only plan, `use` with `ActionNotAvailableError`, `diff` with `PlanError` (no changeset), `tree` with zero decisions | | M3 | Duplicate CHANGELOG entries for #495 (lines 5-17 and 205-211) | Consolidated into a single entry under `### Added` | ### Minor Fixes (m1–m7) | # | Issue | Fix | |---|-------|-----| | m1 | Environment-dependent merge test failure if `git` not on `$PATH` | Added `_assert_git_available()` pre-check using `shutil.which("git")` | | m2 | CHANGELOG entry placement without subsection | Consolidated into the existing `### Added` entry | | m3 | `helper_m4_e2e_domain.py` at 494 lines — tight headroom | Moved merge functions to `helper_m4_e2e_merge.py`; domain dropped to 385 lines | | m4 | Eager imports in dispatcher negate split benefit | Replaced with `importlib.import_module()` lazy dispatch | | m5 | `plan_diff()` does not verify `fmt` kwarg | Added assertion checking `call_args.kwargs.get("fmt")` | | m6 | `cli_plan_execute()` does not verify `get_plan()` guard call | Added `_assert_mock_called_once_with(mock_service.get_plan, ...)` | | m7 | `sys.path.insert(0, ...)` could shadow stdlib modules | Changed to `sys.path.append(_SRC)` in all modules | ### Nit Fixes (n1–n5) | # | Issue | Fix | |---|-------|-----| | n1 | `parallel_max()` value readback assertions are noise | Removed tautological readback assertions | | n2 | `cli_plan_tree()` patches at module level inconsistently | Added comment explaining the tree command resolves DecisionService via container, not lifecycle service | | n3 | Duplicated `sys.path` block across modules | Kept for defensive standalone invocation (documented as idempotent) | | n4 | Import ordering after `sys.path` manipulation | Fixed: third-party (`helper_m4_e2e_common`, `typer`) before first-party (`cleveragents`) | | n5 | `CliResult` type annotation sourced from `click.testing` vs `typer.testing` | Added clarifying comment: `click.testing.Result` is canonical since Typer re-exports `CliRunner` but not `Result` | ### CHANGELOG Consolidated into a single `### Added` entry documenting the complete set of review-fix improvements. ## M4 Acceptance Criteria Verification All 14 M4 E2E verification tests pass (10 happy-path + 4 error-path): | # | Criterion | Test(s) | Status | |---|-----------|---------|--------| | 1 | `plan use` creates plan | `cli-plan-use` + `cli-plan-use-not-found` (error) | PASS | | 2 | `plan execute` spawns subplans | `cli-plan-execute` + `cli-plan-execute-readonly` (error) | PASS | | 3 | `plan tree` displays tree | `cli-plan-tree` + `cli-plan-tree-empty` (error) | PASS | | 4 | `plan diff` shows merged results | `plan-diff` + `cli-plan-diff-no-changeset` (error) | PASS | | 5 | Parallel execution with max_parallel | `parallel-max` | PASS | | 6 | Three-way merge non-conflicting | `merge-clean` | PASS | | 7 | Merge conflicts surfaced | `merge-conflict` | PASS | | 8 | Parent tracks subplan statuses | `parent-tracking` | PASS | ## Quality Gates | Stage | Result | |-------|--------| | lint | PASS | | format | PASS | | typecheck | PASS (0 errors) | | unit_tests | PASS (0 failures) | | integration_tests | PASS (14 M4 E2E tests, 0 failures) | | coverage_report | PASS (98%, threshold 97%) | | security_scan | PASS | | dead_code | PASS | | docs | PASS | | build | PASS | | benchmark | PASS | Closes #495
hurui200320 added this to the v3.3.0 milestone 2026-03-11 07:02:14 +00:00
Author
Member

Code Review: PR #681 — M4 E2E Acceptance Test Strengthening

Self-review notes by @hurui200320 (posted as comment since Forgejo does not allow self-reviews)

Overall Assessment

The PR closes 3 of the 4 review-identified gaps (C1, C3, C4) with well-designed assertions. C2 is partially addressed. Quality gates pass. Specification compliance and security posture are sound. However, several issues should be addressed before requesting peer review.

Verdict: NEEDS CHANGES (3 blocking, 7 non-blocking)


Blocking Issues

1. Test Tautology — cli_plan_execute() post-plan assertions verify mock fixture, not system behavior

File: robot/helper_m4_e2e_verification.py:849-858 | Severity: Major

The assertions at lines 849–858 check post_plan.phase, post_plan.subplan_statuses, and post_plan.has_subplans outside the with patch(...) block (which ends at line 848). These verify the Plan object the test itself constructed at lines 815–819 — they can never fail. The C2 gap ("cli-plan-execute didn't verify subplan info") is not genuinely closed at the CLI output level.

The implementation notes acknowledge the CLI does not render subplan_count in its output. If this is a known CLI limitation, the tautological assertions should either be:

  1. Removed (they add no coverage and are misleading), or
  2. Replaced with a comment documenting that subplan info cannot be verified from CLI output, tracked as a follow-up.

2. Unguarded positional arg access in plan_diff() — potential IndexError

File: robot/helper_m4_e2e_verification.py:342 | Severity: Major

if call_args[0][0] != _ROOT_ULID:

If the production CLI calls diff(plan_id=_ROOT_ULID) as a keyword argument, call_args[0] is () and call_args[0][0] raises an unhandled IndexError. Use the kwargs-with-positional-fallback pattern:

plan_id_arg = call_args.kwargs.get("plan_id")
if plan_id_arg is None and call_args.args:
    plan_id_arg = call_args.args[0]

3. File exceeds 500-line limit (1074 lines, 2× over)

File: robot/helper_m4_e2e_verification.py | Severity: Major (CONTRIBUTING.md violation)

CONTRIBUTING.md: "Keep files under 500 lines." While pre-existing, this PR widens the gap by 84 net lines. Natural split point: domain model tests vs. CLI integration tests → helper_m4_e2e_domain.py + helper_m4_e2e_cli.py + shared helper_m4_e2e_common.py.


Non-Blocking Recommendations

4. Unhandled AttributeError risk — robot/helper_m4_e2e_verification.py:784

If project_links contains non-ProjectLink objects, pl.project_name raises an unhandled AttributeError. Add type guarding:

if not all(isinstance(pl, ProjectLink) for pl in project_links):
    _fail(f"project_links should be list of ProjectLink, got {[type(p).__name__ for p in project_links]}")

5. assert_called_once*() bypasses _fail() pattern — lines 340, 767, 772, 835, 965

These raise raw AssertionError on failure instead of the standardized FAIL: messages. Consider wrapping in try/except.

6. Asymmetric positional/keyword fallback — lines 774–786

action_name has kwargs + positional fallback; project_links only checks kwargs. A refactor passing project_links positionally would cause a misleading test failure.

7. Overly broad "execute" string match — line 846

"execute" not in output_lower matches the word anywhere in the output. A more targeted check would be more robust.

8. DRY violations

SubplanStatus construction repeated 3× (lines 152–184, 402–424, 597–637). CLI exit-code checks repeated 4× (lines 336, 759, 827, 957). Extract parameterized factories.

9. Non-deterministic datetime.now() in fixtures (lines 88, 127, 154, 402, 594)

Multiple independent calls produce subtly inconsistent timestamps. Use a frozen constant.

10. Generic _DECISION_ULID_N naming (lines 868–872)

Rename to _DECISION_PROMPT_DEF, _DECISION_STRATEGY, etc. for self-documenting assertions.


M4 Acceptance Criteria Coverage

# Criterion Status
1 plan use creates plan COVERED
2 plan execute spawns subplans PARTIAL (see Issue #1)
3 plan tree displays hierarchy COVERED
4 plan diff shows merged results COVERED
5 Parallel max_parallel COVERED (model-level)
6 Three-way merge COVERED
7 Merge conflicts surfaced COVERED
8 Parent tracks statuses COVERED

C1–C4 Gap Closure

Gap Status
C1: project argument CLOSED
C2: subplan info PARTIAL — CLI phase check real; domain checks tautological
C3: shallow matching CLOSED
C4: diff content CLOSED

Security & Compliance

No security vulnerabilities. All decision types and plan lifecycle phases align with docs/specification.md. PR description includes Closes #495. Milestone and label correctly assigned.

## Code Review: PR #681 — M4 E2E Acceptance Test Strengthening **Self-review notes by @hurui200320** *(posted as comment since Forgejo does not allow self-reviews)* ### Overall Assessment The PR closes 3 of the 4 review-identified gaps (C1, C3, C4) with well-designed assertions. C2 is partially addressed. Quality gates pass. Specification compliance and security posture are sound. However, several issues should be addressed before requesting peer review. **Verdict: NEEDS CHANGES** (3 blocking, 7 non-blocking) --- ### Blocking Issues #### 1. Test Tautology — `cli_plan_execute()` post-plan assertions verify mock fixture, not system behavior **File:** `robot/helper_m4_e2e_verification.py:849-858` | **Severity:** Major The assertions at lines 849–858 check `post_plan.phase`, `post_plan.subplan_statuses`, and `post_plan.has_subplans` **outside** the `with patch(...)` block (which ends at line 848). These verify the `Plan` object the test *itself* constructed at lines 815–819 — they can never fail. The C2 gap ("cli-plan-execute didn't verify subplan info") is not genuinely closed at the CLI output level. The implementation notes acknowledge the CLI does not render `subplan_count` in its output. If this is a known CLI limitation, the tautological assertions should either be: 1. **Removed** (they add no coverage and are misleading), or 2. **Replaced with a comment** documenting that subplan info cannot be verified from CLI output, tracked as a follow-up. #### 2. Unguarded positional arg access in `plan_diff()` — potential `IndexError` **File:** `robot/helper_m4_e2e_verification.py:342` | **Severity:** Major ```python if call_args[0][0] != _ROOT_ULID: ``` If the production CLI calls `diff(plan_id=_ROOT_ULID)` as a keyword argument, `call_args[0]` is `()` and `call_args[0][0]` raises an unhandled `IndexError`. Use the kwargs-with-positional-fallback pattern: ```python plan_id_arg = call_args.kwargs.get("plan_id") if plan_id_arg is None and call_args.args: plan_id_arg = call_args.args[0] ``` #### 3. File exceeds 500-line limit (1074 lines, 2× over) **File:** `robot/helper_m4_e2e_verification.py` | **Severity:** Major (CONTRIBUTING.md violation) CONTRIBUTING.md: *"Keep files under 500 lines."* While pre-existing, this PR widens the gap by 84 net lines. Natural split point: domain model tests vs. CLI integration tests → `helper_m4_e2e_domain.py` + `helper_m4_e2e_cli.py` + shared `helper_m4_e2e_common.py`. --- ### Non-Blocking Recommendations #### 4. Unhandled `AttributeError` risk — `robot/helper_m4_e2e_verification.py:784` If `project_links` contains non-`ProjectLink` objects, `pl.project_name` raises an unhandled `AttributeError`. Add type guarding: ```python if not all(isinstance(pl, ProjectLink) for pl in project_links): _fail(f"project_links should be list of ProjectLink, got {[type(p).__name__ for p in project_links]}") ``` #### 5. `assert_called_once*()` bypasses `_fail()` pattern — lines 340, 767, 772, 835, 965 These raise raw `AssertionError` on failure instead of the standardized `FAIL:` messages. Consider wrapping in try/except. #### 6. Asymmetric positional/keyword fallback — lines 774–786 `action_name` has kwargs + positional fallback; `project_links` only checks kwargs. A refactor passing `project_links` positionally would cause a misleading test failure. #### 7. Overly broad "execute" string match — line 846 `"execute" not in output_lower` matches the word anywhere in the output. A more targeted check would be more robust. #### 8. DRY violations SubplanStatus construction repeated 3× (lines 152–184, 402–424, 597–637). CLI exit-code checks repeated 4× (lines 336, 759, 827, 957). Extract parameterized factories. #### 9. Non-deterministic `datetime.now()` in fixtures (lines 88, 127, 154, 402, 594) Multiple independent calls produce subtly inconsistent timestamps. Use a frozen constant. #### 10. Generic `_DECISION_ULID_N` naming (lines 868–872) Rename to `_DECISION_PROMPT_DEF`, `_DECISION_STRATEGY`, etc. for self-documenting assertions. --- ### M4 Acceptance Criteria Coverage | # | Criterion | Status | |---|-----------|--------| | 1 | `plan use` creates plan | COVERED | | 2 | `plan execute` spawns subplans | **PARTIAL** (see Issue #1) | | 3 | `plan tree` displays hierarchy | COVERED | | 4 | `plan diff` shows merged results | COVERED | | 5 | Parallel max_parallel | COVERED (model-level) | | 6 | Three-way merge | COVERED | | 7 | Merge conflicts surfaced | COVERED | | 8 | Parent tracks statuses | COVERED | ### C1–C4 Gap Closure | Gap | Status | |-----|--------| | C1: project argument | **CLOSED** | | C2: subplan info | **PARTIAL** — CLI phase check real; domain checks tautological | | C3: shallow matching | **CLOSED** | | C4: diff content | **CLOSED** | ### Security & Compliance No security vulnerabilities. All decision types and plan lifecycle phases align with `docs/specification.md`. PR description includes `Closes #495`. Milestone and label correctly assigned.
Owner

PM Review — Day 31 (Specification Update)

This PR is mergeable.

Spec Alignment Check

M4 E2E acceptance tests are NOT directly impacted by protocol or TUI changes. However, note that the broader E2E test mocking issue (#658) may affect how these tests are structured long-term.

Status

Self-review by @hurui200320 identified 3 blocking issues:

  1. Test tautology — assertions verify mock fixture not system behavior
  2. Unguarded positional arg access
  3. File exceeds 500-line limit (1074 lines)

Action Required

@hurui200320 — Address the 3 blocking issues from your self-review, then request external review.

## PM Review — Day 31 (Specification Update) This PR is **mergeable**. ### Spec Alignment Check M4 E2E acceptance tests are NOT directly impacted by protocol or TUI changes. However, note that the broader E2E test mocking issue (#658) may affect how these tests are structured long-term. ### Status Self-review by @hurui200320 identified 3 blocking issues: 1. Test tautology — assertions verify mock fixture not system behavior 2. Unguarded positional arg access 3. File exceeds 500-line limit (1074 lines) ### Action Required @hurui200320 — Address the 3 blocking issues from your self-review, then request external review.
hurui200320 force-pushed test/m4-acceptance-gate from 97c5dfbcd1
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 18s
CI / quality (pull_request) Successful in 20s
CI / security (pull_request) Successful in 38s
CI / typecheck (pull_request) Successful in 40s
CI / unit_tests (pull_request) Successful in 2m43s
CI / docker (pull_request) Successful in 9s
CI / coverage (pull_request) Successful in 5m25s
CI / integration_tests (pull_request) Successful in 3m19s
CI / benchmark-regression (pull_request) Successful in 36m21s
to 2e8ea55760
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 55s
CI / typecheck (pull_request) Successful in 57s
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
2026-03-12 02:35:22 +00:00
Compare
hurui200320 force-pushed test/m4-acceptance-gate from 2e8ea55760
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 55s
CI / typecheck (pull_request) Successful in 57s
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
to ecf508797a
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 18s
CI / typecheck (pull_request) Successful in 38s
CI / security (pull_request) Successful in 40s
CI / unit_tests (pull_request) Successful in 2m56s
CI / docker (pull_request) Successful in 9s
CI / integration_tests (pull_request) Successful in 3m25s
CI / coverage (pull_request) Successful in 5m28s
CI / benchmark-regression (pull_request) Successful in 36m1s
2026-03-12 02:37:21 +00:00
Compare
Author
Member

Review Fixes Complete — All CI Green

@freemo All 3 blocking and 7 non-blocking issues from the self-review have been addressed in commit ecf5087.

Blocking Fixes

  1. Test tautology (cli_plan_execute) — removed tautological domain assertions; replaced with explanatory comment. Subplan coverage is provided by the dedicated spawn-subplans and parent-tracking domain tests.
  2. Unguarded positional arg access (plan_diff) — replaced call_args[0][0] with kwargs.get("plan_id") + positional fallback pattern.
  3. 500-line limit violation — split 1074-line file into 4 modules: _common.py (237), _domain.py (495), _cli.py (426), dispatcher (81). All under 500 lines.

Non-Blocking Fixes

  1. isinstance(pl, ProjectLink) type guard added
  2. assert_called_once*() wrapped in _fail()-pattern helpers
  3. project_links extraction made symmetric with action_name
  4. "execute" in output_lowerre.search(r"\bexecute\b", ...) word-boundary regex
  5. _make_subplan_status() factory and _assert_exit_code() helper extracted (DRY)
  6. datetime.now()_FROZEN_NOW = datetime(2026, 3, 1, 12, 0, 0)
  7. _DECISION_ULID_N → descriptive names (_DECISION_PROMPT_DEF, _DECISION_STRATEGY, etc.)

CI Status

All 11 stages passed (run #6171, 36m43s):

  • lint, format, typecheck, unit_tests (10,431), integration_tests (1,452), coverage (98%), security_scan, dead_code, docs, build, benchmarks

Ready for external review.

## Review Fixes Complete — All CI Green @freemo All 3 blocking and 7 non-blocking issues from the self-review have been addressed in commit `ecf5087`. ### Blocking Fixes 1. **Test tautology** (`cli_plan_execute`) — removed tautological domain assertions; replaced with explanatory comment. Subplan coverage is provided by the dedicated `spawn-subplans` and `parent-tracking` domain tests. 2. **Unguarded positional arg access** (`plan_diff`) — replaced `call_args[0][0]` with `kwargs.get("plan_id")` + positional fallback pattern. 3. **500-line limit violation** — split 1074-line file into 4 modules: `_common.py` (237), `_domain.py` (495), `_cli.py` (426), dispatcher (81). All under 500 lines. ### Non-Blocking Fixes 4. `isinstance(pl, ProjectLink)` type guard added 5. `assert_called_once*()` wrapped in `_fail()`-pattern helpers 6. `project_links` extraction made symmetric with `action_name` 7. `"execute" in output_lower` → `re.search(r"\bexecute\b", ...)` word-boundary regex 8. `_make_subplan_status()` factory and `_assert_exit_code()` helper extracted (DRY) 9. `datetime.now()` → `_FROZEN_NOW = datetime(2026, 3, 1, 12, 0, 0)` 10. `_DECISION_ULID_N` → descriptive names (`_DECISION_PROMPT_DEF`, `_DECISION_STRATEGY`, etc.) ### CI Status **All 11 stages passed** (run #6171, 36m43s): - lint, format, typecheck, unit_tests (10,431), integration_tests (1,452), coverage (98%), security_scan, dead_code, docs, build, benchmarks Ready for external review.
Author
Member

Code Review: PR #681test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure

Reviewer: @hurui200320 (posted as comment — Forgejo does not allow self-reviews)
Branch: test/m4-acceptance-gatemaster
Commit: ecf5087


Prior Review Items

All 3 blocking and 7 non-blocking issues from the self-review (comment by @hurui200320) and the PM review (comment by @freemo) have been verified as resolved in the latest commit. These are not repeated here.


Verdict: REQUEST CHANGES (3 Major, 7 Minor, 5 Nit)


Major Issues

M1. Tautological plan_tree() domain test

File: robot/helper_m4_e2e_domain.py:120-136

The test manually builds a tree dict from the subplan_statuses it injected into the parent plan, then asserts that dict contains those same IDs. No system-under-test function produces this tree — the test is checking its own construction. Lines 120-123:

tree: dict[str, list[str]] = {
    parent.identity.plan_id: [s.subplan_id for s in parent.subplan_statuses]
}

Then lines 126-136 assert _CHILD_A_ULID in tree[_ROOT_ULID] etc., which can never fail.

The CLI-level cli_plan_tree() test exercises real build_decision_tree() logic, so the M4 criterion has some coverage, but this domain test is misleading about what it verifies.

Recommendation: Either call a real domain function that builds the tree structure from flat data, or rename/redocument this test to clarify it only validates SubplanStatus field completeness (changeset_summary, files_changed, parent linkage) rather than tree construction.

M2. No CLI error-path tests

File: robot/helper_m4_e2e_cli.py (all CLI tests)

All four CLI test functions only exercise the happy path. None test:

  • plan execute on a read-only plan (should abort)
  • plan use with a nonexistent action (ActionNotAvailableError)
  • plan diff with a plan that has no changeset
  • plan tree with zero decisions

Error paths are part of the user-facing contract. Without error-path tests, a regression could silently swallow errors or crash with unhandled exceptions.

Recommendation: Add at least one error-path test per CLI command.

M3. Duplicate CHANGELOG entries for #495

File: CHANGELOG.md:5-17 and CHANGELOG.md:205-211

Issue #495 has two changelog entries:

  1. Lines 5-17 (under ## Unreleased, no ### subsection): The entry added by this PR.
  2. Lines 205-211 (under ### Added): A pre-existing entry from a previous commit.

Per CONTRIBUTING.md, one changelog entry per logical change is expected.

Recommendation: Consolidate into a single entry. Either remove the new entry (if the existing one is sufficient) or merge the content and remove the duplicate.


Minor Issues

m1. Environment-dependent merge test failure

File: robot/helper_m4_e2e_domain.py:278, 322

merge_clean() and merge_conflict() call GitMergeStrategy().merge(), which shells out to git merge-file. If git is not on $PATH, the fallback to SequentialMergeStrategy produces results that fail the assertions with misleading diagnostics (e.g., "missing ours change" instead of "git not found").

Recommendation: Add a pre-check: if shutil.which("git") is None: _fail("git is required on PATH for merge tests")

m2. CHANGELOG entry placement without subsection

File: CHANGELOG.md:5-17

The new entry sits directly under ## Unreleased without a ### subsection header, while the rest of the changelog uses structured ### Added / ### fix(...) subsections.

Recommendation: Move the entry under the appropriate ### Added subsection.

m3. helper_m4_e2e_domain.py at 494 lines — tight headroom

File: robot/helper_m4_e2e_domain.py

At 494 lines, the file has only 6 lines of headroom before the 500-line limit. Any future additions will immediately breach it.

Recommendation: Consider proactively splitting the two merge functions (merge_clean, merge_conflict) into a helper_m4_e2e_merge.py module (~90 lines each), dropping domain to ~310 lines.

m4. Eager imports in dispatcher negate split benefit

File: robot/helper_m4_e2e_verification.py:33-46

The dispatcher unconditionally imports both helper_m4_e2e_cli and helper_m4_e2e_domain at module level. When running a domain-only command (e.g., merge-clean), the CLI module's heavy imports (typer.testing, cleveragents.cli.commands.plan) are loaded unnecessarily.

Recommendation: Use lazy imports inside the dispatch lookup so each command only loads its own module. Estimated savings: 50-200 ms per domain-only invocation.

m5. plan_diff() does not verify fmt kwarg

File: robot/helper_m4_e2e_cli.py:93-99

The production code calls service.diff(plan_id, fmt=fmt), but the test only verifies plan_id was passed correctly. If a regression dropped the fmt argument, this test would not catch it.

Recommendation: Add an assertion checking call_args.kwargs.get("fmt").

m6. cli_plan_execute() does not verify get_plan() guard call

File: robot/helper_m4_e2e_cli.py:198-258

The execute command calls service.get_plan(plan_id) for the read-only guard. The test sets up the mock return but never asserts get_plan was actually called.

Recommendation: Add _assert_mock_called_once_with(mock_service.get_plan, "get_plan", _ROOT_ULID).

m7. sys.path.insert(0, ...) could shadow stdlib modules

Files: helper_m4_e2e_common.py:17-19, helper_m4_e2e_cli.py:22-24, helper_m4_e2e_domain.py:21-23, helper_m4_e2e_verification.py:29-31

Using sys.path.insert(0, _SRC) places src/ at the front of the search path. If a file with the same name as a stdlib module were ever placed in src/, it would shadow the stdlib.

Recommendation: Use sys.path.append(_SRC) instead, or centralize the path setup.


Nit

n1. parallel_max() value readback assertions are noise

robot/helper_m4_e2e_domain.py:187-190 — Asserts config.execution_mode == PARALLEL immediately after constructing SubplanConfig(execution_mode=ExecutionMode.PARALLEL). Can never fail.

n2. cli_plan_tree() patches at module level inconsistently

robot/helper_m4_e2e_cli.py:340-343 — Other CLI tests mock private helpers (_get_lifecycle_service), but this one patches get_container at its definition site. Consider extracting a _get_decision_service() helper for consistency.

n3. Duplicated sys.path block across 4 files

All 4 modules repeat the identical 3-line _SRC path insertion. Since helper_m4_e2e_common.py is always imported first, the duplication is technically redundant (though defensible for standalone invocation).

n4. Import ordering after sys.path manipulation

robot/helper_m4_e2e_cli.py:26-49 — Local helper (helper_m4_e2e_common) is imported before third-party (typer.testing). Conventional ordering is third-party then local.

n5. CliResult type annotation sourced from click.testing vs typer.testing

robot/helper_m4_e2e_common.py:37from click.testing import Result as CliResult works because Typer wraps Click, but from typer.testing import Result would be clearer.


Self-Review Issue Verification (All 10 Fixed)

# Issue Status
1 Test tautology in cli_plan_execute() Fixed (lines 249-256 explain removal)
2 Unguarded call_args[0][0] in plan_diff() Fixed (kwargs fallback at lines 94-99)
3 File exceeds 500-line limit Fixed (4-way split: 237+426+494+81)
4 AttributeError risk on project_links Fixed (isinstance guard at line 180)
5 assert_called_once*() bypasses _fail() Fixed (wrapper helpers in common)
6 Asymmetric positional/keyword fallback Fixed (symmetric at lines 164-175)
7 Overly broad "execute" string match Fixed (word-boundary regex at line 246)
8 DRY violations Fixed (factories + helpers extracted)
9 Non-deterministic datetime.now() Fixed (_FROZEN_NOW constant)
10 Generic ULID naming Fixed (descriptive names at lines 54-58)

M4 Acceptance Criteria Coverage

# Criterion Coverage
1 plan use creates plan Covered (cli_plan_use)
2 plan execute spawns subplans Covered (cli_plan_execute + spawn_subplans)
3 plan tree displays hierarchy Covered (cli_plan_tree); domain test is tautological (M1)
4 plan diff shows merged results Covered (plan_diff)
5 Parallel max_parallel Covered (parallel_max)
6 Three-way merge non-conflicting Covered (merge_clean)
7 Merge conflicts surfaced Covered (merge_conflict)
8 Parent tracks statuses Covered (parent_tracking)

Security & Compliance

No security vulnerabilities. All decision types and plan lifecycle phases align with docs/specification.md. PR description includes Closes #495. Milestone and label correctly assigned. Type safety is clean (zero # type: ignore). Commit message follows Conventional Changelog.

## Code Review: PR #681 — `test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure` **Reviewer:** @hurui200320 *(posted as comment — Forgejo does not allow self-reviews)* **Branch:** `test/m4-acceptance-gate` → `master` **Commit:** `ecf5087` --- ### Prior Review Items All 3 blocking and 7 non-blocking issues from the self-review (comment by @hurui200320) and the PM review (comment by @freemo) have been verified as **resolved** in the latest commit. These are not repeated here. --- ### Verdict: **REQUEST CHANGES** (3 Major, 7 Minor, 5 Nit) --- ### Major Issues #### M1. Tautological `plan_tree()` domain test **File:** `robot/helper_m4_e2e_domain.py:120-136` The test manually builds a tree dict from the `subplan_statuses` it injected into the parent plan, then asserts that dict contains those same IDs. No system-under-test function produces this tree — the test is checking its own construction. Lines 120-123: ```python tree: dict[str, list[str]] = { parent.identity.plan_id: [s.subplan_id for s in parent.subplan_statuses] } ``` Then lines 126-136 assert `_CHILD_A_ULID in tree[_ROOT_ULID]` etc., which can never fail. The CLI-level `cli_plan_tree()` test exercises real `build_decision_tree()` logic, so the M4 criterion has *some* coverage, but this domain test is misleading about what it verifies. **Recommendation:** Either call a real domain function that builds the tree structure from flat data, or rename/redocument this test to clarify it only validates `SubplanStatus` field completeness (changeset_summary, files_changed, parent linkage) rather than tree construction. #### M2. No CLI error-path tests **File:** `robot/helper_m4_e2e_cli.py` (all CLI tests) All four CLI test functions only exercise the happy path. None test: - `plan execute` on a read-only plan (should abort) - `plan use` with a nonexistent action (`ActionNotAvailableError`) - `plan diff` with a plan that has no changeset - `plan tree` with zero decisions Error paths are part of the user-facing contract. Without error-path tests, a regression could silently swallow errors or crash with unhandled exceptions. **Recommendation:** Add at least one error-path test per CLI command. #### M3. Duplicate CHANGELOG entries for #495 **File:** `CHANGELOG.md:5-17` and `CHANGELOG.md:205-211` Issue #495 has **two** changelog entries: 1. Lines 5-17 (under `## Unreleased`, no `###` subsection): The entry added by this PR. 2. Lines 205-211 (under `### Added`): A pre-existing entry from a previous commit. Per CONTRIBUTING.md, one changelog entry per logical change is expected. **Recommendation:** Consolidate into a single entry. Either remove the new entry (if the existing one is sufficient) or merge the content and remove the duplicate. --- ### Minor Issues #### m1. Environment-dependent merge test failure **File:** `robot/helper_m4_e2e_domain.py:278, 322` `merge_clean()` and `merge_conflict()` call `GitMergeStrategy().merge()`, which shells out to `git merge-file`. If `git` is not on `$PATH`, the fallback to `SequentialMergeStrategy` produces results that fail the assertions with misleading diagnostics (e.g., "missing ours change" instead of "git not found"). **Recommendation:** Add a pre-check: `if shutil.which("git") is None: _fail("git is required on PATH for merge tests")` #### m2. CHANGELOG entry placement without subsection **File:** `CHANGELOG.md:5-17` The new entry sits directly under `## Unreleased` without a `###` subsection header, while the rest of the changelog uses structured `### Added` / `### fix(...)` subsections. **Recommendation:** Move the entry under the appropriate `### Added` subsection. #### m3. `helper_m4_e2e_domain.py` at 494 lines — tight headroom **File:** `robot/helper_m4_e2e_domain.py` At 494 lines, the file has only 6 lines of headroom before the 500-line limit. Any future additions will immediately breach it. **Recommendation:** Consider proactively splitting the two merge functions (`merge_clean`, `merge_conflict`) into a `helper_m4_e2e_merge.py` module (~90 lines each), dropping domain to ~310 lines. #### m4. Eager imports in dispatcher negate split benefit **File:** `robot/helper_m4_e2e_verification.py:33-46` The dispatcher unconditionally imports both `helper_m4_e2e_cli` and `helper_m4_e2e_domain` at module level. When running a domain-only command (e.g., `merge-clean`), the CLI module's heavy imports (`typer.testing`, `cleveragents.cli.commands.plan`) are loaded unnecessarily. **Recommendation:** Use lazy imports inside the dispatch lookup so each command only loads its own module. Estimated savings: 50-200 ms per domain-only invocation. #### m5. `plan_diff()` does not verify `fmt` kwarg **File:** `robot/helper_m4_e2e_cli.py:93-99` The production code calls `service.diff(plan_id, fmt=fmt)`, but the test only verifies `plan_id` was passed correctly. If a regression dropped the `fmt` argument, this test would not catch it. **Recommendation:** Add an assertion checking `call_args.kwargs.get("fmt")`. #### m6. `cli_plan_execute()` does not verify `get_plan()` guard call **File:** `robot/helper_m4_e2e_cli.py:198-258` The `execute` command calls `service.get_plan(plan_id)` for the read-only guard. The test sets up the mock return but never asserts `get_plan` was actually called. **Recommendation:** Add `_assert_mock_called_once_with(mock_service.get_plan, "get_plan", _ROOT_ULID)`. #### m7. `sys.path.insert(0, ...)` could shadow stdlib modules **Files:** `helper_m4_e2e_common.py:17-19`, `helper_m4_e2e_cli.py:22-24`, `helper_m4_e2e_domain.py:21-23`, `helper_m4_e2e_verification.py:29-31` Using `sys.path.insert(0, _SRC)` places `src/` at the front of the search path. If a file with the same name as a stdlib module were ever placed in `src/`, it would shadow the stdlib. **Recommendation:** Use `sys.path.append(_SRC)` instead, or centralize the path setup. --- ### Nit #### n1. `parallel_max()` value readback assertions are noise `robot/helper_m4_e2e_domain.py:187-190` — Asserts `config.execution_mode == PARALLEL` immediately after constructing `SubplanConfig(execution_mode=ExecutionMode.PARALLEL)`. Can never fail. #### n2. `cli_plan_tree()` patches at module level inconsistently `robot/helper_m4_e2e_cli.py:340-343` — Other CLI tests mock private helpers (`_get_lifecycle_service`), but this one patches `get_container` at its definition site. Consider extracting a `_get_decision_service()` helper for consistency. #### n3. Duplicated `sys.path` block across 4 files All 4 modules repeat the identical 3-line `_SRC` path insertion. Since `helper_m4_e2e_common.py` is always imported first, the duplication is technically redundant (though defensible for standalone invocation). #### n4. Import ordering after `sys.path` manipulation `robot/helper_m4_e2e_cli.py:26-49` — Local helper (`helper_m4_e2e_common`) is imported before third-party (`typer.testing`). Conventional ordering is third-party then local. #### n5. `CliResult` type annotation sourced from `click.testing` vs `typer.testing` `robot/helper_m4_e2e_common.py:37` — `from click.testing import Result as CliResult` works because Typer wraps Click, but `from typer.testing import Result` would be clearer. --- ### Self-Review Issue Verification (All 10 Fixed) | # | Issue | Status | |---|-------|--------| | 1 | Test tautology in `cli_plan_execute()` | Fixed (lines 249-256 explain removal) | | 2 | Unguarded `call_args[0][0]` in `plan_diff()` | Fixed (kwargs fallback at lines 94-99) | | 3 | File exceeds 500-line limit | Fixed (4-way split: 237+426+494+81) | | 4 | `AttributeError` risk on `project_links` | Fixed (`isinstance` guard at line 180) | | 5 | `assert_called_once*()` bypasses `_fail()` | Fixed (wrapper helpers in common) | | 6 | Asymmetric positional/keyword fallback | Fixed (symmetric at lines 164-175) | | 7 | Overly broad "execute" string match | Fixed (word-boundary regex at line 246) | | 8 | DRY violations | Fixed (factories + helpers extracted) | | 9 | Non-deterministic `datetime.now()` | Fixed (`_FROZEN_NOW` constant) | | 10 | Generic ULID naming | Fixed (descriptive names at lines 54-58) | --- ### M4 Acceptance Criteria Coverage | # | Criterion | Coverage | |---|-----------|----------| | 1 | `plan use` creates plan | Covered (`cli_plan_use`) | | 2 | `plan execute` spawns subplans | Covered (`cli_plan_execute` + `spawn_subplans`) | | 3 | `plan tree` displays hierarchy | Covered (`cli_plan_tree`); domain test is tautological (M1) | | 4 | `plan diff` shows merged results | Covered (`plan_diff`) | | 5 | Parallel max_parallel | Covered (`parallel_max`) | | 6 | Three-way merge non-conflicting | Covered (`merge_clean`) | | 7 | Merge conflicts surfaced | Covered (`merge_conflict`) | | 8 | Parent tracks statuses | Covered (`parent_tracking`) | ### Security & Compliance No security vulnerabilities. All decision types and plan lifecycle phases align with `docs/specification.md`. PR description includes `Closes #495`. Milestone and label correctly assigned. Type safety is clean (zero `# type: ignore`). Commit message follows Conventional Changelog.
hurui200320 force-pushed test/m4-acceptance-gate from ecf508797a
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 18s
CI / typecheck (pull_request) Successful in 38s
CI / security (pull_request) Successful in 40s
CI / unit_tests (pull_request) Successful in 2m56s
CI / docker (pull_request) Successful in 9s
CI / integration_tests (pull_request) Successful in 3m25s
CI / coverage (pull_request) Successful in 5m28s
CI / benchmark-regression (pull_request) Successful in 36m1s
to 5b0d934a50
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 39s
CI / typecheck (pull_request) Successful in 42s
CI / integration_tests (pull_request) Successful in 3m20s
CI / unit_tests (pull_request) Successful in 3m31s
CI / docker (pull_request) Successful in 41s
CI / coverage (pull_request) Successful in 5m37s
CI / benchmark-regression (pull_request) Successful in 37m13s
2026-03-12 08:56:59 +00:00
Compare
Author
Member

Second Review Fixes Complete — All CI Green

All 3 major, 7 minor, and 5 nit issues from the second self-review have been addressed in commit 5b0d934.

Major Fixes

  1. M1: Tautological plan_tree() domain test — Removed tautological tree dict construction (lines 120-136 of old file). The test now validates SubplanStatus field completeness (subplan_id, action_name, changeset_summary, files_changed) and parent→child PlanIdentity linkage. Tree construction logic is exercised by the CLI-level cli_plan_tree() test which invokes real build_decision_tree().

  2. M2: No CLI error-path tests — Added 4 new error-path tests in helper_m4_e2e_cli_errors.py:

    • cli_plan_execute_readonly: read-only plan → abort with "read-only" message
    • cli_plan_use_not_found: ActionNotAvailableError(ARCHIVED) → abort with error
    • cli_plan_diff_no_changeset: PlanError → abort with "Diff Error"
    • cli_plan_tree_empty: empty decisions → "No decisions found" info message, exit 0
  3. M3: Duplicate CHANGELOG entries — Removed the new entry at lines 5-17 (under ## Unreleased without subsection). Consolidated into the existing entry under ### Added.

Minor Fixes (m1–m7)

  1. m1: shutil.which("git") pre-check in _assert_git_available() before merge tests
  2. m2: CHANGELOG entry now under proper ### Added subsection
  3. m3: Merge functions → helper_m4_e2e_merge.py (146 lines); domain dropped to 385 lines
  4. m4: importlib.import_module() lazy dispatch — domain commands no longer load CLI dependencies
  5. m5: plan_diff() verifies fmt kwarg forwarded correctly
  6. m6: cli_plan_execute() verifies get_plan() guard call via _assert_mock_called_once_with
  7. m7: sys.path.insert(0, _SRC)sys.path.append(_SRC) in all 6 modules

Nit Fixes (n1–n5)

  1. n1: Removed tautological parallel_max() readback assertions (config.execution_mode, config.max_parallel)
  2. n2: Added comment in cli_plan_tree() explaining get_container patching vs _get_lifecycle_service
  3. n3: Kept sys.path blocks for standalone invocation safety (documented as idempotent)
  4. n4: Fixed import ordering: helper_m4_e2e_commontypercleveragents
  5. n5: Added comment explaining from click.testing import Result as CliResult is canonical

File Structure

Module Lines Status
helper_m4_e2e_common.py 244 Under limit
helper_m4_e2e_domain.py 385 Under limit
helper_m4_e2e_merge.py 146 NEW
helper_m4_e2e_cli.py 445 Under limit
helper_m4_e2e_cli_errors.py 182 NEW
helper_m4_e2e_verification.py 105 Under limit

CI Status

All 11 nox sessions passed (32m total):

  • lint, format, typecheck (0 errors), security_scan, dead_code
  • unit_tests (0 failures), integration_tests (14/14 M4 E2E tests pass)
  • coverage (98%, threshold 97%), docs, build, benchmark

Ready for external review.

## Second Review Fixes Complete — All CI Green All 3 major, 7 minor, and 5 nit issues from the second self-review have been addressed in commit `5b0d934`. ### Major Fixes 1. **M1: Tautological `plan_tree()` domain test** — Removed tautological tree dict construction (lines 120-136 of old file). The test now validates SubplanStatus field completeness (subplan_id, action_name, changeset_summary, files_changed) and parent→child `PlanIdentity` linkage. Tree *construction* logic is exercised by the CLI-level `cli_plan_tree()` test which invokes real `build_decision_tree()`. 2. **M2: No CLI error-path tests** — Added 4 new error-path tests in `helper_m4_e2e_cli_errors.py`: - `cli_plan_execute_readonly`: read-only plan → abort with "read-only" message - `cli_plan_use_not_found`: `ActionNotAvailableError(ARCHIVED)` → abort with error - `cli_plan_diff_no_changeset`: `PlanError` → abort with "Diff Error" - `cli_plan_tree_empty`: empty decisions → "No decisions found" info message, exit 0 3. **M3: Duplicate CHANGELOG entries** — Removed the new entry at lines 5-17 (under `## Unreleased` without subsection). Consolidated into the existing entry under `### Added`. ### Minor Fixes (m1–m7) 4. **m1**: `shutil.which("git")` pre-check in `_assert_git_available()` before merge tests 5. **m2**: CHANGELOG entry now under proper `### Added` subsection 6. **m3**: Merge functions → `helper_m4_e2e_merge.py` (146 lines); domain dropped to 385 lines 7. **m4**: `importlib.import_module()` lazy dispatch — domain commands no longer load CLI dependencies 8. **m5**: `plan_diff()` verifies `fmt` kwarg forwarded correctly 9. **m6**: `cli_plan_execute()` verifies `get_plan()` guard call via `_assert_mock_called_once_with` 10. **m7**: `sys.path.insert(0, _SRC)` → `sys.path.append(_SRC)` in all 6 modules ### Nit Fixes (n1–n5) 11. **n1**: Removed tautological `parallel_max()` readback assertions (config.execution_mode, config.max_parallel) 12. **n2**: Added comment in `cli_plan_tree()` explaining `get_container` patching vs `_get_lifecycle_service` 13. **n3**: Kept `sys.path` blocks for standalone invocation safety (documented as idempotent) 14. **n4**: Fixed import ordering: `helper_m4_e2e_common` → `typer` → `cleveragents` 15. **n5**: Added comment explaining `from click.testing import Result as CliResult` is canonical ### File Structure | Module | Lines | Status | |--------|-------|--------| | `helper_m4_e2e_common.py` | 244 | Under limit | | `helper_m4_e2e_domain.py` | 385 | Under limit | | `helper_m4_e2e_merge.py` | 146 | NEW | | `helper_m4_e2e_cli.py` | 445 | Under limit | | `helper_m4_e2e_cli_errors.py` | 182 | NEW | | `helper_m4_e2e_verification.py` | 105 | Under limit | ### CI Status **All 11 nox sessions passed** (32m total): - lint, format, typecheck (0 errors), security_scan, dead_code - unit_tests (0 failures), integration_tests (14/14 M4 E2E tests pass) - coverage (98%, threshold 97%), docs, build, benchmark Ready for external review.
Member

PR Review Report: #681

PR Details

Reviewer: Aditya
Branch: test/m4-acceptance-gatemaster
Commit: 5b0d934a50

  • Scope Reviewed (branch-only changes):
    • CHANGELOG.md
    • robot/helper_m4_e2e_common.py
    • robot/helper_m4_e2e_domain.py
    • robot/helper_m4_e2e_merge.py
    • robot/helper_m4_e2e_cli.py
    • robot/helper_m4_e2e_cli_errors.py
    • robot/helper_m4_e2e_verification.py
    • robot/m4_e2e_verification.robot
    • robot/helpers_common.py

Findings (Priority Ordered)

P1 - fmt forwarding check can silently pass even when fmt is not forwarded

  • File: robot/helper_m4_e2e_cli.py
  • Area: plan_diff()
  • Why this is an issue:
    • The current check allows fmt_arg to remain None and still pass.
    • This means a regression where production code stops forwarding fmt would not be detected, even though this test claims to validate that behavior.
  • Risk:
    • False confidence in coverage for one of the previously identified review gaps.
    • Behavior drift in CLI output format forwarding can go unnoticed.
  • Recommended fix:
    • Make fmt assertion strict by requiring explicit presence and expected value.
    • Example intent: fail when fmt_arg is None; otherwise require fmt_arg == "rich".

P3 - Error-path tests use broad message matching and may accept wrong failure mode

  • File: robot/helper_m4_e2e_cli_errors.py
  • Area: cli_plan_use_not_found() and cli_plan_diff_no_changeset()
  • Why this is an issue:
    • Checks such as "error" in output_lower are permissive and can pass for unrelated failures.
    • This weakens the guarantee that the specific expected user-facing error path is exercised.
  • Risk:
    • Tests can remain green while command behavior regresses to a different error path.
  • Recommended fix:
    • Assert for command-specific message fragments (e.g., "Action not available" for use, "Diff Error" or "has no ChangeSet" for diff).
    • Keep a fallback alternative only if necessary for formatting differences.

Compliance Notes

  • CONTRIBUTING.md 500-line file rule: compliant for all newly introduced/split helper modules.
  • CONTRIBUTING.md changelog update requirement: compliant (CHANGELOG.md updated in this branch).
  • docs/specification.md alignment for M4 command coverage (plan use/execute/tree/diff and error paths): generally aligned, with the above robustness caveats in test assertions.

Final Assessment

  • The branch significantly improves structure, readability, and test coverage (including CLI error paths).
  • One meaningful assertion gap remains (fmt forwarding check), plus one low-severity robustness gap in error-message assertions.
# PR Review Report: #681 ## PR Details **Reviewer:** Aditya **Branch:** `test/m4-acceptance-gate` → `master` **Commit:** `5b0d934a50` - **Scope Reviewed (branch-only changes)**: - `CHANGELOG.md` - `robot/helper_m4_e2e_common.py` - `robot/helper_m4_e2e_domain.py` - `robot/helper_m4_e2e_merge.py` - `robot/helper_m4_e2e_cli.py` - `robot/helper_m4_e2e_cli_errors.py` - `robot/helper_m4_e2e_verification.py` - `robot/m4_e2e_verification.robot` - `robot/helpers_common.py` ## Findings (Priority Ordered) ### P1 - fmt forwarding check can silently pass even when `fmt` is not forwarded - **File**: `robot/helper_m4_e2e_cli.py` - **Area**: `plan_diff()` - **Why this is an issue**: - The current check allows `fmt_arg` to remain `None` and still pass. - This means a regression where production code stops forwarding `fmt` would not be detected, even though this test claims to validate that behavior. - **Risk**: - False confidence in coverage for one of the previously identified review gaps. - Behavior drift in CLI output format forwarding can go unnoticed. - **Recommended fix**: - Make `fmt` assertion strict by requiring explicit presence and expected value. - Example intent: fail when `fmt_arg is None`; otherwise require `fmt_arg == "rich"`. ### P3 - Error-path tests use broad message matching and may accept wrong failure mode - **File**: `robot/helper_m4_e2e_cli_errors.py` - **Area**: `cli_plan_use_not_found()` and `cli_plan_diff_no_changeset()` - **Why this is an issue**: - Checks such as `"error" in output_lower` are permissive and can pass for unrelated failures. - This weakens the guarantee that the specific expected user-facing error path is exercised. - **Risk**: - Tests can remain green while command behavior regresses to a different error path. - **Recommended fix**: - Assert for command-specific message fragments (e.g., `"Action not available"` for use, `"Diff Error"` or `"has no ChangeSet"` for diff). - Keep a fallback alternative only if necessary for formatting differences. ## Compliance Notes - `CONTRIBUTING.md` 500-line file rule: **compliant** for all newly introduced/split helper modules. - `CONTRIBUTING.md` changelog update requirement: **compliant** (`CHANGELOG.md` updated in this branch). - `docs/specification.md` alignment for M4 command coverage (`plan use/execute/tree/diff` and error paths): **generally aligned**, with the above robustness caveats in test assertions. ## Final Assessment - The branch significantly improves structure, readability, and test coverage (including CLI error paths). - One meaningful assertion gap remains (`fmt` forwarding check), plus one low-severity robustness gap in error-message assertions.
freemo approved these changes 2026-03-12 20:44:09 +00:00
Dismissed
freemo left a comment

PM Review — Day 32

Verdict: APPROVED

Summary

Excellent M4 acceptance criteria validation PR from @hurui200320. This is exactly the kind of thorough E2E verification needed to close milestone v3.3.0 (M4).

Process Compliance

  • PR body: Comprehensive — summary, changes table, file split rationale, quality gates, acceptance criteria checklist. Exceeds template requirements.
  • Closes keyword: Closes #495 — correct.
  • Labels: Added missing labels (Priority/High, MoSCoW/Must have, Points/8, State/In Progress).
  • Milestone: v3.3.0 — correct, this is the M4 acceptance gate.
  • Assignee: @hurui200320 — correct.

Technical Assessment (from PR body)

  • 14 E2E tests (10 happy-path + 4 error-path) covering all 8 M4 acceptance criteria.
  • File split from 1074-line monolith to 6 focused modules (all under 500-line limit per CONTRIBUTING.md).
  • All 11 nox quality gates pass including 98% coverage.
  • Addressed 3 blocking + 7 non-blocking + 5 nit findings from self-review.

Recommendation

@freemo: Please merge at your earliest convenience. This PR gates M4 milestone closure and is already 10 days overdue (M4 due Mar 2). Mergeable, no conflicts.

Labels Added

  • Priority/High, MoSCoW/Must have, Points/8, State/In Progress
## PM Review — Day 32 ### Verdict: APPROVED ### Summary Excellent M4 acceptance criteria validation PR from @hurui200320. This is exactly the kind of thorough E2E verification needed to close milestone v3.3.0 (M4). ### Process Compliance - **PR body**: Comprehensive — summary, changes table, file split rationale, quality gates, acceptance criteria checklist. Exceeds template requirements. - **Closes keyword**: `Closes #495` — correct. - **Labels**: Added missing labels (Priority/High, MoSCoW/Must have, Points/8, State/In Progress). - **Milestone**: v3.3.0 — correct, this is the M4 acceptance gate. - **Assignee**: @hurui200320 — correct. ### Technical Assessment (from PR body) - 14 E2E tests (10 happy-path + 4 error-path) covering all 8 M4 acceptance criteria. - File split from 1074-line monolith to 6 focused modules (all under 500-line limit per CONTRIBUTING.md). - All 11 nox quality gates pass including 98% coverage. - Addressed 3 blocking + 7 non-blocking + 5 nit findings from self-review. ### Recommendation **@freemo**: Please merge at your earliest convenience. This PR gates M4 milestone closure and is already 10 days overdue (M4 due Mar 2). Mergeable, no conflicts. ### Labels Added - Priority/High, MoSCoW/Must have, Points/8, State/In Progress
hurui200320 force-pushed test/m4-acceptance-gate from 5b0d934a50
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 39s
CI / typecheck (pull_request) Successful in 42s
CI / integration_tests (pull_request) Successful in 3m20s
CI / unit_tests (pull_request) Successful in 3m31s
CI / docker (pull_request) Successful in 41s
CI / coverage (pull_request) Successful in 5m37s
CI / benchmark-regression (pull_request) Successful in 37m13s
to 766fa74a94
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 18s
CI / e2e_tests (pull_request) Successful in 26s
CI / security (pull_request) Successful in 37s
CI / typecheck (pull_request) Successful in 39s
CI / unit_tests (pull_request) Successful in 3m2s
CI / integration_tests (pull_request) Successful in 3m36s
CI / docker (pull_request) Successful in 38s
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 05:59:30 +00:00
Compare
hurui200320 dismissed freemo's review 2026-03-13 05:59:30 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Author
Member

Review Fixes Applied — Aditya's Findings Addressed

@aditya Both findings from your review have been addressed in the force-pushed commit 766fa74a. Branch has been rebased onto latest master.

P1: fmt forwarding check made strict

File: robot/helper_m4_e2e_cli.py, plan_diff()

The fmt assertion was previously permissive — fmt_arg could remain None and the test would still pass. Fixed by:

  • Failing explicitly when fmt_arg is None (regression where fmt is not forwarded is now detected)
  • Requiring fmt_arg == "rich" (the CLI default)

P3: Error-path tests use command-specific message fragments

File: robot/helper_m4_e2e_cli_errors.py

  • cli_plan_use_not_found(): Changed from "not available" or "error" to asserting "action not available" — matches the production CLI output "[red]Action not available:[/red] {e}"
  • cli_plan_diff_no_changeset(): Changed from "error" or "changeset" to asserting "diff error" or "has no changeset" — matches the production CLI output "[red]Diff Error:[/red] {e.message}"

Additional fix: DI container mock alignment

Files: robot/helper_m4_e2e_cli.py (cli_plan_tree()), robot/helper_m4_e2e_cli_errors.py (cli_plan_tree_empty())

The production tree command now calls container.decision_service() instead of container.resolve(). Updated both tree tests to mock decision_service() to match the current production API.

Quality Gates

Stage Result
lint PASS
typecheck PASS (0 errors)
unit_tests PASS (10,700 scenarios, 0 failures)
integration_tests PASS (1,502 tests, 0 failures)
coverage_report PASS (98%, threshold 97%)
## Review Fixes Applied — Aditya's Findings Addressed @aditya Both findings from your review have been addressed in the force-pushed commit `766fa74a`. Branch has been rebased onto latest master. ### P1: `fmt` forwarding check made strict **File:** `robot/helper_m4_e2e_cli.py`, `plan_diff()` The `fmt` assertion was previously permissive — `fmt_arg` could remain `None` and the test would still pass. Fixed by: - Failing explicitly when `fmt_arg is None` (regression where `fmt` is not forwarded is now detected) - Requiring `fmt_arg == "rich"` (the CLI default) ### P3: Error-path tests use command-specific message fragments **File:** `robot/helper_m4_e2e_cli_errors.py` - `cli_plan_use_not_found()`: Changed from `"not available" or "error"` to asserting `"action not available"` — matches the production CLI output `"[red]Action not available:[/red] {e}"` - `cli_plan_diff_no_changeset()`: Changed from `"error" or "changeset"` to asserting `"diff error" or "has no changeset"` — matches the production CLI output `"[red]Diff Error:[/red] {e.message}"` ### Additional fix: DI container mock alignment **Files:** `robot/helper_m4_e2e_cli.py` (`cli_plan_tree()`), `robot/helper_m4_e2e_cli_errors.py` (`cli_plan_tree_empty()`) The production `tree` command now calls `container.decision_service()` instead of `container.resolve()`. Updated both tree tests to mock `decision_service()` to match the current production API. ### Quality Gates | Stage | Result | |-------|--------| | lint | PASS | | typecheck | PASS (0 errors) | | unit_tests | PASS (10,700 scenarios, 0 failures) | | integration_tests | PASS (1,502 tests, 0 failures) | | coverage_report | PASS (98%, threshold 97%) |
hurui200320 force-pushed test/m4-acceptance-gate from 766fa74a94
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 18s
CI / e2e_tests (pull_request) Successful in 26s
CI / security (pull_request) Successful in 37s
CI / typecheck (pull_request) Successful in 39s
CI / unit_tests (pull_request) Successful in 3m2s
CI / integration_tests (pull_request) Successful in 3m36s
CI / docker (pull_request) Successful in 38s
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
to 317d015260
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 18s
CI / e2e_tests (pull_request) Successful in 27s
CI / security (pull_request) Successful in 38s
CI / typecheck (pull_request) Successful in 39s
CI / unit_tests (pull_request) Successful in 3m0s
CI / docker (pull_request) Successful in 10s
CI / integration_tests (pull_request) Successful in 3m29s
CI / coverage (pull_request) Successful in 5m54s
CI / lint (push) Successful in 13s
CI / build (push) Successful in 14s
CI / quality (push) Successful in 17s
CI / security (push) Successful in 38s
CI / e2e_tests (push) Successful in 36s
CI / typecheck (push) Successful in 39s
CI / benchmark-regression (push) Has been skipped
CI / unit_tests (push) Successful in 2m59s
CI / integration_tests (push) Successful in 3m24s
CI / docker (push) Successful in 38s
CI / coverage (push) Successful in 5m54s
CI / benchmark-publish (push) Successful in 19m34s
CI / benchmark-regression (pull_request) Successful in 36m21s
2026-03-13 06:03:46 +00:00
Compare
hurui200320 scheduled this pull request to auto merge when all checks succeed 2026-03-13 06:04:10 +00:00
hurui200320 deleted branch test/m4-acceptance-gate 2026-03-13 06:10:24 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!681
No description provided.