test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure #681

2026-03-11T07:02:08Z

hurui200320 commented

2026-03-11 07:02:08 +00:00

Summary

Re-verifies all M4 acceptance criteria against the current master branch and closes CLI test coverage gaps identified in code review. Addresses all 3 blocking, 7 non-blocking, and 5 nit issues from the second self-review.

Changes

File Split (CONTRIBUTING.md 500-line compliance)

Split the 1074-line helper_m4_e2e_verification.py into six focused modules:

Module	Lines	Responsibility
`helper_m4_e2e_common.py`	244	Constants, `_fail()`, mock assertion wrappers, `_make_subplan_status` factory, plan mock factories, frozen timestamp
`helper_m4_e2e_domain.py`	385	Domain model tests: spawn-subplans, plan-tree, parallel-max, parent-tracking
`helper_m4_e2e_merge.py`	146	Merge tests: merge-clean, merge-conflict (with `shutil.which("git")` pre-check)
`helper_m4_e2e_cli.py`	445	CLI happy-path tests: plan-diff, cli-plan-use, cli-plan-execute, cli-plan-tree
`helper_m4_e2e_cli_errors.py`	182	CLI error-path tests: execute-readonly, use-not-found, diff-no-changeset, tree-empty
`helper_m4_e2e_verification.py`	105	Thin dispatcher with lazy `importlib` imports (entry point unchanged for Robot Framework)

Major Fixes (M1–M3)

#	Issue	Fix
M1	Tautological `plan_tree()` domain test built a tree dict from its own injected data	Removed tautological tree construction; redocumented test as SubplanStatus field completeness + parent linkage validation
M2	No CLI error-path tests — all four CLI functions only exercised happy paths	Added 4 error-path tests: `execute` on read-only plan, `use` with `ActionNotAvailableError`, `diff` with `PlanError` (no changeset), `tree` with zero decisions
M3	Duplicate CHANGELOG entries for #495 (lines 5-17 and 205-211)	Consolidated into a single entry under `### Added`

Minor Fixes (m1–m7)

#	Issue	Fix
m1	Environment-dependent merge test failure if `git` not on `$PATH`	Added `_assert_git_available()` pre-check using `shutil.which("git")`
m2	CHANGELOG entry placement without subsection	Consolidated into the existing `### Added` entry
m3	`helper_m4_e2e_domain.py` at 494 lines — tight headroom	Moved merge functions to `helper_m4_e2e_merge.py`; domain dropped to 385 lines
m4	Eager imports in dispatcher negate split benefit	Replaced with `importlib.import_module()` lazy dispatch
m5	`plan_diff()` does not verify `fmt` kwarg	Added assertion checking `call_args.kwargs.get("fmt")`
m6	`cli_plan_execute()` does not verify `get_plan()` guard call	Added `_assert_mock_called_once_with(mock_service.get_plan, ...)`
m7	`sys.path.insert(0, ...)` could shadow stdlib modules	Changed to `sys.path.append(_SRC)` in all modules

Nit Fixes (n1–n5)

#	Issue	Fix
n1	`parallel_max()` value readback assertions are noise	Removed tautological readback assertions
n2	`cli_plan_tree()` patches at module level inconsistently	Added comment explaining the tree command resolves DecisionService via container, not lifecycle service
n3	Duplicated `sys.path` block across modules	Kept for defensive standalone invocation (documented as idempotent)
n4	Import ordering after `sys.path` manipulation	Fixed: third-party (`helper_m4_e2e_common`, `typer`) before first-party (`cleveragents`)
n5	`CliResult` type annotation sourced from `click.testing` vs `typer.testing`	Added clarifying comment: `click.testing.Result` is canonical since Typer re-exports `CliRunner` but not `Result`

CHANGELOG

Consolidated into a single ### Added entry documenting the complete set of review-fix improvements.

M4 Acceptance Criteria Verification

All 14 M4 E2E verification tests pass (10 happy-path + 4 error-path):

#	Criterion	Test(s)	Status
1	`plan use` creates plan	`cli-plan-use` + `cli-plan-use-not-found` (error)	PASS
2	`plan execute` spawns subplans	`cli-plan-execute` + `cli-plan-execute-readonly` (error)	PASS
3	`plan tree` displays tree	`cli-plan-tree` + `cli-plan-tree-empty` (error)	PASS
4	`plan diff` shows merged results	`plan-diff` + `cli-plan-diff-no-changeset` (error)	PASS
5	Parallel execution with max_parallel	`parallel-max`	PASS
6	Three-way merge non-conflicting	`merge-clean`	PASS
7	Merge conflicts surfaced	`merge-conflict`	PASS
8	Parent tracks subplan statuses	`parent-tracking`	PASS

Quality Gates

Stage	Result
lint	PASS
format	PASS
typecheck	PASS (0 errors)
unit_tests	PASS (0 failures)
integration_tests	PASS (14 M4 E2E tests, 0 failures)
coverage_report	PASS (98%, threshold 97%)
security_scan	PASS
dead_code	PASS
docs	PASS
build	PASS
benchmark	PASS

Closes #495

## Summary Re-verifies all M4 acceptance criteria against the current master branch and closes CLI test coverage gaps identified in code review. Addresses all 3 blocking, 7 non-blocking, and 5 nit issues from the second self-review. ## Changes ### File Split (CONTRIBUTING.md 500-line compliance) Split the 1074-line `helper_m4_e2e_verification.py` into six focused modules: | Module | Lines | Responsibility | |--------|-------|----------------| | `helper_m4_e2e_common.py` | 244 | Constants, `_fail()`, mock assertion wrappers, `_make_subplan_status` factory, plan mock factories, frozen timestamp | | `helper_m4_e2e_domain.py` | 385 | Domain model tests: spawn-subplans, plan-tree, parallel-max, parent-tracking | | `helper_m4_e2e_merge.py` | 146 | Merge tests: merge-clean, merge-conflict (with `shutil.which("git")` pre-check) | | `helper_m4_e2e_cli.py` | 445 | CLI happy-path tests: plan-diff, cli-plan-use, cli-plan-execute, cli-plan-tree | | `helper_m4_e2e_cli_errors.py` | 182 | CLI error-path tests: execute-readonly, use-not-found, diff-no-changeset, tree-empty | | `helper_m4_e2e_verification.py` | 105 | Thin dispatcher with lazy `importlib` imports (entry point unchanged for Robot Framework) | ### Major Fixes (M1–M3) | # | Issue | Fix | |---|-------|-----| | M1 | Tautological `plan_tree()` domain test built a tree dict from its own injected data | Removed tautological tree construction; redocumented test as SubplanStatus field completeness + parent linkage validation | | M2 | No CLI error-path tests — all four CLI functions only exercised happy paths | Added 4 error-path tests: `execute` on read-only plan, `use` with `ActionNotAvailableError`, `diff` with `PlanError` (no changeset), `tree` with zero decisions | | M3 | Duplicate CHANGELOG entries for #495 (lines 5-17 and 205-211) | Consolidated into a single entry under `### Added` | ### Minor Fixes (m1–m7) | # | Issue | Fix | |---|-------|-----| | m1 | Environment-dependent merge test failure if `git` not on `$PATH` | Added `_assert_git_available()` pre-check using `shutil.which("git")` | | m2 | CHANGELOG entry placement without subsection | Consolidated into the existing `### Added` entry | | m3 | `helper_m4_e2e_domain.py` at 494 lines — tight headroom | Moved merge functions to `helper_m4_e2e_merge.py`; domain dropped to 385 lines | | m4 | Eager imports in dispatcher negate split benefit | Replaced with `importlib.import_module()` lazy dispatch | | m5 | `plan_diff()` does not verify `fmt` kwarg | Added assertion checking `call_args.kwargs.get("fmt")` | | m6 | `cli_plan_execute()` does not verify `get_plan()` guard call | Added `_assert_mock_called_once_with(mock_service.get_plan, ...)` | | m7 | `sys.path.insert(0, ...)` could shadow stdlib modules | Changed to `sys.path.append(_SRC)` in all modules | ### Nit Fixes (n1–n5) | # | Issue | Fix | |---|-------|-----| | n1 | `parallel_max()` value readback assertions are noise | Removed tautological readback assertions | | n2 | `cli_plan_tree()` patches at module level inconsistently | Added comment explaining the tree command resolves DecisionService via container, not lifecycle service | | n3 | Duplicated `sys.path` block across modules | Kept for defensive standalone invocation (documented as idempotent) | | n4 | Import ordering after `sys.path` manipulation | Fixed: third-party (`helper_m4_e2e_common`, `typer`) before first-party (`cleveragents`) | | n5 | `CliResult` type annotation sourced from `click.testing` vs `typer.testing` | Added clarifying comment: `click.testing.Result` is canonical since Typer re-exports `CliRunner` but not `Result` | ### CHANGELOG Consolidated into a single `### Added` entry documenting the complete set of review-fix improvements. ## M4 Acceptance Criteria Verification All 14 M4 E2E verification tests pass (10 happy-path + 4 error-path): | # | Criterion | Test(s) | Status | |---|-----------|---------|--------| | 1 | `plan use` creates plan | `cli-plan-use` + `cli-plan-use-not-found` (error) | PASS | | 2 | `plan execute` spawns subplans | `cli-plan-execute` + `cli-plan-execute-readonly` (error) | PASS | | 3 | `plan tree` displays tree | `cli-plan-tree` + `cli-plan-tree-empty` (error) | PASS | | 4 | `plan diff` shows merged results | `plan-diff` + `cli-plan-diff-no-changeset` (error) | PASS | | 5 | Parallel execution with max_parallel | `parallel-max` | PASS | | 6 | Three-way merge non-conflicting | `merge-clean` | PASS | | 7 | Merge conflicts surfaced | `merge-conflict` | PASS | | 8 | Parent tracks subplan statuses | `parent-tracking` | PASS | ## Quality Gates | Stage | Result | |-------|--------| | lint | PASS | | format | PASS | | typecheck | PASS (0 errors) | | unit_tests | PASS (0 failures) | | integration_tests | PASS (14 M4 E2E tests, 0 failures) | | coverage_report | PASS (98%, threshold 97%) | | security_scan | PASS | | dead_code | PASS | | docs | PASS | | build | PASS | | benchmark | PASS | Closes #495

hurui200320 added this to the v3.3.0 milestone 2026-03-11 07:02:14 +00:00

hurui200320 added the

Type

Testing

label 2026-03-11 07:02:15 +00:00

hurui200320 referenced this pull request

2026-03-11 07:03:25 +00:00

test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure #495

hurui200320 commented

2026-03-11 08:01:42 +00:00

Code Review: PR #681 — M4 E2E Acceptance Test Strengthening

Self-review notes by @hurui200320 (posted as comment since Forgejo does not allow self-reviews)

Overall Assessment

The PR closes 3 of the 4 review-identified gaps (C1, C3, C4) with well-designed assertions. C2 is partially addressed. Quality gates pass. Specification compliance and security posture are sound. However, several issues should be addressed before requesting peer review.

Verdict: NEEDS CHANGES (3 blocking, 7 non-blocking)

Blocking Issues

1. Test Tautology — `cli_plan_execute()` post-plan assertions verify mock fixture, not system behavior

File: robot/helper_m4_e2e_verification.py:849-858 | Severity: Major

The assertions at lines 849–858 check post_plan.phase, post_plan.subplan_statuses, and post_plan.has_subplans outside the with patch(...) block (which ends at line 848). These verify the Plan object the test itself constructed at lines 815–819 — they can never fail. The C2 gap ("cli-plan-execute didn't verify subplan info") is not genuinely closed at the CLI output level.

The implementation notes acknowledge the CLI does not render subplan_count in its output. If this is a known CLI limitation, the tautological assertions should either be:

Removed (they add no coverage and are misleading), or
Replaced with a comment documenting that subplan info cannot be verified from CLI output, tracked as a follow-up.

2. Unguarded positional arg access in `plan_diff()` — potential `IndexError`

File: robot/helper_m4_e2e_verification.py:342 | Severity: Major

if call_args[0][0] != _ROOT_ULID:

If the production CLI calls diff(plan_id=_ROOT_ULID) as a keyword argument, call_args[0] is () and call_args[0][0] raises an unhandled IndexError. Use the kwargs-with-positional-fallback pattern:

plan_id_arg = call_args.kwargs.get("plan_id")
if plan_id_arg is None and call_args.args:
    plan_id_arg = call_args.args[0]

3. File exceeds 500-line limit (1074 lines, 2× over)

File: robot/helper_m4_e2e_verification.py | Severity: Major (CONTRIBUTING.md violation)

CONTRIBUTING.md: "Keep files under 500 lines." While pre-existing, this PR widens the gap by 84 net lines. Natural split point: domain model tests vs. CLI integration tests → helper_m4_e2e_domain.py + helper_m4_e2e_cli.py + shared helper_m4_e2e_common.py.

Non-Blocking Recommendations

4. Unhandled `AttributeError` risk — `robot/helper_m4_e2e_verification.py:784`

If project_links contains non-ProjectLink objects, pl.project_name raises an unhandled AttributeError. Add type guarding:

if not all(isinstance(pl, ProjectLink) for pl in project_links):
    _fail(f"project_links should be list of ProjectLink, got {[type(p).__name__ for p in project_links]}")

5. `assert_called_once*()` bypasses `_fail()` pattern — lines 340, 767, 772, 835, 965

These raise raw AssertionError on failure instead of the standardized FAIL: messages. Consider wrapping in try/except.

6. Asymmetric positional/keyword fallback — lines 774–786

action_name has kwargs + positional fallback; project_links only checks kwargs. A refactor passing project_links positionally would cause a misleading test failure.

7. Overly broad "execute" string match — line 846

"execute" not in output_lower matches the word anywhere in the output. A more targeted check would be more robust.

8. DRY violations

SubplanStatus construction repeated 3× (lines 152–184, 402–424, 597–637). CLI exit-code checks repeated 4× (lines 336, 759, 827, 957). Extract parameterized factories.

9. Non-deterministic `datetime.now()` in fixtures (lines 88, 127, 154, 402, 594)

Multiple independent calls produce subtly inconsistent timestamps. Use a frozen constant.

10. Generic `_DECISION_ULID_N` naming (lines 868–872)

Rename to _DECISION_PROMPT_DEF, _DECISION_STRATEGY, etc. for self-documenting assertions.

M4 Acceptance Criteria Coverage

#	Criterion	Status
1	`plan use` creates plan	COVERED
2	`plan execute` spawns subplans	PARTIAL (see Issue #1)
3	`plan tree` displays hierarchy	COVERED
4	`plan diff` shows merged results	COVERED
5	Parallel max_parallel	COVERED (model-level)
6	Three-way merge	COVERED
7	Merge conflicts surfaced	COVERED
8	Parent tracks statuses	COVERED

C1–C4 Gap Closure

Gap	Status
C1: project argument	CLOSED
C2: subplan info	PARTIAL — CLI phase check real; domain checks tautological
C3: shallow matching	CLOSED
C4: diff content	CLOSED

Security & Compliance

No security vulnerabilities. All decision types and plan lifecycle phases align with docs/specification.md. PR description includes Closes #495. Milestone and label correctly assigned.

## Code Review: PR #681 — M4 E2E Acceptance Test Strengthening **Self-review notes by @hurui200320** *(posted as comment since Forgejo does not allow self-reviews)* ### Overall Assessment The PR closes 3 of the 4 review-identified gaps (C1, C3, C4) with well-designed assertions. C2 is partially addressed. Quality gates pass. Specification compliance and security posture are sound. However, several issues should be addressed before requesting peer review. **Verdict: NEEDS CHANGES** (3 blocking, 7 non-blocking) --- ### Blocking Issues #### 1. Test Tautology — `cli_plan_execute()` post-plan assertions verify mock fixture, not system behavior **File:** `robot/helper_m4_e2e_verification.py:849-858` | **Severity:** Major The assertions at lines 849–858 check `post_plan.phase`, `post_plan.subplan_statuses`, and `post_plan.has_subplans` **outside** the `with patch(...)` block (which ends at line 848). These verify the `Plan` object the test *itself* constructed at lines 815–819 — they can never fail. The C2 gap ("cli-plan-execute didn't verify subplan info") is not genuinely closed at the CLI output level. The implementation notes acknowledge the CLI does not render `subplan_count` in its output. If this is a known CLI limitation, the tautological assertions should either be: 1. **Removed** (they add no coverage and are misleading), or 2. **Replaced with a comment** documenting that subplan info cannot be verified from CLI output, tracked as a follow-up. #### 2. Unguarded positional arg access in `plan_diff()` — potential `IndexError` **File:** `robot/helper_m4_e2e_verification.py:342` | **Severity:** Major ```python if call_args[0][0] != _ROOT_ULID: ``` If the production CLI calls `diff(plan_id=_ROOT_ULID)` as a keyword argument, `call_args[0]` is `()` and `call_args[0][0]` raises an unhandled `IndexError`. Use the kwargs-with-positional-fallback pattern: ```python plan_id_arg = call_args.kwargs.get("plan_id") if plan_id_arg is None and call_args.args: plan_id_arg = call_args.args[0] ``` #### 3. File exceeds 500-line limit (1074 lines, 2× over) **File:** `robot/helper_m4_e2e_verification.py` | **Severity:** Major (CONTRIBUTING.md violation) CONTRIBUTING.md: *"Keep files under 500 lines."* While pre-existing, this PR widens the gap by 84 net lines. Natural split point: domain model tests vs. CLI integration tests → `helper_m4_e2e_domain.py` + `helper_m4_e2e_cli.py` + shared `helper_m4_e2e_common.py`. --- ### Non-Blocking Recommendations #### 4. Unhandled `AttributeError` risk — `robot/helper_m4_e2e_verification.py:784` If `project_links` contains non-`ProjectLink` objects, `pl.project_name` raises an unhandled `AttributeError`. Add type guarding: ```python if not all(isinstance(pl, ProjectLink) for pl in project_links): _fail(f"project_links should be list of ProjectLink, got {[type(p).__name__ for p in project_links]}") ``` #### 5. `assert_called_once*()` bypasses `_fail()` pattern — lines 340, 767, 772, 835, 965 These raise raw `AssertionError` on failure instead of the standardized `FAIL:` messages. Consider wrapping in try/except. #### 6. Asymmetric positional/keyword fallback — lines 774–786 `action_name` has kwargs + positional fallback; `project_links` only checks kwargs. A refactor passing `project_links` positionally would cause a misleading test failure. #### 7. Overly broad "execute" string match — line 846 `"execute" not in output_lower` matches the word anywhere in the output. A more targeted check would be more robust. #### 8. DRY violations SubplanStatus construction repeated 3× (lines 152–184, 402–424, 597–637). CLI exit-code checks repeated 4× (lines 336, 759, 827, 957). Extract parameterized factories. #### 9. Non-deterministic `datetime.now()` in fixtures (lines 88, 127, 154, 402, 594) Multiple independent calls produce subtly inconsistent timestamps. Use a frozen constant. #### 10. Generic `_DECISION_ULID_N` naming (lines 868–872) Rename to `_DECISION_PROMPT_DEF`, `_DECISION_STRATEGY`, etc. for self-documenting assertions. --- ### M4 Acceptance Criteria Coverage | # | Criterion | Status | |---|-----------|--------| | 1 | `plan use` creates plan | COVERED | | 2 | `plan execute` spawns subplans | **PARTIAL** (see Issue #1) | | 3 | `plan tree` displays hierarchy | COVERED | | 4 | `plan diff` shows merged results | COVERED | | 5 | Parallel max_parallel | COVERED (model-level) | | 6 | Three-way merge | COVERED | | 7 | Merge conflicts surfaced | COVERED | | 8 | Parent tracks statuses | COVERED | ### C1–C4 Gap Closure | Gap | Status | |-----|--------| | C1: project argument | **CLOSED** | | C2: subplan info | **PARTIAL** — CLI phase check real; domain checks tautological | | C3: shallow matching | **CLOSED** | | C4: diff content | **CLOSED** | ### Security & Compliance No security vulnerabilities. All decision types and plan lifecycle phases align with `docs/specification.md`. PR description includes `Closes #495`. Milestone and label correctly assigned.

freemo added the

labels 2026-03-11 18:15:30 +00:00

freemo commented

2026-03-11 18:18:04 +00:00

PM Review — Day 31 (Specification Update)

This PR is mergeable.

Spec Alignment Check

M4 E2E acceptance tests are NOT directly impacted by protocol or TUI changes. However, note that the broader E2E test mocking issue (#658) may affect how these tests are structured long-term.

Status

Self-review by @hurui200320 identified 3 blocking issues:

Test tautology — assertions verify mock fixture not system behavior
Unguarded positional arg access
File exceeds 500-line limit (1074 lines)

Action Required

@hurui200320 — Address the 3 blocking issues from your self-review, then request external review.

## PM Review — Day 31 (Specification Update) This PR is **mergeable**. ### Spec Alignment Check M4 E2E acceptance tests are NOT directly impacted by protocol or TUI changes. However, note that the broader E2E test mocking issue (#658) may affect how these tests are structured long-term. ### Status Self-review by @hurui200320 identified 3 blocking issues: 1. Test tautology — assertions verify mock fixture not system behavior 2. Unguarded positional arg access 3. File exceeds 500-line limit (1074 lines) ### Action Required @hurui200320 — Address the 3 blocking issues from your self-review, then request external review.

hurui200320 referenced this issue from a commit

2026-03-12 02:35:22 +00:00

test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure

hurui200320 force-pushed test/m4-acceptance-gate from 97c5dfbcd1 to 2e8ea55760

2026-03-12 02:35:22 +00:00

Compare

hurui200320 referenced this pull request

2026-03-12 02:36:34 +00:00

test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure #495

hurui200320 force-pushed test/m4-acceptance-gate from 2e8ea55760 to ecf508797a

2026-03-12 02:37:21 +00:00

Compare

hurui200320 referenced this issue from a commit

2026-03-12 02:37:21 +00:00

test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure

hurui200320 removed the

labels 2026-03-12 02:37:27 +00:00

hurui200320 commented

2026-03-12 05:04:42 +00:00

Review Fixes Complete — All CI Green

@freemo All 3 blocking and 7 non-blocking issues from the self-review have been addressed in commit ecf5087.

Blocking Fixes

Test tautology (cli_plan_execute) — removed tautological domain assertions; replaced with explanatory comment. Subplan coverage is provided by the dedicated spawn-subplans and parent-tracking domain tests.
Unguarded positional arg access (plan_diff) — replaced call_args[0][0] with kwargs.get("plan_id") + positional fallback pattern.
500-line limit violation — split 1074-line file into 4 modules: _common.py (237), _domain.py (495), _cli.py (426), dispatcher (81). All under 500 lines.

Non-Blocking Fixes

isinstance(pl, ProjectLink) type guard added
assert_called_once*() wrapped in _fail()-pattern helpers
project_links extraction made symmetric with action_name
"execute" in output_lower → re.search(r"\bexecute\b", ...) word-boundary regex
_make_subplan_status() factory and _assert_exit_code() helper extracted (DRY)
datetime.now() → _FROZEN_NOW = datetime(2026, 3, 1, 12, 0, 0)
_DECISION_ULID_N → descriptive names (_DECISION_PROMPT_DEF, _DECISION_STRATEGY, etc.)

CI Status

All 11 stages passed (run #6171, 36m43s):

lint, format, typecheck, unit_tests (10,431), integration_tests (1,452), coverage (98%), security_scan, dead_code, docs, build, benchmarks

Ready for external review.

## Review Fixes Complete — All CI Green @freemo All 3 blocking and 7 non-blocking issues from the self-review have been addressed in commit `ecf5087`. ### Blocking Fixes 1. **Test tautology** (`cli_plan_execute`) — removed tautological domain assertions; replaced with explanatory comment. Subplan coverage is provided by the dedicated `spawn-subplans` and `parent-tracking` domain tests. 2. **Unguarded positional arg access** (`plan_diff`) — replaced `call_args[0][0]` with `kwargs.get("plan_id")` + positional fallback pattern. 3. **500-line limit violation** — split 1074-line file into 4 modules: `_common.py` (237), `_domain.py` (495), `_cli.py` (426), dispatcher (81). All under 500 lines. ### Non-Blocking Fixes 4. `isinstance(pl, ProjectLink)` type guard added 5. `assert_called_once*()` wrapped in `_fail()`-pattern helpers 6. `project_links` extraction made symmetric with `action_name` 7. `"execute" in output_lower` → `re.search(r"\bexecute\b", ...)` word-boundary regex 8. `_make_subplan_status()` factory and `_assert_exit_code()` helper extracted (DRY) 9. `datetime.now()` → `_FROZEN_NOW = datetime(2026, 3, 1, 12, 0, 0)` 10. `_DECISION_ULID_N` → descriptive names (`_DECISION_PROMPT_DEF`, `_DECISION_STRATEGY`, etc.) ### CI Status **All 11 stages passed** (run #6171, 36m43s): - lint, format, typecheck, unit_tests (10,431), integration_tests (1,452), coverage (98%), security_scan, dead_code, docs, build, benchmarks Ready for external review.

hurui200320 self-assigned this 2026-03-12 05:04:59 +00:00

hurui200320 commented

2026-03-12 07:38:38 +00:00

Code Review: PR #681 — `test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure`

Reviewer: @hurui200320 (posted as comment — Forgejo does not allow self-reviews)
Branch: test/m4-acceptance-gate → master
Commit: ecf5087

Prior Review Items

All 3 blocking and 7 non-blocking issues from the self-review (comment by @hurui200320) and the PM review (comment by @freemo) have been verified as resolved in the latest commit. These are not repeated here.

Verdict: REQUEST CHANGES (3 Major, 7 Minor, 5 Nit)

Major Issues

M1. Tautological `plan_tree()` domain test

File: robot/helper_m4_e2e_domain.py:120-136

The test manually builds a tree dict from the subplan_statuses it injected into the parent plan, then asserts that dict contains those same IDs. No system-under-test function produces this tree — the test is checking its own construction. Lines 120-123:

tree: dict[str, list[str]] = {
    parent.identity.plan_id: [s.subplan_id for s in parent.subplan_statuses]
}

Then lines 126-136 assert _CHILD_A_ULID in tree[_ROOT_ULID] etc., which can never fail.

The CLI-level cli_plan_tree() test exercises real build_decision_tree() logic, so the M4 criterion has some coverage, but this domain test is misleading about what it verifies.

Recommendation: Either call a real domain function that builds the tree structure from flat data, or rename/redocument this test to clarify it only validates SubplanStatus field completeness (changeset_summary, files_changed, parent linkage) rather than tree construction.

M2. No CLI error-path tests

File: robot/helper_m4_e2e_cli.py (all CLI tests)

All four CLI test functions only exercise the happy path. None test:

plan execute on a read-only plan (should abort)
plan use with a nonexistent action (ActionNotAvailableError)
plan diff with a plan that has no changeset
plan tree with zero decisions

Error paths are part of the user-facing contract. Without error-path tests, a regression could silently swallow errors or crash with unhandled exceptions.

Recommendation: Add at least one error-path test per CLI command.

M3. Duplicate CHANGELOG entries for #495

File: CHANGELOG.md:5-17 and CHANGELOG.md:205-211

Issue #495 has two changelog entries:

Lines 5-17 (under ## Unreleased, no ### subsection): The entry added by this PR.
Lines 205-211 (under ### Added): A pre-existing entry from a previous commit.

Per CONTRIBUTING.md, one changelog entry per logical change is expected.

Recommendation: Consolidate into a single entry. Either remove the new entry (if the existing one is sufficient) or merge the content and remove the duplicate.

Minor Issues

m1. Environment-dependent merge test failure

File: robot/helper_m4_e2e_domain.py:278, 322

merge_clean() and merge_conflict() call GitMergeStrategy().merge(), which shells out to git merge-file. If git is not on $PATH, the fallback to SequentialMergeStrategy produces results that fail the assertions with misleading diagnostics (e.g., "missing ours change" instead of "git not found").

Recommendation: Add a pre-check: if shutil.which("git") is None: _fail("git is required on PATH for merge tests")

m2. CHANGELOG entry placement without subsection

File: CHANGELOG.md:5-17

The new entry sits directly under ## Unreleased without a ### subsection header, while the rest of the changelog uses structured ### Added / ### fix(...) subsections.

Recommendation: Move the entry under the appropriate ### Added subsection.

m3. `helper_m4_e2e_domain.py` at 494 lines — tight headroom

File: robot/helper_m4_e2e_domain.py

At 494 lines, the file has only 6 lines of headroom before the 500-line limit. Any future additions will immediately breach it.

Recommendation: Consider proactively splitting the two merge functions (merge_clean, merge_conflict) into a helper_m4_e2e_merge.py module (~90 lines each), dropping domain to ~310 lines.

m4. Eager imports in dispatcher negate split benefit

File: robot/helper_m4_e2e_verification.py:33-46

The dispatcher unconditionally imports both helper_m4_e2e_cli and helper_m4_e2e_domain at module level. When running a domain-only command (e.g., merge-clean), the CLI module's heavy imports (typer.testing, cleveragents.cli.commands.plan) are loaded unnecessarily.

Recommendation: Use lazy imports inside the dispatch lookup so each command only loads its own module. Estimated savings: 50-200 ms per domain-only invocation.

m5. `plan_diff()` does not verify `fmt` kwarg

File: robot/helper_m4_e2e_cli.py:93-99

The production code calls service.diff(plan_id, fmt=fmt), but the test only verifies plan_id was passed correctly. If a regression dropped the fmt argument, this test would not catch it.

Recommendation: Add an assertion checking call_args.kwargs.get("fmt").

m6. `cli_plan_execute()` does not verify `get_plan()` guard call

File: robot/helper_m4_e2e_cli.py:198-258

The execute command calls service.get_plan(plan_id) for the read-only guard. The test sets up the mock return but never asserts get_plan was actually called.

Recommendation: Add _assert_mock_called_once_with(mock_service.get_plan, "get_plan", _ROOT_ULID).

m7. `sys.path.insert(0, ...)` could shadow stdlib modules

Files: helper_m4_e2e_common.py:17-19, helper_m4_e2e_cli.py:22-24, helper_m4_e2e_domain.py:21-23, helper_m4_e2e_verification.py:29-31

Using sys.path.insert(0, _SRC) places src/ at the front of the search path. If a file with the same name as a stdlib module were ever placed in src/, it would shadow the stdlib.

Recommendation: Use sys.path.append(_SRC) instead, or centralize the path setup.

Nit

n1. `parallel_max()` value readback assertions are noise

robot/helper_m4_e2e_domain.py:187-190 — Asserts config.execution_mode == PARALLEL immediately after constructing SubplanConfig(execution_mode=ExecutionMode.PARALLEL). Can never fail.

n2. `cli_plan_tree()` patches at module level inconsistently

robot/helper_m4_e2e_cli.py:340-343 — Other CLI tests mock private helpers (_get_lifecycle_service), but this one patches get_container at its definition site. Consider extracting a _get_decision_service() helper for consistency.

n3. Duplicated `sys.path` block across 4 files

All 4 modules repeat the identical 3-line _SRC path insertion. Since helper_m4_e2e_common.py is always imported first, the duplication is technically redundant (though defensible for standalone invocation).

n4. Import ordering after `sys.path` manipulation

robot/helper_m4_e2e_cli.py:26-49 — Local helper (helper_m4_e2e_common) is imported before third-party (typer.testing). Conventional ordering is third-party then local.

n5. `CliResult` type annotation sourced from `click.testing` vs `typer.testing`

robot/helper_m4_e2e_common.py:37 — from click.testing import Result as CliResult works because Typer wraps Click, but from typer.testing import Result would be clearer.

Self-Review Issue Verification (All 10 Fixed)

#	Issue	Status
1	Test tautology in `cli_plan_execute()`	Fixed (lines 249-256 explain removal)
2	Unguarded `call_args[0][0]` in `plan_diff()`	Fixed (kwargs fallback at lines 94-99)
3	File exceeds 500-line limit	Fixed (4-way split: 237+426+494+81)
4	`AttributeError` risk on `project_links`	Fixed (`isinstance` guard at line 180)
5	`assert_called_once*()` bypasses `_fail()`	Fixed (wrapper helpers in common)
6	Asymmetric positional/keyword fallback	Fixed (symmetric at lines 164-175)
7	Overly broad "execute" string match	Fixed (word-boundary regex at line 246)
8	DRY violations	Fixed (factories + helpers extracted)
9	Non-deterministic `datetime.now()`	Fixed (`_FROZEN_NOW` constant)
10	Generic ULID naming	Fixed (descriptive names at lines 54-58)

M4 Acceptance Criteria Coverage

#	Criterion	Coverage
1	`plan use` creates plan	Covered (`cli_plan_use`)
2	`plan execute` spawns subplans	Covered (`cli_plan_execute` + `spawn_subplans`)
3	`plan tree` displays hierarchy	Covered (`cli_plan_tree`); domain test is tautological (M1)
4	`plan diff` shows merged results	Covered (`plan_diff`)
5	Parallel max_parallel	Covered (`parallel_max`)
6	Three-way merge non-conflicting	Covered (`merge_clean`)
7	Merge conflicts surfaced	Covered (`merge_conflict`)
8	Parent tracks statuses	Covered (`parent_tracking`)

Security & Compliance

No security vulnerabilities. All decision types and plan lifecycle phases align with docs/specification.md. PR description includes Closes #495. Milestone and label correctly assigned. Type safety is clean (zero # type: ignore). Commit message follows Conventional Changelog.

## Code Review: PR #681 — `test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure` **Reviewer:** @hurui200320 *(posted as comment — Forgejo does not allow self-reviews)* **Branch:** `test/m4-acceptance-gate` → `master` **Commit:** `ecf5087` --- ### Prior Review Items All 3 blocking and 7 non-blocking issues from the self-review (comment by @hurui200320) and the PM review (comment by @freemo) have been verified as **resolved** in the latest commit. These are not repeated here. --- ### Verdict: **REQUEST CHANGES** (3 Major, 7 Minor, 5 Nit) --- ### Major Issues #### M1. Tautological `plan_tree()` domain test **File:** `robot/helper_m4_e2e_domain.py:120-136` The test manually builds a tree dict from the `subplan_statuses` it injected into the parent plan, then asserts that dict contains those same IDs. No system-under-test function produces this tree — the test is checking its own construction. Lines 120-123: ```python tree: dict[str, list[str]] = { parent.identity.plan_id: [s.subplan_id for s in parent.subplan_statuses] } ``` Then lines 126-136 assert `_CHILD_A_ULID in tree[_ROOT_ULID]` etc., which can never fail. The CLI-level `cli_plan_tree()` test exercises real `build_decision_tree()` logic, so the M4 criterion has *some* coverage, but this domain test is misleading about what it verifies. **Recommendation:** Either call a real domain function that builds the tree structure from flat data, or rename/redocument this test to clarify it only validates `SubplanStatus` field completeness (changeset_summary, files_changed, parent linkage) rather than tree construction. #### M2. No CLI error-path tests **File:** `robot/helper_m4_e2e_cli.py` (all CLI tests) All four CLI test functions only exercise the happy path. None test: - `plan execute` on a read-only plan (should abort) - `plan use` with a nonexistent action (`ActionNotAvailableError`) - `plan diff` with a plan that has no changeset - `plan tree` with zero decisions Error paths are part of the user-facing contract. Without error-path tests, a regression could silently swallow errors or crash with unhandled exceptions. **Recommendation:** Add at least one error-path test per CLI command. #### M3. Duplicate CHANGELOG entries for #495 **File:** `CHANGELOG.md:5-17` and `CHANGELOG.md:205-211` Issue #495 has **two** changelog entries: 1. Lines 5-17 (under `## Unreleased`, no `###` subsection): The entry added by this PR. 2. Lines 205-211 (under `### Added`): A pre-existing entry from a previous commit. Per CONTRIBUTING.md, one changelog entry per logical change is expected. **Recommendation:** Consolidate into a single entry. Either remove the new entry (if the existing one is sufficient) or merge the content and remove the duplicate. --- ### Minor Issues #### m1. Environment-dependent merge test failure **File:** `robot/helper_m4_e2e_domain.py:278, 322` `merge_clean()` and `merge_conflict()` call `GitMergeStrategy().merge()`, which shells out to `git merge-file`. If `git` is not on `$PATH`, the fallback to `SequentialMergeStrategy` produces results that fail the assertions with misleading diagnostics (e.g., "missing ours change" instead of "git not found"). **Recommendation:** Add a pre-check: `if shutil.which("git") is None: _fail("git is required on PATH for merge tests")` #### m2. CHANGELOG entry placement without subsection **File:** `CHANGELOG.md:5-17` The new entry sits directly under `## Unreleased` without a `###` subsection header, while the rest of the changelog uses structured `### Added` / `### fix(...)` subsections. **Recommendation:** Move the entry under the appropriate `### Added` subsection. #### m3. `helper_m4_e2e_domain.py` at 494 lines — tight headroom **File:** `robot/helper_m4_e2e_domain.py` At 494 lines, the file has only 6 lines of headroom before the 500-line limit. Any future additions will immediately breach it. **Recommendation:** Consider proactively splitting the two merge functions (`merge_clean`, `merge_conflict`) into a `helper_m4_e2e_merge.py` module (~90 lines each), dropping domain to ~310 lines. #### m4. Eager imports in dispatcher negate split benefit **File:** `robot/helper_m4_e2e_verification.py:33-46` The dispatcher unconditionally imports both `helper_m4_e2e_cli` and `helper_m4_e2e_domain` at module level. When running a domain-only command (e.g., `merge-clean`), the CLI module's heavy imports (`typer.testing`, `cleveragents.cli.commands.plan`) are loaded unnecessarily. **Recommendation:** Use lazy imports inside the dispatch lookup so each command only loads its own module. Estimated savings: 50-200 ms per domain-only invocation. #### m5. `plan_diff()` does not verify `fmt` kwarg **File:** `robot/helper_m4_e2e_cli.py:93-99` The production code calls `service.diff(plan_id, fmt=fmt)`, but the test only verifies `plan_id` was passed correctly. If a regression dropped the `fmt` argument, this test would not catch it. **Recommendation:** Add an assertion checking `call_args.kwargs.get("fmt")`. #### m6. `cli_plan_execute()` does not verify `get_plan()` guard call **File:** `robot/helper_m4_e2e_cli.py:198-258` The `execute` command calls `service.get_plan(plan_id)` for the read-only guard. The test sets up the mock return but never asserts `get_plan` was actually called. **Recommendation:** Add `_assert_mock_called_once_with(mock_service.get_plan, "get_plan", _ROOT_ULID)`. #### m7. `sys.path.insert(0, ...)` could shadow stdlib modules **Files:** `helper_m4_e2e_common.py:17-19`, `helper_m4_e2e_cli.py:22-24`, `helper_m4_e2e_domain.py:21-23`, `helper_m4_e2e_verification.py:29-31` Using `sys.path.insert(0, _SRC)` places `src/` at the front of the search path. If a file with the same name as a stdlib module were ever placed in `src/`, it would shadow the stdlib. **Recommendation:** Use `sys.path.append(_SRC)` instead, or centralize the path setup. --- ### Nit #### n1. `parallel_max()` value readback assertions are noise `robot/helper_m4_e2e_domain.py:187-190` — Asserts `config.execution_mode == PARALLEL` immediately after constructing `SubplanConfig(execution_mode=ExecutionMode.PARALLEL)`. Can never fail. #### n2. `cli_plan_tree()` patches at module level inconsistently `robot/helper_m4_e2e_cli.py:340-343` — Other CLI tests mock private helpers (`_get_lifecycle_service`), but this one patches `get_container` at its definition site. Consider extracting a `_get_decision_service()` helper for consistency. #### n3. Duplicated `sys.path` block across 4 files All 4 modules repeat the identical 3-line `_SRC` path insertion. Since `helper_m4_e2e_common.py` is always imported first, the duplication is technically redundant (though defensible for standalone invocation). #### n4. Import ordering after `sys.path` manipulation `robot/helper_m4_e2e_cli.py:26-49` — Local helper (`helper_m4_e2e_common`) is imported before third-party (`typer.testing`). Conventional ordering is third-party then local. #### n5. `CliResult` type annotation sourced from `click.testing` vs `typer.testing` `robot/helper_m4_e2e_common.py:37` — `from click.testing import Result as CliResult` works because Typer wraps Click, but `from typer.testing import Result` would be clearer. --- ### Self-Review Issue Verification (All 10 Fixed) | # | Issue | Status | |---|-------|--------| | 1 | Test tautology in `cli_plan_execute()` | Fixed (lines 249-256 explain removal) | | 2 | Unguarded `call_args[0][0]` in `plan_diff()` | Fixed (kwargs fallback at lines 94-99) | | 3 | File exceeds 500-line limit | Fixed (4-way split: 237+426+494+81) | | 4 | `AttributeError` risk on `project_links` | Fixed (`isinstance` guard at line 180) | | 5 | `assert_called_once*()` bypasses `_fail()` | Fixed (wrapper helpers in common) | | 6 | Asymmetric positional/keyword fallback | Fixed (symmetric at lines 164-175) | | 7 | Overly broad "execute" string match | Fixed (word-boundary regex at line 246) | | 8 | DRY violations | Fixed (factories + helpers extracted) | | 9 | Non-deterministic `datetime.now()` | Fixed (`_FROZEN_NOW` constant) | | 10 | Generic ULID naming | Fixed (descriptive names at lines 54-58) | --- ### M4 Acceptance Criteria Coverage | # | Criterion | Coverage | |---|-----------|----------| | 1 | `plan use` creates plan | Covered (`cli_plan_use`) | | 2 | `plan execute` spawns subplans | Covered (`cli_plan_execute` + `spawn_subplans`) | | 3 | `plan tree` displays hierarchy | Covered (`cli_plan_tree`); domain test is tautological (M1) | | 4 | `plan diff` shows merged results | Covered (`plan_diff`) | | 5 | Parallel max_parallel | Covered (`parallel_max`) | | 6 | Three-way merge non-conflicting | Covered (`merge_clean`) | | 7 | Merge conflicts surfaced | Covered (`merge_conflict`) | | 8 | Parent tracks statuses | Covered (`parent_tracking`) | ### Security & Compliance No security vulnerabilities. All decision types and plan lifecycle phases align with `docs/specification.md`. PR description includes `Closes #495`. Milestone and label correctly assigned. Type safety is clean (zero `# type: ignore`). Commit message follows Conventional Changelog.

hurui200320 referenced this issue from a commit

2026-03-12 08:56:59 +00:00

test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure

hurui200320 force-pushed test/m4-acceptance-gate from ecf508797a to 5b0d934a50

2026-03-12 08:56:59 +00:00

Compare

hurui200320 commented

2026-03-12 08:57:58 +00:00

Second Review Fixes Complete — All CI Green

All 3 major, 7 minor, and 5 nit issues from the second self-review have been addressed in commit 5b0d934.

Major Fixes

M1: Tautological plan_tree() domain test — Removed tautological tree dict construction (lines 120-136 of old file). The test now validates SubplanStatus field completeness (subplan_id, action_name, changeset_summary, files_changed) and parent→child PlanIdentity linkage. Tree construction logic is exercised by the CLI-level cli_plan_tree() test which invokes real build_decision_tree().
M2: No CLI error-path tests — Added 4 new error-path tests in helper_m4_e2e_cli_errors.py:
- cli_plan_execute_readonly: read-only plan → abort with "read-only" message
- cli_plan_use_not_found: ActionNotAvailableError(ARCHIVED) → abort with error
- cli_plan_diff_no_changeset: PlanError → abort with "Diff Error"
- cli_plan_tree_empty: empty decisions → "No decisions found" info message, exit 0
M3: Duplicate CHANGELOG entries — Removed the new entry at lines 5-17 (under ## Unreleased without subsection). Consolidated into the existing entry under ### Added.

Minor Fixes (m1–m7)

m1: shutil.which("git") pre-check in _assert_git_available() before merge tests
m2: CHANGELOG entry now under proper ### Added subsection
m3: Merge functions → helper_m4_e2e_merge.py (146 lines); domain dropped to 385 lines
m4: importlib.import_module() lazy dispatch — domain commands no longer load CLI dependencies
m5: plan_diff() verifies fmt kwarg forwarded correctly
m6: cli_plan_execute() verifies get_plan() guard call via _assert_mock_called_once_with
m7: sys.path.insert(0, _SRC) → sys.path.append(_SRC) in all 6 modules

Nit Fixes (n1–n5)

n1: Removed tautological parallel_max() readback assertions (config.execution_mode, config.max_parallel)
n2: Added comment in cli_plan_tree() explaining get_container patching vs _get_lifecycle_service
n3: Kept sys.path blocks for standalone invocation safety (documented as idempotent)
n4: Fixed import ordering: helper_m4_e2e_common → typer → cleveragents
n5: Added comment explaining from click.testing import Result as CliResult is canonical

File Structure

Module	Lines	Status
`helper_m4_e2e_common.py`	244	Under limit
`helper_m4_e2e_domain.py`	385	Under limit
`helper_m4_e2e_merge.py`	146	NEW
`helper_m4_e2e_cli.py`	445	Under limit
`helper_m4_e2e_cli_errors.py`	182	NEW
`helper_m4_e2e_verification.py`	105	Under limit

CI Status

All 11 nox sessions passed (32m total):

lint, format, typecheck (0 errors), security_scan, dead_code
unit_tests (0 failures), integration_tests (14/14 M4 E2E tests pass)
coverage (98%, threshold 97%), docs, build, benchmark

Ready for external review.

## Second Review Fixes Complete — All CI Green All 3 major, 7 minor, and 5 nit issues from the second self-review have been addressed in commit `5b0d934`. ### Major Fixes 1. **M1: Tautological `plan_tree()` domain test** — Removed tautological tree dict construction (lines 120-136 of old file). The test now validates SubplanStatus field completeness (subplan_id, action_name, changeset_summary, files_changed) and parent→child `PlanIdentity` linkage. Tree *construction* logic is exercised by the CLI-level `cli_plan_tree()` test which invokes real `build_decision_tree()`. 2. **M2: No CLI error-path tests** — Added 4 new error-path tests in `helper_m4_e2e_cli_errors.py`: - `cli_plan_execute_readonly`: read-only plan → abort with "read-only" message - `cli_plan_use_not_found`: `ActionNotAvailableError(ARCHIVED)` → abort with error - `cli_plan_diff_no_changeset`: `PlanError` → abort with "Diff Error" - `cli_plan_tree_empty`: empty decisions → "No decisions found" info message, exit 0 3. **M3: Duplicate CHANGELOG entries** — Removed the new entry at lines 5-17 (under `## Unreleased` without subsection). Consolidated into the existing entry under `### Added`. ### Minor Fixes (m1–m7) 4. **m1**: `shutil.which("git")` pre-check in `_assert_git_available()` before merge tests 5. **m2**: CHANGELOG entry now under proper `### Added` subsection 6. **m3**: Merge functions → `helper_m4_e2e_merge.py` (146 lines); domain dropped to 385 lines 7. **m4**: `importlib.import_module()` lazy dispatch — domain commands no longer load CLI dependencies 8. **m5**: `plan_diff()` verifies `fmt` kwarg forwarded correctly 9. **m6**: `cli_plan_execute()` verifies `get_plan()` guard call via `_assert_mock_called_once_with` 10. **m7**: `sys.path.insert(0, _SRC)` → `sys.path.append(_SRC)` in all 6 modules ### Nit Fixes (n1–n5) 11. **n1**: Removed tautological `parallel_max()` readback assertions (config.execution_mode, config.max_parallel) 12. **n2**: Added comment in `cli_plan_tree()` explaining `get_container` patching vs `_get_lifecycle_service` 13. **n3**: Kept `sys.path` blocks for standalone invocation safety (documented as idempotent) 14. **n4**: Fixed import ordering: `helper_m4_e2e_common` → `typer` → `cleveragents` 15. **n5**: Added comment explaining `from click.testing import Result as CliResult` is canonical ### File Structure | Module | Lines | Status | |--------|-------|--------| | `helper_m4_e2e_common.py` | 244 | Under limit | | `helper_m4_e2e_domain.py` | 385 | Under limit | | `helper_m4_e2e_merge.py` | 146 | NEW | | `helper_m4_e2e_cli.py` | 445 | Under limit | | `helper_m4_e2e_cli_errors.py` | 182 | NEW | | `helper_m4_e2e_verification.py` | 105 | Under limit | ### CI Status **All 11 nox sessions passed** (32m total): - lint, format, typecheck (0 errors), security_scan, dead_code - unit_tests (0 failures), integration_tests (14/14 M4 E2E tests pass) - coverage (98%, threshold 97%), docs, build, benchmark Ready for external review.

hurui200320 referenced this pull request

2026-03-12 08:58:39 +00:00

test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure #495

aditya commented

2026-03-12 13:19:23 +00:00

PR Review Report: #681

PR Details

Reviewer: Aditya
Branch: test/m4-acceptance-gate → master
Commit: 5b0d934a50

Scope Reviewed (branch-only changes):
- CHANGELOG.md
- robot/helper_m4_e2e_common.py
- robot/helper_m4_e2e_domain.py
- robot/helper_m4_e2e_merge.py
- robot/helper_m4_e2e_cli.py
- robot/helper_m4_e2e_cli_errors.py
- robot/helper_m4_e2e_verification.py
- robot/m4_e2e_verification.robot
- robot/helpers_common.py

Findings (Priority Ordered)

P1 - fmt forwarding check can silently pass even when `fmt` is not forwarded

File: robot/helper_m4_e2e_cli.py
Area: plan_diff()
Why this is an issue:
- The current check allows fmt_arg to remain None and still pass.
- This means a regression where production code stops forwarding fmt would not be detected, even though this test claims to validate that behavior.
Risk:
- False confidence in coverage for one of the previously identified review gaps.
- Behavior drift in CLI output format forwarding can go unnoticed.
Recommended fix:
- Make fmt assertion strict by requiring explicit presence and expected value.
- Example intent: fail when fmt_arg is None; otherwise require fmt_arg == "rich".

P3 - Error-path tests use broad message matching and may accept wrong failure mode

File: robot/helper_m4_e2e_cli_errors.py
Area: cli_plan_use_not_found() and cli_plan_diff_no_changeset()
Why this is an issue:
- Checks such as "error" in output_lower are permissive and can pass for unrelated failures.
- This weakens the guarantee that the specific expected user-facing error path is exercised.
Risk:
- Tests can remain green while command behavior regresses to a different error path.
Recommended fix:
- Assert for command-specific message fragments (e.g., "Action not available" for use, "Diff Error" or "has no ChangeSet" for diff).
- Keep a fallback alternative only if necessary for formatting differences.

Compliance Notes

CONTRIBUTING.md 500-line file rule: compliant for all newly introduced/split helper modules.
CONTRIBUTING.md changelog update requirement: compliant (CHANGELOG.md updated in this branch).
docs/specification.md alignment for M4 command coverage (plan use/execute/tree/diff and error paths): generally aligned, with the above robustness caveats in test assertions.

Final Assessment

The branch significantly improves structure, readability, and test coverage (including CLI error paths).
One meaningful assertion gap remains (fmt forwarding check), plus one low-severity robustness gap in error-message assertions.

# PR Review Report: #681 ## PR Details **Reviewer:** Aditya **Branch:** `test/m4-acceptance-gate` → `master` **Commit:** `5b0d934a50` - **Scope Reviewed (branch-only changes)**: - `CHANGELOG.md` - `robot/helper_m4_e2e_common.py` - `robot/helper_m4_e2e_domain.py` - `robot/helper_m4_e2e_merge.py` - `robot/helper_m4_e2e_cli.py` - `robot/helper_m4_e2e_cli_errors.py` - `robot/helper_m4_e2e_verification.py` - `robot/m4_e2e_verification.robot` - `robot/helpers_common.py` ## Findings (Priority Ordered) ### P1 - fmt forwarding check can silently pass even when `fmt` is not forwarded - **File**: `robot/helper_m4_e2e_cli.py` - **Area**: `plan_diff()` - **Why this is an issue**: - The current check allows `fmt_arg` to remain `None` and still pass. - This means a regression where production code stops forwarding `fmt` would not be detected, even though this test claims to validate that behavior. - **Risk**: - False confidence in coverage for one of the previously identified review gaps. - Behavior drift in CLI output format forwarding can go unnoticed. - **Recommended fix**: - Make `fmt` assertion strict by requiring explicit presence and expected value. - Example intent: fail when `fmt_arg is None`; otherwise require `fmt_arg == "rich"`. ### P3 - Error-path tests use broad message matching and may accept wrong failure mode - **File**: `robot/helper_m4_e2e_cli_errors.py` - **Area**: `cli_plan_use_not_found()` and `cli_plan_diff_no_changeset()` - **Why this is an issue**: - Checks such as `"error" in output_lower` are permissive and can pass for unrelated failures. - This weakens the guarantee that the specific expected user-facing error path is exercised. - **Risk**: - Tests can remain green while command behavior regresses to a different error path. - **Recommended fix**: - Assert for command-specific message fragments (e.g., `"Action not available"` for use, `"Diff Error"` or `"has no ChangeSet"` for diff). - Keep a fallback alternative only if necessary for formatting differences. ## Compliance Notes - `CONTRIBUTING.md` 500-line file rule: **compliant** for all newly introduced/split helper modules. - `CONTRIBUTING.md` changelog update requirement: **compliant** (`CHANGELOG.md` updated in this branch). - `docs/specification.md` alignment for M4 command coverage (`plan use/execute/tree/diff` and error paths): **generally aligned**, with the above robustness caveats in test assertions. ## Final Assessment - The branch significantly improves structure, readability, and test coverage (including CLI error paths). - One meaningful assertion gap remains (`fmt` forwarding check), plus one low-severity robustness gap in error-message assertions.

freemo added the

labels 2026-03-12 20:43:49 +00:00

freemo approved these changes 2026-03-12 20:44:09 +00:00

Dismissed

freemo left a comment

PM Review — Day 32

Verdict: APPROVED

Summary

Excellent M4 acceptance criteria validation PR from @hurui200320. This is exactly the kind of thorough E2E verification needed to close milestone v3.3.0 (M4).

Process Compliance

PR body: Comprehensive — summary, changes table, file split rationale, quality gates, acceptance criteria checklist. Exceeds template requirements.
Closes keyword: Closes #495 — correct.
Labels: Added missing labels (Priority/High, MoSCoW/Must have, Points/8, State/In Progress).
Milestone: v3.3.0 — correct, this is the M4 acceptance gate.
Assignee: @hurui200320 — correct.

Technical Assessment (from PR body)

14 E2E tests (10 happy-path + 4 error-path) covering all 8 M4 acceptance criteria.
File split from 1074-line monolith to 6 focused modules (all under 500-line limit per CONTRIBUTING.md).
All 11 nox quality gates pass including 98% coverage.
Addressed 3 blocking + 7 non-blocking + 5 nit findings from self-review.

Recommendation

@freemo: Please merge at your earliest convenience. This PR gates M4 milestone closure and is already 10 days overdue (M4 due Mar 2). Mergeable, no conflicts.

Labels Added

Priority/High, MoSCoW/Must have, Points/8, State/In Progress

## PM Review — Day 32 ### Verdict: APPROVED ### Summary Excellent M4 acceptance criteria validation PR from @hurui200320. This is exactly the kind of thorough E2E verification needed to close milestone v3.3.0 (M4). ### Process Compliance - **PR body**: Comprehensive — summary, changes table, file split rationale, quality gates, acceptance criteria checklist. Exceeds template requirements. - **Closes keyword**: `Closes #495` — correct. - **Labels**: Added missing labels (Priority/High, MoSCoW/Must have, Points/8, State/In Progress). - **Milestone**: v3.3.0 — correct, this is the M4 acceptance gate. - **Assignee**: @hurui200320 — correct. ### Technical Assessment (from PR body) - 14 E2E tests (10 happy-path + 4 error-path) covering all 8 M4 acceptance criteria. - File split from 1074-line monolith to 6 focused modules (all under 500-line limit per CONTRIBUTING.md). - All 11 nox quality gates pass including 98% coverage. - Addressed 3 blocking + 7 non-blocking + 5 nit findings from self-review. ### Recommendation **@freemo**: Please merge at your earliest convenience. This PR gates M4 milestone closure and is already 10 days overdue (M4 due Mar 2). Mergeable, no conflicts. ### Labels Added - Priority/High, MoSCoW/Must have, Points/8, State/In Progress

freemo referenced this pull request

2026-03-12 20:49:20 +00:00

feat(a2a): implement JSON-RPC 2.0 wire format and method routing #708

hurui200320 force-pushed test/m4-acceptance-gate from 5b0d934a50 to 766fa74a94

2026-03-13 05:59:30 +00:00

Compare

hurui200320 dismissed freemo's review 2026-03-13 05:59:30 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

hurui200320 referenced this issue from a commit

2026-03-13 05:59:30 +00:00

test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure

hurui200320 commented

2026-03-13 05:59:45 +00:00

Review Fixes Applied — Aditya's Findings Addressed

@aditya Both findings from your review have been addressed in the force-pushed commit 766fa74a. Branch has been rebased onto latest master.

P1: `fmt` forwarding check made strict

File: robot/helper_m4_e2e_cli.py, plan_diff()

The fmt assertion was previously permissive — fmt_arg could remain None and the test would still pass. Fixed by:

Failing explicitly when fmt_arg is None (regression where fmt is not forwarded is now detected)
Requiring fmt_arg == "rich" (the CLI default)

P3: Error-path tests use command-specific message fragments

File: robot/helper_m4_e2e_cli_errors.py

cli_plan_use_not_found(): Changed from "not available" or "error" to asserting "action not available" — matches the production CLI output "[red]Action not available:[/red] {e}"
cli_plan_diff_no_changeset(): Changed from "error" or "changeset" to asserting "diff error" or "has no changeset" — matches the production CLI output "[red]Diff Error:[/red] {e.message}"

Additional fix: DI container mock alignment

Files: robot/helper_m4_e2e_cli.py (cli_plan_tree()), robot/helper_m4_e2e_cli_errors.py (cli_plan_tree_empty())

The production tree command now calls container.decision_service() instead of container.resolve(). Updated both tree tests to mock decision_service() to match the current production API.

Quality Gates

Stage	Result
lint	PASS
typecheck	PASS (0 errors)
unit_tests	PASS (10,700 scenarios, 0 failures)
integration_tests	PASS (1,502 tests, 0 failures)
coverage_report	PASS (98%, threshold 97%)

## Review Fixes Applied — Aditya's Findings Addressed @aditya Both findings from your review have been addressed in the force-pushed commit `766fa74a`. Branch has been rebased onto latest master. ### P1: `fmt` forwarding check made strict **File:** `robot/helper_m4_e2e_cli.py`, `plan_diff()` The `fmt` assertion was previously permissive — `fmt_arg` could remain `None` and the test would still pass. Fixed by: - Failing explicitly when `fmt_arg is None` (regression where `fmt` is not forwarded is now detected) - Requiring `fmt_arg == "rich"` (the CLI default) ### P3: Error-path tests use command-specific message fragments **File:** `robot/helper_m4_e2e_cli_errors.py` - `cli_plan_use_not_found()`: Changed from `"not available" or "error"` to asserting `"action not available"` — matches the production CLI output `"[red]Action not available:[/red] {e}"` - `cli_plan_diff_no_changeset()`: Changed from `"error" or "changeset"` to asserting `"diff error" or "has no changeset"` — matches the production CLI output `"[red]Diff Error:[/red] {e.message}"` ### Additional fix: DI container mock alignment **Files:** `robot/helper_m4_e2e_cli.py` (`cli_plan_tree()`), `robot/helper_m4_e2e_cli_errors.py` (`cli_plan_tree_empty()`) The production `tree` command now calls `container.decision_service()` instead of `container.resolve()`. Updated both tree tests to mock `decision_service()` to match the current production API. ### Quality Gates | Stage | Result | |-------|--------| | lint | PASS | | typecheck | PASS (0 errors) | | unit_tests | PASS (10,700 scenarios, 0 failures) | | integration_tests | PASS (1,502 tests, 0 failures) | | coverage_report | PASS (98%, threshold 97%) |

hurui200320 force-pushed test/m4-acceptance-gate from 766fa74a94 to 317d015260

2026-03-13 06:03:46 +00:00

Compare

hurui200320 referenced this issue from a commit

2026-03-13 06:03:46 +00:00

test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure

hurui200320 scheduled this pull request to auto merge when all checks succeed 2026-03-13 06:04:10 +00:00

hurui200320 merged commit 317d015260 into master

2026-03-13 06:10:23 +00:00

hurui200320 deleted branch test/m4-acceptance-gate

2026-03-13 06:10:24 +00:00

Sign in to join this conversation.

3 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: cleveragents/cleveragents-core#681

test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure #681

Summary

Changes

File Split (CONTRIBUTING.md 500-line compliance)

Major Fixes (M1–M3)

Minor Fixes (m1–m7)

Nit Fixes (n1–n5)

CHANGELOG

M4 Acceptance Criteria Verification

Quality Gates

Code Review: PR #681 — M4 E2E Acceptance Test Strengthening

Overall Assessment

Blocking Issues

1. Test Tautology — cli_plan_execute() post-plan assertions verify mock fixture, not system behavior

2. Unguarded positional arg access in plan_diff() — potential IndexError

3. File exceeds 500-line limit (1074 lines, 2× over)

Non-Blocking Recommendations

4. Unhandled AttributeError risk — robot/helper_m4_e2e_verification.py:784

5. assert_called_once*() bypasses _fail() pattern — lines 340, 767, 772, 835, 965

6. Asymmetric positional/keyword fallback — lines 774–786

7. Overly broad "execute" string match — line 846

8. DRY violations

9. Non-deterministic datetime.now() in fixtures (lines 88, 127, 154, 402, 594)

10. Generic _DECISION_ULID_N naming (lines 868–872)

M4 Acceptance Criteria Coverage

C1–C4 Gap Closure

Security & Compliance

PM Review — Day 31 (Specification Update)

Spec Alignment Check

Status

Action Required

Review Fixes Complete — All CI Green

Blocking Fixes

Non-Blocking Fixes

CI Status

Code Review: PR #681 — test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure

Prior Review Items

Verdict: REQUEST CHANGES (3 Major, 7 Minor, 5 Nit)

Major Issues

M1. Tautological plan_tree() domain test

M2. No CLI error-path tests

M3. Duplicate CHANGELOG entries for #495

Minor Issues

m1. Environment-dependent merge test failure

m2. CHANGELOG entry placement without subsection

m3. helper_m4_e2e_domain.py at 494 lines — tight headroom

m4. Eager imports in dispatcher negate split benefit

m5. plan_diff() does not verify fmt kwarg

m6. cli_plan_execute() does not verify get_plan() guard call

m7. sys.path.insert(0, ...) could shadow stdlib modules

Nit

n1. parallel_max() value readback assertions are noise

n2. cli_plan_tree() patches at module level inconsistently

n3. Duplicated sys.path block across 4 files

n4. Import ordering after sys.path manipulation

n5. CliResult type annotation sourced from click.testing vs typer.testing

Self-Review Issue Verification (All 10 Fixed)

M4 Acceptance Criteria Coverage

Security & Compliance

Second Review Fixes Complete — All CI Green

Major Fixes

Minor Fixes (m1–m7)

Nit Fixes (n1–n5)

File Structure

CI Status

PR Review Report: #681

PR Details

Findings (Priority Ordered)

P1 - fmt forwarding check can silently pass even when fmt is not forwarded

P3 - Error-path tests use broad message matching and may accept wrong failure mode

Compliance Notes

Final Assessment

PM Review — Day 32

Verdict: APPROVED

Summary

Process Compliance

Technical Assessment (from PR body)

Recommendation

Labels Added

Review Fixes Applied — Aditya's Findings Addressed

1. Test Tautology — `cli_plan_execute()` post-plan assertions verify mock fixture, not system behavior

2. Unguarded positional arg access in `plan_diff()` — potential `IndexError`

4. Unhandled `AttributeError` risk — `robot/helper_m4_e2e_verification.py:784`

5. `assert_called_once*()` bypasses `_fail()` pattern — lines 340, 767, 772, 835, 965

9. Non-deterministic `datetime.now()` in fixtures (lines 88, 127, 154, 402, 594)

10. Generic `_DECISION_ULID_N` naming (lines 868–872)

Code Review: PR #681 — `test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure`

M1. Tautological `plan_tree()` domain test

m3. `helper_m4_e2e_domain.py` at 494 lines — tight headroom

m5. `plan_diff()` does not verify `fmt` kwarg

m6. `cli_plan_execute()` does not verify `get_plan()` guard call

m7. `sys.path.insert(0, ...)` could shadow stdlib modules

n1. `parallel_max()` value readback assertions are noise

n2. `cli_plan_tree()` patches at module level inconsistently

n3. Duplicated `sys.path` block across 4 files

n4. Import ordering after `sys.path` manipulation

n5. `CliResult` type annotation sourced from `click.testing` vs `typer.testing`

P1 - fmt forwarding check can silently pass even when `fmt` is not forwarded

P1: `fmt` forwarding check made strict