test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closure #681
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!681
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "test/m4-acceptance-gate"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Re-verifies all M4 acceptance criteria against the current master branch and closes CLI test coverage gaps identified in code review. Addresses all 3 blocking, 7 non-blocking, and 5 nit issues from the second self-review.
Changes
File Split (CONTRIBUTING.md 500-line compliance)
Split the 1074-line
helper_m4_e2e_verification.pyinto six focused modules:helper_m4_e2e_common.py_fail(), mock assertion wrappers,_make_subplan_statusfactory, plan mock factories, frozen timestamphelper_m4_e2e_domain.pyhelper_m4_e2e_merge.pyshutil.which("git")pre-check)helper_m4_e2e_cli.pyhelper_m4_e2e_cli_errors.pyhelper_m4_e2e_verification.pyimportlibimports (entry point unchanged for Robot Framework)Major Fixes (M1–M3)
plan_tree()domain test built a tree dict from its own injected dataexecuteon read-only plan,usewithActionNotAvailableError,diffwithPlanError(no changeset),treewith zero decisions### AddedMinor Fixes (m1–m7)
gitnot on$PATH_assert_git_available()pre-check usingshutil.which("git")### Addedentryhelper_m4_e2e_domain.pyat 494 lines — tight headroomhelper_m4_e2e_merge.py; domain dropped to 385 linesimportlib.import_module()lazy dispatchplan_diff()does not verifyfmtkwargcall_args.kwargs.get("fmt")cli_plan_execute()does not verifyget_plan()guard call_assert_mock_called_once_with(mock_service.get_plan, ...)sys.path.insert(0, ...)could shadow stdlib modulessys.path.append(_SRC)in all modulesNit Fixes (n1–n5)
parallel_max()value readback assertions are noisecli_plan_tree()patches at module level inconsistentlysys.pathblock across modulessys.pathmanipulationhelper_m4_e2e_common,typer) before first-party (cleveragents)CliResulttype annotation sourced fromclick.testingvstyper.testingclick.testing.Resultis canonical since Typer re-exportsCliRunnerbut notResultCHANGELOG
Consolidated into a single
### Addedentry documenting the complete set of review-fix improvements.M4 Acceptance Criteria Verification
All 14 M4 E2E verification tests pass (10 happy-path + 4 error-path):
plan usecreates plancli-plan-use+cli-plan-use-not-found(error)plan executespawns subplanscli-plan-execute+cli-plan-execute-readonly(error)plan treedisplays treecli-plan-tree+cli-plan-tree-empty(error)plan diffshows merged resultsplan-diff+cli-plan-diff-no-changeset(error)parallel-maxmerge-cleanmerge-conflictparent-trackingQuality Gates
Closes #495
Code Review: PR #681 — M4 E2E Acceptance Test Strengthening
Self-review notes by @hurui200320 (posted as comment since Forgejo does not allow self-reviews)
Overall Assessment
The PR closes 3 of the 4 review-identified gaps (C1, C3, C4) with well-designed assertions. C2 is partially addressed. Quality gates pass. Specification compliance and security posture are sound. However, several issues should be addressed before requesting peer review.
Verdict: NEEDS CHANGES (3 blocking, 7 non-blocking)
Blocking Issues
1. Test Tautology —
cli_plan_execute()post-plan assertions verify mock fixture, not system behaviorFile:
robot/helper_m4_e2e_verification.py:849-858| Severity: MajorThe assertions at lines 849–858 check
post_plan.phase,post_plan.subplan_statuses, andpost_plan.has_subplansoutside thewith patch(...)block (which ends at line 848). These verify thePlanobject the test itself constructed at lines 815–819 — they can never fail. The C2 gap ("cli-plan-execute didn't verify subplan info") is not genuinely closed at the CLI output level.The implementation notes acknowledge the CLI does not render
subplan_countin its output. If this is a known CLI limitation, the tautological assertions should either be:2. Unguarded positional arg access in
plan_diff()— potentialIndexErrorFile:
robot/helper_m4_e2e_verification.py:342| Severity: MajorIf the production CLI calls
diff(plan_id=_ROOT_ULID)as a keyword argument,call_args[0]is()andcall_args[0][0]raises an unhandledIndexError. Use the kwargs-with-positional-fallback pattern:3. File exceeds 500-line limit (1074 lines, 2× over)
File:
robot/helper_m4_e2e_verification.py| Severity: Major (CONTRIBUTING.md violation)CONTRIBUTING.md: "Keep files under 500 lines." While pre-existing, this PR widens the gap by 84 net lines. Natural split point: domain model tests vs. CLI integration tests →
helper_m4_e2e_domain.py+helper_m4_e2e_cli.py+ sharedhelper_m4_e2e_common.py.Non-Blocking Recommendations
4. Unhandled
AttributeErrorrisk —robot/helper_m4_e2e_verification.py:784If
project_linkscontains non-ProjectLinkobjects,pl.project_nameraises an unhandledAttributeError. Add type guarding:5.
assert_called_once*()bypasses_fail()pattern — lines 340, 767, 772, 835, 965These raise raw
AssertionErroron failure instead of the standardizedFAIL:messages. Consider wrapping in try/except.6. Asymmetric positional/keyword fallback — lines 774–786
action_namehas kwargs + positional fallback;project_linksonly checks kwargs. A refactor passingproject_linkspositionally would cause a misleading test failure.7. Overly broad "execute" string match — line 846
"execute" not in output_lowermatches the word anywhere in the output. A more targeted check would be more robust.8. DRY violations
SubplanStatus construction repeated 3× (lines 152–184, 402–424, 597–637). CLI exit-code checks repeated 4× (lines 336, 759, 827, 957). Extract parameterized factories.
9. Non-deterministic
datetime.now()in fixtures (lines 88, 127, 154, 402, 594)Multiple independent calls produce subtly inconsistent timestamps. Use a frozen constant.
10. Generic
_DECISION_ULID_Nnaming (lines 868–872)Rename to
_DECISION_PROMPT_DEF,_DECISION_STRATEGY, etc. for self-documenting assertions.M4 Acceptance Criteria Coverage
plan usecreates planplan executespawns subplansplan treedisplays hierarchyplan diffshows merged resultsC1–C4 Gap Closure
Security & Compliance
No security vulnerabilities. All decision types and plan lifecycle phases align with
docs/specification.md. PR description includesCloses #495. Milestone and label correctly assigned.PM Review — Day 31 (Specification Update)
This PR is mergeable.
Spec Alignment Check
M4 E2E acceptance tests are NOT directly impacted by protocol or TUI changes. However, note that the broader E2E test mocking issue (#658) may affect how these tests are structured long-term.
Status
Self-review by @hurui200320 identified 3 blocking issues:
Action Required
@hurui200320 — Address the 3 blocking issues from your self-review, then request external review.
97c5dfbcd12e8ea557602e8ea55760ecf508797aReview Fixes Complete — All CI Green
@freemo All 3 blocking and 7 non-blocking issues from the self-review have been addressed in commit
ecf5087.Blocking Fixes
cli_plan_execute) — removed tautological domain assertions; replaced with explanatory comment. Subplan coverage is provided by the dedicatedspawn-subplansandparent-trackingdomain tests.plan_diff) — replacedcall_args[0][0]withkwargs.get("plan_id")+ positional fallback pattern._common.py(237),_domain.py(495),_cli.py(426), dispatcher (81). All under 500 lines.Non-Blocking Fixes
isinstance(pl, ProjectLink)type guard addedassert_called_once*()wrapped in_fail()-pattern helpersproject_linksextraction made symmetric withaction_name"execute" in output_lower→re.search(r"\bexecute\b", ...)word-boundary regex_make_subplan_status()factory and_assert_exit_code()helper extracted (DRY)datetime.now()→_FROZEN_NOW = datetime(2026, 3, 1, 12, 0, 0)_DECISION_ULID_N→ descriptive names (_DECISION_PROMPT_DEF,_DECISION_STRATEGY, etc.)CI Status
All 11 stages passed (run #6171, 36m43s):
Ready for external review.
Code Review: PR #681 —
test(e2e): validate M4 acceptance criteria for v3.3.0 milestone closureReviewer: @hurui200320 (posted as comment — Forgejo does not allow self-reviews)
Branch:
test/m4-acceptance-gate→masterCommit:
ecf5087Prior Review Items
All 3 blocking and 7 non-blocking issues from the self-review (comment by @hurui200320) and the PM review (comment by @freemo) have been verified as resolved in the latest commit. These are not repeated here.
Verdict: REQUEST CHANGES (3 Major, 7 Minor, 5 Nit)
Major Issues
M1. Tautological
plan_tree()domain testFile:
robot/helper_m4_e2e_domain.py:120-136The test manually builds a tree dict from the
subplan_statusesit injected into the parent plan, then asserts that dict contains those same IDs. No system-under-test function produces this tree — the test is checking its own construction. Lines 120-123:Then lines 126-136 assert
_CHILD_A_ULID in tree[_ROOT_ULID]etc., which can never fail.The CLI-level
cli_plan_tree()test exercises realbuild_decision_tree()logic, so the M4 criterion has some coverage, but this domain test is misleading about what it verifies.Recommendation: Either call a real domain function that builds the tree structure from flat data, or rename/redocument this test to clarify it only validates
SubplanStatusfield completeness (changeset_summary, files_changed, parent linkage) rather than tree construction.M2. No CLI error-path tests
File:
robot/helper_m4_e2e_cli.py(all CLI tests)All four CLI test functions only exercise the happy path. None test:
plan executeon a read-only plan (should abort)plan usewith a nonexistent action (ActionNotAvailableError)plan diffwith a plan that has no changesetplan treewith zero decisionsError paths are part of the user-facing contract. Without error-path tests, a regression could silently swallow errors or crash with unhandled exceptions.
Recommendation: Add at least one error-path test per CLI command.
M3. Duplicate CHANGELOG entries for #495
File:
CHANGELOG.md:5-17andCHANGELOG.md:205-211Issue #495 has two changelog entries:
## Unreleased, no###subsection): The entry added by this PR.### Added): A pre-existing entry from a previous commit.Per CONTRIBUTING.md, one changelog entry per logical change is expected.
Recommendation: Consolidate into a single entry. Either remove the new entry (if the existing one is sufficient) or merge the content and remove the duplicate.
Minor Issues
m1. Environment-dependent merge test failure
File:
robot/helper_m4_e2e_domain.py:278, 322merge_clean()andmerge_conflict()callGitMergeStrategy().merge(), which shells out togit merge-file. Ifgitis not on$PATH, the fallback toSequentialMergeStrategyproduces results that fail the assertions with misleading diagnostics (e.g., "missing ours change" instead of "git not found").Recommendation: Add a pre-check:
if shutil.which("git") is None: _fail("git is required on PATH for merge tests")m2. CHANGELOG entry placement without subsection
File:
CHANGELOG.md:5-17The new entry sits directly under
## Unreleasedwithout a###subsection header, while the rest of the changelog uses structured### Added/### fix(...)subsections.Recommendation: Move the entry under the appropriate
### Addedsubsection.m3.
helper_m4_e2e_domain.pyat 494 lines — tight headroomFile:
robot/helper_m4_e2e_domain.pyAt 494 lines, the file has only 6 lines of headroom before the 500-line limit. Any future additions will immediately breach it.
Recommendation: Consider proactively splitting the two merge functions (
merge_clean,merge_conflict) into ahelper_m4_e2e_merge.pymodule (~90 lines each), dropping domain to ~310 lines.m4. Eager imports in dispatcher negate split benefit
File:
robot/helper_m4_e2e_verification.py:33-46The dispatcher unconditionally imports both
helper_m4_e2e_cliandhelper_m4_e2e_domainat module level. When running a domain-only command (e.g.,merge-clean), the CLI module's heavy imports (typer.testing,cleveragents.cli.commands.plan) are loaded unnecessarily.Recommendation: Use lazy imports inside the dispatch lookup so each command only loads its own module. Estimated savings: 50-200 ms per domain-only invocation.
m5.
plan_diff()does not verifyfmtkwargFile:
robot/helper_m4_e2e_cli.py:93-99The production code calls
service.diff(plan_id, fmt=fmt), but the test only verifiesplan_idwas passed correctly. If a regression dropped thefmtargument, this test would not catch it.Recommendation: Add an assertion checking
call_args.kwargs.get("fmt").m6.
cli_plan_execute()does not verifyget_plan()guard callFile:
robot/helper_m4_e2e_cli.py:198-258The
executecommand callsservice.get_plan(plan_id)for the read-only guard. The test sets up the mock return but never assertsget_planwas actually called.Recommendation: Add
_assert_mock_called_once_with(mock_service.get_plan, "get_plan", _ROOT_ULID).m7.
sys.path.insert(0, ...)could shadow stdlib modulesFiles:
helper_m4_e2e_common.py:17-19,helper_m4_e2e_cli.py:22-24,helper_m4_e2e_domain.py:21-23,helper_m4_e2e_verification.py:29-31Using
sys.path.insert(0, _SRC)placessrc/at the front of the search path. If a file with the same name as a stdlib module were ever placed insrc/, it would shadow the stdlib.Recommendation: Use
sys.path.append(_SRC)instead, or centralize the path setup.Nit
n1.
parallel_max()value readback assertions are noiserobot/helper_m4_e2e_domain.py:187-190— Assertsconfig.execution_mode == PARALLELimmediately after constructingSubplanConfig(execution_mode=ExecutionMode.PARALLEL). Can never fail.n2.
cli_plan_tree()patches at module level inconsistentlyrobot/helper_m4_e2e_cli.py:340-343— Other CLI tests mock private helpers (_get_lifecycle_service), but this one patchesget_containerat its definition site. Consider extracting a_get_decision_service()helper for consistency.n3. Duplicated
sys.pathblock across 4 filesAll 4 modules repeat the identical 3-line
_SRCpath insertion. Sincehelper_m4_e2e_common.pyis always imported first, the duplication is technically redundant (though defensible for standalone invocation).n4. Import ordering after
sys.pathmanipulationrobot/helper_m4_e2e_cli.py:26-49— Local helper (helper_m4_e2e_common) is imported before third-party (typer.testing). Conventional ordering is third-party then local.n5.
CliResulttype annotation sourced fromclick.testingvstyper.testingrobot/helper_m4_e2e_common.py:37—from click.testing import Result as CliResultworks because Typer wraps Click, butfrom typer.testing import Resultwould be clearer.Self-Review Issue Verification (All 10 Fixed)
cli_plan_execute()call_args[0][0]inplan_diff()AttributeErrorrisk onproject_linksisinstanceguard at line 180)assert_called_once*()bypasses_fail()datetime.now()_FROZEN_NOWconstant)M4 Acceptance Criteria Coverage
plan usecreates plancli_plan_use)plan executespawns subplanscli_plan_execute+spawn_subplans)plan treedisplays hierarchycli_plan_tree); domain test is tautological (M1)plan diffshows merged resultsplan_diff)parallel_max)merge_clean)merge_conflict)parent_tracking)Security & Compliance
No security vulnerabilities. All decision types and plan lifecycle phases align with
docs/specification.md. PR description includesCloses #495. Milestone and label correctly assigned. Type safety is clean (zero# type: ignore). Commit message follows Conventional Changelog.ecf508797a5b0d934a50Second Review Fixes Complete — All CI Green
All 3 major, 7 minor, and 5 nit issues from the second self-review have been addressed in commit
5b0d934.Major Fixes
M1: Tautological
plan_tree()domain test — Removed tautological tree dict construction (lines 120-136 of old file). The test now validates SubplanStatus field completeness (subplan_id, action_name, changeset_summary, files_changed) and parent→childPlanIdentitylinkage. Tree construction logic is exercised by the CLI-levelcli_plan_tree()test which invokes realbuild_decision_tree().M2: No CLI error-path tests — Added 4 new error-path tests in
helper_m4_e2e_cli_errors.py:cli_plan_execute_readonly: read-only plan → abort with "read-only" messagecli_plan_use_not_found:ActionNotAvailableError(ARCHIVED)→ abort with errorcli_plan_diff_no_changeset:PlanError→ abort with "Diff Error"cli_plan_tree_empty: empty decisions → "No decisions found" info message, exit 0M3: Duplicate CHANGELOG entries — Removed the new entry at lines 5-17 (under
## Unreleasedwithout subsection). Consolidated into the existing entry under### Added.Minor Fixes (m1–m7)
shutil.which("git")pre-check in_assert_git_available()before merge tests### Addedsubsectionhelper_m4_e2e_merge.py(146 lines); domain dropped to 385 linesimportlib.import_module()lazy dispatch — domain commands no longer load CLI dependenciesplan_diff()verifiesfmtkwarg forwarded correctlycli_plan_execute()verifiesget_plan()guard call via_assert_mock_called_once_withsys.path.insert(0, _SRC)→sys.path.append(_SRC)in all 6 modulesNit Fixes (n1–n5)
parallel_max()readback assertions (config.execution_mode, config.max_parallel)cli_plan_tree()explainingget_containerpatching vs_get_lifecycle_servicesys.pathblocks for standalone invocation safety (documented as idempotent)helper_m4_e2e_common→typer→cleveragentsfrom click.testing import Result as CliResultis canonicalFile Structure
helper_m4_e2e_common.pyhelper_m4_e2e_domain.pyhelper_m4_e2e_merge.pyhelper_m4_e2e_cli.pyhelper_m4_e2e_cli_errors.pyhelper_m4_e2e_verification.pyCI Status
All 11 nox sessions passed (32m total):
Ready for external review.
PR Review Report: #681
PR Details
Reviewer: Aditya
Branch:
test/m4-acceptance-gate→masterCommit:
5b0d934a50CHANGELOG.mdrobot/helper_m4_e2e_common.pyrobot/helper_m4_e2e_domain.pyrobot/helper_m4_e2e_merge.pyrobot/helper_m4_e2e_cli.pyrobot/helper_m4_e2e_cli_errors.pyrobot/helper_m4_e2e_verification.pyrobot/m4_e2e_verification.robotrobot/helpers_common.pyFindings (Priority Ordered)
P1 - fmt forwarding check can silently pass even when
fmtis not forwardedrobot/helper_m4_e2e_cli.pyplan_diff()fmt_argto remainNoneand still pass.fmtwould not be detected, even though this test claims to validate that behavior.fmtassertion strict by requiring explicit presence and expected value.fmt_arg is None; otherwise requirefmt_arg == "rich".P3 - Error-path tests use broad message matching and may accept wrong failure mode
robot/helper_m4_e2e_cli_errors.pycli_plan_use_not_found()andcli_plan_diff_no_changeset()"error" in output_lowerare permissive and can pass for unrelated failures."Action not available"for use,"Diff Error"or"has no ChangeSet"for diff).Compliance Notes
CONTRIBUTING.md500-line file rule: compliant for all newly introduced/split helper modules.CONTRIBUTING.mdchangelog update requirement: compliant (CHANGELOG.mdupdated in this branch).docs/specification.mdalignment for M4 command coverage (plan use/execute/tree/diffand error paths): generally aligned, with the above robustness caveats in test assertions.Final Assessment
fmtforwarding check), plus one low-severity robustness gap in error-message assertions.PM Review — Day 32
Verdict: APPROVED
Summary
Excellent M4 acceptance criteria validation PR from @hurui200320. This is exactly the kind of thorough E2E verification needed to close milestone v3.3.0 (M4).
Process Compliance
Closes #495— correct.Technical Assessment (from PR body)
Recommendation
@freemo: Please merge at your earliest convenience. This PR gates M4 milestone closure and is already 10 days overdue (M4 due Mar 2). Mergeable, no conflicts.
Labels Added
5b0d934a50766fa74a94New commits pushed, approval review dismissed automatically according to repository settings
Review Fixes Applied — Aditya's Findings Addressed
@aditya Both findings from your review have been addressed in the force-pushed commit
766fa74a. Branch has been rebased onto latest master.P1:
fmtforwarding check made strictFile:
robot/helper_m4_e2e_cli.py,plan_diff()The
fmtassertion was previously permissive —fmt_argcould remainNoneand the test would still pass. Fixed by:fmt_arg is None(regression wherefmtis not forwarded is now detected)fmt_arg == "rich"(the CLI default)P3: Error-path tests use command-specific message fragments
File:
robot/helper_m4_e2e_cli_errors.pycli_plan_use_not_found(): Changed from"not available" or "error"to asserting"action not available"— matches the production CLI output"[red]Action not available:[/red] {e}"cli_plan_diff_no_changeset(): Changed from"error" or "changeset"to asserting"diff error" or "has no changeset"— matches the production CLI output"[red]Diff Error:[/red] {e.message}"Additional fix: DI container mock alignment
Files:
robot/helper_m4_e2e_cli.py(cli_plan_tree()),robot/helper_m4_e2e_cli_errors.py(cli_plan_tree_empty())The production
treecommand now callscontainer.decision_service()instead ofcontainer.resolve(). Updated both tree tests to mockdecision_service()to match the current production API.Quality Gates
766fa74a94317d015260