test(e2e): verify M3 success criteria — decision tree and correction #439

Closed
brent.edwards wants to merge 2 commits from test/m3-e2e-verification into master

Summary

Closes #404

  • Added robot/m3_e2e_verification.robot (10 test cases) covering all M3 acceptance criteria
  • Added robot/helper_m3_e2e_verification.py (10 subcommands)
  • Verifies: plan execution generates decisions, decision tree view, decision explain, invariant add/list, correction dry-run, correction live revert, context snapshots, tree persistence, revert re-execution, invariant enforcement

Local checks: lint passed, typecheck passed (0 errors)
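The PR page does not show the helper's internals. Below is a minimal sketch of one common shape for such a helper. It assumes one argparse subcommand per check, with a non-zero exit code on failure so the Robot suite can assert on the return code (for example through the Process library's `Run Process` keyword). The two subcommand names come from the PR. `CHECKS`, the check bodies, and the PASS/FAIL output are hypothetical.

```python
import argparse
import sys
from typing import Callable, Dict, List, Optional


# Hypothetical check bodies: the real helper's logic is not shown in the PR.
def check_decision_tree_view() -> None:
    # Placeholder: a real check would render the tree via the CLI and inspect it.
    rendered = "root\n  child"
    if "child" not in rendered:
        raise AssertionError("child decision missing from tree output")


def check_decision_explain() -> None:
    # Placeholder: a real check would run the explain path and inspect its output.
    rendered = {"decision": "root", "context": {"phase": "strategize"}}
    if "context" not in rendered:
        raise AssertionError("context snapshot missing from explanation")


# Subcommand names are taken from the PR; this mapping is illustrative only.
CHECKS: Dict[str, Callable[[], None]] = {
    "decision-tree-view": check_decision_tree_view,
    "decision-explain": check_decision_explain,
}


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="M3 E2E verification helper (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in CHECKS:
        sub.add_parser(name)
    args = parser.parse_args(argv)
    try:
        CHECKS[args.command]()
    except AssertionError as exc:
        # Non-zero exit lets the calling Robot test fail on the return code.
        print(f"FAIL {args.command}: {exc}", file=sys.stderr)
        return 1
    print(f"PASS {args.command}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

With each check behind its own subcommand, a failing Robot case points at exactly one check.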

test(e2e): verify M3 success criteria — decision tree and correction
Some checks failed
CI / lint (pull_request) Successful in 30s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 32s
CI / security (pull_request) Successful in 49s
CI / build (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 1m13s
CI / integration_tests (pull_request) Failing after 3m56s
CI / benchmark-regression (pull_request) Successful in 20m57s
CI / unit_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
aeb5cc5110
Robot Framework E2E test suite for M3 milestone verification covering:
- Plan execution generating decisions during Strategize phase
- Decision tree viewing with parent-child relationships and BFS traversal
- Decision explanation with full context snapshot verification
- Invariant add/list via CLI and InvariantService with scope filtering
- Dry-run correction via CorrectionService with impact analysis
- Live revert correction execution with decision re-creation
- Context snapshot round-trip serialisation assertions
- Decision tree persistence via model_dump/model_validate
- Correction revert re-execution from decision point
- Invariant enforcement during strategize with merge precedence

ISSUES CLOSED: #404
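The commit message mentions BFS traversal of a parent-child decision tree and a `model_dump`/`model_validate` round trip. Below is a minimal sketch of both ideas. It assumes a simplified, hypothetical `Decision` record with only `id`, `parent_id`, and `summary`; the real model is not shown. It uses a dataclass plus JSON as a stdlib stand-in for pydantic's `model_dump()`/`model_validate()`.

```python
import json
from collections import deque
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Decision:
    # Hypothetical minimal record; the real Decision model has more fields.
    id: str
    parent_id: Optional[str]  # None marks a root decision
    summary: str


def children_index(decisions: List[Decision]) -> Dict[Optional[str], List[Decision]]:
    """Group decisions by parent id, preserving insertion order."""
    index: Dict[Optional[str], List[Decision]] = {}
    for d in decisions:
        index.setdefault(d.parent_id, []).append(d)
    return index


def bfs_order(decisions: List[Decision]) -> List[str]:
    """Return decision ids level by level, starting from the root(s).

    Assumes the parent links form an acyclic tree.
    """
    index = children_index(decisions)
    queue = deque(index.get(None, []))
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node.id)
        queue.extend(index.get(node.id, []))
    return order


def round_trip(decisions: List[Decision]) -> List[Decision]:
    # Stdlib stand-in for model_dump()/model_validate(): object -> dict -> JSON -> object.
    payload = json.dumps([asdict(d) for d in decisions])
    return [Decision(**item) for item in json.loads(payload)]
```

A persistence check then asserts two things: the restored records equal the originals, and they traverse in the same BFS order.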
fix(test): correct patch target for CorrectionService in M3 e2e helper
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 20s
CI / quality (pull_request) Successful in 20s
CI / build (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 57s
CI / security (pull_request) Successful in 1m2s
CI / integration_tests (pull_request) Successful in 4m16s
CI / unit_tests (pull_request) Successful in 23m40s
CI / benchmark-regression (pull_request) Successful in 25m34s
CI / docker (pull_request) Successful in 15s
CI / coverage (pull_request) Successful in 1h40m2s
2e53caaef1
The CorrectionService import in plan.py is a lazy import inside the
correct() function body, so it does not exist as a module-level attribute.
Patch the class at its definition site instead:
  cleveragents.application.services.correction_service.CorrectionService

Fixes CI integration_tests failure for PR #439.
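The failure mode this commit fixes can be reproduced in a self-contained way. The module names `demo_plan` and `demo_correction_service` below are hypothetical stand-ins for `plan.py` and `cleveragents.application.services.correction_service`. Because the import runs inside the function body, the use-site module never gains a `CorrectionService` attribute. `mock.patch` therefore raises `AttributeError` there, while patching the definition site works, since the lazy import looks the name up each time the function runs.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for the module that *defines* the service.
service_mod = types.ModuleType("demo_correction_service")


class CorrectionService:
    def run(self) -> str:
        return "real"


service_mod.CorrectionService = CorrectionService
sys.modules["demo_correction_service"] = service_mod

# Hypothetical stand-in for plan.py: the import happens lazily inside correct(),
# so the module itself never has a CorrectionService attribute.
plan_mod = types.ModuleType("demo_plan")


def correct() -> str:
    from demo_correction_service import CorrectionService  # resolved at call time
    return CorrectionService().run()


plan_mod.correct = correct
sys.modules["demo_plan"] = plan_mod


def patch_use_site() -> str:
    """Patching where the name is used fails: the attribute does not exist."""
    try:
        with mock.patch("demo_plan.CorrectionService"):
            return "patched"
    except AttributeError:
        return "AttributeError"


def patch_definition_site() -> str:
    """Patching where the class is defined is seen by the lazy import."""
    with mock.patch("demo_correction_service.CorrectionService") as fake:
        fake.return_value.run.return_value = "mocked"
        return plan_mod.correct()
```

Once the `with` block exits, the original class is restored, so later calls see the real implementation again.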

Do not merge this PR individually. All changes are consolidated into PR #442 (develop-brent-5). Please review and merge #442 instead.

Code Review — PR #439: test(e2e): verify M3 success criteria — decision tree and correction

Reviewer: @brent.edwards | Review type: Comment-only

The helper is thorough, but a couple of the "E2E" checks don’t actually exercise the CLI/persistence paths they claim to validate.


P2:should-fix — "E2E" subcommands don’t call the CLI for tree/explain

robot/helper_m3_e2e_verification.py uses in-memory Decision objects for decision-tree-view, decision-explain, and decision-tree-persistence. Those subcommands never call agents plan tree or agents plan explain, so regressions in the CLI output or persistence layer won’t be caught. Consider invoking the CLI commands (with a mocked lifecycle service if needed) so the tests validate the real rendering and serialization paths.


P2:should-fix — "Plan generates decisions" doesn’t validate decision recording

plan-generates-decisions runs agents plan use, but then builds decisions directly via _build_decision_tree(). That bypasses the actual strategize/decision-recording code path, so if decision recording breaks, this test still passes. Suggest asserting either on CLI/service output that includes the decisions or on mocked lifecycle service interactions showing that decisions were recorded.


Happy to re-review after those are addressed.

fix(test): route M3 E2E subcommands through CLI rendering path
Some checks failed
CI / lint (pull_request) Successful in 24s
CI / typecheck (pull_request) Successful in 1m2s
CI / security (pull_request) Successful in 1m1s
CI / quality (pull_request) Successful in 42s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 25s
CI / integration_tests (pull_request) Successful in 5m18s
CI / unit_tests (pull_request) Successful in 30m5s
CI / docker (pull_request) Successful in 15s
CI / benchmark-regression (pull_request) Successful in 26m25s
CI / coverage (pull_request) Has been cancelled
4f079c20ea
- decision-tree-view, decision-explain, and decision-tree-persistence
  now invoke 'plan status --format plain' via mocked lifecycle service
  so regressions in CLI rendering/serialization are caught
- plan-generates-decisions now asserts use_action was called by the CLI
  and verifies plan status renders the strategize phase after creation
- Updated robot test case documentation to reflect CLI integration
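The approach in this commit can be sketched as follows: drive the CLI, mock only the service behind it, then assert on both the recorded call and the rendered text. Everything here is a hypothetical stand-in. Only `use_action`, `plan use`, `plan status --format plain`, and the strategize phase come from the PR. `run_cli`, `get_status`, `make_mock_service`, `capture`, and the "key: value" output format are invented for illustration, and the real CLI may use a different framework than argparse.

```python
import argparse
import io
from typing import List, Protocol, TextIO
from unittest import mock


class LifecycleService(Protocol):
    # Hypothetical interface; only use_action is named in the PR.
    def use_action(self, name: str) -> None: ...
    def get_status(self) -> dict: ...


def run_cli(argv: List[str], service: LifecycleService, out: TextIO) -> int:
    """Toy stand-in for the `agents plan ...` CLI, wired to an injected service."""
    parser = argparse.ArgumentParser(prog="agents")
    groups = parser.add_subparsers(dest="group", required=True)
    plan = groups.add_parser("plan")
    cmds = plan.add_subparsers(dest="cmd", required=True)
    use = cmds.add_parser("use")
    use.add_argument("name")
    status = cmds.add_parser("status")
    status.add_argument("--format", choices=["plain"], default="plain")  # only plain in this sketch
    args = parser.parse_args(argv)

    if args.cmd == "use":
        service.use_action(args.name)
        out.write(f"using {args.name}\n")
    elif args.cmd == "status":
        # Plain rendering: one "key: value" line per status field.
        for key, value in service.get_status().items():
            out.write(f"{key}: {value}\n")
    return 0


def make_mock_service(phase: str = "strategize") -> mock.Mock:
    """Mock only the service layer; the CLI parsing and rendering stay real."""
    service = mock.Mock(spec=["use_action", "get_status"])
    service.get_status.return_value = {"plan": "demo", "phase": phase}
    return service


def capture(argv: List[str], service: LifecycleService) -> str:
    """Run the CLI and return everything it wrote."""
    buf = io.StringIO()
    run_cli(argv, service, buf)
    return buf.getvalue()
```

If the CLI stops calling the service, or the status rendering drops the phase, one of the two assertions fails. Building the decisions directly in the test would hide both regressions.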
Merge branch 'master' into test/m3-e2e-verification
All checks were successful
CI / lint (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 58s
CI / quality (pull_request) Successful in 29s
CI / security (pull_request) Successful in 55s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 24s
CI / integration_tests (pull_request) Successful in 6m2s
CI / unit_tests (pull_request) Successful in 36m32s
CI / benchmark-regression (pull_request) Successful in 24m5s
CI / docker (pull_request) Successful in 16s
CI / coverage (pull_request) Successful in 2h6m50s
4d66674acc
brent.edwards closed this pull request 2026-02-26 23:53:52 +00:00
brent.edwards deleted branch test/m3-e2e-verification 2026-02-26 23:53:58 +00:00

Reference: cleveragents/cleveragents-core!439