test(e2e): verify M3 success criteria — decision tree and correction #439

2026-02-25T21:27:35Z

brent.edwards commented

2026-02-25 21:27:35 +00:00

Summary

Closes #404

- Added robot/m3_e2e_verification.robot (10 test cases) covering all M3 acceptance criteria
- Added robot/helper_m3_e2e_verification.py (10 subcommands)
- Verifies: plan execution generates decisions, decision tree view, decision explain, invariant add/list, correction dry-run, correction live revert, context snapshots, tree persistence, revert re-execution, invariant enforcement

Local checks: lint passed, typecheck passed (0 errors)

brent.edwards added 1 commit 2026-02-25 21:27:35 +00:00

test(e2e): verify M3 success criteria — decision tree and correction

CI / lint (pull_request) Successful in 30s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 32s
CI / security (pull_request) Successful in 49s
CI / build (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 1m13s
CI / integration_tests (pull_request) Failing after 3m56s
CI / benchmark-regression (pull_request) Successful in 20m57s
CI / unit_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled

aeb5cc5110

Robot Framework E2E test suite for M3 milestone verification covering:
- Plan execution generating decisions during Strategize phase
- Decision tree viewing with parent-child relationships and BFS traversal
- Decision explanation with full context snapshot verification
- Invariant add/list via CLI and InvariantService with scope filtering
- Dry-run correction via CorrectionService with impact analysis
- Live revert correction execution with decision re-creation
- Context snapshot round-trip serialisation assertions
- Decision tree persistence via model_dump/model_validate
- Correction revert re-execution from decision point
- Invariant enforcement during strategize with merge precedence

ISSUES CLOSED: #404
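
The tree traversal and round-trip checks listed above can be pictured with a minimal sketch. It is not the code from the helper: the `Decision` dataclass, its fields, and the function names are hypothetical stand-ins, and the round trip goes through `dataclasses.asdict` and JSON rather than the `model_dump`/`model_validate` pair the commit message mentions.

```python
from __future__ import annotations

import json
from collections import deque
from dataclasses import asdict, dataclass, field


@dataclass
class Decision:
    """Hypothetical stand-in for the project's Decision model."""

    id: str
    parent_id: str | None = None
    context_snapshot: dict[str, str] = field(default_factory=dict)


def bfs_order(decisions: list[Decision]) -> list[str]:
    """Return decision ids level by level, starting from the roots."""
    # Group decisions by parent so each node's children are a dict lookup away.
    children: dict[str | None, list[Decision]] = {}
    for decision in decisions:
        children.setdefault(decision.parent_id, []).append(decision)

    queue = deque(children.get(None, []))  # roots have no parent
    order: list[str] = []
    while queue:
        node = queue.popleft()
        order.append(node.id)
        queue.extend(children.get(node.id, []))
    return order


def round_trip(decision: Decision) -> Decision:
    """Serialise to JSON and rebuild; a stand-in for dump-then-validate."""
    payload = json.dumps(asdict(decision))
    return Decision(**json.loads(payload))


tree = [
    Decision("root", None, {"phase": "strategize"}),
    Decision("a", "root"),
    Decision("b", "root"),
    Decision("a1", "a"),
]
```

A round-trip assertion in this style compares the rebuilt object with the original, so a field dropped during serialisation shows up as an inequality rather than passing silently.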

brent.edwards referenced this issue from a commit

2026-02-25 22:00:48 +00:00

fix(test): correct patch target for CorrectionService in M3 e2e helper

brent.edwards added 1 commit 2026-02-25 22:00:48 +00:00

fix(test): correct patch target for CorrectionService in M3 e2e helper

CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 20s
CI / quality (pull_request) Successful in 20s
CI / build (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 57s
CI / security (pull_request) Successful in 1m2s
CI / integration_tests (pull_request) Successful in 4m16s
CI / unit_tests (pull_request) Successful in 23m40s
CI / benchmark-regression (pull_request) Successful in 25m34s
CI / docker (pull_request) Successful in 15s
CI / coverage (pull_request) Successful in 1h40m2s

2e53caaef1

The CorrectionService import in plan.py is a lazy import inside the
correct() function body, so it does not exist as a module-level attribute.
Patch the class at its definition site instead:
  cleveragents.application.services.correction_service.CorrectionService

Fixes CI integration_tests failure for PR #439.
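
The patch-target issue described in this commit can be reproduced in isolation. The sketch below uses two throwaway modules, `demo_services` and `demo_cli`, registered in `sys.modules` purely for illustration; they are assumptions standing in for the real service and CLI modules, not the project's code.

```python
import sys
import types
from unittest import mock


class CorrectionService:
    """Hypothetical service; only the class name mirrors the commit message."""

    def run(self) -> str:
        return "real"


# Stand-in for the module where the service class is defined.
services = types.ModuleType("demo_services")
services.CorrectionService = CorrectionService
sys.modules["demo_services"] = services


def correct() -> str:
    # Lazy import: the name is bound only inside this function body,
    # so the CLI module never gets a module-level CorrectionService attribute.
    from demo_services import CorrectionService

    return CorrectionService().run()


# Stand-in for the CLI module that exposes correct().
cli = types.ModuleType("demo_cli")
cli.correct = correct
sys.modules["demo_cli"] = cli


def patch_at_use_site() -> bool:
    """Patching the CLI module fails: the attribute does not exist there."""
    try:
        with mock.patch("demo_cli.CorrectionService"):
            return True
    except AttributeError:
        return False


def patch_at_definition_site() -> str:
    """Patching the defining module works: the lazy import picks up the mock."""
    with mock.patch("demo_services.CorrectionService") as fake_cls:
        fake_cls.return_value.run.return_value = "mocked"
        return cli.correct()
```

Because the lazy import re-reads the attribute from the defining module on every call, the mock is seen while the patch is active and the real class is restored once it exits.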

brent.edwards referenced this pull request

2026-02-25 22:14:58 +00:00

test: consolidated Brent QA batch — issues #179, #180, #404, #405, #187 #442

brent.edwards commented

2026-02-25 22:15:08 +00:00

Do not merge this PR individually. All changes are consolidated into PR #442 (develop-brent-5). Please review and merge #442 instead.

brent.edwards commented

2026-02-25 23:53:02 +00:00

Code Review — PR #439: test(e2e): verify M3 success criteria — decision tree and correction

Reviewer: @brent.edwards | Review type: Comment-only

The helper is thorough, but a couple of the "E2E" checks don’t actually exercise the CLI/persistence paths they claim to validate.

P2:should-fix — "E2E" subcommands don’t call the CLI for tree/explain

robot/helper_m3_e2e_verification.py uses in-memory Decision objects for decision-tree-view, decision-explain, and decision-tree-persistence. Those subcommands never call agents plan tree or agents plan explain, so regressions in the CLI output or persistence layer won’t be caught. Consider invoking the CLI commands (with mocked lifecycle service if needed) so the tests validate real rendering and serialization paths.
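
As a rough illustration of the suggestion, the sketch below drives a command-line entry point with a mocked service and asserts on the rendered text. Everything in it is an assumption: the `agents-demo` parser, the `get_tree` method, and the indentation format are invented for the example, and `argparse` is used only because it ships with Python, not because the project's CLI is built on it.

```python
import argparse
import contextlib
import io
from unittest import mock


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI with a single 'tree' subcommand."""
    parser = argparse.ArgumentParser(prog="agents-demo")
    sub = parser.add_subparsers(dest="command", required=True)
    tree = sub.add_parser("tree")
    tree.add_argument("plan_id")
    return parser


def main(argv, service) -> int:
    """Parse arguments, fetch the tree from the service, and render it."""
    args = build_parser().parse_args(argv)
    if args.command == "tree":
        for depth, label in service.get_tree(args.plan_id):
            print("  " * depth + label)
    return 0


def run_cli(argv, service) -> str:
    """Invoke the entry point and capture what it prints to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exit_code = main(argv, service)
    assert exit_code == 0
    return buffer.getvalue()


# The service is mocked, but parsing and rendering are the real code paths.
service = mock.Mock()
service.get_tree.return_value = [(0, "root"), (1, "child-a"), (1, "child-b")]
output = run_cli(["tree", "plan-1"], service)
```

A test in this shape fails when the rendering changes, which a check against in-memory objects alone cannot detect.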

P2:should-fix — "Plan generates decisions" doesn’t validate decision recording

plan-generates-decisions runs agents plan use, but then builds decisions directly via _build_decision_tree(). That bypasses the actual strategize/decision-recording code path. If decision recording breaks, this test still passes. Suggest asserting on CLI/service output that includes decisions, or on mocked lifecycle service interactions that indicate decisions were recorded.

Happy to re-review after those are addressed.

brent.edwards added 1 commit 2026-02-26 01:26:13 +00:00

fix(test): route M3 E2E subcommands through CLI rendering path

CI / lint (pull_request) Successful in 24s
CI / typecheck (pull_request) Successful in 1m2s
CI / security (pull_request) Successful in 1m1s
CI / quality (pull_request) Successful in 42s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 25s
CI / integration_tests (pull_request) Successful in 5m18s
CI / unit_tests (pull_request) Successful in 30m5s
CI / docker (pull_request) Successful in 15s
CI / benchmark-regression (pull_request) Successful in 26m25s
CI / coverage (pull_request) Has been cancelled

4f079c20ea

- decision-tree-view, decision-explain, and decision-tree-persistence
  now invoke 'plan status --format plain' via mocked lifecycle service
  so regressions in CLI rendering/serialization are caught
- plan-generates-decisions now asserts use_action was called by the CLI
  and verifies plan status renders the strategize phase after creation
- Updated robot test case documentation to reflect CLI integration
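
The interaction-style assertion mentioned in this commit might look roughly like the following. Only the `use_action` name comes from the commit message; the handler, the plan attributes, and the output format are hypothetical.

```python
from unittest import mock


def cli_plan_use(lifecycle, action_name: str) -> str:
    """Hypothetical CLI handler that delegates to the lifecycle service."""
    plan = lifecycle.use_action(action_name)
    return f"plan {plan.id} phase={plan.phase}"


# Mocked lifecycle service returning a plan that sits in the strategize phase.
lifecycle = mock.Mock()
lifecycle.use_action.return_value = mock.Mock(id="p-1", phase="strategize")

rendered = cli_plan_use(lifecycle, "demo-action")

# Fails if the handler stops calling the service or passes different arguments.
lifecycle.use_action.assert_called_once_with("demo-action")
```

An assertion like this shows that the CLI reached the service and rendered its result. On its own it does not show that decisions were recorded, which would need a check against the recording path itself.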

brent.edwards added 1 commit 2026-02-26 03:43:46 +00:00

Merge branch 'master' into test/m3-e2e-verification

CI / lint (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 58s
CI / quality (pull_request) Successful in 29s
CI / security (pull_request) Successful in 55s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 24s
CI / integration_tests (pull_request) Successful in 6m2s
CI / unit_tests (pull_request) Successful in 36m32s
CI / benchmark-regression (pull_request) Successful in 24m5s
CI / docker (pull_request) Successful in 16s
CI / coverage (pull_request) Successful in 2h6m50s

4d66674acc

brent.edwards closed this pull request

2026-02-26 23:53:52 +00:00

brent.edwards deleted branch test/m3-e2e-verification

2026-02-26 23:53:58 +00:00

freemo added the State Wont Do label 2026-03-04 00:58:51 +00:00

Pull request closed

This pull request cannot be reopened because the branch was deleted.

Reference: cleveragents/cleveragents-core#439