test(e2e): add M3 decision + validation suites #437

Closed
brent.edwards wants to merge 2 commits from feature/m3-decision-validation-smoke into master
Member

Summary

Closes #179

  • Added features/m3_decision_validation_smoke.feature (25 scenarios) covering decision tree fixtures, invariant CLI, validation CLI, plan correct (dry-run/revert/append), and negative cases
  • Added features/steps/m3_decision_validation_smoke_steps.py with step implementations
  • Added fixtures in features/fixtures/m3/ (decision trees, validation attachments, invariant configs)
  • Added robot/m3_decision_validation_smoke.robot (8 integration tests) + helper
  • Added benchmarks/m3_smoke_bench.py (5 ASV suites)
  • Updated docs/development/testing.md with M3 suite documentation

Local checks: lint passed, typecheck passed (0 errors)

## Summary Closes #179 - Added `features/m3_decision_validation_smoke.feature` (25 scenarios) covering decision tree fixtures, invariant CLI, validation CLI, plan correct (dry-run/revert/append), and negative cases - Added `features/steps/m3_decision_validation_smoke_steps.py` with step implementations - Added fixtures in `features/fixtures/m3/` (decision trees, validation attachments, invariant configs) - Added `robot/m3_decision_validation_smoke.robot` (8 integration tests) + helper - Added `benchmarks/m3_smoke_bench.py` (5 ASV suites) - Updated `docs/development/testing.md` with M3 suite documentation **Local checks**: lint passed, typecheck passed (0 errors)
test(e2e): add M3 decision + validation suites
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / build (pull_request) Successful in 20s
CI / security (pull_request) Successful in 55s
CI / typecheck (pull_request) Successful in 1m5s
CI / integration_tests (pull_request) Successful in 4m4s
CI / benchmark-regression (pull_request) Successful in 22m21s
CI / unit_tests (pull_request) Failing after 34m7s
CI / docker (pull_request) Has been skipped
CI / coverage (pull_request) Has been cancelled
ace7311de4
fix(test): correct dry-run assertion in M3 decision validation smoke
All checks were successful
CI / lint (pull_request) Successful in 24s
CI / typecheck (pull_request) Successful in 58s
CI / security (pull_request) Successful in 56s
CI / quality (pull_request) Successful in 42s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 25s
CI / integration_tests (pull_request) Successful in 5m7s
CI / unit_tests (pull_request) Successful in 25m18s
CI / benchmark-regression (pull_request) Successful in 26m14s
CI / docker (pull_request) Successful in 16s
CI / coverage (pull_request) Successful in 1h40m20s
22986aaaf9
The CLI plain format output does not include the literal text "Dry Run"
(that string only appears in the Rich panel title). Assert against
"risk_level" which is present in the dry-run impact output instead.
Author
Member

Do not merge this PR individually. All changes are consolidated into PR #442 (develop-brent-5). Please review and merge #442 instead.

**Do not merge this PR individually.** All changes are consolidated into PR #442 (`develop-brent-5`). Please review and merge #442 instead.
Author
Member

Code Review — PR #437: test(e2e): add M3 decision + validation suites

Reviewer: @brent.edwards | Review type: Comment-only

Nice coverage on the M3 fixtures and CLI flows. A few issues to fix before merge:


P2:should-fix — Behave hook won’t run (patchers never stop)

features/steps/m3_decision_validation_smoke_steps.py defines after_scenario() at the bottom, but Behave only picks up hooks from features/environment.py. That means invariant_patcher, validation_patcher, plan_patcher, and correction_patcher never get stopped, and the mocks can leak into later scenarios/features. Please move the hook into features/environment.py or wrap patchers in context.add_cleanup / per-step with patch(...).


P2:should-fix — Doc update includes files not in this PR

docs/development/testing.md adds the decision persistence benchmark/feature sections, but PR #437 doesn’t include those files. If this merges first, the docs will reference missing files. Please split the decision-persistence doc section into PR #438 or remove it here.


P3:nit — Invalid ULID constants in tests

The hard-coded IDs in features/steps/m3_decision_validation_smoke_steps.py and robot/helper_m3_decision_validation_smoke.py include non-ULID strings (e.g., 25-char IDs and _INVARIANT_ULID contains I). This is fine today, but will break if any ULID validation is introduced on these fields. Safer to use str(ULID()) or 26-char Crockford ULIDs.


P3:nit — Temporary validation configs never deleted

Both the Behave steps and Robot helper create NamedTemporaryFile(..., delete=False) without cleanup. Not a blocker, but consider deleting in teardown to avoid temp accumulation.


Happy to re-review after the P2s are addressed.

## Code Review — PR #437: test(e2e): add M3 decision + validation suites **Reviewer:** @brent.edwards | **Review type:** Comment-only Nice coverage on the M3 fixtures and CLI flows. A few issues to fix before merge: --- ### P2:should-fix — Behave hook won’t run (patchers never stop) `features/steps/m3_decision_validation_smoke_steps.py` defines `after_scenario()` at the bottom, but Behave only picks up hooks from `features/environment.py`. That means `invariant_patcher`, `validation_patcher`, `plan_patcher`, and `correction_patcher` never get stopped, and the mocks can leak into later scenarios/features. Please move the hook into `features/environment.py` or wrap patchers in `context.add_cleanup` / per-step `with patch(...)`. --- ### P2:should-fix — Doc update includes files not in this PR `docs/development/testing.md` adds the decision persistence benchmark/feature sections, but PR #437 doesn’t include those files. If this merges first, the docs will reference missing files. Please split the decision-persistence doc section into PR #438 or remove it here. --- ### P3:nit — Invalid ULID constants in tests The hard-coded IDs in `features/steps/m3_decision_validation_smoke_steps.py` and `robot/helper_m3_decision_validation_smoke.py` include non-ULID strings (e.g., 25-char IDs and `_INVARIANT_ULID` contains `I`). This is fine today, but will break if any ULID validation is introduced on these fields. Safer to use `str(ULID())` or 26-char Crockford ULIDs. --- ### P3:nit — Temporary validation configs never deleted Both the Behave steps and Robot helper create `NamedTemporaryFile(..., delete=False)` without cleanup. Not a blocker, but consider deleting in teardown to avoid temp accumulation. --- Happy to re-review after the P2s are addressed.
fix(test): address review findings in M3 smoke tests
All checks were successful
CI / lint (pull_request) Successful in 24s
CI / typecheck (pull_request) Successful in 55s
CI / security (pull_request) Successful in 47s
CI / quality (pull_request) Successful in 28s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 23s
CI / integration_tests (pull_request) Successful in 5m7s
CI / unit_tests (pull_request) Successful in 20m55s
CI / benchmark-regression (pull_request) Successful in 25m40s
CI / docker (pull_request) Successful in 14s
CI / coverage (pull_request) Successful in 1h48m41s
93730067a2
- Replace dead after_scenario() hook with context.add_cleanup() calls
  so patchers are properly stopped (Behave only runs hooks from
  environment.py)
- Fix invalid ULID constants: pad DECISION/CORRECTION to 26 chars,
  replace invalid 'I' with 'J' in INVARIANT_ULID
- Add temp file cleanup for NamedTemporaryFile(delete=False) via
  context.add_cleanup() in behave steps and try/finally in robot helper
- Remove decision-persistence doc sections from testing.md that belong
  in PR #438, not this PR
Merge branch 'master' into feature/m3-decision-validation-smoke
All checks were successful
CI / lint (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 20s
CI / security (pull_request) Successful in 37s
CI / typecheck (pull_request) Successful in 39s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 20s
CI / integration_tests (pull_request) Successful in 4m46s
CI / unit_tests (pull_request) Successful in 31m4s
CI / benchmark-regression (pull_request) Successful in 24m29s
CI / docker (pull_request) Successful in 15s
CI / coverage (pull_request) Successful in 1h25m36s
f40f1202fb
brent.edwards closed this pull request 2026-02-26 23:54:30 +00:00
brent.edwards deleted branch feature/m3-decision-validation-smoke 2026-02-26 23:54:43 +00:00
All checks were successful
CI / lint (pull_request) Successful in 17s
Required
Details
CI / quality (pull_request) Successful in 20s
Required
Details
CI / security (pull_request) Successful in 37s
Required
Details
CI / typecheck (pull_request) Successful in 39s
Required
Details
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 20s
Required
Details
CI / integration_tests (pull_request) Successful in 4m46s
Required
Details
CI / unit_tests (pull_request) Successful in 31m4s
Required
Details
CI / benchmark-regression (pull_request) Successful in 24m29s
CI / docker (pull_request) Successful in 15s
Required
Details
CI / coverage (pull_request) Successful in 1h25m36s
Required
Details

Pull request closed

Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!437
No description provided.