test(e2e): add M4 correction + subplan suites #441

Closed
brent.edwards wants to merge 2 commits from feature/m4-correction-subplan-smoke into master
Member

Summary

  • 20 Behave scenarios (features/m4_correction_subplan_smoke.feature) covering correction flows, subplan scheduling, and conflict resolution
  • 8 Robot Framework integration tests (robot/m4_correction_subplan_smoke.robot) with helper script
  • 4 ASV benchmark suites (benchmarks/m4_smoke_bench.py) for correction and subplan performance
  • 3 fixture files: correction_flows.json, subplan_execution.json, conflict_simulations.json
  • Updated docs/development/testing.md

Closes #187

fix(test): correct patch targets and ULID validation in M4 smoke helper
Some checks failed
CI / lint (pull_request) Successful in 23s
CI / typecheck (pull_request) Successful in 59s
CI / security (pull_request) Successful in 51s
CI / quality (pull_request) Successful in 35s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 30s
CI / integration_tests (pull_request) Successful in 5m11s
CI / unit_tests (pull_request) Failing after 25m35s
CI / docker (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 26m22s
CI / coverage (pull_request) Failing after 1h27m30s
55df1915f1
- Patch CorrectionService at its definition site instead of the lazy
  import location in plan.py (same fix as M3 helper)
- Replace hardcoded fake subplan/correction IDs with real ULIDs
  generated via ulid.ULID() to satisfy Pydantic pattern validation
- Add missing --guidance flag to correction dry-run CLI invocation
- Fix subplan-status-sequential assertion to check for plan_id
  (subplan_count is not part of the status JSON output)
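
The patch-target point above can be sketched in isolation. This is a minimal illustration, not the project's code: `demo_services` and `demo_plan` are hypothetical stand-in modules that only mimic the described layout (a class defined in one module, imported lazily inside a function in another), and the `request_correction` signature shown here is assumed.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for the module that defines CorrectionService.
services = types.ModuleType("demo_services")


class CorrectionService:
    def request_correction(self, decision_id):  # assumed signature
        return f"real:{decision_id}"


services.CorrectionService = CorrectionService
sys.modules["demo_services"] = services

# Hypothetical stand-in for plan.py. The import happens inside the function,
# so CorrectionService never becomes an attribute of this module.
plan = types.ModuleType("demo_plan")


def correct_decision(decision_id):
    from demo_services import CorrectionService  # lazy, function-local import
    return CorrectionService().request_correction(decision_id)


plan.correct_decision = correct_decision
sys.modules["demo_plan"] = plan


def patch_at_use_site_fails():
    """Patching the name on the importing module raises AttributeError."""
    try:
        with mock.patch("demo_plan.CorrectionService"):
            pass
    except AttributeError:
        return True
    return False


def call_with_definition_site_patch(decision_id):
    """Patching where the class is defined is seen by the lazy import."""
    with mock.patch("demo_services.CorrectionService") as fake_cls:
        fake_cls.return_value.request_correction.return_value = "mocked"
        return plan.correct_decision(decision_id)
```

Because the function-local import looks the class up on its defining module at call time, the definition-site patch takes effect, while the use-site patch has nothing to replace.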
Author
Member

Do not merge this PR individually. All changes are consolidated into PR #442 (develop-brent-5). Please review and merge #442 instead.

fix(test): correct mock targets, method names, ULIDs, and CLI args in M4 smoke suite
Some checks failed
CI / lint (pull_request) Successful in 22s
CI / typecheck (pull_request) Successful in 55s
CI / security (pull_request) Successful in 48s
CI / quality (pull_request) Successful in 28s
CI / integration_tests (pull_request) Successful in 5m6s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 23s
CI / unit_tests (pull_request) Successful in 34m47s
CI / benchmark-regression (pull_request) Successful in 25m45s
CI / docker (pull_request) Successful in 1m15s
CI / coverage (pull_request) Has been cancelled
3c9a3efdf1
- Patch CorrectionService at its module path, not the local-import site
- Use request_correction/execute_correction/analyze_impact (real API)
- Replace invalid 27-char subplan_id values with valid 26-char ULIDs
- Add required --guidance flag to dry-run CLI invocation
- Fix feature file assertions to match actual CLI output

Resolves CI failures in unit_tests (job 4) and coverage (job 6) for
run 645 on PR #441.
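
One way to catch stale mock method names earlier is to build mocks from a spec. The sketch below is an assumption-laden illustration rather than what this commit does: the `CorrectionService` class here is a hypothetical stand-in that only carries the three method names listed above, with placeholder signatures.

```python
from unittest import mock


class CorrectionService:
    """Hypothetical stand-in exposing the method names from the commit message."""

    def request_correction(self, *args, **kwargs): ...

    def execute_correction(self, *args, **kwargs): ...

    def analyze_impact(self, *args, **kwargs): ...


def name_is_rejected(name):
    """Return True if a spec'd mock refuses the given attribute name."""
    service = mock.create_autospec(CorrectionService, instance=True)
    try:
        getattr(service, name)
    except AttributeError:
        return True
    return False


# An unconstrained MagicMock accepts any attribute, so a wrong name goes unnoticed.
loose = mock.MagicMock()
loose.apply_correction("anything")
```

With the spec in place, `apply_correction` and `generate_dry_run_report` fail at attribute access instead of passing silently inside the test.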
Author
Member

CI Fix — All Nox Stages Pass

Pushed 3c9a3ef to fix the failing unit_tests (job 4) and coverage (job 6) in CI run 645.

Bugs Fixed

| # | Description | Files |
|---|-------------|-------|
| 1 | Wrong patch target: `CorrectionService` is a local import inside `correct_decision()`, never a module-level attr on `plan`. Patched at its actual module path instead. | steps, bench |
| 2 | Wrong mock method names: used `apply_correction`/`generate_dry_run_report`, but the real API is `request_correction`/`execute_correction`/`analyze_impact`. | steps, bench |
| 3 | Feature file assertions: checked for strings (`"append"`, `"subplan_count"`, `"parallel"`) that the CLI never emits; changed to match actual output. | feature |
| 4 | Invalid ULID subplan_ids: 27-char strings with lowercase hex chars failed Pydantic `^[0-9A-HJKMNP-TV-Z]{26}$` validation. Replaced with valid 26-char Crockford Base32 ULIDs. | steps, bench |
| 5 | Missing `--guidance` on dry-run: CLI requires `--guidance` even with `--dry-run`; added the flag. | steps, bench |
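
The ULID constraint in row 4 can be checked with the standard library alone. The sketch below is illustrative: the regex is the one quoted in the table, while `new_ulid` and `is_valid_ulid` are hypothetical helpers written for this example (the commits themselves generate IDs with `ulid.ULID()`).

```python
import os
import re
import time

# Pattern quoted in the table: 26 characters of Crockford Base32 (no I, L, O, U).
ULID_PATTERN = re.compile(r"^[0-9A-HJKMNP-TV-Z]{26}$")
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid():
    """Build a ULID-shaped string: 48-bit ms timestamp + 80 random bits."""
    timestamp_ms = int(time.time() * 1000) & ((1 << 48) - 1)
    randomness = int.from_bytes(os.urandom(10), "big")
    value = (timestamp_ms << 80) | randomness  # 128 bits in total
    chars = []
    for _ in range(26):  # 26 symbols of 5 bits each cover the 128 bits
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def is_valid_ulid(candidate):
    """Return True if the string satisfies the quoted pattern."""
    return ULID_PATTERN.fullmatch(candidate) is not None
```

A 27-character lowercase string such as `"0123456789abcdef0123456789a"` fails this check on both length and alphabet, which matches the failure described in row 4.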

Local Verification

  • lint: pass
  • typecheck: 0 errors
  • unit_tests: 297 features, 6350 scenarios, 27729 steps — 0 failures
  • integration_tests: pass (8 min)
  • coverage_report: 97.1% (threshold 97%) — pass
Merge branch 'master' into feature/m4-correction-subplan-smoke
All checks were successful
CI / lint (pull_request) Successful in 24s
CI / quality (pull_request) Successful in 38s
CI / typecheck (pull_request) Successful in 1m3s
CI / security (pull_request) Successful in 1m4s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 14s
CI / integration_tests (pull_request) Successful in 5m45s
CI / unit_tests (pull_request) Successful in 31m15s
CI / benchmark-regression (pull_request) Successful in 24m31s
CI / docker (pull_request) Successful in 1m2s
CI / coverage (pull_request) Successful in 2h14m48s
dc3b2bf08f
brent.edwards closed this pull request 2026-02-26 23:53:02 +00:00
brent.edwards deleted branch feature/m4-correction-subplan-smoke 2026-02-26 23:53:11 +00:00
Reference
cleveragents/cleveragents-core!441