test(e2e): verify M6 success criteria — Firefox-scale autonomous porting #457

Merged
brent.edwards merged 4 commits from test/m6-e2e-verification into master 2026-02-27 23:00:15 +00:00

Summary

Robot Framework E2E test suite exercising the complete M6 success criteria verification sequence for Firefox-scale autonomous porting.

Changes

  • robot/helper_m6_e2e_verification.py — Python helper with 10 subcommands
  • robot/m6_e2e_verification.robot — Robot Framework test suite with 10 test cases

Acceptance Criteria Covered

  • Robot test creates a porting action from YAML config
  • Robot test executes the porting plan on a large project
  • Robot test monitors hierarchical decomposition via plan tree structure
  • Robot test applies completed results
  • Assertions verify hierarchical decomposition creates 4+ levels of subplans (5 levels: root + L1-L4)
  • Assertions verify decision correction recomputes only affected subtree
  • Assertions verify parallel execution scales to 10+ concurrent subplans (15 tested)
  • Assertions verify a realistic porting task completes autonomously
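The "recomputes only affected subtree" criterion can be illustrated with a minimal sketch. It assumes a plan tree stored as a parent-to-children mapping. The plan names and the `affected_by_correction` function are hypothetical and are not the project's `CorrectionImpact` API. The sketch only shows the scoping idea: a correction at one plan touches that plan and its descendants, nothing else.

```python
# Hypothetical plan tree as a parent -> children mapping; the names are
# illustrative only and do not come from the project.
CHILDREN = {
    "root": ["ui", "net"],
    "ui": ["ui-widgets", "ui-layout"],
    "net": ["net-http"],
    "ui-widgets": [],
    "ui-layout": [],
    "net-http": [],
}


def affected_by_correction(target: str, children: dict[str, list[str]]) -> set[str]:
    """Return the corrected plan plus all of its descendants."""
    affected: set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in affected:
            affected.add(node)
            # Only walk downward; ancestors and siblings are never visited.
            stack.extend(children.get(node, []))
    return affected


# Correcting a decision at "ui" should leave "root" and the "net" branch alone.
affected = affected_by_correction("ui", CHILDREN)
untouched = set(CHILDREN) - affected
```

An assertion of this kind would compare `affected` against the expected subtree and check that `untouched` contains every plan outside it.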

Test Subcommands

| Subcommand | What it verifies |
|---|---|
| `action-create-porting` | Porting action creation from YAML config via CLI |
| `plan-use-execute` | Plan use + execute via mocked lifecycle service |
| `hierarchical-decomposition` | 4+ levels: root → L1(4) → L2(8) → L3(4) → L4(4) = 21 plans |
| `correction-affected-subtree` | CorrectionImpact scoped to target subtree only |
| `parallel-execution-scale` | 15 concurrent subplans with PARALLEL mode |
| `porting-task-autonomous` | Full lifecycle: ACTION → STRATEGIZE → EXECUTE → APPLY |
| `plan-apply-lifecycle` | lifecycle-apply CLI transitions to APPLIED state |
| `failure-handler-logic` | SubplanFailureHandler retry/stop-others decisions |
| `subplan-config-modes` | All ExecutionMode + SubplanMergeStrategy values |
| `decision-tree-porting` | PROMPT_DEFINITION root, SUBPLAN_PARALLEL_SPAWN children |
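The plan count in the `hierarchical-decomposition` row (1 + 4 + 8 + 4 + 4 = 21 plans across 5 levels) can be checked with a small sketch. `PlanNode` and `build_tree` are hypothetical stand-ins, not the project's plan model. Dealing children round-robin to the parents on the previous level is an assumption made only to give the tree a concrete shape.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical stand-in for a plan in the tree."""
    name: str
    children: list["PlanNode"] = field(default_factory=list)


def build_tree(level_sizes: list[int]) -> PlanNode:
    """Build a tree whose level i below the root holds level_sizes[i - 1] plans."""
    root = PlanNode("root")
    parents = [root]
    for depth, size in enumerate(level_sizes, start=1):
        current = []
        for i in range(size):
            node = PlanNode(f"L{depth}-{i}")
            # Assumed distribution: round-robin over the previous level.
            parents[i % len(parents)].children.append(node)
            current.append(node)
        parents = current
    return root


def count_plans(node: PlanNode) -> int:
    """Count this plan and every plan beneath it."""
    return 1 + sum(count_plans(child) for child in node.children)


def depth_of(node: PlanNode) -> int:
    """Number of levels from this plan down to its deepest descendant."""
    return 1 + max((depth_of(child) for child in node.children), default=0)


tree = build_tree([4, 8, 4, 4])  # L1(4), L2(8), L3(4), L4(4)
total_plans = count_plans(tree)
levels = depth_of(tree)
```

Under these assumptions `total_plans` is 21 and `levels` is 5, which matches the "4+ levels" criterion with the root counted as the first level.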

Verification

  • nox -s lint — passes
  • nox -s typecheck — passes
  • All 10 subcommands verified passing locally

Closes #407

test(e2e): verify M6 success criteria — Firefox-scale autonomous porting
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 34s
CI / integration_tests (pull_request) Successful in 4m18s
CI / unit_tests (pull_request) Successful in 12m21s
CI / docker (pull_request) Successful in 15s
CI / benchmark-regression (pull_request) Successful in 25m32s
CI / coverage (pull_request) Successful in 40m18s
1ec2200e34
Robot Framework E2E test suite exercising the complete M6 success
criteria verification sequence:

- Porting action creation from YAML config via CLI
- Plan use + execute via CLI with mocked lifecycle service
- Hierarchical decomposition: 4+ levels (root + L1-L4 = 21+ plans)
- Decision correction recomputes only affected subtree (CorrectionImpact)
- Parallel execution scales to 15 concurrent subplans (10+ required)
- Realistic porting task: full ACTION → STRATEGIZE → EXECUTE → APPLY
  lifecycle with 10 subplans completing autonomously
- Plan apply transitions to APPLIED terminal state
- SubplanFailureHandler retry/stop-others logic verification
- SubplanConfig supports all execution modes and merge strategies
- Decision tree structure: PROMPT_DEFINITION root with
  SUBPLAN_PARALLEL_SPAWN children and superseded_by flow

Ten subcommands in the Python helper, each printing a sentinel
string on success. All subcommands verified passing locally.

Closes #407
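The sentinel convention described in the commit message (each subcommand prints a sentinel string on success) might look roughly like the sketch below. The sentinel text, the `SUBCOMMANDS` table, and the placeholder check are all assumptions for illustration. The actual contents of `robot/helper_m6_e2e_verification.py` are not shown on this page.

```python
import argparse


def check_hierarchical_decomposition() -> None:
    """Placeholder check; a real helper would build and inspect a plan tree."""
    level_sizes = [1, 4, 8, 4, 4]  # root + L1-L4, as listed in the PR
    if len(level_sizes) < 5 or sum(level_sizes) != 21:
        raise AssertionError("unexpected plan tree shape")


# Subcommand name -> (check function, sentinel printed on success).
# The sentinel strings are assumed; the source does not state them.
SUBCOMMANDS = {
    "hierarchical-decomposition": (
        check_hierarchical_decomposition,
        "hierarchical-decomposition-ok",
    ),
}


def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description="Sentinel-printing helper sketch")
    parser.add_argument("subcommand", choices=sorted(SUBCOMMANDS))
    args = parser.parse_args(argv)
    check, sentinel = SUBCOMMANDS[args.subcommand]
    check()  # raises on failure, so the sentinel is never printed
    print(sentinel)
    return 0


exit_code = main(["hierarchical-decomposition"])
```

A Robot Framework test case could then run the helper as a subprocess and assert that the sentinel appears in its output. That keeps the pass/fail signal independent of any other logging the helper emits.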
Merge branch 'master' into test/m6-e2e-verification
All checks were successful
CI / lint (pull_request) Successful in 25s
CI / quality (pull_request) Successful in 21s
CI / security (pull_request) Successful in 32s
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 50s
CI / build (pull_request) Successful in 22s
CI / integration_tests (pull_request) Successful in 5m18s
CI / unit_tests (pull_request) Successful in 12m10s
CI / docker (pull_request) Successful in 39s
CI / benchmark-regression (pull_request) Successful in 25m18s
CI / coverage (pull_request) Successful in 1h16m20s
f7b8186b3f
Merge branch 'master' into test/m6-e2e-verification
All checks were successful
CI / lint (pull_request) Successful in 20s
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 38s
CI / quality (pull_request) Successful in 37s
CI / security (pull_request) Successful in 52s
CI / build (pull_request) Successful in 35s
CI / integration_tests (pull_request) Successful in 5m16s
CI / benchmark-regression (pull_request) Successful in 27m54s
CI / unit_tests (pull_request) Successful in 33m2s
CI / docker (pull_request) Successful in 55s
CI / coverage (pull_request) Successful in 53m19s
78bd3584e0
CoreRasurae left a comment

Tests are passing, and this increases integration test coverage. Approved.

Merge branch 'master' into test/m6-e2e-verification
All checks were successful
CI / lint (pull_request) Successful in 14s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 32s
CI / typecheck (pull_request) Successful in 33s
CI / build (pull_request) Successful in 26s
CI / integration_tests (pull_request) Successful in 5m24s
CI / benchmark-regression (pull_request) Successful in 25m29s
CI / unit_tests (pull_request) Successful in 33m25s
CI / docker (pull_request) Successful in 14s
CI / coverage (pull_request) Successful in 42m43s
ea060ec7a9
brent.edwards scheduled this pull request to auto merge when all checks succeed 2026-02-27 22:17:07 +00:00
brent.edwards deleted branch test/m6-e2e-verification 2026-02-27 23:00:15 +00:00
Reference
cleveragents/cleveragents-core!457