test(e2e): E2E acceptance criteria for M4 (v3.3.0) — corrections, subplans, and checkpoints #744

Open
opened 2026-03-12 19:33:44 +00:00 by freemo · 1 comment
Owner

Metadata

  • Commit Message: test(e2e): E2E acceptance criteria for M4 (v3.3.0) — corrections, subplans, and checkpoints
  • Branch: test/e2e-m4-acceptance

Background

True end-to-end acceptance test for the M4 (v3.3.0) milestone: Corrections + Subplans + Checkpoints. This test exercises the complete M4 success criteria with zero mocking — real CLI invocations, real LLM API keys, real subprocess execution. The test validates that plans can spawn child plans (subplans), subplans execute in parallel with configurable concurrency, results merge using three-way merge strategies, the correction engine supports revert and append modes, and checkpointing enables rollback.

This is a Robot Framework test suite tagged E2E (via [Tags] E2E), run in the dedicated nox -s e2e_tests session.

Expected Behavior

The E2E test exercises subplan spawning, parallel execution, merge strategies, correction flows (revert/append), and checkpoint rollback through real CLI commands with real LLM API keys.

Acceptance Criteria

  • Robot Framework test suite tagged with [Tags] E2E in robot/e2e/ directory
  • Test creates a plan that spawns child subplans during execution
  • Test verifies subplan status tracking (sequential and/or parallel execution)
  • Test exercises correction flow (plan correct --mode revert or --mode append)
  • Test exercises checkpoint creation and rollback (plan rollback)
  • Test verifies merge strategy application on subplan results
  • All CLI invocations use real LLM API keys (no mocking, stubbing, or test doubles)
  • Output validation is flexible — checks structural components, not exact character matching
  • Test passes via nox -s e2e_tests
  • Coverage >=97% maintained
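
The "flexible output validation" criterion above could look roughly like the following. This is a minimal sketch in Python, not the suite's actual keyword library: the helper name `assert_structure` and the sample output text are hypothetical, since the issue does not specify the CLI's real output format.

```python
import re
from typing import Iterable


def assert_structure(output: str, required_patterns: Iterable[str]) -> None:
    """Check that each structural pattern occurs somewhere in the CLI output.

    Patterns are regular expressions matched case-insensitively, so wording
    and whitespace produced by a non-deterministic LLM run may vary without
    failing the test. Hypothetical helper, for illustration only.
    """
    missing = [
        pattern
        for pattern in required_patterns
        if not re.search(pattern, output, flags=re.IGNORECASE | re.MULTILINE)
    ]
    if missing:
        raise AssertionError(f"missing structural components: {missing}")


if __name__ == "__main__":
    # Made-up output; the real CLI format is an assumption here.
    sample = "Plan 12 status: COMPLETED\nSubplans: 3 (3 merged)\n"
    assert_structure(sample, [r"status:\s*\w+", r"subplans:\s*\d+"])
```

Matching on patterns such as "a status field followed by a word" rather than on full strings keeps the assertions stable across LLM runs while still failing when a required section is absent.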

Subtasks

  • Write Robot Framework E2E test suite robot/e2e/m4_acceptance.robot with [Tags] E2E
  • Implement subplan spawning and execution verification steps
  • Implement correction (revert/append) verification steps
  • Implement checkpoint and rollback verification steps
  • Add flexible output assertions
  • Verify test passes with real LLM API keys via nox -s e2e_tests
  • Tests (Behave): N/A (this is an E2E test issue)
  • Tests (Robot): The E2E Robot test suite IS this issue's deliverable
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
freemo self-assigned this 2026-03-12 19:33:45 +00:00
freemo added this to the v3.3.0 milestone 2026-03-12 19:33:45 +00:00
freemo removed their assignment 2026-03-12 20:32:48 +00:00
Author
Owner

Implementation Notes

PR: #814 (https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/814)

Test file

robot/e2e/m4_acceptance.robot — E2E acceptance test for M4 (v3.3.0): Corrections, Subplans, and Checkpoints.

What was implemented

  • Robot Framework test suite tagged [Tags] E2E exercising the full M4 acceptance criteria
  • Tests cover: subplan spawning during execution, parallel subplan execution with status tracking, correction flow (plan correct --mode revert and --mode append), checkpoint creation, rollback via plan rollback, and merge strategy application on subplan results
  • All CLI invocations use real LLM API keys — zero mocking
  • Uses expected_rc=None for all commands to handle non-deterministic LLM responses
  • init --yes --force ensures clean database initialization
  • Flexible output assertions check structural components, not exact character matching
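
The expected_rc=None convention mentioned above can be sketched as follows. This is an assumption about the semantics, written in Python for illustration: the helper name `run_cli` and its `timeout` parameter are hypothetical and are not taken from the suite or PR.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_cli(
    args: Sequence[str],
    expected_rc: Optional[int] = 0,
    timeout: float = 300.0,
) -> subprocess.CompletedProcess:
    """Run a real subprocess and optionally enforce its return code.

    expected_rc=None skips the return-code check, so a run whose exit
    status depends on a non-deterministic LLM response does not fail
    before its output is inspected. Hypothetical helper.
    """
    result = subprocess.run(
        list(args), capture_output=True, text=True, timeout=timeout
    )
    if expected_rc is not None and result.returncode != expected_rc:
        raise AssertionError(
            f"{list(args)} exited with {result.returncode}, "
            f"expected {expected_rc}\nstderr: {result.stderr}"
        )
    return result


if __name__ == "__main__":
    # Stand-in command: a Python one-liner that exits non-zero.
    cmd = [sys.executable, "-c", "import sys; print('done'); sys.exit(3)"]
    outcome = run_cli(cmd, expected_rc=None)
    print(outcome.returncode, outcome.stdout.strip())
```

With the return-code check disabled, the structural output assertions carry the verification, so they need to be specific enough to tell a real success from an error message.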

Quality gates

All nox sessions pass. Coverage >= 97%. E2E tests pass via nox -s e2e_tests.

Ready for review.

freemo self-assigned this 2026-04-02 06:13:52 +00:00
Reference
cleveragents/cleveragents-core#744