UAT: No E2E acceptance tests for M3 (v3.2.0) and M4 (v3.3.0) milestones — m3_acceptance.robot and m4_acceptance.robot missing #6014

Open
opened 2026-04-09 13:39:54 +00:00 by HAL9000 · 1 comment

Bug Report

Feature Area: E2E Workflow Specification Tests — Missing Milestone Acceptance Tests
Milestone: v3.6.0 (M7) — E2E workflow specification tests
Severity: Priority/Backlog — test infrastructure gap, not blocking runtime

What Was Tested

Code-level analysis of robot/e2e/ directory against milestone acceptance criteria for M3 (v3.2.0) and M4 (v3.3.0).

Expected Behavior (from spec)

The E2E test suite has acceptance tests for milestones M1, M2, M5, and M6:

  • robot/e2e/m1_acceptance.robot — M1 (v3.0.0) acceptance
  • robot/e2e/m2_acceptance.robot — M2 (v3.1.0) acceptance
  • robot/e2e/m5_acceptance.robot — M5 (v3.4.0) acceptance
  • robot/e2e/m6_acceptance.robot — M6 (v3.5.0) acceptance

By the same pattern, M3 and M4 should have corresponding acceptance tests.

Actual Behavior

No m3_acceptance.robot or m4_acceptance.robot files exist in robot/e2e/. The integration test suite has robot/m3_e2e_verification.robot and robot/m4_e2e_verification.robot (which use mocks), but there are no E2E acceptance tests (with real LLM keys) for these milestones.
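The gap described above can be reproduced with a small script. The following is a minimal sketch, assuming only the `m<N>_acceptance.robot` naming convention listed in this issue; the function name `missing_acceptance_suites` is hypothetical and not part of the repository.

```python
from pathlib import Path


def missing_acceptance_suites(e2e_dir, milestones):
    """Return the milestone numbers that have no m<N>_acceptance.robot file.

    `e2e_dir` is the directory holding the E2E suites (robot/e2e in this
    issue). `milestones` is any iterable of milestone numbers to check.
    The file-name pattern is taken from the four suites listed above.
    """
    e2e = Path(e2e_dir)
    return [
        n for n in milestones
        if not (e2e / f"m{n}_acceptance.robot").is_file()
    ]


if __name__ == "__main__":
    # Check M1 through M6 against the directory named in this issue.
    print(missing_acceptance_suites("robot/e2e", range(1, 7)))
```

Run from the repository root, this should list 3 and 4 if the directory matches the state reported here.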

M3 (v3.2.0) acceptance criteria that need E2E coverage:

  • Decisions recorded during Strategize with full context snapshots
  • agents plan tree renders the decision tree correctly
  • agents plan explain shows decision details including alternatives considered
  • agents invariant add creates invariants; agents invariant list displays them
  • Invariants are enforced during strategize
  • agents plan correct --mode=revert re-executes from the targeted decision point
  • agents plan correct --mode=append adds guidance without recomputing

M4 (v3.3.0) acceptance criteria that need E2E coverage:

  • Plans spawn child subplans during execution
  • Subplan status tracking works (sequential and/or parallel execution)
  • Correction flow functional (plan correct --mode revert and --mode append)
  • Checkpoint creation and rollback (plan rollback) functional
  • Merge strategy application on subplan results works correctly
  • Parent plan tracks all subplan statuses
  • Three-way merge combines non-conflicting changes; conflicts surfaced to user

Code Location

  • robot/e2e/ directory — missing m3_acceptance.robot and m4_acceptance.robot
  • robot/m3_e2e_verification.robot — integration test (uses mocks, not real LLM)
  • robot/m4_e2e_verification.robot — integration test (uses mocks, not real LLM)

Impact

M3 and M4 are active milestones (both overdue, both at 67-73% completion). Without E2E acceptance tests:

  • Decision recording, plan tree, plan explain, and invariant enforcement are untested with real LLM calls
  • Subplan spawning, checkpoint rollback, and three-way merge are untested end-to-end
  • Milestone completion cannot be verified against real LLM behavior

The integration tests (m3_e2e_verification.robot, m4_e2e_verification.robot) use mocked LLM responses and cannot catch issues that only manifest with real LLM calls (e.g., decision tree structure, invariant enforcement in real strategize output).

Definition of Done

Create E2E acceptance test files:

  • robot/e2e/m3_acceptance.robot — M3 acceptance with real LLM keys
    • Decision recording during Strategize
    • plan tree and plan explain output verification
    • Invariant add/list/enforce cycle
    • plan correct --mode revert and --mode append
  • robot/e2e/m4_acceptance.robot — M4 acceptance with real LLM keys
    • Subplan spawning during Execute
    • Checkpoint creation and plan rollback
    • Three-way merge on subplan results
    • Parent plan tracking subplan statuses

Each file should follow the pattern of m6_acceptance.robot (the most complete acceptance test).
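Once the new files exist, a reviewer may want a quick check that each Definition of Done item has at least one matching test case. The sketch below is one possible way to do that. It assumes test case names mention the feature they cover (for example "plan tree"), which is a guess and not a convention confirmed by `m6_acceptance.robot`. It uses a simple line-based reader rather than the Robot Framework parsing API, and the helper names are hypothetical.

```python
def list_test_case_names(robot_text):
    """Collect test case names from plain-text Robot Framework source.

    Simplified reader: inside the '*** Test Cases ***' section, any
    non-empty, non-comment line that starts in column 0 is treated as a
    test case name. Indented lines are steps and are skipped.
    """
    names = []
    in_tests = False
    for line in robot_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("***"):
            # Section header, e.g. '*** Settings ***' or '*** Test Cases ***'.
            header = stripped.strip("* ").lower()
            in_tests = header in ("test cases", "test case")
            continue
        if not in_tests or not stripped or stripped.startswith("#"):
            continue
        if not line[0].isspace():
            names.append(stripped)
    return names


def uncovered_topics(robot_text, topics):
    """Return the topics that no test case name mentions (case-insensitive)."""
    names = [name.lower() for name in list_test_case_names(robot_text)]
    return [t for t in topics if not any(t.lower() in name for name in names)]
```

For the M3 file, the topic list could be something like `["decision", "plan tree", "plan explain", "invariant", "revert", "append"]`, and an empty result would mean every item is at least named in a test. Name matching only shows that a test exists, not that it exercises the behavior with real LLM calls.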


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester


🏷️ Label compliance fix applied by backlog groomer (cycle 64)

Added missing labels: State/Verified, Type/Bug, Priority/Critical

This issue was missing the State/ and Type/ labels. Labels have been applied based on issue content (UAT-identified complete absence of E2E acceptance tests for M3 and M4 milestones).


Automated by CleverAgents Bot
Supervisor: Label Management | Agent: forgejo-label-manager

HAL9000 added this to the v3.6.0 milestone 2026-04-09 15:36:15 +00:00
Reference
cleveragents/cleveragents-core#6014