UAT: No E2E error workflow tests — error cases not tested end-to-end across any workflow test file #6031

Open
opened 2026-04-09 13:54:04 +00:00 by HAL9000 · 1 comment
Owner

Bug Report

Feature Area: E2E Workflow Specification Tests — Error Workflow Coverage
Milestone: v3.6.0 (M7) — E2E workflow specification tests
Severity: Priority/Backlog — test quality gap, not blocking runtime

What Was Tested

Code-level analysis of all files in robot/e2e/ for error workflow test coverage.

Expected Behavior (from spec)

The specification describes error-handling and recovery workflows throughout its 18 examples. Key error scenarios that should be tested end-to-end:

  1. Plan execution failure: What happens when plan execute fails (LLM error, tool error, validation failure)?
  2. Plan rollback: Specification Example 15 (Disaster Recovery) covers plan rollback after a failed apply
  3. Validation failure: What happens when required validations fail during apply?
  4. Invariant violation: What happens when the LLM violates an invariant during strategize?
  5. Budget exceeded: What happens when max_cost_per_plan is exceeded?
  6. Checkpoint rollback: What happens when plan rollback is invoked after a checkpoint?
  7. Subplan failure: What happens when a child subplan fails in a multi-project workflow?
  8. Invalid action arguments: What happens when plan use is called with missing required args?

Actual Behavior

Reviewing all E2E test files in robot/e2e/:

  • smoke_test.robot: Only tests --version and --help (no error cases)
  • m1_acceptance.robot: Empty test body
  • m2_acceptance.robot: Empty test body
  • m5_acceptance.robot: No error workflow tests
  • m6_acceptance.robot: Tests guard enforcement (denylist, budget) but only verifies the profile is stored, not that execution is actually blocked
  • wf04_multi_project.robot: Empty test body
  • wf05_db_migration.robot: Empty test body
  • wf07_cicd.robot: No error workflow tests (only happy path)
  • wf12_hierarchical.robot: TDD placeholder
  • wf14_server_mode.robot: No error workflow tests
  • wf16_devcontainer.robot: Empty test body
  • wf17_explicit_container.robot: All Skip
  • wf18_container_clone.robot: Empty test body
  • e2e_session_create_persist.robot: Tests session CRUD (no error cases)
  • tdd_acms_behavioral_validation.robot: Tests ACMS behavioral bugs (not error workflows)

No E2E test file tests any error workflow end-to-end.

The wf07_cicd.robot file has a Poll Plan Until Terminal keyword that handles the errored state, but the test asserts only the applied state. It never exercises the path where the plan errors.
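Here is a minimal Python sketch of a polling helper that returns whichever terminal state the plan reaches. A test can then assert on errored as well as applied. Everything in it is an assumption for illustration and is not taken from wf07_cicd.robot or the CleverAgents CLI:

- the names poll_plan_until_terminal and fake_status_source;
- the intermediate states pending and executing;
- the exact terminal-state set.

```python
import time
from typing import Callable, Iterable

# Hypothetical terminal states: "applied" and "errored" appear in this issue;
# "constrained" is assumed from the invariant-violation acceptance criterion.
TERMINAL_STATES = frozenset({"applied", "errored", "constrained"})


def poll_plan_until_terminal(
    get_status: Callable[[], str],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> str:
    """Poll get_status() until it reports a terminal state, then return it.

    The helper does not assert a specific outcome, so the same keyword can
    back both the happy-path test and an error-path test.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"plan still {status!r} after {timeout}s")
        time.sleep(interval)


def fake_status_source(states: Iterable[str]) -> Callable[[], str]:
    """Hypothetical stand-in for the real status call: yields states in order,
    then keeps repeating the last one."""
    it = iter(states)
    last = None

    def get_status() -> str:
        nonlocal last
        last = next(it, last)
        return last

    return get_status


# Error-path usage: the plan fails during execution, and the test asserts
# the errored state instead of only the happy-path "applied" state.
final = poll_plan_until_terminal(
    fake_status_source(["pending", "executing", "errored"]), interval=0.0
)
assert final == "errored"
```

The key design choice is keeping the assertion outside the polling helper. The existing happy-path test can keep asserting applied, and a new failure case can reuse the same keyword to assert errored.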

Code Location

  • All files in robot/e2e/ — none contain error workflow tests

Impact

Error handling is a critical part of the CleverAgents user experience. Users need to know:

  • How to recover from a failed plan execution
  • How to roll back a bad apply
  • What error messages look like when invariants are violated
  • How budget enforcement manifests in practice

Without E2E error workflow tests, regressions in error handling paths will not be caught. The specification's Example 15 (Disaster Recovery) is specifically about error recovery and has no E2E test.

Definition of Done

Add error workflow tests to the E2E suite. At minimum:

  1. wf15_disaster_recovery.robot (new file):

    • Run a plan that produces changes
    • Apply the plan
    • Verify apply succeeded
    • Simulate a "bad" apply by corrupting the applied state
    • Run plan rollback to revert
    • Verify the rollback succeeded
  2. Error case in wf07_cicd.robot:

    • Add a test case where the CI plan fails (e.g., validation failure)
    • Verify the error message is meaningful
    • Verify the plan state is errored
  3. Invariant violation test in m6_acceptance.robot:

    • Create an invariant that the LLM will violate
    • Run plan execute
    • Verify the invariant violation is recorded in the decision tree
    • Verify the plan is constrained (not applied)
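The verification steps in item 1 need evidence that rollback actually restored the files, not just that the command exited cleanly. One option is to fingerprint the working tree before apply, after apply, and after rollback.

The Python sketch below makes the following assumptions:

- plan rollback is expected to restore the tree captured before apply;
- tree_digest and disaster_recovery_check are hypothetical helpers;
- the apply, corruption, and rollback steps are local file operations standing in for the real CLI calls.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def tree_digest(root: Path) -> str:
    """SHA-256 over every file's relative path and contents, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h.update(path.relative_to(root).as_posix().encode())
        h.update(b"\0")
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def disaster_recovery_check(base: Path) -> bool:
    """Walk the wf15 steps against a scratch tree; every action is a stand-in."""
    work = base / "work"
    work.mkdir()
    (work / "app.cfg").write_text("version=1\n")
    before = tree_digest(work)
    checkpoint = base / "checkpoint"
    shutil.copytree(work, checkpoint)  # stand-in: checkpoint taken before apply

    (work / "app.cfg").write_text("version=2\n")  # stand-in: plan apply
    assert tree_digest(work) != before, "apply produced no changes"

    (work / "app.cfg").write_text("\x00corrupt")  # simulate the "bad" apply

    shutil.rmtree(work)  # stand-in: plan rollback restores the checkpoint
    shutil.copytree(checkpoint, work)
    return tree_digest(work) == before


with tempfile.TemporaryDirectory() as tmp:
    print(disaster_recovery_check(Path(tmp)))  # True when the tree is restored
```

In the real E2E test, only tree_digest would carry over. The stand-in steps would be replaced by the actual plan apply and plan rollback invocations, with the digest comparison serving as the "Verify the rollback succeeded" check.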

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

HAL9000 added this to the v3.2.0 milestone 2026-04-09 14:31:51 +00:00
Author
Owner

Label compliance fix applied:

  • Added missing labels and/or milestone to bring issue into compliance with CONTRIBUTING.md

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

HAL9000 modified the milestone from v3.2.0 to v3.6.0 2026-04-09 15:29:47 +00:00
Reference
cleveragents/cleveragents-core#6031