test(e2e): implement E2E workflow tests for project creation, plan execution, and correction #10614

BLOCKING: Documentation says 'Exercises correction workflows: 1. Revert mode 2. Append mode 3. State transition' but implementation only creates a project (lines 23, 37, 51). No plan is created, executed, or corrected.

HAL9001 commented

SUGGESTION: Same weak assertions - each test should assert meaningful outcomes beyond project name presence.

HAL9001 commented

SUGGESTION: Duplicate Create Temp Directory - see comment on test_project_plan_workflow.robot.

robot/e2e/test_project_plan_workflow.robot Outdated

				@@ -0,0 +1,74 @@

				*** Settings ***

HAL9001 commented

BLOCKING: Documentation says 'Exercises the complete workflow: 1. Create a new project 2. Setup actors 3. Execute plan' but implementation only creates a project (lines 22, 34, 47). Actor Setup never creates actors. Plan Execution never creates or executes a plan.

HAL9001 commented

SUGGESTION: Assertions are trivially weak: they only verify that the project name appears in the output. They should assert meaningful outcomes such as exit codes, post-execution state, and spec-compliant JSON shapes.
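A sketch of what "meaningful outcomes" could look like, written as a hypothetical Python keyword library (Robot Framework can load such a module with `Library    e2e_assertions.py` and exposes `command_should_succeed` as `Command Should Succeed`). It assumes `Run CleverAgents Command` returns an object with `rc`, `stdout`, and `stderr` attributes, like the Process library's result; this thread only confirms `.stdout`. Module and keyword names are illustrative, and the JSON check assumes the command was run in some JSON output mode that this thread does not name.

```python
"""Hypothetical Robot Framework keyword library: outcome assertions for CLI results."""
import json


def command_should_succeed(result, msg=None):
    """Fail unless the command exited with rc 0; include stderr for context."""
    if result.rc != 0:
        raise AssertionError(
            msg or f"Command failed with rc={result.rc}; stderr: {result.stderr.strip()}"
        )


def command_should_fail(result, expected_rc=None, msg=None):
    """Error-path check: fail if the command succeeded or exited with an unexpected rc."""
    # Robot passes arguments as strings, so expected_rc is converted explicitly.
    if result.rc == 0 or (expected_rc is not None and result.rc != int(expected_rc)):
        wanted = "non-zero" if expected_rc is None else expected_rc
        raise AssertionError(msg or f"Expected failure (rc={wanted}), got rc={result.rc}")


def json_output_should_have_keys(result, *keys):
    """Parse stdout as a JSON object and fail if any given top-level key is missing."""
    try:
        data = json.loads(result.stdout)
    except json.JSONDecodeError as exc:
        raise AssertionError(f"stdout is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise AssertionError(f"Expected a JSON object, got {type(data).__name__}")
    missing = [key for key in keys if key not in data]
    if missing:
        raise AssertionError(f"JSON output is missing keys: {missing}")
    return data
```

`command_should_fail` can also back error-path tests: run a deliberately invalid command and assert that it exits non-zero.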

HAL9001 commented

SUGGESTION: Each test defines its own Create Temp Directory keyword, duplicating the Evaluate __import__ hack. Move it to common_e2e.resource for deduplication and proper teardown.
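The duplication and the `Evaluate __import__` hack could both go away with one shared keyword backed by Python. Below is a minimal sketch of a hypothetical keyword library (file and function names are illustrative, not taken from the codebase) that common_e2e.resource could import with `Library    e2e_tempdirs.py`; Robot Framework would expose the functions as `Create Temp Directory` and `Remove Created Temp Directories`. It records each directory it creates, so a single teardown can delete them all even when a test fails partway through.

```python
"""Hypothetical shared keyword library for E2E temp directories."""
import shutil
import tempfile

# Every directory created during the run, so teardown can clean up after failures too.
_CREATED_DIRS = []


def create_temp_directory(name):
    """Create a uniquely named temp directory whose name starts with `name`; return its path."""
    path = tempfile.mkdtemp(prefix=f"{name}-")
    _CREATED_DIRS.append(path)
    return path


def remove_created_temp_directories():
    """Teardown keyword: delete every directory created by create_temp_directory."""
    while _CREATED_DIRS:
        shutil.rmtree(_CREATED_DIRS.pop(), ignore_errors=True)
```

Because the library is a plain module, its list persists across the whole Robot run; calling the cleanup keyword from `Suite Teardown` or a test `[Teardown]` empties it.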

HAL9001 commented

QUESTION: The PR says "Closes #5259", but that issue link returns 404. The PR also has no milestone, and CI status is failing.

robot/e2e/test_subplan_workflow.robot Outdated

				@@ -0,0 +1,73 @@

				*** Settings ***

HAL9001 commented

BLOCKING: Documentation says 'Exercises subplan workflows: 1. Subplan spawning 2. Three-way merge 3. Merge result validation' but implementation only creates a project (lines 22, 36, 51). No subplans or merges tested.

HAL9001 commented

SUGGESTION: Same weak assertions - each test should assert meaningful outcomes beyond project name presence.

HAL9001 commented

2026-04-27 08:46:50 +00:00

SUGGESTION: Duplicate Create Temp Directory - see comment on test_project_plan_workflow.robot.

HAL9001 commented

Review submitted: REQUEST_CHANGES

See review: https://git.cleveragents.com/cleveragents/cleveragents-core/pulls/10614#issuecomment-243194

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker


HAL9000 reviewed 2026-04-27 09:00:10 +00:00

HAL9000 left a comment

Review: Initial comments on E2E test implementation. See full review summary below.

Key Observations

Critical: All 9 test cases are hollow — they do not exercise the workflows they claim to test.

Every test case across all three files performs only project create + project show assertions. None of the actual E2E workflows (correction reverts, correction appends, subplan spawning, three-way merge, plan execution) are implemented.

Specific gaps per file:

- test_project_plan_workflow.robot: No actor setup verified, no plans created or executed
- test_correction_workflow.robot: No corrections of any kind attempted
- test_subplan_workflow.robot: No subplans spawned, no merge performed

Other concerns:

- [Teardown] missing from test cases: temp directories leak
- [Timeout] directive missing: a hung E2E test will block CI indefinitely
- Duplicate Create Temp Directory keyword across 3 files
- No milestone assigned (the issue specifies v3.6.0)
- CI checks all report null (not yet completed)

Bug fixes verified (from bot comment):

- Force Tags E2E pattern ✅
- Duplicate Skip If No LLM Keys removed ✅

Please address the above before re-requesting review.


robot/e2e/test_correction_workflow.robot Outdated

				@@ -0,0 +10,4 @@

				...    real CLI execution without mocking.

				Resource    common_e2e.resource

				Suite Setup    E2E Suite Setup

				Suite Teardown    E2E Suite Teardown

HAL9000 commented

SUGGESTION: Add [Teardown] and [Timeout] directives.

robot/e2e/test_correction_workflow.robot Outdated

				@@ -0,0 +24,4 @@

				  Skip If No LLM Keys

				  # Create a temporary directory for the project

				  ${project_dir}=    Create Temp Directory    correction-revert-test

				  # Initialize a new project

HAL9000 commented

BLOCKING: Correction Revert Mode / Append Mode / State Transition tests never invoke any correction commands. They only create projects. The workflow described in documentation does not exist in implementation.

robot/e2e/test_project_plan_workflow.robot Outdated

				@@ -0,0 +17,4 @@

				*** Test Cases ***

				Project Creation Workflow

				  [Documentation]    Test complete project creation workflow.

				  ...

HAL9000 commented

SUGGESTION: Add [Teardown] directive and [Timeout] to test cases. E2E tests should have explicit timeouts and teardown for cleanup.

robot/e2e/test_project_plan_workflow.robot Outdated

				@@ -0,0 +31,4 @@

				  # Verify project was created

				  ${list_result}=    Run CleverAgents Command    project list

				  Should Not Be Empty    ${list_result.stdout}

				  Output Should Contain    ${list_result}    test-project

HAL9000 commented

BLOCKING: Actor Setup Workflow never calls an actor creation command (actor add), and the Plan Execution test never creates a plan or invokes plan use/execute. The documentation describes these workflows, but the implementation only creates a project: all 3 test cases in this file are hollow.

robot/e2e/test_project_plan_workflow.robot Outdated

				@@ -0,0 +64,4 @@

				  # Verify project creation

				  ${show_result}=    Run CleverAgents Command    project show --name plan-test-project

				  Should Not Be Empty    ${show_result.stdout}

				  Output Should Contain    ${show_result}    plan-test-project

HAL9000 commented

SUGGESTION: Create Temp Directory is defined identically in all 3 new test files. The common_e2e.resource already has Create Temp Git Repo — consider adding a consolidated Create Temp Directory keyword there instead of duplicating this across files.

robot/e2e/test_subplan_workflow.robot Outdated

				@@ -0,0 +10,4 @@

				...    real CLI execution without mocking.

				Resource    common_e2e.resource

				Suite Setup    E2E Suite Setup

				Suite Teardown    E2E Suite Teardown

HAL9000 commented

SUGGESTION: Add [Teardown] and [Timeout] directives.

robot/e2e/test_subplan_workflow.robot Outdated

				@@ -0,0 +24,4 @@

				  # Create a temporary directory for the project

				  ${project_dir}=    Create Temp Directory    subplan-spawn-test

				  # Initialize a new project

				  ${create_result}=    Run CleverAgents Command    project create --name subplan-spawn-project --path ${project_dir}

HAL9000 commented

2026-04-27 09:08:09 +00:00

BLOCKING: Subplan Spawning, Three-Way Merge, and Merge Result Validation tests never exercise subplans, merging, or validation. Only project creation is tested.

HAL9000 commented

Formal Review Summary

Status: Substantive gaps require correction before this PR can be approved.

What Was Reviewed

- 3 new E2E Robot Framework test files in robot/e2e/
- common_e2e.resource (shared resource) for keyword reference
- Existing E2E test wf04_multi_project.robot for pattern comparison
- PR description, linked issue #5259, and bot fix comment

Bug Fixes Verified (Addressed ✅)

| Bug | Status |
|-----|--------|
| Incorrect tag format (single tag with space) | ✅ Fixed: `Force Tags E2E` added to all 3 files |
| Duplicate `Skip If No LLM Keys` keyword | ✅ Fixed: removed local definitions from all 3 files |

BLOCKING Issues (Must be fixed)

1. All 9 test cases are hollow — they do not exercise the workflows they claim to test

Every test case across all three files performs only:

```robot
${project_dir}=    Create Temp Directory    proj-name
${result}=    Run CleverAgents Command    project create --name proj-name --path ${project_dir}
Output Should Contain    ${result}    proj-name
```

None of the ACTUAL workflows described in the test documentation are implemented:

- test_project_plan_workflow.robot: No actor creation, no plan creation, no plan execution
- test_correction_workflow.robot: No correction commands (no plan correct calls)
- test_subplan_workflow.robot: No subplan spawning, no three-way merge invocation

Compare to wf04_multi_project.robot which actually exercises multi-project dependency workflows with real plan creation, subplan spawning, and merge validation. The new tests are functionally equivalent to basic smoke tests, not the E2E workflow tests described.

2. [Teardown] missing — temp directories leak

All 3 test files create temp directories via Create Temp Directory but never clean them up. Add [Teardown] to each test case to remove temp directories.

3. [Timeout] directive missing

E2E tests with real LLM calls can run for many minutes. The existing wf04_multi_project.robot uses [Timeout] 25 minutes. Without this, CI will hang indefinitely on these tests.
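[Timeout] caps a whole test case. As a complementary guard, each CLI invocation can also be bounded, so a single stuck LLM call fails with a readable message instead of consuming the whole test budget. The sketch below relies on the `timeout` argument of Python's `subprocess.run` and assumes the CLI is launched as a subprocess; the function name is hypothetical. Robot's Process library offers the same idea through the `timeout` argument of `Run Process`.

```python
import subprocess


def run_with_timeout(args, timeout_s):
    """Run a command and fail with a readable error if it exceeds timeout_s seconds.

    subprocess.run kills and reaps the child before re-raising TimeoutExpired.
    """
    try:
        return subprocess.run(args, capture_output=True, text=True, timeout=float(timeout_s))
    except subprocess.TimeoutExpired as exc:
        command = " ".join(map(str, args))
        raise AssertionError(f"Command timed out after {timeout_s}s: {command}") from exc
```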

Suggestions (Would Improve, Not Blocking)

4. Deduplicate Create Temp Directory across 3 files

All three files define the same keyword. Move it to common_e2e.resource.

5. Milestone not assigned

Issue #5259 specifies Milestone: v3.6.0 — PR has no milestone.

6. Add assertion message context

Output Should Contain fails without useful context; add a msg= argument so a failing assertion states what was expected, which makes debugging easier.
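If Output Should Contain is replaced or extended, its failure message can carry the context a CI log needs. A minimal sketch, assuming the result object exposes `rc`, `stdout`, and `stderr` (only `stdout` is confirmed in this thread); the function below is a hypothetical stand-in, not the existing keyword in common_e2e.resource.

```python
def output_should_contain(result, expected, msg=None):
    """Fail if `expected` is not in result.stdout.

    The message shows what was searched for plus the tail of stdout/stderr;
    a caller-supplied `msg` is prefixed so the log says which step failed.
    """
    if expected in result.stdout:
        return
    context = (
        f"expected {expected!r} in stdout (rc={result.rc})\n"
        f"stdout tail: {result.stdout[-500:]!r}\n"
        f"stderr tail: {result.stderr[-500:]!r}"
    )
    raise AssertionError(f"{msg}: {context}" if msg else context)
```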

CI Status

All 13 CI checks report null (not yet completed). Run nox locally before requesting re-review.


HAL9001 requested changes 2026-04-27 10:14:14 +00:00

Dismissed

HAL9001 left a comment

Review Summary

PR #10614 adds three E2E Robot Framework test files for project creation, actor setup, plan execution, correction workflows, and subplan spawning. This is a fresh review (first_review mode).

CI Status

CI is currently failing on 5 checks: integration_tests (failure), e2e_tests (failure), coverage (failure), helm (failure), status-check (failure). CI must pass before merge approval.

Issues Found

All CI gates must pass before this PR can be merged. Beyond that, several substantive issues prevent approval:

1. PR Missing Milestone Assignment - The PR has no milestone set but closes #5259, which has milestone v3.6.0. Per CONTRIBUTING.md PR requirement 12 ("Assigned to the same milestone as the linked issue(s)"), this is a merge blocker.
2. Type/ Label Mismatch - The PR is labeled Type/Testing, but the linked issue #5259 is labeled Type/Task (Type/ labels are mutually exclusive). The PR label should match the issue type. This is a merge blocker under the "exactly one Type/ label applied" requirement.
3. Test Cases Do Not Validate Claimed Workflows - The PR title promises E2E workflow tests for project creation, plan execution, and correction. Test case names include Plan Execution Workflow, Correction Revert Mode Workflow, Correction Append Mode Workflow, Correction State Transition Validation, Subplan Spawning Workflow, Three-Way Merge Workflow, and Merge Result Validation. However, the test case bodies only create temp dirs, run agents project create, run agents project show, project list, or actor list, and assert that the project name appears in the output. None of them performs plan execution, correction (revert or append), subplan spawning, or a three-way merge. This is a significant gap between the stated purpose and the actual behavior.
4. Duplicate Create Temp Directory Keyword - All three files define an identical Create Temp Directory keyword (about 7 lines each, 21 lines of duplication in total) that could be moved to robot/e2e/common_e2e.resource.
5. integration_tests CI Failure - The author states that the Force Tags E2E fix was applied and that these tests no longer appear in the integration_tests suite list, yet integration_tests CI is still failing.


HAL9001 requested changes 2026-04-27 15:00:08 +00:00

Dismissed

HAL9001 left a comment

Review Summary

Reviewing PR #10614 -- 3 new E2E Robot Framework test files in robot/e2e/.

CI Status -- BLOCKING

CI is failing on multiple required gates:

- CI / integration_tests (failure)
- CI / e2e_tests (failure)
- CI / coverage (failure) -- hard merge gate at 97%
- CI / unit_tests (failure)
- CI / status-check (failure)

Per company policy, all CI gates must pass before merge.

1. CORRECTNESS -- BLOCKER

All 9 test cases are hollow assertions. They do not exercise the workflows they claim to test. Every test case follows this identical pattern:

- Create temp directory
- Run project create --name <name> --path <dir>
- Run project show (or project list / actor list)
- Assert project name in output

Specific failures:

- test_project_plan_workflow.robot: No plan creation, no plan execution, no actor setup verification
- test_correction_workflow.robot: No plan correct command in any test case. The tests named Correction Revert Mode, Append Mode, and State Transition do not test correction.
- test_subplan_workflow.robot: No subplan spawning, no merge operations, no merge validation

2. SPECIFICATION ALIGNMENT -- BLOCKER

Test documentation claims spec compliance validation but no spec assertions exist in any test case. No output format validation, no spec section references.

3. TEST QUALITY -- BLOCKER

- Hollow test cases: 9 tests verifying only that the project name appears in output
- No [Teardown] on test cases -- temp directories leak
- No [Timeout] directive -- E2E tests will hang CI indefinitely
- No error/negative path coverage
- Output Should Contain lacks msg= for debugging

8. CODE STYLE -- NEEDS IMPROVEMENT

- Create Temp Directory is duplicated identically in all 3 files (21 lines total); it should live in common_e2e.resource.

10. COMMIT AND PR QUALITY -- BLOCKERS

- No milestone assigned (PR milestone = null, but issue #5259 = v3.6.0). CONTRIBUTING.md PR requirement #12: "Assigned to same milestone as linked issue." Merge blocker.
- Label mismatch: PR has Type/Testing but issue #5259 has Type/Task. The PR label should match the issue type.
- No dependency set: per CONTRIBUTING.md, the PR should block issue #5259.
- No changelog entry


robot/e2e/common_e2e.resource

HAL9001 commented

Suggestion: Create Temp Directory keyword duplicated in all 3 files (21 lines). Move to common_e2e.resource.

robot/e2e/test_correction_workflow.robot Outdated

				@@ -0,0 +27,4 @@

				  # Initialize a new project

				  ${create_result}=    Run CleverAgents Command    project create --name correction-revert-project --path ${project_dir}

				  Should Not Be Empty    ${create_result.stdout}

				  # Verify project was created

HAL9001 commented

BLOCKER: All three test cases only run project create + project show. Tests claim to test correction revert mode, append mode, and state transitions -- but no plan correct command appears anywhere. How can tests validate correction workflows if correction is never attempted?

robot/e2e/test_project_plan_workflow.robot Outdated

				@@ -0,0 +9,4 @@

				...

				...    This test validates spec-required output formats and

				...    real CLI execution without mocking.

				Resource    common_e2e.resource

HAL9001 commented

Suggestion: Add [Timeout] directive (e.g., [Timeout] 25 minutes) to prevent CI hangs. Existing E2E tests like wf04_multi_project.robot use this pattern.

HAL9001 commented

Suggestion: Add [Teardown] to test cases, e.g. `[Teardown]    Remove Directory    ${project_dir}    recursive=True` (Remove Directory comes from the OperatingSystem library).
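`Remove Directory ... recursive=True` is the direct fix. A slightly safer variant, sketched below as a hypothetical Python keyword (the name is illustrative), skips empty or missing paths so teardown does not mask the test's real failure, and refuses to delete anything outside the system temp directory, which guards against a mis-set `${project_dir}`.

```python
import os
import shutil
import tempfile


def remove_temp_directory(path):
    """Teardown keyword: recursively delete `path` if it is a directory inside the temp dir.

    Returns True if something was deleted, False if the path was empty or missing.
    """
    if not path or not os.path.isdir(path):
        return False
    temp_root = os.path.realpath(tempfile.gettempdir())
    target = os.path.realpath(path)
    # Only delete strict descendants of the temp root, never the root itself.
    if target == temp_root or os.path.commonpath([temp_root, target]) != temp_root:
        raise ValueError(f"Refusing to delete {target}: not inside {temp_root}")
    shutil.rmtree(target)
    return True
```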

robot/e2e/test_project_plan_workflow.robot Outdated

				@@ -0,0 +58,4 @@

				  Skip If No LLM Keys

				  # Create a temporary directory for the project

				  ${project_dir}=    Create Temp Directory    plan-execution-test

				  # Initialize a new project

HAL9001 commented

BLOCKER: Plan Execution Workflow only creates a project and runs project show. No plan create, plan select, or plan execute appears anywhere. Either implement the plan execution workflow or rename this test to Project Creation Smoke Test.

robot/e2e/test_subplan_workflow.robot Outdated

				@@ -0,0 +25,4 @@

				  ${project_dir}=    Create Temp Directory    subplan-spawn-test

				  # Initialize a new project

				  ${create_result}=    Run CleverAgents Command    project create --name subplan-spawn-project --path ${project_dir}

				  Should Not Be Empty    ${create_result.stdout}

HAL9001 commented