test(e2e): update m1_acceptance.robot #1260

Merged
freemo merged 1 commit from test/e2e-update-m1-acceptance into master 2026-04-02 16:58:38 +00:00
Member

Summary

Updates the M1 acceptance E2E test (m1_acceptance.robot) to include comprehensive return code checks and expected output validation for every CLI step in the test lifecycle.

Previously, many steps in the test either did not check CLI output at all or only verified that stdout was non-empty. This PR adds explicit Should Be Equal As Integers checks for return codes and Output Should Contain assertions for expected output strings across all eight CLI commands exercised by the test.

tdd_expected_fail — sandbox_root wiring bug (#1313)

This test is tagged tdd_expected_fail because investigation revealed a bug in the plan execute pipeline: _get_plan_executor() in src/cleveragents/cli/commands/plan.py does not pass sandbox_root to PlanExecutor. This causes LLMExecuteActor._write_to_sandbox() to be silently skipped — the LLM generates file content (HELLO.md) but it is never written to disk.

As a result, the following assertions will fail until #1313 is resolved:

  • Step 9 (plan diff): Output Should Contain ${diff_result} HELLO — the diff should show HELLO.md changes
  • Step 11 (post-apply): Should Be True ${commit_lines} >= 2 — apply should produce a second commit
  • Step 11 (post-apply): File Should Exist ${repo_path}${/}HELLO.md — HELLO.md should exist after apply

The tdd_expected_fail listener inverts the test result so CI passes while the bug exists. When #1313 is fixed, the tag must be removed and the test should pass normally.

Steps updated:

  • action create — checks rc=0, action name, description, and state
  • resource add git-checkout — checks rc=0, resource name and type
  • project create — checks rc=0, project namespaced_name and linked_resources
  • plan use — checks rc=0, plan phase, action details, actor config
  • plan execute (strategize) — checks rc=0, phase transition, processing state, project links
  • plan execute (execute) — checks rc=0, same state reporting as strategize (with explanatory comment)
  • plan diff — checks rc=0, verifies diff contains HELLO.md (tdd_expected_fail)
  • plan apply — checks rc=0, applied state, project links, action details
  • Post-apply verification — checks ≥2 commits and HELLO.md exists (tdd_expected_fail)

Review fixes applied:

  • Fixed ${git_log.rc} reference before definition (was guaranteed runtime crash) → replaced with ${apply_result.rc}
  • Fixed wrong variable ${exec2_result.rc} in diff section → replaced with ${diff_result.rc}
  • Removed contradictory No changes in changeset. assertion; replaced with correct HELLO assertion
  • Removed developer debug comments (### IMPORTANT QUESTION: block)
  • Removed duplicate action_name: local/test-action assertion in plan use step
  • Added explanatory comment for identical assertions in both plan execute steps
  • Updated stale comment to match enforced rc=0 assertion
  • Fixed positional-after-named-argument ordering in plan execute calls
  • Updated assertions to match actual CLI plain format field names
  • Strengthened post-apply verification to check commit count and HELLO.md existence
  • Normalized indentation, spacing, and cell separator widths throughout

Closes #1249

## Summary Updates the M1 acceptance E2E test (`m1_acceptance.robot`) to include comprehensive return code checks and expected output validation for every CLI step in the test lifecycle. Previously, many steps in the test either did not check CLI output at all or only verified that stdout was non-empty. This PR adds explicit `Should Be Equal As Integers` checks for return codes and `Output Should Contain` assertions for expected output strings across all eight CLI commands exercised by the test. ### `tdd_expected_fail` — sandbox_root wiring bug (#1313) This test is tagged `tdd_expected_fail` because investigation revealed a **bug in the plan execute pipeline**: `_get_plan_executor()` in `src/cleveragents/cli/commands/plan.py` does not pass `sandbox_root` to `PlanExecutor`. This causes `LLMExecuteActor._write_to_sandbox()` to be silently skipped — the LLM generates file content (HELLO.md) but it is never written to disk. As a result, the following assertions will fail until #1313 is resolved: - **Step 9 (plan diff):** `Output Should Contain ${diff_result} HELLO` — the diff should show HELLO.md changes - **Step 11 (post-apply):** `Should Be True ${commit_lines} >= 2` — apply should produce a second commit - **Step 11 (post-apply):** `File Should Exist ${repo_path}${/}HELLO.md` — HELLO.md should exist after apply The `tdd_expected_fail` listener inverts the test result so CI passes while the bug exists. When #1313 is fixed, the tag must be removed and the test should pass normally. ### Steps updated: - `action create` — checks rc=0, action name, description, and state - `resource add git-checkout` — checks rc=0, resource name and type - `project create` — checks rc=0, project namespaced_name and linked_resources - `plan use` — checks rc=0, plan phase, action details, actor config - `plan execute` (strategize) — checks rc=0, phase transition, processing state, project links - `plan execute` (execute) — checks rc=0, same state reporting as strategize (with explanatory comment) - `plan diff` — checks rc=0, verifies diff contains HELLO.md (tdd_expected_fail) - `plan apply` — checks rc=0, applied state, project links, action details - Post-apply verification — checks ≥2 commits and HELLO.md exists (tdd_expected_fail) ### Review fixes applied: - Fixed `${git_log.rc}` reference before definition (was guaranteed runtime crash) → replaced with `${apply_result.rc}` - Fixed wrong variable `${exec2_result.rc}` in diff section → replaced with `${diff_result.rc}` - Removed contradictory `No changes in changeset.` assertion; replaced with correct `HELLO` assertion - Removed developer debug comments (`### IMPORTANT QUESTION:` block) - Removed duplicate `action_name: local/test-action` assertion in plan use step - Added explanatory comment for identical assertions in both plan execute steps - Updated stale comment to match enforced rc=0 assertion - Fixed positional-after-named-argument ordering in plan execute calls - Updated assertions to match actual CLI plain format field names - Strengthened post-apply verification to check commit count and HELLO.md existence - Normalized indentation, spacing, and cell separator widths throughout Closes #1249
freemo requested changes 2026-04-02 04:45:39 +00:00
Dismissed
freemo left a comment

Review: test(e2e): update m1_acceptance.robot

The intent of this PR is good — adding comprehensive return code checks and output validation to the M1 acceptance test is valuable. However, there are two critical variable-reference bugs that will cause the test to fail at runtime, plus several other issues that need attention before merging.

Critical Bugs (will cause test failure)

1. Step 10 (plan apply) — wrong variable in rc check
The line Should Be Equal As Integers ${git_log.rc} 0 references ${git_log}, which is not defined until step 11. This should be ${apply_result.rc}. Robot Framework will raise a variable-not-found error at runtime.

2. Step 9 (plan diff) — wrong variable in rc check
The line Should Be Equal As Integers ${exec2_result.rc} 0 checks the return code of the previous execute step, not the diff step. This should be ${diff_result.rc}.

Questionable Assertion

3. Step 9 — "No changes in changeset" may be incorrect
The test creates a file (HELLO.md) via LLM execution. If the LLM succeeds, there should be changes in the changeset, making Output Should Contain ${diff_result} No changes in changeset. incorrect. The author's own ### IMPORTANT QUESTION comment suggests uncertainty here. This needs to be resolved — either the assertion is wrong, or the test's understanding of the expected behavior needs to be documented.

Code Quality Issues

4. Debug comments left in code
The ### IMPORTANT QUESTION block (3 lines) should not be in production test code. Resolve the question and remove the comments.

5. Inconsistent indentation

  • Step 7 comment has 5 spaces instead of 4 (extra leading space)
  • Step 6 plan ID extraction comment has 3 spaces instead of 4

PR Metadata Issues

6. No milestone assigned — CONTRIBUTING.md requires PRs to be assigned to the same milestone as the linked issue. Issue #1249 is in milestone v3.0.0.

7. No Type/ label — CONTRIBUTING.md requires exactly one Type/ label. The issue has Type/Testing, so this PR should too.

Summary

Please fix the two critical variable bugs, resolve the "No changes in changeset" question, remove debug comments, fix indentation, and add the missing milestone and label before re-requesting review.

## Review: test(e2e): update m1_acceptance.robot The intent of this PR is good — adding comprehensive return code checks and output validation to the M1 acceptance test is valuable. However, there are **two critical variable-reference bugs** that will cause the test to fail at runtime, plus several other issues that need attention before merging. ### Critical Bugs (will cause test failure) **1. Step 10 (plan apply) — wrong variable in rc check** The line `Should Be Equal As Integers ${git_log.rc} 0` references `${git_log}`, which is not defined until step 11. This should be `${apply_result.rc}`. Robot Framework will raise a variable-not-found error at runtime. **2. Step 9 (plan diff) — wrong variable in rc check** The line `Should Be Equal As Integers ${exec2_result.rc} 0` checks the return code of the *previous* execute step, not the diff step. This should be `${diff_result.rc}`. ### Questionable Assertion **3. Step 9 — "No changes in changeset" may be incorrect** The test creates a file (HELLO.md) via LLM execution. If the LLM succeeds, there *should* be changes in the changeset, making `Output Should Contain ${diff_result} No changes in changeset.` incorrect. The author's own `### IMPORTANT QUESTION` comment suggests uncertainty here. This needs to be resolved — either the assertion is wrong, or the test's understanding of the expected behavior needs to be documented. ### Code Quality Issues **4. Debug comments left in code** The `### IMPORTANT QUESTION` block (3 lines) should not be in production test code. Resolve the question and remove the comments. **5. Inconsistent indentation** - Step 7 comment has 5 spaces instead of 4 (extra leading space) - Step 6 plan ID extraction comment has 3 spaces instead of 4 ### PR Metadata Issues **6. No milestone assigned** — CONTRIBUTING.md requires PRs to be assigned to the same milestone as the linked issue. Issue #1249 is in milestone v3.0.0. **7. No Type/ label** — CONTRIBUTING.md requires exactly one `Type/` label. The issue has `Type/Testing`, so this PR should too. ### Summary Please fix the two critical variable bugs, resolve the "No changes in changeset" question, remove debug comments, fix indentation, and add the missing milestone and label before re-requesting review.
@ -81,3 +115,4 @@
Log Execute phase rc=${exec2_result.rc}
${exec2_combined}= Set Variable ${exec2_result.stdout}\n${exec2_result.stderr}
Log Execute output: ${exec2_combined}
Owner

BUG: Wrong variable reference. This checks ${exec2_result.rc} (the previous execute step's return code) instead of ${diff_result.rc} (the diff step's return code). While it won't crash (exec2_result exists), it's validating the wrong command.

Should be:

Should Be Equal As Integers    ${diff_result.rc}    0
**BUG: Wrong variable reference.** This checks `${exec2_result.rc}` (the previous execute step's return code) instead of `${diff_result.rc}` (the diff step's return code). While it won't crash (exec2_result exists), it's validating the wrong command. Should be: ``` Should Be Equal As Integers ${diff_result.rc} 0 ```
@ -77,0 +106,4 @@
Output Should Contain ${exec1_result} Strategy Actor: openai/gpt-4o-mini
Output Should Contain ${exec1_result} Execution Actor: openai/gpt-4o-mini
Output Should Contain ${exec1_result} Definition of Done:
Output Should Contain ${exec1_result} Create a file called HELLO.md with a short greeting.
Owner

Questionable assertion + debug comments. The ### IMPORTANT QUESTION block should not be in production test code. More importantly, if the LLM successfully creates HELLO.md during execution, then No changes in changeset. would be the wrong expected output — there should be changes. Please resolve this question and either:

  1. Assert that changes ARE present (if LLM execution is expected to produce them), or
  2. Document clearly why no changes are expected at this point in the flow.

Also remove the ### IMPORTANT QUESTION debug comments.

**Questionable assertion + debug comments.** The `### IMPORTANT QUESTION` block should not be in production test code. More importantly, if the LLM successfully creates HELLO.md during execution, then `No changes in changeset.` would be the wrong expected output — there *should* be changes. Please resolve this question and either: 1. Assert that changes ARE present (if LLM execution is expected to produce them), or 2. Document clearly why no changes are expected at this point in the flow. Also remove the `### IMPORTANT QUESTION` debug comments.
@ -85,0 +128,4 @@
Output Should Contain ${exec2_result} Execution Actor: openai/gpt-4o-mini
Output Should Contain ${exec2_result} Definition of Done:
Output Should Contain ${exec2_result} Create a file called HELLO.md with a short greeting.
Owner

BUG: Wrong variable reference. This checks ${git_log.rc} but ${git_log} is not defined until step 11 (the Run Process git log call below). This will cause a Robot Framework runtime error.

Should be:

Should Be Equal As Integers    ${apply_result.rc}    0
**BUG: Wrong variable reference.** This checks `${git_log.rc}` but `${git_log}` is not defined until step 11 (the `Run Process git log` call below). This will cause a Robot Framework runtime error. Should be: ``` Should Be Equal As Integers ${apply_result.rc} 0 ```
Owner

🔒 Claimed by pr-reviewer-5. Starting independent code review.

🔒 Claimed by pr-reviewer-5. Starting independent code review.
freemo self-assigned this 2026-04-02 08:06:13 +00:00
Owner

Independent Review Confirmation (pr-reviewer-5, second pass)

I've completed an independent review of the current state of this PR (SHA d642cec8f0e50f5d91052c494e28fdc43945bbb3). The issues identified in the previous REQUEST_CHANGES review (review #3068) are still unresolved — no new commits have been pushed since that review.

Outstanding Issues Summary

Critical bugs (will cause Robot Framework runtime errors):

  1. Step 10 (plan apply): Should Be Equal As Integers ${git_log.rc} 0 references ${git_log} which is not defined until step 11. Must be ${apply_result.rc}.
  2. Step 9 (plan diff): Should Be Equal As Integers ${exec2_result.rc} 0 checks the wrong variable. Must be ${diff_result.rc}.

Questionable logic:
3. Step 9: Output Should Contain ${diff_result} No changes in changeset. — If the LLM successfully creates HELLO.md during execution, there should be changes in the changeset. This assertion appears incorrect and needs resolution.

Code quality:
4. ### IMPORTANT QUESTION debug comments (3 lines) must be removed from production test code.
5. Step 7 comment has inconsistent indentation (5 spaces instead of 4).

PR metadata (CONTRIBUTING.md requirements):
6. No milestone assigned — should be v3.0.0 (matching issue #1249).
7. No Type/ label — should be Type/Testing (matching issue #1249).

Decision

Cannot merge. The two critical variable-reference bugs will cause the test to fail at runtime. Please address all items above and push a fix commit.

## Independent Review Confirmation (pr-reviewer-5, second pass) I've completed an independent review of the current state of this PR (SHA `d642cec8f0e50f5d91052c494e28fdc43945bbb3`). The issues identified in the previous REQUEST_CHANGES review (review #3068) are **still unresolved** — no new commits have been pushed since that review. ### Outstanding Issues Summary **Critical bugs (will cause Robot Framework runtime errors):** 1. **Step 10 (plan apply):** `Should Be Equal As Integers ${git_log.rc} 0` references `${git_log}` which is not defined until step 11. Must be `${apply_result.rc}`. 2. **Step 9 (plan diff):** `Should Be Equal As Integers ${exec2_result.rc} 0` checks the wrong variable. Must be `${diff_result.rc}`. **Questionable logic:** 3. **Step 9:** `Output Should Contain ${diff_result} No changes in changeset.` — If the LLM successfully creates HELLO.md during execution, there *should* be changes in the changeset. This assertion appears incorrect and needs resolution. **Code quality:** 4. `### IMPORTANT QUESTION` debug comments (3 lines) must be removed from production test code. 5. Step 7 comment has inconsistent indentation (5 spaces instead of 4). **PR metadata (CONTRIBUTING.md requirements):** 6. No milestone assigned — should be **v3.0.0** (matching issue #1249). 7. No `Type/` label — should be **Type/Testing** (matching issue #1249). ### Decision **Cannot merge.** The two critical variable-reference bugs will cause the test to fail at runtime. Please address all items above and push a fix commit.
freemo added this to the v3.2.0 milestone 2026-04-02 08:09:32 +00:00
brent.edwards force-pushed test/e2e-update-m1-acceptance from d642cec8f0
Some checks failed
CI / lint (pull_request) Successful in 26s
CI / build (pull_request) Successful in 14s
CI / helm (pull_request) Successful in 23s
CI / typecheck (pull_request) Successful in 3m56s
CI / quality (pull_request) Successful in 3m42s
CI / security (pull_request) Successful in 4m17s
CI / unit_tests (pull_request) Successful in 9m23s
CI / docker (pull_request) Successful in 1m36s
CI / coverage (pull_request) Successful in 9m32s
CI / e2e_tests (pull_request) Failing after 14m51s
CI / benchmark-publish (pull_request) Has been skipped
CI / integration_tests (pull_request) Successful in 24m56s
CI / status-check (pull_request) Failing after 1s
CI / benchmark-regression (pull_request) Successful in 55m4s
to 214c74afdc
Some checks failed
CI / build (pull_request) Successful in 18s
CI / helm (pull_request) Successful in 23s
CI / lint (pull_request) Successful in 3m17s
CI / typecheck (pull_request) Successful in 3m53s
CI / quality (pull_request) Successful in 4m4s
CI / security (pull_request) Successful in 4m15s
CI / unit_tests (pull_request) Successful in 9m15s
CI / docker (pull_request) Successful in 1m34s
CI / coverage (pull_request) Successful in 12m18s
CI / e2e_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / benchmark-publish (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
2026-04-02 08:57:06 +00:00
Compare
brent.edwards force-pushed test/e2e-update-m1-acceptance from 214c74afdc
Some checks failed
CI / build (pull_request) Successful in 18s
CI / helm (pull_request) Successful in 23s
CI / lint (pull_request) Successful in 3m17s
CI / typecheck (pull_request) Successful in 3m53s
CI / quality (pull_request) Successful in 4m4s
CI / security (pull_request) Successful in 4m15s
CI / unit_tests (pull_request) Successful in 9m15s
CI / docker (pull_request) Successful in 1m34s
CI / coverage (pull_request) Successful in 12m18s
CI / e2e_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / benchmark-publish (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
to bfb3286a35
All checks were successful
CI / build (pull_request) Successful in 16s
CI / helm (pull_request) Successful in 20s
CI / lint (pull_request) Successful in 3m19s
CI / quality (pull_request) Successful in 3m44s
CI / typecheck (pull_request) Successful in 3m56s
CI / security (pull_request) Successful in 4m6s
CI / unit_tests (pull_request) Successful in 8m53s
CI / docker (pull_request) Successful in 1m23s
CI / coverage (pull_request) Successful in 12m40s
CI / e2e_tests (pull_request) Successful in 19m11s
CI / integration_tests (pull_request) Successful in 24m38s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 55m0s
2026-04-02 11:46:58 +00:00
Compare
Author
Member

Response to Reviews (#3068 and independent confirmation #76859)

Thank you for the thorough reviews. All issues identified have been addressed, and investigation into the "Questionable Assertion" revealed a significant upstream bug.

Critical bugs — Fixed

Both variable-reference bugs were fixed in the earlier iteration:

  1. Step 10 ${git_log.rc} → replaced with ${apply_result.rc}
  2. Step 9 ${exec2_result.rc} → replaced with ${diff_result.rc}

"Questionable Assertion" — Root cause identified: sandbox_root wiring bug

Both reviews correctly flagged the No changes in changeset. assertion as suspicious, noting that if the LLM creates HELLO.md, there should be changes. This was also the question behind the ### IMPORTANT QUESTION debug comment.

I investigated why HELLO.md is not being created despite the LLM being asked to create it. The root cause is a bug in the plan execute pipeline:

_get_plan_executor() in src/cleveragents/cli/commands/plan.py (~line 1267) does not pass sandbox_root to PlanExecutor. The full chain:

  1. PlanExecutor.__init__() receives sandbox_root=None
  2. PlanExecutor._run_execute_with_stub() passes sandbox_root=None to LLMExecuteActor.execute()
  3. LLMExecuteActor.execute() (~line 354 in llm_actors.py) checks if sandbox_root is not None and not read_only: — this evaluates to False
  4. _write_to_sandbox() is never called — LLM-generated files are silently discarded

The LLM does run and does generate content (you can see it in DEBUG logs), but the files are never written to disk because sandbox_root is None.

Additionally, plan apply only transitions plan state metadata — it never calls GitWorktreeSandbox.commit() to merge sandbox changes into the target repo. The sandbox infrastructure exists and works in isolation tests, but is never wired into the CLI execute pipeline.

Actions taken

  1. Bug ticket created: #1313fix(plan): wire sandbox_root into plan execute pipeline with full root-cause analysis, code locations, and acceptance criteria.

  2. Test updated with correct assertions: Instead of the contradictory No changes in changeset. assertion, the test now asserts what should happen:

    • Step 9: Output Should Contain ${diff_result} HELLO — diff must reference HELLO.md
    • Step 11: Should Be True ${commit_lines} >= 2 — apply must produce a real commit
    • Step 11: File Should Exist ${repo_path}${/}HELLO.md — HELLO.md must exist post-apply
  3. Tagged tdd_expected_fail: The test is tagged tdd_expected_fail tdd_issue tdd_issue_1313 so the tdd_expected_fail_listener inverts the result — the test passes CI while the bug exists. When #1313 is fixed, the tag is removed and the test passes normally.

Code quality issues — Fixed

  1. Debug comments removed
  2. Indentation normalized to 4-space throughout

PR metadata

  1. Milestone: Issue #1249 is in v3.0.0 (closed). The PR was auto-assigned to v3.2.0 (the current open milestone). Deferring to maintainer on whether this needs adjustment.
  2. Type/Testing label: Added
## Response to Reviews (#3068 and independent confirmation #76859) Thank you for the thorough reviews. All issues identified have been addressed, and investigation into the "Questionable Assertion" revealed a significant upstream bug. ### Critical bugs — Fixed Both variable-reference bugs were fixed in the earlier iteration: 1. **Step 10 `${git_log.rc}`** → replaced with `${apply_result.rc}` ✅ 2. **Step 9 `${exec2_result.rc}`** → replaced with `${diff_result.rc}` ✅ ### "Questionable Assertion" — Root cause identified: sandbox_root wiring bug Both reviews correctly flagged the `No changes in changeset.` assertion as suspicious, noting that if the LLM creates HELLO.md, there **should** be changes. This was also the question behind the `### IMPORTANT QUESTION` debug comment. I investigated why HELLO.md is not being created despite the LLM being asked to create it. The root cause is a **bug in the plan execute pipeline**: **`_get_plan_executor()`** in `src/cleveragents/cli/commands/plan.py` (~line 1267) does **not pass `sandbox_root`** to `PlanExecutor`. The full chain: 1. `PlanExecutor.__init__()` receives `sandbox_root=None` 2. `PlanExecutor._run_execute_with_stub()` passes `sandbox_root=None` to `LLMExecuteActor.execute()` 3. `LLMExecuteActor.execute()` (~line 354 in `llm_actors.py`) checks `if sandbox_root is not None and not read_only:` — this evaluates to `False` 4. `_write_to_sandbox()` is **never called** — LLM-generated files are silently discarded The LLM **does** run and **does** generate content (you can see it in DEBUG logs), but the files are never written to disk because `sandbox_root` is `None`. Additionally, `plan apply` only transitions plan state metadata — it never calls `GitWorktreeSandbox.commit()` to merge sandbox changes into the target repo. The sandbox infrastructure exists and works in isolation tests, but is never wired into the CLI execute pipeline. ### Actions taken 1. **Bug ticket created**: #1313 — `fix(plan): wire sandbox_root into plan execute pipeline` with full root-cause analysis, code locations, and acceptance criteria. 2. **Test updated with correct assertions**: Instead of the contradictory `No changes in changeset.` assertion, the test now asserts what **should** happen: - Step 9: `Output Should Contain ${diff_result} HELLO` — diff must reference HELLO.md - Step 11: `Should Be True ${commit_lines} >= 2` — apply must produce a real commit - Step 11: `File Should Exist ${repo_path}${/}HELLO.md` — HELLO.md must exist post-apply 3. **Tagged `tdd_expected_fail`**: The test is tagged `tdd_expected_fail tdd_issue tdd_issue_1313` so the `tdd_expected_fail_listener` inverts the result — the test passes CI while the bug exists. When #1313 is fixed, the tag is removed and the test passes normally. ### Code quality issues — Fixed 4. **Debug comments removed** ✅ 5. **Indentation normalized** to 4-space throughout ✅ ### PR metadata 6. **Milestone**: Issue #1249 is in v3.0.0 (closed). The PR was auto-assigned to v3.2.0 (the current open milestone). Deferring to maintainer on whether this needs adjustment. 7. **Type/Testing label**: Added ✅
Owner

Review claimed by reviewer pool instance reviewer-pool-1. Dispatching independent code review.

Review claimed by reviewer pool instance reviewer-pool-1. Dispatching independent code review.
freemo approved these changes 2026-04-02 16:58:32 +00:00
freemo left a comment

Independent Code Review — APPROVED

Reviewer: reviewer-pool-1 (second independent review)
Commit reviewed: bfb3286a35b0ea2014810327d2cb7d2ef5a33bd8

Summary

This PR updates the M1 acceptance E2E test (m1_acceptance.robot) to add comprehensive return code checks and expected output validation for every CLI step in the test lifecycle. All issues from the previous REQUEST_CHANGES review (#3068) have been properly addressed.

Review Checklist

Previous Review Issues — All Resolved

  1. ${git_log.rc} bug → Fixed: now uses ${apply_result.rc} in step 10
  2. ${exec2_result.rc} bug → Fixed: now uses ${diff_result.rc} in step 9
  3. Contradictory No changes in changeset. assertion → Replaced with correct Output Should Contain ${diff_result} HELLO assertion, properly tagged tdd_expected_fail with upstream bug #1313 filed
  4. Debug comments → Removed
  5. Indentation → Normalized throughout
  6. Type/Testing label → Added
  7. Milestone → Assigned to v3.2.0 (current open milestone; issue's v3.0.0 is closed)

Specification Alignment

  • Test exercises the complete M1 plan lifecycle (action create → resource add → project create → plan use → plan execute × 2 → plan diff → plan apply) which matches the spec's plan lifecycle design
  • Uses real CLI invocations with no mocking, consistent with CONTRIBUTING.md integration test requirements

Commit Message Format

  • test(e2e): update m1_acceptance.robot — valid Conventional Changelog format
  • ISSUES CLOSED: #1249 footer present
  • Single, atomic commit representing one logical change

Test Quality

  • Every CLI step now validates rc=0 and checks meaningful output fields
  • --format plain added consistently to all commands for reliable output parsing
  • tdd_expected_fail tag properly documents the known sandbox_root wiring bug (#1313) — this is a correct TDD pattern for capturing known failures while keeping CI green
  • The tdd_expected_fail_listener inversion mechanism is well-documented in the test's Documentation section
  • Assertions check structural output fields (phase, processing_state, action_name, etc.) rather than exact string matching, providing good coverage without brittleness

Correctness

  • Variable references are all correct (verified each ${*_result.rc} matches its corresponding Run CleverAgents Command assignment)
  • Phase transitions are logically consistent: strategize → execute (step 7 output shows phase: execute), execute completes (step 8 shows processing_state: complete), apply transitions to state: applied
  • The Output Should Contain keyword (from common_e2e.resource) is case-insensitive and checks both stdout and stderr — appropriate for CLI output validation
  • Post-apply verification (commit count ≥ 2, HELLO.md existence) correctly captures the expected end state

Code Quality

  • Clear numbered step structure with descriptive section comments
  • Explanatory comment in step 8 noting why assertions match step 7 (both report current plan state)
  • Comments on tdd_expected_fail assertions explain why they will fail and reference the upstream issue
  • Extract Plan Id keyword is well-documented with ULID pattern explanation

PR Metadata

  • Title matches commit message
  • Body has detailed summary with Closes #1249
  • Type/Testing label present
  • Milestone v3.2.0 assigned

Minor Observations (non-blocking)

  • Step 6 has both expected_rc=${0} (keyword argument) and a separate Should Be Equal As Integers check — slightly redundant but provides explicit documentation of the expectation
  • Step 10 checks - {"project_name": "local/test-project"} which is a specific plain-format representation — if the output format changes this would need updating, but that's expected for E2E tests

Decision: APPROVED

All previous review issues are resolved. The code is clean, well-structured, and the tdd_expected_fail approach is a proper TDD pattern. Ready to merge.

## Independent Code Review — APPROVED **Reviewer**: reviewer-pool-1 (second independent review) **Commit reviewed**: `bfb3286a35b0ea2014810327d2cb7d2ef5a33bd8` ### Summary This PR updates the M1 acceptance E2E test (`m1_acceptance.robot`) to add comprehensive return code checks and expected output validation for every CLI step in the test lifecycle. All issues from the previous REQUEST_CHANGES review (#3068) have been properly addressed. ### Review Checklist #### ✅ Previous Review Issues — All Resolved 1. **`${git_log.rc}` bug** → Fixed: now uses `${apply_result.rc}` in step 10 2. **`${exec2_result.rc}` bug** → Fixed: now uses `${diff_result.rc}` in step 9 3. **Contradictory `No changes in changeset.` assertion** → Replaced with correct `Output Should Contain ${diff_result} HELLO` assertion, properly tagged `tdd_expected_fail` with upstream bug #1313 filed 4. **Debug comments** → Removed 5. **Indentation** → Normalized throughout 6. **Type/Testing label** → Added ✅ 7. **Milestone** → Assigned to v3.2.0 (current open milestone; issue's v3.0.0 is closed) ✅ #### ✅ Specification Alignment - Test exercises the complete M1 plan lifecycle (action create → resource add → project create → plan use → plan execute × 2 → plan diff → plan apply) which matches the spec's plan lifecycle design - Uses real CLI invocations with no mocking, consistent with CONTRIBUTING.md integration test requirements #### ✅ Commit Message Format - `test(e2e): update m1_acceptance.robot` — valid Conventional Changelog format - `ISSUES CLOSED: #1249` footer present - Single, atomic commit representing one logical change #### ✅ Test Quality - Every CLI step now validates `rc=0` and checks meaningful output fields - `--format plain` added consistently to all commands for reliable output parsing - `tdd_expected_fail` tag properly documents the known sandbox_root wiring bug (#1313) — this is a correct TDD pattern for capturing known failures while keeping CI green - The `tdd_expected_fail_listener` inversion mechanism is well-documented in the test's Documentation section - Assertions check structural output fields (phase, processing_state, action_name, etc.) rather than exact string matching, providing good coverage without brittleness #### ✅ Correctness - Variable references are all correct (verified each `${*_result.rc}` matches its corresponding `Run CleverAgents Command` assignment) - Phase transitions are logically consistent: strategize → execute (step 7 output shows `phase: execute`), execute completes (step 8 shows `processing_state: complete`), apply transitions to `state: applied` - The `Output Should Contain` keyword (from `common_e2e.resource`) is case-insensitive and checks both stdout and stderr — appropriate for CLI output validation - Post-apply verification (commit count ≥ 2, HELLO.md existence) correctly captures the expected end state #### ✅ Code Quality - Clear numbered step structure with descriptive section comments - Explanatory comment in step 8 noting why assertions match step 7 (both report current plan state) - Comments on tdd_expected_fail assertions explain *why* they will fail and reference the upstream issue - `Extract Plan Id` keyword is well-documented with ULID pattern explanation #### ✅ PR Metadata - Title matches commit message ✅ - Body has detailed summary with `Closes #1249` ✅ - `Type/Testing` label present ✅ - Milestone v3.2.0 assigned ✅ ### Minor Observations (non-blocking) - Step 6 has both `expected_rc=${0}` (keyword argument) and a separate `Should Be Equal As Integers` check — slightly redundant but provides explicit documentation of the expectation - Step 10 checks `- {"project_name": "local/test-project"}` which is a specific plain-format representation — if the output format changes this would need updating, but that's expected for E2E tests ### Decision: **APPROVED** All previous review issues are resolved. The code is clean, well-structured, and the tdd_expected_fail approach is a proper TDD pattern. Ready to merge.
freemo merged commit 90215d2a64 into master 2026-04-02 16:58:38 +00:00
freemo deleted branch test/e2e-update-m1-acceptance 2026-04-02 16:58:40 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!1260
No description provided.