test(e2e): E2E acceptance criteria for M1 (v3.0.0) — minimal plan execution flow #789

Closed
freemo wants to merge 1 commit from test/e2e-m1-acceptance into master
Owner

Summary

E2E acceptance test for M1 (v3.0.0) — minimal plan execution flow. Tests the complete plan lifecycle: init, resource registration, project creation, action creation from YAML, plan use, two-phase execute (strategize + implement), diff review, and lifecycle-apply.

Closes #741

ISSUES CLOSED: #741

Manual Verification

To manually verify the CLI commands exercised by this E2E test, run the following in a fresh temporary directory:

Prerequisites

  • OPENAI_API_KEY or GEMINI_API_KEY environment variable set
  • Git installed
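
The two prerequisites above can be checked before running any command. The following is a minimal preflight sketch, not part of the CleverAgents CLI; the function names `has_llm_key` and `check_prereqs` are hypothetical and only illustrate the check.

```shell
#!/usr/bin/env bash
# Preflight sketch for the prerequisites listed above.
# Function names are hypothetical; they are not part of the CleverAgents CLI.

# Succeeds if at least one of the two API key variables is set and non-empty.
has_llm_key() {
  [ -n "${OPENAI_API_KEY:-}" ] || [ -n "${GEMINI_API_KEY:-}" ]
}

# Reports each missing prerequisite on stderr; returns non-zero if any is absent.
check_prereqs() {
  local status=0
  if ! has_llm_key; then
    echo "missing: OPENAI_API_KEY or GEMINI_API_KEY" >&2
    status=1
  fi
  if ! command -v git >/dev/null 2>&1; then
    echo "missing: git" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_prereqs || exit 1` at the top of a verification script stops early instead of failing partway through the plan lifecycle.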

Commands

# 1. Create a sample git repo as the target resource
REPO=$(mktemp -d)
cd "$REPO" && git init && git checkout -b main
echo "def greet(): return 'hello'" > main.py
git add . && git commit -m "initial commit"

# 2. Create a working directory and initialize CleverAgents
WORKDIR=$(mktemp -d) && cd "$WORKDIR"
python -m cleveragents init --yes --force

# 3. Register the git repo as a resource
python -m cleveragents resource add git-checkout "$REPO"

# 4. Create a project
python -m cleveragents project create my-project

# 5. Create an action from YAML (write a minimal action.yaml first)
cat > action.yaml << 'EOF'
name: fix-greeting
description: Fix the greeting function to return a proper message
action_type: code_modification
EOF
python -m cleveragents action create --config action.yaml

# 6. Start a plan
python -m cleveragents plan use fix-greeting
# → Look for: a plan ID in the output

# 7. Execute strategize phase
python -m cleveragents plan execute PLAN_ID
# → Look for: no Traceback, strategize completes

# 8. Execute implementation phase
python -m cleveragents plan execute PLAN_ID
# → Look for: no Traceback, execute completes

# 9. Review pending changes
python -m cleveragents plan diff PLAN_ID
# → Look for: diff output showing proposed changes

# 10. Apply changes via lifecycle
python -m cleveragents plan lifecycle-apply PLAN_ID
# → Look for: successful apply, changes committed

What to Look For

  • All commands exit without Traceback in stderr
  • plan use returns a plan ID (described as UUID format here; the E2E test extracts it with a ULID regex)
  • Both plan execute phases complete without error
  • plan diff shows pending file changes
  • plan lifecycle-apply commits changes to the target repo
  • git log in the target repo shows at least 2 commits after apply
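
Two of the checks above (no Traceback in stderr, at least 2 commits in the target repo) can be automated. This is a rough sketch under assumptions: the helper names `no_traceback` and `has_min_commits` are hypothetical, and stderr is assumed to have been redirected to a file by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the checklist above; not part of CleverAgents.

# Succeeds when the captured-stderr file ($1) contains no Python traceback.
no_traceback() {
  ! grep -q 'Traceback' "$1"
}

# Succeeds when the repository at $1 has at least $2 commits reachable from HEAD.
has_min_commits() {
  local count
  count=$(git -C "$1" rev-list --count HEAD) || return 1
  [ "$count" -ge "$2" ]
}
```

For example, after `python -m cleveragents plan execute PLAN_ID 2> stderr.log`, calling `no_traceback stderr.log` covers the first check, and `has_min_commits "$REPO" 2` covers the last one once the apply step has run.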
test(e2e): E2E acceptance criteria for M1 (v3.0.0) — minimal plan execution flow
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 20s
CI / e2e_tests (pull_request) Failing after 32s
CI / security (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 39s
CI / unit_tests (pull_request) Successful in 2m53s
CI / docker (pull_request) Successful in 40s
CI / integration_tests (pull_request) Successful in 3m43s
CI / coverage (pull_request) Successful in 6m32s
CI / benchmark-regression (pull_request) Successful in 36m30s
0f0baaf7e7
Added Robot Framework E2E test suite for M1 milestone acceptance criteria.
Tests the complete plan lifecycle (action create → resource add → project
create → plan use → plan execute strategize → plan execute → plan diff →
plan apply) with real LLM API keys and no mocking.

Key implementation details:
- Uses openai/gpt-4o-mini as strategy/execution actor (cost-effective)
- Simple definition_of_done: "Create a file called HELLO.md"
- Creates isolated temp git repo via Create Temp Git Repo keyword
- Extracts plan ID via ULID regex from plain-text output
- Uses expected_rc=None for LLM-dependent steps (execute, diff, apply)
  to handle non-deterministic LLM behavior gracefully
- Flexible structural assertions: checks rc, output presence, git log
- Skips gracefully when no LLM API keys (ANTHROPIC/OPENAI) are set
- Tagged [E2E] so it runs only in nox -s e2e_tests session

ISSUES CLOSED: #741
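
The plan ID extraction mentioned in the commit message can be approximated in shell. This is a sketch, not the Robot Framework keyword the test uses: the function name `extract_ulid` and the sample output line in the usage note are assumptions, while the pattern follows the ULID format (26 characters of Crockford base32, which excludes I, L, O and U).

```shell
#!/usr/bin/env bash
# Hypothetical helper; the real test extracts the ID inside Robot Framework.

# Reads text on stdin and prints the first whole-word ULID found, if any.
# LC_ALL=C keeps the bracket ranges limited to ASCII uppercase letters.
extract_ulid() {
  LC_ALL=C grep -owE '[0-9A-HJKMNP-TV-Z]{26}' | head -n 1
}
```

Assuming `plan use` prints the ID somewhere in its plain-text output, `PLAN_ID=$(python -m cleveragents plan use fix-greeting | extract_ulid)` would capture it for the later `plan execute`, `plan diff`, and `plan lifecycle-apply` steps. An empty `PLAN_ID` means no ULID was found and the remaining steps should not be run.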
freemo added this to the v3.0.0 milestone 2026-03-12 22:54:51 +00:00
freemo force-pushed test/e2e-m1-acceptance from 0f0baaf7e7
to 9c20fb972a
Some checks failed
CI / lint (pull_request) Successful in 15s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 18s
CI / e2e_tests (pull_request) Failing after 29s
CI / security (pull_request) Successful in 32s
CI / typecheck (pull_request) Successful in 32s
CI / build (pull_request) Successful in 15s
CI / unit_tests (pull_request) Successful in 2m29s
CI / integration_tests (pull_request) Successful in 2m36s
CI / docker (pull_request) Successful in 1m19s
CI / coverage (pull_request) Successful in 5m14s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 16:12:31 +00:00
freemo force-pushed test/e2e-m1-acceptance from 9c20fb972a
to d43cee47c6
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 23s
CI / quality (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 33s
CI / security (pull_request) Successful in 34s
CI / e2e_tests (pull_request) Successful in 1m1s
CI / unit_tests (pull_request) Successful in 3m40s
CI / integration_tests (pull_request) Successful in 4m42s
CI / docker (pull_request) Successful in 38s
CI / coverage (pull_request) Successful in 6m3s
CI / benchmark-regression (pull_request) Successful in 34m44s
2026-03-13 16:23:58 +00:00
freemo force-pushed test/e2e-m1-acceptance from d43cee47c6
to 0d7a06efbb
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 18s
CI / e2e_tests (pull_request) Failing after 25s
CI / security (pull_request) Successful in 33s
CI / typecheck (pull_request) Successful in 36s
CI / integration_tests (pull_request) Successful in 2m53s
CI / unit_tests (pull_request) Successful in 3m52s
CI / docker (pull_request) Successful in 54s
CI / coverage (pull_request) Successful in 6m53s
CI / benchmark-regression (pull_request) Successful in 34m3s
2026-03-13 23:19:26 +00:00
Author
Owner

PM Review — Day 34

Status: NOT mergeable (conflicts), 0 reviews, M1 (v3.0.0), State/In Review
Author: @freemo

E2E acceptance criteria for M1 (v3.0.0) — minimal plan execution flow.

[BLOCKING] Merge conflicts — rebase required before review.

Action Items

| Who | Action | Deadline |
|-----|--------|----------|
| @freemo | Rebase onto master | Day 36 |
| @brent.edwards | Peer review after rebase | Day 37 |
Author
Owner

merged in through another PR

freemo closed this pull request 2026-03-15 21:57:20 +00:00

Pull request closed

Reference
cleveragents/cleveragents-core!789