test(e2e): E2E acceptance criteria for M2 (v3.1.0) — actor compiler and LLM integration #793

Closed
freemo wants to merge 1 commit from test/e2e-m2-acceptance into master
Owner

Summary

E2E acceptance test for M2 (v3.1.0): actor compiler and LLM integration. Extends the M1 flow with plan tree inspection, plan status checking, and lifecycle listing.

Closes #742

ISSUES CLOSED: #742

Manual Verification

Prerequisites

  • OPENAI_API_KEY or GEMINI_API_KEY environment variable set
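A preflight check for this prerequisite can fail fast before any LLM-dependent step runs. This is a minimal sketch for a bash-compatible shell. `require_llm_key` is a hypothetical helper and is not provided by cleveragents. It only checks that at least one of the two variables named above is set and non-empty.

```bash
# Hypothetical preflight helper (not part of cleveragents): succeed if at least
# one of the provider keys named under Prerequisites is set and non-empty.
require_llm_key() {
  if [ -n "${OPENAI_API_KEY:-}" ] || [ -n "${GEMINI_API_KEY:-}" ]; then
    return 0
  fi
  echo "error: set OPENAI_API_KEY or GEMINI_API_KEY before running the M2 flow" >&2
  return 1
}
```

Call it as `require_llm_key || exit 1` at the top of a verification script. It does not check that the key is valid, only that it is present.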

Commands

# 1-6. Same init/resource/project/action/plan-use/execute flow as M1
REPO=$(mktemp -d) && cd "$REPO" && git init && git checkout -b main
echo "def greet(): return 'hello'" > main.py && git add . && git commit -m "init"
WORKDIR=$(mktemp -d) && cd "$WORKDIR"
python -m cleveragents init --yes --force
python -m cleveragents resource add git-checkout "$REPO"
python -m cleveragents project create my-project
# (create action.yaml as in M1)
python -m cleveragents action create --config action.yaml
python -m cleveragents plan use fix-greeting

# 7. Execute the strategize phase (PLAN_ID is a placeholder for the plan's ID)
python -m cleveragents plan execute PLAN_ID

# 8. Inspect decision tree (M2-specific)
python -m cleveragents plan tree --format json PLAN_ID
# → Look for: JSON output with decision tree structure, or "No decisions found" message

# 9. Check plan status (M2-specific)
python -m cleveragents plan status PLAN_ID
# → Look for: status field showing current plan state

# 10. Execute again to run the implementation phase
python -m cleveragents plan execute PLAN_ID

# 11. Review and apply
python -m cleveragents plan diff PLAN_ID
python -m cleveragents plan lifecycle-apply PLAN_ID

# 12. List lifecycle entries (M2-specific)
python -m cleveragents plan lifecycle-list PLAN_ID
# → Look for: lifecycle entries showing state transitions
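After steps 1-6, you can chain steps 7-12 for a single plan ID. This is a minimal sketch for a bash-compatible shell. `ca` and `run_m2_steps` are hypothetical helper names. The subcommands and flags are exactly the ones listed above.

```bash
# Hypothetical wrapper around the CLI used throughout this PR. Redefine it to
# use a different interpreter, or to stub the CLI for a dry run.
ca() {
  python -m cleveragents "$@"
}

# Run steps 7-12 for one plan ID, stopping at the first command that exits
# non-zero. Returns 2 if no plan ID is given.
run_m2_steps() {
  local plan_id="${1:-}"
  if [ -z "$plan_id" ]; then
    echo "usage: run_m2_steps PLAN_ID" >&2
    return 2
  fi
  ca plan execute "$plan_id" &&               # 7. strategize
  ca plan tree --format json "$plan_id" &&    # 8. inspect decision tree
  ca plan status "$plan_id" &&                # 9. check plan status
  ca plan execute "$plan_id" &&               # 10. implementation
  ca plan diff "$plan_id" &&                  # 11. review
  ca plan lifecycle-apply "$plan_id" &&       # 11. apply
  ca plan lifecycle-list "$plan_id"           # 12. list lifecycle entries
}
```

Usage: `run_m2_steps "$PLAN_ID"`. Stopping on the first non-zero exit code is stricter than the PR's Robot suite, which tolerates non-zero exit codes on LLM-dependent commands (`expected_rc=None`). Redefining `ca` as a function that only records its arguments gives a dry run of the sequence.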

What to Look For

  • plan tree --format json returns valid JSON or a descriptive message
  • plan status shows a recognized state (strategized, executing, applied, etc.)
  • plan lifecycle-list shows lifecycle entries for the plan
  • No Traceback in any command's stderr
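The checks above can be scripted against captured command output. This is a minimal sketch. The function names are hypothetical. The JSON check shells out to Python's standard `json.tool` module via `python3`; substitute `python` if that is how your interpreter is named. The state list covers only the states named above, since the criteria end with "etc.".

```bash
# Hypothetical helpers for the "What to Look For" criteria. Each takes a
# command's captured output as $1 and returns 0 when the check passes.

# No Traceback in stderr.
no_traceback() {
  case "$1" in
    *Traceback*) return 1 ;;
    *) return 0 ;;
  esac
}

# plan tree --format json: valid JSON, or the "No decisions found" message.
tree_output_ok() {
  case "$1" in
    *"No decisions found"*) return 0 ;;
  esac
  printf '%s' "$1" | python3 -m json.tool >/dev/null 2>&1
}

# plan status: mentions one of the recognized states listed above.
# Partial list; extend it with the project's actual plan states.
status_output_ok() {
  case "$1" in
    *strategized*|*executing*|*applied*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Example: capture `out=$(python -m cleveragents plan tree --format json "$PLAN_ID" 2>tree.err)`, then require both `tree_output_ok "$out"` and `no_traceback "$(cat tree.err)"`.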
test(e2e): E2E acceptance criteria for M2 (v3.1.0) — actor compiler and LLM integration
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 21s
CI / build (pull_request) Successful in 16s
CI / e2e_tests (pull_request) Failing after 39s
CI / security (pull_request) Successful in 47s
CI / typecheck (pull_request) Successful in 1m21s
CI / unit_tests (pull_request) Successful in 4m13s
CI / integration_tests (pull_request) Failing after 4m58s
CI / docker (pull_request) Successful in 49s
CI / coverage (pull_request) Successful in 8m15s
CI / benchmark-regression (pull_request) Successful in 37m39s
6abc317ba3
Add Robot Framework E2E test suite robot/e2e/m2_acceptance.robot exercising
M2 acceptance criteria with zero mocking. The test creates a temporary git
repo with sample project files, registers a custom actor via the CLI, sets up
a resource and a project, creates an action referencing the actor, and runs
the full plan lifecycle (use → execute strategize → execute → diff → apply). Validates
actor YAML compilation, skill registry, tool lifecycle, and LLM integration
through real CLI invocations with real provider API keys. Uses flexible
structural assertions and expected_rc=None for LLM-dependent commands.

ISSUES CLOSED: #742
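The commit message mentions `expected_rc=None` for LLM-dependent commands in the Robot suite. The same idea can be expressed in shell: run the command, report its exit code without asserting it, and fail only on structural problems. This is a minimal sketch, not the suite's Robot Framework keywords. `run_tolerant` is a hypothetical name. The two structural checks (no Python Traceback on stderr, some output produced) are illustrative assumptions drawn from the PR's "What to Look For" list.

```bash
# Hypothetical helper: run an LLM-dependent command without asserting its exit
# code. The code is reported on stderr; only structural problems fail.
run_tolerant() {
  local out err rc err_file
  err_file=$(mktemp)
  out=$("$@" 2>"$err_file")
  rc=$?
  err=$(cat "$err_file")
  rm -f "$err_file"
  echo "exit code: $rc (not asserted)" >&2
  case "$err" in
    *Traceback*)
      echo "FAIL: Traceback on stderr" >&2
      return 1
      ;;
  esac
  if [ -z "$out$err" ]; then
    echo "FAIL: command produced no output" >&2
    return 1
  fi
  # Pass the captured stdout through so callers can make further assertions.
  printf '%s\n' "$out"
}
```

Usage: `status_out=$(run_tolerant python -m cleveragents plan status "$PLAN_ID")`. The caller can then assert on the captured output's structure rather than on the exact LLM-generated content.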
freemo added this to the v3.1.0 milestone 2026-03-12 23:11:35 +00:00
freemo force-pushed test/e2e-m2-acceptance from 6abc317ba3 to d9931d01eb
Some checks failed
CI / lint (pull_request) Successful in 15s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 36s
CI / build (pull_request) Successful in 30s
CI / e2e_tests (pull_request) Successful in 52s
CI / unit_tests (pull_request) Successful in 3m9s
CI / integration_tests (pull_request) Successful in 3m8s
CI / docker (pull_request) Successful in 52s
CI / coverage (pull_request) Successful in 5m29s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 16:13:44 +00:00
freemo force-pushed test/e2e-m2-acceptance from d9931d01eb to deff73faa6
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 21s
CI / quality (pull_request) Successful in 25s
CI / typecheck (pull_request) Successful in 39s
CI / build (pull_request) Successful in 26s
CI / security (pull_request) Successful in 41s
CI / e2e_tests (pull_request) Successful in 1m2s
CI / unit_tests (pull_request) Successful in 3m26s
CI / integration_tests (pull_request) Successful in 3m48s
CI / docker (pull_request) Successful in 9s
CI / coverage (pull_request) Successful in 5m20s
CI / benchmark-regression (pull_request) Successful in 34m51s
2026-03-13 16:24:01 +00:00
freemo force-pushed test/e2e-m2-acceptance from deff73faa6 to d879ba1f96
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 20s
CI / build (pull_request) Successful in 19s
CI / security (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 36s
CI / e2e_tests (pull_request) Failing after 52s
CI / integration_tests (pull_request) Successful in 2m58s
CI / unit_tests (pull_request) Successful in 3m36s
CI / docker (pull_request) Successful in 35s
CI / coverage (pull_request) Successful in 4m41s
CI / benchmark-regression (pull_request) Failing after 40m8s
2026-03-13 23:19:27 +00:00
Author
Owner

PM Review — Day 34

Status: NOT mergeable (conflicts), 0 reviews, M2 (v3.1.0)
Author: @freemo

E2E acceptance criteria for M2 (v3.1.0) — actor compiler and LLM integration.

[BLOCKING] Merge conflicts — rebase required.

Action Items

| Who | Action | Deadline |
|-----|--------|----------|
| @freemo | Rebase onto master | Day 36 |
| @hurui200320 | Peer review after rebase | Day 37 |
Author
Owner

Accepted as part of a different pull request (#963).

freemo closed this pull request 2026-03-16 00:15:14 +00:00

Reference
cleveragents/cleveragents-core!793