test(e2e): workflow example 4 — multi-project dependency update (supervised profile) #815

2026-03-13T16:48:45Z

freemo commented

2026-03-13 16:48:45 +00:00

Summary

Implements WF04 E2E coverage for ticket #750 — multi-project dependency update using the supervised automation profile. Addresses all review feedback by hardening assertions, adding unit tests, fixing import hygiene, and improving debuggability.

Closes #750.

What changed

Robot test (robot/e2e/wf04_multi_project.robot):
- Switched WF04 plan use parsing to --format json and extracted plan_id deterministically.
- Added strict AC-3 assertion that project_links match all 4 expected projects exactly.
- Added robust JSON payload extraction via Extract JSON From Stdout (uses json.JSONDecoder().raw_decode() for resilience against trailing non-JSON output).
- Added WF04 snapshot helper invocation for deterministic post-execute/post-apply assertions.
- Test-level subplan guard: Before entering AC-4/5/6/7 verification keywords, asserts subplan_count >= 1. If the LLM produces 0 subplans, the entire test is Skipped (visible SKIPPED in CI) rather than silently passing with all ACs individually skipped. After apply, a hard assertion catches the case where subplans existed post-execute but vanished post-apply.
- Count Decision Nodes keyword now invokes wf04_snapshot_helper.py --count-nodes as a subprocess, eliminating the complex inline Evaluate expression and preventing the application DI container from being imported into the Robot test runner process.
- Added WF04 Test Teardown keyword capturing plan status and decision tree on failure (mirrors WF05 pattern).
- Removed unused ULID_PATTERN variable.
- Added clarifying comments for dual plan execute calls and positional argument order.
- Initialises WF04_PLAN_ID test variable for teardown access.
Snapshot helper (robot/e2e/wf04_snapshot_helper.py):
- _iso() normalises all timestamps to UTC-aware format before serialisation.
- _iso() guards against non-datetime truthy values (returns empty string).
- count_decision_nodes() now has a max_depth=50 parameter to prevent unbounded recursion on malformed trees.
- count_decision_nodes() no longer decrements depth for sibling list iteration — list items are siblings, not children, so depth is passed unchanged.
- count_decision_nodes() root parameter typed with DecisionTree union alias (dict[str, Any] | list[Any]) instead of bare Any.
- Added --count-nodes <json_file|-> CLI mode for subprocess-based decision-node counting from Robot keywords.
- sys.path.append() instead of sys.path.insert(0, ...) to avoid shadowing the standard library.
- Added unmapped_resources field to each subplan entry for debugging.
- json.dumps() now uses default=str for defensive serialisation.
- Exception handling split: ValueError | RuntimeError for expected errors, broad except with traceback.format_exc() for unexpected errors. Broad except blocks documented with inline comments explaining the intentional catch-all pattern.
Unit tests (features/wf04_snapshot_helper.feature + features/steps/wf04_snapshot_helper_steps.py):
- 18 Behave scenarios covering _iso(), _enum_value(), count_decision_nodes(), and _build_snapshot() with mocked lifecycle service.
- _build_snapshot import moved to top-level per CONTRIBUTING.md import guidelines (no function-scoped imports).
- sys.path.append() instead of sys.path.insert(0, ...) in step file to match snapshot helper convention.
- Naive datetime scenario now uses exact-match assertion ("2026-03-15T10:30:00+00:00") instead of substring checks.
- No-subplans scenario now verifies plan_id, project_scopes, and validation_summary fields (not just subplan_count and subplans).
- With-subplans scenario now verifies subplan_count == 2, concrete mapped project names per subplan (SUB01→proj-a, SUB02→proj-b), and serialized field values: status, child_phase, started_at, and child_validation_summary.required_passed.
- Child-None scenario now verifies all 7 conditional default fields: child_phase, child_state, child_updated_at, execute_started_at, execute_completed_at, apply_started_at, applied_at.
- New scenario: count_decision_nodes truncates at max_depth — verifies depth truncation behavior with a 5-level deep tree capped at max_depth=3.
- Aware datetime test uses exact-match assertion ("2026-03-15T05:30:00+00:00").

Quality gates

All required gates pass on this branch:

nox -e lint ✅
nox -e typecheck ✅
nox -e unit_tests ✅ (481 features, 12583 scenarios)
nox -e integration_tests ✅ (1775 tests)
nox -e e2e_tests ✅ (55 passed, 1 skipped — WF04 skipped without LLM subplan output)
nox -e coverage_report ✅ (98%)

Notes

Branch rebased onto latest origin/master (0762815e, includes feat(lsp) and feat(resource) merges) and force-pushed.
No scope expansion beyond WF04 review-driven fixes.

Deferred items (nits from prior review cycles)

The following review nits were acknowledged but deferred as they do not affect correctness:

Action YAML missing long_description field from spec (#13)
_enum_value scenarios don't cover non-string non-enum types (#14)
N+1 service calls in _build_snapshot (#15)
Generous 120s timeout for snapshot helper subprocess (#16)
Action YAML omits args section from spec Example 4 (#7) — lifecycle-apply is the system-level approach used; args are spec examples, not test requirements.
Test uses plan lifecycle-apply instead of per-child plan apply (#6) — lifecycle-apply is the correct user-facing command that handles dependency ordering internally.

## Summary Implements WF04 E2E coverage for ticket #750 — multi-project dependency update using the supervised automation profile. Addresses all review feedback by hardening assertions, adding unit tests, fixing import hygiene, and improving debuggability. Closes #750. ## What changed - **Robot test** (`robot/e2e/wf04_multi_project.robot`): - Switched WF04 `plan use` parsing to `--format json` and extracted `plan_id` deterministically. - Added strict AC-3 assertion that `project_links` match all 4 expected projects exactly. - Added robust JSON payload extraction via `Extract JSON From Stdout` (uses `json.JSONDecoder().raw_decode()` for resilience against trailing non-JSON output). - Added WF04 snapshot helper invocation for deterministic post-execute/post-apply assertions. - **Test-level subplan guard:** Before entering AC-4/5/6/7 verification keywords, asserts `subplan_count >= 1`. If the LLM produces 0 subplans, the entire test is `Skip`ped (visible SKIPPED in CI) rather than silently passing with all ACs individually skipped. After apply, a hard assertion catches the case where subplans existed post-execute but vanished post-apply. - `Count Decision Nodes` keyword now invokes `wf04_snapshot_helper.py --count-nodes` as a **subprocess**, eliminating the complex inline `Evaluate` expression and preventing the application DI container from being imported into the Robot test runner process. - Added `WF04 Test Teardown` keyword capturing plan status and decision tree on failure (mirrors WF05 pattern). - Removed unused `ULID_PATTERN` variable. - Added clarifying comments for dual `plan execute` calls and positional argument order. - Initialises `WF04_PLAN_ID` test variable for teardown access. - **Snapshot helper** (`robot/e2e/wf04_snapshot_helper.py`): - `_iso()` normalises all timestamps to UTC-aware format before serialisation. - `_iso()` guards against non-datetime truthy values (returns empty string). - `count_decision_nodes()` now has a `max_depth=50` parameter to prevent unbounded recursion on malformed trees. - `count_decision_nodes()` no longer decrements depth for sibling list iteration — list items are siblings, not children, so depth is passed unchanged. - `count_decision_nodes()` root parameter typed with `DecisionTree` union alias (`dict[str, Any] | list[Any]`) instead of bare `Any`. - Added `--count-nodes <json_file|->` CLI mode for subprocess-based decision-node counting from Robot keywords. - `sys.path.append()` instead of `sys.path.insert(0, ...)` to avoid shadowing the standard library. - Added `unmapped_resources` field to each subplan entry for debugging. - `json.dumps()` now uses `default=str` for defensive serialisation. - Exception handling split: `ValueError | RuntimeError` for expected errors, broad `except` with `traceback.format_exc()` for unexpected errors. Broad `except` blocks documented with inline comments explaining the intentional catch-all pattern. - **Unit tests** (`features/wf04_snapshot_helper.feature` + `features/steps/wf04_snapshot_helper_steps.py`): - 18 Behave scenarios covering `_iso()`, `_enum_value()`, `count_decision_nodes()`, and `_build_snapshot()` with mocked lifecycle service. - `_build_snapshot` import moved to top-level per CONTRIBUTING.md import guidelines (no function-scoped imports). - `sys.path.append()` instead of `sys.path.insert(0, ...)` in step file to match snapshot helper convention. - **Naive datetime scenario** now uses exact-match assertion (`"2026-03-15T10:30:00+00:00"`) instead of substring checks. - **No-subplans scenario** now verifies `plan_id`, `project_scopes`, and `validation_summary` fields (not just `subplan_count` and `subplans`). - **With-subplans scenario** now verifies `subplan_count == 2`, concrete mapped project names per subplan (`SUB01→proj-a`, `SUB02→proj-b`), and serialized field values: `status`, `child_phase`, `started_at`, and `child_validation_summary.required_passed`. - **Child-None scenario** now verifies all 7 conditional default fields: `child_phase`, `child_state`, `child_updated_at`, `execute_started_at`, `execute_completed_at`, `apply_started_at`, `applied_at`. - **New scenario:** `count_decision_nodes truncates at max_depth` — verifies depth truncation behavior with a 5-level deep tree capped at max_depth=3. - **Aware datetime test** uses exact-match assertion (`"2026-03-15T05:30:00+00:00"`). ## Quality gates All required gates pass on this branch: - `nox -e lint` ✅ - `nox -e typecheck` ✅ - `nox -e unit_tests` ✅ (481 features, 12583 scenarios) - `nox -e integration_tests` ✅ (1775 tests) - `nox -e e2e_tests` ✅ (55 passed, 1 skipped — WF04 skipped without LLM subplan output) - `nox -e coverage_report` ✅ (98%) ## Notes - Branch rebased onto latest `origin/master` (`0762815e`, includes feat(lsp) and feat(resource) merges) and force-pushed. - No scope expansion beyond WF04 review-driven fixes. ## Deferred items (nits from prior review cycles) The following review nits were acknowledged but deferred as they do not affect correctness: - Action YAML missing `long_description` field from spec (#13) - `_enum_value` scenarios don't cover non-string non-enum types (#14) - N+1 service calls in `_build_snapshot` (#15) - Generous 120s timeout for snapshot helper subprocess (#16) - Action YAML omits `args` section from spec Example 4 (#7) — `lifecycle-apply` is the system-level approach used; `args` are spec examples, not test requirements. - Test uses `plan lifecycle-apply` instead of per-child `plan apply` (#6) — `lifecycle-apply` is the correct user-facing command that handles dependency ordering internally.

freemo added this to the v3.3.0 milestone 2026-03-13 16:48:45 +00:00

freemo added the

labels 2026-03-13 16:48:57 +00:00

freemo force-pushed test/e2e-wf04-multi-project from f7f8017b8c to 2e6278bfae

2026-03-13 17:28:41 +00:00

Compare

freemo force-pushed test/e2e-wf04-multi-project from 2e6278bfae to dd6b24209d

2026-03-13 17:46:53 +00:00

Compare

freemo force-pushed test/e2e-wf04-multi-project from dd6b24209d to d57e36f22c

2026-03-13 18:13:05 +00:00

Compare

freemo force-pushed test/e2e-wf04-multi-project from d57e36f22c to 92ee9f2c58

2026-03-13 18:25:55 +00:00

Compare

freemo referenced this pull request

2026-03-13 19:21:00 +00:00

test(e2e): workflow example 4 — multi-project dependency update (supervised profile) #750

freemo force-pushed test/e2e-wf04-multi-project from 92ee9f2c58 to ddd0b6acce

2026-03-13 23:19:34 +00:00

Compare

freemo added the

Priority

Medium

label 2026-03-14 04:34:29 +00:00

freemo commented

2026-03-14 04:43:48 +00:00

PM Review — Day 34

Status: Mergeable, 0 reviews, M4 (v3.3.0)
Closes: #750 | Author: @freemo

E2E test for WF04 (multi-project dependency update, supervised profile). Creates 4 git repos, exercises full plan lifecycle with subplan spawning. Zero mocking. Good PR body with detailed manual verification steps.

[MINOR] PLAN_ID placeholder in manual verification commands could confuse copy-paste users.

Action Items

Who	Action	Deadline
@brent.edwards	Peer review	Day 36

## PM Review — Day 34 **Status**: Mergeable, 0 reviews, M4 (v3.3.0) **Closes**: #750 | **Author**: @freemo E2E test for WF04 (multi-project dependency update, supervised profile). Creates 4 git repos, exercises full plan lifecycle with subplan spawning. Zero mocking. Good PR body with detailed manual verification steps. **[MINOR]** `PLAN_ID` placeholder in manual verification commands could confuse copy-paste users. ### Action Items | Who | Action | Deadline | |-----|--------|----------| | @brent.edwards | **Peer review** | Day 36 |

freemo added a new dependency 2026-03-16 02:42:19 +00:00

#627 Implement @tdd_expected_fail tag handling in Behave environment

freemo added a new dependency 2026-03-16 02:42:19 +00:00

#628 Implement @tdd_expected_fail tag handling in Robot Framework

freemo added a new dependency 2026-03-16 02:42:19 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

freemo commented

2026-03-16 09:27:02 +00:00

PM Status — Day 36 (2026-03-16)

Day 34 review assignment deadline check. This PR has 0 reviewer activity after 2 days.

Priority note: M3 PRs take precedence. Reviewers should complete M3 reviews first, then address M4+ PRs in milestone order.

Assigned reviewer: Please acknowledge and provide an ETA for your review, or flag if reassignment is needed.

## PM Status — Day 36 (2026-03-16) Day 34 review assignment deadline check. This PR has 0 reviewer activity after 2 days. **Priority note**: M3 PRs take precedence. Reviewers should complete M3 reviews first, then address M4+ PRs in milestone order. **Assigned reviewer**: Please acknowledge and provide an ETA for your review, or flag if reassignment is needed.

hurui200320 was assigned by freemo

2026-03-16 22:19:58 +00:00

freemo commented

2026-03-16 22:20:04 +00:00

@hurui200320 I am going to have you take over this PR, it is mostly completed but is waiting on #628 and #966 One is yours and one is Brent's. Please be sure to get this PR and the two blocking PRs I listed in asap, thanks.

@hurui200320 I am going to have you take over this PR, it is mostly completed but is waiting on https://git.cleverthis.com/cleveragents/cleveragents-core/issues/628 and https://git.cleverthis.com/cleveragents/cleveragents-core/issues/966 One is yours and one is Brent's. Please be sure to get this PR and the two blocking PRs I listed in asap, thanks.

freemo requested review from CoreRasurae 2026-03-17 18:24:17 +00:00

freemo requested review from brent.edwards 2026-03-17 18:24:17 +00:00

freemo commented

2026-03-17 18:40:34 +00:00

PM Status — Day 37

Ownership transferred to @hurui200320. Blocked on #628 and #966. PR is M4 (v3.3.0).

Author: Please rebase onto latest master by Day 39 EOD (2026-03-19) and confirm blocker status. Check for merge conflicts proactively.

PM status — Day 37

## PM Status — Day 37 Ownership transferred to @hurui200320. Blocked on #628 and #966. PR is M4 (v3.3.0). **Author**: Please rebase onto latest `master` by **Day 39 EOD (2026-03-19)** and confirm blocker status. Check for merge conflicts proactively. --- *PM status — Day 37*

hurui200320 force-pushed test/e2e-wf04-multi-project from ddd0b6acce to cc09528a46

2026-03-18 08:35:59 +00:00

Compare

freemo commented

2026-03-19 04:57:58 +00:00

Code Review — PR #815

(Cannot submit formal approval — self-authored PR.)

E2E test for WF04. Well-structured with proper labels, milestone, and issue linkage. No issues found.

## Code Review — PR #815 *(Cannot submit formal approval — self-authored PR.)* E2E test for WF04. Well-structured with proper labels, milestone, and issue linkage. No issues found.

freemo requested review from hamza.khyari 2026-03-19 05:19:43 +00:00

hurui200320 force-pushed test/e2e-wf04-multi-project from cc09528a46 to 862b34b6be

2026-03-19 10:07:50 +00:00

Compare

hurui200320 force-pushed test/e2e-wf04-multi-project from 862b34b6be to 839fcc38c7

2026-03-19 10:36:09 +00:00

Compare

hurui200320 force-pushed test/e2e-wf04-multi-project from 839fcc38c7 to 6dbdbc618f

2026-03-19 11:16:25 +00:00

Compare

hurui200320 referenced this pull request

2026-03-19 11:27:21 +00:00

test(e2e): workflow example 4 — multi-project dependency update (supervised profile) #750

hurui200320 force-pushed test/e2e-wf04-multi-project from 6dbdbc618f to 5c855decb8

2026-03-19 13:01:55 +00:00

Compare

hurui200320 force-pushed test/e2e-wf04-multi-project from 5c855decb8 to 5ad17c4555

2026-03-20 05:39:33 +00:00

Compare

hurui200320 force-pushed test/e2e-wf04-multi-project from 5ad17c4555 to 63c7b71a97

2026-03-23 04:10:56 +00:00

Compare

hurui200320 force-pushed test/e2e-wf04-multi-project from 63c7b71a97 to d777417b03

2026-03-24 05:40:07 +00:00

Compare

hurui200320 force-pushed test/e2e-wf04-multi-project from d777417b03 to cdee9ea73f

2026-03-26 07:17:32 +00:00

Compare

hurui200320 force-pushed test/e2e-wf04-multi-project from cdee9ea73f to cbde5e1342

2026-03-26 09:40:13 +00:00

Compare

hurui200320 referenced this pull request

2026-03-26 09:40:42 +00:00

test(e2e): workflow example 4 — multi-project dependency update (supervised profile) #750

hurui200320 force-pushed test/e2e-wf04-multi-project from cbde5e1342 to fd4612b4eb

2026-03-26 09:41:46 +00:00

Compare

freemo removed a dependency 2026-03-26 15:14:38 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

freemo added a new dependency 2026-03-26 15:14:42 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

freemo removed a dependency 2026-03-26 18:28:13 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

hurui200320 force-pushed test/e2e-wf04-multi-project from fd4612b4eb to 51e1905a04

2026-03-27 06:32:37 +00:00

Compare

hurui200320 force-pushed test/e2e-wf04-multi-project from 51e1905a04 to 1e5525e1a9

2026-03-27 09:23:14 +00:00

Compare

hurui200320 force-pushed test/e2e-wf04-multi-project from 1e5525e1a9 to 364d0b0e6b

2026-03-27 10:48:08 +00:00

Compare

hurui200320 force-pushed test/e2e-wf04-multi-project from 364d0b0e6b to e986b4dd35

2026-03-27 12:35:56 +00:00

Compare

hurui200320 force-pushed test/e2e-wf04-multi-project from e986b4dd35 to 845cf61b47

2026-03-30 03:47:51 +00:00

Compare

hurui200320 scheduled this pull request to auto merge when all checks succeed 2026-03-30 03:48:10 +00:00

hurui200320 merged commit 845cf61b47 into master

2026-03-30 04:01:04 +00:00

hurui200320 deleted branch test/e2e-wf04-multi-project

2026-03-30 04:01:05 +00:00

hurui200320 referenced this pull request

2026-03-30 06:00:16 +00:00

feat(actors): wire LLM strategize/execute actors to subplan spawning infrastructure #1207

Sign in to join this conversation.

No Reviewers

CoreRasurae

brent.edwards

hamza.khyari

2 Participants

Notifications

Due Date

No due date set.

Depends on

#627 Implement @tdd_expected_fail tag handling in Behave environment

cleveragents/cleveragents-core

#628 Implement @tdd_expected_fail tag handling in Robot Framework

cleveragents/cleveragents-core

Reference: cleveragents/cleveragents-core#815