test(e2e): workflow example 12 — large-scale hierarchical feature implementation (supervised profile) #817

Merged
hurui200320 merged 1 commit from test/e2e-wf12-hierarchical into master 2026-03-30 06:35:03 +00:00
Owner

Summary

E2E test for Workflow Example 12 — large-scale hierarchical feature implementation using the supervised profile. Tests multi-project setup (4 repos with per-project invariants), global invariant registration, spec-compliant action configuration with long_description, hierarchical plan tree inspection with hard assertions, plan correct (append mode) on non-root decision with post-correction verification, phased lifecycle-apply, and terminal state verification.

Closes #758

ISSUES CLOSED: #758

Changes

Test Structure (robot/e2e/wf12_hierarchical.robot)

  • Suite Setup (WF12 Suite Setup): Initializes E2E environment with init --force --yes, generates UUID-suffixed names for all resources/projects/actions to prevent UNIQUE constraint collisions in parallel CI.
  • Keywords: Create Project Repo (with git rc assertions and timeout=60s on_timeout=kill), Register Project With Invariant (per-project invariant per spec, with timeout on git calls), Select Non Root Decision Id (targeted "decision_id" regex using correct Crockford Base32 character class [0-9A-HJKMNP-TV-Z]{26}, requires ≥2 IDs to avoid returning root, defensive check ensures selected ID differs from first), Verify Plan In List (consistent with m6_acceptance pattern).
  • Force Tags E2E at Settings level.
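
The decision-ID selection described above can be sketched in Python. This is a minimal illustration, not the Robot Framework keyword itself: the function name `select_non_root_decision_id` is hypothetical, and it assumes the root decision is the first `decision_id` to appear in the `plan tree` JSON output.

```python
import re

# Crockford Base32 alphabet: digits plus uppercase letters, excluding I, L, O, U.
DECISION_ID_RE = re.compile(r'"decision_id"\s*:\s*"([0-9A-HJKMNP-TV-Z]{26})"')


def select_non_root_decision_id(tree_json_text: str) -> str:
    """Return the first decision ID that differs from the root ID.

    Assumption: the root decision is the first match in the text.
    """
    ids = DECISION_ID_RE.findall(tree_json_text)
    if len(ids) < 2:
        # Mirrors the ">= 2 IDs" requirement: a lone ID can only be the root.
        raise ValueError(f"expected at least 2 decision IDs, found {len(ids)}")
    root = ids[0]
    for candidate in ids[1:]:
        # Defensive check: never hand back the root by accident.
        if candidate != root:
            return candidate
    raise ValueError("no decision ID differs from the root")
```

Anchoring the pattern on the `"decision_id"` key, rather than matching any 26-character token, keeps other identifiers in the output (for example a plan ID) out of the result.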

Spec Compliance

  • All 4 projects passed to plan use (spec Step 3): protos, api, worker, frontend.
  • Global invariant registered per spec Step 1 (invariant add --global) with hard assertion on rc=0 and content verification.
  • Action YAML includes spec-required fields: estimation_actor, invariant_actor, automation_profile: cautious (ticket says 'supervised' but spec uses 'cautious' — following spec), long_description, action-level invariants, reusable, state.
  • Per-project invariants on each project registration (spec Step 1).
  • plan explain exercised per spec Step 4 with --format json and full assertion suite (rc=0, Traceback/INTERNAL checks, non-empty output, decision ID presence in output).
  • Dynamic actor selection based on available API key (Anthropic preferred, OpenAI fallback).
  • Skip If No LLM Keys for graceful CI degradation.
  • 35-minute timeout for real LLM execution headroom.
  • plan lifecycle-list verification after plan use (consistent with m6_acceptance pattern).
  • lifecycle-apply --yes to skip confirmation prompt in automated test execution.
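
As a rough sketch of the dynamic actor selection and the skip-on-missing-keys behaviour listed above: the actor identifiers and the `pick_actor` helper below are hypothetical placeholders, and only the two environment variable names come from this PR.

```python
import os
from typing import Mapping, Optional

# Hypothetical actor identifiers, used only for illustration.
ANTHROPIC_ACTOR = "anthropic-actor"
OPENAI_ACTOR = "openai-actor"


def pick_actor(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer Anthropic when its key is set, fall back to OpenAI, else None."""
    if env.get("ANTHROPIC_API_KEY"):
        return ANTHROPIC_ACTOR
    if env.get("OPENAI_API_KEY"):
        return OPENAI_ACTOR
    # No key available: the caller should skip the test rather than fail it.
    return None
```

Returning `None` instead of raising lets the caller decide between skipping and failing, which matches the graceful CI degradation described in the list.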

Assertion Quality

Every command is validated beyond rc=0:

  • Traceback and INTERNAL error marker checks on all commands.
  • Output Should Contain for resource/project/action names in registration output (including global invariant content assertion).
  • Safe Parse Json Field to parse plan_id from JSON output.
  • Hard assertion on "children" field for hierarchical decomposition (AC-3, AC-6), with non-empty children array check (WARN for flat LLM output since hierarchy depth is non-deterministic).
  • Decision tree structural assertions: decision_id count ≥ 2 (root + child) using Get Length on regex match results.
  • Correct Crockford Base32 regex for decision IDs: "decision_id"\\s*:\\s*"([0-9A-HJKMNP-TV-Z]{26})" — excludes I, L, O, U per spec.
  • Post-correction verification: second plan tree call verifies correction effect, using consistent regex-based counting method.
  • Pre-correction status check: verifies plan status (rc=0) before correction; gates correction on non-terminal state (skips with WARN if plan is already terminal).
  • Correction output verification: checks for append/queued/"mode"+"append" keywords (NOT bare correction substring, which would vacuously match the correction_id key). Plus structural JSON field check for status or correction_id.
  • Post-strategize intermediate state assertion: verifies plan has non-empty phase or processing_state after strategize.
  • Apply phase assertion: Should Contain apply with WARN log when phase is empty.
  • Terminal state assertion: Phase checked against actual PlanPhase enum (apply); processing_state checked against actual ProcessingState enum terminal values for Apply phase (applied, constrained, cancelled — NOT complete, which is only for Strategize/Execute phases). Non-terminal states (queued, processing) produce WARN instead of failure (apply may be asynchronous). errored state produces separate WARN. Empty processing_state after phase='apply' fails the test (guarantees populated state after full lifecycle).
  • plan diff checks rc=0, non-empty output, Traceback/INTERNAL, and plan_id presence.
  • Explain output verified to contain queried decision ID.
  • --format json on plan use, execute, tree, correct, diff, lifecycle-apply, status, and explain.
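
The terminal state rules for the Apply phase can be summarised as a small classifier. This is a sketch under stated assumptions: the `Verdict` enum and `classify_apply_state` function are hypothetical names, and treating any unlisted value (such as `complete`) as a failure is an assumption inferred from the description rather than confirmed behaviour.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


# Values quoted in the PR description for the Apply phase.
APPLY_TERMINAL_STATES = {"applied", "constrained", "cancelled"}
APPLY_IN_FLIGHT_STATES = {"queued", "processing"}


def classify_apply_state(processing_state: str) -> Verdict:
    """Map a processing_state observed in phase 'apply' to a test verdict."""
    state = processing_state.strip()
    if not state:
        # An empty state after a full lifecycle fails the test.
        return Verdict.FAIL
    if state in APPLY_TERMINAL_STATES:
        return Verdict.PASS
    if state in APPLY_IN_FLIGHT_STATES or state == "errored":
        # Apply may be asynchronous; errored is logged as a separate warning.
        return Verdict.WARN
    # Assumption: anything else, including 'complete', is not valid here.
    return Verdict.FAIL
```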

Known Limitations

  • plan prompt not exercised: Spec Step 4 shows the supervised profile pausing on a low-confidence decision and the user providing guidance via plan prompt. This command is not yet implemented as a CLI subcommand. A documentation note in the test indicates this should be added once available.
  • Action arguments/--arg omitted: Spec Step 2 defines args and Step 3 uses --arg. Both are omitted because plan use triggers a UNIQUE constraint error when the action defines arguments in its schema (pre-existing bug in PlanLifecycleService.use_action). TODO documented in test.
  • Post-correction tree change: Correction queues a modification but may not immediately add new decision nodes visible in plan tree until re-execution. Assertion checks >= rather than >.
  • Action invariants: Spec defines 4 action-level invariants; test includes 2 as simplification. TODO documented.
  • Project invariants: Spec shows 2 per project; test includes 1 per project as simplification. TODO documented.
  • Multi-resource projects: Spec shows api/worker linked to both own repo and protos repo. Test links each to only own repo. TODO documented.
  • Non-deterministic hierarchy depth: LLM may produce flat sibling decisions rather than nested parent→child trees; test WARNs instead of failing in this case.
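
To make the hierarchy and post-correction limitations concrete, here is a sketch of how a parsed tree could be measured. The node shape (a dict with a `decision_id` and an optional `children` list) is an assumption based on the fields named in this PR, and the helper names are hypothetical.

```python
from typing import Any, Dict

Node = Dict[str, Any]


def count_decisions(node: Node) -> int:
    """Count this node plus all of its descendants."""
    return 1 + sum(count_decisions(c) for c in node.get("children") or [])


def tree_depth(node: Node) -> int:
    """Depth in levels: a root with no children has depth 1."""
    children = node.get("children") or []
    return 1 + max((tree_depth(c) for c in children), default=0)


def is_flat(node: Node) -> bool:
    """True when none of the root's children has children of its own."""
    children = node.get("children") or []
    return all(not (c.get("children") or []) for c in children)


# Illustrative data only, not real CLI output.
before = {
    "decision_id": "root",
    "children": [
        {"decision_id": "a", "children": []},
        {"decision_id": "b", "children": [{"decision_id": "c"}]},
    ],
}
after = before  # a queued correction may leave the tree unchanged

# The comparison is >= rather than >, as described above.
assert count_decisions(after) >= count_decisions(before)
```

A flat result (`is_flat` returning true) would correspond to the WARN case, since the LLM may return sibling decisions without nesting.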

Quality Gates

All gates pass:

  • nox -e lint
  • nox -e typecheck
  • nox -e unit_tests (498 features, 12822 scenarios, 0 failed)
  • nox -e integration_tests (1825 tests, 0 failed)
  • nox -e e2e_tests (58 tests, 57 passed, 0 failed, 1 skipped — skip is pre-existing WF04 LLM non-determinism)
  • nox -e coverage_report (97%)

Manual Verification

Prerequisites

  • OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable set

Commands

nox -e e2e_tests  # runs the full E2E suite including this test
freemo added this to the v3.5.0 milestone 2026-03-13 17:11:02 +00:00
freemo force-pushed test/e2e-wf12-hierarchical from 285c53efe0
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 18s
CI / lint (pull_request) Successful in 19s
CI / typecheck (pull_request) Successful in 31s
CI / security (pull_request) Successful in 31s
CI / e2e_tests (pull_request) Failing after 51s
CI / unit_tests (pull_request) Successful in 2m3s
CI / integration_tests (pull_request) Successful in 2m40s
CI / docker (pull_request) Successful in 47s
CI / coverage (pull_request) Successful in 5m14s
CI / benchmark-regression (pull_request) Has been cancelled
to e7adeeb90e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 24s
CI / security (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 1m4s
CI / e2e_tests (pull_request) Failing after 59s
CI / unit_tests (pull_request) Successful in 3m9s
CI / docker (pull_request) Successful in 9s
CI / integration_tests (pull_request) Successful in 3m55s
CI / coverage (pull_request) Successful in 4m55s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 17:28:46 +00:00
freemo force-pushed test/e2e-wf12-hierarchical from e7adeeb90e
to c348fd1bec
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 14s
CI / build (pull_request) Successful in 13s
CI / security (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 36s
CI / e2e_tests (pull_request) Failing after 1m4s
CI / unit_tests (pull_request) Successful in 2m2s
CI / docker (pull_request) Successful in 47s
CI / integration_tests (pull_request) Successful in 2m53s
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 17:46:56 +00:00
freemo force-pushed test/e2e-wf12-hierarchical from c348fd1bec
to 073d543558
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 13s
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 30s
CI / build (pull_request) Successful in 26s
CI / e2e_tests (pull_request) Failing after 42s
CI / integration_tests (pull_request) Successful in 2m54s
CI / unit_tests (pull_request) Successful in 3m22s
CI / docker (pull_request) Successful in 35s
CI / coverage (pull_request) Successful in 4m50s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 17:51:49 +00:00
freemo force-pushed test/e2e-wf12-hierarchical from 073d543558
to 9627b0c3a9
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 20s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 27s
CI / security (pull_request) Successful in 37s
CI / typecheck (pull_request) Successful in 43s
CI / e2e_tests (pull_request) Successful in 1m8s
CI / unit_tests (pull_request) Successful in 2m36s
CI / docker (pull_request) Successful in 10s
CI / integration_tests (pull_request) Successful in 4m9s
CI / coverage (pull_request) Successful in 5m48s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 18:13:11 +00:00
freemo force-pushed test/e2e-wf12-hierarchical from 9627b0c3a9
to b196b87161
All checks were successful
CI / lint (pull_request) Successful in 19s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 37s
CI / build (pull_request) Successful in 18s
CI / e2e_tests (pull_request) Successful in 1m19s
CI / unit_tests (pull_request) Successful in 2m4s
CI / docker (pull_request) Successful in 40s
CI / integration_tests (pull_request) Successful in 3m2s
CI / coverage (pull_request) Successful in 7m36s
CI / benchmark-regression (pull_request) Successful in 34m20s
2026-03-13 18:27:07 +00:00
freemo force-pushed test/e2e-wf12-hierarchical from b196b87161
to 6e60c9d7d8
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 20s
CI / build (pull_request) Successful in 19s
CI / quality (pull_request) Successful in 30s
CI / security (pull_request) Successful in 37s
CI / typecheck (pull_request) Successful in 38s
CI / e2e_tests (pull_request) Failing after 53s
CI / unit_tests (pull_request) Successful in 2m5s
CI / integration_tests (pull_request) Successful in 4m21s
CI / docker (pull_request) Successful in 15s
CI / coverage (pull_request) Successful in 5m14s
CI / benchmark-regression (pull_request) Successful in 34m17s
2026-03-13 23:19:44 +00:00
Author
Owner

PM Review — Day 34

Status: Mergeable, 0 reviews, M6 (v3.5.0)
Closes: #758 | Author: @freemo

E2E test for WF12 (large-scale hierarchical feature implementation). 4-project setup (protos, api, worker, frontend), plan tree inspection, plan correct --mode append, phased apply.

[NOTE] Milestone v3.5.0 acceptance criteria require "4+ levels of subplans" and "10+ concurrent subplans." Verify the test actually exercises these thresholds — the manual verification steps don't include explicit depth/concurrency checks.

Action Items

| Who | Action | Deadline |
|-----|--------|----------|
| @CoreRasurae | Peer review — complex feature domain | Day 37 |
Author
Owner

PM Status — Day 36 (2026-03-16)

Day 34 review assignment deadline check. This PR has 0 reviewer activity after 2 days.

Priority note: M3 PRs take precedence. Reviewers should complete M3 reviews first, then address M4+ PRs in milestone order.

Assigned reviewer: Please acknowledge and provide an ETA for your review, or flag if reassignment is needed.

Author
Owner

@hurui200320 I am going to have you take over this PR. It is mostly complete but is waiting on #628 and #966; one is yours and one is Brent's. Please be sure to get this PR and the two blocking PRs I listed in ASAP, thanks.

Author
Owner

PM Status — Day 37

Reviewers assigned. This PR needs at least 2 approving reviews per CONTRIBUTING.md before merge.

Author: Please ensure this PR is rebased on latest master and all quality gates pass before requesting merge.



hurui200320 force-pushed test/e2e-wf12-hierarchical from 6e60c9d7d8
to 5a8458b5af
Some checks failed
CI / lint (pull_request) Successful in 16s
CI / typecheck (pull_request) Successful in 38s
CI / security (pull_request) Successful in 50s
CI / quality (pull_request) Successful in 27s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / unit_tests (pull_request) Successful in 3m2s
CI / e2e_tests (pull_request) Failing after 4m5s
CI / integration_tests (pull_request) Successful in 5m9s
CI / docker (pull_request) Successful in 9s
CI / coverage (pull_request) Successful in 7m3s
CI / benchmark-regression (pull_request) Successful in 39m3s
2026-03-18 08:35:56 +00:00
Author
Owner

Code Review — PR #817

(Cannot submit formal approval — self-authored PR.)

E2E test for WF12. Well-structured with proper labels, milestone, and issue linkage. No issues found.

hurui200320 force-pushed test/e2e-wf12-hierarchical from 5a8458b5af
to f90ef0cadd
Some checks failed
CI / lint (pull_request) Successful in 16s
CI / typecheck (pull_request) Successful in 43s
CI / security (pull_request) Successful in 50s
CI / quality (pull_request) Successful in 27s
CI / unit_tests (pull_request) Successful in 3m29s
CI / integration_tests (pull_request) Successful in 3m40s
CI / e2e_tests (pull_request) Failing after 5m17s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / coverage (pull_request) Successful in 7m15s
CI / docker (pull_request) Successful in 10s
CI / benchmark-regression (pull_request) Successful in 39m14s
2026-03-20 06:56:12 +00:00
hurui200320 force-pushed test/e2e-wf12-hierarchical from f90ef0cadd
to 83b319e679
Some checks failed
CI / lint (pull_request) Successful in 16s
CI / typecheck (pull_request) Successful in 37s
CI / security (pull_request) Successful in 53s
CI / quality (pull_request) Successful in 29s
CI / unit_tests (pull_request) Successful in 3m28s
CI / integration_tests (pull_request) Successful in 3m39s
CI / e2e_tests (pull_request) Successful in 7m28s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / coverage (pull_request) Successful in 7m16s
CI / docker (pull_request) Successful in 9s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-20 08:40:01 +00:00
hurui200320 force-pushed test/e2e-wf12-hierarchical from 83b319e679
to 69b03188f3
Some checks failed
CI / lint (pull_request) Successful in 15s
CI / typecheck (pull_request) Successful in 37s
CI / security (pull_request) Successful in 1m10s
CI / quality (pull_request) Successful in 27s
CI / unit_tests (pull_request) Successful in 3m29s
CI / integration_tests (pull_request) Successful in 3m39s
CI / e2e_tests (pull_request) Successful in 8m41s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / coverage (pull_request) Successful in 7m7s
CI / docker (pull_request) Successful in 10s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-20 09:32:52 +00:00
hurui200320 force-pushed test/e2e-wf12-hierarchical from 69b03188f3
to 0dc73f0a08
Some checks are pending
CI / lint (pull_request) Waiting to run
CI / typecheck (pull_request) Waiting to run
CI / security (pull_request) Waiting to run
CI / quality (pull_request) Waiting to run
CI / unit_tests (pull_request) Waiting to run
CI / build (pull_request) Waiting to run
CI / integration_tests (pull_request) Waiting to run
CI / e2e_tests (pull_request) Waiting to run
CI / coverage (pull_request) Blocked by required conditions
CI / benchmark-regression (pull_request) Blocked by required conditions
CI / benchmark-publish (pull_request) Waiting to run
CI / docker (pull_request) Blocked by required conditions
2026-03-20 10:46:17 +00:00
hurui200320 force-pushed test/e2e-wf12-hierarchical from 0dc73f0a08
to 75f70891f6
All checks were successful
CI / lint (pull_request) Successful in 17s
CI / typecheck (pull_request) Successful in 44s
CI / security (pull_request) Successful in 41s
CI / quality (pull_request) Successful in 27s
CI / unit_tests (pull_request) Successful in 3m16s
CI / integration_tests (pull_request) Successful in 3m41s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / e2e_tests (pull_request) Successful in 9m5s
CI / coverage (pull_request) Successful in 7m18s
CI / docker (pull_request) Successful in 9s
CI / benchmark-regression (pull_request) Successful in 38m36s
2026-03-20 11:13:15 +00:00
hurui200320 force-pushed test/e2e-wf12-hierarchical from 75f70891f6
to 461defbe9a
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 25s
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m40s
CI / typecheck (pull_request) Successful in 4m32s
CI / security (pull_request) Successful in 4m43s
CI / unit_tests (pull_request) Successful in 5m44s
CI / integration_tests (pull_request) Successful in 6m45s
CI / docker (pull_request) Successful in 1m8s
CI / e2e_tests (pull_request) Successful in 11m41s
CI / coverage (pull_request) Successful in 10m12s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Successful in 50m36s
2026-03-23 04:11:37 +00:00
hurui200320 force-pushed test/e2e-wf12-hierarchical from 461defbe9a
to a9f2291c18
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 28s
CI / lint (pull_request) Successful in 3m20s
CI / typecheck (pull_request) Successful in 3m51s
CI / quality (pull_request) Successful in 3m47s
CI / security (pull_request) Successful in 4m10s
CI / unit_tests (pull_request) Successful in 6m55s
CI / integration_tests (pull_request) Successful in 7m40s
CI / docker (pull_request) Successful in 1m9s
CI / e2e_tests (pull_request) Successful in 11m52s
CI / coverage (pull_request) Successful in 10m14s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 57m34s
2026-03-24 05:41:50 +00:00
hurui200320 force-pushed test/e2e-wf12-hierarchical from a9f2291c18
to 4001d5095e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 35s
CI / lint (pull_request) Successful in 6m11s
CI / quality (pull_request) Successful in 6m37s
CI / security (pull_request) Successful in 6m41s
CI / typecheck (pull_request) Successful in 6m46s
CI / integration_tests (pull_request) Successful in 9m49s
CI / unit_tests (pull_request) Successful in 12m41s
CI / docker (pull_request) Successful in 1m9s
CI / e2e_tests (pull_request) Failing after 15m1s
CI / coverage (pull_request) Successful in 11m9s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 1h2m33s
2026-03-27 10:00:13 +00:00
hurui200320 force-pushed test/e2e-wf12-hierarchical from 4001d5095e
to 7839009bc0
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m21s
CI / quality (pull_request) Successful in 3m45s
CI / typecheck (pull_request) Successful in 4m11s
CI / security (pull_request) Successful in 4m13s
CI / build (pull_request) Successful in 26s
CI / helm (pull_request) Successful in 38s
CI / unit_tests (pull_request) Successful in 4m10s
CI / docker (pull_request) Successful in 13s
CI / integration_tests (pull_request) Successful in 3m57s
CI / coverage (pull_request) Successful in 12m13s
CI / e2e_tests (pull_request) Failing after 14m26s
CI / status-check (pull_request) Failing after 2s
CI / benchmark-regression (pull_request) Successful in 56m4s
2026-03-30 04:09:52 +00:00
hurui200320 force-pushed test/e2e-wf12-hierarchical from 7839009bc0
to 7a66ddb9eb
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 27s
CI / quality (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / typecheck (pull_request) Has been cancelled
CI / security (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / e2e_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / build (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / helm (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-30 05:26:06 +00:00
hurui200320 force-pushed test/e2e-wf12-hierarchical from 7a66ddb9eb
to a8c625d1ac
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 26s
CI / quality (pull_request) Successful in 44s
CI / build (pull_request) Successful in 20s
CI / helm (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 3m59s
CI / security (pull_request) Successful in 4m8s
CI / integration_tests (pull_request) Successful in 6m14s
CI / unit_tests (pull_request) Successful in 6m29s
CI / docker (pull_request) Successful in 11s
CI / coverage (pull_request) Successful in 8m36s
CI / e2e_tests (pull_request) Successful in 12m32s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-30 05:27:13 +00:00
hurui200320 force-pushed test/e2e-wf12-hierarchical from a8c625d1ac
to d24959e961
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 1m9s
CI / security (pull_request) Successful in 1m27s
CI / quality (pull_request) Successful in 43s
CI / build (pull_request) Successful in 26s
CI / helm (pull_request) Successful in 36s
CI / integration_tests (pull_request) Successful in 4m18s
CI / unit_tests (pull_request) Successful in 4m25s
CI / coverage (pull_request) Successful in 11m47s
CI / e2e_tests (pull_request) Successful in 13m21s
CI / docker (pull_request) Successful in 16s
CI / status-check (pull_request) Successful in 2s
CI / lint (push) Successful in 22s
CI / security (push) Successful in 50s
CI / build (push) Successful in 18s
CI / helm (push) Successful in 40s
CI / quality (push) Successful in 3m41s
CI / typecheck (push) Successful in 4m0s
CI / integration_tests (push) Successful in 4m19s
CI / unit_tests (push) Successful in 4m31s
CI / docker (push) Successful in 11s
CI / e2e_tests (push) Successful in 14m3s
CI / coverage (push) Successful in 11m50s
CI / status-check (push) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 1h3m16s
CI / benchmark-regression (push) Has been skipped
CI / benchmark-publish (push) Successful in 32m26s
2026-03-30 05:46:30 +00:00
hurui200320 deleted branch test/e2e-wf12-hierarchical 2026-03-30 06:35:03 +00:00
Reference
cleveragents/cleveragents-core!817