test(integration): workflow example 11 — complex graph actor for multi-stage code review (trusted profile) #810

2026-03-13T05:48:18Z

brent.edwards commented

2026-03-13 05:48:18 +00:00

Summary

Closes #775

Add Robot Framework integration test suite for Specification Workflow Example 11: Complex Graph Actor for Multi-Stage Code Review. Exercises a custom graph-type actor (5 nodes, 6 edges) with parallel fan-out to security/performance/style reviewer nodes and result synthesis using mocked LLM providers. The action is read-only.

Graph Topology

dispatcher ──┬──> security_reviewer  ──┐
             ├──> performance_reviewer ─┤──> synthesizer
             └──> style_reviewer ──────┘

5 nodes, 6 edges
Entry: dispatcher, Exit: synthesizer
3-way fan-out (dispatcher → 3 reviewers), 3-way fan-in (3 reviewers → synthesizer)
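The counts above can be checked mechanically from the edge list. The following is a minimal sketch, not the helper's actual code: the edge list is transcribed from the diagram, and `summarize` is a hypothetical name introduced here for illustration.

```python
from collections import Counter

# Edge list transcribed from the topology diagram (illustrative only).
EDGES = [
    ("dispatcher", "security_reviewer"),
    ("dispatcher", "performance_reviewer"),
    ("dispatcher", "style_reviewer"),
    ("security_reviewer", "synthesizer"),
    ("performance_reviewer", "synthesizer"),
    ("style_reviewer", "synthesizer"),
]


def summarize(edges):
    """Derive node/edge counts, entry/exit nodes, and max fan-out/fan-in."""
    nodes = {node for edge in edges for node in edge}
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    return {
        "node_count": len(nodes),
        "edge_count": len(edges),
        # Entry nodes have no incoming edges; exit nodes have no outgoing edges.
        "entries": sorted(n for n in nodes if in_degree[n] == 0),
        "exits": sorted(n for n in nodes if out_degree[n] == 0),
        "fan_out": max(out_degree.values()),
        "fan_in": max(in_degree.values()),
    }


print(summarize(EDGES))
```

Deriving entry and exit nodes from degrees, rather than hard-coding them, means a stray extra edge would show up as a changed summary.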

New Files

File	Description
`robot/wf11_graph_actor.robot`	8 Robot Framework test cases
`robot/helper_wf11_graph_actor.py`	8-subcommand Python helper (~470 lines)

Test Cases (8)

Register Graph Actor Via CLI — Register 5-node graph actor via actor add --config
Verify Graph Topology 5 Nodes 6 Edges — Parse YAML, validate 5 nodes, 6 edges, fan-out/fan-in
Compile Graph Actor To LangGraph StateGraph — Compile actor, verify metadata
Create Read Only Review Action — Action referencing graph actor, read-only flag
Plan Use Review With Graph Actor — Create plan from review action
Verify Read Only Guard On Plan Execute — Confirm no crash on read-only execute
Verify Review Synthesis Structure — 3 reviewer edges into synthesizer, sole exit
Verify No File Modifications — Action marked read-only in output

Verification

nox -s integration_tests — All 8 tests pass
nox -s coverage_report — 98% (≥97% threshold)
nox -s typecheck — 0 errors
nox -s lint — All checks passed
nox -s format — No changes needed
nox -s docs — Builds successfully
nox -s build — Wheel built successfully

brent.edwards added the

Type

Testing

label 2026-03-13 05:48:28 +00:00

brent.edwards added this to the v3.1.0 milestone 2026-03-13 05:48:32 +00:00

brent.edwards added a new dependency 2026-03-13 05:48:43 +00:00

#775 test(integration): workflow example 11 — complex graph actor for multi-stage code review (trusted profile)

freemo added the

State

In Review

label 2026-03-13 21:16:45 +00:00

freemo added the

Priority

Medium

label 2026-03-14 04:10:19 +00:00

freemo commented

2026-03-14 04:44:38 +00:00

PM Review — Day 34

Status: Mergeable, 0 reviews, M2 (v3.1.0)
Author: @brent.edwards

Integration test for WF11 (complex graph actor for multi-stage code review). Robot Framework + helper pattern.

Action Items

Who	Action	Deadline
@hamza.khyari	Peer review	Day 37

freemo added the

labels 2026-03-14 22:11:36 +00:00

freemo modified the milestone from v3.1.0 to v3.2.0

2026-03-16 00:32:00 +00:00

freemo added a new dependency 2026-03-16 02:42:19 +00:00

#627 Implement @tdd_expected_fail tag handling in Behave environment

freemo added a new dependency 2026-03-16 02:42:19 +00:00

#628 Implement @tdd_expected_fail tag handling in Robot Framework

freemo added a new dependency 2026-03-16 02:42:19 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

freemo commented

2026-03-16 09:27:02 +00:00

PM Status — Day 36 (2026-03-16)

Day 34 review assignment deadline check. 0 reviewer activity after 2 days.

Assigned reviewer: Please acknowledge and provide an ETA for review. Prioritize M3 PRs first, then M4+ in milestone order.

freemo requested review from hurui200320 2026-03-17 18:24:24 +00:00

freemo requested review from CoreRasurae 2026-03-17 18:24:24 +00:00

freemo commented

2026-03-17 18:40:41 +00:00

PM Status — Day 37

0 reviewer activity after 3 days. Review was assigned to @hamza.khyari on Day 34 with Day 37 deadline — now overdue. PR is M2 (v3.2.0) by @brent.edwards.

Author: Ensure PR is rebased on master by Day 39 EOD (2026-03-19). Reviewer: please post review or flag for reassignment.

PM status — Day 37

hurui200320 requested changes 2026-03-18 07:18:36 +00:00

hurui200320 left a comment

PR Review: !810 (Ticket #775)

Verdict: Request Changes

The PR adds a Robot Framework integration test suite for Specification Workflow Example 11 (complex graph actor for multi-stage code review). The overall approach follows established project patterns and covers the core acceptance criteria at a structural/topology level. However, there are 3 critical process violations, 9 major issues (code quality violations, missing mandatory PR artifacts, spec divergences, and weak test assertions), and several minor/nit items that should be addressed before merge.

Two review passes were performed. New findings from the second pass are marked 🆕.

Critical Issues

C1. PR description is empty

Location: PR !810 metadata (body field is "")
Problem: CONTRIBUTING.md §"Pull Request Process" point 1 requires every PR to include: (a) a summary of the changes, (b) an issue reference using a closing keyword (e.g., Closes #775), and (c) a Forgejo dependency link (PR blocks issue #775). The document states: "PRs submitted without a description or without an issue reference will not be reviewed."
Recommendation: Add a PR description with summary, Closes #775, and set up the Forgejo dependency link (PR blocks #775).

C2. Branch contains 4 merge commits — violates commit hygiene standards

Location: Branch test/int-wf11-graph-actor git history
Problem: The branch has 4 Merge branch 'master' into test/int-wf11-graph-actor commits alongside the single implementation commit. CONTRIBUTING.md §"Commit Hygiene" says to "Clean up history before merging" and use "interactive rebase or amend to fix typos, consolidate fixup commits, and polish the commit series before pushing to shared branches."
Recommendation: Rebase the branch onto origin/master to produce a clean linear history: git rebase origin/master.

C3. # type: ignore[operator] suppression on line 701

Location: robot/helper_wf11_graph_actor.py, line 701
Problem: CONTRIBUTING.md §"Type Safety" and §"Static Type Checker" explicitly state: "never use inline comments (such as # type: ignore) to suppress type checking errors." The line fn() # type: ignore[operator] violates this. Root cause: _COMMANDS is typed as dict[str, object] (line 685) instead of dict[str, Callable[[], None]].
Recommendation: Change line 685 to _COMMANDS: dict[str, Callable[[], None]] (import Callable from collections.abc), then remove the # type: ignore on line 701. See robot/helper_config_cli.py line 150 for the correct pattern already used in this codebase.
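A minimal sketch of the typed dispatch table, assuming every subcommand is a zero-argument function returning `None`. The two command functions and the `main` signature are placeholders for illustration, not the helper's real code.

```python
import sys
from collections.abc import Callable


def register_actor() -> None:
    # Placeholder subcommand body (hypothetical).
    print("register-actor-ok")


def verify_topology() -> None:
    # Placeholder subcommand body (hypothetical).
    print("verify-topology-ok")


# Typing the values as callables lets the type checker accept the call
# below without any inline suppression comment.
_COMMANDS: dict[str, Callable[[], None]] = {
    "register-actor": register_actor,
    "verify-topology": verify_topology,
}


def main(argv: list[str]) -> int:
    """Dispatch argv[1] to the matching subcommand; return an exit code."""
    if len(argv) != 2 or argv[1] not in _COMMANDS:
        print("usage: helper <" + "|".join(_COMMANDS) + ">", file=sys.stderr)
        return 2
    _COMMANDS[argv[1]]()
    return 0
```

A script would typically end with `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard, which also matches the `main()` convention mentioned in N4.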

Major Issues

M1. verify_read_only_guard() does not verify the read-only guard fires

Location: robot/helper_wf11_graph_actor.py, lines 518–534
Problem: The function's docstring says "Verify plan execute is rejected for read-only plans" but the only assertion is the absence of a Python Traceback (line 527). The inline comment explicitly says: "Either is acceptable — the key assertion is no crash." This means the test passes even if the read-only guard is completely broken — as long as some other error (e.g., "plan not ready") prevents execution. Compare with the correct pattern in helper_m4_e2e_cli_errors.py which verifies non-zero exit code AND output contains "read-only".
Recommendation: Assert r4.returncode != 0 and that the combined output contains a read-only rejection keyword (e.g., "read-only" in combined.lower() or "read_only" in combined.lower()).
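The stronger assertion could look roughly like the sketch below. `check_read_only_rejection` is a hypothetical function name, and the exact wording of the CLI's rejection message is an assumption; only the two keywords suggested above are checked.

```python
def check_read_only_rejection(returncode: int, stdout: str, stderr: str) -> list[str]:
    """Return a list of problems; an empty list means the guard fired."""
    combined = f"{stdout}\n{stderr}".lower()
    problems = []
    if "traceback" in combined:
        problems.append("unexpected Python traceback")
    if returncode == 0:
        problems.append("plan execute exited 0; expected a rejection")
    # Accept either spelling of the keyword, as suggested in the recommendation.
    if "read-only" not in combined and "read_only" not in combined:
        problems.append("output does not mention read-only")
    return problems


# An unrelated failure ("plan not ready") no longer counts as success.
print(check_read_only_rejection(1, "", "Error: plan not ready"))
```

With this shape, the helper would call `_fail()` whenever the returned list is non-empty, so a broken guard cannot hide behind a different error.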

M2. verify_no_file_modifications() has a false-positive read-only check

Location: robot/helper_wf11_graph_actor.py, lines 653–668
Problem: The check if "read" not in out or "only" not in out (line 656) always passes because the _ACTION_YAML description is "Read-only multi-stage code review using a graph actor..." — the lowercased plain output always contains both "read" and "only" as substrings of the description text, regardless of the actual read_only flag value. The more robust YAML-format fallback (which checks for "read_only: true") is therefore never reached.
Recommendation: Replace the fragile substring check. Either always use the YAML format check, or check for "read_only" as a compound term.
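The false positive is easy to reproduce in isolation. In this sketch the sample output is hypothetical (the real CLI output format is not shown in the PR), and the description uses only the portion quoted above.

```python
import re

# Only the quoted part of the description; the rest is elided in the source.
DESCRIPTION = "Read-only multi-stage code review using a graph actor"


def fragile_check(output: str) -> bool:
    """Mirror of the current check: two independent substring tests."""
    out = output.lower()
    return "read" in out and "only" in out


def strict_check(yaml_output: str) -> bool:
    """Match the flag itself as a whole YAML line, not loose substrings."""
    pattern = r"^\s*read_only:\s*true\s*$"
    return re.search(pattern, yaml_output, re.MULTILINE | re.IGNORECASE) is not None


# Hypothetical output where the flag is actually false.
sample = f"description: {DESCRIPTION}\nread_only: false\n"
print(fragile_check(sample), strict_check(sample))
```

The fragile check reports success purely because of the description text, while the strict check follows the actual flag value.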

M3. create_review_action() silently swallows JSON decode errors

Location: robot/helper_wf11_graph_actor.py, lines 379–384
Problem: When verifying the read_only flag via JSON output, except json.JSONDecodeError: pass (line 384) silently swallows the error and execution falls through to print the success sentinel — without ever having verified the flag. This also violates CONTRIBUTING.md §"Error and Exception Handling" which says "Do not suppress errors."
Recommendation: If JSON parsing fails, either (a) try the YAML format as a third fallback, or (b) call _fail() to report that verification was inconclusive.
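Option (b) could be sketched as follows. `VerificationError` and `read_only_flag_from_json` are hypothetical names, and the assumption that the JSON payload is an object with a top-level `read_only` key is mine, not confirmed by the PR.

```python
import json


class VerificationError(Exception):
    """Raised when the flag cannot be verified one way or the other."""


def read_only_flag_from_json(raw: str) -> bool:
    """Parse CLI JSON output and return the read_only flag, or raise."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Surface the failure instead of passing silently.
        raise VerificationError(f"could not parse action JSON: {exc}") from exc
    if not isinstance(payload, dict) or "read_only" not in payload:
        raise VerificationError("action JSON has no read_only field")
    return payload["read_only"] is True
```

The caller can then translate `VerificationError` into `_fail()`, so the success sentinel is only printed after the flag has actually been read.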

M4. Imports inside function bodies violate project import guidelines

Location: robot/helper_wf11_graph_actor.py, lines 221, 279–280, 554–555
Problem: CONTRIBUTING.md §"Import Guidelines" (Project-Specific) explicitly states: "Ensure all imports are at the top of the Python file. Do not scatter imports throughout the file or bury them inside functions or methods." Five from cleveragents... imports appear inside function bodies: verify_graph_topology() (line 221), compile_graph_actor() (lines 279–280), and verify_review_synthesis() (lines 554–555).

Recommendation: Move all five imports to the top of the file (after the path bootstrap block), consolidated into:

from cleveragents.actor.schema import ActorConfigSchema, ActorType  # noqa: E402
from cleveragents.actor.compiler import CompiledActor, compile_actor  # noqa: E402

M5. File exceeds 500-line limit (701 lines)

Location: robot/helper_wf11_graph_actor.py (701 lines total)
Problem: CONTRIBUTING.md §"General Principles" states: "Keep files under 500 lines. Break large files into focused, cohesive modules." At 701 lines, the file is 40% over the limit. Much of the bulk comes from repeated actor-registration + action-creation boilerplate duplicated across 4 functions.
Recommendation: Extract the common setup sequence (register actor, create action) into a shared _setup_actor_and_action(prefix: str) helper function. This would eliminate ~60–80 lines of duplication per function and bring the file under 500 lines.

M6. Graph node IDs diverge from specification Example 11

Location: robot/helper_wf11_graph_actor.py, lines 57–139 (_GRAPH_ACTOR_YAML)
Problem: The spec (docs/specification.md, lines 40382–40416) defines node IDs as: dispatch, security, performance, style, synthesize. The test YAML uses: dispatcher, security_reviewer, performance_reviewer, style_reviewer, synthesizer. The spec also includes checkpointing: true (line 40386) and parallel_execution: true (line 40416) on the graph route, which are absent from the test YAML. CONTRIBUTING.md states the spec is "the authoritative source of truth."
Recommendation: Align node IDs to match the spec. If the current schema doesn't support checkpointing and parallel_execution fields, add comments documenting the gap and reference the spec sections.

🆕 M7. Missing CHANGELOG.md update

Location: Missing change to CHANGELOG.md
Problem: CONTRIBUTING.md requirement 6 states: "The PR must include an update to the changelog file. Add one new entry per commit in the PR that describes the change from the user's perspective." Verified via git diff origin/master...HEAD -- CHANGELOG.md — empty. The PR adds a new integration test suite but has no CHANGELOG.md entry.
Recommendation: Add a changelog entry under ## Unreleased, e.g.: "Added Robot Framework integration test for Specification Workflow Example 11 (complex graph actor with multi-stage code review). (#775)"

🆕 M8. Missing --automation-profile trusted in plan use invocation

Location: robot/helper_wf11_graph_actor.py, lines 430–437
Problem: The ticket title explicitly says "trusted profile" and the spec's Example 11 Step 2 shows agents plan use --automation-profile trusted .... The test's plan_use_review() calls plan use local/wf11-code-review --format plain without passing --automation-profile trusted. The "trusted" profile causes strategize and execute to proceed automatically — the default (supervised) profile has different behavior.
Recommendation: Add "--automation-profile", "trusted" to the plan use CLI invocation.

🆕 M9. provider: openai phantom field silently discarded by schema

Location: robot/helper_wf11_graph_actor.py, line 65
Problem: The _GRAPH_ACTOR_YAML includes provider: openai, but ActorConfigSchema (in src/cleveragents/actor/schema.py) has no provider field. Since the schema doesn't set extra="forbid", Pydantic v2 silently ignores unknown fields. The YAML is accepted not because provider is valid, but because it's silently discarded — this is dead configuration that gives a false sense of completeness.
Recommendation: Remove provider: openai from the YAML fixture, or if provider is part of the spec intent, file a ticket to add it to ActorConfigSchema.

Minor Issues

m1. Points label mismatch between ticket and PR

Location: PR labels vs. Issue labels
Problem: Ticket #775 has Points/5 but PR !810 has Points/3.
Recommendation: Align the PR label to Points/5 to match the ticket.

m2. Resource leak if second write_yaml() fails in multi-YAML functions

Location: robot/helper_wf11_graph_actor.py, lines 322–325 (also 401–403, 475–477, 610–612)
Problem: In functions like create_review_action(), setup_workspace() and the first write_yaml() execute before the try block. If the second write_yaml() raises, the workspace and first temp file leak because finally never executes.
Recommendation: Move resource allocation inside the try block with None sentinel guards in finally, or use nested try/finally.
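One way to apply the sentinel approach is sketched below. The `write_yaml` here is a stand-in built on `tempfile`, not the project's helper, and `run_with_cleanup` is a hypothetical wrapper name.

```python
import os
import shutil
import tempfile


def write_yaml(text: str) -> str:
    """Stand-in for the project's write_yaml(): write text to a temp file."""
    fd, path = tempfile.mkstemp(suffix=".yaml")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def run_with_cleanup(actor_yaml: str, action_yaml: str, body):
    """Allocate everything inside try so finally always sees what exists."""
    workspace = None  # sentinel: nothing allocated yet
    paths = []
    try:
        workspace = tempfile.mkdtemp(prefix="wf11_")
        paths.append(write_yaml(actor_yaml))
        paths.append(write_yaml(action_yaml))  # a failure here is now cleaned up
        return body(workspace, paths)
    finally:
        for path in paths:
            if os.path.exists(path):
                os.remove(path)
        if workspace is not None:
            shutil.rmtree(workspace, ignore_errors=True)
```

Because each resource is recorded only after it is created, the `finally` block removes exactly what was allocated, regardless of which step raised.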

m3. Commit author uses personal email

Location: Commit 1ef2e506, author: Brent E. Edwards <chipuni@cemcast.net>
Problem: The implementation commit uses a personal email instead of the corporate email brent.edwards@cleverthis.com. The PR was created from the brent.edwards Forgejo account (corporate email), so the mismatch is inconsistent.
Recommendation: If the branch is rebased anyway (per C2), amend the commit author email.

m4. No negative/error-path test cases

Location: robot/wf11_graph_actor.robot (all 8 tests are happy-path)
Problem: All 8 tests are success-path scenarios. There are no negative tests (e.g., invalid graph topology, missing nodes, broken edges). The actor_examples.robot file includes negative tests as precedent.
Recommendation: Add at least 1–2 negative test cases (e.g., reject a graph actor with a missing node referenced in an edge).

m5. plan_use_review() has weak plan status verification

Location: robot/helper_wf11_graph_actor.py, lines 447–460
Problem: The comment says "Verify plan status shows strategize/queued" but the code only checks for absence of "Traceback" and "INTERNAL". It doesn't verify the plan is in the expected state.
Recommendation: Add a positive assertion for the expected plan state (e.g., check output contains "strategize" or "queued").

🆕 m6. Missing timeout/on_timeout on all Robot Run Process calls

Location: robot/wf11_graph_actor.robot, lines 19, 29, 39, 49, 59, 70, 81, 91
Problem: All 8 Run Process calls lack timeout and on_timeout parameters. 337 other Run Process calls across the project use timeout=120s on_timeout=kill. WF11's helper operations (Alembic migrations + CLI subprocesses) are susceptible to hangs that would stall a pabot worker indefinitely.
Recommendation: Add timeout=120s on_timeout=kill to each Run Process call.

🆕 m7. Action YAML missing spec's arguments: section

Location: robot/helper_wf11_graph_actor.py, lines 141–152
Problem: The spec's actions/deep-review.yaml defines typed arguments: target_paths (required, string) and review_depth (optional, string, default "standard"). The test's _ACTION_YAML omits all argument definitions, meaning the argument-passing workflow (plan use --arg target_paths=... --arg review_depth=...) is completely untested.
Recommendation: Add the arguments: section to _ACTION_YAML with the two arguments from the spec, or document why they are omitted.

🆕 m8. compile_graph_actor() doesn't assert compiled node IDs match expected set

Location: robot/helper_wf11_graph_actor.py, lines 291–307
Problem: Verifies len(compiled.nodes) == 5 but does not assert the actual node IDs match the expected set. If the compiler silently renamed or dropped/duplicated a node such that the count remained 5, this test would pass. By contrast, verify_graph_topology() does perform set-equality checks on config nodes.
Recommendation: Add: assert set(compiled.nodes.keys()) == {"dispatcher", "security_reviewer", ...}
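A sketch of the set-equality check, assuming `compiled.nodes` behaves like a mapping keyed by node ID (as the recommendation implies). The node IDs are those currently used by the test YAML, and `node_id_mismatch` is a hypothetical helper name.

```python
EXPECTED_NODE_IDS = frozenset(
    {
        "dispatcher",
        "security_reviewer",
        "performance_reviewer",
        "style_reviewer",
        "synthesizer",
    }
)


def node_id_mismatch(compiled_nodes: dict) -> tuple:
    """Return (missing, unexpected) node IDs relative to the expected set."""
    actual = set(compiled_nodes)
    return EXPECTED_NODE_IDS - actual, actual - EXPECTED_NODE_IDS


# A renamed node keeps the count at 5 but is caught by the set comparison.
renamed = dict.fromkeys(
    ["dispatcher", "security_reviewer", "performance_reviewer", "style", "synthesizer"]
)
print(len(renamed), node_id_mismatch(renamed))
```

Reporting the missing and unexpected IDs separately also gives a more useful failure message than a bare count mismatch.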

🆕 m9. Coverage acceptance criterion is vacuously satisfied for Robot-only changes

Location: Ticket #775 acceptance criteria, commit message
Problem: nox -s coverage_report only measures Behave unit test coverage. Robot Framework integration tests run in a separate session and are never coverage-measured. Since this PR adds only Robot files, it is impossible for these changes to affect the 97% metric. The claim "Coverage >= 97% maintained (98%)" is trivially true and provides no signal about test quality.
Recommendation: Acknowledge in the PR description that coverage is unaffected by Robot-only changes, rather than claiming the criterion is met.

Nits

N1. Robot test documentation inaccuracy

Location: robot/wf11_graph_actor.robot, line 88
Problem: Documentation says "Verify the action is marked read-only in JSON output" but the helper actually uses --format plain with a YAML fallback, not JSON.
Recommendation: Update to "Verify the action is marked read-only in plain/YAML output."

N2. _extract_plan_id() regex is over-permissive for ULIDs

Location: robot/helper_wf11_graph_actor.py, line 162
Problem: \b([0-9A-Z]{26})\b accepts characters I, L, O, U which are excluded from Crockford's Base32 encoding used by ULIDs. Practical risk is negligible.
Recommendation: For completeness, could use \b([0-9A-HJKMNP-TV-Z]{26})\b.
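The difference between the two patterns can be seen with a quick comparison. The sample ULID is the well-known example from the ULID specification; the second string is the same value with its last character replaced by `I` to make it invalid.

```python
import re

LOOSE = re.compile(r"\b([0-9A-Z]{26})\b")
# Crockford Base32 alphabet: digits plus A-Z without I, L, O, U.
STRICT = re.compile(r"\b([0-9A-HJKMNP-TV-Z]{26})\b")

valid = "plan 01ARZ3NDEKTSV4RRFFQ69G5FAV created"
invalid = "plan 01ARZ3NDEKTSV4RRFFQ69G5FAI created"

for text in (valid, invalid):
    loose = LOOSE.search(text)
    strict = STRICT.search(text)
    print(bool(loose), bool(strict))
```

Both patterns accept the valid ID, but only the loose pattern accepts the string containing `I`.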

N3. Code duplication in setup boilerplate

Location: robot/helper_wf11_graph_actor.py, 4 functions repeating ~20-line actor+action setup
Problem: create_review_action, plan_use_review, verify_read_only_guard, and verify_no_file_modifications each independently register the actor and create the action. ~80 lines of redundancy.
Recommendation: Extract into a _setup_actor_and_action(prefix) helper. This also helps with M5 (file length).

🆕 N4. Helper uses bare if __name__ dispatch instead of main() function

Location: robot/helper_wf11_graph_actor.py, lines 696–701
Problem: The vast majority of other helpers in the project define a main() function for dispatch (e.g., helper_cli_lifecycle.py, helper_decision_recording.py, helper_actor_examples.py). WF11 is an outlier.
Recommendation: Refactor to match the established main() pattern.

🆕 N5. Inconsistent docstring depth across functions

Location: robot/helper_wf11_graph_actor.py, multiple functions
Problem: verify_review_synthesis() has a thorough multi-line docstring. Most other functions have only single-line docstrings that don't describe key assertion nuances (e.g., verify_read_only_guard doesn't mention it accepts non-crash as success).
Recommendation: Expand docstrings to briefly describe key assertions and non-obvious acceptance criteria.

Summary

The PR adds a structurally sound Robot Framework integration test for Workflow Example 11 that covers graph actor registration, topology validation (5 nodes, 6 edges), compilation, action creation, and plan lifecycle. The tests follow the established robot/helper_*.py pattern with good isolation and cleanup.

However, the PR needs significant work before merge:

Process compliance (C1, C2, M7): Empty PR description, merge commits, and missing CHANGELOG update must be fixed.
Code quality (C3, M4, M5): The # type: ignore suppression, the scattered imports, and the 701-line file all violate project standards.
Test assertion quality (M1, M2, M3): The three functions that test read-only behavior all have assertion weaknesses that would allow tests to pass even if the read-only guard were broken.
Spec fidelity (M6, M8, M9): Node IDs diverge from spec, the "trusted profile" from the ticket title is not exercised, and a phantom provider field is silently discarded.

The second review pass confirmed the first pass's findings and uncovered 3 additional major issues (missing CHANGELOG, missing trusted profile, phantom schema field) plus 4 new minor items and 2 new nits. No security issues were found: the code uses secure temp file creation, list-based subprocess invocation, and proper database isolation.

If the compiler silently renamed or dropped/duplicated a node such that the count remained 5, this test would pass. By contrast, `verify_graph_topology()` does perform set-equality checks on config nodes. - **Recommendation:** Add: `assert set(compiled.nodes.keys()) == {"dispatcher", "security_reviewer", ...}` **🆕 m9. Coverage acceptance criterion is vacuously satisfied for Robot-only changes** - **Location:** Ticket #775 acceptance criteria, commit message - **Problem:** `nox -s coverage_report` only measures Behave unit test coverage. Robot Framework integration tests run in a separate session and are never coverage-measured. Since this PR adds only Robot files, it is impossible for these changes to affect the 97% metric. The claim "Coverage >= 97% maintained (98%)" is trivially true and provides no signal about test quality. - **Recommendation:** Acknowledge in the PR description that coverage is unaffected by Robot-only changes, rather than claiming the criterion is met. --- ### Nits **N1. Robot test documentation inaccuracy** - **Location:** `robot/wf11_graph_actor.robot`, line 88 - **Problem:** Documentation says *"Verify the action is marked read-only in JSON output"* but the helper actually uses `--format plain` with a YAML fallback, not JSON. - **Recommendation:** Update to *"Verify the action is marked read-only in plain/YAML output."* **N2. `_extract_plan_id()` regex is over-permissive for ULIDs** - **Location:** `robot/helper_wf11_graph_actor.py`, line 162 - **Problem:** `\b([0-9A-Z]{26})\b` accepts characters I, L, O, U which are excluded from Crockford's Base32 encoding used by ULIDs. Practical risk is negligible. - **Recommendation:** For completeness, could use `\b([0-9A-HJKMNP-TV-Z]{26})\b`. **N3. 
Code duplication in setup boilerplate** - **Location:** `robot/helper_wf11_graph_actor.py`, 4 functions repeating ~20-line actor+action setup - **Problem:** `create_review_action`, `plan_use_review`, `verify_read_only_guard`, and `verify_no_file_modifications` each independently register the actor and create the action. ~80 lines of redundancy. - **Recommendation:** Extract into a `_setup_actor_and_action(prefix)` helper. This also helps with M5 (file length). **🆕 N4. Helper uses bare `if __name__` dispatch instead of `main()` function** - **Location:** `robot/helper_wf11_graph_actor.py`, lines 696–701 - **Problem:** The vast majority of other helpers in the project define a `main()` function for dispatch (e.g., `helper_cli_lifecycle.py`, `helper_decision_recording.py`, `helper_actor_examples.py`). WF11 is an outlier. - **Recommendation:** Refactor to match the established `main()` pattern. **🆕 N5. Inconsistent docstring depth across functions** - **Location:** `robot/helper_wf11_graph_actor.py`, multiple functions - **Problem:** `verify_review_synthesis()` has a thorough multi-line docstring. Most other functions have only single-line docstrings that don't describe key assertion nuances (e.g., `verify_read_only_guard` doesn't mention it accepts non-crash as success). - **Recommendation:** Expand docstrings to briefly describe key assertions and non-obvious acceptance criteria. --- ### Summary The PR adds a structurally sound Robot Framework integration test for Workflow Example 11 that covers graph actor registration, topology validation (5 nodes, 6 edges), compilation, action creation, and plan lifecycle. The tests follow the established `robot/helper_*.py` pattern with good isolation and cleanup. However, the PR needs significant work before merge: 1. **Process compliance** (C1, C2, M7): Empty PR description, merge commits, and missing CHANGELOG update must be fixed. 2. 
**Code quality** (C3, M4, M5): The `# type: ignore` suppression, scattered imports, and 701-line file exceed project standards. 3. **Test assertion quality** (M1, M2, M3): The three functions that test read-only behavior all have assertion weaknesses that would allow tests to pass even if the read-only guard were broken. 4. **Spec fidelity** (M6, M8, M9): Node IDs diverge from spec, the "trusted profile" from the ticket title is not exercised, and a phantom `provider` field is silently discarded. The second review pass confirmed the first pass's findings and uncovered 3 additional major issues (missing CHANGELOG, missing trusted profile, phantom schema field) plus 4 new minor/nit items. No security issues were found — the code uses secure temp file creation, list-based subprocess invocation, and proper database isolation.
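To make nit N2 concrete, the following is a minimal sketch that contrasts the permissive pattern quoted in the review with the stricter Crockford Base32 pattern it suggests. The function name `extract_plan_id`, its signature, and the sample strings are assumptions made for this illustration; the real `_extract_plan_id()` in the helper is not shown in this thread and may differ.

```python
import re

# Pattern quoted in the review: any 26 digits or uppercase letters.
LOOSE_ULID = re.compile(r"\b([0-9A-Z]{26})\b")

# Stricter pattern suggested in the review: Crockford Base32 alphabet only,
# which omits the letters I, L, O, and U.
STRICT_ULID = re.compile(r"\b([0-9A-HJKMNP-TV-Z]{26})\b")


def extract_plan_id(text, pattern=STRICT_ULID):
    """Return the first ULID-shaped token in text, or None if there is none.

    Hypothetical stand-in for the helper's _extract_plan_id().
    """
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Sample CLI output lines; both are invented for this example.
    good = "Plan 01ARZ3NDEKTSV4RRFFQ69G5FAV created"
    bad = "Plan 01ARZ3NDEKTSV4RRFFQ69GILOU created"  # ends in I, L, O, U
    print(extract_plan_id(good))
    print(extract_plan_id(bad))
    print(extract_plan_id(bad, LOOSE_ULID))
```

With the strict pattern, a 26-character token that contains I, L, O, or U is not treated as a plan ID, whereas the loose pattern accepts it. As the review notes, the practical risk of the loose pattern is negligible, so this is a tightening rather than a bug fix.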

brent.edwards force-pushed test/int-wf11-graph-actor from 0afbe2d14a to 7ab07303f8

2026-03-18 22:14:25 +00:00


brent.edwards force-pushed test/int-wf11-graph-actor from 7ab07303f8 to b59677927f

2026-03-18 23:14:27 +00:00


brent.edwards commented

2026-03-18 23:15:35 +00:00

## Review Fixes Applied — Commit `b5967792`

Addressed Rui Hu's 26 findings. Merge commits and empty PR body resolved by rebase/user.

### Critical

| Finding | Fix |
|---------|-----|
| **C3** | Removed `# type: ignore[operator]` — changed `_COMMANDS` type to `dict[str, Callable[[], None]]` |

### Major

| Finding | Fix |
|---------|-----|
| **M1** | `verify_read_only_guard()` — asserts `returncode != 0` and output contains `read-only`/`read_only` |
| **M2** | `verify_no_file_modifications()` — checks `"read_only: true"` in YAML output directly |
| **M3** | `create_review_action()` — `except json.JSONDecodeError` now calls `fail()` |
| **M4** | All `from cleveragents...` imports moved to top of file |
| **M5** | Extracted `_setup_actor_and_action()` context manager — **466 lines** (down from 701) |
| **M7** | CHANGELOG entry for #775 |
| **M8** | Added `--automation-profile trusted` to `plan use` |
| **M9** | Removed phantom `provider: openai` from YAML fixture |

### Minor

- `timeout=120s on_timeout=kill` on all 8 Robot Run Process calls
- `Force Tags wf11 graph trusted integration`
- `nox -s lint` — **PASS**
- `nox -s typecheck` — **PASS** (0 errors)
- Helper: **466 lines** (under 500, down from 701)

freemo approved these changes 2026-03-19 04:57:28 +00:00

Dismissed

freemo left a comment

Code Review — PR #810

Well-structured integration test for WF11. Proper labels, milestone, and issue linkage. Approved.


brent.edwards force-pushed test/int-wf11-graph-actor from b59677927f to 39afb0b9b3

2026-03-20 00:13:52 +00:00


brent.edwards dismissed freemo's review 2026-03-20 00:13:52 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards commented

2026-03-20 03:59:42 +00:00

Rebased onto origin/master (79b0a2c5). CHANGELOG conflict resolved (kept master, re-added PR entry). nox -s lint PASS, nox -s typecheck PASS (0 errors). Commit 39afb0b9.


freemo removed a dependency 2026-03-26 15:14:38 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

freemo added a new dependency 2026-03-26 15:14:42 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

freemo removed a dependency 2026-03-26 18:28:10 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

brent.edwards force-pushed test/int-wf11-graph-actor from b1c345c3a0 to e2fc9c96c3

2026-03-26 20:02:56 +00:00


freemo self-assigned this 2026-04-02 06:15:22 +00:00

freemo commented

2026-04-02 17:32:21 +00:00

🤖 Backlog Groomer (groomer-1): Closing as duplicate of #775.

Issue #775 (test(integration): workflow example 11 — complex graph actor for multi-stage code generation) is the canonical version with full labels (MoSCoW/Must have, Priority/Medium, State/In Review, Type/Testing) and milestone v3.2.0. This issue is an exact title duplicate.


freemo closed this pull request

2026-04-02 17:32:31 +00:00

CI / benchmark-publish (pull_request): Has been skipped

CI / build (pull_request): Successful in 31s (Required)

CI / lint (pull_request): Successful in 3m54s (Required)

CI / quality (pull_request): Successful in 4m28s (Required)

CI / typecheck (pull_request): Successful in 4m33s (Required)

CI / security (pull_request): Successful in 4m49s (Required)

CI / integration_tests (pull_request): Successful in 7m38s (Required)

CI / unit_tests (pull_request): Successful in 8m11s (Required)

CI / docker (pull_request): Successful in 1m57s (Required)

CI / e2e_tests (pull_request): Successful in 13m16s

CI / benchmark-regression (pull_request): Failing after 15m3s

CI / coverage (pull_request): Successful in 18m11s (Required)

CI / status-check (pull_request): Successful in 2s


Blocks

#775 test(integration): workflow example 11 — complex graph actor for multi-stage code review (trusted profile)

cleveragents/cleveragents-core

Depends on

#627 Implement @tdd_expected_fail tag handling in Behave environment

cleveragents/cleveragents-core

#628 Implement @tdd_expected_fail tag handling in Robot Framework

cleveragents/cleveragents-core

Reference: cleveragents/cleveragents-core#810
