test(e2e): workflow example 2 — automated test generation for a module (trusted profile) #795

2026-03-12T23:12:21Z

freemo commented

2026-03-12 23:12:21 +00:00

Summary

E2E Robot Framework test for Workflow Example 2: Automated Test Generation using the trusted automation profile. Exercises the full plan lifecycle against a temporary repository with a low-coverage auth module, verifying that CleverAgents can autonomously generate tests.

Closes #748

Changes

robot/e2e/wf02_test_generation.robot — New E2E test suite covering:
- Fixture setup: temp git repo with src/auth.py (realistic auth module) and tests/test_auth.py (minimal coverage baseline)
- Coverage validation registration (validation add) and project attachment (validation attach) per spec WF02 Step 1
- Action creation with dynamic actor selection (Anthropic/OpenAI based on available API keys)
- Full plan lifecycle: plan use → plan execute (strategize) → plan status → plan execute (execute) → plan artifacts → plan diff → lifecycle-apply
- Trusted profile verification: After plan use --automation-profile trusted, the test asserts the resolved automation_profile field is "trusted" — with WARN fallback if field absent
- Post-apply state assertion: After lifecycle-apply, verifies plan transitioned to apply phase via plan status --format json with Should Contain on phase field, consistent with m6_acceptance.robot Full Flow Apply Step pattern
- Unique per-run entity names: UUID-based RUN_SUFFIX appended to action, project, resource, validation, and temp repo names to avoid UNIQUE constraint collisions on repeated runs
- Plan status assertions: Status check after strategize logs rc WARN on failure and parses/logs the phase field for traceability
- Invariant enforcement: diff must not modify any src/ files (semantic regex (?:---|\\+\\+\\+) for unified diff markers via --format plain), with WARN logged when diff command fails
- Artifacts verification: plan artifacts --format json output checked for generated test files using JSON path patterns, with WARN when rc=0 but zero files parsed
- Post-apply verification: new test file detection or existing file modification, with File Should Exist guard for content-growth fallback
- Coverage improvement check: apply output parsing or direct pytest --cov fallback with hard failure if LLM produced artifacts but coverage doesn't reach 80%
- Zero-artifact visibility: Uses Pass Execution instead of WARN+PASS to make zero-artifact outcomes clearly visible in Robot Framework reports
CHANGELOG.md — Added entry under Unreleased

Key Design Decisions

WF02 Suite Setup with explicit init and UUID suffix: Dedicated suite setup calls E2E Suite Setup then init --force --yes, generates UUID-based RUN_SUFFIX and overrides entity name suite variables — consistent with m6_acceptance.robot pattern
Force Tags E2E: Added in Settings section for consistency with project conventions; redundant [Tags] E2E on test case removed since Force Tags already applies it
[Teardown] on test case: Added per project convention for individual test case cleanup logging
--format plain for plan diff: Ensures unified diff markers (--- a/, +++ b/) are present for the Verify No Production Code Changes regex invariant check — Rich format does not produce these markers
Semantic regex (?:---|\\+\\+\\+) for diff file headers: Replaced overly broad [+-]{3} with exact unified diff markers for precision
--format json for plan artifacts: Enables reliable JSON parsing of files_changed and path fields — Rich table output cannot be parsed as JSON
Regex for files_changed empty check: Uses "files_changed"\\s*:\\s*\\[\\s*\\] instead of exact string match for resilience against JSON formatting differences
Action YAML with arguments, state, and 3 invariants: Matches spec WF02 action definition structure — arguments field (schema name, not spec's args) includes target_module (default "src/auth") and coverage_target (default 80); state: available included; third invariant for conftest.py patterns added
Arguments use required: false with defaults: A production bug in PlanLifecycleService.use_action causes UNIQUE constraint violations when --arg values duplicate action argument definitions; arguments are defined with defaults as a workaround. TODO comment documents this for future removal.
expected_rc=None for LLM-dependent commands: All plan lifecycle commands use flexible return code handling since LLM calls can legitimately fail in E2E
Safe Parse Json Field from common_e2e.resource: Eliminates duplicated Extract Plan Id keyword
Pass Execution for zero-artifact outcomes: When the LLM produces zero artifacts, Verify Test Files Exist and Verify Coverage Improvement use Pass Execution instead of WARN+PASS to make these outcomes clearly visible in Robot Framework reports without false failures
Hard failure for coverage when artifacts exist: Verify Coverage Improvement uses Fail (not WARN) when the LLM produced artifacts but coverage doesn't meet the 80% threshold
Coverage regex anchoring: (?i)coverage.*\\bpass(?:ed)?\\b to prevent false-positive matches
Catenate SEPARATOR=\\n for all multi-line content: Consistent separator style across all keywords including validation YAML
Fixture content length passed as parameter: Verify Test Files Exist receives fixture_content_length as a parameter instead of redundantly calling Build Minimal Test Content
File Should Exist guard for content-growth fallback: In Verify Test Files Exist, checks test_auth.py exists before reading to handle LLM file renames gracefully
Artifacts zero-count WARN: When plan artifacts returns rc=0 with non-trivial output but no files are parsed, a WARN is logged to surface potential JSON schema changes
Validation YAML cwd documented: subprocess.run in validation code does not set cwd because the validation engine sets the working directory to the resource root before invocation
Validation timeout reduced to 240s: From 600s, appropriate for single-module coverage check
Subprocess timeout in validation YAML: subprocess.run includes timeout=120 with TimeoutExpired exception handling
plan status between execution phases: Added status check after strategize phase with rc WARN and phase logging
Sequential step numbering (1–14): Steps renumbered to sequential integers
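Two of the regex decisions listed above can be sketched in plain Python to show what they accept and reject. This is an illustrative sketch under assumptions: the patterns are taken from the description with the Robot Framework backslash escaping removed, while the function names and sample strings are hypothetical and not taken from the suite or from real CLI output.

```python
import re

# Tolerates any whitespace or line breaks inside the empty array.
EMPTY_FILES_CHANGED = re.compile(r'"files_changed"\s*:\s*\[\s*\]')

# Word boundaries keep "pass" from matching inside words such as "bypass".
COVERAGE_PASSED = re.compile(r"(?i)coverage.*\bpass(?:ed)?\b")


def has_no_changed_files(artifacts_json: str) -> bool:
    """True when the artifacts output reports an empty files_changed list."""
    return EMPTY_FILES_CHANGED.search(artifacts_json) is not None


def coverage_reported_pass(apply_output: str) -> bool:
    """True when a line mentions coverage followed by 'pass' or 'passed'."""
    return COVERAGE_PASSED.search(apply_output) is not None


if __name__ == "__main__":
    print(has_no_changed_files('{"files_changed":[\n  ]}'))
    print(coverage_reported_pass("coverage bypass enabled"))
```

Because `.` does not match newlines by default, the coverage pattern only matches when both words appear on the same line of output.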

Quality Gates

Gate	Result
`nox -e lint`	✅ Pass
`nox -e typecheck`	✅ Pass (0 errors)
`nox -e unit_tests`	✅ Pass (12295 scenarios)
`nox -e integration_tests`	✅ Pass
`nox -e e2e_tests`	✅ Pass (38/38 tests, WF02 passes)
`nox -e coverage_report`	✅ 98% (≥ 97% threshold)


freemo added this to the v3.1.0 milestone 2026-03-12 23:12:26 +00:00

freemo added the Type/Testing label 2026-03-12 23:12:26 +00:00

freemo referenced this pull request

2026-03-12 23:13:25 +00:00

test(e2e): workflow example 2 — automated test generation for a module (trusted profile) #748

freemo force-pushed test/e2e-wf02-test-generation from 5fd535996d to c8b7dbe424

2026-03-13 16:24:25 +00:00


freemo added the State/In Review label 2026-03-13 21:16:32 +00:00

freemo force-pushed test/e2e-wf02-test-generation from c8b7dbe424 to 3fcbb0aaf0

2026-03-13 23:19:31 +00:00


freemo added the Priority/Medium label 2026-03-14 04:10:09 +00:00

freemo commented

2026-03-14 04:44:32 +00:00

PM Review — Day 34

Status: Mergeable, 0 reviews, M2 (v3.1.0)
Author: @freemo

E2E test for WF02 (automated test generation for a module, trusted profile). Retroactive M2 coverage.

Action Items

Who	Action	Deadline
@hurui200320	Peer review	Day 37


freemo added the

labels 2026-03-14 22:11:24 +00:00

freemo modified the milestone from v3.1.0 to v3.2.0

2026-03-16 00:31:59 +00:00

freemo added a new dependency 2026-03-16 02:42:15 +00:00

#627 Implement @tdd_expected_fail tag handling in Behave environment

freemo added a new dependency 2026-03-16 02:42:15 +00:00

#628 Implement @tdd_expected_fail tag handling in Robot Framework

freemo added a new dependency 2026-03-16 02:42:15 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

freemo commented

2026-03-16 09:27:02 +00:00

PM Status — Day 36 (2026-03-16)

Day 34 review assignment deadline check. This PR has been in review for 2+ days with 0 reviewer activity.

Reminder: Assigned reviewer — please post your review by Day 37 EOD or flag any blockers. These E2E test PRs are foundational for milestone acceptance gates and cannot remain unreviewed indefinitely.

If you are unable to review by the deadline, please comment so the review can be reassigned.


freemo reviewed 2026-03-16 16:15:16 +00:00

freemo left a comment

PM Day 36 Triage: M3 E2E test PR (v3.2.0). Lower priority than bug fixes and TDD infrastructure. Reviewer: @brent.edwards after critical path items clear.

hurui200320 was assigned by freemo

2026-03-16 22:22:43 +00:00

freemo commented

2026-03-16 22:22:48 +00:00

@hurui200320 I am going to have you take over this PR. It is mostly completed but is waiting on #628 and #966. One is yours and one is Brent's. Please be sure to get this PR and the two blocking PRs I listed in ASAP, thanks.


freemo requested review from CoreRasurae 2026-03-17 18:24:15 +00:00

freemo requested review from hamza.khyari 2026-03-17 18:24:15 +00:00

hurui200320 force-pushed test/e2e-wf02-test-generation from 3fcbb0aaf0 to 08a0970311

2026-03-18 08:45:47 +00:00


hurui200320 force-pushed test/e2e-wf02-test-generation from 08a0970311 to 91928627c7

2026-03-24 07:34:48 +00:00


hurui200320 force-pushed test/e2e-wf02-test-generation from 91928627c7 to 306707538d

2026-03-24 08:32:35 +00:00


hurui200320 force-pushed test/e2e-wf02-test-generation from 306707538d to 9441193303

2026-03-24 09:10:04 +00:00


hurui200320 force-pushed test/e2e-wf02-test-generation from 9441193303 to 7bd4f0a0e4

2026-03-24 10:12:53 +00:00


hurui200320 referenced this pull request

2026-03-24 10:26:31 +00:00

test(e2e): workflow example 2 — automated test generation for a module (trusted profile) #748

freemo removed a dependency 2026-03-26 15:14:40 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

freemo added a new dependency 2026-03-26 15:14:43 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

freemo removed a dependency 2026-03-26 18:27:37 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

hurui200320 force-pushed test/e2e-wf02-test-generation from 7bd4f0a0e4 to cd06fe58f6

2026-03-27 10:00:05 +00:00


hurui200320 was unassigned by freemo

2026-04-02 06:15:24 +00:00

freemo self-assigned this 2026-04-02 06:15:24 +00:00

freemo commented

2026-04-02 17:34:21 +00:00

🤖 Backlog Groomer (groomer-1): Closing as duplicate of #748.

Issue #748 (test(e2e): workflow example 2 — automated test generation for a module) is the canonical version with full labels (MoSCoW/Must have, Priority/Critical, State/In Review, Type/Testing) and milestone v3.2.0. This issue is an exact title duplicate.


freemo closed this pull request

2026-04-02 17:34:34 +00:00

CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 45s (Required)
CI / lint (pull_request) Successful in 6m18s (Required)
CI / quality (pull_request) Successful in 6m45s (Required)
CI / security (pull_request) Successful in 6m49s (Required)
CI / typecheck (pull_request) Successful in 6m50s (Required)
CI / integration_tests (pull_request) Successful in 10m6s (Required)
CI / unit_tests (pull_request) Successful in 10m15s (Required)
CI / docker (pull_request) Successful in 1m8s (Required)
CI / e2e_tests (pull_request) Failing after 13m36s
CI / coverage (pull_request) Successful in 11m47s (Required)
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 1h0m14s

Pull request closed



Depends on

#627 Implement @tdd_expected_fail tag handling in Behave environment

cleveragents/cleveragents-core

#628 Implement @tdd_expected_fail tag handling in Robot Framework

cleveragents/cleveragents-core

Reference: cleveragents/cleveragents-core#795