test(e2e): TDD failing tests for E2E mock-only coverage (bug #658) #738

2026-03-12T18:36:17Z

brent.edwards commented

2026-03-12 18:36:17 +00:00

Summary

TDD counterpart for bug #658. Adds failing tests that demonstrate M1-M6 E2E verification suites use mocked CliRunner invocations instead of real subprocess.run calls against the agents CLI binary.

Closes #697
Refs: #658

Test Coverage

Behave (3 scenarios)

features/tdd_e2e_mock_only_coverage.feature — 3 scenarios tagged @tdd_bug @tdd_bug_658 @tdd_expected_fail:
1. At least one M1 E2E helper function uses real subprocess CLI invocation — AST-parses helper_m1_e2e_verification.py looking for subprocess.run calls
2. At least one M3 E2E helper function uses real subprocess CLI invocation — same for M3
3. At least one M6 E2E helper function uses real subprocess CLI invocation — same for M6

Robot Framework (3 test cases)

robot/tdd_e2e_mock_only_coverage.robot — 3 test cases tagged tdd_bug, tdd_bug_658, tdd_expected_fail:
1. M1 E2E Helper Has Real CLI Subprocess Call
2. M3 E2E Helper Has Real CLI Subprocess Call
3. M6 E2E Helper Has Real CLI Subprocess Call

Detection Method

Both Behave step definitions and Robot helper use AST analysis to parse the Python helper files and detect:

subprocess.run(...) calls with agents in arguments
run_cli(...) calls (the shared helper from helper_e2e_common.py)

All 6 tests are tagged @tdd_expected_fail and are expected to fail on the current master branch. The companion bugfix PR #784 removes the @tdd_expected_fail tags after converting the helpers to use real subprocess invocations.

TDD Workflow

Per CONTRIBUTING.md: this TDD branch must merge to master first, then the bugfix branch (#784) implements the fix and removes @tdd_expected_fail.

## Summary TDD counterpart for bug #658. Adds failing tests that demonstrate M1-M6 E2E verification suites use mocked `CliRunner` invocations instead of real `subprocess.run` calls against the `agents` CLI binary. Closes #697 Refs: #658 ## Test Coverage ### Behave (3 scenarios) - `features/tdd_e2e_mock_only_coverage.feature` — 3 scenarios tagged `@tdd_bug @tdd_bug_658 @tdd_expected_fail`: 1. **At least one M1 E2E helper function uses real subprocess CLI invocation** — AST-parses `helper_m1_e2e_verification.py` looking for `subprocess.run` calls 2. **At least one M3 E2E helper function uses real subprocess CLI invocation** — same for M3 3. **At least one M6 E2E helper function uses real subprocess CLI invocation** — same for M6 ### Robot Framework (3 test cases) - `robot/tdd_e2e_mock_only_coverage.robot` — 3 test cases tagged `tdd_bug`, `tdd_bug_658`, `tdd_expected_fail`: 1. **M1 E2E Helper Has Real CLI Subprocess Call** 2. **M3 E2E Helper Has Real CLI Subprocess Call** 3. **M6 E2E Helper Has Real CLI Subprocess Call** ### Detection Method Both Behave step definitions and Robot helper use AST analysis to parse the Python helper files and detect: - `subprocess.run(...)` calls with `agents` in arguments - `run_cli(...)` calls (the shared helper from `helper_e2e_common.py`) All 6 tests are tagged `@tdd_expected_fail` and are expected to fail on the current `master` branch. The companion bugfix PR #784 removes the `@tdd_expected_fail` tags after converting the helpers to use real subprocess invocations. ## TDD Workflow Per CONTRIBUTING.md: this TDD branch must merge to master first, then the bugfix branch (#784) implements the fix and removes `@tdd_expected_fail`.

brent.edwards added 1 commit 2026-03-12 18:37:01 +00:00

test(e2e): TDD failing tests for E2E mock-only coverage (bug #658 )

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / build (pull_request) Successful in 16s

Details

CI / lint (pull_request) Successful in 20s

Details

CI / quality (pull_request) Successful in 20s

Details

CI / typecheck (pull_request) Successful in 38s

Details

CI / security (pull_request) Successful in 47s

Details

CI / unit_tests (pull_request) Successful in 3m19s

Details

CI / docker (pull_request) Successful in 40s

Details

CI / integration_tests (pull_request) Successful in 5m17s

Details

CI / coverage (pull_request) Successful in 5m38s

Details

CI / benchmark-regression (pull_request) Successful in 35m56s

Details

6806ef3620

Add Behave and Robot Framework TDD tests that use AST analysis to
inspect the M1-M6 E2E verification helper files and prove that all
21 CLI-facing test functions use unittest.mock.patch + CliRunner
instead of exercising the real CLI via subprocess invocation.

The tests verify three properties that currently fail (proving #658):
- At least one CLI-facing test uses subprocess instead of CliRunner
- At least one CLI-facing test exercises the real service layer
- Every suite with CLI tests has at least one unmocked test

Tagged @tdd_expected_fail so the tests pass CI while the bug is
present.  The @tdd_expected_fail tag inverts the result: the tests
pass because the underlying assertions fail (confirming the bug).
Once #658 is fixed, the tag is removed and the tests run normally.

Files added:
- features/tdd_e2e_mock_only_coverage.feature (3 scenarios)
- features/steps/tdd_e2e_mock_only_coverage_steps.py
- robot/tdd_e2e_mock_only_coverage.robot (3 test cases)
- robot/helper_tdd_e2e_mock_only_coverage.py

ISSUES CLOSED: #697

brent.edwards self-assigned this 2026-03-12 18:37:05 +00:00

brent.edwards added this to the v3.5.0 milestone 2026-03-12 18:37:06 +00:00

brent.edwards requested review from CoreRasurae 2026-03-12 18:37:07 +00:00

brent.edwards added the

labels 2026-03-12 18:37:11 +00:00

brent.edwards referenced this pull request

2026-03-12 18:37:26 +00:00

TDD: Write failing test demonstrating E2E mock-only coverage (bug #658) #697

freemo requested changes 2026-03-12 20:37:20 +00:00

freemo left a comment

Review — PR #738: test(e2e): TDD failing tests for E2E mock-only coverage (bug #658)

Diff Summary

Files changed: 4 (3 new files + 1 new feature file)
- features/tdd_e2e_mock_only_coverage.feature (new, 37 lines)
- features/steps/tdd_e2e_mock_only_coverage_steps.py (new, 257 lines)
- robot/tdd_e2e_mock_only_coverage.robot (new, 44 lines)
- robot/helper_tdd_e2e_mock_only_coverage.py (new, 254 lines)
Lines: +476 / −0 (all additions)

Commit Message Compliance

Issue #658 does not have a Metadata section with a prescribed Commit Message field — it is a bug report, not a standard task ticket.
PR commit first line: test(e2e): TDD failing tests for E2E mock-only coverage (bug #658)
Since no prescribed commit message exists, this is acceptable. ✅

Closes Keyword

❌ Missing. The PR body is completely empty. There is no Closes #658 keyword anywhere. The title references bug #658 but that is not a recognized auto-close syntax.

Label Compliance

Required Label	Status
Type label	✅ `Type/Testing`
Priority label	✅ `Priority/Critical`
MoSCoW label	❌ Missing (issue has `MoSCoW/Must have`)

PR Body

❌ Empty. The PR body field is blank. Per CONTRIBUTING.md, PRs should include a summary of changes.

Test Coverage

✅ Behave BDD: 3 scenarios in .feature file with step definitions — good.
✅ Robot Framework: 3 corresponding test cases in .robot file with helper — good.
The @tdd_expected_fail / tdd_expected_fail tagging pattern correctly handles the TDD inversion (tests pass CI while the bug exists, fail once fixed).
ASV benchmark: not included, but this is a TDD diagnostic/audit PR, not a feature — benchmarks are not applicable here.

Code Quality

The AST-based analysis approach is well-designed: it programmatically inspects the M1-M6 helper files rather than hardcoding expected outcomes.
The FunctionAnalysis dataclass and helper functions (_is_cli_runner_invoke, _is_patch_call, etc.) are clean and well-documented.
There is significant code duplication between features/steps/tdd_e2e_mock_only_coverage_steps.py and robot/helper_tdd_e2e_mock_only_coverage.py — the FunctionAnalysis dataclass, _analyze_helper(), and all AST utility functions are copy-pasted. Consider extracting the shared analysis logic into a common module (e.g., cleveragents/testing/e2e_mock_audit.py) and importing from both the Behave steps and the Robot helper. This reduces maintenance burden and prevents the two copies from diverging.
The _SERVICE_MOCK_INDICATORS frozenset is identical in both files — another reason to share code.

Required Changes

Add Closes #658 (or appropriate keyword) to the PR body.
Write a PR body with at minimum a summary of what the PR does and why.
Add MoSCoW/Must have label to the PR.

Suggestion (non-blocking)

Extract duplicated AST analysis code into a shared module to avoid maintaining two identical copies.

## Review — PR #738: test(e2e): TDD failing tests for E2E mock-only coverage (bug #658) ### Diff Summary - **Files changed:** 4 (3 new files + 1 new feature file) - `features/tdd_e2e_mock_only_coverage.feature` (new, 37 lines) - `features/steps/tdd_e2e_mock_only_coverage_steps.py` (new, 257 lines) - `robot/tdd_e2e_mock_only_coverage.robot` (new, 44 lines) - `robot/helper_tdd_e2e_mock_only_coverage.py` (new, 254 lines) - **Lines:** +476 / −0 (all additions) ### Commit Message Compliance - Issue #658 does **not** have a Metadata section with a prescribed Commit Message field — it is a bug report, not a standard task ticket. - PR commit first line: `test(e2e): TDD failing tests for E2E mock-only coverage (bug #658)` - Since no prescribed commit message exists, this is acceptable. ✅ ### Closes Keyword - ❌ **Missing.** The PR body is completely empty. There is no `Closes #658` keyword anywhere. The title references `bug #658` but that is not a recognized auto-close syntax. ### Label Compliance | Required Label | Status | |---|---| | Type label | ✅ `Type/Testing` | | Priority label | ✅ `Priority/Critical` | | MoSCoW label | ❌ Missing (issue has `MoSCoW/Must have`) | ### PR Body - ❌ **Empty.** The PR body field is blank. Per CONTRIBUTING.md, PRs should include a summary of changes. ### Test Coverage - ✅ Behave BDD: 3 scenarios in `.feature` file with step definitions — good. - ✅ Robot Framework: 3 corresponding test cases in `.robot` file with helper — good. - The `@tdd_expected_fail` / `tdd_expected_fail` tagging pattern correctly handles the TDD inversion (tests pass CI while the bug exists, fail once fixed). - ASV benchmark: not included, but this is a TDD diagnostic/audit PR, not a feature — benchmarks are not applicable here. ### Code Quality - The AST-based analysis approach is well-designed: it programmatically inspects the M1-M6 helper files rather than hardcoding expected outcomes. - The `FunctionAnalysis` dataclass and helper functions (`_is_cli_runner_invoke`, `_is_patch_call`, etc.) are clean and well-documented. - There is significant code duplication between `features/steps/tdd_e2e_mock_only_coverage_steps.py` and `robot/helper_tdd_e2e_mock_only_coverage.py` — the `FunctionAnalysis` dataclass, `_analyze_helper()`, and all AST utility functions are copy-pasted. Consider extracting the shared analysis logic into a common module (e.g., `cleveragents/testing/e2e_mock_audit.py`) and importing from both the Behave steps and the Robot helper. This reduces maintenance burden and prevents the two copies from diverging. - The `_SERVICE_MOCK_INDICATORS` frozenset is identical in both files — another reason to share code. ### Required Changes 1. **Add `Closes #658`** (or appropriate keyword) to the PR body. 2. **Write a PR body** with at minimum a summary of what the PR does and why. 3. **Add `MoSCoW/Must have` label** to the PR. ### Suggestion (non-blocking) 4. Extract duplicated AST analysis code into a shared module to avoid maintaining two identical copies.

~~brent.edwards referenced this pull request 2026-03-12 20:42:43 +00:00~~

fix(test): convert M1-M6 E2E suites to real subprocess CLI invocations (closes #658) #784

freemo referenced this pull request

2026-03-12 20:50:43 +00:00

bug(test): M1–M6 E2E verification suites use mocks instead of exercising the real system #658

freemo added the

labels 2026-03-12 20:57:15 +00:00

brent.edwards referenced this pull request

2026-03-12 20:57:40 +00:00

fix(test): convert M1-M6 E2E suites to real subprocess CLI invocations (closes #658) #784

freemo referenced this pull request

2026-03-12 21:04:47 +00:00

bug(test): M1–M6 E2E verification suites use mocks instead of exercising the real system #658

brent.edwards added 1 commit 2026-03-12 21:43:26 +00:00

Merge branch 'master' into tdd/m6-e2e-mock-only-coverage

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 14s

Details

CI / build (pull_request) Successful in 15s

Details

CI / quality (pull_request) Successful in 18s

Details

CI / e2e_tests (pull_request) Successful in 34s

Details

CI / security (pull_request) Successful in 37s

Details

CI / typecheck (pull_request) Successful in 39s

Details

CI / unit_tests (pull_request) Successful in 4m46s

Details

CI / integration_tests (pull_request) Successful in 5m11s

Details

CI / docker (pull_request) Successful in 42s

Details

CI / coverage (pull_request) Successful in 5m35s

Details

CI / benchmark-regression (pull_request) Successful in 36m16s

Details

703511a28f

brent.edwards commented

2026-03-12 21:50:29 +00:00

Response to Review #2172 (@freemo)

All 3 required changes addressed:

Closes keyword: PR body now includes Closes #697 (the TDD tracking issue) and Refs: #658 (the parent bug). Per TDD workflow, it's PR #784 (the fix) that Closes #658 — this PR closes the TDD issue #697.
PR body: Populated with full summary, test coverage breakdown, detection method, and TDD workflow notes.
MoSCoW/Must have label: Already present on the PR.

Re: suggestion #4 (extract duplicated AST analysis code) — acknowledged. This is a valid DRY improvement. Given this is a TDD diagnostic PR with a short lifespan (will be superseded by #784's fix removing @tdd_expected_fail), I'd prefer to address it as a follow-up to keep this PR focused on the TDD tests. The duplication is between the Behave steps and Robot helper, which are in different test frameworks with different import paths — a shared module is the right approach but adds complexity to the build for what are disposable diagnostic tests.

## Response to Review #2172 (@freemo) All 3 required changes addressed: 1. **`Closes` keyword**: PR body now includes `Closes #697` (the TDD tracking issue) and `Refs: #658` (the parent bug). Per TDD workflow, it's PR #784 (the fix) that `Closes #658` — this PR closes the TDD issue #697. 2. **PR body**: Populated with full summary, test coverage breakdown, detection method, and TDD workflow notes. 3. **`MoSCoW/Must have` label**: Already present on the PR. Re: suggestion #4 (extract duplicated AST analysis code) — acknowledged. This is a valid DRY improvement. Given this is a TDD diagnostic PR with a short lifespan (will be superseded by #784's fix removing `@tdd_expected_fail`), I'd prefer to address it as a follow-up to keep this PR focused on the TDD tests. The duplication is between the Behave steps and Robot helper, which are in different test frameworks with different import paths — a shared module is the right approach but adds complexity to the build for what are disposable diagnostic tests.

brent.edwards referenced this pull request

2026-03-12 21:54:42 +00:00

fix(test): convert M1-M6 E2E suites to real subprocess CLI invocations (closes #658) #784

brent.edwards referenced this pull request

2026-03-12 21:54:53 +00:00

fix(test): convert M1-M6 E2E suites to real subprocess CLI invocations (closes #658) #784

brent.edwards merged commit 9141676329 into master

2026-03-12 22:02:02 +00:00

brent.edwards deleted branch tdd/m6-e2e-mock-only-coverage

2026-03-12 22:02:03 +00:00

brent.edwards referenced this issue from a commit

2026-03-12 22:02:03 +00:00