test(e2e): TDD failing tests for E2E mock-only coverage (bug #658) #738

Merged
brent.edwards merged 2 commits from tdd/m6-e2e-mock-only-coverage into master 2026-03-12 22:02:02 +00:00
Member

Summary

TDD counterpart for bug #658. Adds failing tests that demonstrate M1-M6 E2E verification suites use mocked CliRunner invocations instead of real subprocess.run calls against the agents CLI binary.

Closes #697
Refs: #658

Test Coverage

Behave (3 scenarios)

  • features/tdd_e2e_mock_only_coverage.feature — 3 scenarios tagged @tdd_bug @tdd_bug_658 @tdd_expected_fail:
    1. At least one M1 E2E helper function uses real subprocess CLI invocation — AST-parses helper_m1_e2e_verification.py looking for subprocess.run calls
    2. At least one M3 E2E helper function uses real subprocess CLI invocation — same for M3
    3. At least one M6 E2E helper function uses real subprocess CLI invocation — same for M6

Robot Framework (3 test cases)

  • robot/tdd_e2e_mock_only_coverage.robot — 3 test cases tagged tdd_bug, tdd_bug_658, tdd_expected_fail:
    1. M1 E2E Helper Has Real CLI Subprocess Call
    2. M3 E2E Helper Has Real CLI Subprocess Call
    3. M6 E2E Helper Has Real CLI Subprocess Call

Detection Method

Both Behave step definitions and Robot helper use AST analysis to parse the Python helper files and detect:

  • subprocess.run(...) calls with agents in arguments
  • run_cli(...) calls (the shared helper from helper_e2e_common.py)

All 6 tests are tagged @tdd_expected_fail and are expected to fail on the current master branch. The companion bugfix PR #784 removes the @tdd_expected_fail tags after converting the helpers to use real subprocess invocations.

TDD Workflow

Per CONTRIBUTING.md: this TDD branch must merge to master first, then the bugfix branch (#784) implements the fix and removes @tdd_expected_fail.

## Summary TDD counterpart for bug #658. Adds failing tests that demonstrate M1-M6 E2E verification suites use mocked `CliRunner` invocations instead of real `subprocess.run` calls against the `agents` CLI binary. Closes #697 Refs: #658 ## Test Coverage ### Behave (3 scenarios) - `features/tdd_e2e_mock_only_coverage.feature` — 3 scenarios tagged `@tdd_bug @tdd_bug_658 @tdd_expected_fail`: 1. **At least one M1 E2E helper function uses real subprocess CLI invocation** — AST-parses `helper_m1_e2e_verification.py` looking for `subprocess.run` calls 2. **At least one M3 E2E helper function uses real subprocess CLI invocation** — same for M3 3. **At least one M6 E2E helper function uses real subprocess CLI invocation** — same for M6 ### Robot Framework (3 test cases) - `robot/tdd_e2e_mock_only_coverage.robot` — 3 test cases tagged `tdd_bug`, `tdd_bug_658`, `tdd_expected_fail`: 1. **M1 E2E Helper Has Real CLI Subprocess Call** 2. **M3 E2E Helper Has Real CLI Subprocess Call** 3. **M6 E2E Helper Has Real CLI Subprocess Call** ### Detection Method Both Behave step definitions and Robot helper use AST analysis to parse the Python helper files and detect: - `subprocess.run(...)` calls with `agents` in arguments - `run_cli(...)` calls (the shared helper from `helper_e2e_common.py`) All 6 tests are tagged `@tdd_expected_fail` and are expected to fail on the current `master` branch. The companion bugfix PR #784 removes the `@tdd_expected_fail` tags after converting the helpers to use real subprocess invocations. ## TDD Workflow Per CONTRIBUTING.md: this TDD branch must merge to master first, then the bugfix branch (#784) implements the fix and removes `@tdd_expected_fail`.
test(e2e): TDD failing tests for E2E mock-only coverage (bug #658)
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 20s
CI / quality (pull_request) Successful in 20s
CI / typecheck (pull_request) Successful in 38s
CI / security (pull_request) Successful in 47s
CI / unit_tests (pull_request) Successful in 3m19s
CI / docker (pull_request) Successful in 40s
CI / integration_tests (pull_request) Successful in 5m17s
CI / coverage (pull_request) Successful in 5m38s
CI / benchmark-regression (pull_request) Successful in 35m56s
6806ef3620
Add Behave and Robot Framework TDD tests that use AST analysis to
inspect the M1-M6 E2E verification helper files and prove that all
21 CLI-facing test functions use unittest.mock.patch + CliRunner
instead of exercising the real CLI via subprocess invocation.

The tests verify three properties that currently fail (proving #658):
- At least one CLI-facing test uses subprocess instead of CliRunner
- At least one CLI-facing test exercises the real service layer
- Every suite with CLI tests has at least one unmocked test

Tagged @tdd_expected_fail so the tests pass CI while the bug is
present.  The @tdd_expected_fail tag inverts the result: the tests
pass because the underlying assertions fail (confirming the bug).
Once #658 is fixed, the tag is removed and the tests run normally.

Files added:
- features/tdd_e2e_mock_only_coverage.feature (3 scenarios)
- features/steps/tdd_e2e_mock_only_coverage_steps.py
- robot/tdd_e2e_mock_only_coverage.robot (3 test cases)
- robot/helper_tdd_e2e_mock_only_coverage.py

ISSUES CLOSED: #697
brent.edwards added this to the v3.5.0 milestone 2026-03-12 18:37:06 +00:00
freemo left a comment

Review — PR #738: test(e2e): TDD failing tests for E2E mock-only coverage (bug #658)

Diff Summary

  • Files changed: 4 (3 new files + 1 new feature file)
    • features/tdd_e2e_mock_only_coverage.feature (new, 37 lines)
    • features/steps/tdd_e2e_mock_only_coverage_steps.py (new, 257 lines)
    • robot/tdd_e2e_mock_only_coverage.robot (new, 44 lines)
    • robot/helper_tdd_e2e_mock_only_coverage.py (new, 254 lines)
  • Lines: +476 / −0 (all additions)

Commit Message Compliance

  • Issue #658 does not have a Metadata section with a prescribed Commit Message field — it is a bug report, not a standard task ticket.
  • PR commit first line: test(e2e): TDD failing tests for E2E mock-only coverage (bug #658)
  • Since no prescribed commit message exists, this is acceptable.

Closes Keyword

  • Missing. The PR body is completely empty. There is no Closes #658 keyword anywhere. The title references bug #658 but that is not a recognized auto-close syntax.

Label Compliance

Required Label Status
Type label Type/Testing
Priority label Priority/Critical
MoSCoW label Missing (issue has MoSCoW/Must have)

PR Body

  • Empty. The PR body field is blank. Per CONTRIBUTING.md, PRs should include a summary of changes.

Test Coverage

  • Behave BDD: 3 scenarios in .feature file with step definitions — good.
  • Robot Framework: 3 corresponding test cases in .robot file with helper — good.
  • The @tdd_expected_fail / tdd_expected_fail tagging pattern correctly handles the TDD inversion (tests pass CI while the bug exists, fail once fixed).
  • ASV benchmark: not included, but this is a TDD diagnostic/audit PR, not a feature — benchmarks are not applicable here.

Code Quality

  • The AST-based analysis approach is well-designed: it programmatically inspects the M1-M6 helper files rather than hardcoding expected outcomes.
  • The FunctionAnalysis dataclass and helper functions (_is_cli_runner_invoke, _is_patch_call, etc.) are clean and well-documented.
  • There is significant code duplication between features/steps/tdd_e2e_mock_only_coverage_steps.py and robot/helper_tdd_e2e_mock_only_coverage.py — the FunctionAnalysis dataclass, _analyze_helper(), and all AST utility functions are copy-pasted. Consider extracting the shared analysis logic into a common module (e.g., cleveragents/testing/e2e_mock_audit.py) and importing from both the Behave steps and the Robot helper. This reduces maintenance burden and prevents the two copies from diverging.
  • The _SERVICE_MOCK_INDICATORS frozenset is identical in both files — another reason to share code.

Required Changes

  1. Add Closes #658 (or appropriate keyword) to the PR body.
  2. Write a PR body with at minimum a summary of what the PR does and why.
  3. Add MoSCoW/Must have label to the PR.

Suggestion (non-blocking)

  1. Extract duplicated AST analysis code into a shared module to avoid maintaining two identical copies.
## Review — PR #738: test(e2e): TDD failing tests for E2E mock-only coverage (bug #658) ### Diff Summary - **Files changed:** 4 (3 new files + 1 new feature file) - `features/tdd_e2e_mock_only_coverage.feature` (new, 37 lines) - `features/steps/tdd_e2e_mock_only_coverage_steps.py` (new, 257 lines) - `robot/tdd_e2e_mock_only_coverage.robot` (new, 44 lines) - `robot/helper_tdd_e2e_mock_only_coverage.py` (new, 254 lines) - **Lines:** +476 / −0 (all additions) ### Commit Message Compliance - Issue #658 does **not** have a Metadata section with a prescribed Commit Message field — it is a bug report, not a standard task ticket. - PR commit first line: `test(e2e): TDD failing tests for E2E mock-only coverage (bug #658)` - Since no prescribed commit message exists, this is acceptable. ✅ ### Closes Keyword - ❌ **Missing.** The PR body is completely empty. There is no `Closes #658` keyword anywhere. The title references `bug #658` but that is not a recognized auto-close syntax. ### Label Compliance | Required Label | Status | |---|---| | Type label | ✅ `Type/Testing` | | Priority label | ✅ `Priority/Critical` | | MoSCoW label | ❌ Missing (issue has `MoSCoW/Must have`) | ### PR Body - ❌ **Empty.** The PR body field is blank. Per CONTRIBUTING.md, PRs should include a summary of changes. ### Test Coverage - ✅ Behave BDD: 3 scenarios in `.feature` file with step definitions — good. - ✅ Robot Framework: 3 corresponding test cases in `.robot` file with helper — good. - The `@tdd_expected_fail` / `tdd_expected_fail` tagging pattern correctly handles the TDD inversion (tests pass CI while the bug exists, fail once fixed). - ASV benchmark: not included, but this is a TDD diagnostic/audit PR, not a feature — benchmarks are not applicable here. ### Code Quality - The AST-based analysis approach is well-designed: it programmatically inspects the M1-M6 helper files rather than hardcoding expected outcomes. - The `FunctionAnalysis` dataclass and helper functions (`_is_cli_runner_invoke`, `_is_patch_call`, etc.) are clean and well-documented. - There is significant code duplication between `features/steps/tdd_e2e_mock_only_coverage_steps.py` and `robot/helper_tdd_e2e_mock_only_coverage.py` — the `FunctionAnalysis` dataclass, `_analyze_helper()`, and all AST utility functions are copy-pasted. Consider extracting the shared analysis logic into a common module (e.g., `cleveragents/testing/e2e_mock_audit.py`) and importing from both the Behave steps and the Robot helper. This reduces maintenance burden and prevents the two copies from diverging. - The `_SERVICE_MOCK_INDICATORS` frozenset is identical in both files — another reason to share code. ### Required Changes 1. **Add `Closes #658`** (or appropriate keyword) to the PR body. 2. **Write a PR body** with at minimum a summary of what the PR does and why. 3. **Add `MoSCoW/Must have` label** to the PR. ### Suggestion (non-blocking) 4. Extract duplicated AST analysis code into a shared module to avoid maintaining two identical copies.
Merge branch 'master' into tdd/m6-e2e-mock-only-coverage
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 18s
CI / e2e_tests (pull_request) Successful in 34s
CI / security (pull_request) Successful in 37s
CI / typecheck (pull_request) Successful in 39s
CI / unit_tests (pull_request) Successful in 4m46s
CI / integration_tests (pull_request) Successful in 5m11s
CI / docker (pull_request) Successful in 42s
CI / coverage (pull_request) Successful in 5m35s
CI / benchmark-regression (pull_request) Successful in 36m16s
703511a28f
Author
Member

Response to Review #2172 (@freemo)

All 3 required changes addressed:

  1. Closes keyword: PR body now includes Closes #697 (the TDD tracking issue) and Refs: #658 (the parent bug). Per TDD workflow, it's PR #784 (the fix) that Closes #658 — this PR closes the TDD issue #697.
  2. PR body: Populated with full summary, test coverage breakdown, detection method, and TDD workflow notes.
  3. MoSCoW/Must have label: Already present on the PR.

Re: suggestion #4 (extract duplicated AST analysis code) — acknowledged. This is a valid DRY improvement. Given this is a TDD diagnostic PR with a short lifespan (will be superseded by #784's fix removing @tdd_expected_fail), I'd prefer to address it as a follow-up to keep this PR focused on the TDD tests. The duplication is between the Behave steps and Robot helper, which are in different test frameworks with different import paths — a shared module is the right approach but adds complexity to the build for what are disposable diagnostic tests.

## Response to Review #2172 (@freemo) All 3 required changes addressed: 1. **`Closes` keyword**: PR body now includes `Closes #697` (the TDD tracking issue) and `Refs: #658` (the parent bug). Per TDD workflow, it's PR #784 (the fix) that `Closes #658` — this PR closes the TDD issue #697. 2. **PR body**: Populated with full summary, test coverage breakdown, detection method, and TDD workflow notes. 3. **`MoSCoW/Must have` label**: Already present on the PR. Re: suggestion #4 (extract duplicated AST analysis code) — acknowledged. This is a valid DRY improvement. Given this is a TDD diagnostic PR with a short lifespan (will be superseded by #784's fix removing `@tdd_expected_fail`), I'd prefer to address it as a follow-up to keep this PR focused on the TDD tests. The duplication is between the Behave steps and Robot helper, which are in different test frameworks with different import paths — a shared module is the right approach but adds complexity to the build for what are disposable diagnostic tests.
brent.edwards deleted branch tdd/m6-e2e-mock-only-coverage 2026-03-12 22:02:03 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!738
No description provided.