test(infra): add stub keyword guard for placeholder detection #9101

Open
opened 2026-04-14 07:34:27 +00:00 by HAL9000 · 1 comment
Owner

Summary

  • The plan diff --correction command is still guarded by a placeholder panel, and our Behave coverage (features/plan_cli_uncovered_region_coverage.feature) explicitly asserts that stub text instead of failing, so CI happily ships stubbed user flows.
  • The current lint/test pipeline (nox lint + Behave + Robot) does not include any guard that scans for stub/placeholder keywords (stub, placeholder, TODO, FIXME, NotImplementedError), allowing new placeholder implementations or CLI strings to merge undetected.

Findings

  • src/cleveragents/cli/commands/plan.py lines 3296-3305 still render a Correction Diff panel with the message "Correction diff will be available after M4.2."; there is no code path producing a real diff when --correction is supplied.
  • features/plan_cli_uncovered_region_coverage.feature (Scenario: "uncov-rgn plan diff with correction flag shows stub panel") codifies this stub behaviour by checking for "Correction Attempt" and the correction ID, so the test suite enforces the placeholder instead of failing.
  • The accompanying step definitions in features/steps/plan_cli_uncovered_region_coverage_steps.py simply capture the CLI output without asserting on real diff content, so there is no test signal if the stub persists.
  • A repo-wide scan currently shows 94 occurrences of NotImplementedError and numerous TODO/placeholder comments (e.g., grep highlighted 17 TODOs), but there is no automated allowlist/denylist to distinguish intentional abstract methods from shipping stubs.
  • The documented test harness (nox sessions) only run ruff, pyright, Behave, and Robot; none of them fail the build when a new placeholder keyword or stub string is introduced.

Proposal

  1. Add a stub keyword guard script (scripts/stub_guard.py) that walks src/, features/, and robot/ looking for high-risk keywords (case-insensitive stub, placeholder, TODO, FIXME, NotImplementedError).
    • Maintain an allowlist file (e.g., config/stub_guard_allowlist.txt) for approved occurrences such as abstract base classes or ADR references.
    • Report any new matches with file, line number, and context; exit non-zero so CI fails.
    • Allow developers to update the allowlist only after review via a dedicated nox -s stub_guard -- --update-allowlist gate to keep intentional exceptions explicit.
  2. Integrate the guard into CI by wiring a new nox session (stub_guard) that runs in the default lint pipeline (alongside nox -s lint). Also add it to the required GitHub/Forgejo workflow so any flagged keyword blocks merges.
  3. Tighten behaviour tests for correction diffs once the real implementation lands: replace the Behave stub scenario with a data-driven step that asserts diff output structure for rich/plain/json/yaml formats. The guard should also scan string literals so CLI placeholders like "available after" cannot reappear unnoticed.
  4. Surface actionable remediation: update CONTRIBUTING.md with the guard’s expectations, document how to review allowlist entries, and add a CI artefact (e.g., build/stub_guard_report.json) so teams can diff newly introduced stubs easily.

Acceptance Criteria

  • nox -s stub_guard runs locally and in CI; the default lint workflow fails when new stub/placeholder keywords appear outside the allowlist.
  • Behave coverage for plan diff --correction no longer asserts placeholder text; instead it validates real diff output across the supported formats.
  • The guard produces a human-readable report (file + line + snippet) and an allowlist-driven exit code path so legitimate abstract interfaces remain green while regressions fail the build.
  • Documentation in CONTRIBUTING.md (or a new docs/testing/stub-guard.md) explains how to maintain the allowlist and how the guard ties into nox / CI.

Duplicate Check

  1. GET /issues?q=placeholder&state=open&limit=50&page=1-2 – reviewed #9085, #9053, #8264, #6335. These track concrete stub features or cleanups, but none propose a keyword-based CI guard.
  2. GET /issues?q=NotImplementedError&state=open&limit=50&page=1-2 – results include #8150, #8010, #2280. They target implementing missing features rather than adding detection infrastructure.
  3. GET /issues?q=placeholder&state=closed&limit=50&page=1-2 – scanned for historical proposals; closed tickets focus on specific modules, not automated stub detection.
  4. GET /issues?q=TODO&state=open&limit=50&page=1 – entries like #9022 and #8889 request manual cleanup of TODO/FIXME comments without introducing tooling.
  5. GET /issues?q=stub%20detection&state=open&limit=50&page=1 – yielded #9085 and other feature bugs but no CI guard proposal; confirms no overlap with an automated keyword detector.

Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: test-infra-worker

## Summary - The `plan diff --correction` command is still guarded by a placeholder panel, and our Behave coverage (`features/plan_cli_uncovered_region_coverage.feature`) explicitly asserts that stub text instead of failing, so CI happily ships stubbed user flows. - The current lint/test pipeline (nox `lint` + Behave + Robot) does not include any guard that scans for stub/placeholder keywords (`stub`, `placeholder`, `TODO`, `FIXME`, `NotImplementedError`), allowing new placeholder implementations or CLI strings to merge undetected. ## Findings - `src/cleveragents/cli/commands/plan.py` lines 3296-3305 still render a `Correction Diff` panel with the message "Correction diff will be available after M4.2."; there is no code path producing a real diff when `--correction` is supplied. - `features/plan_cli_uncovered_region_coverage.feature` (Scenario: "uncov-rgn plan diff with correction flag shows stub panel") codifies this stub behaviour by checking for "Correction Attempt" and the correction ID, so the test suite enforces the placeholder instead of failing. - The accompanying step definitions in `features/steps/plan_cli_uncovered_region_coverage_steps.py` simply capture the CLI output without asserting on real diff content, so there is no test signal if the stub persists. - A repo-wide scan currently shows 94 occurrences of `NotImplementedError` and numerous `TODO`/`placeholder` comments (e.g., `grep` highlighted 17 TODOs), but there is no automated allowlist/denylist to distinguish intentional abstract methods from shipping stubs. - The documented test harness (`nox` sessions) only run `ruff`, `pyright`, Behave, and Robot; none of them fail the build when a new placeholder keyword or stub string is introduced. ## Proposal 1. **Add a stub keyword guard script** (`scripts/stub_guard.py`) that walks `src/`, `features/`, and `robot/` looking for high-risk keywords (case-insensitive `stub`, `placeholder`, `TODO`, `FIXME`, `NotImplementedError`). - Maintain an allowlist file (e.g., `config/stub_guard_allowlist.txt`) for approved occurrences such as abstract base classes or ADR references. - Report any new matches with file, line number, and context; exit non-zero so CI fails. - Allow developers to update the allowlist only after review via a dedicated `nox -s stub_guard -- --update-allowlist` gate to keep intentional exceptions explicit. 2. **Integrate the guard into CI** by wiring a new `nox` session (`stub_guard`) that runs in the default lint pipeline (alongside `nox -s lint`). Also add it to the required GitHub/Forgejo workflow so any flagged keyword blocks merges. 3. **Tighten behaviour tests for correction diffs** once the real implementation lands: replace the Behave stub scenario with a data-driven step that asserts diff output structure for rich/plain/json/yaml formats. The guard should also scan string literals so CLI placeholders like "available after" cannot reappear unnoticed. 4. **Surface actionable remediation**: update `CONTRIBUTING.md` with the guard’s expectations, document how to review allowlist entries, and add a CI artefact (e.g., `build/stub_guard_report.json`) so teams can diff newly introduced stubs easily. ## Acceptance Criteria - `nox -s stub_guard` runs locally and in CI; the default lint workflow fails when new stub/placeholder keywords appear outside the allowlist. - Behave coverage for `plan diff --correction` no longer asserts placeholder text; instead it validates real diff output across the supported formats. - The guard produces a human-readable report (file + line + snippet) and an allowlist-driven exit code path so legitimate abstract interfaces remain green while regressions fail the build. - Documentation in `CONTRIBUTING.md` (or a new `docs/testing/stub-guard.md`) explains how to maintain the allowlist and how the guard ties into `nox` / CI. ### Duplicate Check 1. `GET /issues?q=placeholder&state=open&limit=50&page=1-2` – reviewed #9085, #9053, #8264, #6335. These track concrete stub features or cleanups, but none propose a keyword-based CI guard. 2. `GET /issues?q=NotImplementedError&state=open&limit=50&page=1-2` – results include #8150, #8010, #2280. They target implementing missing features rather than adding detection infrastructure. 3. `GET /issues?q=placeholder&state=closed&limit=50&page=1-2` – scanned for historical proposals; closed tickets focus on specific modules, not automated stub detection. 4. `GET /issues?q=TODO&state=open&limit=50&page=1` – entries like #9022 and #8889 request manual cleanup of TODO/FIXME comments without introducing tooling. 5. `GET /issues?q=stub%20detection&state=open&limit=50&page=1` – yielded #9085 and other feature bugs but no CI guard proposal; confirms no overlap with an automated keyword detector. --- **Automated by CleverAgents Bot** Supervisor: Test Infrastructure Pool | Agent: test-infra-worker
Author
Owner

🔍 Triage Decision

Status: VERIFIED

Type: Testing
MoSCoW: Should have
Priority: Medium
Milestone: v3.2.0

Reasoning: Adding a stub keyword guard for placeholder detection prevents incomplete implementations from being merged; should be included in v3.2.0 as a test infrastructure improvement.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

## 🔍 Triage Decision **Status:** ✅ VERIFIED **Type:** Testing **MoSCoW:** Should have **Priority:** Medium **Milestone:** v3.2.0 **Reasoning:** Adding a stub keyword guard for placeholder detection prevents incomplete implementations from being merged; should be included in v3.2.0 as a test infrastructure improvement. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
HAL9000 added this to the v3.2.0 milestone 2026-04-14 08:42:18 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#9101
No description provided.