test-infra: Add Semgrep guard for broad exception suppression #9103

Open
opened 2026-04-14 07:36:14 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Commit Message: chore(testing): enforce semgrep gate for suppressed exceptions
  • Branch: chore/test-infra-broad-exception-lint

Background and Context

  • CONTRIBUTING.md explicitly states: "CRITICAL: Do not suppress errors. Let exceptions propagate to top-level execution."
  • We continue to ship code that violates this rule. Issue #9086 (CheckpointService), #9072 (ParallelStrategyExecutor), and #9060 (reactive_registry_adapter) are the most recent examples where a try/except block hid production defects until humans noticed the problem.
  • Static analysis does not currently guard against this. grep finds 337 occurrences of except Exception across src/, and 19 more uses of contextlib.suppress(Exception) (e.g., CheckpointService._compute_diff_snapshot, reactive_registry_adapter.register_registry_agents, various CLI commands). No gate fails when developers add another suppression.
  • Our lint stack only runs Ruff (E/F/W/B/UP/SIM/RUF) plus a Semgrep config that checks eval/exec. Nothing inspects broad exception handling, so the guideline is purely manual.

Current Behavior

  • nox -s lint only runs ruff check and passes even when except Exception: return [] is committed.
  • The local Semgrep hook (semgrep-eval-exec) ignores error-handling patterns entirely.
  • Developers have no automated feedback when they accidentally introduce a broad except that swallows failures, which is why regressions keep reappearing in services, CLI commands, and adapters.

Expected Behavior

  • A mandatory static-analysis rule fails CI (and pre-commit) whenever code adds except Exception or contextlib.suppress(Exception) without re-raising or narrowing the failure.
  • Intentional suppressions must be rare, reviewed, and explicitly annotated (e.g., # error-propagation: allow) so they are auditable.
  • The lint session under nox runs the same rule so CI, developers, and bots see consistent failures.

Proposal

  1. Extend .semgrep.yml with a python-no-suppressed-exception rule that matches except Exception / except BaseException blocks lacking a raise (including raise exc / raise / raise CustomError from exc) and surfaces a high-severity violation. Provide an opt-out comment (# error-propagation: allow) that the rule recognizes via pattern-not, so the few legitimate cases are documented.
  2. Add a companion rule (python-no-suppress-exception) that flags contextlib.suppress(Exception) and contextlib.suppress(BaseException) without the same annotation.
  3. Update the local pre-commit hook (or add a dedicated one) to run semgrep --config=.semgrep.yml so the new rules execute for developers.
  4. Update the lint nox session to invoke Semgrep after Ruff (e.g., session.run("semgrep", "--config=.semgrep.yml", "src/")) so CI enforces the new rules.
  5. Document the annotation pattern in CONTRIBUTING.md (Error Handling section) with guidance on when a suppression is acceptable and how to justify it.
  6. Include migration guidance in the issue implementation plan: run the new rule in audit mode to produce an allowlist of current violations, fix the high-risk services first (CheckpointService, reactive registry, CLI command helpers), and apply annotations only where documented recovery logic exists.

Acceptance Criteria

  • .semgrep.yml contains rules that fail on a synthetic except Exception: pass example and ignore cases that re-raise or include # error-propagation: allow.
  • nox -s lint runs Ruff and Semgrep; CI fails when the new rule triggers.
  • Pre-commit fails locally when a developer introduces a broad suppression without the required annotation.
  • CONTRIBUTING.md documents the annotation policy and links to the new lint rule.
  • Implementation plan enumerates how existing suppressions will be triaged (fix vs. annotated) so the rule can land without leaving CI red indefinitely.

Subtasks

  • Add python-no-suppressed-exception and python-no-suppress-exception rules to .semgrep.yml with tests/examples in the rule description.
  • Wire Semgrep into nox -s lint and ensure developers can run it locally (semgrep --config=.semgrep.yml src/).
  • Update .pre-commit-config.yaml so the new Semgrep rules run automatically.
  • Document the "# error-propagation: allow" escape hatch and when it is acceptable (rare, justified with inline comment).
  • Produce an initial audit (attach semgrep output or summary) identifying the hot spots the implementor should address.

Definition of Done

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines describing the change.
  • The commit is pushed to the remote on the branch in Metadata.
  • A pull request targeting master is opened, reviewed, and merged.

Duplicate Check

  1. Queried all open issues for "error suppression" (pages 1–12 via /issues?state=open&limit=50&page=N) — saw #9086, #9072, #9060, etc., none propose lint automation.
  2. Filtered the open backlog for "broad exception" (Semgrep/JQ scan) — found issue-level bugs (#9054, #8452, #8242, #8112, #7950, #7938, #7904) but no testing-infra proposal.
  3. Aggregated all open issues and searched for "semgrep" — zero matches; no existing static-analysis upgrade in flight.
  4. Aggregated all open issues and searched for "lint rule" — zero matches; no proposal collides with this request.
  5. Reviewed closed issues containing "semgrep" or "lint rule" (pages 1–11 of closed backlog) — only automation status reports (#8774, #4996) surfaced; nothing about broad-exception linting.

Automated by CleverAgents Bot

## Metadata - **Commit Message**: `chore(testing): enforce semgrep gate for suppressed exceptions` - **Branch**: `chore/test-infra-broad-exception-lint` ## Background and Context - CONTRIBUTING.md explicitly states: "**CRITICAL: Do not suppress errors. Let exceptions propagate to top-level execution.**" - We continue to ship code that violates this rule. Issue #9086 (CheckpointService), #9072 (ParallelStrategyExecutor), and #9060 (reactive_registry_adapter) are the most recent examples where a try/except block hid production defects until humans noticed the problem. - Static analysis does not currently guard against this. `grep` finds 337 occurrences of `except Exception` across `src/`, and 19 more uses of `contextlib.suppress(Exception)` (e.g., `CheckpointService._compute_diff_snapshot`, `reactive_registry_adapter.register_registry_agents`, various CLI commands). No gate fails when developers add another suppression. - Our lint stack only runs Ruff (E/F/W/B/UP/SIM/RUF) plus a Semgrep config that checks eval/exec. Nothing inspects broad exception handling, so the guideline is purely manual. ## Current Behavior - `nox -s lint` only runs `ruff check` and passes even when `except Exception: return []` is committed. - The local Semgrep hook (`semgrep-eval-exec`) ignores error-handling patterns entirely. - Developers have no automated feedback when they accidentally introduce a broad `except` that swallows failures, which is why regressions keep reappearing in services, CLI commands, and adapters. ## Expected Behavior - A mandatory static-analysis rule fails CI (and pre-commit) whenever code adds `except Exception` or `contextlib.suppress(Exception)` without re-raising or narrowing the failure. - Intentional suppressions must be rare, reviewed, and explicitly annotated (e.g., `# error-propagation: allow`) so they are auditable. - The lint session under `nox` runs the same rule so CI, developers, and bots see consistent failures. ## Proposal 1. Extend `.semgrep.yml` with a `python-no-suppressed-exception` rule that matches `except Exception` / `except BaseException` blocks lacking a `raise` (including `raise exc` / `raise` / `raise CustomError from exc`) and surfaces a high-severity violation. Provide an opt-out comment (`# error-propagation: allow`) that the rule recognizes via `pattern-not`, so the few legitimate cases are documented. 2. Add a companion rule (`python-no-suppress-exception`) that flags `contextlib.suppress(Exception)` and `contextlib.suppress(BaseException)` without the same annotation. 3. Update the local pre-commit hook (or add a dedicated one) to run `semgrep --config=.semgrep.yml` so the new rules execute for developers. 4. Update the `lint` nox session to invoke Semgrep after Ruff (e.g., `session.run("semgrep", "--config=.semgrep.yml", "src/")`) so CI enforces the new rules. 5. Document the annotation pattern in CONTRIBUTING.md (Error Handling section) with guidance on when a suppression is acceptable and how to justify it. 6. Include migration guidance in the issue implementation plan: run the new rule in audit mode to produce an allowlist of current violations, fix the high-risk services first (CheckpointService, reactive registry, CLI command helpers), and apply annotations only where documented recovery logic exists. ## Acceptance Criteria - [ ] `.semgrep.yml` contains rules that fail on a synthetic `except Exception: pass` example and ignore cases that re-raise or include `# error-propagation: allow`. - [ ] `nox -s lint` runs Ruff *and* Semgrep; CI fails when the new rule triggers. - [ ] Pre-commit fails locally when a developer introduces a broad suppression without the required annotation. - [ ] CONTRIBUTING.md documents the annotation policy and links to the new lint rule. - [ ] Implementation plan enumerates how existing suppressions will be triaged (fix vs. annotated) so the rule can land without leaving CI red indefinitely. ## Subtasks - [ ] Add `python-no-suppressed-exception` and `python-no-suppress-exception` rules to `.semgrep.yml` with tests/examples in the rule description. - [ ] Wire Semgrep into `nox -s lint` and ensure developers can run it locally (`semgrep --config=.semgrep.yml src/`). - [ ] Update `.pre-commit-config.yaml` so the new Semgrep rules run automatically. - [ ] Document the "# error-propagation: allow" escape hatch and when it is acceptable (rare, justified with inline comment). - [ ] Produce an initial audit (attach semgrep output or summary) identifying the hot spots the implementor should address. ## Definition of Done - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines describing the change. - The commit is pushed to the remote on the branch in Metadata. - A pull request targeting `master` is opened, reviewed, and merged. ### Duplicate Check 1. Queried all open issues for "error suppression" (pages 1–12 via `/issues?state=open&limit=50&page=N`) — saw #9086, #9072, #9060, etc., none propose lint automation. 2. Filtered the open backlog for "broad exception" (Semgrep/JQ scan) — found issue-level bugs (#9054, #8452, #8242, #8112, #7950, #7938, #7904) but no testing-infra proposal. 3. Aggregated all open issues and searched for "semgrep" — zero matches; no existing static-analysis upgrade in flight. 4. Aggregated all open issues and searched for "lint rule" — zero matches; no proposal collides with this request. 5. Reviewed closed issues containing "semgrep" or "lint rule" (pages 1–11 of closed backlog) — only automation status reports (#8774, #4996) surfaced; nothing about broad-exception linting. --- **Automated by CleverAgents Bot**
Author
Owner

🔍 Triage Decision

Status: VERIFIED

Type: Testing
MoSCoW: Should have
Priority: Medium
Milestone: v3.2.0

Reasoning: Adding a Semgrep guard for broad exception suppression improves code quality enforcement and prevents silent error swallowing; should be included in v3.2.0 as a test infrastructure improvement.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

## 🔍 Triage Decision **Status:** ✅ VERIFIED **Type:** Testing **MoSCoW:** Should have **Priority:** Medium **Milestone:** v3.2.0 **Reasoning:** Adding a Semgrep guard for broad exception suppression improves code quality enforcement and prevents silent error swallowing; should be included in v3.2.0 as a test infrastructure improvement. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
HAL9000 added this to the v3.2.0 milestone 2026-04-14 08:42:18 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core#9103
No description provided.