BUG-HUNT: [resource] Subprocess infinite hang risk in scripts/check-quality-gates.py — missing timeout parameters #7297

Open
opened 2026-04-10 15:30:13 +00:00 by HAL9000 · 4 comments
Owner

Background

All five subprocess calls in scripts/check-quality-gates.py (check_coverage, check_typecheck, check_security, check_dead_code, check_complexity) invoke external quality tools — coverage, pyright, bandit, vulture, and radon — without any timeout parameter. If any of these tools becomes unresponsive (e.g., due to a hung language server, a deadlocked analysis pass, or an I/O stall), the CI pipeline will block indefinitely with no recovery path.

This is a resource-management defect: the script holds a subprocess open with no upper bound on wall-clock time, consuming a CI executor slot until the job is manually cancelled or the runner's own global timeout fires (which may be hours away).

Current Behaviour

Every subprocess.run() call in the script omits timeout=:

# check_coverage() — no timeout
result = subprocess.run(
    ["coverage", "report", "--format=total"],
    capture_output=True,
    text=True,
    check=False,
)

# check_typecheck() — no timeout
result = subprocess.run(
    ["pyright", "--outputjson"],
    capture_output=True,
    text=True,
    check=False,
)

# check_security(), check_dead_code(), check_complexity() — same pattern

When a tool hangs, subprocess.run() blocks the calling thread forever. There is no subprocess.TimeoutExpired handler, no fallback, and no log message to aid diagnosis.
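To illustrate the difference a timeout makes, here is a minimal sketch that uses a sleeping child Python process as a stand-in for a hung quality tool. The helper name `times_out` and the stand-in command are assumptions for illustration only; they do not come from scripts/check-quality-gates.py.

```python
import subprocess
import sys


def times_out(timeout_seconds: float) -> bool:
    """Return True if a deliberately slow child process exceeds the timeout."""
    # Stand-in for a hung quality tool: a child interpreter that just sleeps.
    slow_command = [sys.executable, "-c", "import time; time.sleep(30)"]
    try:
        subprocess.run(
            slow_command,
            capture_output=True,
            text=True,
            check=False,
            timeout=timeout_seconds,  # without this, the call waits the full 30 s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising the exception.
        return True
    return False


if __name__ == "__main__":
    print(times_out(0.5))
```

With `timeout=` supplied, the call returns control to the script after roughly half a second instead of waiting for the child to finish on its own.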

Expected Behaviour

  1. Every subprocess.run() call includes a timeout parameter (e.g., timeout=300, i.e. 300 seconds, as a safe default for CI).
  2. subprocess.TimeoutExpired is caught and converted to a gate failure with a clear diagnostic message (e.g., "Coverage check timed out after 300 s").
  3. Timeouts are configurable via environment variables (e.g., QUALITY_GATE_TIMEOUT_SECONDS) so operators can tune them without modifying the script.
  4. Timeout events are logged to aid post-mortem debugging.
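Point 3 above could be approached along the following lines. This is only a sketch: `DEFAULT_SUBPROCESS_TIMEOUT` and `QUALITY_GATE_TIMEOUT_SECONDS` are the names proposed in this issue, while `resolve_timeout`, the logger name, and the choice to fall back to the default on invalid values are assumptions.

```python
import logging
import math
import os
from typing import Mapping, Optional

logger = logging.getLogger("quality_gates")  # hypothetical logger name

DEFAULT_SUBPROCESS_TIMEOUT = 300  # seconds, the default suggested in this issue
TIMEOUT_ENV_VAR = "QUALITY_GATE_TIMEOUT_SECONDS"


def resolve_timeout(environ: Optional[Mapping[str, str]] = None) -> float:
    """Return the timeout in seconds, honouring the environment override."""
    env = os.environ if environ is None else environ
    raw = env.get(TIMEOUT_ENV_VAR)
    if raw is None or not raw.strip():
        return float(DEFAULT_SUBPROCESS_TIMEOUT)
    try:
        value = float(raw)
    except ValueError:
        value = math.nan
    # Assumption: reject non-numeric, non-finite, zero, and negative values
    # rather than passing them on to subprocess.run().
    if not math.isfinite(value) or value <= 0:
        logger.warning(
            "Ignoring invalid %s=%r; using default of %s s",
            TIMEOUT_ENV_VAR, raw, DEFAULT_SUBPROCESS_TIMEOUT,
        )
        return float(DEFAULT_SUBPROCESS_TIMEOUT)
    return value


if __name__ == "__main__":
    print(resolve_timeout({}))
    print(resolve_timeout({TIMEOUT_ENV_VAR: "45"}))
```

Accepting an explicit mapping instead of always reading `os.environ` keeps the function easy to test without mutating the process environment.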

Acceptance Criteria

  • All five gate functions (check_coverage, check_typecheck, check_security, check_dead_code, check_complexity) pass a timeout argument to subprocess.run().
  • A subprocess.TimeoutExpired handler is present in each function (or a shared wrapper) that returns (False, "<gate> timed out after <N> s").
  • Default timeout value is defined as a module-level constant and overridable via QUALITY_GATE_TIMEOUT_SECONDS environment variable.
  • No CI pipeline hang is possible due to an unresponsive quality tool.
  • All existing quality gate checks continue to pass after the fix.
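The "shared wrapper" option mentioned in the criteria might look roughly like this. The function name `run_gate_command` and the logger name are hypothetical, and the success path is a placeholder: the issue does not show how the real gate functions interpret tool output, so this sketch simply reports the exit status and stdout.

```python
import logging
import subprocess
import sys
from typing import Sequence, Tuple

logger = logging.getLogger("quality_gates")  # hypothetical logger name


def run_gate_command(
    gate_name: str, command: Sequence[str], timeout_seconds: float
) -> Tuple[bool, str]:
    """Run one quality tool and convert a timeout into a gate failure."""
    try:
        result = subprocess.run(
            list(command),
            capture_output=True,
            text=True,
            check=False,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # Matches the "(False, '<gate> timed out after <N> s')" shape above.
        message = f"{gate_name} timed out after {timeout_seconds:g} s"
        logger.error(message)
        return False, message
    # Placeholder success path: a real gate would parse result.stdout here.
    return result.returncode == 0, result.stdout.strip()


if __name__ == "__main__":
    hung_tool = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_gate_command("Coverage check", hung_tool, 0.5))
```

Centralising the handler in one place means the five gate functions only differ in the command they pass and in how they parse the output.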

Metadata

  • Branch: bugfix/resource-subprocess-timeout-quality-gates
  • Commit Message: fix(scripts): add subprocess timeouts to check-quality-gates.py to prevent CI infinite hangs
  • Milestone: (none — backlog)
  • Parent Epic: (orphan — see note below)

Subtasks

  • Audit all subprocess.run() calls in scripts/check-quality-gates.py and confirm none have timeout=
  • Define DEFAULT_SUBPROCESS_TIMEOUT module constant (default: 300)
  • Add QUALITY_GATE_TIMEOUT_SECONDS environment variable override logic
  • Refactor check_coverage() to pass timeout= and handle TimeoutExpired
  • Refactor check_typecheck() to pass timeout= and handle TimeoutExpired
  • Refactor check_security() to pass timeout= and handle TimeoutExpired
  • Refactor check_dead_code() to pass timeout= and handle TimeoutExpired
  • Refactor check_complexity() to pass timeout= and handle TimeoutExpired
  • Write BDD scenario (TDD issue will be created separately per bug workflow)
  • Verify all nox stages pass after fix

Definition of Done

  • DEFAULT_SUBPROCESS_TIMEOUT constant defined and used in all five gate functions
  • QUALITY_GATE_TIMEOUT_SECONDS env-var override implemented and documented
  • subprocess.TimeoutExpired handled in every gate function with a descriptive failure message
  • No subprocess call in the script can block indefinitely
  • All nox stages pass
  • Coverage >= 97%

TDD Note: Per the mandatory bug fix workflow in CONTRIBUTING.md, a Type/Testing issue will be created after this issue is verified. The TDD test will use tags @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before the fix is applied.
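As a rough idea of what such a test could assert, the sketch below checks whether a gate function hands a `timeout` to `subprocess.run()`, using `unittest.mock` so no real tool is launched. The project's actual BDD framework and step layout are not shown in this issue, so the helper `gate_passes_timeout` and the two stand-in gates are purely illustrative.

```python
import subprocess
from typing import Callable
from unittest import mock


def gate_passes_timeout(gate_function: Callable[[], object]) -> bool:
    """Return True if gate_function passes a timeout to subprocess.run()."""
    fake_result = subprocess.CompletedProcess(
        args=[], returncode=0, stdout="100", stderr=""
    )
    # Replace subprocess.run so the external tool is never executed.
    with mock.patch("subprocess.run", return_value=fake_result) as fake_run:
        gate_function()
    # call_args.kwargs holds the keyword arguments of the most recent call.
    return fake_run.call_args.kwargs.get("timeout") is not None


def unfixed_gate() -> object:
    """Stand-in mirroring the current call pattern (no timeout)."""
    return subprocess.run(
        ["coverage", "report", "--format=total"],
        capture_output=True, text=True, check=False,
    )


def fixed_gate() -> object:
    """Stand-in mirroring the proposed call pattern (with timeout)."""
    return subprocess.run(
        ["coverage", "report", "--format=total"],
        capture_output=True, text=True, check=False, timeout=300,
    )


if __name__ == "__main__":
    print(gate_passes_timeout(unfixed_gate))
    print(gate_passes_timeout(fixed_gate))
```

A check of this kind fails against the current call pattern and passes once `timeout=` is added, which is the behaviour an expected-fail test is meant to capture.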

Backlog note: This issue was discovered during autonomous operation on milestone v3.7.0. It does not block milestone completion and has been placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: Acting on behalf of: Bug Hunter | Agent: new-issue-creator

Author
Owner

⚠️ Post-Creation Action Required

Labels Not Applied (MCP Tool Unavailable)

The label manager subagent was unable to apply labels due to a connectivity issue with the Forgejo MCP tool. The following labels must be applied manually:

| Label | Value |
|---|---|
| State | State/Unverified (ID: 846) |
| Type | Type/Bug (ID: 849) |
| Priority | Priority/High (ID: 859) |

Orphan Issue — No Parent Epic Found

No parent Epic was found for scripts/CI quality gate bugs. This issue is an orphan and must be manually linked to an appropriate parent Epic once one is identified or created.

Milestone

Per the Milestone Scope Guard: this is a High priority (non-Critical) bug discovered during autonomous operation. It has been placed in the backlog with no milestone assigned. A human reviewer should assign it to an appropriate future milestone.


Automated by CleverAgents Bot
Supervisor: Acting on behalf of: Bug Hunter | Agent: new-issue-creator

Author
Owner

Verified — Resource bug: subprocess infinite hang risk in check-quality-gates.py. MoSCoW: Must-have. Priority: High — can hang CI.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

Verified — Resource bug: subprocess infinite hang risk in check-quality-gates.py. MoSCoW: Must-have. Priority: High — can hang CI.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

Verified — Resource bug: subprocess infinite hang risk in check-quality-gates.py. MoSCoW: Must-have. Priority: High — can hang CI.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#7297