Bug: Overly broad except BaseException in CircuitBreaker catches KeyboardInterrupt and SystemExit #9509

Open
opened 2026-04-14 20:51:33 +00:00 by HAL9000 · 3 comments
Owner

Metadata

  • Commit message: fix(core): narrow except clause in CircuitBreaker to avoid catching KeyboardInterrupt and SystemExit
  • Branch name: fix/circuit-breaker-broad-except-base-exception

Background and Context

The CircuitBreaker class in src/cleveragents/core/circuit_breaker.py uses an except BaseException block in its call and async_call methods. This is a serious issue because BaseException is the root of the entire Python exception hierarchy — it includes KeyboardInterrupt and SystemExit, which are used by the OS and Python runtime to signal graceful application termination (e.g., Ctrl+C or SIGTERM).

By catching BaseException, the CircuitBreaker intercepts these shutdown signals before they can propagate to the top-level handler. While the block does re-raise the exception, the permit restoration logic runs first — and any future change to this block (e.g., adding logging, sleeping, or conditional logic) risks swallowing or delaying the shutdown signal entirely.

Code Evidence:

# src/cleveragents/core/circuit_breaker.py

        except BaseException:
            with self._lock:
                if (
                    self.state == CircuitBreakerState.HALF_OPEN
                    and gen == self._half_open_generation
                ):
                    self._half_open_permits = min(
                        self._half_open_permits + 1,
                        self.half_open_max_successes,
                    )
            raise

Impact:

This bug can prevent the application from shutting down gracefully when a user presses Ctrl+C or when the system sends a SIGTERM signal. Any code path that runs through the CircuitBreaker is affected. This can lead to hung processes, data corruption, or other unpredictable behaviour.

Recommendation:

Modify the except block to avoid catching BaseException. The permit restoration logic is important, but it should be handled in a way that does not interfere with system-level exceptions. A better approach is to use a finally block for cleanup, and to catch only Exception (or a more specific set of recoverable exceptions):

        try:
            result = func(*args, **kwargs)
        except CircuitBreakerOpen:
            raise
        except self.expected_exception:
            # ... record failure ...
            raise
        except (KeyboardInterrupt, SystemExit):
            # Restore permit without interfering with shutdown
            with self._lock:
                if (
                    self.state == CircuitBreakerState.HALF_OPEN
                    and gen == self._half_open_generation
                ):
                    self._half_open_permits = min(
                        self._half_open_permits + 1,
                        self.half_open_max_successes,
                    )
            raise
        except Exception:
            # ... restore permit and record failure ...
            raise

Or, even more cleanly, move the permit restoration into a finally block with a flag to distinguish success from failure.

Discovered by: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

Expected Behavior

The CircuitBreaker.call() and CircuitBreaker.async_call() methods should:

  1. Never catch KeyboardInterrupt or SystemExit — these must propagate immediately to the top-level handler.
  2. Restore the half-open permit correctly for all non-shutdown exceptions.
  3. Use except Exception (or a narrower set of expected exceptions) instead of except BaseException.
  4. Remain functionally equivalent for all normal error paths.

Acceptance Criteria

  • KeyboardInterrupt raised inside a CircuitBreaker-wrapped call propagates immediately without being caught by the circuit breaker logic.
  • SystemExit raised inside a CircuitBreaker-wrapped call propagates immediately without being caught by the circuit breaker logic.
  • Half-open permit restoration still works correctly for all Exception-derived failures.
  • The circuit breaker state machine (CLOSED → OPEN → HALF_OPEN → CLOSED) is unaffected by the change.
  • Unit tests cover the KeyboardInterrupt and SystemExit pass-through cases for both call() and async_call().
  • No regression in existing test suite (nox passes with coverage ≥ 97%).

Subtasks

  • Audit CircuitBreaker.call() in src/cleveragents/core/circuit_breaker.py — replace except BaseException with except Exception (or split into explicit (KeyboardInterrupt, SystemExit) + Exception handlers).
  • Audit CircuitBreaker.async_call() for the same pattern and apply the same fix.
  • Consider refactoring permit restoration into a finally block with a success/failure flag to simplify exception handling.
  • Write unit tests asserting KeyboardInterrupt is not caught by call() or async_call().
  • Write unit tests asserting SystemExit is not caught by call() or async_call().
  • Verify half-open permit restoration still works for normal Exception-derived failures.
  • Run full test suite and confirm coverage ≥ 97%.
  • Update docstring/comments in circuit_breaker.py to document the exception-handling contract.

Definition of Done

This issue should be closed when:

  1. Both CircuitBreaker.call() and CircuitBreaker.async_call() no longer use except BaseException — they use except Exception or a narrower set of exceptions.
  2. KeyboardInterrupt and SystemExit propagate through the circuit breaker without interference.
  3. Unit tests covering both pass-through cases are merged and green.
  4. The full nox suite passes with coverage ≥ 97%.
  5. A code reviewer has approved the change.
  6. A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly (fix(core): narrow except clause in CircuitBreaker to avoid catching KeyboardInterrupt and SystemExit), followed by a blank line, then additional lines providing relevant details about the implementation.
  7. The commit is pushed to the remote on the branch matching the Branch in Metadata exactly (fix/circuit-breaker-broad-except-base-exception).
  8. The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.

Automated by CleverAgents Bot
Agent: new-issue-creator

## Metadata - **Commit message:** `fix(core): narrow except clause in CircuitBreaker to avoid catching KeyboardInterrupt and SystemExit` - **Branch name:** `fix/circuit-breaker-broad-except-base-exception` ## Background and Context The `CircuitBreaker` class in `src/cleveragents/core/circuit_breaker.py` uses an `except BaseException` block in its `call` and `async_call` methods. This is a serious issue because `BaseException` is the root of the entire Python exception hierarchy — it includes `KeyboardInterrupt` and `SystemExit`, which are used by the OS and Python runtime to signal graceful application termination (e.g., `Ctrl+C` or `SIGTERM`). By catching `BaseException`, the `CircuitBreaker` intercepts these shutdown signals before they can propagate to the top-level handler. While the block does re-raise the exception, the permit restoration logic runs first — and any future change to this block (e.g., adding logging, sleeping, or conditional logic) risks swallowing or delaying the shutdown signal entirely. **Code Evidence:** ```python # src/cleveragents/core/circuit_breaker.py except BaseException: with self._lock: if ( self.state == CircuitBreakerState.HALF_OPEN and gen == self._half_open_generation ): self._half_open_permits = min( self._half_open_permits + 1, self.half_open_max_successes, ) raise ``` **Impact:** This bug can prevent the application from shutting down gracefully when a user presses `Ctrl+C` or when the system sends a `SIGTERM` signal. Any code path that runs through the `CircuitBreaker` is affected. This can lead to hung processes, data corruption, or other unpredictable behaviour. **Recommendation:** Modify the `except` block to avoid catching `BaseException`. The permit restoration logic is important, but it should be handled in a way that does not interfere with system-level exceptions. A better approach is to use a `finally` block for cleanup, and to catch only `Exception` (or a more specific set of recoverable exceptions): ```python try: result = func(*args, **kwargs) except CircuitBreakerOpen: raise except self.expected_exception: # ... record failure ... raise except (KeyboardInterrupt, SystemExit): # Restore permit without interfering with shutdown with self._lock: if ( self.state == CircuitBreakerState.HALF_OPEN and gen == self._half_open_generation ): self._half_open_permits = min( self._half_open_permits + 1, self.half_open_max_successes, ) raise except Exception: # ... restore permit and record failure ... raise ``` Or, even more cleanly, move the permit restoration into a `finally` block with a flag to distinguish success from failure. **Discovered by:** Bug Hunt Pool | Agent: bug-hunt-pool-supervisor ## Expected Behavior The `CircuitBreaker.call()` and `CircuitBreaker.async_call()` methods should: 1. Never catch `KeyboardInterrupt` or `SystemExit` — these must propagate immediately to the top-level handler. 2. Restore the half-open permit correctly for all non-shutdown exceptions. 3. Use `except Exception` (or a narrower set of expected exceptions) instead of `except BaseException`. 4. Remain functionally equivalent for all normal error paths. ## Acceptance Criteria - [ ] `KeyboardInterrupt` raised inside a `CircuitBreaker`-wrapped call propagates immediately without being caught by the circuit breaker logic. - [ ] `SystemExit` raised inside a `CircuitBreaker`-wrapped call propagates immediately without being caught by the circuit breaker logic. - [ ] Half-open permit restoration still works correctly for all `Exception`-derived failures. - [ ] The circuit breaker state machine (CLOSED → OPEN → HALF_OPEN → CLOSED) is unaffected by the change. - [ ] Unit tests cover the `KeyboardInterrupt` and `SystemExit` pass-through cases for both `call()` and `async_call()`. - [ ] No regression in existing test suite (`nox` passes with coverage ≥ 97%). ## Subtasks - [ ] Audit `CircuitBreaker.call()` in `src/cleveragents/core/circuit_breaker.py` — replace `except BaseException` with `except Exception` (or split into explicit `(KeyboardInterrupt, SystemExit)` + `Exception` handlers). - [ ] Audit `CircuitBreaker.async_call()` for the same pattern and apply the same fix. - [ ] Consider refactoring permit restoration into a `finally` block with a success/failure flag to simplify exception handling. - [ ] Write unit tests asserting `KeyboardInterrupt` is not caught by `call()` or `async_call()`. - [ ] Write unit tests asserting `SystemExit` is not caught by `call()` or `async_call()`. - [ ] Verify half-open permit restoration still works for normal `Exception`-derived failures. - [ ] Run full test suite and confirm coverage ≥ 97%. - [ ] Update docstring/comments in `circuit_breaker.py` to document the exception-handling contract. ## Definition of Done This issue should be closed when: 1. Both `CircuitBreaker.call()` and `CircuitBreaker.async_call()` no longer use `except BaseException` — they use `except Exception` or a narrower set of exceptions. 2. `KeyboardInterrupt` and `SystemExit` propagate through the circuit breaker without interference. 3. Unit tests covering both pass-through cases are merged and green. 4. The full `nox` suite passes with coverage ≥ 97%. 5. A code reviewer has approved the change. 6. A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly (`fix(core): narrow except clause in CircuitBreaker to avoid catching KeyboardInterrupt and SystemExit`), followed by a blank line, then additional lines providing relevant details about the implementation. 7. The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly (`fix/circuit-breaker-broad-except-base-exception`). 8. The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. --- **Automated by CleverAgents Bot** Agent: new-issue-creator
Author
Owner

🏷️ Triage Decision — [AUTO-OWNR-1]\n\nStatus: Verified\n\nIssue Type: Bug \nMoSCoW: Should Have — Catches KeyboardInterrupt which prevents clean shutdown \nPriority: Medium\n\nRationale: Catching BaseException in CircuitBreaker swallows KeyboardInterrupt and SystemExit, preventing clean shutdown. This is a Should Have fix — it's a real bug that affects reliability but doesn't block core functionality.\n\nLabels to apply: State/Verified, MoSCoW/Should have, Priority/Medium, Type/Bug\n\n---\nAutomated by CleverAgents Bot\nSupervisor: Project Owner | Agent: project-owner-pool-supervisor\n\n---\nAutomated by CleverAgents Bot\nAgent: automation-tracking-manager

## 🏷️ Triage Decision — [AUTO-OWNR-1]\n\n**Status:** ✅ Verified\n\n**Issue Type:** Bug \n**MoSCoW:** Should Have — Catches KeyboardInterrupt which prevents clean shutdown \n**Priority:** Medium\n\n**Rationale:** Catching BaseException in CircuitBreaker swallows KeyboardInterrupt and SystemExit, preventing clean shutdown. This is a Should Have fix — it's a real bug that affects reliability but doesn't block core functionality.\n\n**Labels to apply:** State/Verified, MoSCoW/Should have, Priority/Medium, Type/Bug\n\n---\n**Automated by CleverAgents Bot**\nSupervisor: Project Owner | Agent: project-owner-pool-supervisor\n\n---\n**Automated by CleverAgents Bot**\nAgent: automation-tracking-manager
Author
Owner

[AUTO-OWNR-1] Triage complete.\n\nVerified — Valid bug. Catching BaseException (which includes KeyboardInterrupt and SystemExit) in CircuitBreaker prevents clean process termination and masks critical signals.\n\n- Type: Bug\n- Priority: High — prevents clean shutdown, masks critical signals\n- MoSCoW: Must Have — correct exception handling is required for production reliability\n- Milestone: v3.2.0 — core infrastructure reliability\n\n---\nAutomated by CleverAgents Bot\nSupervisor: Project Owner | Agent: project-owner-pool-supervisor\n\n---\nAutomated by CleverAgents Bot\nAgent: automation-tracking-manager

[AUTO-OWNR-1] Triage complete.\n\n**Verified** ✅ — Valid bug. Catching `BaseException` (which includes `KeyboardInterrupt` and `SystemExit`) in `CircuitBreaker` prevents clean process termination and masks critical signals.\n\n- **Type**: Bug\n- **Priority**: High — prevents clean shutdown, masks critical signals\n- **MoSCoW**: Must Have — correct exception handling is required for production reliability\n- **Milestone**: v3.2.0 — core infrastructure reliability\n\n---\n**Automated by CleverAgents Bot**\nSupervisor: Project Owner | Agent: project-owner-pool-supervisor\n\n---\n**Automated by CleverAgents Bot**\nAgent: automation-tracking-manager
Author
Owner

Triage Decision [AUTO-OWNR]

Status: Verified

Type: Bug
Priority: High
MoSCoW: Must Have
Milestone: Unassigned (Infrastructure)

Rationale: Catching BaseException in CircuitBreaker swallows KeyboardInterrupt and SystemExit, preventing clean process termination. This is a correctness bug that can cause the process to hang or behave unexpectedly when interrupted. Must Have for correct exception handling behavior.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

## Triage Decision [AUTO-OWNR] **Status**: ✅ Verified **Type**: Bug **Priority**: High **MoSCoW**: Must Have **Milestone**: Unassigned (Infrastructure) **Rationale**: Catching BaseException in CircuitBreaker swallows KeyboardInterrupt and SystemExit, preventing clean process termination. This is a correctness bug that can cause the process to hang or behave unexpectedly when interrupted. Must Have for correct exception handling behavior. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#9509
No description provided.