BUG-HUNT: [error-handling] Overly Broad Exception Handling in Agent Graphs #1824

Open
opened 2026-04-02 23:55:48 +00:00 by freemo · 4 comments
Owner

Metadata

  • Branch: fix/error-handling-agent-graphs
  • Commit Message: fix(agents): replace broad exception handlers with specific exceptions in agent graphs
  • Milestone: v3.6.0
  • Parent Epic: ⚠️ No open parent Epic found for v3.6.0 — requires manual linking by a maintainer.

Background and Context

The agent graph modules (context_analysis.py and plan_generation.py) use overly broad except Exception: handlers throughout their core node functions. Per the project's error-handling standards (CONTRIBUTING.md — Exception Propagation and Fail-Fast Principles), exceptions must not be suppressed and should only be caught when they can be meaningfully handled. Broad catch-all handlers hide the root cause of bugs, swallow important exceptions from external services (LLM providers, JSON parsing), and make debugging significantly harder.

Current Behavior

The following functions use except Exception: (or except Exception as e:) as a catch-all:

  • src/cleveragents/agents/graphs/context_analysis.py:

    • _analyze_dependencies
    • _score_relevance
    • _summarize_context
  • src/cleveragents/agents/graphs/plan_generation.py:

    • _analyze_requirements
    • _generate_plan
    • _validate

Example:

# context_analysis.py
def _analyze_dependencies(self, state: ContextAnalysisState) -> dict[str, Any]:
    try:
        ...
    except Exception as exc:  # pragma: no cover - defensive
        ...

# plan_generation.py
def _analyze_requirements(self, state: PlanGenerationState) -> dict[str, Any]:
    try:
        ...
    except Exception as e:
        ...

Expected Behavior

Each except clause should catch only the specific exception types that can realistically occur and be meaningfully handled at that point:

  • LLM/provider interactions → catch specific LangChain/provider exceptions (e.g., langchain_core.exceptions.OutputParserException, provider-specific APIError, TimeoutError)
  • JSON parsing → catch json.JSONDecodeError
  • Any remaining truly unexpected exceptions should be allowed to propagate naturally to the top-level handler
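
A minimal sketch of the narrowed pattern described above, using only the standard library. `GraphNodeError`, `run_node`, and `call_llm` are hypothetical names chosen for illustration and are not taken from the codebase; a real node would also catch the LangChain or provider exceptions identified during the audit.

```python
import json
from typing import Any, Callable


class GraphNodeError(RuntimeError):
    """Hypothetical error type that adds node context before re-raising."""


def run_node(node_name: str, call_llm: Callable[[], str]) -> Any:
    """Call the model and parse its JSON reply, catching only expected failures."""
    try:
        raw = call_llm()
        return json.loads(raw)
    except TimeoutError as exc:
        # Expected provider failure: enrich with node context, keep the cause.
        raise GraphNodeError(f"{node_name}: LLM call timed out") from exc
    except json.JSONDecodeError as exc:
        # Expected parse failure: say which node produced the malformed output.
        raise GraphNodeError(f"{node_name}: LLM returned invalid JSON") from exc
    # Anything else (KeyError, AttributeError, ...) propagates unchanged.
```

Chaining with `raise ... from exc` keeps the original traceback available, so the enriched error does not hide the root cause.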

Acceptance Criteria

  • All except Exception: and except Exception as ...: clauses in context_analysis.py and plan_generation.py are replaced with specific exception types
  • No exceptions are silently swallowed — any caught exception is either meaningfully handled (retry, cleanup, context enrichment) or re-raised
  • The # pragma: no cover - defensive comments are removed where the handlers are now properly tested
  • All existing tests continue to pass
  • New BDD scenarios cover the specific exception paths
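
The propagation criterion can be checked with a small framework-agnostic helper that a Behave step could call. This is only a sketch: `assert_propagates`, `swallowing_node`, `narrow_node`, and `failing_llm` are hypothetical stand-ins, and the Behave step wiring is omitted.

```python
import json
from typing import Any, Callable, Type


def assert_propagates(func: Callable[[], Any], expected: Type[BaseException]) -> None:
    """Fail unless calling func lets the expected exception reach the caller."""
    try:
        func()
    except expected:
        return  # the exception reached the caller, as required
    raise AssertionError(f"{expected.__name__} was swallowed")


def swallowing_node(call_llm: Callable[[], str]) -> Any:
    """Anti-pattern: the catch-all hides every failure behind a default value."""
    try:
        return json.loads(call_llm())
    except Exception:
        return {}


def narrow_node(call_llm: Callable[[], str]) -> Any:
    """Handles only malformed JSON; every other exception propagates."""
    try:
        return json.loads(call_llm())
    except json.JSONDecodeError as exc:
        raise ValueError("LLM returned invalid JSON") from exc


def failing_llm() -> str:
    """Simulates a programming error inside the LLM call path."""
    raise KeyError("missing prompt variable")
```

Calling `assert_propagates(lambda: narrow_node(failing_llm), KeyError)` passes, while the same check against `swallowing_node` raises `AssertionError`, which is the behavior the new scenarios are meant to pin down.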

Supporting Information

  • Bug reported by: ca-bug-hunter (Bug Hunting supervisor)
  • Related issue: #1409 (similar error-handling review for src/cleveragents/services)
  • CONTRIBUTING.md — Exception Propagation: "Do not use bare catch-all exception handlers without re-raising unless you have specific recovery logic."
  • CONTRIBUTING.md — Fail-Fast Principles: "No Silent Failures — Avoid returning null or default values when an error condition exists — raise exceptions or return explicit error types."

Subtasks

  • Audit _analyze_dependencies, _score_relevance, _summarize_context in context_analysis.py — identify what exceptions can realistically be raised
  • Audit _analyze_requirements, _generate_plan, _validate in plan_generation.py — identify what exceptions can realistically be raised
  • Replace each except Exception: with the narrowest applicable specific exception type(s)
  • Ensure any caught exception is either meaningfully handled or re-raised; remove silent swallowing
  • Remove # pragma: no cover - defensive comments where handlers are now testable
  • Tests (Behave): Add scenarios covering specific exception paths (e.g., LLM timeout, JSON parse error) for affected graph nodes
  • Tests (Behave): Add scenarios verifying that unexpected exceptions propagate correctly (are not swallowed)
  • Verify coverage >= 97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors
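
The audit subtasks could be started with a short script that lists the remaining catch-all handlers. The sketch below uses the standard-library `ast` module; `find_broad_handlers` is a hypothetical helper, not an existing project tool. It does not flag tuples such as `except (Exception, ValueError):`, and a handler inside a nested function is reported once per enclosing function.

```python
import ast


def find_broad_handlers(source: str) -> list[tuple[int, str]]:
    """Return (line, function name) for each bare or `except Exception` handler."""
    tree = ast.parse(source)
    findings: list[tuple[int, str]] = []
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if not isinstance(node, ast.ExceptHandler):
                continue
            is_bare = node.type is None  # `except:` with no type at all
            is_broad = isinstance(node.type, ast.Name) and node.type.id in {
                "Exception",
                "BaseException",
            }
            if is_bare or is_broad:
                findings.append((node.lineno, func.name))
    return sorted(findings)
```

Reading each affected module with `pathlib.Path(...).read_text()` and passing the text to `find_broad_handlers` gives a checklist of handlers to narrow; an empty result after the fix is a quick sanity check alongside the test suite.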

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • All except Exception: catch-all handlers in context_analysis.py and plan_generation.py have been replaced with specific exception types.
  • No exceptions are silently swallowed in the affected functions.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

freemo added this to the v3.6.0 milestone 2026-04-02 23:56:17 +00:00
Author
Owner

⚠️ Orphan Issue — Manual Linking Required

This issue was created automatically by the ca-new-issue-creator agent. No open parent Epic was found for the v3.6.0 milestone that covers agent graph error handling or hardening work.

The closest related issues are:

  • #1409 — BUG-HUNT: [error-handling] Review error handling in src/cleveragents/services (similar scope, v3.7.0)
  • Epic #366 — Post-MVP Deferred Work (closed, v3.6.0)

A maintainer should either:

  1. Link this issue to an appropriate existing Epic using Forgejo's dependency system (this issue blocks the parent Epic), or
  2. Create a new "Bug Fixes & Hardening" Epic for v3.6.0 and link this issue as a child.

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

Author
Owner

Note from triage: #3183 was closed as a duplicate of this issue. #3183 also included auto_debug.py in its scope (same broad exception handling pattern). Consider expanding this issue's scope to include auto_debug.py when working on it.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Medium — This is a code quality issue (broad exception handling) that violates CONTRIBUTING.md guidelines. It does not cause data loss but can hide bugs and complicate debugging.
  • Milestone: v3.6.0 — Keeping in v3.6.0 as this is an advanced code quality improvement for the agent graph layer.
  • MoSCoW: Should Have — Per the error-handling guidelines, broad exception handlers should be replaced with specific types. This improves quality but is not blocking core functionality.
  • Parent Epic: Needs manual linking — no clear parent Epic exists for v3.6.0 error-handling work. Consider creating an Epic for error-handling hardening or linking to #362 (Security & Safety Hardening).

Note: #3183 was closed as a duplicate of this issue. The scope of auto_debug.py from #3183 should be included when this issue is worked on.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Author
Owner

Transitioning issue #1824 from State/Unverified to State/Verified.

Removed labels:

  • State/Unverified (ID 1320)

Added labels:

  • State/Verified (ID 1321)

Current labels now:

  • Priority/Medium (ID 1331)
  • Type/Bug (ID 1325)
  • State/Verified (ID 1321)

Preconditions checked:

  • Transition from State/Unverified to State/Verified requires maintainer triage. This transition is permitted per CONTRIBUTING.md's Ticket Lifecycle.

If you want me to also close or update the issue state in the tracker, tell me.


Automated by CleverAgents Bot
Supervisor: Implementation | Agent: ca-issue-state-updater

Reference
cleveragents/cleveragents-core#1824