BUG-HUNT: [error-handling] Silent error suppression in _build_facade can hide critical startup failures #3085

Open
opened 2026-04-05 05:27:42 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/error-handling-build-facade-suppress
  • Commit Message: fix(a2a): replace silent exception suppression in _build_facade with logged error handling
  • Milestone: (none — see backlog note below)
  • Parent Epic: #362

Background and Context

The _build_facade function in src/cleveragents/a2a/cli_bootstrap.py uses contextlib.suppress(Exception) to wrap service lookups from the dependency injection container. The project specification and CONTRIBUTING.md enforce a strict "fail-fast" philosophy: exceptions must be allowed to propagate or, at minimum, be logged — they must never be silently swallowed.

This pattern is dangerous because it catches and ignores all exceptions, including system-critical ones like ConfigurationError, ImportError, or database connection errors. If a service fails to instantiate for any reason, the exception is silently swallowed, the facade is created with missing services, and no indication of failure is provided to the operator.

Current Behavior

All exceptions raised during service initialization in _build_facade (lines 39–50 of src/cleveragents/a2a/cli_bootstrap.py) are caught and discarded via contextlib.suppress(Exception):

# Wire available services — each key is optional so the facade
# gracefully stubs operations when a service is absent.
with contextlib.suppress(Exception):
    services["plan_lifecycle_service"] = container.plan_lifecycle_service()

with contextlib.suppress(Exception):
    services["session_service"] = container.session_service()

with contextlib.suppress(Exception):
    services["resource_registry_service"] = container.resource_registry_service()

with contextlib.suppress(Exception):
    services["tool_registry"] = container.tool_registry()

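The application continues to run with an incomplete set of services, and the facade stubs the operations that depend on the missing ones. The original exception and its traceback are discarded, so a startup failure only becomes visible later as missing functionality, which makes it very hard to diagnose.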
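To make the failure mode concrete, here is a minimal, self-contained sketch. BrokenContainer is a hypothetical stand-in for the real dependency injection container, and only two of the four services are wired; the suppression pattern itself is the one quoted above.

```python
import contextlib


class BrokenContainer:
    """Hypothetical stand-in for the DI container (not the real class)."""

    def plan_lifecycle_service(self):
        # Simulates a provider that fails during startup.
        raise RuntimeError("database connection refused")

    def session_service(self):
        return "session-service"


def build_services_with_suppress(container):
    """Mirrors the current wiring pattern for two of the four services."""
    services = {}
    with contextlib.suppress(Exception):
        services["plan_lifecycle_service"] = container.plan_lifecycle_service()
    with contextlib.suppress(Exception):
        services["session_service"] = container.session_service()
    return services


services = build_services_with_suppress(BrokenContainer())
# The failed service is simply absent; no exception and no log record exist.
print(sorted(services))
```

The RuntimeError never reaches the caller or any logger, so the only trace of the failure is the missing dictionary key.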

Expected Behavior

Exceptions during service initialization should not be silently suppressed. Per the project's "fail-fast" philosophy and CONTRIBUTING.md's explicit exception handling requirements, any exception caught during this process must be logged with ERROR severity so that system operators are aware of the failure. The application should either fail fast or surface the error clearly in logs.

Acceptance Criteria

  • contextlib.suppress(Exception) is removed from all four service wiring blocks in _build_facade
  • Each service wiring block uses a try...except Exception that logs the exception at ERROR level with exc_info=True before continuing
  • The logger is obtained via logging.getLogger(__name__) at module level (no # type: ignore comments)
  • Existing Behave scenarios for cli_bootstrap.py are updated or new scenarios added to verify error logging behaviour
  • nox -e typecheck passes with no new type errors
  • nox -e lint passes
  • nox -e unit_tests passes
  • Coverage for src/cleveragents/a2a/cli_bootstrap.py remains ≥ 97%
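The exc_info=True requirement in the criteria above matters because it attaches the traceback to the log record. The following sketch shows the effect using only the standard library; the logger name, the failing_provider function, and the demo_service label are hypothetical and exist only for this demonstration.

```python
import io
import logging

# Hypothetical logger used only for this demonstration.
logger = logging.getLogger("exc_info_demo")
logger.setLevel(logging.ERROR)
logger.propagate = False  # keep demo output away from the root logger

# Capture formatted log output in memory so it can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)


def failing_provider():
    """Hypothetical provider that fails the way a broken service might."""
    raise ValueError("bad config value")


try:
    failing_provider()
except Exception as e:
    # Same call shape as the acceptance criteria: ERROR level plus exc_info=True.
    logger.error("Failed to initialize demo_service: %s", e, exc_info=True)

output = buffer.getvalue()
print(output)
```

The captured output contains the ERROR line followed by the full traceback, which is the information an operator needs to locate the failing provider.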

Supporting Information

  • File: src/cleveragents/a2a/cli_bootstrap.py
  • Function: _build_facade
  • Lines: 39–50
  • Related issue: #1884 (test coverage for cli_bootstrap.py)
  • CONTRIBUTING.md: "Exceptions must be allowed to propagate to the top-level execution for centralized logging and handling. Errors should never be suppressed or caught just to be logged and re-thrown."
  • Spec §Error Handling: fail-fast philosophy; no silent suppression permitted

Suggested Fix

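import logging

# Module-level logger, defined once near the top of cli_bootstrap.py.
logger = logging.getLogger(__name__)

# Inside _build_facade: log each failed lookup at ERROR, then continue wiring the rest.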
try:
    services["plan_lifecycle_service"] = container.plan_lifecycle_service()
except Exception as e:
    logger.error("Failed to initialize plan_lifecycle_service: %s", e, exc_info=True)

try:
    services["session_service"] = container.session_service()
except Exception as e:
    logger.error("Failed to initialize session_service: %s", e, exc_info=True)

try:
    services["resource_registry_service"] = container.resource_registry_service()
except Exception as e:
    logger.error("Failed to initialize resource_registry_service: %s", e, exc_info=True)

try:
    services["tool_registry"] = container.tool_registry()
except Exception as e:
    logger.error("Failed to initialize tool_registry: %s", e, exc_info=True)
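The four blocks above differ only in the service name, so one possible variation is a table-driven helper. This is a sketch under assumptions, not the project's implementation: wire_services and SERVICE_NAMES are hypothetical names, and the sketch assumes each services key matches the container provider attribute of the same name, as it does in the snippet above.

```python
import logging

logger = logging.getLogger(__name__)

# Service keys taken from the issue; each is assumed to match a provider
# attribute of the same name on the container.
SERVICE_NAMES = (
    "plan_lifecycle_service",
    "session_service",
    "resource_registry_service",
    "tool_registry",
)


def wire_services(container):
    """Hypothetical helper: wire each service, logging failures at ERROR."""
    services = {}
    for name in SERVICE_NAMES:
        try:
            # getattr is inside the try so a missing provider is logged too.
            services[name] = getattr(container, name)()
        except Exception as e:
            logger.error("Failed to initialize %s: %s", name, e, exc_info=True)
    return services
```

The trade-off is that getattr hides the provider attributes from static analysis, so the explicit four-block form may be preferable if nox -e typecheck is expected to validate those lookups.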

Subtasks

  • Remove all contextlib.suppress(Exception) blocks from _build_facade in src/cleveragents/a2a/cli_bootstrap.py
  • Add module-level logger (logging.getLogger(__name__)) to cli_bootstrap.py
  • Replace each suppression block with a try...except Exception that logs at ERROR level with exc_info=True
  • Tests (Behave): Add/update scenarios in features/ to verify that service initialization failures are logged at ERROR level
  • Tests (Behave): Add scenario verifying facade is still created with successfully initialized services when one service fails
  • Verify coverage ≥ 97% via nox -e coverage_report
  • Run nox (all default sessions), fix any errors
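The two test subtasks above call for Behave scenarios. As a rough guide to the assertions such scenarios would need, here is a plain unittest sketch using the standard library's assertLogs. FakeContainer, build_services, and the logger name are hypothetical stand-ins; the real scenarios would exercise _build_facade through Behave steps instead.

```python
import logging
import unittest

# Hypothetical logger name standing in for the cli_bootstrap module logger.
logger = logging.getLogger("cli_bootstrap_sketch")


class FakeContainer:
    """Hypothetical container whose session_service provider fails."""

    def plan_lifecycle_service(self):
        return "plan"

    def session_service(self):
        raise RuntimeError("boom")


def build_services(container):
    """Stand-in for the wiring portion of _build_facade (two services only)."""
    services = {}
    try:
        services["plan_lifecycle_service"] = container.plan_lifecycle_service()
    except Exception as e:
        logger.error("Failed to initialize plan_lifecycle_service: %s", e, exc_info=True)
    try:
        services["session_service"] = container.session_service()
    except Exception as e:
        logger.error("Failed to initialize session_service: %s", e, exc_info=True)
    return services


class WiringLoggingTest(unittest.TestCase):
    def test_failure_is_logged_and_other_services_survive(self):
        with self.assertLogs(logger, level="ERROR") as captured:
            services = build_services(FakeContainer())
        # The healthy service is still wired; the failed one is absent.
        self.assertEqual(services, {"plan_lifecycle_service": "plan"})
        # Exactly one ERROR record, naming the failed service, with a traceback.
        self.assertEqual(len(captured.records), 1)
        record = captured.records[0]
        self.assertEqual(record.levelno, logging.ERROR)
        self.assertIn("session_service", record.getMessage())
        self.assertIsNotNone(record.exc_info)


result = unittest.TextTestRunner().run(
    unittest.defaultTestLoader.loadTestsFromTestCase(WiringLoggingTest)
)
```

The single test covers both subtasks: the failure is logged at ERROR level with exception details, and the remaining service is still present in the result.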

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage ≥ 97%.

Backlog note: This issue was discovered during autonomous operation
on milestone v3.5.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

freemo added this to the v3.6.0 milestone 2026-04-05 05:58:28 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Backlog (kept as-is — this is a code quality improvement, not blocking any milestone)
  • Milestone: v3.6.0 (assigned — error handling improvements fit the "Advanced Concepts" scope)
  • MoSCoW: Could Have — this is a valid error-handling improvement per CONTRIBUTING.md's fail-fast philosophy, but it is not blocking any functionality. The _build_facade function works correctly in the happy path; the issue is about observability of failures.
  • Parent Epic: #362 (Security & Safety Hardening) — dependency link created

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

freemo removed this from the v3.6.0 milestone 2026-04-07 00:20:08 +00:00
Blocks
#362 Epic: Security & Safety Hardening
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3085