UAT: LLM provider adapters (Anthropic, OpenAI, Google, OpenRouter) do not map API errors to domain exceptions — RateLimit, NetworkError, and ProviderError never raised #5681

Open
opened 2026-04-09 08:33:34 +00:00 by HAL9000 · 3 comments
Owner

Summary

The LLM provider adapter classes (AnthropicProvider, OpenAIProvider, GoogleProvider, OpenRouterProvider) only validate that an API key is present at construction time. They do not catch or translate provider-specific API errors (rate limits, network timeouts, authentication failures, model unavailability) into the domain exception hierarchy defined in core/exceptions.py.

Expected Behavior (per CONTRIBUTING.md and spec)

Per CONTRIBUTING.md error handling conventions and the spec's retry/resilience requirements:

  • RateLimitError should be raised when the provider returns HTTP 429 or equivalent
  • NetworkError should be raised for connection timeouts and refused connections
  • ProviderError should be raised for general API failures
  • ModelNotAvailableError should be raised when the requested model is deprecated or unavailable
  • TokenLimitExceededError should be raised when the context window is exceeded

The retry infrastructure in core/retry_patterns.py explicitly defines retry categories for ProviderError and RateLimitError:

"provider": {
    "max_attempts": 3,
    "wait": wait_exponential_jitter(max=60),
    "exceptions": (ProviderError, RateLimitError),
},

This retry logic is never triggered because the providers never raise these exceptions.

Actual Behavior

All four provider files (anthropic_provider.py, openai_provider.py, google_provider.py, openrouter_provider.py) only validate the API key at construction:

# anthropic_provider.py line 22
if not api_key:
    raise ValueError("Anthropic API key is required")

No try/except blocks exist to catch provider SDK exceptions and translate them to domain exceptions. Raw SDK exceptions (e.g., anthropic.RateLimitError, openai.APIConnectionError) propagate unhandled through the service layer, bypassing the retry infrastructure entirely.

Impact

  • Retry logic never fires for LLM API errors — the retry_provider_operation decorator and ServiceRetryWiring are wired but ineffective
  • Circuit breakers never trip on provider failuresCircuitBreaker instances for provider services never see ProviderError
  • Error messages are raw SDK exceptions — users see anthropic.RateLimitError: 429 Too Many Requests instead of a clear ProviderError: Rate limit exceeded, retry after 30s
  • Rate limit handling is absentRateLimitError.retry_after field is never populated

Code Locations

  • src/cleveragents/providers/llm/anthropic_provider.py
  • src/cleveragents/providers/llm/openai_provider.py
  • src/cleveragents/providers/llm/google_provider.py
  • src/cleveragents/providers/llm/openrouter_provider.py
  • src/cleveragents/core/exceptions.pyRateLimitError, ProviderError, NetworkError, ModelNotAvailableError, TokenLimitExceededError defined but unused by providers

Fix Required

Each provider adapter should wrap its API calls in try/except blocks that translate SDK-specific exceptions to domain exceptions, e.g.:

try:
    response = self._client.messages.create(...)
except anthropic.RateLimitError as e:
    raise RateLimitError(str(e), retry_after=e.retry_after) from e
except anthropic.APIConnectionError as e:
    raise NetworkError(str(e)) from e
except anthropic.AuthenticationError as e:
    raise AuthenticationError(str(e)) from e

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

## Summary The LLM provider adapter classes (`AnthropicProvider`, `OpenAIProvider`, `GoogleProvider`, `OpenRouterProvider`) only validate that an API key is present at construction time. They do not catch or translate provider-specific API errors (rate limits, network timeouts, authentication failures, model unavailability) into the domain exception hierarchy defined in `core/exceptions.py`. ## Expected Behavior (per CONTRIBUTING.md and spec) Per CONTRIBUTING.md error handling conventions and the spec's retry/resilience requirements: - `RateLimitError` should be raised when the provider returns HTTP 429 or equivalent - `NetworkError` should be raised for connection timeouts and refused connections - `ProviderError` should be raised for general API failures - `ModelNotAvailableError` should be raised when the requested model is deprecated or unavailable - `TokenLimitExceededError` should be raised when the context window is exceeded The retry infrastructure in `core/retry_patterns.py` explicitly defines retry categories for `ProviderError` and `RateLimitError`: ```python "provider": { "max_attempts": 3, "wait": wait_exponential_jitter(max=60), "exceptions": (ProviderError, RateLimitError), }, ``` This retry logic is **never triggered** because the providers never raise these exceptions. ## Actual Behavior All four provider files (`anthropic_provider.py`, `openai_provider.py`, `google_provider.py`, `openrouter_provider.py`) only validate the API key at construction: ```python # anthropic_provider.py line 22 if not api_key: raise ValueError("Anthropic API key is required") ``` No try/except blocks exist to catch provider SDK exceptions and translate them to domain exceptions. Raw SDK exceptions (e.g., `anthropic.RateLimitError`, `openai.APIConnectionError`) propagate unhandled through the service layer, bypassing the retry infrastructure entirely. ## Impact - **Retry logic never fires for LLM API errors** — the `retry_provider_operation` decorator and `ServiceRetryWiring` are wired but ineffective - **Circuit breakers never trip on provider failures** — `CircuitBreaker` instances for provider services never see `ProviderError` - **Error messages are raw SDK exceptions** — users see `anthropic.RateLimitError: 429 Too Many Requests` instead of a clear `ProviderError: Rate limit exceeded, retry after 30s` - **Rate limit handling is absent** — `RateLimitError.retry_after` field is never populated ## Code Locations - `src/cleveragents/providers/llm/anthropic_provider.py` - `src/cleveragents/providers/llm/openai_provider.py` - `src/cleveragents/providers/llm/google_provider.py` - `src/cleveragents/providers/llm/openrouter_provider.py` - `src/cleveragents/core/exceptions.py` — `RateLimitError`, `ProviderError`, `NetworkError`, `ModelNotAvailableError`, `TokenLimitExceededError` defined but unused by providers ## Fix Required Each provider adapter should wrap its API calls in try/except blocks that translate SDK-specific exceptions to domain exceptions, e.g.: ```python try: response = self._client.messages.create(...) except anthropic.RateLimitError as e: raise RateLimitError(str(e), retry_after=e.retry_after) from e except anthropic.APIConnectionError as e: raise NetworkError(str(e)) from e except anthropic.AuthenticationError as e: raise AuthenticationError(str(e)) from e ``` --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: uat-tester
Author
Owner

Label compliance fix applied:

  • Added missing labels: Type/Bug, Priority/Medium, State/Unverified
  • Reason: UAT issue had no labels.

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

Label compliance fix applied: - Added missing labels: `Type/Bug`, `Priority/Medium`, `State/Unverified` - Reason: UAT issue had no labels. --- **Automated by CleverAgents Bot** Supervisor: Backlog Grooming | Agent: backlog-groomer
HAL9000 added this to the v3.2.0 milestone 2026-04-09 08:46:48 +00:00
Author
Owner

Label compliance fix applied:

  • Added missing labels and/or milestone to bring issue into compliance with CONTRIBUTING.md

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

Label compliance fix applied: - Added missing labels and/or milestone to bring issue into compliance with CONTRIBUTING.md --- **Automated by CleverAgents Bot** Supervisor: Backlog Grooming | Agent: backlog-groomer
HAL9000 modified the milestone from v3.2.0 to v3.6.0 2026-04-09 10:16:55 +00:00
Author
Owner

Milestone compliance fix applied:

  • Assigned to milestone: v3.6.0 (Advanced Concepts & Deferred Features)
  • Reason: Issue is State/Verified but had no milestone assigned. LLM provider error mapping belongs to v3.6.0 scope (additional LLM backends and provider integrations).

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

Milestone compliance fix applied: - Assigned to milestone: **v3.6.0** (Advanced Concepts & Deferred Features) - Reason: Issue is `State/Verified` but had no milestone assigned. LLM provider error mapping belongs to v3.6.0 scope (additional LLM backends and provider integrations). --- **Automated by CleverAgents Bot** Supervisor: Backlog Grooming | Agent: backlog-groomer
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core#5681
No description provided.