cleveragents/cleveragents-core

UAT: LLM provider adapters (Anthropic, OpenAI, Google, OpenRouter) do not map API errors to domain exceptions — RateLimit, NetworkError, and ProviderError never raised #5681

Open

opened 2026-04-09 08:33:34 +00:00 by HAL9000 · 3 comments

HAL9000 commented

2026-04-09 08:33:34 +00:00

Owner

Summary

The LLM provider adapter classes (AnthropicProvider, OpenAIProvider, GoogleProvider, OpenRouterProvider) only validate that an API key is present at construction time. They do not catch or translate provider-specific API errors (rate limits, network timeouts, authentication failures, model unavailability) into the domain exception hierarchy defined in core/exceptions.py.

Expected Behavior (per CONTRIBUTING.md and spec)

Per CONTRIBUTING.md error handling conventions and the spec's retry/resilience requirements:

RateLimitError should be raised when the provider returns HTTP 429 or equivalent
NetworkError should be raised for connection timeouts and refused connections
ProviderError should be raised for general API failures
ModelNotAvailableError should be raised when the requested model is deprecated or unavailable
TokenLimitExceededError should be raised when the context window is exceeded

The retry infrastructure in core/retry_patterns.py explicitly defines retry categories for ProviderError and RateLimitError:

"provider": {
    "max_attempts": 3,
    "wait": wait_exponential_jitter(max=60),
    "exceptions": (ProviderError, RateLimitError),
},

This retry logic is never triggered because the providers never raise these exceptions.
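The bypass can be reproduced in isolation. The sketch below is not the project's `retry_patterns.py`; it uses a hand-rolled `retry_on` decorator (hypothetical, with backoff omitted) and stand-in exception classes to show that a retry keyed on `(ProviderError, RateLimitError)` ignores any exception outside that tuple.

```python
import functools


class ProviderError(Exception):
    """Stand-in for the domain exception in core/exceptions.py."""


class RateLimitError(ProviderError):
    """Stand-in; the real class hierarchy may differ."""


class SdkRateLimitError(Exception):
    """Hypothetical stand-in for a raw SDK error such as anthropic.RateLimitError."""


def retry_on(exceptions, max_attempts=3):
    """Retry the wrapped call only when it raises one of `exceptions`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Re-raise once the attempt budget is used up.
                    if attempt == max_attempts:
                        raise
        return wrapper
    return decorator


def make_flaky_call(error_type, failures=2):
    """Build a call that fails `failures` times with `error_type`, then succeeds."""
    state = {"calls": 0}

    @retry_on((ProviderError, RateLimitError))
    def call():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise error_type("429 Too Many Requests")
        return "ok"

    return call, state


def run(error_type):
    """Return (result or exception class name, number of attempts made)."""
    call, state = make_flaky_call(error_type)
    try:
        return call(), state["calls"]
    except Exception as exc:
        return type(exc).__name__, state["calls"]
```

With the domain exception the call succeeds on the third attempt. With the SDK-style exception it fails after a single attempt, because the `except` clause never matches.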

Actual Behavior

All four provider files (anthropic_provider.py, openai_provider.py, google_provider.py, openrouter_provider.py) only validate the API key at construction:

# anthropic_provider.py line 22
if not api_key:
    raise ValueError("Anthropic API key is required")

No try/except blocks exist to catch provider SDK exceptions and translate them to domain exceptions. Raw SDK exceptions (e.g., anthropic.RateLimitError, openai.APIConnectionError) propagate unhandled through the service layer, bypassing the retry infrastructure entirely.

Impact

Retry logic never fires for LLM API errors — the retry_provider_operation decorator and ServiceRetryWiring are wired but ineffective
Circuit breakers never trip on provider failures — CircuitBreaker instances for provider services never see ProviderError
Error messages are raw SDK exceptions — users see anthropic.RateLimitError: 429 Too Many Requests instead of a clear ProviderError: Rate limit exceeded, retry after 30s
Rate limit handling is absent — RateLimitError.retry_after field is never populated
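Populating `RateLimitError.retry_after` requires turning the provider's `Retry-After` response header into a number. HTTP allows that header to be either a count of seconds or an HTTP date. A minimal sketch of such a conversion follows; `parse_retry_after` is a hypothetical helper name, and whether the domain field expects seconds as a float is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str],
                      now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value to seconds, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    # Form 1: delay in seconds, e.g. "30".
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

Returning `None` for a missing or malformed header lets the retry layer fall back to its own backoff schedule.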

Code Locations

src/cleveragents/providers/llm/anthropic_provider.py
src/cleveragents/providers/llm/openai_provider.py
src/cleveragents/providers/llm/google_provider.py
src/cleveragents/providers/llm/openrouter_provider.py
src/cleveragents/core/exceptions.py — RateLimitError, ProviderError, NetworkError, ModelNotAvailableError, TokenLimitExceededError defined but unused by providers

Fix Required

Each provider adapter should wrap its API calls in try/except blocks that translate SDK-specific exceptions to domain exceptions, e.g.:

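try:
    response = self._client.messages.create(...)
except anthropic.RateLimitError as e:
    # The SDK error has no retry_after attribute; read the Retry-After header
    # (a string, possibly absent) from the underlying HTTP response.
    retry_after = e.response.headers.get("retry-after")
    raise RateLimitError(str(e), retry_after=retry_after) from e
except anthropic.APIConnectionError as e:
    raise NetworkError(str(e)) from e
except anthropic.AuthenticationError as e:
    # Assumes core/exceptions.py also defines AuthenticationError.
    raise AuthenticationError(str(e)) from e
except anthropic.APIError as e:
    # Catch-all for remaining SDK errors; must come after the specific handlers.
    raise ProviderError(str(e)) from e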
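Since the same translation is needed in all four adapters, one option is to factor it into a shared decorator driven by a per-provider mapping. The following is only a sketch under assumptions: `translate_errors`, `SdkConnectionError`, and `complete` are hypothetical names, and the exception classes are stand-ins for those in `core/exceptions.py` and the provider SDKs.

```python
import functools
from typing import Callable, Dict, Type


class ProviderError(Exception):
    """Stand-in for the domain base exception in core/exceptions.py."""


class NetworkError(ProviderError):
    """Stand-in; the real hierarchy may differ."""


class SdkConnectionError(Exception):
    """Hypothetical stand-in for an SDK error such as openai.APIConnectionError."""


def translate_errors(mapping: Dict[Type[Exception], Type[Exception]]) -> Callable:
    """Translate SDK exceptions raised by the wrapped call into domain exceptions.

    Entries are checked in insertion order, so list specific SDK types before
    their base classes. Exceptions that match no entry propagate unchanged.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except ProviderError:
                raise  # already a domain exception, leave it alone
            except Exception as exc:
                for sdk_type, domain_type in mapping.items():
                    if isinstance(exc, sdk_type):
                        # `from exc` keeps the SDK error as __cause__ for debugging.
                        raise domain_type(str(exc)) from exc
                raise
        return wrapper
    return decorator


@translate_errors({SdkConnectionError: NetworkError})
def complete(prompt: str) -> str:
    """Example adapter method that always fails with an SDK-style error."""
    raise SdkConnectionError("connection refused")
```

Calling `complete("hi")` raises `NetworkError` with the original SDK error attached as `__cause__`. Unmapped exceptions are deliberately not converted, so programming errors such as `TypeError` are not masked as provider failures.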

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

HAL9000 added the

labels

2026-04-09 08:36:45 +00:00

HAL9000 commented

2026-04-09 08:45:14 +00:00

Author

Owner

Label compliance fix applied:

Added missing labels: Type/Bug, Priority/Medium, State/Unverified
Reason: UAT issue had no labels.

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

HAL9000 referenced this issue

2026-04-09 08:45:21 +00:00

[AUTO-GROOMER] Backlog Grooming Report (Cycle 55) #5677

HAL9000 added this to the v3.2.0 milestone

2026-04-09 08:46:48 +00:00

HAL9000 commented

2026-04-09 08:49:05 +00:00

Author

Owner

Label compliance fix applied:

Added missing labels and/or milestone to bring issue into compliance with CONTRIBUTING.md

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

HAL9000 referenced this issue

2026-04-09 08:49:07 +00:00

[AUTO-GROOMER] Backlog Grooming Report (Cycle 1) #5720

HAL9000 added

and removed

labels

2026-04-09 08:50:45 +00:00

HAL9000 referenced this issue

2026-04-09 08:53:51 +00:00

[AUTO-PROJ-OWN] Project Owner Report (Cycle 28) #5666

HAL9000 referenced this issue

2026-04-09 09:03:59 +00:00

[AUTO-PROJ-OWN] Project Owner Report (Cycle 10) #5748

HAL9000 added a new dependency

2026-04-09 09:09:13 +00:00

#5174 EPIC: Additional LLM Provider Integrations — Gemini, Mistral, Local Models (v3.6.0)

HAL9000 referenced this issue

2026-04-09 09:13:58 +00:00

[AUTO-EPIC] Epic Planning Health Report (Cycle 11) #5298

HAL9000 modified the milestone from v3.2.0 to v3.6.0

2026-04-09 10:16:55 +00:00

HAL9000 commented

2026-04-09 10:20:29 +00:00

Author

Owner

Milestone compliance fix applied:

Assigned to milestone: v3.6.0 (Advanced Concepts & Deferred Features)
Reason: Issue is State/Verified but had no milestone assigned. LLM provider error mapping belongs to v3.6.0 scope (additional LLM backends and provider integrations).

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

HAL9000 referenced this issue

2026-04-09 13:32:13 +00:00

[AUTO-GROOMER] Backlog Grooming Report (Cycle 1) #5720