[AUTO-SPEC] Proposal: Document LLM provider fallback behavior in StrategyActor specification #10213

Open
opened 2026-04-17 06:10:18 +00:00 by HAL9000 · 0 comments
Owner

Metadata

  • Commit message: docs(spec): document LLM provider fallback behavior and quota recovery in StrategyActor
  • Branch name: docs/spec-llm-provider-fallback-behavior

Background and Context

Two recent commits implement LLM provider fallback behavior in StrategyActor:

  1. f5712787 (feat: add fallback to Anthropic Sonnet when OpenAI quota is exhausted) — Implements graceful degradation when OpenAI quota is exhausted, falling back to Anthropic Claude Sonnet. Includes quota error detection, fallback LLM caching, 5-minute recovery interval, and comprehensive logging.

  2. 51472c0b (debug: upgrade logging levels for fallback diagnostics) — Upgrades fallback-related log messages from DEBUG to WARNING level for better CI/CD observability.

The current specification does not document this fallback behavior, creating a gap between what the system actually does and what the spec describes.


Discrepancy Report

Type: Implementation found a better approach → spec update needed

Area: Actor Runtime — StrategyActor LLM Provider Configuration

Current Spec State

The specification documents StrategyActor as using a configured LLM provider for strategy generation, but does not document:

  • Quota error detection and classification
  • Fallback provider selection when primary provider quota is exhausted
  • Fallback LLM instance caching behavior
  • Quota recovery interval (5 minutes)
  • Logging behavior for fallback transitions

Implementation State

StrategyActor._execute_with_llm() now implements:

  1. _is_quota_error(): Detects quota-specific API errors (429, insufficient_quota, rate_limit)
  2. Fallback provider: anthropic/claude-sonnet-4-20250514 used when primary provider quota is exhausted
  3. Fallback LLM caching: self._fallback_llm cached to avoid per-call recreation overhead
  4. Quota recovery: _QUOTA_RECOVERY_INTERVAL = 5 minutes before attempting to recover primary provider
  5. Fallback mode state: self._using_fallback and self._last_quota_error_time track fallback state
  6. Warning-level logging: Fallback transitions logged at WARNING level for CI/CD observability
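
The quota detection and recovery timing listed above can be sketched roughly as follows. This is a minimal illustration, not the code in strategy_actor.py: the `is_quota_error` helper, the `FallbackState` class, the `status_code` attribute on exceptions, and the injectable clock are all assumptions made for the example. Only the 5-minute interval and the three error markers come from this issue.

```python
import time
from typing import Callable, Optional

# From the issue: the primary provider is retried after 5 minutes.
QUOTA_RECOVERY_INTERVAL_SECONDS = 5 * 60

# Error codes named in the issue; matching them in the message text is an assumption.
QUOTA_ERROR_CODES = ("insufficient_quota", "rate_limit")


def is_quota_error(exc: Exception) -> bool:
    """Return True if the exception looks like a quota or rate-limit failure."""
    # Assumption: provider exceptions may expose an HTTP status as `status_code`.
    if getattr(exc, "status_code", None) == 429:
        return True
    message = str(exc).lower()
    return any(code in message for code in QUOTA_ERROR_CODES)


class FallbackState:
    """Hypothetical holder for the two state fields mentioned in the issue."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.using_fallback = False
        self.last_quota_error_time: Optional[float] = None

    def record_quota_error(self) -> None:
        """Enter fallback mode and remember when the quota error happened."""
        self.using_fallback = True
        self.last_quota_error_time = self._clock()

    def should_try_primary(self) -> bool:
        """True when not in fallback mode, or once the recovery interval has passed."""
        if not self.using_fallback or self.last_quota_error_time is None:
            return True
        elapsed = self._clock() - self.last_quota_error_time
        return elapsed >= QUOTA_RECOVERY_INTERVAL_SECONDS

    def record_primary_success(self) -> None:
        """Leave fallback mode after the primary provider works again."""
        self.using_fallback = False
        self.last_quota_error_time = None
```

Passing the clock in as a parameter is a design choice for the sketch only; it lets the 5-minute interval be exercised in tests without sleeping.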

Proposed Spec Change

Add a subsection to the StrategyActor documentation covering:

#### LLM Provider Fallback

StrategyActor implements graceful degradation when the primary LLM provider encounters quota exhaustion:

- **Quota detection**: HTTP 429 errors and `insufficient_quota`/`rate_limit` error codes trigger fallback
- **Fallback provider**: Configurable secondary provider (default: `anthropic/claude-sonnet-4-20250514`)
- **Fallback caching**: The fallback LLM instance is cached to avoid per-call recreation overhead
- **Recovery interval**: After entering fallback mode, the primary provider is retried every 5 minutes
- **Observability**: Fallback transitions are logged at WARNING level with provider name and error details
- **Both providers exhausted**: If both primary and fallback providers fail, the operation fails with a clear error message indicating that no LLM provider is available
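
As an illustration of the caching, WARNING-level logging, and both-providers-exhausted bullets, a hedged sketch follows. The names `QuotaError`, `NoLLMProviderAvailableError`, and `FallbackRunner` are hypothetical stand-ins and do not come from the implementation; providers are modelled as plain callables, and the recovery interval is left out to keep the example short.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("strategy_actor_fallback_sketch")

LLMCall = Callable[[str], str]


class QuotaError(Exception):
    """Hypothetical stand-in for any error classified as quota exhaustion."""


class NoLLMProviderAvailableError(RuntimeError):
    """Hypothetical error raised when primary and fallback providers both fail."""


class FallbackRunner:
    def __init__(
        self,
        primary: LLMCall,
        fallback_factory: Callable[[], LLMCall],
        fallback_name: str = "anthropic/claude-sonnet-4-20250514",
    ) -> None:
        self._primary = primary
        self._fallback_factory = fallback_factory
        self._fallback_name = fallback_name
        self._fallback_llm: Optional[LLMCall] = None  # created once, then reused

    def _get_fallback(self) -> LLMCall:
        # Cache the fallback instance to avoid rebuilding it on every call.
        if self._fallback_llm is None:
            self._fallback_llm = self._fallback_factory()
        return self._fallback_llm

    def run(self, prompt: str) -> str:
        try:
            return self._primary(prompt)
        except QuotaError as primary_exc:
            # Fallback transitions are logged at WARNING with provider and error.
            logger.warning(
                "Primary LLM quota exhausted (%s); falling back to %s",
                primary_exc,
                self._fallback_name,
            )
            try:
                return self._get_fallback()(prompt)
            except Exception as fallback_exc:
                raise NoLLMProviderAvailableError(
                    "No LLM provider is available: primary failed with "
                    f"{primary_exc!r}, fallback failed with {fallback_exc!r}"
                ) from fallback_exc
```

Errors from the primary provider that are not quota-related propagate unchanged in this sketch, so only quota exhaustion triggers the fallback path.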

Classification

Implementation found a better approach — The fallback behavior improves CI/CD reliability by gracefully handling quota exhaustion. This is an intentional feature that should be documented in the spec.


Expected Behavior

After this issue is resolved, docs/specification.md accurately documents the LLM provider fallback behavior in StrategyActor, including quota detection, fallback provider selection, caching, recovery interval, and observability.


Acceptance Criteria

  • docs/specification.md documents StrategyActor LLM provider fallback behavior
  • Quota error detection criteria are documented (HTTP 429, insufficient_quota, rate_limit)
  • Fallback provider configuration is documented
  • Recovery interval (5 minutes) is documented
  • Observability (WARNING-level logging) is documented
  • Both-providers-exhausted failure behavior is documented

Subtasks

  • Review src/cleveragents/actors/strategy_actor.py to confirm current fallback implementation details
  • Add LLM Provider Fallback subsection to StrategyActor documentation in docs/specification.md
  • Open a PR targeting master with the spec changes
  • Apply needs feedback label to the PR

Definition of Done

This issue should be closed when:

  1. docs/specification.md has been updated to document LLM provider fallback behavior.
  2. The PR has been reviewed and merged.
  3. The spec accurately reflects the implementation with no remaining gaps for this feature.

Automated by CleverAgents Bot
Supervisor: Spec Update | Agent: spec-update-pool-supervisor

Reference
cleveragents/cleveragents-core#10213