#10042: Add fallback to Anthropic Haiku when OpenAI quota is exhausted #10043

Merged
HAL9000 merged 2 commits from feature/10042-openai-quota-fallback into master 2026-04-17 04:11:42 +00:00
Member

Summary

Implements graceful degradation for E2E robot integration tests that hit OpenAI 429 quota limit errors.

Changes

  • Add _is_quota_error() helper to detect quota-specific API errors (429, insufficient_quota, rate_limit)
  • Modify _execute_with_llm() in StrategyActor to catch quota errors and attempt fallback to Anthropic Haiku
  • Configure fallback provider as 'anthropic/claude-3-5-haiku-20241022'
  • Add comprehensive logging for quota error detection and provider fallback
  • Add E2E test scenarios for quota fallback verification
  • Add 'Skip If No Fallback LLM Key' keyword for quota fallback tests

Impact

This ensures CI/CD pipelines can complete E2E tests even when the primary provider (OpenAI) hits its quota limits. That improves pipeline reliability and reduces spurious test failures caused by provider-specific issues.

Testing

  • All existing unit tests pass
  • All lint and type checks pass
  • New E2E test scenarios added for quota fallback verification

Fixes #10042

CoreRasurae force-pushed feature/10042-openai-quota-fallback from 36428204df
Some checks failed
CI / lint (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 1m3s
CI / security (pull_request) Successful in 56s
CI / quality (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / push-validation (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / e2e_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / build (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / helm (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
to 9f71f69890
Some checks failed
CI / lint (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 1m3s
CI / quality (pull_request) Successful in 30s
CI / security (pull_request) Successful in 55s
CI / build (pull_request) Successful in 29s
CI / helm (pull_request) Successful in 32s
CI / push-validation (pull_request) Successful in 32s
CI / e2e_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
2026-04-16 18:19:39 +00:00
CoreRasurae force-pushed feature/10042-openai-quota-fallback from 9f71f69890
to ac65fdb996
Some checks failed
CI / lint (pull_request) Successful in 55s
CI / quality (pull_request) Successful in 50s
CI / security (pull_request) Successful in 55s
CI / typecheck (pull_request) Successful in 59s
CI / build (pull_request) Successful in 25s
CI / helm (pull_request) Successful in 30s
CI / push-validation (pull_request) Successful in 21s
CI / e2e_tests (pull_request) Failing after 5m7s
CI / integration_tests (pull_request) Successful in 7m11s
CI / unit_tests (pull_request) Failing after 8m26s
CI / docker (pull_request) Has been skipped
CI / coverage (pull_request) Successful in 7m10s
CI / status-check (pull_request) Failing after 1s
2026-04-16 18:22:12 +00:00
Owner

@CoreRasurae — Thank you for the rapid turnaround on PR #10043. Filing the issue and submitting a fix within minutes is exactly the kind of responsiveness that keeps CI healthy.

Acknowledgment

This PR directly addresses the CI blocker identified in #10042. The approach is well-targeted:

  • _is_quota_error() helper for clean error detection
  • Fallback to anthropic/claude-3-5-haiku-20241022 on quota exhaustion
  • Comprehensive logging for observability
  • New E2E test scenarios for fallback verification

Impact

If this PR merges and CI passes, it will unblock:

  • PR #10000 (hamza.khyari — fix: clean up stale worktree branch)
  • PR #10002 (hamza.khyari — feat: plan diff using git worktree branch)
  • Any other PRs currently blocked by E2E quota failures

Next Steps

The PR has been received and is queued for automated review. Please ensure:

  1. CI passes on the current HEAD (especially the new E2E quota fallback scenarios)
  2. CHANGELOG.md includes an entry under [Unreleased] > Fixed
  3. CONTRIBUTORS.md is updated if not already present
  4. The commit footer includes ISSUES CLOSED: #10042

This will be prioritized for review given its CI blocker status.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison-pool-supervisor
Worker: [AUTO-HUMAN-3]

CoreRasurae force-pushed feature/10042-openai-quota-fallback from ac65fdb996
to f99fd631ea
Some checks failed
CI / lint (pull_request) Successful in 20s
CI / build (pull_request) Successful in 19s
CI / push-validation (pull_request) Successful in 19s
CI / security (pull_request) Successful in 46s
CI / helm (pull_request) Successful in 29s
CI / quality (pull_request) Successful in 51s
CI / typecheck (pull_request) Successful in 55s
CI / e2e_tests (pull_request) Failing after 3m21s
CI / integration_tests (pull_request) Successful in 4m44s
CI / unit_tests (pull_request) Successful in 5m27s
CI / docker (pull_request) Successful in 1m19s
CI / coverage (pull_request) Successful in 8m18s
CI / status-check (pull_request) Failing after 2s
2026-04-16 19:58:16 +00:00
CoreRasurae force-pushed feature/10042-openai-quota-fallback from 6f669efd91
Some checks failed
CI / lint (pull_request) Successful in 32s
CI / quality (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 1m5s
CI / security (pull_request) Successful in 1m12s
CI / integration_tests (pull_request) Has been cancelled
CI / build (pull_request) Has been cancelled
CI / e2e_tests (pull_request) Has been cancelled
CI / helm (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / push-validation (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
to 361afdfe8d
Some checks failed
CI / lint (pull_request) Successful in 29s
CI / quality (pull_request) Successful in 43s
CI / typecheck (pull_request) Successful in 50s
CI / security (pull_request) Successful in 59s
CI / build (pull_request) Successful in 27s
CI / push-validation (pull_request) Successful in 21s
CI / helm (pull_request) Successful in 30s
CI / e2e_tests (pull_request) Failing after 4m3s
CI / unit_tests (pull_request) Successful in 7m44s
CI / integration_tests (pull_request) Successful in 7m20s
CI / docker (pull_request) Successful in 1m19s
CI / coverage (pull_request) Successful in 9m23s
CI / status-check (pull_request) Failing after 1s
2026-04-16 20:52:34 +00:00
@@ -498,3 +523,3 @@
# Retry loop for transient LLM failures
content = self._invoke_llm_with_retry(llm, messages, plan_id)
try:
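The diff context above mentions a retry loop for transient LLM failures. Below is a minimal sketch of such a loop with exponential backoff. It assumes quota errors should not be retried but should propagate at once so the caller can switch providers. Everything here is hypothetical: invoke_with_retry, QuotaError and TransientLLMError are illustrative names, not the project's _invoke_llm_with_retry.

```python
import time
from typing import Callable, Sequence

LLM = Callable[[Sequence[str]], str]  # hypothetical: an LLM as a callable


class QuotaError(Exception):
    """Hypothetical quota error: retrying soon will not help."""


class TransientLLMError(Exception):
    """Hypothetical transient error (timeout, 5xx): worth retrying."""


def invoke_with_retry(
    llm: LLM,
    messages: Sequence[str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry transient failures with exponential backoff.

    QuotaError is deliberately not caught, so it reaches the caller
    immediately and the caller can fall back to another provider.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return llm(messages)
        except TransientLLMError:
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Injecting sleep keeps the backoff testable without real delays.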
Member

Hi Luis (or the bot) --

This changes the code from only using llm to the following:

  • Try using llm
  • If there is a quota error:
    • Create a fallback LLM
    • Try the fallback LLM
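The per-call flow described above can be sketched as follows. Everything here is hypothetical: the QuotaError type, the callable LLM interface and the counting helpers. The sketch only shows that, while quota is exhausted, every call still tries the primary first and builds a new fallback client.

```python
from typing import Callable, Sequence

LLM = Callable[[Sequence[str]], str]  # hypothetical: an LLM as a callable


class QuotaError(Exception):
    """Hypothetical stand-in for a provider quota (HTTP 429) error."""


class ExhaustedPrimary:
    """Hypothetical primary LLM whose quota is used up: every call fails."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, messages: Sequence[str]) -> str:
        self.calls += 1
        raise QuotaError("insufficient_quota")


class FallbackFactory:
    """Hypothetical factory for the fallback client; counts constructions."""

    def __init__(self) -> None:
        self.builds = 0

    def __call__(self) -> LLM:
        self.builds += 1
        return lambda messages: "fallback: " + messages[-1]


def execute_with_llm(
    llm: LLM, make_fallback: Callable[[], LLM], messages: Sequence[str]
) -> str:
    """Try llm; on a quota error, build a fallback client and use it."""
    try:
        return llm(messages)
    except QuotaError:
        # A fresh fallback client is constructed on every quota failure.
        fallback_llm = make_fallback()
        return fallback_llm(messages)
```

With an exhausted primary, three calls produce three failed primary attempts and three fallback constructions. That is the overhead the two suggestions below target.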

There are two obvious ways to improve the code:

  1. fallback_llm doesn't need to be recreated every time. It's fine to create it as a global variable.
  2. Quota exhaustion does not clear up quickly. As written, every message sent through this function first gets a quota error and only then tries the other LLM. It would generally be faster to:

     2.1 Set the llm variable to the fallback provider and fallback model.
     2.2 If it has been, say, 5 minutes since the quota was last checked, set llm back to the original LLM and see whether the problem has been resolved.
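Here is a minimal sketch of both suggestions together. It assumes an LLM can be modelled as a callable that takes a list of messages and raises a hypothetical QuotaError on HTTP 429. The fallback client is constructed once and passed in. After a quota error, calls go straight to the fallback, and the primary is re-probed only after recheck_after seconds (300, matching the 5 minutes suggested above). QuotaAwareRouter is an illustrative name, not code from the PR.

```python
import time
from typing import Callable, Optional, Sequence

LLM = Callable[[Sequence[str]], str]  # hypothetical: an LLM as a callable


class QuotaError(Exception):
    """Hypothetical stand-in for a provider quota (HTTP 429) error."""


class QuotaAwareRouter:
    """Sticky fallback with periodic re-probing of the primary (sketch)."""

    def __init__(
        self,
        primary: LLM,
        fallback: LLM,  # built once by the caller and reused (suggestion 1)
        recheck_after: float = 300.0,  # seconds before re-probing (suggestion 2.2)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._primary = primary
        self._fallback = fallback
        self._recheck_after = recheck_after
        self._clock = clock
        # Time of the last quota error; None means the primary is usable.
        self._quota_failed_at: Optional[float] = None

    def _primary_due(self) -> bool:
        if self._quota_failed_at is None:
            return True
        return self._clock() - self._quota_failed_at >= self._recheck_after

    def __call__(self, messages: Sequence[str]) -> str:
        if self._primary_due():
            try:
                result = self._primary(messages)
            except QuotaError:
                # Remember when quota ran out; serve this call from the fallback.
                self._quota_failed_at = self._clock()
            else:
                self._quota_failed_at = None  # primary healthy or recovered
                return result
        # Suggestion 2.1: stick with the fallback until the re-check is due.
        return self._fallback(messages)
```

The clock is injectable so the re-check timing can be tested without sleeping. The sketch is not thread-safe. If several actors share one router, _quota_failed_at would need a lock.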

CoreRasurae force-pushed feature/10042-openai-quota-fallback from 5ff0c0c4bb
All checks were successful
CI / quality (pull_request) Successful in 23s
CI / lint (pull_request) Successful in 27s
CI / build (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 54s
CI / helm (pull_request) Successful in 27s
CI / security (pull_request) Successful in 56s
CI / push-validation (pull_request) Successful in 22s
CI / e2e_tests (pull_request) Successful in 4m6s
CI / unit_tests (pull_request) Successful in 7m22s
CI / integration_tests (pull_request) Successful in 7m23s
CI / docker (pull_request) Successful in 1m23s
CI / coverage (pull_request) Successful in 11m24s
CI / status-check (pull_request) Successful in 1s
to 8703ddc4d2
Some checks failed
CI / lint (pull_request) Successful in 21s
CI / typecheck (pull_request) Successful in 54s
CI / security (pull_request) Successful in 57s
CI / quality (pull_request) Successful in 32s
CI / build (pull_request) Successful in 24s
CI / push-validation (pull_request) Successful in 20s
CI / helm (pull_request) Successful in 30s
CI / e2e_tests (pull_request) Failing after 4m11s
CI / unit_tests (pull_request) Successful in 5m17s
CI / docker (pull_request) Successful in 1m44s
CI / integration_tests (pull_request) Successful in 9m30s
CI / coverage (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
2026-04-16 22:35:08 +00:00
brent.edwards left a comment

I approve; thank you for listening to my suggested change.

CoreRasurae force-pushed feature/10042-openai-quota-fallback from 8703ddc4d2
to 5beadc1301
Some checks failed
CI / push-validation (pull_request) Successful in 17s
CI / helm (pull_request) Successful in 25s
CI / build (pull_request) Successful in 29s
CI / lint (pull_request) Successful in 3m20s
CI / quality (pull_request) Successful in 4m20s
CI / e2e_tests (pull_request) Failing after 4m29s
CI / typecheck (pull_request) Successful in 4m38s
CI / security (pull_request) Successful in 4m52s
CI / integration_tests (pull_request) Successful in 7m9s
CI / unit_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
2026-04-16 23:00:06 +00:00
HAL9000 force-pushed feature/10042-openai-quota-fallback from 53dfda239b
Some checks failed
CI / build (pull_request) Successful in 15s
CI / push-validation (pull_request) Successful in 17s
CI / lint (pull_request) Successful in 32s
CI / helm (pull_request) Successful in 31s
CI / integration_tests (pull_request) Successful in 3m49s
CI / typecheck (pull_request) Successful in 3m56s
CI / e2e_tests (pull_request) Failing after 3m58s
CI / quality (pull_request) Successful in 4m5s
CI / security (pull_request) Successful in 4m7s
CI / unit_tests (pull_request) Successful in 8m31s
CI / docker (pull_request) Successful in 1m32s
CI / coverage (pull_request) Successful in 10m38s
CI / status-check (pull_request) Failing after 3s
to 2fc57f1f07
Some checks failed
CI / push-validation (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 20s
CI / helm (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 49s
CI / security (pull_request) Successful in 1m12s
CI / lint (pull_request) Successful in 3m20s
CI / build (pull_request) Successful in 3m33s
CI / e2e_tests (pull_request) Failing after 3m53s
CI / integration_tests (pull_request) Successful in 3m57s
CI / unit_tests (pull_request) Successful in 4m56s
CI / docker (pull_request) Successful in 1m19s
CI / coverage (pull_request) Successful in 10m43s
CI / status-check (pull_request) Failing after 1s
2026-04-17 02:59:16 +00:00
HAL9000 force-pushed feature/10042-openai-quota-fallback from 2fc57f1f07
to 51472c0b37
Some checks failed
CI / push-validation (pull_request) Successful in 20s
CI / helm (pull_request) Successful in 24s
CI / lint (pull_request) Successful in 27s
CI / security (pull_request) Successful in 1m0s
CI / build (pull_request) Successful in 3m21s
CI / typecheck (pull_request) Successful in 3m59s
CI / quality (pull_request) Successful in 4m5s
CI / e2e_tests (pull_request) Successful in 4m52s
CI / integration_tests (pull_request) Successful in 9m51s
CI / unit_tests (pull_request) Successful in 10m57s
CI / docker (pull_request) Successful in 1m31s
CI / coverage (pull_request) Successful in 10m57s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (push) Failing after 0s
CI / benchmark-publish (push) Failing after 0s
CI / build (push) Successful in 17s
CI / push-validation (push) Successful in 17s
CI / quality (push) Successful in 32s
CI / helm (push) Successful in 35s
CI / security (push) Successful in 58s
CI / unit_tests (push) Successful in 3m13s
CI / lint (push) Successful in 3m17s
CI / typecheck (push) Successful in 3m56s
CI / integration_tests (push) Successful in 4m15s
CI / docker (push) Successful in 1m41s
CI / e2e_tests (push) Successful in 7m30s
CI / coverage (push) Successful in 14m4s
CI / status-check (push) Successful in 2s
2026-04-17 03:56:37 +00:00
HAL9000 scheduled this pull request to auto merge when all checks succeed 2026-04-17 03:56:54 +00:00
HAL9000 merged commit 51472c0b37 into master 2026-04-17 04:11:42 +00:00
3 participants
Reference
cleveragents/cleveragents-core!10043