test(providers): add ASV performance benchmark suite for the providers module #3022

2026-04-05T04:01:42Z

freemo commented

2026-04-05 04:01:42 +00:00

Summary

Adds 5 new ASV benchmark files under benchmarks/ covering all performance-sensitive paths in the providers module that lacked benchmark coverage. This closes the gap identified in #2800 by providing 68 benchmark methods across cost table construction, cost tracking, fallback selection, provider registry lookups, and LLM adapter instantiation.

Changes

benchmarks/providers_cost_table_bench.py — Benchmarks for ProviderCostTable covering:
- Default construction and construction with custom entries
- Iteration throughput across all providers/models
- Fallback path for unknown provider lookups
benchmarks/providers_cost_tracker_bench.py — Benchmarks for CostTracker covering:
- Construction with various budget configurations (no budget, daily cap, total cap, combined)
- Accumulation throughput at 10-call and 50-call batch sizes with mixed providers
- Daily spend tracking
- get_cost_entry delegation to the underlying cost table
benchmarks/providers_fallback_selector_bench.py — Benchmarks for FallbackSelector covering:
- Construction with custom provider order and an attached cost tracker
- Selection when no providers are configured (exercises the full exhaustion path)
- Selection with a configured provider at various list positions (best-case, mid-list, worst-case)
benchmarks/providers_registry_bench.py — Benchmarks for ProviderRegistry covering:
- get_all_providers enumeration
- get_provider_info by enum value and by string name
- is_provider_configured checks
- Multi-provider initialization sequences
benchmarks/providers_llm_adapters_bench.py — Benchmarks for LLM adapter instantiation covering:
- LangChainChatProvider, AnthropicChatProvider, GoogleChatProvider, OpenAIChatProvider, and OpenRouterChatProvider
- Multiple configuration variants per adapter (default model, alternate model, custom parameters)

Design Decisions

Audit of existing benchmarks before writing new ones: cost_controls_bench.py and provider_selection_bench.py were reviewed first so that none of the 68 new benchmark methods duplicates a path those files already measure.
MagicMock for Settings objects: Real API keys are not required in the benchmark environment. Mocking Settings keeps benchmarks hermetic and runnable in CI without credentials.
Mock factories for LLM adapters: Adapter benchmarks measure pure instantiation cost (object construction, argument binding, internal wiring) without triggering any network calls or SDK initialisation that would require live credentials.
setup() fixtures isolate measurement: Each benchmark class uses ASV's setup() hook to construct all prerequisite objects before the timed region begins, ensuring that only the target operation is measured.
Single parallel wave: All 5 files were independent of each other and were implemented in a single parallel wave with no sequential dependencies.
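As an illustration of the mocking and `setup()` decisions above, here is a minimal sketch of how a benchmark class might combine them. It assumes hypothetical attribute names on `Settings` (`openai_api_key`, `anthropic_api_key`) and uses `_StubRegistry` as a stand-in for `ProviderRegistry`. Only `unittest.mock.MagicMock` and the ASV conventions (`setup`, `time_` prefix) are real APIs.

```python
from unittest.mock import MagicMock


def make_mock_settings():
    """Build a Settings stand-in; the attribute names are hypothetical."""
    settings = MagicMock(name="Settings")
    settings.openai_api_key = "test-key"  # placeholder, never a real credential
    settings.anthropic_api_key = None     # simulates an unconfigured provider
    return settings


class _StubRegistry:
    """Hypothetical stand-in for ProviderRegistry."""

    def __init__(self, settings):
        self._settings = settings

    def is_provider_configured(self, name):
        # A provider counts as configured when its key attribute is set.
        return getattr(self._settings, f"{name}_api_key") is not None


class RegistryConfiguredSuite:
    def setup(self):
        # ASV calls setup() outside the timed region, so the cost of
        # building the mock and the registry is not measured.
        self.registry = _StubRegistry(make_mock_settings())

    def time_is_provider_configured(self):
        # Only this lookup is timed.
        self.registry.is_provider_configured("openai")
```

Because the mock carries no real credentials, a suite shaped like this can run in CI without any secrets.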

Testing

Unit tests (Behave): N/A — benchmark-only PR; no behavioural logic added
Integration tests (Robot): N/A — benchmark-only PR
Coverage: N/A — benchmarks are not included in coverage measurement
Benchmarks: 68 new benchmark methods added across 5 files; all verified to execute without errors via direct Python execution
nox -e lint: ✅ PASSED
nox -e typecheck: ✅ PASSED (0 errors, 0 warnings)
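The PR does not say how the "direct Python execution" check was performed. One possible approach, sketched here under that assumption, is a small smoke runner that calls `setup()` and each `time_*` method once without invoking ASV. `smoke_run` and `ExampleSuite` are hypothetical names, and the runner does not handle parameterized suites.

```python
import inspect


def smoke_run(suite_cls):
    """Call setup(), each time_* method, and teardown() once.

    Returns the names of the benchmark methods that ran without raising.
    """
    suite = suite_cls()
    ran = []
    for name, method in inspect.getmembers(suite, predicate=inspect.ismethod):
        if not name.startswith("time_"):
            continue
        if hasattr(suite, "setup"):
            suite.setup()
        method()
        if hasattr(suite, "teardown"):
            suite.teardown()
        ran.append(name)
    return ran


class ExampleSuite:
    """Hypothetical suite used only to demonstrate the runner."""

    def setup(self):
        self.data = list(range(100))

    def time_sum(self):
        sum(self.data)

    def time_sorted(self):
        sorted(self.data)


if __name__ == "__main__":
    print(smoke_run(ExampleSuite))
```

A check like this confirms only that the benchmarks execute. It produces no timing data, which still comes from ASV itself.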

Modules Affected

File	Status
`benchmarks/providers_cost_table_bench.py`	➕ Added
`benchmarks/providers_cost_tracker_bench.py`	➕ Added
`benchmarks/providers_fallback_selector_bench.py`	➕ Added
`benchmarks/providers_registry_bench.py`	➕ Added
`benchmarks/providers_llm_adapters_bench.py`	➕ Added

No production source files were modified.

Related Issues

Closes #2800

Automated by CleverAgents Bot
Supervisor: Implementation | Agent: ca-issue-worker

freemo added 1 commit 2026-04-05 04:01:42 +00:00

test(providers): add ASV performance benchmark suite for the providers module

CI / lint (pull_request) Successful in 21s
CI / build (pull_request) Successful in 17s
CI / helm (pull_request) Successful in 23s
CI / quality (pull_request) Successful in 44s
CI / typecheck (pull_request) Successful in 1m2s
CI / security (pull_request) Successful in 1m2s
CI / unit_tests (pull_request) Successful in 6m54s
CI / docker (pull_request) Successful in 1m50s
CI / coverage (pull_request) Successful in 11m15s
CI / e2e_tests (pull_request) Successful in 22m16s
CI / integration_tests (pull_request) Successful in 23m31s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 58m37s

254fd07496

Implemented 5 new ASV benchmark files under benchmarks/:

- providers_cost_table_bench.py — ProviderCostTable construction (default +
  custom entries), iteration throughput across all providers/models, fallback
  path for unknown providers
- providers_cost_tracker_bench.py — CostTracker construction with various
  budget configurations, accumulation throughput (10/50 calls, mixed
  providers), daily spend tracking, get_cost_entry delegation
- providers_fallback_selector_bench.py — FallbackSelector construction with
  custom order and cost tracker, selection when no providers configured
  (exhausts full list), selection with configured provider at various positions
- providers_registry_bench.py — ProviderRegistry.get_all_providers,
  get_provider_info (by enum and string), is_provider_configured,
  multi-provider initialization
- providers_llm_adapters_bench.py — LangChainChatProvider,
  AnthropicChatProvider, GoogleChatProvider, OpenAIChatProvider,
  OpenRouterChatProvider instantiation with various configurations

Key design decisions:
- Carefully audited existing cost_controls_bench.py and
  provider_selection_bench.py to avoid duplicating any already-covered
  benchmarks
- Used MagicMock for Settings objects to avoid requiring real API keys in
  benchmarks
- LLM adapter benchmarks use mock factories to measure pure instantiation
  cost without network calls
- All benchmark classes use setup() fixtures to isolate measurement from
  fixture construction
- 68 benchmark methods total across 5 files, all verified to execute without
  errors

ISSUES CLOSED: #2800

freemo added this to the v3.8.0 milestone 2026-04-05 04:05:33 +00:00

Review: APPROVED ✅

Review Criteria

- Specification Alignment ✅
- Duplication Avoidance ✅
- Code Quality ✅
- Correctness ✅
- Security ✅
- PR Metadata ✅
- CI Status ✅