test(providers): add ASV performance benchmark suite for the providers module #3022

Merged
freemo merged 1 commit from test/missing-asv-benchmarks-providers into master 2026-04-05 04:37:15 +00:00
Owner

Summary

Adds 5 new ASV benchmark files under benchmarks/ covering all performance-sensitive paths in the providers module that lacked benchmark coverage. This closes the gap identified in #2800 by providing 68 benchmark methods across cost table construction, cost tracking, fallback selection, provider registry lookups, and LLM adapter instantiation.

Changes

  • benchmarks/providers_cost_table_bench.py — Benchmarks for ProviderCostTable covering:

    • Default construction and construction with custom entries
    • Iteration throughput across all providers/models
    • Fallback path for unknown provider lookups
  • benchmarks/providers_cost_tracker_bench.py — Benchmarks for CostTracker covering:

    • Construction with various budget configurations (no budget, daily cap, total cap, combined)
    • Accumulation throughput at 10-call and 50-call batch sizes with mixed providers
    • Daily spend tracking
    • get_cost_entry delegation to the underlying cost table
  • benchmarks/providers_fallback_selector_bench.py — Benchmarks for FallbackSelector covering:

    • Construction with custom provider order and an attached cost tracker
    • Selection when no providers are configured (exercises the full exhaustion path)
    • Selection with a configured provider at various list positions (best-case, mid-list, worst-case)
  • benchmarks/providers_registry_bench.py — Benchmarks for ProviderRegistry covering:

    • get_all_providers enumeration
    • get_provider_info by enum value and by string name
    • is_provider_configured checks
    • Multi-provider initialization sequences
  • benchmarks/providers_llm_adapters_bench.py — Benchmarks for LLM adapter instantiation covering:

    • LangChainChatProvider, AnthropicChatProvider, GoogleChatProvider, OpenAIChatProvider, and OpenRouterChatProvider
    • Multiple configuration variants per adapter (default model, alternate model, custom parameters)
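As a rough illustration of the "various list positions" idea in the fallback selector benchmarks, the sketch below times selection with the configured provider first, mid-list, last, and absent. `StandInSelector`, `PROVIDER_ORDER`, and the suite name are hypothetical stand-ins, not the real `FallbackSelector` API, and the PR may use separate methods instead of ASV's `params` attribute.

```python
from typing import Optional

# Hypothetical provider order; the real order lives in the providers module.
PROVIDER_ORDER = ["anthropic", "openai", "google", "openrouter", "langchain"]


class StandInSelector:
    """Illustrative stand-in for FallbackSelector (not the real class)."""

    def __init__(self, order: list[str], configured: set[str]) -> None:
        self._order = list(order)
        self._configured = set(configured)

    def select(self) -> Optional[str]:
        # Walk the order and return the first configured provider, if any.
        for name in self._order:
            if name in self._configured:
                return name
        # Nothing configured: the whole list was exhausted.
        return None


class FallbackPositionSuite:
    """ASV-style suite: one timing per position of the configured provider."""

    # ASV runs each time_* method once per value listed in params.
    # None means "no provider configured" (full exhaustion path).
    params = [0, len(PROVIDER_ORDER) // 2, len(PROVIDER_ORDER) - 1, None]
    param_names = ["configured_index"]

    def setup(self, configured_index: Optional[int]) -> None:
        if configured_index is None:
            configured: set[str] = set()
        else:
            configured = {PROVIDER_ORDER[configured_index]}
        self.selector = StandInSelector(PROVIDER_ORDER, configured)

    def time_select(self, configured_index: Optional[int]) -> None:
        self.selector.select()
```

Parametrizing this way keeps best-case, mid-list, worst-case, and exhaustion timings side by side in the ASV report under a single benchmark name.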

Design Decisions

  • Audit of existing benchmarks before writing new ones: cost_controls_bench.py and provider_selection_bench.py were carefully reviewed to ensure zero duplication. All 68 new benchmark methods cover paths not already measured.
  • MagicMock for Settings objects: Real API keys are not required in the benchmark environment. Mocking Settings keeps benchmarks hermetic and runnable in CI without credentials.
  • Mock factories for LLM adapters: Adapter benchmarks measure pure instantiation cost (object construction, argument binding, internal wiring) without triggering any network calls or SDK initialisation that would require live credentials.
  • setup() fixtures isolate measurement: Each benchmark class uses ASV's setup() hook to construct all prerequisite objects before the timed region begins, ensuring that only the target operation is measured.
  • Single parallel wave: All 5 files were independent of each other and were implemented in a single parallel wave with no sequential dependencies.
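A minimal sketch of how the `MagicMock` settings, mock factory, and `setup()` decisions above might fit together. `StandInAdapter`, `make_bench_settings`, and the attribute names on the fake settings object are assumptions for illustration only; the real adapters and `Settings` fields are not shown in this PR description.

```python
from typing import Any
from unittest.mock import MagicMock


def make_bench_settings() -> MagicMock:
    """Build a fake Settings object; the attribute names are hypothetical."""
    settings = MagicMock()
    settings.openai_api_key = "bench-openai-key"
    settings.anthropic_api_key = "bench-anthropic-key"
    return settings


class StandInAdapter:
    """Illustrative stand-in for a chat provider adapter (not a real class)."""

    def __init__(
        self,
        settings: Any,
        model: str = "default-model",
        llm_factory: Any = None,
    ) -> None:
        # Only argument binding happens here; the factory is stored, not
        # called, so no SDK client is created and no network call is made.
        self.model = model
        self.api_key = settings.openai_api_key
        self._llm_factory = llm_factory


class AdapterInstantiationSuite:
    """ASV-style suite: fixtures are built in setup(), outside the timing."""

    def setup(self) -> None:
        self.settings = make_bench_settings()
        self.llm_factory = MagicMock(name="llm_factory")

    def time_construct_default(self) -> None:
        StandInAdapter(self.settings, llm_factory=self.llm_factory)

    def time_construct_alternate_model(self) -> None:
        StandInAdapter(
            self.settings, model="alternate-model", llm_factory=self.llm_factory
        )
```

Because the mocks are created in `setup()`, the cost of building them does not count toward the measured construction time.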

Testing

  • Unit tests (Behave): N/A — benchmark-only PR; no behavioural logic added
  • Integration tests (Robot): N/A — benchmark-only PR
  • Coverage: N/A — benchmarks are not included in coverage measurement
  • Benchmarks: 68 new benchmark methods added across 5 files; all verified to execute without errors via direct Python execution
  • nox -e lint: PASSED
  • nox -e typecheck: PASSED (0 errors, 0 warnings)
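The PR does not show how the "direct Python execution" check was performed. One possible approach, sketched here under that assumption, is a small smoke runner that instantiates a suite, calls `setup()`, and invokes each benchmark method once without timing it. `smoke_run` and `ExampleSuite` are hypothetical names, and the prefix tuple covers only a subset of ASV's benchmark prefixes.

```python
import inspect

# Subset of the method-name prefixes ASV treats as benchmarks.
ASV_PREFIXES = ("time_", "mem_", "peakmem_", "track_")


def smoke_run(suite_cls: type) -> list[str]:
    """Run every benchmark method of suite_cls once; return the names run."""
    executed: list[str] = []
    # getmembers returns members sorted by name, so the order is stable.
    for name, _ in inspect.getmembers(suite_cls, predicate=inspect.isfunction):
        if not name.startswith(ASV_PREFIXES):
            continue
        suite = suite_cls()  # fresh instance per method
        setup = getattr(suite, "setup", None)
        if callable(setup):
            setup()
        try:
            getattr(suite, name)()  # any exception fails the smoke run
        finally:
            teardown = getattr(suite, "teardown", None)
            if callable(teardown):
                teardown()
        executed.append(name)
    return executed


class ExampleSuite:
    """Tiny suite used only to demonstrate the runner."""

    def setup(self) -> None:
        self.values = list(range(50))

    def time_sum(self) -> None:
        sum(self.values)

    def time_sorted(self) -> None:
        sorted(self.values)

    def helper(self) -> None:
        # Not a benchmark: the name has no ASV prefix, so it is skipped.
        pass


if __name__ == "__main__":
    print(smoke_run(ExampleSuite))
```

A runner like this only confirms that the methods execute; it says nothing about timing stability, which is what the CI benchmark-regression job is for.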

Modules Affected

| File | Status |
|------|--------|
| benchmarks/providers_cost_table_bench.py | Added |
| benchmarks/providers_cost_tracker_bench.py | Added |
| benchmarks/providers_fallback_selector_bench.py | Added |
| benchmarks/providers_registry_bench.py | Added |
| benchmarks/providers_llm_adapters_bench.py | Added |

No production source files were modified.

Closes #2800


Automated by CleverAgents Bot
Supervisor: Implementation | Agent: ca-issue-worker

test(providers): add ASV performance benchmark suite for the providers module
All checks were successful
CI / lint (pull_request) Successful in 21s
CI / build (pull_request) Successful in 17s
CI / helm (pull_request) Successful in 23s
CI / quality (pull_request) Successful in 44s
CI / typecheck (pull_request) Successful in 1m2s
CI / security (pull_request) Successful in 1m2s
CI / unit_tests (pull_request) Successful in 6m54s
CI / docker (pull_request) Successful in 1m50s
CI / coverage (pull_request) Successful in 11m15s
CI / e2e_tests (pull_request) Successful in 22m16s
CI / integration_tests (pull_request) Successful in 23m31s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 58m37s
254fd07496
Implemented 5 new ASV benchmark files under benchmarks/:

- providers_cost_table_bench.py — ProviderCostTable construction (default +
  custom entries), iteration throughput across all providers/models, fallback
  path for unknown providers
- providers_cost_tracker_bench.py — CostTracker construction with various
  budget configurations, accumulation throughput (10/50 calls, mixed
  providers), daily spend tracking, get_cost_entry delegation
- providers_fallback_selector_bench.py — FallbackSelector construction with
  custom order and cost tracker, selection when no providers configured
  (exhausts full list), selection with configured provider at various positions
- providers_registry_bench.py — ProviderRegistry.get_all_providers,
  get_provider_info (by enum and string), is_provider_configured,
  multi-provider initialization
- providers_llm_adapters_bench.py — LangChainChatProvider,
  AnthropicChatProvider, GoogleChatProvider, OpenAIChatProvider,
  OpenRouterChatProvider instantiation with various configurations

Key design decisions:
- Carefully audited existing cost_controls_bench.py and
  provider_selection_bench.py to avoid duplicating any already-covered
  benchmarks
- Used MagicMock for Settings objects to avoid requiring real API keys in
  benchmarks
- LLM adapter benchmarks use mock factories to measure pure instantiation
  cost without network calls
- All benchmark classes use setup() fixtures to isolate measurement from
  fixture construction
- 68 benchmark methods total across 5 files, all verified to execute without
  errors

ISSUES CLOSED: #2800
freemo added this to the v3.8.0 milestone 2026-04-05 04:05:33 +00:00
Author
Owner

🔒 Review claimed by reviewer-pool-1 [claim-token: reviewer-pool-1-3022-1775362000]


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: ca-continuous-pr-reviewer

freemo left a comment

Review: APPROVED

Summary

This PR adds 5 new ASV benchmark files (846 lines, 68 benchmark methods) covering all performance-sensitive paths in the providers module that lacked benchmark coverage. The implementation is thorough, well-structured, and follows project conventions.

Review Criteria

Specification Alignment

  • The providers module is a critical integration boundary implementing the Provider Registry abstraction per docs/specification.md. Adding benchmark coverage for cost tracking, registry operations, fallback selection, and LLM adapter instantiation aligns with the project's Multi-Level Testing Mandate.

Duplication Avoidance

  • Verified against existing cost_controls_bench.py (covers basic CostEntry, single-call record_usage, check_plan_budget, check_daily_budget, estimate_cost, basic FallbackSelector.select) and provider_selection_bench.py (covers registry init, get_default_provider_type, get_configured_providers, get_default_model).
  • New benchmarks cover genuinely different scenarios: construction variants, throughput at scale (10/50 calls), mixed-provider accumulation, daily spend retrieval, get_all_providers, get_provider_info, is_provider_configured, multi-provider init, and all 5 LLM adapter instantiation paths.
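To make the "throughput at scale (10/50 calls)" and "mixed-provider accumulation" scenarios concrete, here is a hedged sketch using ASV's `params` attribute. `StandInTracker` and its `record_usage(provider, cost)` signature are assumptions; the real `CostTracker` interface is not shown in this PR and may differ.

```python
from collections import defaultdict
from itertools import cycle, islice

# Hypothetical provider names used to build a mixed batch.
PROVIDERS = ("anthropic", "openai", "google")


class StandInTracker:
    """Illustrative stand-in for CostTracker (not the real class)."""

    def __init__(self) -> None:
        self.spend: dict[str, float] = defaultdict(float)

    def record_usage(self, provider: str, cost: float) -> None:
        # Accumulate spend per provider.
        self.spend[provider] += cost

    def total(self) -> float:
        return sum(self.spend.values())


class AccumulationSuite:
    """ASV-style suite: the same method is timed at each batch size."""

    params = [10, 50]
    param_names = ["calls"]

    def setup(self, calls: int) -> None:
        self.tracker = StandInTracker()
        # Round-robin over providers so the batch mixes all of them.
        self.batch = list(islice(cycle(PROVIDERS), calls))

    def time_record_batch(self, calls: int) -> None:
        for provider in self.batch:
            self.tracker.record_usage(provider, 0.25)
```

Building the batch in `setup()` keeps list construction out of the timed loop, so the measurement reflects accumulation only.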

Code Quality

  • Consistent with existing benchmark file patterns (sys.path setup, importlib.reload, ASV class conventions)
  • Proper setup() fixture isolation ensures only target operations are measured
  • Full type annotations on all methods
  • Descriptive docstrings on every benchmark method
  • Module-level docstrings explicitly document what is and isn't covered

Correctness

  • reset_provider_registry() calls properly prevent cross-benchmark contamination
  • MagicMock usage for Settings is consistent with provider_selection_bench.py
  • Mock LLM factories correctly measure pure instantiation cost without network calls
  • No state leakage between benchmark methods (ASV calls setup() before each method)
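The reset-before-measure pattern described above can be sketched as follows. The module-level singleton, `get_registry`, `reset_registry`, and `StandInRegistry` are hypothetical stand-ins; the project's actual `reset_provider_registry()` and `ProviderRegistry` are not reproduced here.

```python
from typing import Optional


class StandInRegistry:
    """Illustrative stand-in for ProviderRegistry (not the real class)."""

    def __init__(self) -> None:
        self.initialized: set[str] = set()

    def initialize(self, name: str) -> None:
        self.initialized.add(name)


# Hypothetical module-level singleton, similar in spirit to a global registry.
_registry: Optional[StandInRegistry] = None


def get_registry() -> StandInRegistry:
    global _registry
    if _registry is None:
        _registry = StandInRegistry()
    return _registry


def reset_registry() -> None:
    """Drop the singleton so the next get_registry() starts clean."""
    global _registry
    _registry = None


class RegistryInitSuite:
    """ASV-style suite that resets shared state around each benchmark."""

    def setup(self) -> None:
        reset_registry()  # discard anything a previous benchmark left behind
        self.registry = get_registry()

    def time_initialize_three(self) -> None:
        for name in ("anthropic", "openai", "google"):
            self.registry.initialize(name)

    def teardown(self) -> None:
        reset_registry()  # leave no state for the next benchmark
```

Resetting in both `setup()` and `teardown()` is redundant by design: either one alone would protect a well-behaved suite, while both together also guard against other suites that forget to clean up.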

Security

  • No real API keys or secrets — all use fake bench keys
  • No file I/O or network calls in benchmarks

PR Metadata

  • Commit message follows Conventional Changelog format: test(providers): add ASV performance benchmark suite for the providers module
  • ISSUES CLOSED: #2800 footer present
  • Branch name matches issue metadata: test/missing-asv-benchmarks-providers
  • Milestone: v3.8.0 (matches issue)
  • Label: Type/Testing (appropriate)
  • Single clean commit

CI Status

  • lint, typecheck, security, quality, unit_tests, coverage, integration_tests, e2e_tests, build, docker, helm: all success
  • benchmark-publish, benchmark-regression, status-check: pending (benchmark jobs still running)

No issues found. Approving for merge.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: ca-pr-self-reviewer

freemo scheduled this pull request to auto merge when all checks succeed 2026-04-05 04:35:09 +00:00
freemo merged commit 31f5997670 into master 2026-04-05 04:37:14 +00:00
freemo deleted branch test/missing-asv-benchmarks-providers 2026-04-05 04:37:15 +00:00
Reference
cleveragents/cleveragents-core!3022