Unify provider factory logic: eliminate divergence between create_llm() and create_ai_provider() #10949

Closed
opened 2026-05-01 05:49:21 +00:00 by hurui200320 · 1 comment
Member

Metadata

  • Commit Message: refactor(providers): unify provider factory behind single source of truth
  • Branch: feature/m3-unify-provider-factory

Background and context

ProviderRegistry exposes two public factory methods that both instantiate LLM backends:

  1. create_ai_provider() → returns AIProviderInterface (used by plan service)
  2. create_llm() → returns BaseLanguageModel (used by reactive stream router / actor run)

Both methods contain separate if/elif chains with duplicated provider-switching logic:

  • create_ai_provider() handles OpenAI, Anthropic, Google, and OpenRouter with direct constructor calls, then falls back to a generic LangChainChatProvider + llm_factory closure for the rest.
  • create_llm() delegates to _create_provider_llm(), which handles a different subset of providers and omits OpenRouter entirely.

This divergence is what caused the bug where agents actor run openrouter/... fails. The fix for that bug is a band-aid. The root cause is that adding a new provider requires touching multiple locations, making divergence almost inevitable.

The design doc (docs/api/providers.md) presents both methods as working for all providers, but the implementation makes that promise impossible to keep without disciplined manual synchronization.


Current behavior

  • Adding a new provider requires updating create_ai_provider(), create_llm(), and often _create_provider_llm().
  • The two paths have diverged in the past (OpenRouter missing from create_llm) and will diverge again.
  • The same provider configuration logic (API key lookup, default model fallback, kwargs construction) is duplicated.

Expected behavior / definition of "done"

A single internal factory handles provider instantiation. Both create_ai_provider() and create_llm() delegate to it, differing only in what they wrap around the result:

  • create_llm() → returns the raw LLM from the factory
  • create_ai_provider() → wraps the raw LLM in a LangChainChatProvider (or a specialized adapter where needed)

Adding a new provider should require touching exactly one place.


Acceptance criteria

  • All provider instantiation logic lives in a single internal method
  • create_ai_provider() delegates to the unified factory, then wraps the result
  • create_llm() delegates to the unified factory, then returns the raw LLM
  • No duplicated if/elif provider-switching logic remains
  • All existing BDD scenarios continue to pass
  • Coverage ≥ 97%, all quality gates pass

Supporting information

  • src/cleveragents/providers/registry.py — lines 404-712 (both factory methods)
  • docs/api/providers.md — design doc showing both methods should support all providers
  • src/cleveragents/providers/llm/langchain_chat_provider.py — the generic wrapper used by create_ai_provider

Subtasks

  • Extract unified _create_provider_instance(provider_type, model_id, **kwargs) factory
  • Refactor create_ai_provider() to delegate to unified factory, then wrap
  • Refactor create_llm() to delegate to unified factory, then return raw LLM
  • Remove _create_provider_llm() or reduce it to a thin wrapper
  • Verify all existing BDD scenarios pass
  • Run nox -s unit_tests, fix failures
  • Run nox -s coverage_report, verify ≥ 97%
  • Run nox full suite
  • Update changelog

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged.
  • Adding a new ProviderType requires modifying only the unified factory method.
## Metadata - **Commit Message**: `refactor(providers): unify provider factory behind single source of truth` - **Branch**: `feature/m3-unify-provider-factory` --- ## Background and context `ProviderRegistry` exposes two public factory methods that both instantiate LLM backends: 1. `create_ai_provider()` → returns `AIProviderInterface` (used by plan service) 2. `create_llm()` → returns `BaseLanguageModel` (used by reactive stream router / `actor run`) Both methods contain **separate `if/elif` chains** with duplicated provider-switching logic: - `create_ai_provider()` handles OpenAI, Anthropic, Google, and OpenRouter with direct constructor calls, then falls back to a generic `LangChainChatProvider` + `llm_factory` closure for the rest. - `create_llm()` delegates to `_create_provider_llm()`, which handles a different subset of providers and **omits OpenRouter entirely**. This divergence is what caused the bug where `agents actor run openrouter/...` fails. The fix for that bug is a band-aid. The root cause is that adding a new provider requires touching **multiple locations**, making divergence almost inevitable. The design doc (`docs/api/providers.md`) presents both methods as working for all providers, but the implementation makes that promise impossible to keep without disciplined manual synchronization. --- ## Current behavior - Adding a new provider requires updating `create_ai_provider()`, `create_llm()`, and often `_create_provider_llm()`. - The two paths have diverged in the past (OpenRouter missing from `create_llm`) and will diverge again. - The same provider configuration logic (API key lookup, default model fallback, kwargs construction) is duplicated. --- ## Expected behavior / definition of "done" A **single internal factory** handles provider instantiation. Both `create_ai_provider()` and `create_llm()` delegate to it, differing only in what they wrap around the result: - `create_llm()` → returns the raw LLM from the factory - `create_ai_provider()` → wraps the raw LLM in a `LangChainChatProvider` (or a specialized adapter where needed) Adding a new provider should require touching **exactly one** place. --- ## Acceptance criteria - [ ] All provider instantiation logic lives in a single internal method - [ ] `create_ai_provider()` delegates to the unified factory, then wraps the result - [ ] `create_llm()` delegates to the unified factory, then returns the raw LLM - [ ] No duplicated `if/elif` provider-switching logic remains - [ ] All existing BDD scenarios continue to pass - [ ] Coverage ≥ 97%, all quality gates pass --- ## Supporting information - `src/cleveragents/providers/registry.py` — lines 404-712 (both factory methods) - `docs/api/providers.md` — design doc showing both methods should support all providers - `src/cleveragents/providers/llm/langchain_chat_provider.py` — the generic wrapper used by `create_ai_provider` --- ## Subtasks - [ ] Extract unified `_create_provider_instance(provider_type, model_id, **kwargs)` factory - [ ] Refactor `create_ai_provider()` to delegate to unified factory, then wrap - [ ] Refactor `create_llm()` to delegate to unified factory, then return raw LLM - [ ] Remove `_create_provider_llm()` or reduce it to a thin wrapper - [ ] Verify all existing BDD scenarios pass - [ ] Run `nox -s unit_tests`, fix failures - [ ] Run `nox -s coverage_report`, verify ≥ 97% - [ ] Run `nox` full suite - [ ] Update changelog --- ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged**. - Adding a new `ProviderType` requires modifying only the unified factory method.
hurui200320 added this to the v3.2.0 milestone 2026-05-01 05:50:25 +00:00
Author
Member

Implementation Notes

Design decisions

_get_api_key(provider_type) -> str (new helper, ProviderRegistry)
Consolidates the three separate API-key guard blocks that previously existed in create_llm(), the OpenRouter branch of _create_provider_llm(), and every branch of create_ai_provider(). Returns the raw key string for providers that have one, or "" for key-free providers (MOCK). The error message format is standardised: "Provider {name} is not configured. Please set the {ENV_VAR} environment variable." — matching the existing wording so all tests continue to pass.

_create_provider_instance(provider_type, model_id, **kwargs) -> BaseLanguageModel (new unified factory, replaces _create_provider_llm())
Single source of truth for raw LLM construction. Calls _get_api_key() once at the top (validates and captures the key for the OpenRouter branch which needs to pass it explicitly to ChatOpenAI). Handles all ten ProviderType values including MOCK via FakeListLLM. The unreachable fallthrough raise ValueError(f"Unsupported provider type: {provider_type}") is retained as a safety net for future ProviderType additions.

create_ai_provider() uniform LangChainChatProvider wrapping
The four specialised adapter constructors (OpenAIChatProvider, AnthropicChatProvider, GoogleChatProvider, OpenRouterChatProvider) are no longer instantiated by the registry. All providers are wrapped via the same LangChainChatProvider + factory-closure pattern. The specialised adapters remain in the codebase for any direct consumers; the registry simply no longer uses them. This matches the approach that was already used for Groq, Together, Cohere, and Azure.

Eager validation in create_ai_provider()
_get_api_key() is called before constructing the LangChainChatProvider. This preserves fail-fast behaviour — a missing API key raises at call time rather than when the factory closure is first invoked at plan execution time.

Test changes

All BDD step definitions and feature scenarios that called _create_provider_llm have been renamed to _create_provider_instance. The four isinstance(context.ai_provider, OpenAIChatProvider/AnthropicChatProvider/GoogleChatProvider/OpenRouterChatProvider) assertions in provider_registry_coverage.feature are updated to isinstance(context.ai_provider, LangChainChatProvider). A new scenario _create_provider_instance handles MOCK provider was added to cover the MOCK branch by stubbing langchain_community.llms.FakeListLLM.

Quality gates

  • nox -e lint: all checks passed
  • nox -e typecheck: 0 errors (3 pre-existing reportMissingModuleSource warnings for langchain_groq, langchain_together, langchain_cohere unchanged)
  • nox -e unit_tests: 15,737 scenarios passed
  • nox -e integration_tests: passed

PR: #10963

## Implementation Notes ### Design decisions **`_get_api_key(provider_type) -> str`** (new helper, `ProviderRegistry`) Consolidates the three separate API-key guard blocks that previously existed in `create_llm()`, the OpenRouter branch of `_create_provider_llm()`, and every branch of `create_ai_provider()`. Returns the raw key string for providers that have one, or `""` for key-free providers (MOCK). The error message format is standardised: `"Provider {name} is not configured. Please set the {ENV_VAR} environment variable."` — matching the existing wording so all tests continue to pass. **`_create_provider_instance(provider_type, model_id, **kwargs) -> BaseLanguageModel`** (new unified factory, replaces `_create_provider_llm()`) Single source of truth for raw LLM construction. Calls `_get_api_key()` once at the top (validates and captures the key for the OpenRouter branch which needs to pass it explicitly to `ChatOpenAI`). Handles all ten `ProviderType` values including MOCK via `FakeListLLM`. The unreachable fallthrough `raise ValueError(f"Unsupported provider type: {provider_type}")` is retained as a safety net for future ProviderType additions. **`create_ai_provider()` uniform `LangChainChatProvider` wrapping** The four specialised adapter constructors (`OpenAIChatProvider`, `AnthropicChatProvider`, `GoogleChatProvider`, `OpenRouterChatProvider`) are no longer instantiated by the registry. All providers are wrapped via the same `LangChainChatProvider` + factory-closure pattern. The specialised adapters remain in the codebase for any direct consumers; the registry simply no longer uses them. This matches the approach that was already used for Groq, Together, Cohere, and Azure. **Eager validation in `create_ai_provider()`** `_get_api_key()` is called before constructing the `LangChainChatProvider`. This preserves fail-fast behaviour — a missing API key raises at call time rather than when the factory closure is first invoked at plan execution time. ### Test changes All BDD step definitions and feature scenarios that called `_create_provider_llm` have been renamed to `_create_provider_instance`. The four `isinstance(context.ai_provider, OpenAIChatProvider/AnthropicChatProvider/GoogleChatProvider/OpenRouterChatProvider)` assertions in `provider_registry_coverage.feature` are updated to `isinstance(context.ai_provider, LangChainChatProvider)`. A new scenario `_create_provider_instance handles MOCK provider` was added to cover the MOCK branch by stubbing `langchain_community.llms.FakeListLLM`. ### Quality gates - `nox -e lint`: ✅ all checks passed - `nox -e typecheck`: ✅ 0 errors (3 pre-existing `reportMissingModuleSource` warnings for langchain_groq, langchain_together, langchain_cohere unchanged) - `nox -e unit_tests`: ✅ 15,737 scenarios passed - `nox -e integration_tests`: ✅ passed PR: https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/10963
hurui200320 2026-05-05 07:43:38 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core#10949
No description provided.