feat(testing): Modify test harness for conditional real LLM endpoints #685

Closed
opened 2026-03-11 18:10:49 +00:00 by freemo · 1 comment
Owner

Background and Context

This issue is a companion to the @mocked/@llm tagging system issue. While the tagging system defines the interface (which tags mean what), this issue covers the runtime wiring — modifying the test harness so that when a test is tagged @llm, a real LLM provider is configured and used instead of MockAIProvider.

Currently, MockAIProvider generates deterministic Change objects based on prompt keywords ("error handling", "test", "refactor"). Real LLM integration requires configuring actual API keys, handling rate limits, managing costs, and dealing with non-deterministic responses.

Expected Behavior

When a test is tagged @llm:

  1. MockAIProvider is NOT injected
  2. The real provider from src/cleveragents/providers/llm/ is initialized
  3. API keys are loaded from environment variables
  4. Tests handle non-deterministic responses (assert on structure, not exact content)
  5. Rate limiting / retry logic is respected
  6. Cost tracking is active (per providers/cost_tracker.py)
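Point 4 (assert on structure, not exact content) could look roughly like the sketch below. The `Change` dataclass here is a hypothetical stand-in with assumed field names (`file_path`, `description`, `diff`); the project's real `Change` object may be shaped differently.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical stand-in for the project's Change object (fields assumed)."""
    file_path: str
    description: str
    diff: str


def assert_change_structure(change: Change) -> None:
    """Check shape and basic invariants, never the model's exact wording."""
    assert isinstance(change.file_path, str) and change.file_path.strip()
    assert isinstance(change.description, str) and change.description.strip()
    assert isinstance(change.diff, str) and change.diff.strip()


# Two differently worded responses both satisfy the same structural check.
run_a = Change("src/app.py", "Add error handling around file reads", "+try:\n+    ...")
run_b = Change("src/app.py", "Wrap I/O in try/except", "+try:\n+    data = f.read()")
for change in (run_a, run_b):
    assert_change_structure(change)
```

Because the check never compares against a fixed string, the same test passes across runs even when the provider phrases its output differently.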

When a test is tagged @llm_anthropic:

  • ANTHROPIC_API_KEY must be present in env, else test is skipped
  • Anthropic provider is configured

When a test is tagged @llm_openai:

  • OPENAI_API_KEY must be present in env, else test is skipped
  • OpenAI provider is configured
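The key-gated skip described above might be sketched as follows. The tag names and environment variables come from this issue; the function name `skip_reason` and the mapping constant are hypothetical, and the sketch is deliberately independent of any particular test runner.

```python
import os
from typing import Iterable, Mapping, Optional

# Tag-to-variable mapping taken from the issue text.
REQUIRED_ENV_BY_TAG = {
    "llm_anthropic": "ANTHROPIC_API_KEY",
    "llm_openai": "OPENAI_API_KEY",
}


def skip_reason(
    tags: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return a skip message if a tagged test lacks its API key, else None."""
    env = os.environ if env is None else env
    for tag in tags:
        var = REQUIRED_ENV_BY_TAG.get(tag)
        # An empty or whitespace-only value is treated as missing.
        if var and not env.get(var, "").strip():
            return f"@{tag} requires {var}; skipping"
    return None
```

A harness hook would call `skip_reason` with the test's tags and, if it returns a message, skip the test through whatever mechanism the runner provides (for example `pytest.skip(reason)` under pytest, or `scenario.skip(reason)` in a Behave `before_scenario` hook).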

Acceptance Criteria

  • Real LLM provider initialization works in test context
  • API key validation and graceful skip on missing keys
  • Rate limiting / retry is respected during tests
  • Cost tracking logs test LLM costs
  • Non-deterministic response handling patterns documented
  • Robot Framework equivalent implemented
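For the rate limiting / retry criterion, one possible harness-level wrapper is a bounded retry with exponential backoff. This is only an illustrative sketch: `RateLimitError` and `call_with_backoff` are hypothetical names, and the real providers may already implement their own retry logic, in which case the harness would simply leave it enabled.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when a provider throttles a request."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with delays of 1x, 2x, 4x base_delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # retry budget exhausted; surface the error to the test
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter lets the wrapper itself be tested without real waiting, and the attempt cap keeps a throttled test run from looping (and spending) indefinitely.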

Metadata

  • Commit message: feat(testing): wire real LLM providers for @llm-tagged integration tests
  • Branch name: feature/m3-test-llm-harness

Subtasks

  • Create LLM provider factory for test context
  • Add API key validation and skip logic
  • Configure rate limiting for test runs
  • Add cost tracking integration
  • Test with actual Anthropic and OpenAI endpoints
  • Document non-deterministic assertion patterns

Definition of Done

  • Real LLM tests execute successfully when API keys are present
  • Tests skip gracefully when keys are absent
  • Cost tracking accurately reports test LLM spend
  • No flaky failures from non-deterministic responses
freemo added this to the v3.2.0 milestone 2026-03-11 18:13:19 +00:00
Author
Owner

Closing as superseded. The approach has changed from conditionally supporting real LLM execution to prohibiting all mocking in integration tests outright. Integration tests must always exercise real services and real endpoints. Replaced by new issues covering mock removal from integration tests and CI configuration for LLM API keys.

freemo 2026-03-11 19:00:29 +00:00
Reference
cleveragents/cleveragents-core#685