Add ASV benchmark tests for agents module sub-components (base, context_analysis, auto_debug graph, context_analysis graph) #10732

Open
opened 2026-04-19 08:44:06 +00:00 by HAL9000 · 0 comments
Owner

Summary

The agents/ module has four source files with no dedicated ASV benchmark coverage.

| Source file | Behave coverage | Robot coverage | Benchmark coverage |
|---|---|---|---|
| agents/base.py | features/agent_skills_loader.feature (indirect) | robot/agent_skills_loader.robot (indirect) | none |
| agents/context_analysis.py | features/context_analysis_coverage_boost.feature | robot/context_analysis_agent.robot | none |
| agents/graphs/auto_debug.py | features/auto_debug_graph.feature | none | none |
| agents/graphs/context_analysis.py | features/context_analysis_graph_coverage.feature | robot/context_analysis_agent.robot (indirect) | none |

The existing benchmark files benchmarks/plan_generation_benchmark.py, benchmarks/agent_skills_loader_bench.py, and benchmarks/agent_skills_registry_bench.py cover plan generation and skill loading, but leave the base agent class, context analysis agent, auto-debug graph, and context analysis graph without any performance regression tracking.

Proposed Tests

Add the following ASV benchmark files under benchmarks/:

  1. benchmarks/agent_base_bench.py - benchmark Agent.__init__(), Agent.process_message() with mock LLM
  2. benchmarks/context_analysis_agent_bench.py - benchmark ContextAnalysisAgent.analyze() with representative fixture
  3. benchmarks/auto_debug_graph_bench.py - benchmark AutoDebugGraph.run() with mock LLM
  4. benchmarks/context_analysis_graph_bench.py - benchmark ContextAnalysisGraph.run() with mock LLM

All benchmarks must use mock LLM providers (not real API calls).
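A minimal sketch of what benchmarks/agent_base_bench.py could look like. ASV collects methods whose names start with `time_` and calls `setup()` before timing them. Everything else here is an assumption for illustration: `StandInAgent` is a placeholder for the real Agent class (whose import path and constructor signature are not given in this issue), and `make_mock_llm` and the `invoke` method name are hypothetical.

```python
"""Sketch of benchmarks/agent_base_bench.py (hypothetical layout)."""
from unittest.mock import MagicMock


class StandInAgent:
    """Placeholder for the real Agent class.

    The constructor and process_message() signatures are assumptions,
    not the project's actual API. Replace with the real import.
    """

    def __init__(self, llm):
        self.llm = llm

    def process_message(self, message):
        return self.llm.invoke(message)


def make_mock_llm():
    """Return a mock provider that answers from memory, never the network."""
    llm = MagicMock(name="mock_llm")
    llm.invoke.return_value = "canned response"
    return llm


class AgentBaseSuite:
    """ASV times each time_* method after running setup()."""

    def setup(self):
        self.llm = make_mock_llm()
        self.agent = StandInAgent(self.llm)

    def time_init(self):
        # Measures construction cost only.
        StandInAgent(self.llm)

    def time_process_message(self):
        # Measures agent overhead around a mocked LLM call.
        self.agent.process_message("hello")
```

Because the mock returns immediately, the timings reflect the agent's own overhead rather than provider latency, which is what a regression tracker for these modules would want to isolate. The same shape would apply to the other three files, with the graph or analysis object built in `setup()`.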

Subtasks

  • Create benchmarks/agent_base_bench.py
  • Create benchmarks/context_analysis_agent_bench.py
  • Create benchmarks/auto_debug_graph_bench.py
  • Create benchmarks/context_analysis_graph_bench.py
  • Verify all benchmarks run under nox -s benchmark without errors
  • Confirm no regressions in existing tests via nox

Acceptance Criteria

  • All four benchmark files exist under benchmarks/ and are importable
  • Each benchmark class has at least one time_* method
  • Benchmarks use mock LLM providers - no real API calls
  • nox -s benchmark completes without errors
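The first two criteria can be checked mechanically. The sketch below is one possible way to do it, not part of the project's tooling: `check_benchmarks` and its helpers are hypothetical names, and it simplifies by treating every class defined in a benchmark file as a benchmark class, so a helper class without a `time_*` method would also be reported.

```python
import importlib.util
import inspect
from pathlib import Path

# File names taken from the Proposed Tests list.
EXPECTED_FILES = [
    "agent_base_bench.py",
    "context_analysis_agent_bench.py",
    "auto_debug_graph_bench.py",
    "context_analysis_graph_bench.py",
]


def load_module(path):
    """Import a module from a file path; raises if the file is not importable."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def classes_missing_time_methods(module):
    """Return names of classes defined in module that lack a time_* attribute."""
    missing = []
    for name, cls in inspect.getmembers(module, inspect.isclass):
        if cls.__module__ != module.__name__:
            continue  # skip classes imported from elsewhere
        if not any(attr.startswith("time_") for attr in dir(cls)):
            missing.append(name)
    return missing


def check_benchmarks(bench_dir):
    """Return a list of human-readable problems; empty means the checks pass."""
    problems = []
    for filename in EXPECTED_FILES:
        path = Path(bench_dir) / filename
        if not path.is_file():
            problems.append(f"{filename}: missing")
            continue
        module = load_module(path)
        for cls_name in classes_missing_time_methods(module):
            problems.append(f"{filename}: {cls_name} has no time_* method")
    return problems
```

Calling `check_benchmarks("benchmarks")` from the repository root would return an empty list once all four files are in place and well-formed. It does not replace running `nox -s benchmark`, which remains the check for the last criterion.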

Definition of Done

This issue is complete when all subtasks are completed, a commit is created with the exact commit message from Metadata, pushed to the branch from Metadata, and a PR is submitted, reviewed, and merged.

Duplicate Check

  • Searched open issues for agents benchmark, agents ASV, agents/base benchmark, auto_debug benchmark, context_analysis benchmark - no matches found.
  • Reviewed existing issues: #9046 covers LangGraph and TUI but explicitly excludes agents sub-components. Issue #9143 covers application/reactive/domain/shared but not agents. Other AUTO-INF-7 issues do not cover agents.
  • Searched closed issues for agents benchmark - no relevant matches.
  • Cross-area search: no existing issue covers benchmark gaps for agents/base.py, agents/context_analysis.py, agents/graphs/auto_debug.py, or agents/graphs/context_analysis.py.

Metadata

  • Commit message: test(bench): add ASV benchmarks for agents base, context_analysis, auto_debug, and context_analysis_graph
  • Branch name: test/bench-agents-subcomponents

Automated by CleverAgents Bot
Supervisor: Implementation Pool | Agent: implementation-worker

HAL9000 added this to the v3.9.0 milestone 2026-04-19 08:44:06 +00:00
Reference
cleveragents/cleveragents-core#10732