Organize ASV benchmark files into domain-aligned subdirectories #2034

Open
opened 2026-04-03 03:01:48 +00:00 by freemo · 1 comment

Metadata

  • Branch: refactor/benchmarks-subdirectory-organization
  • Commit Message: refactor(benchmarks): organize ASV benchmark files into domain-aligned subdirectories
  • Milestone: v3.8.0
  • Parent Epic: #1678

Overview

The benchmarks/ directory currently contains all 223 ASV benchmark files in a single flat structure. As the benchmark suite has grown to cover every major domain of the CleverAgents platform — actors, plans, tools, skills, sessions, resources, security, context, and more — the flat layout makes it increasingly difficult to navigate, understand benchmark scope, and maintain related files together.

This is the same structural problem already identified and tracked for Robot Framework files (#2028) and Behave feature files (#2021).

Proposed subdirectory groupings (illustrative, not exhaustive):

| Subdirectory | Example files |
|---|---|
| benchmarks/actors/ | actor_cli_bench.py, actor_compiler_bench.py, actor_hierarchy_bench.py, actor_loading_bench.py, actor_registry_bench.py, actor_runtime_bench.py, actor_schema_bench.py |
| benchmarks/plans/ | plan_cli_bench.py, plan_diff_bench.py, plan_execute_bench.py, plan_generation_benchmark.py, plan_lifecycle_persistence_bench.py, plan_model_bench.py, plan_phase_bench.py, plan_resume_bench.py |
| benchmarks/tools/ | tool_add_persist_bench.py, tool_builtin_file_bench.py, tool_cli_bench.py, tool_lifecycle_bench.py, tool_model_bench.py, tool_registry_bench.py, tool_router_bench.py, tool_runtime_bench.py |
| benchmarks/skills/ | skill_add_persist_bench.py, skill_cli_bench.py, skill_context_bench.py, skill_flatten_bench.py, skill_protocol_bench.py, skill_registry_bench.py, skill_resolution_bench.py, skill_schema_bench.py |
| benchmarks/sessions/ | session_cli_bench.py, session_create_error_bench.py, session_list_bench.py, session_model_bench.py, session_persistence_bench.py |
| benchmarks/resources/ | resource_cli_bench.py, resource_dag_bench.py, resource_handler_bench.py, resource_registry_bench.py, resource_repository_bench.py, resource_service_bench.py, resource_type_*.py |
| benchmarks/projects/ | project_cli_bench.py, project_context_cli_bench.py, project_context_policy_bench.py, project_create_persist_bench.py, project_migration_bench.py, project_repository_bench.py |
| benchmarks/security/ | security_async_cleanup_bench.py, security_audit_bench.py, security_eval_bench.py, security_exception_bench.py, security_readonly_bench.py, security_scan_bench.py, security_secrets_bench.py, security_template_bench.py |
| benchmarks/context/ | context_assembly_scaling_bench.py, context_fragment_models_bench.py, context_indexing_bench.py, context_strategies_bench.py, context_tiers_bench.py, unified_context_models_bench.py |
| benchmarks/decisions/ | decision_correction_bench.py, decision_correction_model_bench.py, decision_di_bench.py, decision_model_bench.py, decision_persistence_bench.py, decision_recording_bench.py |
| benchmarks/cli/ | cli_benchmark.py, cli_core_bench.py, cli_extension_tests_bench.py, cli_extensions_bench.py, cli_format_bench.py, cli_init_yes_bench.py, cli_robot_flow_bench.py |
| benchmarks/smoke/ | m1_sourcecode_smoke_bench.py, m2_actor_tool_smoke_bench.py, m3_smoke_bench.py, m4_smoke_bench.py, m5_smoke_bench.py, m6_acceptance_bench.py |
| benchmarks/infra/ | bench_audit_service.py, bench_event_bus.py, bench_metrics_collection.py, bench_plugin_loader.py, bench_subprocess_overhead.py, bench_unit_tests.py, coverage_report_bench.py |
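For the audit subtask, most of the example groupings above follow from file-name prefixes. The sketch below is a rough first pass at automating that assignment. It assumes the prefix rules inferred from the examples and sends the shared helper to a `shared/` directory. `PREFIX_RULES` and `assign_subdirectory` are hypothetical names. Files it cannot place return `None` and would be assigned by hand.

```python
from typing import Optional

# Hypothetical prefix rules inferred from the example groupings; not exhaustive.
PREFIX_RULES: list[tuple[str, str]] = [
    ("actor_", "actors"),
    ("plan_", "plans"),
    ("tool_", "tools"),
    ("skill_", "skills"),
    ("session_", "sessions"),
    ("resource_", "resources"),
    ("project_", "projects"),
    ("security_", "security"),
    ("context_", "context"),
    ("unified_context_", "context"),
    ("decision_", "decisions"),
    ("cli_", "cli"),
    ("bench_", "infra"),
    ("coverage_report_", "infra"),
]


def assign_subdirectory(filename: str) -> Optional[str]:
    """Return the target subdirectory for a flat benchmark file, or None if unmatched."""
    if not filename.endswith(".py") or filename == "__init__.py":
        return None
    # Private helpers such as _session_bench_common.py go to the shared package
    # (assumed name; the issue allows shared/ or common/).
    if filename.startswith("_"):
        return "shared"
    # Milestone smoke/acceptance benchmarks are named m1_..., m2_..., and so on.
    stem = filename[:-3]
    if len(stem) > 1 and stem[0] == "m" and stem[1].isdigit():
        return "smoke"
    for prefix, subdir in PREFIX_RULES:
        if filename.startswith(prefix):
            return subdir
    return None
```

Running this over all 223 files and listing the `None` results gives the set that still needs a manual decision.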

Each subdirectory must contain an __init__.py so ASV can discover the benchmarks correctly. The shared helper _session_bench_common.py should be moved to a benchmarks/shared/ or benchmarks/common/ location and all imports updated accordingly.
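Why the `__init__.py` requirement matters: as far as we understand, ASV discovers benchmarks by importing the benchmark package and recursing into submodules with `pkgutil.iter_modules`. That function does not report a directory without `__init__.py` as a package. The standard-library sketch below shows the effect in a throwaway directory. The `actors`/`plans` layout and the `discoverable_subpackages` helper are illustrative only.

```python
import pkgutil
import tempfile
from pathlib import Path


def discoverable_subpackages(root: Path) -> list[str]:
    """Names of the directories under `root` that pkgutil reports as packages."""
    return sorted(info.name for info in pkgutil.iter_modules([str(root)]) if info.ispkg)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # A proper subpackage: has __init__.py.
    (root / "actors").mkdir()
    (root / "actors" / "__init__.py").touch()
    (root / "actors" / "actor_cli_bench.py").touch()
    # A plain directory: no __init__.py, so pkgutil skips it entirely.
    (root / "plans").mkdir()
    (root / "plans" / "plan_cli_bench.py").touch()
    found = discoverable_subpackages(root)

print(found)  # ['actors']
```

A missing `__init__.py` therefore does not cause an error. The benchmarks in that directory are silently skipped.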

Subtasks

  • Audit all 223 benchmark files and assign each to a target subdirectory
  • Create the subdirectory structure with __init__.py in each directory
  • Move all benchmark files to their respective subdirectories
  • Update all cross-file imports (e.g., _session_bench_common) to reflect new paths
  • Verify ASV can discover and run all benchmarks after reorganization (asv run or nox -e benchmarks)
  • Update asv.conf.json (or equivalent) if benchmark discovery paths need adjustment
  • Update any CI configuration that references specific benchmark file paths
  • Confirm all nox sessions pass after the reorganization
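A minimal sketch of the create, move, and import-update subtasks follows. It assumes the benchmarks import the helper with the absolute dotted path `benchmarks._session_bench_common` and that the helper moves to `benchmarks/shared/`. `relocate`, `rewrite_helper_imports`, `OLD_HELPER` and `NEW_HELPER` are hypothetical names. A relative import such as `from ._session_bench_common import ...` would need a different rewrite (`from ..shared._session_bench_common import ...`).

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumed old and new module paths for the shared helper.
OLD_HELPER = "benchmarks._session_bench_common"
NEW_HELPER = "benchmarks.shared._session_bench_common"


def relocate(root: Path, plan: dict[str, str]) -> None:
    """Move flat files under `root` into subpackages; `plan` maps file name -> subdirectory."""
    for filename, subdir in plan.items():
        target_dir = root / subdir
        target_dir.mkdir(exist_ok=True)
        # Each subdirectory needs __init__.py so ASV recurses into it.
        (target_dir / "__init__.py").touch(exist_ok=True)
        shutil.move(str(root / filename), str(target_dir / filename))


def rewrite_helper_imports(root: Path) -> int:
    """Point references to the old helper module at its new location; return files changed."""
    pattern = re.compile(rf"\b{re.escape(OLD_HELPER)}\b")
    changed = 0
    for path in root.rglob("*.py"):
        text = path.read_text()
        new_text = pattern.sub(NEW_HELPER, text)
        if new_text != text:
            path.write_text(new_text)
            changed += 1
    return changed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "_session_bench_common.py").write_text("HELPER = 1\n")
    (root / "session_cli_bench.py").write_text(
        "from benchmarks._session_bench_common import HELPER\n"
    )
    relocate(root, {"_session_bench_common.py": "shared", "session_cli_bench.py": "sessions"})
    n_changed = rewrite_helper_imports(root)
    moved_text = (root / "sessions" / "session_cli_bench.py").read_text()
    has_init = (root / "sessions" / "__init__.py").is_file()
```

The `plan` mapping would come from the audit subtask. ASV's `benchmark_dir` setting in `asv.conf.json` names the package root rather than individual files, so it probably needs no change as long as that root stays at `benchmarks/`.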

Definition of Done

  • All benchmark files are organized into domain-aligned subdirectories mirroring the src/cleveragents/ module structure
  • Each subdirectory contains an __init__.py for ASV discovery
  • No benchmark files remain in the top-level benchmarks/ directory except __init__.py and asv.conf.json (if present)
  • All cross-file imports are updated and resolve correctly
  • asv run (or equivalent nox session) discovers and executes all benchmarks without errors
  • All nox stages pass
  • Coverage >= 97%
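The structural items in the Definition of Done can be checked mechanically. The sketch below assumes it is pointed at the `benchmarks/` directory. `check_layout` is a hypothetical helper that treats any top-level `.py` file other than `__init__.py` as a leftover benchmark. It only skips `__pycache__`, so other non-package directories would need to be added to the skip list.

```python
import tempfile
from pathlib import Path


def check_layout(root: Path) -> list[str]:
    """Return layout problems under a benchmarks/ root; an empty list means it passes."""
    problems: list[str] = []
    if not (root / "__init__.py").is_file():
        problems.append("missing __init__.py: <root>")
    # No benchmark modules may remain at the top level.
    for entry in sorted(root.iterdir()):
        if entry.is_file() and entry.suffix == ".py" and entry.name != "__init__.py":
            problems.append(f"top-level benchmark file: {entry.name}")
    # Every (non-cache) subdirectory must be a package so ASV recurses into it.
    for subdir in sorted(p for p in root.rglob("*") if p.is_dir()):
        if subdir.name == "__pycache__":
            continue
        if not (subdir / "__init__.py").is_file():
            problems.append(f"missing __init__.py: {subdir.relative_to(root)}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "__init__.py").touch()
    (root / "actors").mkdir()
    (root / "actors" / "__init__.py").touch()
    (root / "plans").mkdir()             # missing __init__.py
    (root / "plan_cli_bench.py").touch()  # not yet moved
    problems = check_layout(root)
```

This complements running the suite rather than replacing it. `asv check` imports the benchmark suite without timing it, so it is a quicker way to confirm discovery before a full `asv run`.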

Automated by CleverAgents Bot
Supervisor: Unknown | Agent: ca-new-issue-creator

freemo added this to the v3.8.0 milestone 2026-04-03 03:02:45 +00:00

Issue triaged by project owner:

  • State: Verified
  • Priority: Backlog (confirmed)
  • Milestone: v3.8.0 (confirmed — test infrastructure)
  • MoSCoW: Could Have — Reorganizing 223 ASV benchmark files into subdirectories is a pure refactoring task with no behavioral change. This is the third in a set of test directory reorganization issues (#2021 for Behave, #2028 for Robot, now this for ASV). All three are Could Have and should be done together if at all. Improves maintainability but is not urgent.
  • Parent Epic: #1678 (confirmed correct)

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Reference
cleveragents/cleveragents-core#2034