perf(tests): profile and optimize the 8 slowest feature files (>100s each) #479

Closed
opened 2026-03-01 01:25:57 +00:00 by freemo · 0 comments
Owner

Metadata

  • Commit Message: perf(tests): optimize the 8 slowest BDD feature files
  • Branch: perf/optimize-slowest-features

Background and Context

Part of #478.

The 8 slowest feature files account for 64% of total BDD test runtime (1,505s of 2,352s). Each of these features takes between 113 and 248 seconds to execute, far longer than any reasonable unit test. Likely root causes include repeated database setup/teardown per scenario, subprocess invocations (CLI runner calls), heavy I/O operations, redundant imports, or scenarios that effectively duplicate integration-level testing.

Target Features

| # | Feature File | Runtime (s) | Scenario Focus | Likely Bottleneck |
|---|---|---|---|---|
| 1 | cli_plan_context_commands.feature | 248.0 | CLI subprocess invocations | Plan + context CLI commands spawning a full CLI process per scenario |
| 2 | services_coverage.feature | 245.3 | Service-layer coverage | Broad service-layer setup/teardown per scenario |
| 3 | context_service.feature | 214.6 | Context service operations | Database and filesystem I/O per scenario |
| 4 | plan_service.feature | 214.5 | Plan service operations | Database setup, plan lifecycle operations |
| 5 | cli_streaming.feature | 212.8 | CLI streaming | CLI subprocess + streaming output capture |
| 6 | project_service.feature | 140.1 | Project service operations | Database and project setup per scenario |
| 7 | plan_commands_coverage.feature | 115.9 | Plan CLI commands | CLI subprocess invocations for plan commands |
| 8 | core_cli_commands.feature | 113.3 | Core CLI commands | CLI subprocess invocations for core commands |

Target

Reduce each of these features to under 10 seconds (from 113-248s). If all eight land under 10s, their combined runtime drops from 1,505s to under 80s, a reduction of roughly 95% for this tier alone.

Acceptance Criteria

  • Each of the 8 target features completes in under 10 seconds
  • No scenarios are removed — all existing behavior is preserved
  • All tests continue to pass via nox -e unit_tests
  • Coverage remains at or above 97% via nox -e coverage_report

Subtasks

Investigation Phase

  • Profile cli_plan_context_commands.feature (248s): count scenarios, measure per-scenario time, identify if CLI subprocesses or database setup are the bottleneck
  • Profile services_coverage.feature (245s): identify which service-layer step definitions trigger expensive operations (DB, filesystem, network)
  • Profile context_service.feature (215s): measure database connection and teardown overhead per scenario
  • Profile plan_service.feature (215s): identify whether plan lifecycle transitions involve real DB writes or subprocess calls
  • Profile cli_streaming.feature (213s): determine if streaming tests spawn real CLI subprocesses or use in-process invocation
  • Profile project_service.feature (140s): check for redundant project creation/deletion per scenario
  • Profile plan_commands_coverage.feature (116s): count CLI subprocess invocations per scenario
  • Profile core_cli_commands.feature (113s): count CLI subprocess invocations per scenario

Optimization Phase

  • Replace CLI subprocess invocations with CliRunner.invoke() (Click's in-process test runner) where features spawn subprocess.run(["agents", ...]), eliminating Python interpreter startup and package import time on every invocation
  • Replace per-scenario database setup/teardown with shared fixtures using Behave's before_feature/after_feature hooks where scenarios are read-only or can use transaction rollback
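  • Eliminate redundant import chains: measure import cost (for example with python -X importtime) and have step definitions import shared modules once at module level instead of inside individual steps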
  • Replace filesystem I/O with in-memory or tmp_path-style ephemeral fixtures
  • Refactor services_coverage.feature to use mock/stub dependencies instead of full service stack initialization per scenario
  • Ensure cli_streaming.feature tests streaming behavior via in-process pipe mocking rather than spawning real CLI processes
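  • Consolidate scenarios in plan_commands_coverage.feature and core_cli_commands.feature that test similar CLI paths into parameterized Scenario Outlines; each Examples row still runs as its own scenario, so this does not conflict with the no-removal criterion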
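A minimal sketch of the CliRunner swap, assuming the agents entry point is a Click group. The cli group, the plan command and the run_cli helper below are hypothetical stand-ins; real step definitions would import the project's own group object instead. catch_exceptions=False makes an unexpected exception fail the step with its original traceback rather than being stored on the Result object.

```python
# Sketch: running a Click command in-process instead of via subprocess.
# `cli` and `plan` are hypothetical stand-ins for the real `agents` CLI.
import click
from click.testing import CliRunner


@click.group()
def cli():
    """Hypothetical root command group."""


@cli.command()
@click.argument("name")
def plan(name):
    click.echo(f"plan: {name}")


RUNNER = CliRunner()


def run_cli(args):
    """Return (exit code, captured output) for one in-process invocation.

    Stands in for a call such as
    subprocess.run(["agents", *args], capture_output=True, text=True).
    """
    result = RUNNER.invoke(cli, args, catch_exceptions=False)
    return result.exit_code, result.output
```

Because invoke runs inside the test interpreter, module-level state (caches, environment variables, working directory) can leak between invocations; CliRunner's env argument and its isolated_filesystem() context manager can help where scenarios previously relied on process isolation.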
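For scenarios that only need a clean database state, the schema can be built once per feature in before_feature and each scenario wrapped in a transaction that is rolled back afterwards. This sketch uses the standard-library sqlite3 module with an in-memory database purely as a stand-in; the project's actual database layer and the plans table are assumptions. The pattern only isolates scenarios if the code under test does not commit on its own; code that calls commit() would need a SAVEPOINT-based variant or a session bound to the outer transaction.

```python
# Hypothetical features/environment.py hooks: one database per feature,
# one rolled-back transaction per scenario. sqlite3 stands in for the
# project's real database layer.
import sqlite3

SCHEMA = "CREATE TABLE plans (id INTEGER PRIMARY KEY, name TEXT NOT NULL, state TEXT NOT NULL)"


def before_feature(context, feature):
    # isolation_level=None turns off sqlite3's implicit transactions so the
    # hooks below control BEGIN/ROLLBACK explicitly.
    context.db = sqlite3.connect(":memory:", isolation_level=None)
    context.db.execute(SCHEMA)  # expensive setup runs once per feature


def before_scenario(context, scenario):
    context.db.execute("BEGIN")


def after_scenario(context, scenario):
    context.db.execute("ROLLBACK")  # discard everything the scenario wrote


def after_feature(context, feature):
    context.db.close()
```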
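For services_coverage.feature, the idea is to construct each service around lightweight stubs instead of initializing the full stack. The sketch below uses unittest.mock.create_autospec so the stub keeps the real dependency's method signatures and a step that calls it with the wrong arguments still fails. PlanRepository, PlanService, approve() and service_with_stub_repo are hypothetical names chosen for illustration, not the project's actual classes.

```python
# Sketch: a service built on an autospecced stub instead of a real,
# database-backed dependency. All class names here are hypothetical.
from unittest import mock


class PlanRepository:
    """Hypothetical DB-backed dependency that is expensive to set up for real."""

    def get(self, plan_id: str) -> dict:
        raise NotImplementedError  # the real version would query the database

    def save(self, plan: dict) -> None:
        raise NotImplementedError


class PlanService:
    """Hypothetical service under test; talks to storage only via the repository."""

    def __init__(self, repo: PlanRepository) -> None:
        self.repo = repo

    def approve(self, plan_id: str) -> dict:
        plan = self.repo.get(plan_id)
        if plan["state"] != "draft":
            raise ValueError(f"plan {plan_id} is not a draft")
        approved = {**plan, "state": "approved"}
        self.repo.save(approved)
        return approved


def service_with_stub_repo(stored_plan: dict):
    # create_autospec copies PlanRepository's method signatures onto the stub.
    repo = mock.create_autospec(PlanRepository, instance=True)
    repo.get.return_value = stored_plan
    return PlanService(repo), repo
```

Steps can then assert on the stub's recorded calls (for example repo.save.assert_called_once_with(...)) instead of reading rows back from a real database.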
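For cli_streaming.feature, "in-process pipe mocking" can be as simple as an in-memory stream that records each write and flush, passed where the real pipe or stdout would be. The sketch assumes the streaming code accepts its output stream as a parameter; RecordingStream and stream_tokens are hypothetical names, the latter standing in for whatever routine the feature actually exercises.

```python
# Sketch: testing streaming output in-process with a recording stream in
# place of a real pipe. `stream_tokens` is a hypothetical stand-in.
import io


class RecordingStream(io.StringIO):
    """In-memory pipe substitute that remembers every write and flush."""

    def __init__(self):
        super().__init__()
        self.writes = []
        self.flushes = 0

    def write(self, text):
        self.writes.append(text)
        return super().write(text)

    def flush(self):
        self.flushes += 1
        super().flush()


def stream_tokens(tokens, out):
    """Hypothetical streaming routine: emit each token as soon as it is ready."""
    for token in tokens:
        out.write(token)
        out.flush()  # a buffered implementation would skip this per-chunk flush
```

A step can assert both the final text (getvalue()) and that the output arrived as separate, flushed chunks, which is the property a real-subprocess streaming test was checking. Where the code writes straight to sys.stdout, contextlib.redirect_stdout can route it into the same recording stream.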

Verification Phase

  • Run nox -e unit_tests and confirm all 339 features pass
  • Run nox -e coverage_report and confirm coverage >= 97%
  • Record new per-feature timing for each of the 8 features and verify each is under 10s
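The last verification subtask can be checked mechanically. The baselines below are the runtimes from the Target Features table; how the new per-feature timings are collected (for example by running each feature on its own and timing it) is left open, and check_budget is a hypothetical helper rather than existing tooling.

```python
# Sketch: compare new per-feature timings against the 10 s budget.
# Baseline runtimes (seconds) are copied from the Target Features table.
BASELINE_SECONDS = {
    "cli_plan_context_commands.feature": 248.0,
    "services_coverage.feature": 245.3,
    "context_service.feature": 214.6,
    "plan_service.feature": 214.5,
    "cli_streaming.feature": 212.8,
    "project_service.feature": 140.1,
    "plan_commands_coverage.feature": 115.9,
    "core_cli_commands.feature": 113.3,
}
BUDGET_SECONDS = 10.0  # acceptance criterion: each feature under 10 s


def check_budget(new_seconds: dict[str, float]):
    """Return (features over budget, features not yet measured, tier reduction).

    The reduction is None until all eight target features have a timing.
    """
    over = {name: t for name, t in new_seconds.items()
            if name in BASELINE_SECONDS and t >= BUDGET_SECONDS}
    missing = sorted(set(BASELINE_SECONDS) - set(new_seconds))
    if missing:
        return over, missing, None
    old_total = sum(BASELINE_SECONDS.values())
    new_total = sum(new_seconds[name] for name in BASELINE_SECONDS)
    return over, missing, 1 - new_total / old_total
```

If every feature lands just under budget (9.9s each, 79.2s in total), the tier reduction is about 94.7%, so the ~95% target is effectively the floor implied by the per-feature limit.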

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
freemo added this to the v3.2.0 milestone 2026-03-02 01:45:01 +00:00
freemo added reference perf/bdd-test-optimization 2026-03-02 01:46:38 +00:00
Reference
cleveragents/cleveragents-core#479