perf(tests): reduce BDD test suite and coverage report runtime by 90%+ #493

Merged
freemo merged 5 commits from perf/bdd-test-optimization into master 2026-03-02 02:24:47 +00:00

Summary

Consolidates 5 stacked PRs (#484, #487, #488, #489, #490) into a single clean branch with one commit per issue, incorporating all code review fixes. No merge commits — cherry-pick only.

Epic: #478 — Reduce BDD test suite runtime from 26 minutes to under 3 minutes

Results

| Session | Before | After | Reduction |
|---------|--------|-------|-----------|
| `nox -s unit_tests` | 26 min | 2 min 05s | 92% |
| `nox -s coverage_report` | 75 min | 5 min | 93% |
| Coverage | 97% | 98% | +1pp |
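
The reduction figures follow directly from the before/after timings. A minimal check of that arithmetic, using only the numbers reported above (the function name is illustrative, not part of the repository):

```python
def reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage of runtime removed, relative to the original duration."""
    return (before_s - after_s) / before_s * 100


# Timings from the results above, converted to seconds.
unit_tests = reduction_pct(26 * 60, 2 * 60 + 5)
coverage_report = reduction_pct(75 * 60, 5 * 60)

print(f"unit_tests: {unit_tests:.0f}%")            # rounds to 92%
print(f"coverage_report: {coverage_report:.0f}%")  # rounds to 93%
```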

Commits (1 per issue)

1. perf(tests): optimize coverage instrumentation and reporting pipeline

Closes #482 — Replace coverage.py (sys.settrace) with slipcover (bytecode instrumentation). Workers produce per-feature JSON; slipcover --merge combines them. CI workflows handle both output formats defensively.
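
To illustrate what combining per-feature JSON reports involves, here is a simplified sketch of a line-coverage merge. It is not slipcover's implementation; it assumes each report has a `files` mapping whose entries carry `executed_lines` and `missing_lines`, and the function name `merge_coverage` is hypothetical. In the actual pipeline, `slipcover --merge` does this work.

```python
def merge_coverage(reports):
    """Union executed lines per file across several coverage reports.

    A line counts as missing only if no report executed it.
    """
    merged = {}
    for report in reports:
        for path, data in report["files"].items():
            entry = merged.setdefault(path, {"executed": set(), "seen": set()})
            entry["executed"].update(data["executed_lines"])
            entry["seen"].update(data["executed_lines"], data["missing_lines"])

    return {
        "files": {
            path: {
                "executed_lines": sorted(entry["executed"]),
                "missing_lines": sorted(entry["seen"] - entry["executed"]),
            }
            for path, entry in merged.items()
        }
    }


# Two workers covered different parts of the same (made-up) file.
feature_a = {"files": {"app.py": {"executed_lines": [1, 2], "missing_lines": [3, 4]}}}
feature_b = {"files": {"app.py": {"executed_lines": [3], "missing_lines": [1, 2, 4]}}}

combined = merge_coverage([feature_a, feature_b])
```

Because the merge is a set union, the order in which worker outputs are combined does not affect the result.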

2. perf(tests): reduce per-feature startup cost with shared fixtures and lazy imports

Closes #483 — New scripts/create_template_db.py pre-creates a fully-migrated SQLite template. environment.py monkey-patches MigrationRunner.init_or_upgrade to copy the template (~1ms) instead of running 25 Alembic migrations (~0.5-3s) per scenario.
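
The template approach can be sketched as follows. This is a minimal illustration under stated assumptions: the `MigrationRunner` class below is a stand-in with a guessed `db_path` attribute and no-argument `init_or_upgrade`, since the real signature is not shown in this PR, and a single `CREATE TABLE` stands in for the 25 Alembic migrations.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


class MigrationRunner:
    """Stand-in for the project's class; the real signature may differ."""

    def __init__(self, db_path):
        self.db_path = Path(db_path)

    def init_or_upgrade(self):
        # Placeholder for the slow path (running every migration).
        conn = sqlite3.connect(self.db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS plans (id INTEGER PRIMARY KEY)")
        conn.commit()
        conn.close()


def build_template(path):
    """Run the real migrations once to produce a fully-migrated template."""
    MigrationRunner(path).init_or_upgrade()
    return Path(path)


def install_template_patch(template):
    """Monkey-patch init_or_upgrade to copy the template instead of migrating."""

    def copy_template(self):
        shutil.copyfile(template, self.db_path)

    MigrationRunner.init_or_upgrade = copy_template


workdir = Path(tempfile.mkdtemp())
template = build_template(workdir / "template.db")
install_template_patch(template)

# Each scenario now gets a migrated database via a file copy.
scenario_db = workdir / "scenario.db"
MigrationRunner(scenario_db).init_or_upgrade()

conn = sqlite3.connect(scenario_db)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
conn.close()
```

The template must be built before the patch is installed, otherwise the template itself would never be migrated.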

3. perf(tests): optimize the 8 slowest BDD feature files

Closes #479 — Added @mock_only tag support, lightweight in-memory plan service for 14 actor-resolution scenarios, extracted helper functions in services_coverage steps. The 8 features dropped from 113-248s to 0.7-31.6s each.
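
One way a tag like `@mock_only` could be wired into behave hooks is sketched below. The PR does not spell out the tag's exact semantics, so treat this as an assumption: here it swaps in an in-memory service and skips database setup. `InMemoryPlanService`, `uses_database`, and the method names are hypothetical; `before_scenario` and `effective_tags` are standard behave names, and the `SimpleNamespace` objects only simulate what behave would pass in.

```python
from types import SimpleNamespace


class InMemoryPlanService:
    """Hypothetical lightweight replacement for a database-backed service."""

    def __init__(self):
        self._actors = {}

    def assign(self, plan_id, actor):
        self._actors[plan_id] = actor

    def resolve_actor(self, plan_id):
        return self._actors.get(plan_id)


def before_scenario(context, scenario):
    # behave stores tags without the leading "@"; effective_tags also
    # includes tags inherited from the feature.
    if "mock_only" in scenario.effective_tags:
        context.plan_service = InMemoryPlanService()
        context.uses_database = False
    else:
        context.uses_database = True


# Simulated hook invocation for a tagged scenario.
context = SimpleNamespace()
before_scenario(context, SimpleNamespace(effective_tags={"mock_only"}))
context.plan_service.assign("plan-1", "reviewer")
resolved = context.plan_service.resolve_actor("plan-1")
```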

4. perf(tests): optimize medium-slow BDD features (10-100s tier)

Closes #480 — Global time.sleep/asyncio.sleep cap at 10ms, subprocess.run replaced with CliRunner, in-memory SQLite defaults, _original_sleep escapes for timing-sensitive tests. 20 features dropped from 10-65s to 0.02-3.4s each.

5. perf(tests): replace behave-parallel subprocess model with in-process parallelism

Closes #481 — Replace 342 subprocess spawns with behave Runner API. Sequential mode for coverage, multiprocessing.Pool with fork for parallel. Removes PyPI tarball download, per-worker subprocess overhead, and regex-based summary parsing.
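
The two execution modes can be sketched as below. This is an outline under assumptions rather than the runner from this PR: the function names are hypothetical, and the use of `Configuration(command_args=...)` and the boolean returned by `Runner.run()` should be checked against the installed behave version. The behave import is deferred so the module loads even where behave is absent.

```python
import multiprocessing as mp


def run_feature(feature_path):
    """Run one feature file in-process and return (path, passed)."""
    from behave.configuration import Configuration
    from behave.runner import Runner

    config = Configuration(command_args=[feature_path])
    failed = Runner(config).run()  # assumed: truthy when the run failed
    return feature_path, not failed


def run_sequential(feature_paths, worker=run_feature):
    """Single-process mode, e.g. for coverage runs."""
    return [worker(path) for path in feature_paths]


def run_parallel(feature_paths, worker=run_feature, processes=4):
    """Fan features out to forked workers (fork is POSIX-only)."""
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(worker, feature_paths)


def summarize(results):
    """Aggregate structured results instead of parsing text output."""
    failed = sorted(path for path, passed in results if not passed)
    return {
        "total": len(results),
        "passed": len(results) - len(failed),
        "failed": failed,
    }
```

Returning structured `(path, passed)` tuples from each worker is what makes regex-based summary parsing unnecessary: the parent process aggregates values rather than scraping output.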

Review fixes applied (vs original PRs)

  • All commits use proper ISSUES CLOSED: #NNN footer format per CONTRIBUTING.md
  • Each commit includes its own CHANGELOG.md entry under Unreleased
  • Removed debug comment (# DEBUG: trace which calls reach the patch) from features/environment.py
  • Added _original_sleep escape in features/steps/subplan_execution_steps.py for timeout/delay tests broken by the global sleep cap

Quality

  • All nox sessions pass on each commit: lint, format, typecheck, security_scan, dead_code, unit_tests, integration_tests, build, docs
  • coverage_report verified on final commit: 98% (above 97% threshold)
  • No scenarios removed — all existing behavior preserved
freemo added this to the v3.2.0 milestone 2026-03-02 01:43:16 +00:00
freemo force-pushed perf/bdd-test-optimization from b3c47a9ef9 (some checks failed) to f26fcfc44e (all checks were successful) 2026-03-02 02:01:29 +00:00
freemo scheduled this pull request to auto merge when all checks succeed 2026-03-02 02:01:43 +00:00
freemo self-assigned this 2026-03-02 02:03:24 +00:00
freemo merged commit f26fcfc44e into master 2026-03-02 02:24:47 +00:00
freemo deleted branch perf/bdd-test-optimization 2026-03-02 02:24:48 +00:00
Reference
cleveragents/cleveragents-core!493