perf(tests): reduce BDD test suite and coverage report runtime by 90%+ #493

2026-03-02T01:43:09Z

freemo commented

2026-03-02 01:43:09 +00:00

Summary

Consolidates 5 stacked PRs (#484, #487, #488, #489, #490) into a single clean branch with one commit per issue, incorporating all code review fixes. No merge commits — cherry-pick only.

Epic: #478 — Reduce BDD test suite runtime from 26 minutes to under 3 minutes

Results

| Session | Before | After | Reduction |
|---------|--------|-------|-----------|
| `nox -s unit_tests` | 26 min | 2 min 05s | 92% |
| `nox -s coverage_report` | 75 min | 5 min | 93% |
| Coverage | 97% | 98% | +1pp |

Commits (1 per issue)

1. `perf(tests): optimize coverage instrumentation and reporting pipeline`

Closes #482 — Replace coverage.py (sys.settrace) with slipcover (bytecode instrumentation). Workers produce per-feature JSON; slipcover --merge combines them. CI workflows handle both output formats defensively.
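The merge step can be pictured with a small stand-in. This is not slipcover's implementation or API; it only illustrates the idea of unioning executed lines across per-feature worker reports. The JSON layout (`files`, `executed_lines`, `missing_lines`) and the `merge_reports` helper are assumptions made for the sketch.

```python
from typing import Dict, Iterable, List, Set

# Assumed report shape: {"files": {path: {"executed_lines": [...], "missing_lines": [...]}}}
Report = Dict[str, Dict[str, Dict[str, List[int]]]]


def merge_reports(reports: Iterable[Report]) -> Report:
    """Union per-file line coverage across several worker reports (illustrative only)."""
    executed: Dict[str, Set[int]] = {}
    seen: Dict[str, Set[int]] = {}
    for report in reports:
        for path, data in report.get("files", {}).items():
            hit = set(data.get("executed_lines", []))
            miss = set(data.get("missing_lines", []))
            executed.setdefault(path, set()).update(hit)
            seen.setdefault(path, set()).update(hit | miss)
    # A line counts as missing only if no worker executed it.
    return {
        "files": {
            path: {
                "executed_lines": sorted(executed[path]),
                "missing_lines": sorted(seen[path] - executed[path]),
            }
            for path in sorted(seen)
        }
    }
```

A line that one feature misses but another executes ends up covered in the merged report, which is why per-feature outputs have to be combined before the threshold check.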

2. `perf(tests): reduce per-feature startup cost with shared fixtures and lazy imports`

Closes #483 — New scripts/create_template_db.py pre-creates a fully-migrated SQLite template. environment.py monkey-patches MigrationRunner.init_or_upgrade to copy the template (~1ms) instead of running 25 Alembic migrations (~0.5-3s) per scenario.
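A minimal sketch of the template-copy idea, under stated assumptions: the `MigrationRunner` class below is a hypothetical stand-in (the real class's constructor and signature are not shown in this PR), and a single `CREATE TABLE` stands in for the real Alembic migrations.

```python
import shutil
import sqlite3
from pathlib import Path


def create_template(template: Path) -> None:
    """Build the template once; the CREATE TABLE stands in for real migrations."""
    conn = sqlite3.connect(template)
    try:
        conn.execute("CREATE TABLE plans (id INTEGER PRIMARY KEY, name TEXT)")
        conn.commit()
    finally:
        conn.close()


class MigrationRunner:
    """Hypothetical stand-in for the project's migration runner."""

    def __init__(self, db_path: Path) -> None:
        self.db_path = db_path

    def init_or_upgrade(self) -> None:
        # In the real project this would run every migration from scratch.
        raise RuntimeError("slow migration path should not run in tests")


def install_template_patch(template: Path) -> None:
    """Monkey-patch init_or_upgrade so it copies the pre-migrated template."""

    def copy_template(self: MigrationRunner) -> None:
        shutil.copyfile(template, self.db_path)

    MigrationRunner.init_or_upgrade = copy_template
```

Copying a file is cheap compared with replaying migrations, but the template has to be rebuilt whenever a migration is added, otherwise scenarios run against a stale schema.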

3. `perf(tests): optimize the 8 slowest BDD feature files`

Closes #479 — Added @mock_only tag support, lightweight in-memory plan service for 14 actor-resolution scenarios, extracted helper functions in services_coverage steps. The 8 features dropped from 113-248s to 0.7-31.6s each.

4. `perf(tests): optimize medium-slow BDD features (10-100s tier)`

Closes #480 — Global time.sleep/asyncio.sleep cap at 10ms, subprocess.run replaced with CliRunner, in-memory SQLite defaults, _original_sleep escapes for timing-sensitive tests. 20 features dropped from 10-65s to 0.02-3.4s each.

5. `perf(tests): replace behave-parallel subprocess model with in-process parallelism`

Closes #481 — Replace 342 subprocess spawns with behave Runner API. Sequential mode for coverage, multiprocessing.Pool with fork for parallel. Removes PyPI tarball download, per-worker subprocess overhead, and regex-based summary parsing.
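The two execution modes can be sketched as below. `run_one_feature` is a hypothetical stand-in: a real worker would invoke the test runner's Python API for one feature file, which is not reproduced here. The `run_features` and `summarize` names are also illustrative.

```python
import multiprocessing
from typing import Callable, Dict, List, Sequence, Tuple

Result = Tuple[str, bool]  # (feature path, passed?)


def run_one_feature(path: str) -> Result:
    """Hypothetical stand-in for running one feature file in-process."""
    return (path, not path.endswith("failing.feature"))


def run_features(
    paths: Sequence[str],
    jobs: int = 1,
    worker: Callable[[str], Result] = run_one_feature,
) -> List[Result]:
    """Run features sequentially (coverage mode) or in a forked worker pool."""
    if jobs <= 1:
        # Sequential: everything stays in one process, so coverage sees it all.
        return [worker(path) for path in paths]
    # The "fork" start method is only available on Unix-like systems.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=jobs) as pool:
        return pool.map(worker, paths)


def summarize(results: Sequence[Result]) -> Dict[str, object]:
    """Build the summary from structured results instead of parsing text output."""
    failed = [path for path, ok in results if not ok]
    return {"total": len(results), "failed": failed}
```

Because workers return structured results, the summary no longer depends on matching the runner's printed output with regular expressions.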

Review fixes applied (vs original PRs)

- All commits use proper `ISSUES CLOSED: #NNN` footer format per CONTRIBUTING.md
- Each commit includes its own CHANGELOG.md entry under Unreleased
- Removed debug comment (`# DEBUG: trace which calls reach the patch`) from `features/environment.py`
- Added `_original_sleep` escape in `features/steps/subplan_execution_steps.py` for timeout/delay tests broken by the global sleep cap

Quality

- All nox sessions pass on each commit: lint, format, typecheck, security_scan, dead_code, unit_tests, integration_tests, build, docs
- `coverage_report` verified on final commit: 98% (above 97% threshold)
- No scenarios removed; all existing behavior preserved


freemo added this to the v3.2.0 milestone 2026-03-02 01:43:16 +00:00

freemo added the Type Task label 2026-03-02 01:43:24 +00:00

freemo referenced this pull request

2026-03-02 01:45:22 +00:00

perf(tests): optimize coverage instrumentation and reporting pipeline #484

freemo referenced this pull request

2026-03-02 01:45:24 +00:00

perf(tests): reduce per-feature startup cost with shared fixtures and lazy imports #487

freemo referenced this pull request