perf(tests): replace subprocess-per-feature parallelism with in-process execution #481

Closed
opened 2026-03-01 01:26:51 +00:00 by freemo · 0 comments
Owner

Metadata

  • Commit Message: perf(tests): replace behave-parallel subprocess model with in-process parallelism
  • Branch: perf/in-process-parallel-behave

Background and Context

Part of #478.

The current behave-parallel implementation (defined inline in noxfile.py) spawns 339 independent Python subprocesses — one per .feature file — via concurrent.futures.ThreadPoolExecutor calling subprocess.run(). Each subprocess:

  1. Starts a new Python interpreter (~200-500ms overhead)
  2. Re-imports the entire cleveragents package and all step definitions (~500ms-2s)
  3. Re-initializes database connections, DI containers, and any other setup
  4. Runs the feature's scenarios
  5. Exits and returns summary text via stdout

Even with 32 worker threads, every one of the 339 features pays the full interpreter startup and import cost before any test code runs. For the 141 features that complete in under 0.1s of actual test time, this subprocess overhead dominates the total run time.
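
As a rough back-of-the-envelope check, the fixed overhead can be estimated from the figures quoted above alone (339 features, 200-500ms startup, 500ms-2s import, 32 workers). This is only a sketch: it assumes perfect parallelism and ignores database and DI container setup, so the real cost is likely higher. The function name `overhead_estimate` is hypothetical.

```python
FEATURES = 339
WORKERS = 32
STARTUP_S = (0.2, 0.5)  # interpreter startup range quoted in this issue
IMPORT_S = (0.5, 2.0)   # package and step-definition import range quoted in this issue


def overhead_estimate(features=FEATURES, workers=WORKERS):
    """Return (cpu_low, cpu_high, wall_low, wall_high) in seconds."""
    per_feature_low = STARTUP_S[0] + IMPORT_S[0]
    per_feature_high = STARTUP_S[1] + IMPORT_S[1]
    cpu_low = features * per_feature_low
    cpu_high = features * per_feature_high
    # Assumes the work divides evenly across all workers (optimistic).
    return cpu_low, cpu_high, cpu_low / workers, cpu_high / workers


if __name__ == "__main__":
    cpu_low, cpu_high, wall_low, wall_high = overhead_estimate()
    print(f"CPU overhead: {cpu_low:.1f} to {cpu_high:.1f} s")
    print(f"Wall-clock overhead: {wall_low:.1f} to {wall_high:.1f} s")
```

Under these assumptions the overhead comes to roughly 240 to 850 seconds of CPU time, or about 7 to 26 seconds of wall-clock time, before any scenario runs.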

Current Architecture (in noxfile.py)

# noxfile.py lines 196-202
def _run_feature(base_args, feature):
    proc = subprocess.run([*base_args, feature], capture_output=True, text=True)
    ...

Each call spawns python -m behave <feature> as a full subprocess.

Target Architecture

Replace with an in-process parallel runner that:

  • Loads step definitions and environment hooks once in the parent process
  • Distributes feature files to worker threads/processes that share the loaded module state
  • Avoids 339 Python interpreter startups entirely
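
The "load once, share with workers" idea can be illustrated with a minimal sketch, assuming a POSIX platform where the `fork` start method is available. `STEP_REGISTRY`, `load_steps_once`, `run_feature`, and `run_all` are hypothetical stand-ins, not names from noxfile.py or behave; the real setup work would be the `cleveragents` import and step registration.

```python
import multiprocessing as mp
import os

# Hypothetical stand-ins for the expensive one-time setup.
LOAD_COUNT = 0
STEP_REGISTRY = {}


def load_steps_once():
    """Simulate the costly import/registration work; runs in the parent only."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    STEP_REGISTRY["given a thing"] = lambda: "ok"


def run_feature(feature_path):
    """Worker entry point: uses the registry inherited from the parent via fork."""
    result = STEP_REGISTRY["given a thing"]()
    return feature_path, os.getpid(), LOAD_COUNT, result


def run_all(feature_paths, processes=4):
    """Load shared state once, then fan features out to forked workers."""
    load_steps_once()  # paid once, before any worker exists
    ctx = mp.get_context("fork")  # POSIX only; not available on Windows
    with ctx.Pool(processes=processes) as pool:
        return pool.map(run_feature, feature_paths)


if __name__ == "__main__":
    for row in run_all(["a.feature", "b.feature", "c.feature"], processes=2):
        print(row)
```

Each worker reports a load count of 1 because it inherits the parent's memory at fork time and never repeats the setup. The registry holds a lambda, which could not be pickled to a worker under the `spawn` start method; that is one reason this approach depends on `fork`.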

Acceptance Criteria

  • behave-parallel no longer spawns a subprocess per feature file
  • Feature files execute in-process using shared step definitions
  • Wall-clock time for nox -e unit_tests is reduced by at least 30% from subprocess elimination alone
  • All 339 features pass with identical behavior
  • Coverage remains at or above 97%
  • The --processes flag continues to work for controlling parallelism

Subtasks

Research Phase

  • Measure current per-subprocess overhead: time the difference between an empty .feature file run via subprocess vs in-process behave API
  • Evaluate behave's Python API (behave.runner.Runner) for in-process invocation — determine if Runner can be instantiated multiple times in the same process
  • Evaluate whether multiprocessing.Pool with fork start method can share loaded modules without re-import (requires testing on Linux)
  • Evaluate pytest-bdd as a potential alternative that works with pytest-xdist, which distributes tests across a fixed set of long-lived worker processes rather than one process per feature (would require step definition migration)
  • Evaluate thread safety of behave's Runner — determine if features can be run in parallel threads within a single process
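
The overhead measurement in the first research item could start from a sketch like the following. It times a bare interpreter launch against an in-process call; it does not invoke behave at all, so it captures only the interpreter startup share of the cost. The helper names are hypothetical, and the numbers will vary by machine.

```python
import subprocess
import sys
import time


def time_subprocess_startup(repeats=5):
    """Average wall-clock seconds to start and exit a bare interpreter."""
    start = time.perf_counter()
    for _ in range(repeats):
        subprocess.run([sys.executable, "-c", "pass"], check=True, capture_output=True)
    return (time.perf_counter() - start) / repeats


def time_in_process_call(func, repeats=5):
    """Average wall-clock seconds to call func in the current process."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


def noop():
    """Placeholder for an in-process run of an empty feature."""


if __name__ == "__main__":
    sub = time_subprocess_startup()
    inproc = time_in_process_call(noop)
    print(f"subprocess: {sub * 1000:.1f} ms, in-process: {inproc * 1000:.3f} ms")
```

Replacing `"-c", "pass"` with the real `python -m behave <feature>` arguments, and `noop` with an in-process behave invocation, would give the comparison the subtask asks for.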

Implementation Phase

  • Implement BehaveInProcessRunner class that accepts a list of feature paths and runs them via behave's Python API
  • Implement worker pool using multiprocessing.Pool with fork to share loaded modules across workers
  • Update _run_feature() in noxfile.py to use in-process runner instead of subprocess.run()
  • Preserve the summary aggregation logic (_merge_summaries, _print_overall_summary)
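  • Ensure coverage integration still works (BEHAVE_PARALLEL_COVERAGE flag): coverage.py measures multiprocessing workers when concurrency = multiprocessing is configured, while coverage.process_startup() is its hook for measuring separately launched subprocesses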
  • Update behave-parallel CLI wrapper to support both subprocess and in-process modes via --mode=subprocess|inprocess flag
  • Handle worker isolation: ensure each worker gets its own database connection and temporary directory
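
One possible shape for `BehaveInProcessRunner` is sketched below. It is an assumption-laden outline, not the final design: it runs features sequentially, it relies on `behave.configuration.Configuration` and `behave.runner.Runner` as used by behave's own command-line entry point, and whether `Runner` can safely be created repeatedly in one process is exactly what the research phase has to confirm. The `run_one` parameter and the result dictionary are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def _run_with_behave(feature_path: str) -> bool:
    """Run one feature via behave's Python API; return True if it failed.

    behave is imported lazily so this module still loads where behave is absent.
    """
    from behave.configuration import Configuration
    from behave.runner import Runner

    config = Configuration(command_args=[feature_path])
    return bool(Runner(config).run())


class BehaveInProcessRunner:
    """Run a list of feature files in the current process (sketch only)."""

    def __init__(
        self,
        feature_paths: List[str],
        run_one: Optional[Callable[[str], bool]] = None,
    ) -> None:
        self.feature_paths = list(feature_paths)
        # run_one is injectable so the class can be exercised without behave.
        self.run_one = run_one or _run_with_behave

    def run(self) -> Dict[str, object]:
        """Return the total feature count and the paths that failed."""
        failed = [path for path in self.feature_paths if self.run_one(path)]
        return {"total": len(self.feature_paths), "failed": failed}
```

A worker pool would call `run()` on a slice of the feature list per worker; keeping `run_one` injectable also makes it straightforward to support the `--mode=subprocess|inprocess` switch by swapping the callable.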

Verification Phase

  • Run nox -e unit_tests and confirm all 339 features pass
  • Run nox -e coverage_report and confirm coverage >= 97%
  • Measure wall-clock improvement from subprocess elimination alone (before other optimizations)
  • Verify no test pollution between features running in the same process (scenario isolation)
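
The 30% wall-clock target from the acceptance criteria can be checked with a small helper such as this one. The function names are hypothetical and the timings in the usage example are made-up placeholders, not measurements from this project.

```python
def reduction_fraction(before_s: float, after_s: float) -> float:
    """Fraction of wall-clock time saved, e.g. 0.25 for a 25% reduction."""
    if before_s <= 0:
        raise ValueError("before_s must be positive")
    return (before_s - after_s) / before_s


def meets_target(before_s: float, after_s: float, target: float = 0.30) -> bool:
    """True if the measured reduction reaches the acceptance threshold."""
    return reduction_fraction(before_s, after_s) >= target


if __name__ == "__main__":
    # Placeholder timings in seconds, for illustration only.
    print(meets_target(before_s=100.0, after_s=70.0))
```

Measuring both runs on the same machine, with the other optimizations disabled, keeps the comparison limited to subprocess elimination alone.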

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
freemo added this to the v3.2.0 milestone 2026-03-02 01:45:02 +00:00
freemo added reference perf/bdd-test-optimization 2026-03-02 01:46:39 +00:00
Reference
cleveragents/cleveragents-core#481