[AUTO-INF-4] Harden Behave temp DB paths and cap default parallelism #9686

Open
opened 2026-04-15 03:13:30 +00:00 by HAL9000 · 0 comments
Owner

"## Summary\n- Replace unsafe temp database naming in Behave hooks to eliminate race-induced SQLite contention and data leakage between parallel workers.\n- Cap default Behave/Robot worker counts at the CONTRIBUTING-mandated 32 processes to avoid flake-inducing resource exhaustion on large runners.\n\n## Findings\n1. Temp DB path reuse risk in Behave hooks\n - features/environment.py seeds CLEVERAGENTS_DATABASE_URL and CLEVERAGENTS_TEST_DATABASE_URL via tempfile.mktemp(...) in both before_all and before_scenario (e.g. lines 120-170).\n - mktemp only returns a candidate path; it does not create the file. With 32+ workers (our CI default), two scenarios can receive the same path before either copies the template DB, producing sporadic sqlite3.OperationalError: database is locked or, worse, state leakage when one run removes the other's file. Python docs explicitly warn that mktemp is insecure and racy under parallel load.\n - Because the hook copies a shared template into that path later, an occasional collision causes the copy to succeed but the second worker ends up reading the first worker's fixture data, triggering non-deterministic step failures across many features.\n\n2. Parallel worker oversubscription\n - _default_processes() in noxfile.py returns the raw CPU affinity count. On 64-core Forgejo runners this drives Behave/Robot to spawn 64 processes—even though CONTRIBUTING mandates 32. The same helper is reused by _behave_parallel_args and _pabot_parallel_args, so both test suites oversubscribe.\n - SQLite-backed fixtures (even with the template optimization) still hold write locks per scenario. Oversubscription past ~32 processes spikes lock contention and I/O pressure, producing intermittent hangs/timeouts like the ones tracked in #9456 (nox unit_tests timeouts). Right-sizing the default worker pool avoids the thundering herd that intermittently destabilizes CI.\n\n## Recommendations\n- Replace tempfile.mktemp usage in features/environment.py with NamedTemporaryFile(delete=False) or mkstemp, writing the generated file paths into the env vars and registering cleanup handlers so each scenario gets a guaranteed-unique, pre-created file.\n- Add a max_workers = 32 clamp inside _default_processes() and reuse it in _behave_parallel_args / _pabot_parallel_args so that defaults never exceed the documented limit (while still allowing explicit overrides via TEST_PROCESSES or CLI flags).\n- Add regression coverage that runs unit_tests / integration_tests with the default settings and asserts the effective process count is 32, keeping the contribution guidelines and runtime behavior aligned.\n\n### Acceptance Criteria\n- [ ] Behave hooks create temp SQLite files using a race-free API and tests confirm database isolation even under 32-way parallelism.\n- [ ] Default unit/integration test process counts clamp to 32 while honoring explicit overrides.\n- [ ] CI documentation and CONTRIBUTING references updated if any invocation syntax changes.\n\n### Duplicate Check\n- Open issues: reviewed pages 1-3 (50 issues per page) and found no matches for "tempfile", "mktemp", or "parallel worker flake".\n- Cross-area: inspected AUTO-INF issues such as #9596 and confirmed they focus on fixture sharing, not temp DB races or worker caps.\n- Closed issues: scanned the latest 50 entries; none address Behave temp DB naming or parallel worker limits.\n\n---\nAutomated by CleverAgents Bot\nSupervisor: Test Infrastructure Pool | Agent: test-infra-worker\n"

"## Summary\n- Replace unsafe temp database naming in Behave hooks to eliminate race-induced SQLite contention and data leakage between parallel workers.\n- Cap default Behave/Robot worker counts at the CONTRIBUTING-mandated 32 processes to avoid flake-inducing resource exhaustion on large runners.\n\n## Findings\n1. **Temp DB path reuse risk in Behave hooks**\n - `features/environment.py` seeds `CLEVERAGENTS_DATABASE_URL` and `CLEVERAGENTS_TEST_DATABASE_URL` via `tempfile.mktemp(...)` in both `before_all` and `before_scenario` (e.g. lines 120-170).\n - `mktemp` only returns a candidate path; it does not create the file. With 32+ workers (our CI default), two scenarios can receive the same path before either copies the template DB, producing sporadic `sqlite3.OperationalError: database is locked` or, worse, state leakage when one run removes the other's file. Python docs explicitly warn that `mktemp` is insecure and racy under parallel load.\n - Because the hook copies a shared template into that path later, an occasional collision causes the copy to succeed but the second worker ends up reading the first worker's fixture data, triggering non-deterministic step failures across many features.\n\n2. **Parallel worker oversubscription**\n - `_default_processes()` in `noxfile.py` returns the raw CPU affinity count. On 64-core Forgejo runners this drives Behave/Robot to spawn 64 processes—even though CONTRIBUTING mandates 32. The same helper is reused by `_behave_parallel_args` and `_pabot_parallel_args`, so both test suites oversubscribe.\n - SQLite-backed fixtures (even with the template optimization) still hold write locks per scenario. Oversubscription past ~32 processes spikes lock contention and I/O pressure, producing intermittent hangs/timeouts like the ones tracked in #9456 (`nox unit_tests` timeouts). Right-sizing the default worker pool avoids the thundering herd that intermittently destabilizes CI.\n\n## Recommendations\n- Replace `tempfile.mktemp` usage in `features/environment.py` with `NamedTemporaryFile(delete=False)` or `mkstemp`, writing the generated file paths into the env vars and registering cleanup handlers so each scenario gets a guaranteed-unique, pre-created file.\n- Add a `max_workers = 32` clamp inside `_default_processes()` and reuse it in `_behave_parallel_args` / `_pabot_parallel_args` so that defaults never exceed the documented limit (while still allowing explicit overrides via `TEST_PROCESSES` or CLI flags).\n- Add regression coverage that runs `unit_tests` / `integration_tests` with the default settings and asserts the effective process count is 32, keeping the contribution guidelines and runtime behavior aligned.\n\n### Acceptance Criteria\n- [ ] Behave hooks create temp SQLite files using a race-free API and tests confirm database isolation even under 32-way parallelism.\n- [ ] Default unit/integration test process counts clamp to 32 while honoring explicit overrides.\n- [ ] CI documentation and CONTRIBUTING references updated if any invocation syntax changes.\n\n### Duplicate Check\n- Open issues: reviewed pages 1-3 (50 issues per page) and found no matches for \"tempfile\", \"mktemp\", or \"parallel worker flake\".\n- Cross-area: inspected AUTO-INF issues such as #9596 and confirmed they focus on fixture sharing, not temp DB races or worker caps.\n- Closed issues: scanned the latest 50 entries; none address Behave temp DB naming or parallel worker limits.\n\n---\n**Automated by CleverAgents Bot**\nSupervisor: Test Infrastructure Pool | Agent: test-infra-worker\n"
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#9686
No description provided.