Racing condition in integration test for M4 validation #563

Closed
opened 2026-03-04 13:18:50 +00:00 by hurui200320 · 3 comments
Member

Metadata

  • Commit Message: fix(test): resolve race condition in M4 validation integration test
  • Branch: fix/m4-validation-race-condition

Background and Context

The M4 validation integration test occasionally fails due to a racing condition. This appears to be a timing issue in the parallel test execution where database state from one test leaks into another.

Current Behavior

The M4 validation integration test intermittently fails with inconsistent state errors during parallel execution.

Expected Behavior

The M4 validation integration test should pass reliably in both sequential and parallel execution modes.

Acceptance Criteria

  • M4 validation integration test passes reliably in sequential mode
  • M4 validation integration test passes reliably in parallel mode (pabot)
  • No test isolation issues between concurrent test runs
  • Full nox suite passes without flaky failures

Definition of Done

This issue is complete when:

  • All subtasks below are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.

Subtasks

  • Code: Identify the specific race condition in M4 validation integration test
  • Code: Add proper test isolation (per-scenario temp database or locking)
  • Code: Ensure database state cleanup between test runs
  • Docs: Document the test isolation approach in test infrastructure docs
  • Tests (Behave): Verify existing Behave tests are not affected by the fix
  • Tests (Robot): Fix the failing Robot integration test and add a regression guard
  • Tests (ASV): Verify benchmark tests are not affected by isolation changes
  • Quality: Verify coverage >=97% via nox -s coverage_report. If coverage is <97% then review the current unit test coverage report at build/coverage.xml and use it to write new Behave based unit tests to improve code coverage.
  • Quality: Run nox (all default sessions, including benchmark), fix any errors if needed ensuring nox passes across entire code base.
## Metadata - **Commit Message**: `fix(test): resolve race condition in M4 validation integration test` - **Branch**: `fix/m4-validation-race-condition` ## Background and Context The M4 validation integration test occasionally fails due to a racing condition. This appears to be a timing issue in the parallel test execution where database state from one test leaks into another. ## Current Behavior The M4 validation integration test intermittently fails with inconsistent state errors during parallel execution. ## Expected Behavior The M4 validation integration test should pass reliably in both sequential and parallel execution modes. ## Acceptance Criteria - [ ] M4 validation integration test passes reliably in sequential mode - [ ] M4 validation integration test passes reliably in parallel mode (pabot) - [ ] No test isolation issues between concurrent test runs - [ ] Full nox suite passes without flaky failures ## Definition of Done This issue is complete when: - All subtasks below are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. ## Subtasks - [ ] Code: Identify the specific race condition in M4 validation integration test - [ ] Code: Add proper test isolation (per-scenario temp database or locking) - [ ] Code: Ensure database state cleanup between test runs - [ ] Docs: Document the test isolation approach in test infrastructure docs - [ ] Tests (Behave): Verify existing Behave tests are not affected by the fix - [ ] Tests (Robot): Fix the failing Robot integration test and add a regression guard - [ ] Tests (ASV): Verify benchmark tests are not affected by isolation changes - [ ] Quality: Verify coverage >=97% via `nox -s coverage_report`. If coverage is <97% then review the current unit test coverage report at `build/coverage.xml` and use it to write new Behave based unit tests to improve code coverage. - [ ] Quality: Run `nox` (all default sessions, including benchmark), fix any errors if needed ensuring nox passes across **entire** code base.
freemo added this to the v3.3.0 milestone 2026-03-05 00:29:34 +00:00
Member

Implementation Summary

Root Cause

The M4 validation Robot Framework integration tests experienced intermittent race conditions during parallel execution via pabot. Three contributing factors were identified:

  1. Shared database URL: common.resource did not set CLEVERAGENTS_DATABASE_URL, so all pabot workers fell back to the default sqlite:///cleveragents.db (relative to CWD), causing concurrent workers to contend on the same SQLite file.

  2. Racy shared directory cleanup: common.resource previously removed ${TEMPDIR}/.cleveragents (the default home dir), which could interfere with parallel suites.

  3. No singleton reset in Robot helpers: Helper scripts like helper_m4_correction_subplan_smoke.py chain multiple CliRunner invocations within full_flow(). The Settings singleton, DI container, and MEMORY_ENGINES engine cache persisted between invocations, causing stale state leakage.

Fix

robot/common.resource — Refactored Setup Test Environment into two composable keywords:

  • Setup Test Environment — Creates per-suite CLEVERAGENTS_HOME, sets migration/mock env vars, and injects ${PYTHON}. Safe for all suites including those that run real cleveragents CLI commands.
  • Setup Database Isolation — Sets per-suite CLEVERAGENTS_DATABASE_URL and CLEVERAGENTS_TEST_DATABASE_URL pointing to SQLite files inside the per-suite home dir. Only called by helper-based suites that need it.
  • Setup Test Environment With Database Isolation — Convenience keyword calling both.

robot/m4_e2e_verification.robot, robot/m4_correction_subplan_smoke.robot, robot/m3_decision_validation_smoke.robot — Updated Suite Setup to call Setup Test Environment With Database Isolation.

robot/helper_m4_e2e_verification.py, robot/helper_m4_correction_subplan_smoke.py, robot/helper_m3_decision_validation_smoke.py — Added _reset_global_state() function that resets:

  • Settings._instance (singleton config)
  • reset_container() (DI container)
  • MEMORY_ENGINES (SQLAlchemy engine cache, with engine.dispose())

Called at main entry point and between chained CLI invocations in full_flow().

docs/development/testing.md — Added "Parallel Execution Isolation (pabot)" section documenting the isolation pattern and how to add new Robot helpers.

Verification

  • nox -s lint — passed (0 errors)
  • nox -s typecheck — passed (0 errors, 0 warnings)
  • nox -s unit_tests — passed (277 features, 8936 scenarios, 0 failures)
  • nox -s integration_tests — passed (1282 tests, 0 failures)
  • nox -s coverage_report — passed (97% threshold met)
## Implementation Summary ### Root Cause The M4 validation Robot Framework integration tests experienced intermittent race conditions during parallel execution via `pabot`. Three contributing factors were identified: 1. **Shared database URL**: `common.resource` did not set `CLEVERAGENTS_DATABASE_URL`, so all pabot workers fell back to the default `sqlite:///cleveragents.db` (relative to CWD), causing concurrent workers to contend on the same SQLite file. 2. **Racy shared directory cleanup**: `common.resource` previously removed `${TEMPDIR}/.cleveragents` (the default home dir), which could interfere with parallel suites. 3. **No singleton reset in Robot helpers**: Helper scripts like `helper_m4_correction_subplan_smoke.py` chain multiple `CliRunner` invocations within `full_flow()`. The `Settings` singleton, DI container, and `MEMORY_ENGINES` engine cache persisted between invocations, causing stale state leakage. ### Fix **`robot/common.resource`** — Refactored `Setup Test Environment` into two composable keywords: - `Setup Test Environment` — Creates per-suite `CLEVERAGENTS_HOME`, sets migration/mock env vars, and injects `${PYTHON}`. Safe for all suites including those that run real `cleveragents` CLI commands. - `Setup Database Isolation` — Sets per-suite `CLEVERAGENTS_DATABASE_URL` and `CLEVERAGENTS_TEST_DATABASE_URL` pointing to SQLite files inside the per-suite home dir. Only called by helper-based suites that need it. - `Setup Test Environment With Database Isolation` — Convenience keyword calling both. **`robot/m4_e2e_verification.robot`**, **`robot/m4_correction_subplan_smoke.robot`**, **`robot/m3_decision_validation_smoke.robot`** — Updated Suite Setup to call `Setup Test Environment With Database Isolation`. **`robot/helper_m4_e2e_verification.py`**, **`robot/helper_m4_correction_subplan_smoke.py`**, **`robot/helper_m3_decision_validation_smoke.py`** — Added `_reset_global_state()` function that resets: - `Settings._instance` (singleton config) - `reset_container()` (DI container) - `MEMORY_ENGINES` (SQLAlchemy engine cache, with `engine.dispose()`) Called at main entry point and between chained CLI invocations in `full_flow()`. **`docs/development/testing.md`** — Added "Parallel Execution Isolation (pabot)" section documenting the isolation pattern and how to add new Robot helpers. ### Verification - `nox -s lint` — passed (0 errors) - `nox -s typecheck` — passed (0 errors, 0 warnings) - `nox -s unit_tests` — passed (277 features, 8936 scenarios, 0 failures) - `nox -s integration_tests` — passed (1282 tests, 0 failures) - `nox -s coverage_report` — passed (97% threshold met)
Owner

PM Note — Dependency Added

This bug fix is blocked by TDD issue #635 per the mandatory TDD workflow in CONTRIBUTING.md. The failing test scaffolding must be merged before the fix can proceed.

Dependencies:

  • Blocked by: #635 (TDD failing tests for M4 race condition)
  • Transitively blocked by: #627 (Behave @tdd_expected_fail tags), #628 (Robot tdd_expected_fail tags)
**PM Note — Dependency Added** This bug fix is **blocked by** TDD issue #635 per the mandatory TDD workflow in CONTRIBUTING.md. The failing test scaffolding must be merged before the fix can proceed. **Dependencies:** - Blocked by: #635 (TDD failing tests for M4 race condition) - Transitively blocked by: #627 (Behave `@tdd_expected_fail` tags), #628 (Robot `tdd_expected_fail` tags)
Owner

PM Closure (Day 31): Bug fixed. PR #619 (fix(test): resolve race condition in M4 validation integration test) merged Day 30 by @brent.edwards.

The race condition in the M4 validation integration test has been resolved. The fix branch fix/m4-validation-race-condition was merged to develop.

TDD workflow note: TDD counterpart #635 was closed as State/Wont Do — the fix was applied before the TDD test was written (workflow deviation, accepted by PM decision).

Closing as State/Completed.

**PM Closure (Day 31)**: Bug fixed. PR #619 (`fix(test): resolve race condition in M4 validation integration test`) merged Day 30 by @brent.edwards. The race condition in the M4 validation integration test has been resolved. The fix branch `fix/m4-validation-race-condition` was merged to develop. **TDD workflow note**: TDD counterpart #635 was closed as `State/Wont Do` — the fix was applied before the TDD test was written (workflow deviation, accepted by PM decision). Closing as `State/Completed`.
Sign in to join this conversation.
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Depends on
Reference
cleveragents/cleveragents-core#563
No description provided.