fix(test): resolve race condition in M4 validation integration test #619

Merged
brent.edwards merged 1 commit from fix/m4-validation-race-condition into master 2026-03-10 19:13:23 +00:00
Member

Summary

Fixes the intermittent race condition in M4 validation integration tests when running under pabot. Three root causes identified and addressed:

  1. Shared SQLite DB URL — pabot workers all fell back to sqlite:///cleveragents.db relative to CWD. New composable Setup Database Isolation keyword gives each suite a unique CLEVERAGENTS_DATABASE_URL.

  2. Racy shared directory cleanup — old Setup Test Environment used a shared temp directory. Now uses per-suite ${TEMPDIR}/.cleveragents_${SuiteName}.

  3. Singleton leak in helpersreset_global_state() centralised in robot/helpers_common.py, now also resets reset_provider_registry().

Changes

  • robot/common.resource — composable keyword redesign with optional mock_ai and auto_apply_migrations arguments for backward compatibility
  • robot/helpers_common.py — new shared module with reset_global_state()
  • robot/helper_m4_e2e_verification.py, helper_m4_correction_subplan_smoke.py, helper_m3_decision_validation_smoke.py — delegate to helpers_common, fixed _COMMANDS typing to dict[str, Callable[[], None]]
  • robot/cli_plan_context_commands.robot — added missing Suite Teardown
  • robot/m4_e2e_verification.robot — added timeout=30s to all 10 Run Process calls
  • docs/development/testing.md — updated to reflect helpers_common delegation pattern
  • CHANGELOG.md — added entry

Quality Gates

Gate Result
nox -s lint Pass
nox -s typecheck 0 errors, 0 warnings

Fixes #563

## Summary Fixes the intermittent race condition in M4 validation integration tests when running under pabot. Three root causes identified and addressed: 1. **Shared SQLite DB URL** — pabot workers all fell back to `sqlite:///cleveragents.db` relative to CWD. New composable `Setup Database Isolation` keyword gives each suite a unique `CLEVERAGENTS_DATABASE_URL`. 2. **Racy shared directory cleanup** — old `Setup Test Environment` used a shared temp directory. Now uses per-suite `${TEMPDIR}/.cleveragents_${SuiteName}`. 3. **Singleton leak in helpers** — `reset_global_state()` centralised in `robot/helpers_common.py`, now also resets `reset_provider_registry()`. ### Changes - `robot/common.resource` — composable keyword redesign with optional `mock_ai` and `auto_apply_migrations` arguments for backward compatibility - `robot/helpers_common.py` — new shared module with `reset_global_state()` - `robot/helper_m4_e2e_verification.py`, `helper_m4_correction_subplan_smoke.py`, `helper_m3_decision_validation_smoke.py` — delegate to `helpers_common`, fixed `_COMMANDS` typing to `dict[str, Callable[[], None]]` - `robot/cli_plan_context_commands.robot` — added missing `Suite Teardown` - `robot/m4_e2e_verification.robot` — added `timeout=30s` to all 10 `Run Process` calls - `docs/development/testing.md` — updated to reflect `helpers_common` delegation pattern - `CHANGELOG.md` — added entry ### Quality Gates | Gate | Result | |------|--------| | `nox -s lint` | Pass | | `nox -s typecheck` | 0 errors, 0 warnings | Fixes #563
brent.edwards added this to the v3.3.0 milestone 2026-03-07 00:01:03 +00:00
brent.edwards left a comment

Self-Review: PR #619 — fix(test): resolve race condition in M4 validation integration test

Reviewer: Brent Edwards (primary for robot/ and docs/)
Review scope: Test review (Robot tests & helpers), Docs review
Note: Self-review — cannot formally approve own PR. Requesting secondary reviewer (Rui) for approval.


Nox Verification Matrix

Session Result
nox -s lint Pass
nox -s typecheck Pass
nox -s unit_tests Pass
nox -s integration_tests Pass
nox -s coverage_report Pass (≥97%)
nox -s security_scan Pass (0 high/critical)
nox -s dead_code Pass
nox -s complexity Pass
nox -s benchmark N/A (no performance-affecting changes)

Root Cause & Fix Assessment

The three-pronged root cause analysis is correct and well-targeted:

  1. Shared DB URL — pabot workers all fell back to sqlite:///cleveragents.db relative to CWD. The composable Setup Database Isolation keyword correctly isolates per-suite DB files inside CLEVERAGENTS_HOME.

  2. Racy shared directory cleanup — The old Setup Test Environment removed ${TEMPDIR}/.cleveragents, a directory potentially shared by all workers. The new keyword uses ${TEMPDIR}/.cleveragents_${SuiteName}, providing per-suite directories.

  3. Singleton leak in helpers_reset_global_state() correctly resets Settings._instance, reset_container(), and MEMORY_ENGINES between chained CLI invocations.

The decision to split Setup Test Environment into composable keywords (base + optional DB isolation) instead of always setting CLEVERAGENTS_DATABASE_URL was the right call — it preserves the 150+ other suites that rely on the CLI determining its own DB path, while fixing the 3 helper-based suites that need explicit isolation.


Findings

P2:should-fix_reset_global_state() duplicated in 3 helper files

The identical ~30-line function is copy-pasted in helper_m4_e2e_verification.py:69-98, helper_m4_correction_subplan_smoke.py:46-69, and helper_m3_decision_validation_smoke.py:38-61. A shared module (e.g. robot/helpers_common.py) would reduce maintenance burden — if the singleton pattern changes (e.g. _instance renamed, new global state added), all three files must be updated in lockstep.

The documentation in testing.md:216-218 acknowledges this duplication as intentional, and the helpers are standalone scripts invoked via Run Process, so a shared import is straightforward.

Remediation: Extract to a shared module in a follow-up PR within 3 days.


P3:nit — testing.md says "≤2 workers by default" but actual default is CPU count

Line 196 of docs/development/testing.md states "≤2 workers by default". However, noxfile.py:17-25 sets the default to os.sched_getaffinity(0) (CPU count), which is typically 4-16 on dev machines. Minor doc inaccuracy.


P3:nit — Redundant CLEVERAGENTS_AUTO_APPLY_MIGRATIONS in cli_plan_context_commands.robot

Setup Test Environment now sets CLEVERAGENTS_AUTO_APPLY_MIGRATIONS=true for all suites. The cli_plan_context_commands.robot Suite Setup still has AND Set Environment Variable CLEVERAGENTS_AUTO_APPLY_MIGRATIONS true, which is now redundant. Harmless but could be cleaned up.


Pre-existing Issues (Not Introduced by This PR)

  • # type: ignore[operator] in helper_m4_correction_subplan_smoke.py:491 and helper_m3_decision_validation_smoke.py:351 — confirmed present on master. Caused by _COMMANDS: dict[str, object] typing; the M4 E2E helper uses dict[str, Callable[[], None]] which avoids this. Not a finding against this PR.

Verdict

No P0 or P1 findings. The fix is well-designed, correctly targeted at all three contributing causes, and preserves backward compatibility with the 150+ existing Robot suites. All nox sessions pass. Documentation is thorough and includes a template for future helpers.

Would approve with P2/P3 comments above. The P2 duplication finding should be addressed in a follow-up PR. Requesting Rui as secondary reviewer for formal approval.

## Self-Review: PR #619 — fix(test): resolve race condition in M4 validation integration test **Reviewer:** Brent Edwards (primary for `robot/` and `docs/`) **Review scope:** Test review (Robot tests & helpers), Docs review **Note:** Self-review — cannot formally approve own PR. Requesting secondary reviewer (Rui) for approval. --- ### Nox Verification Matrix | Session | Result | |---------|--------| | `nox -s lint` | ✅ Pass | | `nox -s typecheck` | ✅ Pass | | `nox -s unit_tests` | ✅ Pass | | `nox -s integration_tests` | ✅ Pass | | `nox -s coverage_report` | ✅ Pass (≥97%) | | `nox -s security_scan` | ✅ Pass (0 high/critical) | | `nox -s dead_code` | ✅ Pass | | `nox -s complexity` | ✅ Pass | | `nox -s benchmark` | N/A (no performance-affecting changes) | --- ### Root Cause & Fix Assessment The three-pronged root cause analysis is correct and well-targeted: 1. **Shared DB URL** — pabot workers all fell back to `sqlite:///cleveragents.db` relative to CWD. The composable `Setup Database Isolation` keyword correctly isolates per-suite DB files inside `CLEVERAGENTS_HOME`. 2. **Racy shared directory cleanup** — The old `Setup Test Environment` removed `${TEMPDIR}/.cleveragents`, a directory potentially shared by all workers. The new keyword uses `${TEMPDIR}/.cleveragents_${SuiteName}`, providing per-suite directories. 3. **Singleton leak in helpers** — `_reset_global_state()` correctly resets `Settings._instance`, `reset_container()`, and `MEMORY_ENGINES` between chained CLI invocations. The decision to split `Setup Test Environment` into composable keywords (base + optional DB isolation) instead of always setting `CLEVERAGENTS_DATABASE_URL` was the right call — it preserves the 150+ other suites that rely on the CLI determining its own DB path, while fixing the 3 helper-based suites that need explicit isolation. --- ### Findings **`P2:should-fix` — `_reset_global_state()` duplicated in 3 helper files** The identical ~30-line function is copy-pasted in `helper_m4_e2e_verification.py:69-98`, `helper_m4_correction_subplan_smoke.py:46-69`, and `helper_m3_decision_validation_smoke.py:38-61`. A shared module (e.g. `robot/helpers_common.py`) would reduce maintenance burden — if the singleton pattern changes (e.g. `_instance` renamed, new global state added), all three files must be updated in lockstep. The documentation in `testing.md:216-218` acknowledges this duplication as intentional, and the helpers are standalone scripts invoked via `Run Process`, so a shared import is straightforward. **Remediation:** Extract to a shared module in a follow-up PR within 3 days. --- **`P3:nit` — testing.md says "≤2 workers by default" but actual default is CPU count** Line 196 of `docs/development/testing.md` states "≤2 workers by default". However, `noxfile.py:17-25` sets the default to `os.sched_getaffinity(0)` (CPU count), which is typically 4-16 on dev machines. Minor doc inaccuracy. --- **`P3:nit` — Redundant `CLEVERAGENTS_AUTO_APPLY_MIGRATIONS` in `cli_plan_context_commands.robot`** `Setup Test Environment` now sets `CLEVERAGENTS_AUTO_APPLY_MIGRATIONS=true` for all suites. The `cli_plan_context_commands.robot` Suite Setup still has `AND Set Environment Variable CLEVERAGENTS_AUTO_APPLY_MIGRATIONS true`, which is now redundant. Harmless but could be cleaned up. --- ### Pre-existing Issues (Not Introduced by This PR) - `# type: ignore[operator]` in `helper_m4_correction_subplan_smoke.py:491` and `helper_m3_decision_validation_smoke.py:351` — confirmed present on `master`. Caused by `_COMMANDS: dict[str, object]` typing; the M4 E2E helper uses `dict[str, Callable[[], None]]` which avoids this. Not a finding against this PR. --- ### Verdict No P0 or P1 findings. The fix is well-designed, correctly targeted at all three contributing causes, and preserves backward compatibility with the 150+ existing Robot suites. All nox sessions pass. Documentation is thorough and includes a template for future helpers. **Would approve** with P2/P3 comments above. The P2 duplication finding should be addressed in a follow-up PR. Requesting Rui as secondary reviewer for formal approval.
@ -193,1 +193,4 @@
### Parallel Execution Isolation (pabot)
Robot integration tests run in parallel via `pabot` (≤2 workers by default).
Author
Member

P3:nit — Line says "≤2 workers by default" but noxfile.py:17-25 defaults to os.sched_getaffinity(0) (CPU count). On most dev machines this is 4-16.

P3:nit — Line says "≤2 workers by default" but `noxfile.py:17-25` defaults to `os.sched_getaffinity(0)` (CPU count). On most dev machines this is 4-16.
@ -65,2 +66,4 @@
runner = CliRunner()
def _reset_global_state() -> None:
Author
Member

P2:should-fix — This _reset_global_state() function is duplicated identically in 3 helper files. Consider extracting to a shared robot/helpers_common.py module in a follow-up PR to reduce maintenance burden if the singleton pattern changes.

P2:should-fix — This `_reset_global_state()` function is duplicated identically in 3 helper files. Consider extracting to a shared `robot/helpers_common.py` module in a follow-up PR to reduce maintenance burden if the singleton pattern changes.
Author
Member

Bug Review — Additional Findings

Two bugs found during a deeper code review pass, both introduced or relevant to this PR.


P1:must-fix — Incomplete env var cleanup in Cleanup Test Environment

File: robot/common.resource:75-86

Setup Test Environment (lines 36-40) sets 5 environment variables:

  • CLEVERAGENTS_HOME
  • CLEVERAGENTS_AUTO_APPLY_MIGRATIONS
  • CLEVERAGENTS_TESTING_USE_MOCK_AI
  • CLEVERAGENTS_DATABASE_URL (via Setup Database Isolation)
  • CLEVERAGENTS_TEST_DATABASE_URL (via Setup Database Isolation)

Cleanup Test Environment (lines 85-86) only removes 2 of them:

Remove Environment Variable    CLEVERAGENTS_DATABASE_URL
Remove Environment Variable    CLEVERAGENTS_TEST_DATABASE_URL

Missing cleanup for CLEVERAGENTS_HOME, CLEVERAGENTS_AUTO_APPLY_MIGRATIONS, and CLEVERAGENTS_TESTING_USE_MOCK_AI. The comment on line 84 says "Clean env vars to avoid leaking into subsequent suites" but doesn't follow through.

Impact: In sequential Robot runs (non-pabot), CLEVERAGENTS_HOME leaks pointing to a deleted directory (lines 82-83 remove it). If a subsequent suite calls Setup Database Isolation without Setup Test Environment, it reads a stale CLEVERAGENTS_HOME pointing to a non-existent path. The other two env vars leak silently and could mask failures in suites that expect them unset.

Fix: Add to Cleanup Test Environment:

Remove Environment Variable    CLEVERAGENTS_HOME
Remove Environment Variable    CLEVERAGENTS_AUTO_APPLY_MIGRATIONS
Remove Environment Variable    CLEVERAGENTS_TESTING_USE_MOCK_AI

P1:must-fix — Missing timeout= on all 10 Run Process calls in m4_e2e_verification.robot

File: robot/m4_e2e_verification.robot:17,27,37,47,56,65,75,85,95,106

All 10 test cases use Run Process without a timeout= parameter. The other two .robot files in this PR both have timeouts:

  • m3_decision_validation_smoke.robot — every test has timeout=30s
  • m4_correction_subplan_smoke.robot — every test has timeout=30s (60s for full-flow)

While technically pre-existing on master, this PR: (a) modified this file for pabot reliability, (b) added a testing.md section that explicitly says to always add timeout= to Run Process calls, and (c) the sibling suites already have timeouts, making this an inconsistency.

Impact: A hung helper process blocks a pabot worker indefinitely. Since pabot uses CPU-count workers (not "≤2" as the docs state), one stuck process reduces throughput and can stall CI.

Fix: Add timeout=30s to all 10 Run Process calls in m4_e2e_verification.robot.

## Bug Review — Additional Findings Two bugs found during a deeper code review pass, both introduced or relevant to this PR. --- ### P1:must-fix — Incomplete env var cleanup in `Cleanup Test Environment` **File:** `robot/common.resource:75-86` `Setup Test Environment` (lines 36-40) sets 5 environment variables: - `CLEVERAGENTS_HOME` - `CLEVERAGENTS_AUTO_APPLY_MIGRATIONS` - `CLEVERAGENTS_TESTING_USE_MOCK_AI` - `CLEVERAGENTS_DATABASE_URL` (via `Setup Database Isolation`) - `CLEVERAGENTS_TEST_DATABASE_URL` (via `Setup Database Isolation`) `Cleanup Test Environment` (lines 85-86) only removes 2 of them: ```robot Remove Environment Variable CLEVERAGENTS_DATABASE_URL Remove Environment Variable CLEVERAGENTS_TEST_DATABASE_URL ``` Missing cleanup for `CLEVERAGENTS_HOME`, `CLEVERAGENTS_AUTO_APPLY_MIGRATIONS`, and `CLEVERAGENTS_TESTING_USE_MOCK_AI`. The comment on line 84 says *"Clean env vars to avoid leaking into subsequent suites"* but doesn't follow through. **Impact:** In sequential Robot runs (non-pabot), `CLEVERAGENTS_HOME` leaks pointing to a **deleted** directory (lines 82-83 remove it). If a subsequent suite calls `Setup Database Isolation` without `Setup Test Environment`, it reads a stale `CLEVERAGENTS_HOME` pointing to a non-existent path. The other two env vars leak silently and could mask failures in suites that expect them unset. **Fix:** Add to `Cleanup Test Environment`: ```robot Remove Environment Variable CLEVERAGENTS_HOME Remove Environment Variable CLEVERAGENTS_AUTO_APPLY_MIGRATIONS Remove Environment Variable CLEVERAGENTS_TESTING_USE_MOCK_AI ``` --- ### P1:must-fix — Missing `timeout=` on all 10 `Run Process` calls in `m4_e2e_verification.robot` **File:** `robot/m4_e2e_verification.robot:17,27,37,47,56,65,75,85,95,106` All 10 test cases use `Run Process` without a `timeout=` parameter. The other two `.robot` files in this PR both have timeouts: - `m3_decision_validation_smoke.robot` — every test has `timeout=30s` - `m4_correction_subplan_smoke.robot` — every test has `timeout=30s` (`60s` for full-flow) While technically pre-existing on master, this PR: (a) modified this file for pabot reliability, (b) added a `testing.md` section that explicitly says to always add `timeout=` to `Run Process` calls, and (c) the sibling suites already have timeouts, making this an inconsistency. **Impact:** A hung helper process blocks a pabot worker indefinitely. Since pabot uses CPU-count workers (not "≤2" as the docs state), one stuck process reduces throughput and can stall CI. **Fix:** Add `timeout=30s` to all 10 `Run Process` calls in `m4_e2e_verification.robot`.
Author
Member

Review Fixes — 29b924ac

All findings from both review comments addressed.

Disposition of self-review #2033 (comment #55531)

ID Severity Resolution
P2 should-fix Fixed — Extracted _reset_global_state() to new shared robot/helpers_common.py module. The three helper files (helper_m4_e2e_verification.py, helper_m4_correction_subplan_smoke.py, helper_m3_decision_validation_smoke.py) now delegate to helpers_common.reset_global_state() instead of each carrying an identical ~30-line copy. Removed now-unused import contextlib from all three files.
P3 nit Fixedtesting.md worker count corrected from "≤2" to "CPU-count" (already fixed in d2d9681d).
P3 nit Fixed — Removed redundant Set Environment Variable CLEVERAGENTS_AUTO_APPLY_MIGRATIONS true from cli_plan_context_commands.robot Suite Setup. Setup Test Environment now sets this automatically (line 39 of common.resource).

Disposition of bug review (comment #55532)

ID Severity Resolution
P1 must-fix Fixed — Added Remove Environment Variable for CLEVERAGENTS_HOME, CLEVERAGENTS_AUTO_APPLY_MIGRATIONS, and CLEVERAGENTS_TESTING_USE_MOCK_AI to Cleanup Test Environment (already fixed in d2d9681d).
P1 must-fix Fixed — Added timeout=30s to all 10 Run Process calls in m4_e2e_verification.robot (already fixed in d2d9681d).

Quality gates

Gate Result
nox -s lint Pass
nox -s typecheck Pass (0 errors)
nox -s unit_tests Pass — 9,029 scenarios, 0 failures
## Review Fixes — `29b924ac` All findings from both review comments addressed. ### Disposition of self-review #2033 (comment #55531) | ID | Severity | Resolution | |----|----------|------------| | **P2** | should-fix | **Fixed** — Extracted `_reset_global_state()` to new shared `robot/helpers_common.py` module. The three helper files (`helper_m4_e2e_verification.py`, `helper_m4_correction_subplan_smoke.py`, `helper_m3_decision_validation_smoke.py`) now delegate to `helpers_common.reset_global_state()` instead of each carrying an identical ~30-line copy. Removed now-unused `import contextlib` from all three files. | | **P3** | nit | **Fixed** — `testing.md` worker count corrected from "≤2" to "CPU-count" *(already fixed in `d2d9681d`)*. | | **P3** | nit | **Fixed** — Removed redundant `Set Environment Variable CLEVERAGENTS_AUTO_APPLY_MIGRATIONS true` from `cli_plan_context_commands.robot` Suite Setup. `Setup Test Environment` now sets this automatically (line 39 of `common.resource`). | ### Disposition of bug review (comment #55532) | ID | Severity | Resolution | |----|----------|------------| | **P1** | must-fix | **Fixed** — Added `Remove Environment Variable` for `CLEVERAGENTS_HOME`, `CLEVERAGENTS_AUTO_APPLY_MIGRATIONS`, and `CLEVERAGENTS_TESTING_USE_MOCK_AI` to `Cleanup Test Environment` *(already fixed in `d2d9681d`)*. | | **P1** | must-fix | **Fixed** — Added `timeout=30s` to all 10 `Run Process` calls in `m4_e2e_verification.robot` *(already fixed in `d2d9681d`)*. | ### Quality gates | Gate | Result | |------|--------| | `nox -s lint` | Pass | | `nox -s typecheck` | Pass (0 errors) | | `nox -s unit_tests` | Pass — **9,029 scenarios, 0 failures** |
freemo left a comment

Planning Authority Review — PR #619

Branch Naming Convention:
This PR uses the fix/ prefix (fix/m4-validation-race-condition), but per CONTRIBUTING.md, bug fix branches should use the bugfix/ prefix (e.g., bugfix/m4-validation-race-condition). This is flagged for future compliance — not blocking this PR, but please use the correct prefix on subsequent bugfix branches.

TDD Counterpart Tracking:
TDD counterpart issue #635 has been created for the originating bug #563. This ensures the TDD cycle is properly tracked alongside the fix.

Review Status:
This PR has 2 comments with P1 fixes, all addressed by the author. No additional blocking concerns from planning at this time.

## Planning Authority Review — PR #619 **Branch Naming Convention:** This PR uses the `fix/` prefix (`fix/m4-validation-race-condition`), but per CONTRIBUTING.md, bug fix branches should use the `bugfix/` prefix (e.g., `bugfix/m4-validation-race-condition`). This is flagged for future compliance — not blocking this PR, but please use the correct prefix on subsequent bugfix branches. **TDD Counterpart Tracking:** TDD counterpart issue #635 has been created for the originating bug #563. This ensures the TDD cycle is properly tracked alongside the fix. **Review Status:** This PR has 2 comments with P1 fixes, all addressed by the author. No additional blocking concerns from planning at this time.
Owner

PM Status Check — Day 29

Author: @brent.edwards | Milestone: v3.3.0 (M4) | Mergeable: Yes | Reviews: Self-review only

Current State

Fix for M4 validation integration test race condition (bug #563). Brent self-reviewed and addressed all findings in commit 29b924ac. Includes:

  • Extracted shared _reset_global_state() to robot/helpers_common.py
  • Fixed env var cleanup leak (3 missing Remove Environment Variable calls)
  • Added timeout=30s to all 10 Run Process calls in m4_e2e_verification.robot
  • Quality gates: lint pass, typecheck 0 errors, 9,029 scenarios 0 failures

Issue: No External Reviewer

This PR has zero assigned reviewers and only self-review. Per CONTRIBUTING.md, bug fix PRs require at least one external reviewer approval.

Status: NEEDS REVIEWER ASSIGNMENT

Action Required

Who Action Deadline
@aditya Review this bug fix PR Mar 10 EOD
@hurui200320 Secondary review (optional) Mar 11

Note: PR body is empty — @brent.edwards please fill in per CONTRIBUTING template (summary, changes, quality gates, ISSUES CLOSED: line). This is a bug fix for #563, which is on the M4 critical path.

## PM Status Check — Day 29 **Author**: @brent.edwards | **Milestone**: v3.3.0 (M4) | **Mergeable**: Yes | **Reviews**: Self-review only ### Current State Fix for M4 validation integration test race condition (bug #563). Brent self-reviewed and addressed all findings in commit `29b924ac`. Includes: - Extracted shared `_reset_global_state()` to `robot/helpers_common.py` - Fixed env var cleanup leak (3 missing `Remove Environment Variable` calls) - Added `timeout=30s` to all 10 `Run Process` calls in `m4_e2e_verification.robot` - Quality gates: lint pass, typecheck 0 errors, 9,029 scenarios 0 failures ### Issue: No External Reviewer This PR has **zero assigned reviewers** and only self-review. Per CONTRIBUTING.md, bug fix PRs require at least one external reviewer approval. ### Status: NEEDS REVIEWER ASSIGNMENT ### Action Required | Who | Action | Deadline | |:----|:-------|:---------| | @aditya | Review this bug fix PR | **Mar 10 EOD** | | @hurui200320 | Secondary review (optional) | Mar 11 | **Note**: PR body is empty — @brent.edwards please fill in per CONTRIBUTING template (summary, changes, quality gates, `ISSUES CLOSED:` line). This is a bug fix for #563, which is on the M4 critical path.
hurui200320 requested changes 2026-03-09 06:35:41 +00:00
Dismissed
hurui200320 left a comment

Review: PR #619 — fix(test): resolve race condition in M4 validation integration test

Reviewer: Rui Hu (@hurui200320) | Scope: Code review, test review, docs review

Technical Assessment

The core fix is sound and well-designed. The three-pronged root cause analysis (shared DB URL, racy shared directory, singleton leaks in chained CLI invocations) is correct. The composable keyword design (Setup Test Environment + optional Setup Database Isolation) preserves backward compatibility with 150+ existing suites while properly isolating the 3 affected helper-based suites. The full_flow() chained invocations correctly call _reset_global_state() between CLI steps 1-2 and correctly skip it before step 3 (pure domain logic, no CLI/DB). The centralization of reset logic in helpers_common.py is a clean refactor.

All findings from Brent's self-review and bug review have been verified as resolved in the latest code.


Finding 1 — P2:should-fix — helpers_common.reset_global_state() does not reset ProviderRegistry

File: robot/helpers_common.py:18-44

The function resets 3 singletons (Settings._instance, reset_container(), MEMORY_ENGINES) but omits reset_provider_registry() from cleveragents.providers.registry.

The BDD test harness (features/environment.py, after_scenario hook) resets all four:

Settings._instance = None
reset_container()
reset_provider_registry()   # <-- missing from helpers_common.py

The ProviderRegistry at src/cleveragents/providers/registry.py maintains its own module-level _registry global. After reset_container(), this stale _registry persists. In current code paths this is mitigated because the DI container passes settings explicitly to get_provider_registry(settings=...), which forces a fresh registry. However, any code that calls get_provider_registry() without the settings argument between a reset and the first container access would get the stale registry (with the old Settings and thus the old database URL). This is a latent bug.

Recommendation: Add to helpers_common.reset_global_state():

with contextlib.suppress(ImportError):
    from cleveragents.providers.registry import reset_provider_registry
    reset_provider_registry()

Finding 2 — P2:should-fix — cli_plan_context_commands.robot missing Suite Teardown after expanded env var setup

File: robot/cli_plan_context_commands.robot:1-9

This file has Suite Setup but no Suite Teardown. This PR expanded Setup Test Environment to also set CLEVERAGENTS_AUTO_APPLY_MIGRATIONS and CLEVERAGENTS_TESTING_USE_MOCK_AI (common.resource:39-40), and the PR touched this file to remove the now-redundant explicit env var. The missing teardown now leaks 3 env vars (up from 1 pre-PR), plus the per-suite temp directory is never cleaned up. The other three .robot files modified by this PR all correctly have Suite Teardown Cleanup Test Environment.

Recommendation: Add:

Suite Teardown    Cleanup Test Environment

Finding 3 — P2:should-fix — testing.md template contradicts the code it documents

File: docs/development/testing.md:216-247

Two inconsistencies in the newly added documentation:

  1. Lines 216-218 say "The reset pattern is defined identically in each helper that needs it" — but the code in this same PR centralizes it in helpers_common.py. The helpers delegate; they no longer carry identical copies.

  2. Lines 231-246 show a ~15-line inline _reset_global_state() template with contextlib.suppress(ImportError) blocks. This does not match the delegation pattern now used in all three helpers. A developer following this template would re-introduce the duplication that commit 29b924ac explicitly eliminated.

Recommendation: Update lines 216-218 to:

"The reset logic is centralised in robot/helpers_common.py. Each helper imports and delegates to helpers_common.reset_global_state()."

Replace the code example (lines 231-246) with:

from helpers_common import reset_global_state

def _reset_global_state() -> None:
    """Reset process-wide singletons between CLI invocations."""
    reset_global_state()

Finding 4 — P1:must-fix — PR body is empty

Per CONTRIBUTING.md: "PRs submitted without a description or without an issue reference will not be reviewed." The PR body is blank. Missing: summary of changes, closing keyword (Closes #563), and Forgejo dependency link.


Finding 5 — P1:must-fix — Commit footers have missing/incorrect issue references

Per CONTRIBUTING.md requirement #4: "Every commit must reference the issue it addresses in its commit message footer."

  • Commit d2d9681d: No issue reference footer at all.
  • Commit 29b924ac: Footer says Refs: #619 (the PR number) instead of Refs: #563 (the issue number).

Verdict: Request Changes

The technical fix is well-designed and the root cause analysis is correct. The blocking items are Finding 1 (incomplete singleton reset in new code), Finding 4 (empty PR body), and Finding 5 (commit footers). Findings 2-3 should also be addressed before merge.

## Review: PR #619 — fix(test): resolve race condition in M4 validation integration test **Reviewer:** Rui Hu (`@hurui200320`) | **Scope:** Code review, test review, docs review ### Technical Assessment The core fix is **sound and well-designed**. The three-pronged root cause analysis (shared DB URL, racy shared directory, singleton leaks in chained CLI invocations) is correct. The composable keyword design (`Setup Test Environment` + optional `Setup Database Isolation`) preserves backward compatibility with 150+ existing suites while properly isolating the 3 affected helper-based suites. The `full_flow()` chained invocations correctly call `_reset_global_state()` between CLI steps 1-2 and correctly skip it before step 3 (pure domain logic, no CLI/DB). The centralization of reset logic in `helpers_common.py` is a clean refactor. All findings from Brent's self-review and bug review have been verified as resolved in the latest code. --- ### Finding 1 — P2:should-fix — `helpers_common.reset_global_state()` does not reset `ProviderRegistry` **File:** `robot/helpers_common.py:18-44` The function resets 3 singletons (`Settings._instance`, `reset_container()`, `MEMORY_ENGINES`) but omits `reset_provider_registry()` from `cleveragents.providers.registry`. The BDD test harness (`features/environment.py`, `after_scenario` hook) resets all four: ```python Settings._instance = None reset_container() reset_provider_registry() # <-- missing from helpers_common.py ``` The `ProviderRegistry` at `src/cleveragents/providers/registry.py` maintains its own module-level `_registry` global. After `reset_container()`, this stale `_registry` persists. In current code paths this is mitigated because the DI container passes `settings` explicitly to `get_provider_registry(settings=...)`, which forces a fresh registry. However, any code that calls `get_provider_registry()` **without** the `settings` argument between a reset and the first container access would get the stale registry (with the old `Settings` and thus the old database URL). This is a latent bug. **Recommendation:** Add to `helpers_common.reset_global_state()`: ```python with contextlib.suppress(ImportError): from cleveragents.providers.registry import reset_provider_registry reset_provider_registry() ``` --- ### Finding 2 — P2:should-fix — `cli_plan_context_commands.robot` missing `Suite Teardown` after expanded env var setup **File:** `robot/cli_plan_context_commands.robot:1-9` This file has `Suite Setup` but no `Suite Teardown`. This PR expanded `Setup Test Environment` to also set `CLEVERAGENTS_AUTO_APPLY_MIGRATIONS` and `CLEVERAGENTS_TESTING_USE_MOCK_AI` (`common.resource:39-40`), and the PR touched this file to remove the now-redundant explicit env var. The missing teardown now leaks 3 env vars (up from 1 pre-PR), plus the per-suite temp directory is never cleaned up. The other three `.robot` files modified by this PR all correctly have `Suite Teardown Cleanup Test Environment`. **Recommendation:** Add: ```robot Suite Teardown Cleanup Test Environment ``` --- ### Finding 3 — P2:should-fix — `testing.md` template contradicts the code it documents **File:** `docs/development/testing.md:216-247` Two inconsistencies in the newly added documentation: 1. **Lines 216-218** say "The reset pattern is defined identically in each helper that needs it" — but the code in this same PR centralizes it in `helpers_common.py`. The helpers delegate; they no longer carry identical copies. 2. **Lines 231-246** show a ~15-line inline `_reset_global_state()` template with `contextlib.suppress(ImportError)` blocks. This does not match the delegation pattern now used in all three helpers. A developer following this template would re-introduce the duplication that commit `29b924ac` explicitly eliminated. **Recommendation:** Update lines 216-218 to: > "The reset logic is centralised in `robot/helpers_common.py`. Each helper imports and delegates to `helpers_common.reset_global_state()`." Replace the code example (lines 231-246) with: ```python from helpers_common import reset_global_state def _reset_global_state() -> None: """Reset process-wide singletons between CLI invocations.""" reset_global_state() ``` --- ### Finding 4 — P1:must-fix — PR body is empty Per CONTRIBUTING.md: "PRs submitted without a description or without an issue reference will not be reviewed." The PR body is blank. Missing: summary of changes, closing keyword (`Closes #563`), and Forgejo dependency link. --- ### Finding 5 — P1:must-fix — Commit footers have missing/incorrect issue references Per CONTRIBUTING.md requirement #4: "Every commit must reference the issue it addresses in its commit message footer." - Commit `d2d9681d`: No issue reference footer at all. - Commit `29b924ac`: Footer says `Refs: #619` (the PR number) instead of `Refs: #563` (the issue number). --- ### Verdict: **Request Changes** The technical fix is well-designed and the root cause analysis is correct. The blocking items are Finding 1 (incomplete singleton reset in new code), Finding 4 (empty PR body), and Finding 5 (commit footers). Findings 2-3 should also be addressed before merge.
Member

Code Review — PR #619: fix(test): resolve race condition in M4 validation integration test

Latest Commit: 29b924ac6efix(test): deduplicate _reset_global_state and clean up redundant env var
Issue: #563 — Racing condition in integration test for M4 validation
Branch: fix/m4-validation-race-condition
Reviewer: Aditya Chhabra


Scope

This PR fixes a pabot-parallel race condition in the M4 validation integration tests. Changes span:

  • robot/common.resource — composable keyword redesign (Setup Database Isolation, Setup Test Environment With Database Isolation, expanded Cleanup Test Environment)
  • robot/m4_e2e_verification.robot, robot/m4_correction_subplan_smoke.robot, robot/m3_decision_validation_smoke.robot — upgraded Suite Setup to Setup Test Environment With Database Isolation
  • robot/helper_m4_e2e_verification.py, robot/helper_m4_correction_subplan_smoke.py, robot/helper_m3_decision_validation_smoke.py — added thin _reset_global_state() delegates
  • robot/helpers_common.py — new shared module with reset_global_state()
  • robot/cli_plan_context_commands.robot — removed now-redundant explicit env var set
  • docs/development/testing.md — new "Parallel Execution Isolation (pabot)" section

New Findings

F1 — Buried Import in _reset_global_state() Thin Wrappers [HIGH — Standards Violation]

Files:

  • robot/helper_m4_e2e_verification.py:73
  • robot/helper_m4_correction_subplan_smoke.py:50
  • robot/helper_m3_decision_validation_smoke.py:42

All three helper files define an identical thin wrapper:

def _reset_global_state() -> None:
    """..."""
    from helpers_common import reset_global_state   # <-- buried inside function
    reset_global_state()

CONTRIBUTING.md (lines 1187-1188) is explicit: "Ensure all imports (including from statements) are at the top of the Python file. Do not scatter imports throughout the file or bury them inside functions or methods."

The P2 self-review finding correctly identified the duplication and extracted the implementation to helpers_common.py. However, the thin wrappers still have the import inside the function body. There is no technical justification for this — helpers_common is a stable, unconditional dependency of each file (it lives in the same robot/ directory, which Python adds to sys.path automatically when the script is executed). This differs from the deferred imports inside helpers_common.reset_global_state() itself, which are legitimately guarded by contextlib.suppress(ImportError) to handle optional application-level modules.

What needs to be done:

Move helpers_common to the module-level import block in all three files — alongside the existing from cleveragents... imports. The thin wrapper function then contains only the call:

# At module level:
from helpers_common import reset_global_state

# Wrapper (function body is now clean):
def _reset_global_state() -> None:
    """Reset process-wide singletons between CLI invocations."""
    reset_global_state()

Alternatively, remove _reset_global_state() entirely and call reset_global_state() from helpers_common directly at the call sites — the thin wrapper provides no added value.


F2 — Double Reset Before full_flow() Step 1 Is Redundant and Obscures Intent [LOW — Code Clarity]

File: robot/helper_m4_correction_subplan_smoke.py:468-474 (the __main__ block) and robot/helper_m4_correction_subplan_smoke.py:380 (start of full_flow())

The __main__ block calls _reset_global_state() before dispatching:

if __name__ == "__main__":
    ...
    _reset_global_state()   # reset at process entry
    fn = _COMMANDS[sys.argv[1]]
    fn()

full_flow() also calls _reset_global_state() as its very first line before Step 1:

def full_flow() -> None:
    # Step 1: correction revert
    _reset_global_state()   # reset again immediately
    ...

When the full-flow command is invoked, state is reset twice before the first CLI call. For all other single-step commands, the __main__ reset is correct; full_flow() should not need to duplicate it.

The double-reset is harmless but creates ambiguity: it is not clear whether the full_flow()-level reset is defensive programming, an oversight from when full_flow() was written before the __main__-level reset was added, or intentional for a reason not documented in the comment. A reader following the "reset between chained invocations" pattern documented in testing.md would not expect a reset before the first invocation.

What needs to be done:

Remove the _reset_global_state() call at the top of full_flow() (Step 1). The __main__ block's reset is sufficient to ensure a clean starting state. The in-function resets before Steps 2 (line 414) are correct and must be retained. Add a brief comment to the __main__ block explaining that the reset covers the full_flow() case:

if __name__ == "__main__":
    ...
    # Reset singletons once at process entry; full_flow() resets again
    # between each chained CLI invocation.
    _reset_global_state()
    fn = _COMMANDS[sys.argv[1]]
    fn()

F3 — Cleanup Test Environment Re-reads CLEVERAGENTS_HOME from Environment Instead of Using ${SUITE_HOME} [LOW — Correctness / Consistency]

File: robot/common.resource:81-83

Setup Test Environment sets both the CLEVERAGENTS_HOME environment variable and the ${SUITE_HOME} suite variable (line 37) — the latter exists precisely so that the suite retains a reliable reference to the directory it created. Cleanup Test Environment ignores ${SUITE_HOME} and reads CLEVERAGENTS_HOME back out of the environment:

Cleanup Test Environment
    ${home}=    Get Environment Variable    CLEVERAGENTS_HOME    default=${EMPTY}
    Run Keyword If    '${home}' != '${EMPTY}'    Run Keyword And Ignore Error
    ...    Remove Directory    ${home}    recursive=True

If a test case modifies CLEVERAGENTS_HOME (e.g., to verify behavior with a custom home path), cleanup removes the modified path rather than the one the suite created. The ${SUITE_HOME} variable cannot be modified by individual test cases without an explicit Set Suite Variable call in the test body.

Using ${SUITE_HOME} directly also eliminates the Get Environment Variable call and the '${home}' != '${EMPTY}' null-guard — ${SUITE_HOME} is guaranteed to be set if Setup Test Environment ran, and if it did not run, the suite setup itself would have failed first.

What needs to be done:

Cleanup Test Environment
    [Documentation]    Clean up after tests.
    ...
    ...                Removes the per-suite CLEVERAGENTS_HOME directory and
    ...                any associated SQLite files.
    Log    Cleaning up test environment
    Run Keyword And Ignore Error    Remove Directory    ${SUITE_HOME}    recursive=True
    Remove Environment Variable    CLEVERAGENTS_HOME
    Remove Environment Variable    CLEVERAGENTS_AUTO_APPLY_MIGRATIONS
    Remove Environment Variable    CLEVERAGENTS_TESTING_USE_MOCK_AI
    Remove Environment Variable    CLEVERAGEPS_DATABASE_URL
    Remove Environment Variable    CLEVERAGENTS_TEST_DATABASE_URL

Note: ${SUITE_DB_PATH} and ${SUITE_TEST_DB_PATH} are set by Setup Database Isolation but are not currently consumed by any test case or teardown. If they are not intended for external use, they can be removed.


F4 — contextlib.suppress(Exception) Is Overly Broad in reset_global_state() [LOW — Error Handling]

File: robot/helpers_common.py:42-44

for _url, engine in list(MEMORY_ENGINES.items()):
    with contextlib.suppress(Exception):
        engine.dispose()
MEMORY_ENGINES.clear()

contextlib.suppress(Exception) catches all non-fatal exceptions, including AttributeError, TypeError, and other programming errors. If engine.dispose() raises due to a defect in the engine object (e.g., an unexpected type stored in MEMORY_ENGINES), the exception is silently discarded and MEMORY_ENGINES.clear() runs regardless. This can mask bugs in the test infrastructure that would be caught immediately if the exception propagated or was logged.

The rationale for suppressing here is presumably that a partially-disposed engine should not abort the reset cycle — but suppressing all exceptions is broader than that intent.

What needs to be done:

Target the suppression to the specific exceptions engine.dispose() legitimately raises (e.g., OperationalError or DBAPIError from SQLAlchemy), or log the exception to sys.stderr before suppressing:

import sys

for _url, engine in list(MEMORY_ENGINES.items()):
    try:
        engine.dispose()
    except Exception as exc:  # noqa: BLE001
        print(f"[helpers_common] engine.dispose() suppressed: {exc}", file=sys.stderr)
MEMORY_ENGINES.clear()

This keeps the non-blocking reset behavior while making silent failures visible during debugging.


F5 — from __future__ import annotations in helpers_common.py Is Unnecessary [Nit]

File: robot/helpers_common.py:13

from __future__ import annotations enables PEP 563 deferred evaluation of annotations and is needed only when type annotations reference names that are not yet defined at parse time (forward references) or when targeting Python versions before 3.10 that do not support union syntax (X | Y). helpers_common.py has a single function with the signature def reset_global_state() -> None:, which does not require deferred evaluation under any supported Python version.

The import adds no value and was likely copied from the helper file template that uses it for domain-model imports. Remove it.


Severity Summary

# Finding Severity Category File
F1 Buried from helpers_common import inside _reset_global_state() function body HIGH Standards / Policy helper_m4_e2e_verification.py, helper_m4_correction_subplan_smoke.py, helper_m3_decision_validation_smoke.py
F2 Double reset before full_flow() Step 1 — redundant and obscures intent LOW Code Clarity helper_m4_correction_subplan_smoke.py
F3 Cleanup Test Environment reads env var instead of using ${SUITE_HOME} LOW Correctness robot/common.resource
F4 contextlib.suppress(Exception) too broad — swallows programming errors silently LOW Error Handling robot/helpers_common.py
F5 from __future__ import annotations unnecessary Nit Style robot/helpers_common.py

Positive Observations

  • The three-pronged root cause analysis (shared DB URL, racy shared directory cleanup, singleton leak in chained CLI helpers) is accurate and complete. Each cause is addressed with a targeted, minimal fix.
  • The composable keyword design (Setup Test Environment + optional Setup Database Isolation + convenience Setup Test Environment With Database Isolation) is the correct architectural choice. It preserves backward compatibility with 150+ existing suites while adding isolation only where needed, without requiring a global change.
  • The per-suite CLEVERAGENTS_HOME (${TEMPDIR}/.cleveragents_${safe_suite}) is a clean solution that avoids any shared mutable state between pabot workers.
  • helpers_common.reset_global_state() using contextlib.suppress(ImportError) for each optional import is the correct defensive pattern for a shared utility that may be loaded in contexts where not all application modules are on the path.
  • The full_flow() docstring in helper_m4_correction_subplan_smoke.py now clearly explains why _reset_global_state() is called between steps and why Step 3 (pure domain logic) does not need one. This is exactly the kind of non-obvious reasoning that deserves a comment.
  • timeout=30s on all 10 Run Process calls in m4_e2e_verification.robot and timeout=60s on the chained full-flow command are well-calibrated and consistent across the three .robot files.
  • Remove Keyword And Ignore Error is used correctly for the directory removal in Cleanup Test Environment — a failing cleanup should not mark an otherwise-passing test as failed.

Verdict

Requesting changes on F1 (buried import in all three helper wrappers — explicit CONTRIBUTING.md violation). F2–F5 are lower-priority but should be addressed in the same commit to avoid accumulating small debt. The core fix is technically sound and the composable keyword design is the right architectural approach.

# Code Review — PR #619: `fix(test): resolve race condition in M4 validation integration test` **Latest Commit:** `29b924ac6e` — `fix(test): deduplicate _reset_global_state and clean up redundant env var` **Issue:** #563 — Racing condition in integration test for M4 validation **Branch:** `fix/m4-validation-race-condition` **Reviewer:** Aditya Chhabra --- ## Scope This PR fixes a pabot-parallel race condition in the M4 validation integration tests. Changes span: - `robot/common.resource` — composable keyword redesign (`Setup Database Isolation`, `Setup Test Environment With Database Isolation`, expanded `Cleanup Test Environment`) - `robot/m4_e2e_verification.robot`, `robot/m4_correction_subplan_smoke.robot`, `robot/m3_decision_validation_smoke.robot` — upgraded `Suite Setup` to `Setup Test Environment With Database Isolation` - `robot/helper_m4_e2e_verification.py`, `robot/helper_m4_correction_subplan_smoke.py`, `robot/helper_m3_decision_validation_smoke.py` — added thin `_reset_global_state()` delegates - `robot/helpers_common.py` — new shared module with `reset_global_state()` - `robot/cli_plan_context_commands.robot` — removed now-redundant explicit env var set - `docs/development/testing.md` — new "Parallel Execution Isolation (pabot)" section --- ## New Findings ### F1 — Buried Import in `_reset_global_state()` Thin Wrappers [HIGH — Standards Violation] **Files:** - `robot/helper_m4_e2e_verification.py:73` - `robot/helper_m4_correction_subplan_smoke.py:50` - `robot/helper_m3_decision_validation_smoke.py:42` All three helper files define an identical thin wrapper: ```python def _reset_global_state() -> None: """...""" from helpers_common import reset_global_state # <-- buried inside function reset_global_state() ``` `CONTRIBUTING.md` (lines 1187-1188) is explicit: *"Ensure all imports (including `from` statements) are at the top of the Python file. Do not scatter imports throughout the file or bury them inside functions or methods."* The P2 self-review finding correctly identified the duplication and extracted the implementation to `helpers_common.py`. However, the thin wrappers still have the import inside the function body. There is no technical justification for this — `helpers_common` is a stable, unconditional dependency of each file (it lives in the same `robot/` directory, which Python adds to `sys.path` automatically when the script is executed). This differs from the deferred imports inside `helpers_common.reset_global_state()` itself, which are legitimately guarded by `contextlib.suppress(ImportError)` to handle optional application-level modules. **What needs to be done:** Move `helpers_common` to the module-level import block in all three files — alongside the existing `from cleveragents...` imports. The thin wrapper function then contains only the call: ```python # At module level: from helpers_common import reset_global_state # Wrapper (function body is now clean): def _reset_global_state() -> None: """Reset process-wide singletons between CLI invocations.""" reset_global_state() ``` Alternatively, remove `_reset_global_state()` entirely and call `reset_global_state()` from `helpers_common` directly at the call sites — the thin wrapper provides no added value. --- ### F2 — Double Reset Before `full_flow()` Step 1 Is Redundant and Obscures Intent [LOW — Code Clarity] **File:** `robot/helper_m4_correction_subplan_smoke.py:468-474` (the `__main__` block) and `robot/helper_m4_correction_subplan_smoke.py:380` (start of `full_flow()`) The `__main__` block calls `_reset_global_state()` before dispatching: ```python if __name__ == "__main__": ... _reset_global_state() # reset at process entry fn = _COMMANDS[sys.argv[1]] fn() ``` `full_flow()` also calls `_reset_global_state()` as its very first line before Step 1: ```python def full_flow() -> None: # Step 1: correction revert _reset_global_state() # reset again immediately ... ``` When the `full-flow` command is invoked, state is reset twice before the first CLI call. For all other single-step commands, the `__main__` reset is correct; `full_flow()` should not need to duplicate it. The double-reset is harmless but creates ambiguity: it is not clear whether the `full_flow()`-level reset is defensive programming, an oversight from when `full_flow()` was written before the `__main__`-level reset was added, or intentional for a reason not documented in the comment. A reader following the "reset between chained invocations" pattern documented in `testing.md` would not expect a reset before the first invocation. **What needs to be done:** Remove the `_reset_global_state()` call at the top of `full_flow()` (Step 1). The `__main__` block's reset is sufficient to ensure a clean starting state. The in-function resets before Steps 2 (line 414) are correct and must be retained. Add a brief comment to the `__main__` block explaining that the reset covers the `full_flow()` case: ```python if __name__ == "__main__": ... # Reset singletons once at process entry; full_flow() resets again # between each chained CLI invocation. _reset_global_state() fn = _COMMANDS[sys.argv[1]] fn() ``` --- ### F3 — `Cleanup Test Environment` Re-reads `CLEVERAGENTS_HOME` from Environment Instead of Using `${SUITE_HOME}` [LOW — Correctness / Consistency] **File:** `robot/common.resource:81-83` `Setup Test Environment` sets both the `CLEVERAGENTS_HOME` environment variable **and** the `${SUITE_HOME}` suite variable (line 37) — the latter exists precisely so that the suite retains a reliable reference to the directory it created. `Cleanup Test Environment` ignores `${SUITE_HOME}` and reads `CLEVERAGENTS_HOME` back out of the environment: ```robotframework Cleanup Test Environment ${home}= Get Environment Variable CLEVERAGENTS_HOME default=${EMPTY} Run Keyword If '${home}' != '${EMPTY}' Run Keyword And Ignore Error ... Remove Directory ${home} recursive=True ``` If a test case modifies `CLEVERAGENTS_HOME` (e.g., to verify behavior with a custom home path), cleanup removes the modified path rather than the one the suite created. The `${SUITE_HOME}` variable cannot be modified by individual test cases without an explicit `Set Suite Variable` call in the test body. Using `${SUITE_HOME}` directly also eliminates the `Get Environment Variable` call and the `'${home}' != '${EMPTY}'` null-guard — `${SUITE_HOME}` is guaranteed to be set if `Setup Test Environment` ran, and if it did not run, the suite setup itself would have failed first. **What needs to be done:** ```robotframework Cleanup Test Environment [Documentation] Clean up after tests. ... ... Removes the per-suite CLEVERAGENTS_HOME directory and ... any associated SQLite files. Log Cleaning up test environment Run Keyword And Ignore Error Remove Directory ${SUITE_HOME} recursive=True Remove Environment Variable CLEVERAGENTS_HOME Remove Environment Variable CLEVERAGENTS_AUTO_APPLY_MIGRATIONS Remove Environment Variable CLEVERAGENTS_TESTING_USE_MOCK_AI Remove Environment Variable CLEVERAGEPS_DATABASE_URL Remove Environment Variable CLEVERAGENTS_TEST_DATABASE_URL ``` Note: `${SUITE_DB_PATH}` and `${SUITE_TEST_DB_PATH}` are set by `Setup Database Isolation` but are not currently consumed by any test case or teardown. If they are not intended for external use, they can be removed. --- ### F4 — `contextlib.suppress(Exception)` Is Overly Broad in `reset_global_state()` [LOW — Error Handling] **File:** `robot/helpers_common.py:42-44` ```python for _url, engine in list(MEMORY_ENGINES.items()): with contextlib.suppress(Exception): engine.dispose() MEMORY_ENGINES.clear() ``` `contextlib.suppress(Exception)` catches all non-fatal exceptions, including `AttributeError`, `TypeError`, and other programming errors. If `engine.dispose()` raises due to a defect in the engine object (e.g., an unexpected type stored in `MEMORY_ENGINES`), the exception is silently discarded and `MEMORY_ENGINES.clear()` runs regardless. This can mask bugs in the test infrastructure that would be caught immediately if the exception propagated or was logged. The rationale for suppressing here is presumably that a partially-disposed engine should not abort the reset cycle — but suppressing all exceptions is broader than that intent. **What needs to be done:** Target the suppression to the specific exceptions `engine.dispose()` legitimately raises (e.g., `OperationalError` or `DBAPIError` from SQLAlchemy), or log the exception to `sys.stderr` before suppressing: ```python import sys for _url, engine in list(MEMORY_ENGINES.items()): try: engine.dispose() except Exception as exc: # noqa: BLE001 print(f"[helpers_common] engine.dispose() suppressed: {exc}", file=sys.stderr) MEMORY_ENGINES.clear() ``` This keeps the non-blocking reset behavior while making silent failures visible during debugging. --- ### F5 — `from __future__ import annotations` in `helpers_common.py` Is Unnecessary [Nit] **File:** `robot/helpers_common.py:13` `from __future__ import annotations` enables PEP 563 deferred evaluation of annotations and is needed only when type annotations reference names that are not yet defined at parse time (forward references) or when targeting Python versions before 3.10 that do not support union syntax (`X | Y`). `helpers_common.py` has a single function with the signature `def reset_global_state() -> None:`, which does not require deferred evaluation under any supported Python version. The import adds no value and was likely copied from the helper file template that uses it for domain-model imports. Remove it. --- ## Severity Summary | # | Finding | Severity | Category | File | |---|---------|----------|----------|------| | F1 | Buried `from helpers_common import` inside `_reset_global_state()` function body | HIGH | Standards / Policy | `helper_m4_e2e_verification.py`, `helper_m4_correction_subplan_smoke.py`, `helper_m3_decision_validation_smoke.py` | | F2 | Double reset before `full_flow()` Step 1 — redundant and obscures intent | LOW | Code Clarity | `helper_m4_correction_subplan_smoke.py` | | F3 | `Cleanup Test Environment` reads env var instead of using `${SUITE_HOME}` | LOW | Correctness | `robot/common.resource` | | F4 | `contextlib.suppress(Exception)` too broad — swallows programming errors silently | LOW | Error Handling | `robot/helpers_common.py` | | F5 | `from __future__ import annotations` unnecessary | Nit | Style | `robot/helpers_common.py` | --- ## Positive Observations - The three-pronged root cause analysis (shared DB URL, racy shared directory cleanup, singleton leak in chained CLI helpers) is accurate and complete. Each cause is addressed with a targeted, minimal fix. - The composable keyword design (`Setup Test Environment` + optional `Setup Database Isolation` + convenience `Setup Test Environment With Database Isolation`) is the correct architectural choice. It preserves backward compatibility with 150+ existing suites while adding isolation only where needed, without requiring a global change. - The per-suite CLEVERAGENTS_HOME (`${TEMPDIR}/.cleveragents_${safe_suite}`) is a clean solution that avoids any shared mutable state between pabot workers. - `helpers_common.reset_global_state()` using `contextlib.suppress(ImportError)` for each optional import is the correct defensive pattern for a shared utility that may be loaded in contexts where not all application modules are on the path. - The `full_flow()` docstring in `helper_m4_correction_subplan_smoke.py` now clearly explains why `_reset_global_state()` is called between steps and why Step 3 (pure domain logic) does not need one. This is exactly the kind of non-obvious reasoning that deserves a comment. - `timeout=30s` on all 10 `Run Process` calls in `m4_e2e_verification.robot` and `timeout=60s` on the chained `full-flow` command are well-calibrated and consistent across the three `.robot` files. - `Remove Keyword And Ignore Error` is used correctly for the directory removal in `Cleanup Test Environment` — a failing cleanup should not mark an otherwise-passing test as failed. --- ## Verdict **Requesting changes** on F1 (buried import in all three helper wrappers — explicit CONTRIBUTING.md violation). F2–F5 are lower-priority but should be addressed in the same commit to avoid accumulating small debt. The core fix is technically sound and the composable keyword design is the right architectural approach.
brent.edwards force-pushed fix/m4-validation-race-condition from 29b924ac6e
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 37s
CI / typecheck (pull_request) Successful in 40s
CI / unit_tests (pull_request) Successful in 2m33s
CI / docker (pull_request) Successful in 14s
CI / integration_tests (pull_request) Successful in 4m14s
CI / coverage (pull_request) Successful in 4m30s
CI / benchmark-regression (pull_request) Successful in 30m3s
to 85ea704752
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 19s
CI / build (pull_request) Successful in 18s
CI / security (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 57s
CI / integration_tests (pull_request) Successful in 3m14s
CI / unit_tests (pull_request) Successful in 3m20s
CI / docker (pull_request) Successful in 10s
CI / coverage (pull_request) Successful in 5m25s
CI / benchmark-regression (pull_request) Successful in 30m42s
2026-03-09 18:45:47 +00:00
Compare
Author
Member

Disposition of Aditya review (comment #56643) — commit 85ea7047

Squashed into single commit rebased onto master (a808c395). All findings addressed.

Fixed

ID Severity Finding Resolution
F1 HIGH Buried from helpers_common import inside _reset_global_state() function body in 3 helper files Fixed — moved from helpers_common import reset_global_state to module-level import block in all three files. Removed the thin _reset_global_state() wrapper entirely; call sites now invoke reset_global_state() directly.
F2 LOW Double reset before full_flow() Step 1 is redundant Fixed — removed the reset_global_state() call at the top of full_flow(). The __main__ dispatcher already resets before calling any command function. Added clarifying comment to __main__ block.
F3 LOW Cleanup Test Environment re-reads CLEVERAGENTS_HOME from env instead of using ${SUITE_HOME} Fixed — cleanup now uses ${SUITE_HOME} directly, eliminating the Get Environment Variable call and the null-guard.
F4 LOW contextlib.suppress(Exception) too broad in reset_global_state() Fixed — replaced with try/except that logs the suppressed exception to sys.stderr via print(f"[helpers_common] engine.dispose() suppressed: {exc}", file=sys.stderr).
F5 Nit from __future__ import annotations unnecessary in helpers_common.py Fixed — removed.

Additional

  • Filled in PR body per PM request (#56516)
  • Squashed all prior commits (3 logical + 2 merge) into a single commit on master

Verification

Gate Result
nox -s lint All checks passed
nox -s typecheck 0 errors, 0 warnings
nox -s unit_tests 9,029 passed, 0 failed
Single commit 85ea7047 — no merge commits
Rebased onto master a808c395
## Disposition of Aditya review (comment #56643) — commit `85ea7047` Squashed into single commit rebased onto master (`a808c395`). All findings addressed. ### Fixed | ID | Severity | Finding | Resolution | |----|----------|---------|------------| | **F1** | HIGH | Buried `from helpers_common import` inside `_reset_global_state()` function body in 3 helper files | **Fixed** — moved `from helpers_common import reset_global_state` to module-level import block in all three files. Removed the thin `_reset_global_state()` wrapper entirely; call sites now invoke `reset_global_state()` directly. | | **F2** | LOW | Double reset before `full_flow()` Step 1 is redundant | **Fixed** — removed the `reset_global_state()` call at the top of `full_flow()`. The `__main__` dispatcher already resets before calling any command function. Added clarifying comment to `__main__` block. | | **F3** | LOW | `Cleanup Test Environment` re-reads `CLEVERAGENTS_HOME` from env instead of using `${SUITE_HOME}` | **Fixed** — cleanup now uses `${SUITE_HOME}` directly, eliminating the `Get Environment Variable` call and the null-guard. | | **F4** | LOW | `contextlib.suppress(Exception)` too broad in `reset_global_state()` | **Fixed** — replaced with try/except that logs the suppressed exception to `sys.stderr` via `print(f"[helpers_common] engine.dispose() suppressed: {exc}", file=sys.stderr)`. | | **F5** | Nit | `from __future__ import annotations` unnecessary in `helpers_common.py` | **Fixed** — removed. | ### Additional - Filled in PR body per PM request (#56516) - Squashed all prior commits (3 logical + 2 merge) into a single commit on master ### Verification | Gate | Result | |------|--------| | `nox -s lint` | All checks passed | | `nox -s typecheck` | 0 errors, 0 warnings | | `nox -s unit_tests` | 9,029 passed, 0 failed | | Single commit | `85ea7047` — no merge commits | | Rebased onto master | `a808c395` |
Owner

PM Status Check — Day 29

Author: @brent.edwards | Milestone: v3.3.0 (M4) | Issue: #563 | Reviews: REQUEST_CHANGES (Aditya, 5 findings)

Summary

This PR fixes the intermittent race condition in M4 validation integration tests (3 root causes: shared SQLite DB, shared CLEVERAGENTS_HOME, singleton leak). Aditya reviewed and found 5 findings including a HIGH-severity buried import violation.

Added State/In Review and Priority/Critical labels.

Review Status

Aditya's review identified:

  • F1 HIGH: Buried import violating CONTRIBUTING.md import order rules
  • F2-F5: Medium/Low findings on test coverage and style

Action Required

Who Action Deadline
@brent.edwards Address F1 (import order), F2-F5 Mar 10 EOD
@aditya Re-review after fixes Mar 11

Bug fix PRs are always top priority. This blocks M4 acceptance gate closure for #563.

## PM Status Check — Day 29 **Author**: @brent.edwards | **Milestone**: v3.3.0 (M4) | **Issue**: #563 | **Reviews**: REQUEST_CHANGES (Aditya, 5 findings) ### Summary This PR fixes the intermittent race condition in M4 validation integration tests (3 root causes: shared SQLite DB, shared CLEVERAGENTS_HOME, singleton leak). Aditya reviewed and found 5 findings including a HIGH-severity buried import violation. Added State/In Review and Priority/Critical labels. ### Review Status Aditya's review identified: - **F1 HIGH**: Buried import violating CONTRIBUTING.md import order rules - **F2-F5**: Medium/Low findings on test coverage and style ### Action Required | Who | Action | Deadline | |:----|:-------|:---------| | @brent.edwards | Address F1 (import order), F2-F5 | **Mar 10 EOD** | | @aditya | Re-review after fixes | Mar 11 | Bug fix PRs are always top priority. This blocks M4 acceptance gate closure for #563.
Author
Member

Code Review — PR #619 fix/m4-validation-race-condition

Reviewed commit: 85ea7047 on base a808c395
Review type: Architecture review + Test review + Docs review
Verdict: REQUEST CHANGES — 4× P1, 4× P2, 3× P3


P1 — Must fix before merge

F1 · P1:must-fixCLEVERAGENTS_TESTING_USE_MOCK_AI=true set globally for ALL suites
robot/common.resource

The Setup Test Environment keyword now unconditionally sets CLEVERAGENTS_TESTING_USE_MOCK_AI=true as a global environment variable. This means every Robot suite that uses this keyword — including any future e2e suites that intentionally need real AI — will silently use mock AI. This is a correctness risk: if someone writes a suite that needs real AI responses, they'll get mocks with no warning. Consider making this opt-in via a keyword argument or a separate keyword.

F2 · P1:must-fix_COMMANDS: dict[str, object] causes # type: ignore[operator] in two helpers
robot/helper_m4_e2e_verification.py, robot/helper_m4_correction_subplan_smoke.py

The _COMMANDS dict is typed as dict[str, object] but its values are used as callables (invoked with ()). This forces # type: ignore[operator] suppressions which violate CONTRIBUTING.md's prohibition on inline # type: ignore comments. Either type the dict properly as dict[str, Callable[..., int]] or use Protocol to type the command functions.

F3 · P1:must-fix — Docs say reset is "defined identically in each helper" — wrong after this PR
docs/development/testing.md

The new "Parallel Execution Isolation" section states that reset_global_state() is "defined identically in each helper script." This is incorrect after this PR — the helpers now import reset_global_state from helpers_common.py rather than defining it locally. The docs must be updated to reflect the shared module pattern.

F4 · P1:must-fix — Docs template uses _reset_global_state() but actual function is reset_global_state()
docs/development/testing.md

The code template in the docs shows _reset_global_state() (with leading underscore, private convention) but the actual function in helpers_common.py is reset_global_state() (public, no underscore). This will confuse anyone following the template.


P2 — Should fix (follow-up within 3 days)

F5 · P2:should-fix — Lazy imports inside function body in helpers_common.py
robot/helpers_common.pyreset_global_state()

Imports wrapped in contextlib.suppress(ImportError) inside the function body. While there's a runtime justification (graceful degradation if modules aren't installed), CONTRIBUTING.md §Import Guidelines only exempts if TYPE_CHECKING: blocks. Consider moving these to module level with the contextlib.suppress guard, or add a comment citing the specific exception rationale.

F6 · P2:should-fixexcept Exception suppression in reset logic
robot/helpers_common.py

Broad except Exception catches in the reset function suppress all errors including unexpected ones (e.g., RuntimeError, OSError). This violates exception propagation best practices. At minimum, log the suppressed exception or narrow the catch to specific expected exceptions.

F7 · P2:should-fix — Other helper-based Robot suites not migrated to database isolation

Only three Robot suites (m4_e2e_verification, m4_correction_subplan_smoke, m3_decision_validation_smoke) are migrated to the new Setup Database Isolation pattern. Other suites using helpers (e.g., cli_plan_context_commands) are not migrated. If database isolation fixes the race condition, the un-migrated suites may still exhibit the same issue.

F8 · P2:should-fixCLEVERAGENTS_AUTO_APPLY_MIGRATIONS=true set globally
robot/common.resource

Similar to F1, CLEVERAGENTS_AUTO_APPLY_MIGRATIONS=true is now set globally. This could mask migration bugs where auto-apply should NOT be enabled. Consider making this configurable per suite.


P3 — Nits (author discretion)

F9 · P3:nit — Extra blank lines in two helpers
robot/helper_m4_e2e_verification.py, robot/helper_m4_correction_subplan_smoke.py

There are extra blank lines between import groups that don't follow PEP 8 import grouping conventions. Minor formatting issue.

F10 · P3:nit — Removed default .cleveragents cleanup

The PR removes the default ~/.cleveragents directory cleanup from some test teardowns. If tests create artifacts in this directory, they may leak between runs. Verify this is intentional.

F11 · P3:nitnoqa: E402 comments on imports

Some imports have # noqa: E402 to suppress the "module level import not at top of file" linting error. These are correctly applied since the imports follow sys.path manipulation. Just noting for awareness.


Summary

The race condition fix via database isolation and shared helpers_common.py module is architecturally sound and addresses a real parallel execution problem. The timeout=30s addition to Robot Run Process calls and the composable keyword design in common.resource are good improvements. However, the four P1 findings need attention: the global mock AI/auto-migration settings could silently affect future suites (F1, F8), the # type: ignore suppressions violate project policy (F2), and the documentation inaccuracies (F3, F4) will mislead contributors following the template.

## Code Review — PR #619 `fix/m4-validation-race-condition` **Reviewed commit:** `85ea7047` on base `a808c395` **Review type:** Architecture review + Test review + Docs review **Verdict:** REQUEST CHANGES — 4× P1, 4× P2, 3× P3 --- ### P1 — Must fix before merge **F1 · `P1:must-fix` — `CLEVERAGENTS_TESTING_USE_MOCK_AI=true` set globally for ALL suites** `robot/common.resource` The `Setup Test Environment` keyword now unconditionally sets `CLEVERAGENTS_TESTING_USE_MOCK_AI=true` as a global environment variable. This means **every** Robot suite that uses this keyword — including any future e2e suites that intentionally need real AI — will silently use mock AI. This is a correctness risk: if someone writes a suite that needs real AI responses, they'll get mocks with no warning. Consider making this opt-in via a keyword argument or a separate keyword. **F2 · `P1:must-fix` — `_COMMANDS: dict[str, object]` causes `# type: ignore[operator]` in two helpers** `robot/helper_m4_e2e_verification.py`, `robot/helper_m4_correction_subplan_smoke.py` The `_COMMANDS` dict is typed as `dict[str, object]` but its values are used as callables (invoked with `()`). This forces `# type: ignore[operator]` suppressions which violate CONTRIBUTING.md's prohibition on inline `# type: ignore` comments. Either type the dict properly as `dict[str, Callable[..., int]]` or use `Protocol` to type the command functions. **F3 · `P1:must-fix` — Docs say reset is "defined identically in each helper" — wrong after this PR** `docs/development/testing.md` The new "Parallel Execution Isolation" section states that `reset_global_state()` is "defined identically in each helper script." This is incorrect after this PR — the helpers now **import** `reset_global_state` from `helpers_common.py` rather than defining it locally. The docs must be updated to reflect the shared module pattern. **F4 · `P1:must-fix` — Docs template uses `_reset_global_state()` but actual function is `reset_global_state()`** `docs/development/testing.md` The code template in the docs shows `_reset_global_state()` (with leading underscore, private convention) but the actual function in `helpers_common.py` is `reset_global_state()` (public, no underscore). This will confuse anyone following the template. --- ### P2 — Should fix (follow-up within 3 days) **F5 · `P2:should-fix` — Lazy imports inside function body in `helpers_common.py`** `robot/helpers_common.py` — `reset_global_state()` Imports wrapped in `contextlib.suppress(ImportError)` inside the function body. While there's a runtime justification (graceful degradation if modules aren't installed), CONTRIBUTING.md §Import Guidelines only exempts `if TYPE_CHECKING:` blocks. Consider moving these to module level with the `contextlib.suppress` guard, or add a comment citing the specific exception rationale. **F6 · `P2:should-fix` — `except Exception` suppression in reset logic** `robot/helpers_common.py` Broad `except Exception` catches in the reset function suppress all errors including unexpected ones (e.g., `RuntimeError`, `OSError`). This violates exception propagation best practices. At minimum, log the suppressed exception or narrow the catch to specific expected exceptions. **F7 · `P2:should-fix` — Other helper-based Robot suites not migrated to database isolation** Only three Robot suites (`m4_e2e_verification`, `m4_correction_subplan_smoke`, `m3_decision_validation_smoke`) are migrated to the new `Setup Database Isolation` pattern. Other suites using helpers (e.g., `cli_plan_context_commands`) are not migrated. If database isolation fixes the race condition, the un-migrated suites may still exhibit the same issue. **F8 · `P2:should-fix` — `CLEVERAGENTS_AUTO_APPLY_MIGRATIONS=true` set globally** `robot/common.resource` Similar to F1, `CLEVERAGENTS_AUTO_APPLY_MIGRATIONS=true` is now set globally. This could mask migration bugs where auto-apply should NOT be enabled. Consider making this configurable per suite. --- ### P3 — Nits (author discretion) **F9 · `P3:nit` — Extra blank lines in two helpers** `robot/helper_m4_e2e_verification.py`, `robot/helper_m4_correction_subplan_smoke.py` There are extra blank lines between import groups that don't follow PEP 8 import grouping conventions. Minor formatting issue. **F10 · `P3:nit` — Removed default `.cleveragents` cleanup** The PR removes the default `~/.cleveragents` directory cleanup from some test teardowns. If tests create artifacts in this directory, they may leak between runs. Verify this is intentional. **F11 · `P3:nit` — `noqa: E402` comments on imports** Some imports have `# noqa: E402` to suppress the "module level import not at top of file" linting error. These are correctly applied since the imports follow `sys.path` manipulation. Just noting for awareness. --- ### Summary The race condition fix via database isolation and shared `helpers_common.py` module is architecturally sound and addresses a real parallel execution problem. The `timeout=30s` addition to Robot `Run Process` calls and the composable keyword design in `common.resource` are good improvements. However, the four P1 findings need attention: the global mock AI/auto-migration settings could silently affect future suites (F1, F8), the `# type: ignore` suppressions violate project policy (F2), and the documentation inaccuracies (F3, F4) will mislead contributors following the template.
Owner

PM Compliance Audit — CONTRIBUTING.md Checklist

# Requirement Status
1 Detailed PR description PASS — Good summary with root cause analysis.
2 Issue reference with closing keyword FAIL — Body mentions #563 but lacks Closes #563 or Fixes #563. Issue will not auto-close on merge.
3 Dependency link (PR blocks issue) FAIL — No blocking link to #563.
4 CHANGELOG.md updated FAIL — No CHANGELOG.md in diff.
5 Milestone assigned PASS — v3.3.0
6 Type label PASS — Type/Bug
7 Assignee PASS — @brent.edwards (set today)
8 Mergeable PASS
9 Reviews OPEN — REQUEST_CHANGES from @hurui200320. Aditya's review findings addressed (commit 85ea7047).

Action Required

@brent.edwards — 3 items to fix:

  1. Add Fixes #563 to the PR body (edit the description)
  2. Add CHANGELOG.md entry for the race condition fix
  3. Add #563 as a Forgejo dependency (PR blocks #563)
## PM Compliance Audit — CONTRIBUTING.md Checklist | # | Requirement | Status | |---|------------|--------| | 1 | Detailed PR description | PASS — Good summary with root cause analysis. | | 2 | Issue reference with closing keyword | **FAIL** — Body mentions #563 but lacks `Closes #563` or `Fixes #563`. Issue will not auto-close on merge. | | 3 | Dependency link (PR blocks issue) | **FAIL** — No blocking link to #563. | | 4 | CHANGELOG.md updated | **FAIL** — No CHANGELOG.md in diff. | | 5 | Milestone assigned | PASS — v3.3.0 | | 6 | Type label | PASS — Type/Bug | | 7 | Assignee | PASS — @brent.edwards (set today) | | 8 | Mergeable | PASS | | 9 | Reviews | **OPEN** — REQUEST_CHANGES from @hurui200320. Aditya's review findings addressed (commit `85ea7047`). | ### Action Required @brent.edwards — 3 items to fix: 1. **Add** `Fixes #563` to the PR body (edit the description) 2. **Add CHANGELOG.md** entry for the race condition fix 3. **Add** #563 as a Forgejo dependency (PR blocks #563)
brent.edwards force-pushed fix/m4-validation-race-condition from 85ea704752
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 19s
CI / build (pull_request) Successful in 18s
CI / security (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 57s
CI / integration_tests (pull_request) Successful in 3m14s
CI / unit_tests (pull_request) Successful in 3m20s
CI / docker (pull_request) Successful in 10s
CI / coverage (pull_request) Successful in 5m25s
CI / benchmark-regression (pull_request) Successful in 30m42s
to aee4315c31
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 38s
CI / unit_tests (pull_request) Successful in 3m45s
CI / docker (pull_request) Successful in 38s
CI / integration_tests (pull_request) Successful in 4m34s
CI / coverage (pull_request) Successful in 5m15s
CI / benchmark-regression (pull_request) Successful in 30m55s
2026-03-09 22:30:13 +00:00
Compare
Author
Member

Review Fixes — aee4315c

Rebased onto latest master (83f2f3a0). All findings addressed in a single squashed commit.

Disposition of self-review (comment #56919)

ID Severity Finding Resolution
F1 P1 CLEVERAGENTS_TESTING_USE_MOCK_AI=true set globally FixedSetup Test Environment now accepts optional mock_ai=${TRUE} and auto_apply_migrations=${TRUE} arguments. Default is ${TRUE} for backward compatibility; suites can pass ${FALSE} to opt out.
F2 P1 # type: ignore[operator] in two helpers Fixed — Changed _COMMANDS: dict[str, object] to dict[str, Callable[[], None]] in both helper_m4_correction_subplan_smoke.py and helper_m3_decision_validation_smoke.py. Removed both # type: ignore comments.
F3 P1 Docs say "defined identically in each helper" Fixed — Updated testing.md to say "centralised in robot/helpers_common.py; each helper imports and delegates to helpers_common.reset_global_state()."
F4 P1 Docs template uses _reset_global_state() Fixed — Template now shows from helpers_common import reset_global_state delegation pattern.
F5 P2 Lazy imports in helpers_common.py Fixed — Added docstring comment explaining the contextlib.suppress(ImportError) rationale as an intentional exception per CONTRIBUTING.md §Import Guidelines.
F8 P2 Global CLEVERAGENTS_AUTO_APPLY_MIGRATIONS Fixed — Same treatment as F1: keyword argument with default ${TRUE}.

Disposition of hurui200320 review (review #2053)

ID Severity Finding Resolution
F1 P2 Missing reset_provider_registry() in helpers_common.py Fixed — Added reset_provider_registry() call after reset_container().
F2 P2 Missing Suite Teardown in cli_plan_context_commands.robot Fixed — Added Suite Teardown Cleanup Test Environment.
F3 P2 Docs template contradicts code Fixed — Same as self-review F3/F4.
F4 P1 Empty PR body Fixed — Filled in per CONTRIBUTING template.
F5 P1 Commit footer Refs: #619 instead of #563 Fixed — Commit footer now says Fixes: #563.

Disposition of Aditya review (comment #56643) — previously addressed

All 5 findings (F1–F5) remain addressed from previous commit.

Disposition of PM compliance audit (comment #57050)

Item Resolution
Fixes #563 in PR body Fixed
CHANGELOG.md entry Fixed
Dependency link PR body now references Fixes #563 for auto-close

Quality gates

Gate Result
nox -s lint Pass
nox -s typecheck 0 errors, 0 warnings
Single commit aee4315c
Rebased onto master 83f2f3a0
## Review Fixes — `aee4315c` Rebased onto latest master (`83f2f3a0`). All findings addressed in a single squashed commit. ### Disposition of self-review (comment #56919) | ID | Severity | Finding | Resolution | |----|----------|---------|------------| | **F1** | P1 | `CLEVERAGENTS_TESTING_USE_MOCK_AI=true` set globally | **Fixed** — `Setup Test Environment` now accepts optional `mock_ai=${TRUE}` and `auto_apply_migrations=${TRUE}` arguments. Default is `${TRUE}` for backward compatibility; suites can pass `${FALSE}` to opt out. | | **F2** | P1 | `# type: ignore[operator]` in two helpers | **Fixed** — Changed `_COMMANDS: dict[str, object]` to `dict[str, Callable[[], None]]` in both `helper_m4_correction_subplan_smoke.py` and `helper_m3_decision_validation_smoke.py`. Removed both `# type: ignore` comments. | | **F3** | P1 | Docs say "defined identically in each helper" | **Fixed** — Updated `testing.md` to say "centralised in `robot/helpers_common.py`; each helper imports and delegates to `helpers_common.reset_global_state()`." | | **F4** | P1 | Docs template uses `_reset_global_state()` | **Fixed** — Template now shows `from helpers_common import reset_global_state` delegation pattern. | | **F5** | P2 | Lazy imports in `helpers_common.py` | **Fixed** — Added docstring comment explaining the `contextlib.suppress(ImportError)` rationale as an intentional exception per CONTRIBUTING.md §Import Guidelines. | | **F8** | P2 | Global `CLEVERAGENTS_AUTO_APPLY_MIGRATIONS` | **Fixed** — Same treatment as F1: keyword argument with default `${TRUE}`. | ### Disposition of hurui200320 review (review #2053) | ID | Severity | Finding | Resolution | |----|----------|---------|------------| | **F1** | P2 | Missing `reset_provider_registry()` in `helpers_common.py` | **Fixed** — Added `reset_provider_registry()` call after `reset_container()`. | | **F2** | P2 | Missing `Suite Teardown` in `cli_plan_context_commands.robot` | **Fixed** — Added `Suite Teardown Cleanup Test Environment`. | | **F3** | P2 | Docs template contradicts code | **Fixed** — Same as self-review F3/F4. | | **F4** | P1 | Empty PR body | **Fixed** — Filled in per CONTRIBUTING template. | | **F5** | P1 | Commit footer `Refs: #619` instead of `#563` | **Fixed** — Commit footer now says `Fixes: #563`. | ### Disposition of Aditya review (comment #56643) — previously addressed All 5 findings (F1–F5) remain addressed from previous commit. ### Disposition of PM compliance audit (comment #57050) | Item | Resolution | |------|------------| | `Fixes #563` in PR body | **Fixed** | | CHANGELOG.md entry | **Fixed** | | Dependency link | PR body now references `Fixes #563` for auto-close | ### Quality gates | Gate | Result | |------|--------| | `nox -s lint` | Pass | | `nox -s typecheck` | 0 errors, 0 warnings | | Single commit | `aee4315c` | | Rebased onto master | `83f2f3a0` |
brent.edwards left a comment

Code Review — PR #619 fix(test): resolve race condition in M4 validation integration test

Reviewed commit: aee4315c | Base: master (83f2f3a0)
Scope: Robot Framework test infrastructure, helpers, documentation
Checklists applied: Test review, CI review, Docs review


Summary

This PR correctly identifies and fixes a three-pronged race condition in M4 validation integration tests under pabot parallel execution: (1) shared default sqlite:///cleveragents.db, (2) incomplete per-suite CLEVERAGENTS_HOME isolation, and (3) leaked singleton state across chained CLI invocations. The fix introduces a centralised reset_global_state() in robot/helpers_common.py, composable Setup Database Isolation / Setup Test Environment With Database Isolation keywords in common.resource, timeout=30s on Run Process calls, and clear documentation in testing.md. The approach of making database isolation opt-in is the correct design choice.

The type annotation improvement (dict[str, object]dict[str, Callable[[], None]]) that eliminates # type: ignore is a welcome incidental cleanup.


Findings

ID Severity Category File:Line Description Recommended Fix
F1 P2:should-fix Maintenance robot/helpers_common.py:34 Settings._instance = None directly mutates a private attribute. Unlike the other singletons in this function which use public reset APIs (reset_container(), reset_provider_registry()), Settings has no reset() classmethod. The same pattern exists in features/environment.py:335 on master, so this is consistent with existing code. Consider adding a Settings.reset() classmethod as a follow-up, or add a # TODO: comment citing the coupling. Not blocking since it matches the established project pattern.
F2 P2:should-fix Scope robot/common.resource Only the M4/M3 suites use the new Setup Test Environment With Database Isolation keyword. Other helper-based suites (persistence_lifecycle.robot, cli_lifecycle.robot, etc.) still use plain Setup Test Environment and remain vulnerable to the same SQLite contention under pabot. File a follow-up issue to migrate remaining suites. At minimum, add a note in testing.md that existing suites will be migrated.
F3 P2:should-fix Robustness robot/common.resource:82 Cleanup Test Environment references ${SUITE_HOME}. If Suite Setup fails before Set Suite Variable ${SUITE_HOME}, the teardown will get a Robot Framework variable resolution error. Assign a default value at *** Variables *** level: ${SUITE_HOME} ${EMPTY}, or guard the Remove Directory with Run Keyword If $SUITE_HOME is not None.
F4 P2:should-fix Documentation CHANGELOG.md:5 Entry says "per-suite temp directories" as if this is new, but per-suite CLEVERAGENTS_HOME temp directories already existed on master. What's new is per-suite database URL isolation. Reword to "per-suite database URL isolation" or "per-suite SQLite database paths".
F5 P3:nit Consistency robot/m4_e2e_verification.robot timeout=30s was added to Run Process calls here, but the Run Process calls in m3_decision_validation_smoke.robot and m4_correction_subplan_smoke.robot (also modified) do not have timeout= added. Consider adding timeout=30s to those suites too for consistency, or note as follow-up.
F6 P3:nit Safety robot/common.resource:57 Setup Database Isolation assumes Setup Test Environment was called first (to set ${SUITE_HOME}). The convenience keyword handles ordering, but a direct call would fail with a confusing error. Consider a precondition check or document the dependency. Low priority since the convenience keyword exists.

Security Check

  • No credentials, secrets, or API keys in any changed file.
  • No eval(), exec(), or shell injection vectors.
  • contextlib.suppress(ImportError) scoped correctly.

Verdict

Severity Count
P0 0
P1 0
P2 4
P3 2

APPROVE with comments. No P0/P1 findings. The four P2 items (established pattern coupling, incomplete migration, teardown robustness, CHANGELOG wording) can all be addressed in a follow-up within 3 days per project policy. The core race condition fix is correct and well-implemented.

## Code Review — PR #619 `fix(test): resolve race condition in M4 validation integration test` **Reviewed commit:** `aee4315c` | **Base:** `master` (`83f2f3a0`) **Scope:** Robot Framework test infrastructure, helpers, documentation **Checklists applied:** Test review, CI review, Docs review --- ### Summary This PR correctly identifies and fixes a three-pronged race condition in M4 validation integration tests under `pabot` parallel execution: (1) shared default `sqlite:///cleveragents.db`, (2) incomplete per-suite `CLEVERAGENTS_HOME` isolation, and (3) leaked singleton state across chained CLI invocations. The fix introduces a centralised `reset_global_state()` in `robot/helpers_common.py`, composable `Setup Database Isolation` / `Setup Test Environment With Database Isolation` keywords in `common.resource`, `timeout=30s` on `Run Process` calls, and clear documentation in `testing.md`. The approach of making database isolation opt-in is the correct design choice. The type annotation improvement (`dict[str, object]` → `dict[str, Callable[[], None]]`) that eliminates `# type: ignore` is a welcome incidental cleanup. --- ### Findings | ID | Severity | Category | File:Line | Description | Recommended Fix | |----|----------|----------|-----------|-------------|-----------------| | F1 | **P2:should-fix** | Maintenance | `robot/helpers_common.py:34` | `Settings._instance = None` directly mutates a private attribute. Unlike the other singletons in this function which use public reset APIs (`reset_container()`, `reset_provider_registry()`), `Settings` has no `reset()` classmethod. The same pattern exists in `features/environment.py:335` on master, so this is consistent with existing code. | Consider adding a `Settings.reset()` classmethod as a follow-up, or add a `# TODO:` comment citing the coupling. Not blocking since it matches the established project pattern. | | F2 | **P2:should-fix** | Scope | `robot/common.resource` | Only the M4/M3 suites use the new `Setup Test Environment With Database Isolation` keyword. Other helper-based suites (`persistence_lifecycle.robot`, `cli_lifecycle.robot`, etc.) still use plain `Setup Test Environment` and remain vulnerable to the same SQLite contention under `pabot`. | File a follow-up issue to migrate remaining suites. At minimum, add a note in `testing.md` that existing suites will be migrated. | | F3 | **P2:should-fix** | Robustness | `robot/common.resource:82` | `Cleanup Test Environment` references `${SUITE_HOME}`. If `Suite Setup` fails before `Set Suite Variable ${SUITE_HOME}`, the teardown will get a Robot Framework variable resolution error. | Assign a default value at `*** Variables ***` level: `${SUITE_HOME} ${EMPTY}`, or guard the `Remove Directory` with `Run Keyword If $SUITE_HOME is not None`. | | F4 | **P2:should-fix** | Documentation | `CHANGELOG.md:5` | Entry says "per-suite temp directories" as if this is new, but per-suite `CLEVERAGENTS_HOME` temp directories already existed on master. What's new is per-suite *database URL* isolation. | Reword to "per-suite database URL isolation" or "per-suite SQLite database paths". | | F5 | **P3:nit** | Consistency | `robot/m4_e2e_verification.robot` | `timeout=30s` was added to `Run Process` calls here, but the `Run Process` calls in `m3_decision_validation_smoke.robot` and `m4_correction_subplan_smoke.robot` (also modified) do not have `timeout=` added. | Consider adding `timeout=30s` to those suites too for consistency, or note as follow-up. | | F6 | **P3:nit** | Safety | `robot/common.resource:57` | `Setup Database Isolation` assumes `Setup Test Environment` was called first (to set `${SUITE_HOME}`). The convenience keyword handles ordering, but a direct call would fail with a confusing error. | Consider a precondition check or document the dependency. Low priority since the convenience keyword exists. | ### Security Check - No credentials, secrets, or API keys in any changed file. - No `eval()`, `exec()`, or shell injection vectors. - `contextlib.suppress(ImportError)` scoped correctly. ### Verdict | Severity | Count | |----------|-------| | P0 | 0 | | P1 | 0 | | P2 | 4 | | P3 | 2 | **APPROVE with comments.** No P0/P1 findings. The four P2 items (established pattern coupling, incomplete migration, teardown robustness, CHANGELOG wording) can all be addressed in a follow-up within 3 days per project policy. The core race condition fix is correct and well-implemented.
hurui200320 approved these changes 2026-03-10 09:40:44 +00:00
Dismissed
hurui200320 left a comment

Review (Pre-External Review Check): PR #619fix(test): resolve race condition in M4 validation integration test

Reviewer: Rui Hu (@hurui200320) | Commit: aee4315c | Base: master (83f2f3a0)
Scope: Code review, test review, docs review, security check


Previous Review Status

All 5 findings from my previous review (#2053) are verified as resolved in the latest code:

ID Finding Status
F1 Missing reset_provider_registry() in helpers_common.py Resolvedrobot/helpers_common.py:44-48
F2 Missing Suite Teardown in cli_plan_context_commands.robot Resolvedrobot/cli_plan_context_commands.robot:9
F3 Docs template contradicts code Resolveddocs/development/testing.md:211-213 now correctly describes delegation pattern
F4 Empty PR body Resolved — PR body populated per CONTRIBUTING template with Fixes #563
F5 Commit footer references PR instead of issue Resolved — Footer now says Fixes: #563

Aditya's 5 findings (#56643) — all resolved. Brent's self-review findings (#56919) — all resolved. PM compliance audit (#57050) — all resolved.


Technical Assessment

The core race condition fix is technically sound and well-designed. The three-pronged root cause analysis (shared SQLite DB URL, shared CLEVERAGENTS_HOME, singleton leak in chained CLI invocations) is correct and each cause is addressed with a targeted, minimal fix.

Strengths:

  • Composable keyword design (Setup Test Environment + optional Setup Database Isolation) preserves backward compatibility with 170+ existing suites.
  • reset_global_state() properly centralised in robot/helpers_common.py with all 4 singletons: Settings._instance, reset_container(), reset_provider_registry(), MEMORY_ENGINES. Reset order matches the BDD harness (features/environment.py:331-389).
  • from helpers_common import reset_global_state is at module level in all 3 helpers (no buried imports).
  • _COMMANDS typed as dict[str, Callable[[], None]] — eliminates all # type: ignore[operator] in modified files.
  • All Run Process calls have appropriate timeouts (30s standard, 60s for full-flow).
  • full_flow() chaining correctly resets between Steps 1-2 and skips reset before Step 3 (pure domain logic), with clear docstring explaining the rationale.
  • Cleanup Test Environment uses ${SUITE_HOME} directly instead of re-reading env var.
  • engine.dispose() exceptions are now logged to stderr instead of silently suppressed.

Security check: No credentials, secrets, API keys, eval(), exec(), or shell injection vectors in any changed file. contextlib.suppress(ImportError) correctly scoped.


New Findings

R1 — ${SUITE_HOME} not declared with default; teardown fragile if setup fails [P2:should-fix]

File: robot/common.resource:13-15 (Variables section) and robot/common.resource:95 (Cleanup)

${SUITE_HOME} is set dynamically by Setup Test Environment at line 45 (Set Suite Variable). It is NOT declared in the *** Variables *** section. If Suite Setup fails before reaching line 45 (e.g., Create Directory fails on a read-only filesystem), Cleanup Test Environment will fail at line 95 with a VariableError because ${SUITE_HOME} is undefined. In Robot Framework, variable resolution occurs before Run Keyword And Ignore Error executes, so the error is NOT caught.

Impact: Suite Setup failure produces a secondary teardown error that obscures the real failure. Low probability in normal operation, but hampers debugging when it does occur.

Recommendation: Add a default in the *** Variables *** section:

*** Variables ***
${WORKSPACE}      ${CURDIR}/..
${SRC_DIR}        ${WORKSPACE}/src/cleveragents
${SUITE_HOME}     ${EMPTY}

And guard the cleanup:

Run Keyword If    '${SUITE_HOME}' != ''    Run Keyword And Ignore Error
...    Remove Directory    ${SUITE_HOME}    recursive=True

R2 — Only 3 of 170+ suites migrated to database isolation [P3:info]

File: robot/common.resource

Confirmed via full audit: only m3_decision_validation_smoke.robot, m4_correction_subplan_smoke.robot, and m4_e2e_verification.robot use Setup Test Environment With Database Isolation. The remaining ~167 suites using Setup Test Environment are not migrated. This is by design (only helper-based suites that invoke Run Process need DB isolation), but suites like persistence_lifecycle.robot, cli_lifecycle.robot, and other helper-based suites may still be vulnerable to the same SQLite contention under pabot.

Recommendation: File a follow-up issue to audit which remaining suites invoke Python helpers via Run Process and migrate those to use DB isolation.

R3 — CHANGELOG entry wording slightly inaccurate [P3:nit]

File: CHANGELOG.md:9

Entry says "per-suite temp directories" but per-suite CLEVERAGENTS_HOME temp directories already existed on master (the old Setup Test Environment already created them). What's new in this PR is per-suite database URL isolation. Consider rewording to "per-suite database URL isolation" for accuracy.


Out-of-Scope Observations (Not Blocking)

  • 47 # type: ignore comments exist across 24 other robot helper files (pre-existing, not introduced by this PR). This PR removed them from the 3 modified helpers, which is a good incremental improvement.
  • provider_detection_smoke.robot is missing Suite Teardown (pre-existing bug, not introduced by this PR).
  • features/environment.py uses contextlib.suppress(Exception) for engine.dispose() (silent), while helpers_common.py now uses try/except with logging (improved). Consider backporting the logging pattern to environment.py in a follow-up.

Severity Summary

# Finding Severity Category File
R1 ${SUITE_HOME} teardown fragile if setup fails P2 Robustness robot/common.resource
R2 Only 3 of 170+ suites migrated to DB isolation P3 Scope robot/common.resource
R3 CHANGELOG wording slightly inaccurate P3 Documentation CHANGELOG.md

Verdict: APPROVE

The fix is correct, well-documented, and all blocking issues from previous reviews are resolved. The P2 item (R1, ${SUITE_HOME} default) and the P3 items (R2 migration scope, R3 CHANGELOG wording) can be addressed in a follow-up PR within 3 days per project policy. The core race condition fix does not introduce any regressions or new bugs.

## Review (Pre-External Review Check): PR #619 — `fix(test): resolve race condition in M4 validation integration test` **Reviewer:** Rui Hu (`@hurui200320`) | **Commit:** `aee4315c` | **Base:** `master` (`83f2f3a0`) **Scope:** Code review, test review, docs review, security check --- ### Previous Review Status All 5 findings from my previous review (#2053) are verified as resolved in the latest code: | ID | Finding | Status | |----|---------|--------| | F1 | Missing `reset_provider_registry()` in `helpers_common.py` | **Resolved** — `robot/helpers_common.py:44-48` | | F2 | Missing `Suite Teardown` in `cli_plan_context_commands.robot` | **Resolved** — `robot/cli_plan_context_commands.robot:9` | | F3 | Docs template contradicts code | **Resolved** — `docs/development/testing.md:211-213` now correctly describes delegation pattern | | F4 | Empty PR body | **Resolved** — PR body populated per CONTRIBUTING template with `Fixes #563` | | F5 | Commit footer references PR instead of issue | **Resolved** — Footer now says `Fixes: #563` | Aditya's 5 findings (#56643) — all resolved. Brent's self-review findings (#56919) — all resolved. PM compliance audit (#57050) — all resolved. --- ### Technical Assessment The core race condition fix is **technically sound and well-designed**. The three-pronged root cause analysis (shared SQLite DB URL, shared `CLEVERAGENTS_HOME`, singleton leak in chained CLI invocations) is correct and each cause is addressed with a targeted, minimal fix. **Strengths:** - Composable keyword design (`Setup Test Environment` + optional `Setup Database Isolation`) preserves backward compatibility with 170+ existing suites. - `reset_global_state()` properly centralised in `robot/helpers_common.py` with all 4 singletons: `Settings._instance`, `reset_container()`, `reset_provider_registry()`, `MEMORY_ENGINES`. Reset order matches the BDD harness (`features/environment.py:331-389`). - `from helpers_common import reset_global_state` is at module level in all 3 helpers (no buried imports). - `_COMMANDS` typed as `dict[str, Callable[[], None]]` — eliminates all `# type: ignore[operator]` in modified files. - All `Run Process` calls have appropriate timeouts (30s standard, 60s for `full-flow`). - `full_flow()` chaining correctly resets between Steps 1-2 and skips reset before Step 3 (pure domain logic), with clear docstring explaining the rationale. - `Cleanup Test Environment` uses `${SUITE_HOME}` directly instead of re-reading env var. - `engine.dispose()` exceptions are now logged to stderr instead of silently suppressed. **Security check:** No credentials, secrets, API keys, `eval()`, `exec()`, or shell injection vectors in any changed file. `contextlib.suppress(ImportError)` correctly scoped. --- ### New Findings #### R1 — `${SUITE_HOME}` not declared with default; teardown fragile if setup fails [P2:should-fix] **File:** `robot/common.resource:13-15` (Variables section) and `robot/common.resource:95` (Cleanup) `${SUITE_HOME}` is set dynamically by `Setup Test Environment` at line 45 (`Set Suite Variable`). It is NOT declared in the `*** Variables ***` section. If `Suite Setup` fails before reaching line 45 (e.g., `Create Directory` fails on a read-only filesystem), `Cleanup Test Environment` will fail at line 95 with a `VariableError` because `${SUITE_HOME}` is undefined. In Robot Framework, variable resolution occurs before `Run Keyword And Ignore Error` executes, so the error is NOT caught. **Impact:** Suite Setup failure produces a secondary teardown error that obscures the real failure. Low probability in normal operation, but hampers debugging when it does occur. **Recommendation:** Add a default in the `*** Variables ***` section: ```robot *** Variables *** ${WORKSPACE} ${CURDIR}/.. ${SRC_DIR} ${WORKSPACE}/src/cleveragents ${SUITE_HOME} ${EMPTY} ``` And guard the cleanup: ```robot Run Keyword If '${SUITE_HOME}' != '' Run Keyword And Ignore Error ... Remove Directory ${SUITE_HOME} recursive=True ``` #### R2 — Only 3 of 170+ suites migrated to database isolation [P3:info] **File:** `robot/common.resource` Confirmed via full audit: only `m3_decision_validation_smoke.robot`, `m4_correction_subplan_smoke.robot`, and `m4_e2e_verification.robot` use `Setup Test Environment With Database Isolation`. The remaining ~167 suites using `Setup Test Environment` are not migrated. This is by design (only helper-based suites that invoke `Run Process` need DB isolation), but suites like `persistence_lifecycle.robot`, `cli_lifecycle.robot`, and other helper-based suites may still be vulnerable to the same SQLite contention under pabot. **Recommendation:** File a follow-up issue to audit which remaining suites invoke Python helpers via `Run Process` and migrate those to use DB isolation. #### R3 — CHANGELOG entry wording slightly inaccurate [P3:nit] **File:** `CHANGELOG.md:9` Entry says "per-suite temp directories" but per-suite `CLEVERAGENTS_HOME` temp directories already existed on master (the old `Setup Test Environment` already created them). What's new in this PR is per-suite **database URL** isolation. Consider rewording to "per-suite database URL isolation" for accuracy. --- ### Out-of-Scope Observations (Not Blocking) - 47 `# type: ignore` comments exist across 24 other robot helper files (pre-existing, not introduced by this PR). This PR removed them from the 3 modified helpers, which is a good incremental improvement. - `provider_detection_smoke.robot` is missing `Suite Teardown` (pre-existing bug, not introduced by this PR). - `features/environment.py` uses `contextlib.suppress(Exception)` for `engine.dispose()` (silent), while `helpers_common.py` now uses `try/except` with logging (improved). Consider backporting the logging pattern to `environment.py` in a follow-up. --- ### Severity Summary | # | Finding | Severity | Category | File | |---|---------|----------|----------|------| | R1 | `${SUITE_HOME}` teardown fragile if setup fails | P2 | Robustness | `robot/common.resource` | | R2 | Only 3 of 170+ suites migrated to DB isolation | P3 | Scope | `robot/common.resource` | | R3 | CHANGELOG wording slightly inaccurate | P3 | Documentation | `CHANGELOG.md` | --- ### Verdict: **APPROVE** The fix is correct, well-documented, and all blocking issues from previous reviews are resolved. The P2 item (R1, `${SUITE_HOME}` default) and the P3 items (R2 migration scope, R3 CHANGELOG wording) can be addressed in a follow-up PR within 3 days per project policy. The core race condition fix does not introduce any regressions or new bugs.
brent.edwards force-pushed fix/m4-validation-race-condition from aee4315c31
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 38s
CI / unit_tests (pull_request) Successful in 3m45s
CI / docker (pull_request) Successful in 38s
CI / integration_tests (pull_request) Successful in 4m34s
CI / coverage (pull_request) Successful in 5m15s
CI / benchmark-regression (pull_request) Successful in 30m55s
to d88dad94f6
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 25s
CI / quality (pull_request) Successful in 25s
CI / build (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 1m9s
CI / security (pull_request) Successful in 1m18s
CI / integration_tests (pull_request) Successful in 3m43s
CI / unit_tests (pull_request) Successful in 4m8s
CI / docker (pull_request) Successful in 43s
CI / coverage (pull_request) Successful in 5m35s
CI / benchmark-regression (pull_request) Successful in 32m44s
2026-03-10 19:05:33 +00:00
Compare
brent.edwards dismissed hurui200320's review 2026-03-10 19:05:33 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards scheduled this pull request to auto merge when all checks succeed 2026-03-10 19:07:10 +00:00
brent.edwards deleted branch fix/m4-validation-race-condition 2026-03-10 19:13:23 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!619
No description provided.