fix(test): resolve race condition in M4 validation integration test #619
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!619
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "fix/m4-validation-race-condition"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Fixes the intermittent race condition in M4 validation integration tests when running under pabot. Three root causes identified and addressed:
Shared SQLite DB URL — pabot workers all fell back to
sqlite:///cleveragents.dbrelative to CWD. New composableSetup Database Isolationkeyword gives each suite a uniqueCLEVERAGENTS_DATABASE_URL.Racy shared directory cleanup — old
Setup Test Environmentused a shared temp directory. Now uses per-suite${TEMPDIR}/.cleveragents_${SuiteName}.Singleton leak in helpers —
reset_global_state()centralised inrobot/helpers_common.py, now also resetsreset_provider_registry().Changes
robot/common.resource— composable keyword redesign with optionalmock_aiandauto_apply_migrationsarguments for backward compatibilityrobot/helpers_common.py— new shared module withreset_global_state()robot/helper_m4_e2e_verification.py,helper_m4_correction_subplan_smoke.py,helper_m3_decision_validation_smoke.py— delegate tohelpers_common, fixed_COMMANDStyping todict[str, Callable[[], None]]robot/cli_plan_context_commands.robot— added missingSuite Teardownrobot/m4_e2e_verification.robot— addedtimeout=30sto all 10Run Processcallsdocs/development/testing.md— updated to reflecthelpers_commondelegation patternCHANGELOG.md— added entryQuality Gates
nox -s lintnox -s typecheckFixes #563
Self-Review: PR #619 — fix(test): resolve race condition in M4 validation integration test
Reviewer: Brent Edwards (primary for
robot/anddocs/)Review scope: Test review (Robot tests & helpers), Docs review
Note: Self-review — cannot formally approve own PR. Requesting secondary reviewer (Rui) for approval.
Nox Verification Matrix
nox -s lintnox -s typechecknox -s unit_testsnox -s integration_testsnox -s coverage_reportnox -s security_scannox -s dead_codenox -s complexitynox -s benchmarkRoot Cause & Fix Assessment
The three-pronged root cause analysis is correct and well-targeted:
Shared DB URL — pabot workers all fell back to
sqlite:///cleveragents.dbrelative to CWD. The composableSetup Database Isolationkeyword correctly isolates per-suite DB files insideCLEVERAGENTS_HOME.Racy shared directory cleanup — The old
Setup Test Environmentremoved${TEMPDIR}/.cleveragents, a directory potentially shared by all workers. The new keyword uses${TEMPDIR}/.cleveragents_${SuiteName}, providing per-suite directories.Singleton leak in helpers —
_reset_global_state()correctly resetsSettings._instance,reset_container(), andMEMORY_ENGINESbetween chained CLI invocations.The decision to split
Setup Test Environmentinto composable keywords (base + optional DB isolation) instead of always settingCLEVERAGENTS_DATABASE_URLwas the right call — it preserves the 150+ other suites that rely on the CLI determining its own DB path, while fixing the 3 helper-based suites that need explicit isolation.Findings
P2:should-fix—_reset_global_state()duplicated in 3 helper filesThe identical ~30-line function is copy-pasted in
helper_m4_e2e_verification.py:69-98,helper_m4_correction_subplan_smoke.py:46-69, andhelper_m3_decision_validation_smoke.py:38-61. A shared module (e.g.robot/helpers_common.py) would reduce maintenance burden — if the singleton pattern changes (e.g._instancerenamed, new global state added), all three files must be updated in lockstep.The documentation in
testing.md:216-218acknowledges this duplication as intentional, and the helpers are standalone scripts invoked viaRun Process, so a shared import is straightforward.Remediation: Extract to a shared module in a follow-up PR within 3 days.
P3:nit— testing.md says "≤2 workers by default" but actual default is CPU countLine 196 of
docs/development/testing.mdstates "≤2 workers by default". However,noxfile.py:17-25sets the default toos.sched_getaffinity(0)(CPU count), which is typically 4-16 on dev machines. Minor doc inaccuracy.P3:nit— RedundantCLEVERAGENTS_AUTO_APPLY_MIGRATIONSincli_plan_context_commands.robotSetup Test Environmentnow setsCLEVERAGENTS_AUTO_APPLY_MIGRATIONS=truefor all suites. Thecli_plan_context_commands.robotSuite Setup still hasAND Set Environment Variable CLEVERAGENTS_AUTO_APPLY_MIGRATIONS true, which is now redundant. Harmless but could be cleaned up.Pre-existing Issues (Not Introduced by This PR)
# type: ignore[operator]inhelper_m4_correction_subplan_smoke.py:491andhelper_m3_decision_validation_smoke.py:351— confirmed present onmaster. Caused by_COMMANDS: dict[str, object]typing; the M4 E2E helper usesdict[str, Callable[[], None]]which avoids this. Not a finding against this PR.Verdict
No P0 or P1 findings. The fix is well-designed, correctly targeted at all three contributing causes, and preserves backward compatibility with the 150+ existing Robot suites. All nox sessions pass. Documentation is thorough and includes a template for future helpers.
Would approve with P2/P3 comments above. The P2 duplication finding should be addressed in a follow-up PR. Requesting Rui as secondary reviewer for formal approval.
@ -193,1 +193,4 @@### Parallel Execution Isolation (pabot)Robot integration tests run in parallel via `pabot` (≤2 workers by default).P3:nit — Line says "≤2 workers by default" but
noxfile.py:17-25defaults toos.sched_getaffinity(0)(CPU count). On most dev machines this is 4-16.@ -65,2 +66,4 @@runner = CliRunner()def _reset_global_state() -> None:P2:should-fix — This
_reset_global_state()function is duplicated identically in 3 helper files. Consider extracting to a sharedrobot/helpers_common.pymodule in a follow-up PR to reduce maintenance burden if the singleton pattern changes.Bug Review — Additional Findings
Two bugs found during a deeper code review pass, both introduced or relevant to this PR.
P1:must-fix — Incomplete env var cleanup in
Cleanup Test EnvironmentFile:
robot/common.resource:75-86Setup Test Environment(lines 36-40) sets 5 environment variables:CLEVERAGENTS_HOMECLEVERAGENTS_AUTO_APPLY_MIGRATIONSCLEVERAGENTS_TESTING_USE_MOCK_AICLEVERAGENTS_DATABASE_URL(viaSetup Database Isolation)CLEVERAGENTS_TEST_DATABASE_URL(viaSetup Database Isolation)Cleanup Test Environment(lines 85-86) only removes 2 of them:Missing cleanup for
CLEVERAGENTS_HOME,CLEVERAGENTS_AUTO_APPLY_MIGRATIONS, andCLEVERAGENTS_TESTING_USE_MOCK_AI. The comment on line 84 says "Clean env vars to avoid leaking into subsequent suites" but doesn't follow through.Impact: In sequential Robot runs (non-pabot),
CLEVERAGENTS_HOMEleaks pointing to a deleted directory (lines 82-83 remove it). If a subsequent suite callsSetup Database IsolationwithoutSetup Test Environment, it reads a staleCLEVERAGENTS_HOMEpointing to a non-existent path. The other two env vars leak silently and could mask failures in suites that expect them unset.Fix: Add to
Cleanup Test Environment:P1:must-fix — Missing
timeout=on all 10Run Processcalls inm4_e2e_verification.robotFile:
robot/m4_e2e_verification.robot:17,27,37,47,56,65,75,85,95,106All 10 test cases use
Run Processwithout atimeout=parameter. The other two.robotfiles in this PR both have timeouts:m3_decision_validation_smoke.robot— every test hastimeout=30sm4_correction_subplan_smoke.robot— every test hastimeout=30s(60sfor full-flow)While technically pre-existing on master, this PR: (a) modified this file for pabot reliability, (b) added a
testing.mdsection that explicitly says to always addtimeout=toRun Processcalls, and (c) the sibling suites already have timeouts, making this an inconsistency.Impact: A hung helper process blocks a pabot worker indefinitely. Since pabot uses CPU-count workers (not "≤2" as the docs state), one stuck process reduces throughput and can stall CI.
Fix: Add
timeout=30sto all 10Run Processcalls inm4_e2e_verification.robot.Review Fixes —
29b924acAll findings from both review comments addressed.
Disposition of self-review #2033 (comment #55531)
_reset_global_state()to new sharedrobot/helpers_common.pymodule. The three helper files (helper_m4_e2e_verification.py,helper_m4_correction_subplan_smoke.py,helper_m3_decision_validation_smoke.py) now delegate tohelpers_common.reset_global_state()instead of each carrying an identical ~30-line copy. Removed now-unusedimport contextlibfrom all three files.testing.mdworker count corrected from "≤2" to "CPU-count" (already fixed ind2d9681d).Set Environment Variable CLEVERAGENTS_AUTO_APPLY_MIGRATIONS truefromcli_plan_context_commands.robotSuite Setup.Setup Test Environmentnow sets this automatically (line 39 ofcommon.resource).Disposition of bug review (comment #55532)
Remove Environment VariableforCLEVERAGENTS_HOME,CLEVERAGENTS_AUTO_APPLY_MIGRATIONS, andCLEVERAGENTS_TESTING_USE_MOCK_AItoCleanup Test Environment(already fixed ind2d9681d).timeout=30sto all 10Run Processcalls inm4_e2e_verification.robot(already fixed ind2d9681d).Quality gates
nox -s lintnox -s typechecknox -s unit_testsPlanning Authority Review — PR #619
Branch Naming Convention:
This PR uses the
fix/prefix (fix/m4-validation-race-condition), but per CONTRIBUTING.md, bug fix branches should use thebugfix/prefix (e.g.,bugfix/m4-validation-race-condition). This is flagged for future compliance — not blocking this PR, but please use the correct prefix on subsequent bugfix branches.TDD Counterpart Tracking:
TDD counterpart issue #635 has been created for the originating bug #563. This ensures the TDD cycle is properly tracked alongside the fix.
Review Status:
This PR has 2 comments with P1 fixes, all addressed by the author. No additional blocking concerns from planning at this time.
PM Status Check — Day 29
Author: @brent.edwards | Milestone: v3.3.0 (M4) | Mergeable: Yes | Reviews: Self-review only
Current State
Fix for M4 validation integration test race condition (bug #563). Brent self-reviewed and addressed all findings in commit
29b924ac. Includes:_reset_global_state()torobot/helpers_common.pyRemove Environment Variablecalls)timeout=30sto all 10Run Processcalls inm4_e2e_verification.robotIssue: No External Reviewer
This PR has zero assigned reviewers and only self-review. Per CONTRIBUTING.md, bug fix PRs require at least one external reviewer approval.
Status: NEEDS REVIEWER ASSIGNMENT
Action Required
Note: PR body is empty — @brent.edwards please fill in per CONTRIBUTING template (summary, changes, quality gates,
ISSUES CLOSED:line). This is a bug fix for #563, which is on the M4 critical path.Review: PR #619 — fix(test): resolve race condition in M4 validation integration test
Reviewer: Rui Hu (
@hurui200320) | Scope: Code review, test review, docs reviewTechnical Assessment
The core fix is sound and well-designed. The three-pronged root cause analysis (shared DB URL, racy shared directory, singleton leaks in chained CLI invocations) is correct. The composable keyword design (
Setup Test Environment+ optionalSetup Database Isolation) preserves backward compatibility with 150+ existing suites while properly isolating the 3 affected helper-based suites. Thefull_flow()chained invocations correctly call_reset_global_state()between CLI steps 1-2 and correctly skip it before step 3 (pure domain logic, no CLI/DB). The centralization of reset logic inhelpers_common.pyis a clean refactor.All findings from Brent's self-review and bug review have been verified as resolved in the latest code.
Finding 1 — P2:should-fix —
helpers_common.reset_global_state()does not resetProviderRegistryFile:
robot/helpers_common.py:18-44The function resets 3 singletons (
Settings._instance,reset_container(),MEMORY_ENGINES) but omitsreset_provider_registry()fromcleveragents.providers.registry.The BDD test harness (
features/environment.py,after_scenariohook) resets all four:The
ProviderRegistryatsrc/cleveragents/providers/registry.pymaintains its own module-level_registryglobal. Afterreset_container(), this stale_registrypersists. In current code paths this is mitigated because the DI container passessettingsexplicitly toget_provider_registry(settings=...), which forces a fresh registry. However, any code that callsget_provider_registry()without thesettingsargument between a reset and the first container access would get the stale registry (with the oldSettingsand thus the old database URL). This is a latent bug.Recommendation: Add to
helpers_common.reset_global_state():Finding 2 — P2:should-fix —
cli_plan_context_commands.robotmissingSuite Teardownafter expanded env var setupFile:
robot/cli_plan_context_commands.robot:1-9This file has
Suite Setupbut noSuite Teardown. This PR expandedSetup Test Environmentto also setCLEVERAGENTS_AUTO_APPLY_MIGRATIONSandCLEVERAGENTS_TESTING_USE_MOCK_AI(common.resource:39-40), and the PR touched this file to remove the now-redundant explicit env var. The missing teardown now leaks 3 env vars (up from 1 pre-PR), plus the per-suite temp directory is never cleaned up. The other three.robotfiles modified by this PR all correctly haveSuite Teardown Cleanup Test Environment.Recommendation: Add:
Finding 3 — P2:should-fix —
testing.mdtemplate contradicts the code it documentsFile:
docs/development/testing.md:216-247Two inconsistencies in the newly added documentation:
Lines 216-218 say "The reset pattern is defined identically in each helper that needs it" — but the code in this same PR centralizes it in
helpers_common.py. The helpers delegate; they no longer carry identical copies.Lines 231-246 show a ~15-line inline
_reset_global_state()template withcontextlib.suppress(ImportError)blocks. This does not match the delegation pattern now used in all three helpers. A developer following this template would re-introduce the duplication that commit29b924acexplicitly eliminated.Recommendation: Update lines 216-218 to:
Replace the code example (lines 231-246) with:
Finding 4 — P1:must-fix — PR body is empty
Per CONTRIBUTING.md: "PRs submitted without a description or without an issue reference will not be reviewed." The PR body is blank. Missing: summary of changes, closing keyword (
Closes #563), and Forgejo dependency link.Finding 5 — P1:must-fix — Commit footers have missing/incorrect issue references
Per CONTRIBUTING.md requirement #4: "Every commit must reference the issue it addresses in its commit message footer."
d2d9681d: No issue reference footer at all.29b924ac: Footer saysRefs: #619(the PR number) instead ofRefs: #563(the issue number).Verdict: Request Changes
The technical fix is well-designed and the root cause analysis is correct. The blocking items are Finding 1 (incomplete singleton reset in new code), Finding 4 (empty PR body), and Finding 5 (commit footers). Findings 2-3 should also be addressed before merge.
Code Review — PR #619:
fix(test): resolve race condition in M4 validation integration testLatest Commit:
29b924ac6e—fix(test): deduplicate _reset_global_state and clean up redundant env varIssue: #563 — Racing condition in integration test for M4 validation
Branch:
fix/m4-validation-race-conditionReviewer: Aditya Chhabra
Scope
This PR fixes a pabot-parallel race condition in the M4 validation integration tests. Changes span:
robot/common.resource— composable keyword redesign (Setup Database Isolation,Setup Test Environment With Database Isolation, expandedCleanup Test Environment)robot/m4_e2e_verification.robot,robot/m4_correction_subplan_smoke.robot,robot/m3_decision_validation_smoke.robot— upgradedSuite SetuptoSetup Test Environment With Database Isolationrobot/helper_m4_e2e_verification.py,robot/helper_m4_correction_subplan_smoke.py,robot/helper_m3_decision_validation_smoke.py— added thin_reset_global_state()delegatesrobot/helpers_common.py— new shared module withreset_global_state()robot/cli_plan_context_commands.robot— removed now-redundant explicit env var setdocs/development/testing.md— new "Parallel Execution Isolation (pabot)" sectionNew Findings
F1 — Buried Import in
_reset_global_state()Thin Wrappers [HIGH — Standards Violation]Files:
robot/helper_m4_e2e_verification.py:73robot/helper_m4_correction_subplan_smoke.py:50robot/helper_m3_decision_validation_smoke.py:42All three helper files define an identical thin wrapper:
CONTRIBUTING.md(lines 1187-1188) is explicit: "Ensure all imports (includingfromstatements) are at the top of the Python file. Do not scatter imports throughout the file or bury them inside functions or methods."The P2 self-review finding correctly identified the duplication and extracted the implementation to
helpers_common.py. However, the thin wrappers still have the import inside the function body. There is no technical justification for this —helpers_commonis a stable, unconditional dependency of each file (it lives in the samerobot/directory, which Python adds tosys.pathautomatically when the script is executed). This differs from the deferred imports insidehelpers_common.reset_global_state()itself, which are legitimately guarded bycontextlib.suppress(ImportError)to handle optional application-level modules.What needs to be done:
Move
helpers_commonto the module-level import block in all three files — alongside the existingfrom cleveragents...imports. The thin wrapper function then contains only the call:Alternatively, remove
_reset_global_state()entirely and callreset_global_state()fromhelpers_commondirectly at the call sites — the thin wrapper provides no added value.F2 — Double Reset Before
full_flow()Step 1 Is Redundant and Obscures Intent [LOW — Code Clarity]File:
robot/helper_m4_correction_subplan_smoke.py:468-474(the__main__block) androbot/helper_m4_correction_subplan_smoke.py:380(start offull_flow())The
__main__block calls_reset_global_state()before dispatching:full_flow()also calls_reset_global_state()as its very first line before Step 1:When the
full-flowcommand is invoked, state is reset twice before the first CLI call. For all other single-step commands, the__main__reset is correct;full_flow()should not need to duplicate it.The double-reset is harmless but creates ambiguity: it is not clear whether the
full_flow()-level reset is defensive programming, an oversight from whenfull_flow()was written before the__main__-level reset was added, or intentional for a reason not documented in the comment. A reader following the "reset between chained invocations" pattern documented intesting.mdwould not expect a reset before the first invocation.What needs to be done:
Remove the
_reset_global_state()call at the top offull_flow()(Step 1). The__main__block's reset is sufficient to ensure a clean starting state. The in-function resets before Steps 2 (line 414) are correct and must be retained. Add a brief comment to the__main__block explaining that the reset covers thefull_flow()case:F3 —
Cleanup Test EnvironmentRe-readsCLEVERAGENTS_HOMEfrom Environment Instead of Using${SUITE_HOME}[LOW — Correctness / Consistency]File:
robot/common.resource:81-83Setup Test Environmentsets both theCLEVERAGENTS_HOMEenvironment variable and the${SUITE_HOME}suite variable (line 37) — the latter exists precisely so that the suite retains a reliable reference to the directory it created.Cleanup Test Environmentignores${SUITE_HOME}and readsCLEVERAGENTS_HOMEback out of the environment:If a test case modifies
CLEVERAGENTS_HOME(e.g., to verify behavior with a custom home path), cleanup removes the modified path rather than the one the suite created. The${SUITE_HOME}variable cannot be modified by individual test cases without an explicitSet Suite Variablecall in the test body.Using
${SUITE_HOME}directly also eliminates theGet Environment Variablecall and the'${home}' != '${EMPTY}'null-guard —${SUITE_HOME}is guaranteed to be set ifSetup Test Environmentran, and if it did not run, the suite setup itself would have failed first.What needs to be done:
Note:
${SUITE_DB_PATH}and${SUITE_TEST_DB_PATH}are set bySetup Database Isolationbut are not currently consumed by any test case or teardown. If they are not intended for external use, they can be removed.F4 —
contextlib.suppress(Exception)Is Overly Broad inreset_global_state()[LOW — Error Handling]File:
robot/helpers_common.py:42-44contextlib.suppress(Exception)catches all non-fatal exceptions, includingAttributeError,TypeError, and other programming errors. Ifengine.dispose()raises due to a defect in the engine object (e.g., an unexpected type stored inMEMORY_ENGINES), the exception is silently discarded andMEMORY_ENGINES.clear()runs regardless. This can mask bugs in the test infrastructure that would be caught immediately if the exception propagated or was logged.The rationale for suppressing here is presumably that a partially-disposed engine should not abort the reset cycle — but suppressing all exceptions is broader than that intent.
What needs to be done:
Target the suppression to the specific exceptions
engine.dispose()legitimately raises (e.g.,OperationalErrororDBAPIErrorfrom SQLAlchemy), or log the exception tosys.stderrbefore suppressing:This keeps the non-blocking reset behavior while making silent failures visible during debugging.
F5 —
from __future__ import annotationsinhelpers_common.pyIs Unnecessary [Nit]File:
robot/helpers_common.py:13from __future__ import annotationsenables PEP 563 deferred evaluation of annotations and is needed only when type annotations reference names that are not yet defined at parse time (forward references) or when targeting Python versions before 3.10 that do not support union syntax (X | Y).helpers_common.pyhas a single function with the signaturedef reset_global_state() -> None:, which does not require deferred evaluation under any supported Python version.The import adds no value and was likely copied from the helper file template that uses it for domain-model imports. Remove it.
Severity Summary
from helpers_common importinside_reset_global_state()function bodyhelper_m4_e2e_verification.py,helper_m4_correction_subplan_smoke.py,helper_m3_decision_validation_smoke.pyfull_flow()Step 1 — redundant and obscures intenthelper_m4_correction_subplan_smoke.pyCleanup Test Environmentreads env var instead of using${SUITE_HOME}robot/common.resourcecontextlib.suppress(Exception)too broad — swallows programming errors silentlyrobot/helpers_common.pyfrom __future__ import annotationsunnecessaryrobot/helpers_common.pyPositive Observations
Setup Test Environment+ optionalSetup Database Isolation+ convenienceSetup Test Environment With Database Isolation) is the correct architectural choice. It preserves backward compatibility with 150+ existing suites while adding isolation only where needed, without requiring a global change.${TEMPDIR}/.cleveragents_${safe_suite}) is a clean solution that avoids any shared mutable state between pabot workers.helpers_common.reset_global_state()usingcontextlib.suppress(ImportError)for each optional import is the correct defensive pattern for a shared utility that may be loaded in contexts where not all application modules are on the path.full_flow()docstring inhelper_m4_correction_subplan_smoke.pynow clearly explains why_reset_global_state()is called between steps and why Step 3 (pure domain logic) does not need one. This is exactly the kind of non-obvious reasoning that deserves a comment.timeout=30son all 10Run Processcalls inm4_e2e_verification.robotandtimeout=60son the chainedfull-flowcommand are well-calibrated and consistent across the three.robotfiles.Remove Keyword And Ignore Erroris used correctly for the directory removal inCleanup Test Environment— a failing cleanup should not mark an otherwise-passing test as failed.Verdict
Requesting changes on F1 (buried import in all three helper wrappers — explicit CONTRIBUTING.md violation). F2–F5 are lower-priority but should be addressed in the same commit to avoid accumulating small debt. The core fix is technically sound and the composable keyword design is the right architectural approach.
29b924ac6e85ea704752Disposition of Aditya review (comment #56643) — commit
85ea7047Squashed into single commit rebased onto master (
a808c395). All findings addressed.Fixed
from helpers_common importinside_reset_global_state()function body in 3 helper filesfrom helpers_common import reset_global_stateto module-level import block in all three files. Removed the thin_reset_global_state()wrapper entirely; call sites now invokereset_global_state()directly.full_flow()Step 1 is redundantreset_global_state()call at the top offull_flow(). The__main__dispatcher already resets before calling any command function. Added clarifying comment to__main__block.Cleanup Test Environmentre-readsCLEVERAGENTS_HOMEfrom env instead of using${SUITE_HOME}${SUITE_HOME}directly, eliminating theGet Environment Variablecall and the null-guard.contextlib.suppress(Exception)too broad inreset_global_state()sys.stderrviaprint(f"[helpers_common] engine.dispose() suppressed: {exc}", file=sys.stderr).from __future__ import annotationsunnecessary inhelpers_common.pyAdditional
Verification
nox -s lintnox -s typechecknox -s unit_tests85ea7047— no merge commitsa808c395PM Status Check — Day 29
Author: @brent.edwards | Milestone: v3.3.0 (M4) | Issue: #563 | Reviews: REQUEST_CHANGES (Aditya, 5 findings)
Summary
This PR fixes the intermittent race condition in M4 validation integration tests (3 root causes: shared SQLite DB, shared CLEVERAGENTS_HOME, singleton leak). Aditya reviewed and found 5 findings including a HIGH-severity buried import violation.
Added State/In Review and Priority/Critical labels.
Review Status
Aditya's review identified:
Action Required
Bug fix PRs are always top priority. This blocks M4 acceptance gate closure for #563.
Code Review — PR #619
fix/m4-validation-race-conditionReviewed commit:
85ea7047on basea808c395Review type: Architecture review + Test review + Docs review
Verdict: REQUEST CHANGES — 4× P1, 4× P2, 3× P3
P1 — Must fix before merge
F1 ·
P1:must-fix—CLEVERAGENTS_TESTING_USE_MOCK_AI=trueset globally for ALL suitesrobot/common.resourceThe
Setup Test Environmentkeyword now unconditionally setsCLEVERAGENTS_TESTING_USE_MOCK_AI=trueas a global environment variable. This means every Robot suite that uses this keyword — including any future e2e suites that intentionally need real AI — will silently use mock AI. This is a correctness risk: if someone writes a suite that needs real AI responses, they'll get mocks with no warning. Consider making this opt-in via a keyword argument or a separate keyword.F2 ·
P1:must-fix—_COMMANDS: dict[str, object]causes# type: ignore[operator]in two helpersrobot/helper_m4_e2e_verification.py,robot/helper_m4_correction_subplan_smoke.pyThe
_COMMANDSdict is typed asdict[str, object]but its values are used as callables (invoked with()). This forces# type: ignore[operator]suppressions which violate CONTRIBUTING.md's prohibition on inline# type: ignorecomments. Either type the dict properly asdict[str, Callable[..., int]]or useProtocolto type the command functions.F3 ·
P1:must-fix— Docs say reset is "defined identically in each helper" — wrong after this PRdocs/development/testing.mdThe new "Parallel Execution Isolation" section states that
reset_global_state()is "defined identically in each helper script." This is incorrect after this PR — the helpers now importreset_global_statefromhelpers_common.pyrather than defining it locally. The docs must be updated to reflect the shared module pattern.F4 ·
P1:must-fix— Docs template uses_reset_global_state()but actual function isreset_global_state()docs/development/testing.mdThe code template in the docs shows
_reset_global_state()(with leading underscore, private convention) but the actual function inhelpers_common.pyisreset_global_state()(public, no underscore). This will confuse anyone following the template.P2 — Should fix (follow-up within 3 days)
F5 ·
P2:should-fix— Lazy imports inside function body inhelpers_common.pyrobot/helpers_common.py—reset_global_state()Imports wrapped in
contextlib.suppress(ImportError)inside the function body. While there's a runtime justification (graceful degradation if modules aren't installed), CONTRIBUTING.md §Import Guidelines only exemptsif TYPE_CHECKING:blocks. Consider moving these to module level with thecontextlib.suppressguard, or add a comment citing the specific exception rationale.F6 ·
P2:should-fix—except Exceptionsuppression in reset logicrobot/helpers_common.pyBroad
except Exceptioncatches in the reset function suppress all errors including unexpected ones (e.g.,RuntimeError,OSError). This violates exception propagation best practices. At minimum, log the suppressed exception or narrow the catch to specific expected exceptions.F7 ·
P2:should-fix— Other helper-based Robot suites not migrated to database isolationOnly three Robot suites (
m4_e2e_verification,m4_correction_subplan_smoke,m3_decision_validation_smoke) are migrated to the newSetup Database Isolationpattern. Other suites using helpers (e.g.,cli_plan_context_commands) are not migrated. If database isolation fixes the race condition, the un-migrated suites may still exhibit the same issue.F8 ·
P2:should-fix—CLEVERAGENTS_AUTO_APPLY_MIGRATIONS=trueset globallyrobot/common.resourceSimilar to F1,
CLEVERAGENTS_AUTO_APPLY_MIGRATIONS=trueis now set globally. This could mask migration bugs where auto-apply should NOT be enabled. Consider making this configurable per suite.P3 — Nits (author discretion)
F9 ·
P3:nit— Extra blank lines in two helpersrobot/helper_m4_e2e_verification.py,robot/helper_m4_correction_subplan_smoke.pyThere are extra blank lines between import groups that don't follow PEP 8 import grouping conventions. Minor formatting issue.
F10 ·
P3:nit— Removed default.cleveragentscleanupThe PR removes the default
~/.cleveragentsdirectory cleanup from some test teardowns. If tests create artifacts in this directory, they may leak between runs. Verify this is intentional.F11 ·
P3:nit—noqa: E402comments on importsSome imports have
# noqa: E402to suppress the "module level import not at top of file" linting error. These are correctly applied since the imports followsys.pathmanipulation. Just noting for awareness.Summary
The race condition fix via database isolation and shared
helpers_common.pymodule is architecturally sound and addresses a real parallel execution problem. Thetimeout=30saddition to RobotRun Processcalls and the composable keyword design incommon.resourceare good improvements. However, the four P1 findings need attention: the global mock AI/auto-migration settings could silently affect future suites (F1, F8), the# type: ignoresuppressions violate project policy (F2), and the documentation inaccuracies (F3, F4) will mislead contributors following the template.PM Compliance Audit — CONTRIBUTING.md Checklist
Closes #563orFixes #563. Issue will not auto-close on merge.85ea7047).Action Required
@brent.edwards — 3 items to fix:
Fixes #563to the PR body (edit the description)85ea704752aee4315c31Review Fixes —
aee4315cRebased onto latest master (
83f2f3a0). All findings addressed in a single squashed commit.Disposition of self-review (comment #56919)
CLEVERAGENTS_TESTING_USE_MOCK_AI=trueset globallySetup Test Environmentnow accepts optionalmock_ai=${TRUE}andauto_apply_migrations=${TRUE}arguments. Default is${TRUE}for backward compatibility; suites can pass${FALSE}to opt out.# type: ignore[operator]in two helpers_COMMANDS: dict[str, object]todict[str, Callable[[], None]]in bothhelper_m4_correction_subplan_smoke.pyandhelper_m3_decision_validation_smoke.py. Removed both# type: ignorecomments.testing.mdto say "centralised inrobot/helpers_common.py; each helper imports and delegates tohelpers_common.reset_global_state()."_reset_global_state()from helpers_common import reset_global_statedelegation pattern.helpers_common.pycontextlib.suppress(ImportError)rationale as an intentional exception per CONTRIBUTING.md §Import Guidelines.CLEVERAGENTS_AUTO_APPLY_MIGRATIONS${TRUE}.Disposition of hurui200320 review (review #2053)
reset_provider_registry()inhelpers_common.pyreset_provider_registry()call afterreset_container().Suite Teardownincli_plan_context_commands.robotSuite Teardown Cleanup Test Environment.Refs: #619instead of#563Fixes: #563.Disposition of Aditya review (comment #56643) — previously addressed
All 5 findings (F1–F5) remain addressed from previous commit.
Disposition of PM compliance audit (comment #57050)
Fixes #563in PR bodyFixes #563for auto-closeQuality gates
nox -s lintnox -s typecheckaee4315c83f2f3a0Code Review — PR #619
fix(test): resolve race condition in M4 validation integration testReviewed commit:
aee4315c| Base:master(83f2f3a0)Scope: Robot Framework test infrastructure, helpers, documentation
Checklists applied: Test review, CI review, Docs review
Summary
This PR correctly identifies and fixes a three-pronged race condition in M4 validation integration tests under
pabotparallel execution: (1) shared defaultsqlite:///cleveragents.db, (2) incomplete per-suiteCLEVERAGENTS_HOMEisolation, and (3) leaked singleton state across chained CLI invocations. The fix introduces a centralisedreset_global_state()inrobot/helpers_common.py, composableSetup Database Isolation/Setup Test Environment With Database Isolationkeywords incommon.resource,timeout=30sonRun Processcalls, and clear documentation intesting.md. The approach of making database isolation opt-in is the correct design choice.The type annotation improvement (
dict[str, object]→dict[str, Callable[[], None]]) that eliminates# type: ignoreis a welcome incidental cleanup.Findings
robot/helpers_common.py:34Settings._instance = Nonedirectly mutates a private attribute. Unlike the other singletons in this function which use public reset APIs (reset_container(),reset_provider_registry()),Settingshas noreset()classmethod. The same pattern exists infeatures/environment.py:335on master, so this is consistent with existing code.Settings.reset()classmethod as a follow-up, or add a# TODO:comment citing the coupling. Not blocking since it matches the established project pattern.robot/common.resourceSetup Test Environment With Database Isolationkeyword. Other helper-based suites (persistence_lifecycle.robot,cli_lifecycle.robot, etc.) still use plainSetup Test Environmentand remain vulnerable to the same SQLite contention underpabot.testing.mdthat existing suites will be migrated.robot/common.resource:82Cleanup Test Environmentreferences${SUITE_HOME}. IfSuite Setupfails beforeSet Suite Variable ${SUITE_HOME}, the teardown will get a Robot Framework variable resolution error.*** Variables ***level:${SUITE_HOME} ${EMPTY}, or guard theRemove DirectorywithRun Keyword If $SUITE_HOME is not None.CHANGELOG.md:5CLEVERAGENTS_HOMEtemp directories already existed on master. What's new is per-suite database URL isolation.robot/m4_e2e_verification.robottimeout=30swas added toRun Processcalls here, but theRun Processcalls inm3_decision_validation_smoke.robotandm4_correction_subplan_smoke.robot(also modified) do not havetimeout=added.timeout=30sto those suites too for consistency, or note as follow-up.robot/common.resource:57Setup Database IsolationassumesSetup Test Environmentwas called first (to set${SUITE_HOME}). The convenience keyword handles ordering, but a direct call would fail with a confusing error.Security Check
eval(),exec(), or shell injection vectors.contextlib.suppress(ImportError)scoped correctly.Verdict
APPROVE with comments. No P0/P1 findings. The four P2 items (established pattern coupling, incomplete migration, teardown robustness, CHANGELOG wording) can all be addressed in a follow-up within 3 days per project policy. The core race condition fix is correct and well-implemented.
Review (Pre-External Review Check): PR #619 —
fix(test): resolve race condition in M4 validation integration testReviewer: Rui Hu (
@hurui200320) | Commit:aee4315c| Base:master(83f2f3a0)Scope: Code review, test review, docs review, security check
Previous Review Status
All 5 findings from my previous review (#2053) are verified as resolved in the latest code:
reset_provider_registry()inhelpers_common.pyrobot/helpers_common.py:44-48Suite Teardownincli_plan_context_commands.robotrobot/cli_plan_context_commands.robot:9docs/development/testing.md:211-213now correctly describes delegation patternFixes #563Fixes: #563Aditya's 5 findings (#56643) — all resolved. Brent's self-review findings (#56919) — all resolved. PM compliance audit (#57050) — all resolved.
Technical Assessment
The core race condition fix is technically sound and well-designed. The three-pronged root cause analysis (shared SQLite DB URL, shared
CLEVERAGENTS_HOME, singleton leak in chained CLI invocations) is correct and each cause is addressed with a targeted, minimal fix.Strengths:
Setup Test Environment+ optionalSetup Database Isolation) preserves backward compatibility with 170+ existing suites.reset_global_state()properly centralised inrobot/helpers_common.pywith all 4 singletons:Settings._instance,reset_container(),reset_provider_registry(),MEMORY_ENGINES. Reset order matches the BDD harness (features/environment.py:331-389).from helpers_common import reset_global_stateis at module level in all 3 helpers (no buried imports)._COMMANDStyped asdict[str, Callable[[], None]]— eliminates all# type: ignore[operator]in modified files.Run Processcalls have appropriate timeouts (30s standard, 60s forfull-flow).full_flow()chaining correctly resets between Steps 1-2 and skips reset before Step 3 (pure domain logic), with clear docstring explaining the rationale.Cleanup Test Environmentuses${SUITE_HOME}directly instead of re-reading env var.engine.dispose()exceptions are now logged to stderr instead of silently suppressed.Security check: No credentials, secrets, API keys,
eval(),exec(), or shell injection vectors in any changed file.contextlib.suppress(ImportError)correctly scoped.New Findings
R1 —
${SUITE_HOME}not declared with default; teardown fragile if setup fails [P2:should-fix]File:
robot/common.resource:13-15(Variables section) androbot/common.resource:95(Cleanup)${SUITE_HOME}is set dynamically bySetup Test Environmentat line 45 (Set Suite Variable). It is NOT declared in the*** Variables ***section. IfSuite Setupfails before reaching line 45 (e.g.,Create Directoryfails on a read-only filesystem),Cleanup Test Environmentwill fail at line 95 with aVariableErrorbecause${SUITE_HOME}is undefined. In Robot Framework, variable resolution occurs beforeRun Keyword And Ignore Errorexecutes, so the error is NOT caught.Impact: Suite Setup failure produces a secondary teardown error that obscures the real failure. Low probability in normal operation, but hampers debugging when it does occur.
Recommendation: Add a default in the
*** Variables ***section:And guard the cleanup:
R2 — Only 3 of 170+ suites migrated to database isolation [P3:info]
File:
robot/common.resourceConfirmed via full audit: only
m3_decision_validation_smoke.robot,m4_correction_subplan_smoke.robot, andm4_e2e_verification.robotuseSetup Test Environment With Database Isolation. The remaining ~167 suites usingSetup Test Environmentare not migrated. This is by design (only helper-based suites that invokeRun Processneed DB isolation), but suites likepersistence_lifecycle.robot,cli_lifecycle.robot, and other helper-based suites may still be vulnerable to the same SQLite contention under pabot.Recommendation: File a follow-up issue to audit which remaining suites invoke Python helpers via
Run Processand migrate those to use DB isolation.R3 — CHANGELOG entry wording slightly inaccurate [P3:nit]
File:
CHANGELOG.md:9Entry says "per-suite temp directories" but per-suite
CLEVERAGENTS_HOMEtemp directories already existed on master (the oldSetup Test Environmentalready created them). What's new in this PR is per-suite database URL isolation. Consider rewording to "per-suite database URL isolation" for accuracy.Out-of-Scope Observations (Not Blocking)
# type: ignorecomments exist across 24 other robot helper files (pre-existing, not introduced by this PR). This PR removed them from the 3 modified helpers, which is a good incremental improvement.provider_detection_smoke.robotis missingSuite Teardown(pre-existing bug, not introduced by this PR).features/environment.pyusescontextlib.suppress(Exception)forengine.dispose()(silent), whilehelpers_common.pynow usestry/exceptwith logging (improved). Consider backporting the logging pattern toenvironment.pyin a follow-up.Severity Summary
${SUITE_HOME}teardown fragile if setup failsrobot/common.resourcerobot/common.resourceCHANGELOG.mdVerdict: APPROVE
The fix is correct, well-documented, and all blocking issues from previous reviews are resolved. The P2 item (R1,
${SUITE_HOME}default) and the P3 items (R2 migration scope, R3 CHANGELOG wording) can be addressed in a follow-up PR within 3 days per project policy. The core race condition fix does not introduce any regressions or new bugs.aee4315c31d88dad94f6New commits pushed, approval review dismissed automatically according to repository settings