fix(test): skip redundant Alembic migration check for existing test databases #727
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!727
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "fix/test-db-performance-regression"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Fix a performance regression in the BDD test infrastructure where
_fast_init_or_upgradeinfeatures/environment.pyfell through to the original_original_init_or_upgrade()for existing non-empty test databases, triggering redundant Alembic migration checks on every CLI command invocation within a scenario.Closes #726
Changes
features/environment.pySingle change in
_fast_init_or_upgrade()(line 221-222):When a scenario test database already exists and is non-empty (created by a prior step like
I have initialized a project), the function now returns immediately instead of delegating to the original migration runner:Why this is safe
scripts/create_template_db.py)init_command) already ran the full migration during creation_SCENARIO_DB_PREFIXESguard above this check ensures only scenario test databases are affected — migration-runner unit tests that use custom URLs still go through the original pathWhat the bug caused
INFO [alembic.runtime.migration]log lines per scenario for every CLI command after initbehave-parallel)@database_retrydecorators (wait_fixed(0.5)× 3 attempts) on ~100 repository methodsAffected scenarios (all now passing)
All 11 previously-failing scenarios in
features/core_cli_commands.featurethat invoke CLI commands after a prior step already created a database.Verification
nox -s unit_tests -- features/core_cli_commands.feature --processes 1nox -s unit_tests -- features/core_cli_commands.feature(16 processes)Quality Checks
features/environment.py7ba0a7f30d1ec525599bSecurity Review — PR #727
Scope
Reviewed the single-file change in
features/environment.py(the PR diff), plus the broader security posture of related files:features/steps/cli_plan_context_commands_steps.py,scripts/create_template_db.py,src/cleveragents/infrastructure/database/migration_runner.py,src/cleveragents/core/retry_patterns.py, test step files, andpyproject.tomldependencies.Verdict: No security blockers introduced by this PR
The change replaces
return _original_init_or_upgrade(self, **kwargs)with a barereturnin the test-only monkey-patched_fast_init_or_upgrade. This is scoped entirely to test infrastructure and does not touch production code. No new security vulnerabilities are introduced.Findings
Finding 1 — TOCTOU race on file existence check (PR change context)
features/environment.py:221returnbranchThe check
if db_path.exists() and db_path.stat().st_size > 0followed by either returning or copying the template has a theoretical TOCTOU window: another parallel test worker could delete or replace the file between thestat()call and subsequent use by SQLAlchemy. In practice, the window is sub-microsecond and the worst outcome is a test failure (not a security breach). The same race existed before this PR — the change merely swaps what happens in the "true" branch.Recommendation: No action needed for security. If test stability under heavy parallelism becomes an issue, consider catching
FileNotFoundErrorin the calling code.Finding 2 —
tempfile.mktemp()usage (pre-existing, 16 call sites)features/environment.py:48,52,292plus 13 other step filestempfile.mktemp()is deprecated by Python docs because of a symlink race: between name generation and file creation, an attacker could place a symlink at the predicted path. In this codebase, all 16 call sites are in test infrastructure generating SQLite DB paths or scratch files, and the paths use unpredictable prefixes in system temp directories. The practical risk in a CI/test context is negligible, but it remains a code quality concern.Recommendation: When convenient, migrate to
tempfile.mkstemp()(returns an fd + path atomically) ortempfile.NamedTemporaryFile(delete=False). This is not urgent and should not block this PR.Finding 3 —
shell=Truein subprocess call (pre-existing)features/steps/cli_coverage_steps.py:44subprocess.run(command, ..., shell=True)wherecommandis a string built from BDD step definitions (@when('I run shell command "{command}"')). The command strings originate from.featurefiles authored by the development team, not from runtime user input. However,shell=Truewith string input is a well-known command injection vector and violates defense-in-depth.Recommendation: Convert to
shlex.split(command)+shell=False(as already done incli_plan_context_commands_steps.py:351). Not a blocker for this PR.Finding 4 — Skipping migration validation in tests
features/environment.py:221-222(the PR change)returnvs. calling original)By returning immediately for non-empty test DBs, the monkey-patched path now skips Alembic migration validation entirely. Previously it delegated to the original
init_or_upgrade()which would verify the DB was at HEAD. The risk is that a test could silently pass with a stale or partially-migrated schema that would fail in production.Mitigating factors:
scripts/create_template_db.py:53-60_SCENARIO_DB_PREFIXESguard (line 211) limits this to known test DB namesMigrationRunner.init_or_upgrade()is never monkey-patchedRecommendation: Acceptable as-is for a test performance fix. Consider adding a CI gate that verifies the template DB's Alembic revision matches HEAD (a simple
SELECT version_num FROM alembic_versionassertion in a nox session) to catch template staleness.Finding 5 — Test credential handling (pre-existing, no issues)
Searched for hardcoded credentials across all feature/step files. Found only clearly-fake test fixtures:
"my-secret-value-here","plan-service-api-key","ls-fake-key-for-coverage","behave-context-api-key". These are used in mock/stub contexts and do not represent real credentials. No.envfiles with secrets, no real API keys, no connection strings with embedded passwords.Finding 6 — Database connection string in environment variable (pre-existing)
migration_runner.py:121MigrationRunner.run_migrations()temporarily setsos.environ["CLEVERAGENTS_DATABASE_URL"]to the database URL for Alembic'senv.pyto read. The value is restored in afinallyblock. For non-SQLite databases, connection strings could contain credentials that would be visible to child processes or/proc/self/environreaders during that window. The restoration is properly handled; the risk is inherent to Alembic's configuration model.Recommendation: No action needed for this PR. For hardening, consider using
alembic_cfg.attributes["connection"]exclusively (as already done for the engine-based path on line 128) to avoid env var exposure.Finding 7 — Credential sanitization in retry logging (pre-existing, positive finding)
retry_patterns.py:349-397_sanitize_error_message()properly redacts URLs with embedded credentials, API key assignments, auth headers, and dict-format secrets before logging. The pre-truncation to 2000 chars (_MAX_SANITIZE_INPUT, line 388) bounds regex CPU time, preventing ReDoS. This is a well-implemented defense.Finding 8 — Supply chain / dependency review
Reviewed
pyproject.tomldependencies. All packages are mainstream ecosystem libraries (alembic, sqlalchemy, tenacity, langchain, pydantic, typer, etc.). Security tooling is included in dev dependencies (bandit,semgrep).RestrictedPythonfor sandboxing is a positive signal. No suspicious or typosquatting-candidate packages detected. No pinned commit hashes that could mask substitution.Summary Table
tempfile.mktemp()usageshell=Truesubprocessshell=FalseNo critical, high, or medium severity security findings. No blockers for merge from a security perspective.
TOCTOU (Time-of-check-time-of-use): The
exists()+stat().st_sizecheck followed by the barereturnhas a theoretical race window under parallel execution — another worker could delete the DB between this check and the subsequent SQLAlchemy open. Pre-existing condition (the check was here before this PR), and worst-case outcome is a test failure, not a security issue. Severity: Low/Informational.Migration validation skip: The change from
return _original_init_or_upgrade(self, **kwargs)to barereturnmeans existing non-empty test DBs now bypass Alembic validation entirely. This is safe because: (1) template DBs are stamped at HEAD, (2) DBs from prior steps already ran migration, (3) production code is unaffected. Consider a CI gate that asserts the template DB revision matches HEAD to prevent silent staleness.Performance Review — PR #727
Summary
The core change (replacing
_original_init_or_upgrade(self, **kwargs)withreturnfor non-empty DBs) is an effective performance fix. The PR's claims are substantiated by the code. Below is a detailed analysis of both the fix and remaining performance concerns in the broader test infrastructure.1. Verification of PR Performance Claims
Claim: Redundant Alembic migration checks (7+ pairs per scenario)
Verdict: CONFIRMED
MigrationRunner.init_or_upgrade()(migration_runner.py:189-309) performs expensive work on every call:create_engine()— new SQLAlchemy engineengine.connect()— opens a connection/file handleinspect(engine).get_table_names()— queriessqlite_masterget_pending_migrations()— reads Alembic script directory, compares against DB statecommand.upgrade()orcommand.stamp()— Alembic operationsengine.dispose()— cleanupMultiple
UnitOfWorkinstances are created per scenario (35 call-sites across step files), each calling_ensure_database_initialized()→init_or_upgrade(). The_database_initializedflag atunit_of_work.py:55is per-instance, not global, so newUnitOfWorkobjects targeting the same DB file will re-trigger the full check. The claim of 7+ redundant checks per scenario is entirely plausible.Estimated savings: ~0.5-3s per redundant check × 7+ checks × thousands of scenarios = minutes of aggregate savings.
Claim: SQLite lock contention under parallel execution
Verdict: CONFIRMED
Each redundant
init_or_upgrade()call creates an engine and opens a connection to the SQLite file. Under 16-worker parallel execution (behave-parallel), even though workers use unique DB paths, the I/O pressure from thousands of unnecessarycreate_engine+connect+inspectcycles contributes to system-level contention (file descriptor pressure, inode cache pressure). If any edge case causes path collision, actual SQLiteSQLITE_BUSYlocks would occur.Claim: Cascading retry overhead
Verdict: CONFIRMED
The
databaseretry category (retry_patterns.py:128-132) useswait_fixed(0.5)× 3 attempts. There are 107+ methods decorated with@database_retryacrossrepositories.pyandchangeset_repository.py. While the sleep cap atenvironment.py:109limits waits to 10ms, the per-attempt overhead (exception creation, tenacity state machine, structlog logging inlog_before_retry/log_after_retry) still multiplies across 100+ decorated methods. The fix removes the root cause (lock contention) rather than treating the symptom.Claim: Intermittent CI failures across 280+ feature files
Verdict: PLAUSIBLE
The repo contains 374 feature files with ~10,700 scenarios. At 7+ redundant migration checks per scenario, that's ~75,000+ unnecessary engine creation + Alembic inspection cycles per full test run. Under parallel execution, cumulative I/O pressure causing timeouts is a reasonable failure mode.
2. The Fix Itself — Performance Assessment
Severity: The fix is correct and effective.
The change is safe because:
create_template_db.py:58-61_SCENARIO_DB_PREFIXESguard at line 211 ensures only test DBs are affected3. Remaining Performance Concerns
Finding P1: Double syscall in existence check
features/environment.py:221db_path.exists() and db_path.stat().st_size > 0issues twostat()syscalls (Python'sPath.exists()callsos.stat()internally). Can be consolidated to one:Finding P2: TOCTOU race on stat() in parallel execution
features/environment.py:221db_path.exists()anddb_path.stat(), the file could theoretically be deleted by another process. However, this is mitigated by:tempfile.mktemp()(line 292)after_scenario, lines 507-510) only runs after engine disposal in the same processFinding P3:
tempfile.mktemp()is deprecatedfeatures/environment.py:48-49, 292tempfile.mktemp()is deprecated since Python 2.3 due to TOCTOU race conditions. Under high concurrency (16 workers), two workers could theoretically receive the same path before either creates the file.tempfile.mkstemp()atomically creates the file, but returns a file descriptor for a file that already exists — which conflicts with the template-copy approach. The correct alternative istempfile.NamedTemporaryFile(delete=False, suffix='.db', prefix=prefix)followed by closing the file, or usingtempfile.mkdtemp()+ a fixed filename inside it.tempfile.mkstemp()+os.close(fd)to get an atomically-created unique path, then overwrite with template copy.Finding P4:
shutil.copy2vsshutil.copyfor templatefeatures/environment.py:226shutil.copy2()preserves file metadata (timestamps, permissions) via an extraos.utime()+os.chmod()call. For temp test databases, metadata preservation is unnecessary.shutil.copy()or evenshutil.copyfile()(which skips permission copying too) would be marginally faster.shutil.copyfile()in a follow-up for cleanliness.Finding P5: Per-instance
_database_initializedflag remains a structural inefficiencysrc/cleveragents/infrastructure/database/unit_of_work.py:55_fast_init_or_upgradestill executes on every newUnitOfWorkinstance: it parses the URL, extracts the path, checks prefixes, and callsstat(). A process-global or thread-local set of "already-initialized" DB paths would skip all of this work for repeated calls to the same path.init_or_upgradedelegation, so the remaining per-call overhead is small.Finding P6:
_find_alembic_ini()filesystem traversal in fallback pathssrc/cleveragents/infrastructure/database/migration_runner.py:35-89_find_alembic_ini()walks up to 10 directory levels from two starting points, callingcandidate.exists()at each level. This runs on everyinit_or_upgrade()call via thealembic_cfgproperty. The PR's early return prevents this from executing for the common case, but the fallback paths (non-SQLite, in-memory, non-prefix DBs) still hit it.Finding P7: No template DB cache invalidation
features/environment.py:140-144_ensure_template_db()reuses an existing template iftemplate_path.is_file()returns true, with no staleness check. If the Alembic migration set changes between runs without cleaningbuild/, the template will be outdated. This wouldn't cause a performance regression but could cause test failures.noxfile.py:74-81) always recreates the template, so this only affects directbehaveinvocations.alembic/versions/mtime, or always recreating in_ensure_template_db(). Low priority.4. Broader Test Infrastructure Performance
Finding P8:
behave-parallelchunk granularitynoxfile.py:400-404(inline CLI source)chunk_size = max(1, (len(feature_paths) + processes - 1) // processes)). This means a chunk with one very slow feature file will become the bottleneck while other workers sit idle. Work-stealing or per-feature dispatch would improve load balancing.pool.maptopool.imap_unorderedwith per-feature granularity for better load balancing. This is outside the scope of this PR.Finding P9: Engine cache cleared twice per scenario
features/environment.py:259-266(before_scenario) andfeatures/environment.py:494-503(after_scenario)MEMORY_ENGINESis cleared and engines disposed in bothbefore_scenarioandafter_scenario. The double-clear is defensive but means every scenario pays the cost of iterating and disposing cached engines twice.MEMORY_ENGINESis typically small. Defensive programming trade-off.Overall Assessment
The fix directly addresses the root cause and the change is minimal and well-scoped. Approve from a performance perspective.
@ -218,3 +222,4 @@return# Copy the template — creates a fully-migrated DB in ~1msdb_path.parent.mkdir(parents=True, exist_ok=True)[Performance — Low]
shutil.copy2preserves file metadata (timestamps, permissions) via extraos.utime()+os.chmod()syscalls that are unnecessary for ephemeral test databases.shutil.copyfile()would be marginally faster by skipping metadata preservation. Negligible impact at ~21ms total across a full run, but cleaner semantically.@ -216,1 +218,4 @@# fully migrated. Skipping the Alembic check avoids redundant engine# creation, SQLite lock contention, and the cumulative overhead that# causes intermittent hangs in parallel test runs.if db_path.exists() and db_path.stat().st_size > 0:[Performance — Low] Two syscalls where one suffices.
Path.exists()internally callsos.stat(), and thenPath.stat()callsos.stat()again. This can be consolidated into a single syscall:At ~75,000 invocations per full run this saves ~75ms total — negligible, but it also eliminates the theoretical TOCTOU race between the two calls. Consider as a follow-up micro-optimization.
Bug Detection Review — PR #727
Summary
I read all five files referenced in the PR description plus the surrounding context. The change replaces
return _original_init_or_upgrade(self, **kwargs)withreturnon line 222 offeatures/environment.py, inside the_fast_init_or_upgrade()monkey-patch. This means that when a test DB file already exists and is non-empty, the patched function now silently returns instead of delegating to the real Alembic migration runner.Overall assessment: No critical or high-severity bugs found. The change is safe for its intended purpose. One low-severity behavioral regression and two pre-existing issues are worth noting.
Finding 1 — Loss of stale-template migration safety net (Introduced by this PR)
File:
features/environment.py:221-222Severity: Low
Description:
Previously, when
init_or_upgradewas called on a DB that already exists and is non-empty (e.g. a secondinit_or_upgradecall on a DB that was already set up by the template copy earlier in the same scenario), the old code delegated to the original_original_init_or_upgrade, which would:The new code returns immediately, removing this "second-chance" migration check.
When this matters: Only if the cached template DB (
build/.template-migrated.db) is from a previous schema version (new Alembic migrations were added but the template was not rebuilt). In that case, the old code could detect and apply pending migrations on the secondinit_or_upgradecall for the same file. The new code will not.Why this is Low severity:
shutil.copy2without calling the original.init_or_upgradecalls on the same file in the same scenario.before_all()via_ensure_template_db(), so staleness only occurs in unusual local developer workflows.Recommendation: Consider adding a freshness check to
_ensure_template_db()— e.g. compare the template's Alembicheadstamp against the current Alembic head revision. If they differ, recreate the template. This would be a more robust fix than relying on the migration runner as a second-chance safety net.Finding 2 — Stale cached template reuse in
_ensure_template_db()(Pre-existing, NOT introduced by this PR)File:
features/environment.py:141-143Severity: Low
Description:
The cached template in
build/.template-migrated.dbis reused without verifying it matches the current schema. If a developer adds new migrations between test runs without deleting the cache, all tests will use a stale template.This affects both old and new code equally, but Finding 1 means the new code has one less layer of defense.
Recommendation: Same as Finding 1 — validate the template's Alembic stamp at reuse time.
Finding 3 — Return value correctness: Confirmed safe
init_or_upgradeis typed as-> None(migration_runner.py:193). Both the oldreturn _original_init_or_upgrade(self, **kwargs)and the newreturnevaluate toNone. No callers depend on a return value. The_database_initialized = Trueflag inunit_of_work.py:147is set afterinit_or_upgradereturns, so it is correctly set in both old and new code.Finding 4 — Prefix guard correctness: Confirmed safe
The
_SCENARIO_DB_PREFIXEStuple at line 181 includes"db.", which matchesdb.sqlitefromcli_plan_context_commands_steps.py:98. This is correct —db.sqliteis a test-generated DB that should use the fast path.The migration_runner unit tests use URLs like
sqlite:///:memory:,postgresql://..., orsqlite:///tmp/test-db/mydb.db— none of which match the prefix filter. So the patch correctly does NOT intercept migration_runner tests.Finding 5 — No race conditions in template copy logic
db_path.parent.mkdir(parents=True, exist_ok=True)(line 225) is safe for concurrent use.shutil.copy2(line 226) is not atomic, but each scenario/process gets unique DB paths viatempfile.mktemp(), so concurrent copies to the same destination don't occur.db_path.exists() and db_path.stat().st_size > 0check (line 221) has a TOCTOU window, but this is benign in the test environment since no other process is racing to create the same temp file.Finding 6 — No regression risk to production code
The monkey-patch in
_install_template_db_patch()is only installed duringbefore_all()in Behave tests. It modifiesMigrationRunner.init_or_upgradeonly within the test process. Production code is unaffected.Verdict
No critical or high-severity bugs. The change is a safe optimization for its stated goal (eliminating redundant engine creation and SQLite lock contention in parallel test runs). The one low-severity finding (loss of stale-template safety net) is a minor reduction in defense-in-depth that could be addressed independently by adding template freshness validation to
_ensure_template_db().Bug Detection — Finding 2 (Low, pre-existing): The cached template is reused without validating that its Alembic revision stamp matches the current head. If a developer adds new migrations between test runs without deleting
build/.template-migrated.db, all tests silently use the stale schema.This is pre-existing and not introduced by this PR, but Finding 1 removes one layer of defense that previously (partially) mitigated it.
@ -216,2 +220,3 @@# causes intermittent hangs in parallel test runs.if db_path.exists() and db_path.stat().st_size > 0:return _original_init_or_upgrade(self, **kwargs)returnBug Detection — Finding 1 (Low): Replacing
_original_init_or_upgrade(self, **kwargs)with a barereturnremoves the safety net that would catch a stale template DB on subsequentinit_or_upgradecalls within the same scenario.When this matters: Only if
build/.template-migrated.dbwas cached from a previous schema version (developer adds migrations but doesn't rebuild template). The original code would have detected pending migrations on the second call and applied them.Why Low severity: The first call (which copies the template at line 226) never ran the original migration runner in either version — so this safety net was always incomplete. A stale template would cause immediate test failures, not silent data corruption.
Suggestion: Consider adding a freshness check in
_ensure_template_db()(e.g. compare the template's Alembic stamp againstScriptDirectory.get_current_head()) to proactively detect stale templates, rather than relying on the migration runner as a fallback.Code Review — PR #727
Verdict: APPROVED
The change is correct, minimal, well-scoped, and satisfies all ticket #726 acceptance criteria. No bugs, no security issues, no regressions. Performance claims are verified. Approving with findings noted below.
Findings Requiring Action on This PR
1. Missing Milestone — High
CONTRIBUTING.md requires every PR to be assigned the same milestone as its linked issue. Issue #726 is on milestone
v3.2.0, but this PR has no milestone.Action: Assign this PR to milestone
v3.2.0.2. No Direct Test for the New Early-Return Behavior — Medium
File:
features/environment.py:221-222The PR's behavioral change (skip Alembic for existing non-empty databases) is not directly tested by any scenario. The 15 scenarios pass because of the change (no more hangs), but none verify the early-return logic explicitly. A search for
_fast_init_or_upgradeacross all.featurefiles returns zero matches.Recommendation: Consider a follow-up ticket to add a scenario that creates a non-empty DB, calls
_fast_init_or_upgrade, and asserts_original_init_or_upgradeis NOT invoked. Not blocking since this is test infrastructure excluded from coverage measurement.3. No Changelog Update — Medium
CONTRIBUTING.md requires PRs to include a changelog update. No changelog file exists in the repository — this may be a project-wide gap rather than specific to this PR.
Action: Clarify whether the project maintains a changelog. If so, add an entry.
4. Stale Labels — Low
Both the PR and issue carry
Needs feedback. If the PR is ready for review, this is misleading. The PR also carriesState/Verified— the issue correctly hasState/In Review.Action: Remove
Needs feedbackif no questions remain; correctState/Verifiedon the PR.Pre-Existing Issues (Not Introduced by This PR)
Worth tracking as separate tickets:
5. Vacuous Assertions in Step Definitions — High (pre-existing)
File:
features/steps/cli_plan_context_commands_steps.pySix
@thenstep definitions contain assertions guarded by conditions that silently pass when false, or havepassas their body:step_plan_has_changes): passes if file doesn't existstep_changes_in_database): tautological assertion in else branchstep_changes_marked_applied): passes if no filestep_equivalent_command): body ispassstep_project_named): body ispassstep_plan_has_conversation): body ispassRecommendation: Implement real assertions or mark as
pending.6. Dead
after_scenarioHooks in Step Files — Medium (pre-existing)Files:
features/steps/cli_plan_context_commands_steps.py:599-613,features/steps/cli_plan_context_steps.py:129-138Both files define
after_scenario()functions, but Behave only invokes hooks fromenvironment.py. These are dead code — the cleanup they perform is silently skipped.7. Deprecated
tempfile.mktemp()Usage — Medium (pre-existing)File:
features/environment.py:48, 52, 292(and ~16 other call sites)tempfile.mktemp()is deprecated due to TOCTOU race conditions. In parallel test runs with fork, this creates a non-zero risk of path collisions.Recommendation: Replace with
tempfile.mkstemp()+os.close(fd).8. Stale Template DB Detection — Low (pre-existing)
File:
features/environment.py(_ensure_template_db())The template DB is reused without checking its Alembic revision. If schema changes are made but the template isn't regenerated, all tests silently use a stale schema.
Recommendation: Add a freshness check comparing the template's Alembic stamp against the current head revision.
9. Both DB URLs Point to Same File — Low (pre-existing)
File:
features/steps/cli_plan_context_commands_steps.py:96-100Both
CLEVERAGENTS_DATABASE_URLandCLEVERAGENTS_TEST_DATABASE_URLare set to the same SQLite file, which could mask bugs where the production vs. test URL selection matters.Verification Summary
_SCENARIO_DB_PREFIXESguard intact)Minor performance notes (non-blocking):
db_path.exists()+db_path.stat()could be consolidated into oneos.stat()try/except;shutil.copy2could beshutil.copyfilesince metadata preservation is unnecessary for test DBs.@ -218,3 +222,4 @@return# Copy the template — creates a fully-migrated DB in ~1msdb_path.parent.mkdir(parents=True, exist_ok=True)Consider
shutil.copyfile()instead ofshutil.copy2()here — metadata preservation (timestamps, permissions) is unnecessary for ephemeral test databases andcopyfileis marginally faster. Not blocking.@ -216,1 +218,4 @@# fully migrated. Skipping the Alembic check avoids redundant engine# creation, SQLite lock contention, and the cumulative overhead that# causes intermittent hangs in parallel test runs.if db_path.exists() and db_path.stat().st_size > 0:The early return here is correct and safe — existing non-empty DBs are either template-copied (already at Alembic HEAD) or created by a prior step that already ran the full migration. The
_SCENARIO_DB_PREFIXESguard above ensures non-scenario DBs still go through the original path.Minor suggestion:
db_path.exists()followed bydb_path.stat().st_sizeis two syscalls. Could consolidate to:Not blocking — purely a micro-optimization.
CoreRasurae referenced this pull request2026-03-25 00:44:16 +00:00