[AUTO-INF-6] Module-level ULID generation in step files creates cross-scenario session ID contamination #10245

Open
opened 2026-04-17 10:30:45 +00:00 by HAL9000 · 0 comments
Owner

Metadata

Field           Value
Branch          test/test-data-quality-fix-module-level-session-ids
Commit Message  test(infra): move module-level ULID generation into per-scenario fixtures to prevent cross-scenario contamination
Milestone       v3.5.0
Parent Epic     (none)

Background and Context

Three step files define session identifiers or sandbox paths at module load time (i.e., when Python imports the module), not at scenario setup time. Because Python caches imported modules, these values are created once per process and shared across every scenario that uses the step file. In parallel test runs (behave-parallel), workers that share the imported module, or that read the same hardcoded constant, use the same identifier or path, causing cross-scenario contamination.
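The import-caching behavior described above can be reproduced in isolation. The following is a minimal sketch, not code from the repository: it writes a throwaway module (the `demo_steps_…` name is hypothetical) with a module-level ID, imports it twice, and compares the values. `uuid` stands in for the ULID library so the sketch needs only the standard library.

```python
import importlib
import sys
import tempfile
import uuid
from pathlib import Path


def import_twice_and_compare() -> bool:
    """Import a throwaway module twice and report whether its
    module-level ID was the same both times."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module name, unique so it cannot clash with real modules.
        module_name = "demo_steps_" + uuid.uuid4().hex
        Path(tmp, module_name + ".py").write_text(
            "import uuid\n"
            "_SESSION_ID = str(uuid.uuid4())  # evaluated at import time\n"
        )
        sys.path.insert(0, tmp)
        try:
            first = importlib.import_module(module_name)._SESSION_ID
            # The second import is served from sys.modules; the module body
            # does not run again, so no new ID is generated.
            second = importlib.import_module(module_name)._SESSION_ID
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
    return first == second
```

Calling `import_twice_and_compare()` returns `True`: the assignment runs once per process, no matter how many scenarios later read the value.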

Evidence — three affected files:

1. features/steps/session_cli_steps.py (lines 28–29)

_SESSION_ID = str(ULID())
_SESSION_ID_2 = str(ULID())

These two session IDs are generated once at import time. All 20+ step functions in this file that reference _SESSION_ID or _SESSION_ID_2 share the same values across every scenario in the test run.

2. features/steps/session_cli_coverage_boost_steps.py (line 47)

_ULID1 = "01SCVBST000000000000000001"

A static string used as a session ID across 20+ step functions. While not a ULID generator, this is a hardcoded shared identifier that creates the same cross-scenario contamination risk.

3. features/steps/plan_executor_edge_cases_coverage_steps.py (line 37)

EDGE3_SANDBOX_ROOT = "/tmp/edge3-sandbox"

A module-level path constant shared across all scenarios. In parallel runs, two workers executing scenarios from this step file will both attempt to use /tmp/edge3-sandbox, causing filesystem conflicts.


Impact

  1. Parallel test contamination: In behave-parallel runs, two workers executing scenarios from session_cli_steps.py will use the same _SESSION_ID. If both scenarios write to a database keyed by session ID, they will overwrite each other's data.
  2. Non-deterministic failures: Tests may pass in serial runs but fail intermittently in parallel runs, making failures hard to reproduce.
  3. False positives: A scenario that should fail (because the session it just created should not have any data yet) may pass because it finds data left by a previous scenario that used the same ID.
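The overwrite and false-positive effects listed above can be illustrated with a toy store. This is a sketch under stated assumptions: the dictionary stands in for the database keyed by session ID, `run_scenario` is a hypothetical helper, and `uuid` stands in for per-scenario ULID generation.

```python
import uuid


def run_scenario(store: dict, session_id: str) -> bool:
    """Simulate a scenario that creates a 'new' session.

    Returns True if data for this session ID already existed, i.e. the
    scenario saw state left behind by an earlier scenario.
    """
    found_stale = session_id in store
    store[session_id] = {"status": "created"}  # overwrites any earlier record
    return found_stale


# Shared constant, as with the hardcoded ID in the second affected file.
SHARED_ID = "01SCVBST000000000000000001"
shared_store = {}
shared_results = [run_scenario(shared_store, SHARED_ID) for _ in range(2)]

# Fresh ID per scenario (uuid used as a stand-in for ULID).
fresh_store = {}
fresh_results = [run_scenario(fresh_store, uuid.uuid4().hex) for _ in range(2)]
```

With the shared ID, the second run finds the first run's record (`shared_results` is `[False, True]`); with fresh IDs neither run sees stale data (`fresh_results` is `[False, False]`).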

Expected Behavior

Session IDs and sandbox paths should be generated per scenario in before_scenario hooks or in @given step implementations, not at module load time. Each scenario should receive a fresh, unique identifier.

Correct pattern (already used in many other step files):

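from behave import given
from ulid import ULID  # assumed import path; matches the python-ulid package

@given("a new session is created")
def step_create_session(context):
    context.session_id = str(ULID())  # Generated fresh per scenario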
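The hook-based alternative mentioned above might look like the sketch below, placed in behave's `environment.py`. The attribute names `session_id` and `session_id_2` and the helper `new_session_id` are assumptions for illustration; the helper uses `uuid` as a stand-in so the sketch runs without third-party packages, and the project would presumably return `str(ULID())` there instead.

```python
import uuid
from types import SimpleNamespace


def new_session_id() -> str:
    # Stand-in generator. In the real step files this would presumably be
    # str(ULID()); uuid keeps the sketch dependency-free.
    return uuid.uuid4().hex.upper()


def before_scenario(context, scenario):
    # behave calls this hook before every scenario, so each scenario
    # gets its own pair of identifiers.
    context.session_id = new_session_id()
    context.session_id_2 = new_session_id()


# Usage: two stand-in contexts, as if two scenarios had started.
ctx_a, ctx_b = SimpleNamespace(), SimpleNamespace()
before_scenario(ctx_a, scenario=None)
before_scenario(ctx_b, scenario=None)
```

Step functions would then read `context.session_id` rather than a module-level constant. Attributes set in `before_scenario` are scoped to the scenario, so nothing carries over to the next one.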

Acceptance Criteria

  • _SESSION_ID and _SESSION_ID_2 in session_cli_steps.py are removed from module scope; session IDs are generated per-scenario (e.g., in before_scenario or in the relevant @given step)
  • _ULID1 in session_cli_coverage_boost_steps.py is replaced with a per-scenario generated value
  • EDGE3_SANDBOX_ROOT in plan_executor_edge_cases_coverage_steps.py is replaced with a per-scenario temporary directory (using tempfile.mkdtemp())
  • All nox stages pass without regressions
  • Coverage >= 97%
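For the sandbox criterion above, one possible shape is a pair of behave hooks that create and remove a directory per scenario. This is a sketch, not the repository's implementation: the attribute name `sandbox_root` and the `edge3-sandbox-` prefix are assumptions, and `demo` exists only to exercise the hooks with stand-in contexts.

```python
import os
import shutil
import tempfile
from types import SimpleNamespace


def before_scenario(context, scenario):
    # mkdtemp creates a new, uniquely named directory on every call,
    # so parallel workers never share a sandbox.
    context.sandbox_root = tempfile.mkdtemp(prefix="edge3-sandbox-")


def after_scenario(context, scenario):
    # mkdtemp does not clean up after itself; remove the directory here.
    sandbox = getattr(context, "sandbox_root", None)
    if sandbox:
        shutil.rmtree(sandbox, ignore_errors=True)


def demo():
    """Run the hooks for two stand-in scenarios and report what happened."""
    ctx_a, ctx_b = SimpleNamespace(), SimpleNamespace()
    before_scenario(ctx_a, None)
    before_scenario(ctx_b, None)
    distinct = ctx_a.sandbox_root != ctx_b.sandbox_root
    existed = os.path.isdir(ctx_a.sandbox_root) and os.path.isdir(ctx_b.sandbox_root)
    after_scenario(ctx_a, None)
    after_scenario(ctx_b, None)
    cleaned = not os.path.exists(ctx_a.sandbox_root) and not os.path.exists(ctx_b.sandbox_root)
    return distinct, existed, cleaned
```

`demo()` returns `(True, True, True)`: the two sandboxes are distinct, both exist during the scenario, and both are gone afterwards.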

Subtasks

  • Audit session_cli_steps.py for all usages of _SESSION_ID and _SESSION_ID_2; move generation to before_scenario or per-step context attribute
  • Audit session_cli_coverage_boost_steps.py for all usages of _ULID1; replace with per-scenario generated ULID stored in context
  • Audit plan_executor_edge_cases_coverage_steps.py for all usages of EDGE3_SANDBOX_ROOT; replace with tempfile.mkdtemp() in before_scenario or the relevant @given step
  • Run nox -s unit_tests to confirm no regressions
  • Run nox -s coverage_report to confirm coverage >= 97%
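The audit subtasks above could be partly automated. The sketch below uses the standard `ast` module to list top-level assignments whose value contains a `ULID()` call. The function name and the sample source are hypothetical, and the check is deliberately narrow: it would not flag hardcoded strings such as `_ULID1` or path constants such as `EDGE3_SANDBOX_ROOT`, which still need a manual search.

```python
import ast


def find_module_level_ulid_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, variable name) for each top-level assignment
    whose right-hand side contains a call to a name spelled ULID."""
    hits = []
    tree = ast.parse(source)
    for node in tree.body:  # top-level statements only, not function bodies
        if not isinstance(node, ast.Assign):
            continue
        calls_ulid = any(
            isinstance(n, ast.Call)
            and isinstance(n.func, ast.Name)
            and n.func.id == "ULID"
            for n in ast.walk(node.value)
        )
        if calls_ulid:
            for target in node.targets:
                if isinstance(target, ast.Name):
                    hits.append((node.lineno, target.id))
    return hits


# Sample input modelled on the pattern reported in this issue.
SAMPLE = '''\
_SESSION_ID = str(ULID())
_SESSION_ID_2 = str(ULID())

def step_create_session(context):
    context.session_id = str(ULID())
'''

findings = find_module_level_ulid_calls(SAMPLE)
```

For the sample, `findings` contains the two module-level assignments on lines 1 and 2; the per-scenario assignment inside the step function is not reported.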

Definition of Done

This issue should be closed when:

  • No module-level ULID generation, shared session ID constants, or shared sandbox path constants remain in session_cli_steps.py, session_cli_coverage_boost_steps.py, or plan_executor_edge_cases_coverage_steps.py
  • All session IDs and sandbox paths are generated fresh per scenario
  • All nox stages pass
  • Coverage >= 97%

Duplicate Check

Search 1 — open issues, keywords "module-level" + "session ID":
Searched open issues for "module-level", "session ID", "ULID module", "import time". No existing issue covers module-level ULID generation at import time in step files.

Search 2 — cross-area analysis:

  • #10082 (context.temp_dir cleanup): covers missing after_scenario cleanup, not module-level generation. Different issue.
  • #7087 (tempfile.mktemp() race condition): covers tempfile.mktemp() in environment.py, not module-level ULID generation in step files. Different issue.
  • #5844 (hardcoded /tmp paths in fixtures): covers fixture JSON files, not module-level path constants in step files. Different issue.
  • #9789 (test data realism): covers ULID validity in fixture JSON, not module-level generation in step files. Different issue.

Search 3 — closed issues, keywords "session ID" + "module" + "parallel":
No closed issues found covering module-level session ID generation in step files.

Search 4 — keywords "ULID" + "step file" + "module":
No existing open or closed issue covers the specific pattern of _SESSION_ID = str(ULID()) at module scope in step files.

Search 5 — uncertainty check:
This finding is clearly distinct from all existing issues. The specific problem (module-level ULID generation creating shared state across scenarios) is not addressed by any existing issue.


Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: test-infra-pool-supervisor


Automated by CleverAgents Bot
Agent: new-issue-creator

Reference
cleveragents/cleveragents-core#10245