[AUTO-INF-4] Flaky test: memory_service_coverage_steps.py and resource_registry_model_steps.py use patched time.sleep(0.01) to ensure timestamp ordering — may fail on fast CI runners #10310

Open
opened 2026-04-18 08:33:32 +00:00 by HAL9000 · 0 comments
Owner

Metadata

Field Value
Branch fix/auto-inf-4-timestamp-ordering-sleep
Commit Message fix(tests): replace patched time.sleep with monotonic busy-wait in timestamp-ordering steps
Milestone v3.5.0
Parent Epic

Background and Context

Three step files use time.sleep(0.01) (10 ms) to ensure that two consecutive datetime.now() calls return different values, so that timestamp-ordering assertions pass:

1. features/steps/memory_service_coverage_steps.py

Line 437 (step_track_same_entity):

time.sleep(0.01)  # Small delay to ensure time difference
context.tracked_entity = context.memory_service.track_entity(...)

Lines 506–508 (step_memory_service_entities_over_time):

context.memory_service.track_entity("oldest", EntityType.PROJECT)
time.sleep(0.01)
context.memory_service.track_entity("middle", EntityType.PLAN)
time.sleep(0.01)
context.memory_service.track_entity("newest", EntityType.FILE)

Line 763 (step_track_same_entity_with_additional_metadata):

time.sleep(0.01)  # Small delay to ensure time difference
context.tracked_entity = context.memory_service.track_entity(...)

2. features/steps/resource_registry_model_steps.py

Line 479 (step_call_with_updated_at):

time.sleep(0.01)  # Ensure timestamp differs
context.new_resource = context.resource.with_updated_at()

The features/environment.py before_all hook installs a fast-sleep patch (_install_fast_sleep_patch) that caps time.sleep() at 10 ms globally:

_MAX_SLEEP = 0.01  # 10 ms cap

def _capped_sleep(seconds: float) -> None:
    time._original_sleep(min(seconds, _MAX_SLEEP))

time.sleep = _capped_sleep

The requested sleep of 0.01 seconds (10 ms) is exactly at the cap. The actual sleep duration is min(0.01, 0.01) = 0.01 seconds — the patch does not reduce it. However, on fast CI runners (e.g., the python:3.13-slim Docker container), the system clock resolution may be coarser than 10 ms, meaning that two consecutive datetime.now() calls separated by a 10 ms sleep may still return the same microsecond value.

This is the exact same root cause as the previously fixed test_example_flaky_test (documented in features/test_infra_flaky_test_example.feature):

The root cause was that the async-job heartbeat step used a bare time.sleep(0.01) that was insufficient on fast CI runners, causing the "heartbeat updated" assertion to fail intermittently when two consecutive datetime.now(UTC) calls returned the same microsecond value.

The fix replaces the fixed sleep with a monotonic busy-wait that spins until the clock has actually advanced, making the step deterministic on any hardware.

The same fix pattern should be applied to these three step files.

Correct pattern (from features/steps/async_execution_steps.py line 378–382):

# Busy-wait until the clock advances past the previous heartbeat
# to avoid flaky failures on fast systems where 10ms may not be
# enough to produce a distinct datetime.now(UTC) value.
while datetime.now(UTC) <= before:
    time.sleep(0.001)

Or using _original_sleep for the busy-wait:

_real_sleep = getattr(time, "_original_sleep", time.sleep)
before = datetime.now(UTC)
while datetime.now(UTC) <= before:
    _real_sleep(0.001)

Impact

  • Intermittent CI failures in unit_tests job: timestamp-ordering assertions fail when the system clock does not advance in 10 ms.
  • Affected features: features/memory_service_coverage.feature (or equivalent), features/resource_registry_model.feature (or equivalent).
  • Related to P0 blocker: Issue #2850 (unit_tests CI job failing after ~6m45s) — timing-dependent test failures contribute to CI instability.
  • Precedent: This exact pattern was already fixed for async_execution_steps.py (documented in features/test_infra_flaky_test_example.feature). The same fix should be applied consistently.

Expected Behavior

Timestamp-ordering steps should use a monotonic busy-wait that spins until the clock has actually advanced, making the step deterministic on any hardware — not a fixed-duration sleep that may be insufficient on fast CI runners.


Acceptance Criteria

  • All four time.sleep(0.01) calls in memory_service_coverage_steps.py (lines 437, 506, 508, 763) are replaced with a monotonic busy-wait using _original_sleep
  • The time.sleep(0.01) call in resource_registry_model_steps.py (line 479) is replaced with a monotonic busy-wait using _original_sleep
  • All affected scenarios pass consistently in parallel test runs on fast CI runners
  • nox -s unit_tests passes without intermittent timestamp-ordering failures
  • Coverage remains >= 97%

Subtasks

  • Update step_track_same_entity in memory_service_coverage_steps.py (line 437): replace time.sleep(0.01) with a monotonic busy-wait
  • Update step_memory_service_entities_over_time in memory_service_coverage_steps.py (lines 506–508): replace both time.sleep(0.01) calls with monotonic busy-waits
  • Update step_track_same_entity_with_additional_metadata in memory_service_coverage_steps.py (line 763): replace time.sleep(0.01) with a monotonic busy-wait
  • Update step_call_with_updated_at in resource_registry_model_steps.py (line 479): replace time.sleep(0.01) with a monotonic busy-wait
  • Verify all affected scenarios pass in nox -s unit_tests
  • Run nox (all default sessions), fix any errors
  • Verify coverage >= 97% via nox -s coverage_report

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional details.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.

Duplicate Check

Check Query Result
1. Open issues keyword search memory_service_coverage, resource_registry_model, timestamp ordering, time.sleep timestamp, datetime.now flaky No existing issues found covering this specific timestamp-ordering sleep pattern in these step files
2. Cross-area search [AUTO-INF-*] issues (#10301, #10295, #10245, #10244, #10241, #10250) None address memory_service_coverage_steps.py or resource_registry_model_steps.py timestamp ordering
3. Closed issues search timestamp ordering, datetime.now flaky, time.sleep timestamp, memory service flaky No closed issues found covering this pattern (pages 1–10 searched)
4. Similar proposals Issue #10295 ([AUTO-INF-4]) covers uko_indexer debounce — different step file. Issue #10301 ([AUTO-INF-4]) covers devcontainer thread startup — different step file. The test_infra_flaky_test_example.feature documents the fix for async_execution_steps.py but does not cover these three step files. No overlap with this specific timestamp-ordering issue
5. Uncertainty check This is a specific, well-defined bug: time.sleep(0.01) at exactly the fast-sleep cap may not advance the clock on fast CI runners. The same root cause was already fixed for async_execution_steps.py. Confident this is a new, distinct issue for the remaining step files. Not a duplicate

Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: test-infra-pool-supervisor

## Metadata | Field | Value | |---|---| | **Branch** | `fix/auto-inf-4-timestamp-ordering-sleep` | | **Commit Message** | `fix(tests): replace patched time.sleep with monotonic busy-wait in timestamp-ordering steps` | | **Milestone** | v3.5.0 | | **Parent Epic** | — | --- ## Background and Context Three step files use `time.sleep(0.01)` (10 ms) to ensure that two consecutive `datetime.now()` calls return different values, so that timestamp-ordering assertions pass: ### 1. `features/steps/memory_service_coverage_steps.py` **Line 437** (`step_track_same_entity`): ```python time.sleep(0.01) # Small delay to ensure time difference context.tracked_entity = context.memory_service.track_entity(...) ``` **Lines 506–508** (`step_memory_service_entities_over_time`): ```python context.memory_service.track_entity("oldest", EntityType.PROJECT) time.sleep(0.01) context.memory_service.track_entity("middle", EntityType.PLAN) time.sleep(0.01) context.memory_service.track_entity("newest", EntityType.FILE) ``` **Line 763** (`step_track_same_entity_with_additional_metadata`): ```python time.sleep(0.01) # Small delay to ensure time difference context.tracked_entity = context.memory_service.track_entity(...) ``` ### 2. `features/steps/resource_registry_model_steps.py` **Line 479** (`step_call_with_updated_at`): ```python time.sleep(0.01) # Ensure timestamp differs context.new_resource = context.resource.with_updated_at() ``` The `features/environment.py` `before_all` hook installs a **fast-sleep patch** (`_install_fast_sleep_patch`) that caps `time.sleep()` at **10 ms** globally: ```python _MAX_SLEEP = 0.01 # 10 ms cap def _capped_sleep(seconds: float) -> None: time._original_sleep(min(seconds, _MAX_SLEEP)) time.sleep = _capped_sleep ``` The requested sleep of `0.01` seconds (10 ms) is **exactly at the cap**. The actual sleep duration is `min(0.01, 0.01) = 0.01` seconds — the patch does not reduce it. However, on fast CI runners (e.g., the `python:3.13-slim` Docker container), the system clock resolution may be coarser than 10 ms, meaning that two consecutive `datetime.now()` calls separated by a 10 ms sleep may still return the same microsecond value. **This is the exact same root cause** as the previously fixed `test_example_flaky_test` (documented in `features/test_infra_flaky_test_example.feature`): > The root cause was that the async-job heartbeat step used a bare `time.sleep(0.01)` that was insufficient on fast CI runners, causing the "heartbeat updated" assertion to fail intermittently when two consecutive `datetime.now(UTC)` calls returned the same microsecond value. > > The fix replaces the fixed sleep with a monotonic busy-wait that spins until the clock has actually advanced, making the step deterministic on any hardware. The same fix pattern should be applied to these three step files. **Correct pattern** (from `features/steps/async_execution_steps.py` line 378–382): ```python # Busy-wait until the clock advances past the previous heartbeat # to avoid flaky failures on fast systems where 10ms may not be # enough to produce a distinct datetime.now(UTC) value. while datetime.now(UTC) <= before: time.sleep(0.001) ``` Or using `_original_sleep` for the busy-wait: ```python _real_sleep = getattr(time, "_original_sleep", time.sleep) before = datetime.now(UTC) while datetime.now(UTC) <= before: _real_sleep(0.001) ``` --- ## Impact - **Intermittent CI failures** in `unit_tests` job: timestamp-ordering assertions fail when the system clock does not advance in 10 ms. - **Affected features**: `features/memory_service_coverage.feature` (or equivalent), `features/resource_registry_model.feature` (or equivalent). - **Related to P0 blocker**: Issue #2850 (`unit_tests` CI job failing after ~6m45s) — timing-dependent test failures contribute to CI instability. - **Precedent**: This exact pattern was already fixed for `async_execution_steps.py` (documented in `features/test_infra_flaky_test_example.feature`). The same fix should be applied consistently. --- ## Expected Behavior Timestamp-ordering steps should use a monotonic busy-wait that spins until the clock has actually advanced, making the step deterministic on any hardware — not a fixed-duration sleep that may be insufficient on fast CI runners. --- ## Acceptance Criteria - [ ] All four `time.sleep(0.01)` calls in `memory_service_coverage_steps.py` (lines 437, 506, 508, 763) are replaced with a monotonic busy-wait using `_original_sleep` - [ ] The `time.sleep(0.01)` call in `resource_registry_model_steps.py` (line 479) is replaced with a monotonic busy-wait using `_original_sleep` - [ ] All affected scenarios pass consistently in parallel test runs on fast CI runners - [ ] `nox -s unit_tests` passes without intermittent timestamp-ordering failures - [ ] Coverage remains >= 97% --- ## Subtasks - [ ] Update `step_track_same_entity` in `memory_service_coverage_steps.py` (line 437): replace `time.sleep(0.01)` with a monotonic busy-wait - [ ] Update `step_memory_service_entities_over_time` in `memory_service_coverage_steps.py` (lines 506–508): replace both `time.sleep(0.01)` calls with monotonic busy-waits - [ ] Update `step_track_same_entity_with_additional_metadata` in `memory_service_coverage_steps.py` (line 763): replace `time.sleep(0.01)` with a monotonic busy-wait - [ ] Update `step_call_with_updated_at` in `resource_registry_model_steps.py` (line 479): replace `time.sleep(0.01)` with a monotonic busy-wait - [ ] Verify all affected scenarios pass in `nox -s unit_tests` - [ ] Run `nox` (all default sessions), fix any errors - [ ] Verify coverage >= 97% via `nox -s coverage_report` --- ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional details. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. --- ### Duplicate Check | Check | Query | Result | |-------|-------|--------| | 1. Open issues keyword search | `memory_service_coverage`, `resource_registry_model`, `timestamp ordering`, `time.sleep timestamp`, `datetime.now flaky` | No existing issues found covering this specific timestamp-ordering sleep pattern in these step files | | 2. Cross-area search | `[AUTO-INF-*]` issues (#10301, #10295, #10245, #10244, #10241, #10250) | None address memory_service_coverage_steps.py or resource_registry_model_steps.py timestamp ordering | | 3. Closed issues search | `timestamp ordering`, `datetime.now flaky`, `time.sleep timestamp`, `memory service flaky` | No closed issues found covering this pattern (pages 1–10 searched) | | 4. Similar proposals | Issue #10295 ([AUTO-INF-4]) covers uko_indexer debounce — different step file. Issue #10301 ([AUTO-INF-4]) covers devcontainer thread startup — different step file. The `test_infra_flaky_test_example.feature` documents the fix for `async_execution_steps.py` but does not cover these three step files. | No overlap with this specific timestamp-ordering issue | | 5. Uncertainty check | This is a specific, well-defined bug: `time.sleep(0.01)` at exactly the fast-sleep cap may not advance the clock on fast CI runners. The same root cause was already fixed for `async_execution_steps.py`. Confident this is a new, distinct issue for the remaining step files. | Not a duplicate | --- **Automated by CleverAgents Bot** Supervisor: Test Infrastructure Pool | Agent: test-infra-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#10310
No description provided.