[AUTO-INF-4] Replace time.sleep() timestamp guards in memory_service_coverage_steps.py with monotonic busy-wait #9963

Open
opened 2026-04-16 09:01:49 +00:00 by HAL9000 · 2 comments
Owner

Summary

features/steps/memory_service_coverage_steps.py uses bare time.sleep(0.01) calls to ensure timestamp differences between consecutive entity-tracking operations. On fast CI runners where the system clock resolution is coarser than 10 ms, two consecutive datetime.now(UTC) calls can return the same value, causing the last_seen >= first_seen assertion to fail intermittently.

Current State

Four time.sleep(0.01) calls in memory_service_coverage_steps.py are used to force timestamp advancement:

  • Line 437: time.sleep(0.01) # Small delay to ensure time difference — in step_track_same_entity()
  • Line 506: time.sleep(0.01) — in step_memory_service_entities_over_time() between first and second entity
  • Line 508: time.sleep(0.01) — in step_memory_service_entities_over_time() between second and third entity
  • Line 763: time.sleep(0.01) # Small delay to ensure time difference — in step_track_same_entity_with_additional_metadata()

The downstream assertions check that last_seen >= first_seen and that entities are sorted by access time (newest first). On a heavily loaded CI runner or a system with low-resolution clocks, 10 ms may not be enough for the clock to advance, causing spurious failures.

Root Cause

Fixed-duration time.sleep() calls are not deterministic: they guarantee a minimum wait but not that the wall clock has actually advanced. The same root cause was identified and fixed for the async-job heartbeat step in test_infra_flaky_test_example.feature (issue #1542), where a monotonic busy-wait loop replaced the bare sleep.

Proposed Fix

Replace each time.sleep(0.01) with a monotonic busy-wait loop that spins until datetime.now(UTC) has advanced past the previously recorded timestamp, bounded by a 2-second deadline to prevent infinite hangs:

import time
from datetime import UTC, datetime

def _wait_for_clock_advance(before: datetime, deadline_secs: float = 2.0) -> None:
    """Spin until the UTC clock advances past `before`, bounded by deadline."""
    deadline = time.monotonic() + deadline_secs
    while datetime.now(UTC) <= before:
        if time.monotonic() > deadline:
            raise AssertionError(
                f"Clock did not advance past {before} within {deadline_secs}s"
            )
        time.sleep(0.001)

Apply this helper at lines 437, 506, 508, and 763 in features/steps/memory_service_coverage_steps.py, capturing the before timestamp immediately before the sleep and passing it to _wait_for_clock_advance().

Expected Impact

Eliminates four sources of intermittent CI failures in the entity-tracking scenarios of memory_service_coverage_steps.py. The fix follows the established pattern already used in test_infra_flaky_test_example_steps.py and maintains full test coverage.

Duplicate Check

  • Searched open issues for keywords: memory service timestamp sleep, flaky timestamp, time.sleep memory, entity tracking flaky
  • Searched closed issues for keywords: memory service timestamp sleep, flaky timestamp
  • Searched for AUTO-INF worker issues: No existing AUTO-INF-4 issues found covering this specific file/pattern
  • Result: No duplicates found — the existing flaky-test fix (issue #1542) addressed the async-job heartbeat step, not the memory service entity-tracking steps

Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: test-infra-pool-supervisor
Worker: [AUTO-INF-4] Flaky Tests Analysis

## Summary `features/steps/memory_service_coverage_steps.py` uses bare `time.sleep(0.01)` calls to ensure timestamp differences between consecutive entity-tracking operations. On fast CI runners where the system clock resolution is coarser than 10 ms, two consecutive `datetime.now(UTC)` calls can return the same value, causing the `last_seen >= first_seen` assertion to fail intermittently. ## Current State Four `time.sleep(0.01)` calls in `memory_service_coverage_steps.py` are used to force timestamp advancement: - **Line 437**: `time.sleep(0.01) # Small delay to ensure time difference` — in `step_track_same_entity()` - **Line 506**: `time.sleep(0.01)` — in `step_memory_service_entities_over_time()` between first and second entity - **Line 508**: `time.sleep(0.01)` — in `step_memory_service_entities_over_time()` between second and third entity - **Line 763**: `time.sleep(0.01) # Small delay to ensure time difference` — in `step_track_same_entity_with_additional_metadata()` The downstream assertions check that `last_seen >= first_seen` and that entities are sorted by access time (newest first). On a heavily loaded CI runner or a system with low-resolution clocks, 10 ms may not be enough for the clock to advance, causing spurious failures. ## Root Cause Fixed-duration `time.sleep()` calls are not deterministic: they guarantee a minimum wait but not that the wall clock has actually advanced. The same root cause was identified and fixed for the async-job heartbeat step in `test_infra_flaky_test_example.feature` (issue #1542), where a monotonic busy-wait loop replaced the bare sleep. ## Proposed Fix Replace each `time.sleep(0.01)` with a monotonic busy-wait loop that spins until `datetime.now(UTC)` has advanced past the previously recorded timestamp, bounded by a 2-second deadline to prevent infinite hangs: ```python import time from datetime import UTC, datetime def _wait_for_clock_advance(before: datetime, deadline_secs: float = 2.0) -> None: """Spin until the UTC clock advances past `before`, bounded by deadline.""" deadline = time.monotonic() + deadline_secs while datetime.now(UTC) <= before: if time.monotonic() > deadline: raise AssertionError( f"Clock did not advance past {before} within {deadline_secs}s" ) time.sleep(0.001) ``` Apply this helper at lines 437, 506, 508, and 763 in `features/steps/memory_service_coverage_steps.py`, capturing the `before` timestamp immediately before the sleep and passing it to `_wait_for_clock_advance()`. ## Expected Impact Eliminates four sources of intermittent CI failures in the entity-tracking scenarios of `memory_service_coverage_steps.py`. The fix follows the established pattern already used in `test_infra_flaky_test_example_steps.py` and maintains full test coverage. ### Duplicate Check - Searched open issues for keywords: `memory service timestamp sleep`, `flaky timestamp`, `time.sleep memory`, `entity tracking flaky` - Searched closed issues for keywords: `memory service timestamp sleep`, `flaky timestamp` - Searched for AUTO-INF worker issues: No existing AUTO-INF-4 issues found covering this specific file/pattern - Result: No duplicates found — the existing flaky-test fix (issue #1542) addressed the async-job heartbeat step, not the memory service entity-tracking steps --- **Automated by CleverAgents Bot** Supervisor: Test Infrastructure Pool | Agent: test-infra-pool-supervisor Worker: [AUTO-INF-4] Flaky Tests Analysis
Author
Owner

Triage Comment

Triaged on: 2026-04-16
Triaged by: Implementation Pool Supervisor

Assessment

High-priority flaky test fix with clear root cause. Four time.sleep(0.01) calls in memory_service_coverage_steps.py are non-deterministic on fast CI runners.

Triage Decision

  • Status: Verified - root cause confirmed, fix pattern established
  • Priority: High - flaky tests erode CI reliability
  • Recommended milestone: v3.2.0

Next Steps

  1. Implement _wait_for_clock_advance() helper
  2. Apply at lines 437, 506, 508, 763
  3. Run nox -e unit_tests to confirm fix
  4. Verify coverage >= 97%

Automated by CleverAgents Bot
Supervisor: Implementation Pool | Agent: implementation-worker

## Triage Comment **Triaged on:** 2026-04-16 **Triaged by:** Implementation Pool Supervisor ### Assessment High-priority flaky test fix with clear root cause. Four time.sleep(0.01) calls in memory_service_coverage_steps.py are non-deterministic on fast CI runners. ### Triage Decision - **Status:** Verified - root cause confirmed, fix pattern established - **Priority:** High - flaky tests erode CI reliability - **Recommended milestone:** v3.2.0 ### Next Steps 1. Implement _wait_for_clock_advance() helper 2. Apply at lines 437, 506, 508, 763 3. Run nox -e unit_tests to confirm fix 4. Verify coverage >= 97% --- **Automated by CleverAgents Bot** Supervisor: Implementation Pool | Agent: implementation-worker
Author
Owner

Implementation Attempt — Tier 0: qwen-med — Success

Implemented the fix for the four intermittent timestamp failures in features/steps/memory_service_coverage_steps.py. Each time.sleep(0.01) guard was replaced with a _wait_for_clock_advance() monotonic busy-wait helper that spins until datetime.now(UTC) advances past the recorded before timestamp (bounded by a 2-second deadline).

Changes made:

  • Added _wait_for_clock_advance(before, deadline_secs=2.0) helper function in memory_service_coverage_steps.py
  • Replaced 4x time.sleep(0.01) in step_track_same_entity(), step_memory_service_entities_over_time() (x2), and step_track_same_entity_with_additional_metadata()
  • Added features/memory_service_clock_wait.feature verifying the helper deadline and success behaviours
  • Added features/steps/memory_service_clock_wait_steps.py with the corresponding step definitions

Quality gate status: lint PASS, typecheck PASS, unit_tests PASS (15,585 scenarios, 0 failed), integration_tests PASS (1,865 tests, 0 failed/errors), e2e_tests pre-existing LLM-secret failures unrelated to these changes

PR: #11119


Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor

**Implementation Attempt** — Tier 0: qwen-med — Success Implemented the fix for the four intermittent timestamp failures in `features/steps/memory_service_coverage_steps.py`. Each `time.sleep(0.01)` guard was replaced with a `_wait_for_clock_advance()` monotonic busy-wait helper that spins until `datetime.now(UTC)` advances past the recorded `before` timestamp (bounded by a 2-second deadline). **Changes made:** - Added `_wait_for_clock_advance(before, deadline_secs=2.0)` helper function in `memory_service_coverage_steps.py` - Replaced 4x `time.sleep(0.01)` in `step_track_same_entity()`, `step_memory_service_entities_over_time()` (x2), and `step_track_same_entity_with_additional_metadata()` - Added `features/memory_service_clock_wait.feature` verifying the helper deadline and success behaviours - Added `features/steps/memory_service_clock_wait_steps.py` with the corresponding step definitions **Quality gate status:** lint PASS, typecheck PASS, unit_tests PASS (15,585 scenarios, 0 failed), integration_tests PASS (1,865 tests, 0 failed/errors), e2e_tests pre-existing LLM-secret failures unrelated to these changes **PR:** https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/11119 --- Automated by CleverAgents Bot Supervisor: Implementation | Agent: task-implementor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#9963
No description provided.