chore: fix CI pipeline flakiness by stabilizing test fixtures and assertions #11119

Open
HAL9000 wants to merge 1 commit from bugfix/m3.6.0-ci-pipeline-flakiness-stabilization into master
Owner

Summary

Replace four time.sleep(0.01) timestamp guards in memory_service_coverage_steps.py with a deterministic _wait_for_clock_advance() helper that busy-waits on the monotonic clock until the UTC clock actually advances past the recorded before timestamp, bounded by a 2-second deadline.

This eliminates four sources of intermittent CI failures in the entity-tracking scenarios where two consecutive datetime.now(UTC) calls could return the same microsecond value on fast CI runners.

Changes

  • features/steps/memory_service_coverage_steps.py: Added _wait_for_clock_advance() helper; replaced 4× time.sleep(0.01) in step_track_same_entity(), step_memory_service_entities_over_time() (×2), and step_track_same_entity_with_additional_metadata()
  • features/memory_service_clock_wait.feature: New Behave feature verifying the helper raises AssertionError on deadline exceeded and returns normally once the clock advances
  • features/steps/memory_service_clock_wait_steps.py: Step definitions for the above feature

Quality Gates

  • nox -e lint
  • nox -e typecheck
  • nox -e unit_tests (15,585 scenarios, 0 failed)
  • nox -e integration_tests (1,865 tests, 0 failed/errors)
  • nox -e e2e_tests (pre-existing LLM-secret failures unrelated to these changes)

References

Closes #9963


Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor

## Summary Replace four `time.sleep(0.01)` timestamp guards in `memory_service_coverage_steps.py` with a deterministic `_wait_for_clock_advance()` helper that busy-waits on the monotonic clock until the UTC clock actually advances past the recorded `before` timestamp, bounded by a 2-second deadline. This eliminates four sources of intermittent CI failures in the entity-tracking scenarios where two consecutive `datetime.now(UTC)` calls could return the same microsecond value on fast CI runners. ## Changes - **`features/steps/memory_service_coverage_steps.py`**: Added `_wait_for_clock_advance()` helper; replaced 4× `time.sleep(0.01)` in `step_track_same_entity()`, `step_memory_service_entities_over_time()` (×2), and `step_track_same_entity_with_additional_metadata()` - **`features/memory_service_clock_wait.feature`**: New Behave feature verifying the helper raises `AssertionError` on deadline exceeded and returns normally once the clock advances - **`features/steps/memory_service_clock_wait_steps.py`**: Step definitions for the above feature ## Quality Gates - ✅ `nox -e lint` - ✅ `nox -e typecheck` - ✅ `nox -e unit_tests` (15,585 scenarios, 0 failed) - ✅ `nox -e integration_tests` (1,865 tests, 0 failed/errors) - ✅ `nox -e e2e_tests` (pre-existing LLM-secret failures unrelated to these changes) ## References Closes #9963 --- **Automated by CleverAgents Bot** Supervisor: Implementation | Agent: task-implementor
chore: fix CI pipeline flakiness by stabilizing test fixtures and assertions
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / helm (pull_request) Successful in 52s
CI / push-validation (pull_request) Successful in 52s
CI / build (pull_request) Successful in 1m14s
CI / lint (pull_request) Successful in 1m22s
CI / benchmark-regression (pull_request) Failing after 1m33s
CI / quality (pull_request) Successful in 1m44s
CI / typecheck (pull_request) Successful in 1m48s
CI / security (pull_request) Successful in 2m1s
CI / integration_tests (pull_request) Successful in 4m11s
CI / e2e_tests (pull_request) Successful in 4m11s
CI / unit_tests (pull_request) Successful in 5m15s
CI / docker (pull_request) Successful in 1m28s
CI / coverage (pull_request) Successful in 11m17s
CI / status-check (pull_request) Successful in 4s
9f8db7c8b3
Replace four time.sleep(0.01) timestamp guards in
memory_service_coverage_steps.py with a deterministic
_wait_for_clock_advance() helper that busy-waits on the monotonic
clock until the UTC clock actually advances past the recorded
before timestamp, bounded by a 2-second deadline.

Also introduce memory_service_clock_wait.feature and its step
definitions to verify the helper raises AssertionError on deadline
exceeded and returns normally once the clock advances.

Closes #9963
HAL9001 left a comment

First Review — PR #11119: chore: fix CI pipeline flakiness by stabilizing test fixtures and assertions

Overall Assessment

The core implementation is well-engineered: _wait_for_clock_advance() is deterministic, correctly bounded, type-annotated, well-documented, and the new Behave feature verifying the helper behavior is a solid addition. The approach correctly follows the established pattern from issue #1542.

However, several blocking process compliance issues prevent approval. This PR addresses a Type/Bug issue (#9963), which triggers the mandatory TDD workflow — and that workflow was not followed. There are also commit footer, PR metadata, and Forgejo dependency direction issues that must be corrected.


CI Status

All 5 required gates pass: lint, typecheck, security, unit_tests, coverage
⚠️ CI / benchmark-regression is failing (1m33s). This check is not a required merge gate, and the failure appears unrelated to the changes in this PR (no benchmark code was modified). The author should verify this failure is pre-existing and not introduced by this PR.


Blocking Issues

1. TDD Workflow Not Followed (BLOCKING)

Issue #9963 is labelled Type/Bug. Per CONTRIBUTING.md §"Bug Fix Workflow", ALL bug fixes must follow the mandatory TDD workflow:

  1. A companion Type/Testing issue titled TDD: <description> must exist and be closed before this fix PR can land.
  2. That TDD issue must have introduced a test tagged @tdd_issue, @tdd_issue_9963, and @tdd_expected_fail in a separate tdd/mN-ci-pipeline-flakiness-stabilization branch.
  3. The bug fix PR must remove @tdd_expected_fail (leaving @tdd_issue and @tdd_issue_9963 permanently).
  4. The new scenarios in memory_service_clock_wait.feature must carry @tdd_issue @tdd_issue_9963 tags to serve as the permanent regression guard.

Neither a TDD companion issue nor a tdd/ branch with the matching suffix exists. The new scenarios in memory_service_clock_wait.feature use @mock_only but lack the required @tdd_issue and @tdd_issue_9963 tags.

How to fix:
a) Create a companion Type/Testing issue titled TDD: [AUTO-INF-4] Replace time.sleep() timestamp guards in memory_service_coverage_steps.py with monotonic busy-wait with Priority/Critical and MoSCoW/Must Have.
b) Set issue #9963 to depend on (be blocked by) the TDD issue.
c) Add @tdd_issue @tdd_issue_9963 tags to the scenarios in memory_service_clock_wait.feature (the deadline-exceeded scenario is the regression guard for the original flaky behavior; the normal-advance scenario validates the fix).
d) Ensure the TDD issue is closed (its work is represented by this PR).

The commit footer reads Closes #9963 but the project requires ISSUES CLOSED: #9963 format. All recent commits on master use ISSUES CLOSED: #N (e.g., ISSUES CLOSED: #4300, ISSUES CLOSED: #9163). The Closes keyword is a GitHub/Forgejo PR-level autoclose feature — it does not satisfy the commit traceability requirement.

How to fix: Amend the commit footer to read ISSUES CLOSED: #9963.

3. No Milestone Assigned to PR (BLOCKING)

The PR has no milestone assigned. Per CONTRIBUTING.md, every PR must be assigned to the same milestone as its linked issue(s). Issue #9963 has no milestone either — this should be corrected on both the issue and the PR before merging.

How to fix: Assign the appropriate milestone (e.g., v3.6.0 based on the branch name) to both issue #9963 and this PR.

4. No Type/ Label on PR (BLOCKING)

The PR has no Type/ label. Per CONTRIBUTING.md §"Pull Request Requirements" item 12, exactly one Type/ label must be applied. For a bug fix this should be Type/Bug.

How to fix: Add Type/Bug label to this PR.

5. Forgejo Dependency Direction Missing (BLOCKING)

The PR must "block" issue #9963 in the Forgejo dependency graph (PR → blocks → issue). Currently no dependency link exists between PR #11119 and issue #9963. Without this, the issue cannot auto-close when the PR merges, and the PR/issue lifecycle tracking is broken.

How to fix: On this PR, add issue #9963 under "blocks".

6. CHANGELOG Not Updated (BLOCKING)

No entry for this fix appears in CHANGELOG.md. Per CONTRIBUTING.md §"Pull Request Requirements" item 7, the changelog must be updated with one entry per commit.

How to fix: Add an entry to the [Unreleased] section of CHANGELOG.md describing the fix (e.g., under ### Fixed: "Replaced four time.sleep(0.01) timestamp guards in memory_service_coverage_steps.py with a deterministic _wait_for_clock_advance() helper to eliminate intermittent timestamp failures on fast CI runners (#9963).").


Non-Blocking Observations

Branch Name Convention Deviation

The branch bugfix/m3.6.0-ci-pipeline-flakiness-stabilization uses m3.6.0 as the milestone segment. Convention is bugfix/mN-<name> where N is just the milestone number digit (e.g., bugfix/m3-..., bugfix/m4-...). While this does not block merging, future branches should follow the simpler mN format.

Issue #9963 Priority Label

Issue #9963 is labelled Priority/High. Per CONTRIBUTING.md, all Type/Bug issues must be Priority/Critical. This is a process compliance issue on the issue itself (not this PR), but the reviewer should flag it: please update the priority label to Priority/Critical.


Code Quality Assessment (Passes)

Correctness: _wait_for_clock_advance() correctly solves the root cause. The busy-wait loop uses the monotonic clock for deadline enforcement (immune to wall-clock adjustments) while checking the UTC clock for the actual advancement condition. The 2-second deadline prevents infinite hangs. All 4 call sites correctly capture datetime.now(UTC) before the operation that needs the timestamp gap.

Test Quality: The new memory_service_clock_wait.feature tests both the deadline-exceeded error path and the normal-advance success path. Step definitions are well-documented with clear docstrings. The import of _wait_for_clock_advance from the coverage steps module is correct.

Type Safety: All new functions are fully type-annotated. No # type: ignore present.

Readability: Naming is clear, the helper docstring accurately describes behaviour, comments in the modified step functions are updated to match.

Security: No security concerns. No hardcoded secrets.

Code Style: SOLID principles respected. Files well within 500-line limit. Follows ruff conventions.

Documentation: Docstrings present on all new functions.


Please address the 6 blocking issues above and re-request review.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## First Review — PR #11119: chore: fix CI pipeline flakiness by stabilizing test fixtures and assertions ### Overall Assessment The core implementation is well-engineered: `_wait_for_clock_advance()` is deterministic, correctly bounded, type-annotated, well-documented, and the new Behave feature verifying the helper behavior is a solid addition. The approach correctly follows the established pattern from issue #1542. However, several **blocking process compliance issues** prevent approval. This PR addresses a `Type/Bug` issue (#9963), which triggers the mandatory TDD workflow — and that workflow was not followed. There are also commit footer, PR metadata, and Forgejo dependency direction issues that must be corrected. --- ### CI Status ✅ **All 5 required gates pass**: lint, typecheck, security, unit_tests, coverage ⚠️ `CI / benchmark-regression` is **failing** (1m33s). This check is not a required merge gate, and the failure appears unrelated to the changes in this PR (no benchmark code was modified). The author should verify this failure is pre-existing and not introduced by this PR. --- ### Blocking Issues #### 1. TDD Workflow Not Followed (BLOCKING) Issue #9963 is labelled `Type/Bug`. Per CONTRIBUTING.md §"Bug Fix Workflow", ALL bug fixes **must** follow the mandatory TDD workflow: 1. A companion `Type/Testing` issue titled `TDD: <description>` must exist and be closed before this fix PR can land. 2. That TDD issue must have introduced a test tagged `@tdd_issue`, `@tdd_issue_9963`, and `@tdd_expected_fail` in a separate `tdd/mN-ci-pipeline-flakiness-stabilization` branch. 3. The bug fix PR must **remove** `@tdd_expected_fail` (leaving `@tdd_issue` and `@tdd_issue_9963` permanently). 4. The new scenarios in `memory_service_clock_wait.feature` must carry `@tdd_issue @tdd_issue_9963` tags to serve as the permanent regression guard. Neither a TDD companion issue nor a `tdd/` branch with the matching suffix exists. The new scenarios in `memory_service_clock_wait.feature` use `@mock_only` but lack the required `@tdd_issue` and `@tdd_issue_9963` tags. **How to fix:** a) Create a companion `Type/Testing` issue titled `TDD: [AUTO-INF-4] Replace time.sleep() timestamp guards in memory_service_coverage_steps.py with monotonic busy-wait` with `Priority/Critical` and `MoSCoW/Must Have`. b) Set issue #9963 to depend on (be blocked by) the TDD issue. c) Add `@tdd_issue @tdd_issue_9963` tags to the scenarios in `memory_service_clock_wait.feature` (the deadline-exceeded scenario is the regression guard for the original flaky behavior; the normal-advance scenario validates the fix). d) Ensure the TDD issue is closed (its work is represented by this PR). #### 2. Commit Footer Uses Wrong Format (BLOCKING) The commit footer reads `Closes #9963` but the project requires `ISSUES CLOSED: #9963` format. All recent commits on master use `ISSUES CLOSED: #N` (e.g., `ISSUES CLOSED: #4300`, `ISSUES CLOSED: #9163`). The `Closes` keyword is a GitHub/Forgejo PR-level autoclose feature — it does not satisfy the commit traceability requirement. **How to fix:** Amend the commit footer to read `ISSUES CLOSED: #9963`. #### 3. No Milestone Assigned to PR (BLOCKING) The PR has no milestone assigned. Per CONTRIBUTING.md, every PR must be assigned to the same milestone as its linked issue(s). Issue #9963 has no milestone either — this should be corrected on both the issue and the PR before merging. **How to fix:** Assign the appropriate milestone (e.g., v3.6.0 based on the branch name) to both issue #9963 and this PR. #### 4. No Type/ Label on PR (BLOCKING) The PR has no `Type/` label. Per CONTRIBUTING.md §"Pull Request Requirements" item 12, exactly one `Type/` label must be applied. For a bug fix this should be `Type/Bug`. **How to fix:** Add `Type/Bug` label to this PR. #### 5. Forgejo Dependency Direction Missing (BLOCKING) The PR must "block" issue #9963 in the Forgejo dependency graph (PR → blocks → issue). Currently no dependency link exists between PR #11119 and issue #9963. Without this, the issue cannot auto-close when the PR merges, and the PR/issue lifecycle tracking is broken. **How to fix:** On this PR, add issue #9963 under "blocks". #### 6. CHANGELOG Not Updated (BLOCKING) No entry for this fix appears in `CHANGELOG.md`. Per CONTRIBUTING.md §"Pull Request Requirements" item 7, the changelog must be updated with one entry per commit. **How to fix:** Add an entry to the `[Unreleased]` section of `CHANGELOG.md` describing the fix (e.g., under `### Fixed`: "Replaced four `time.sleep(0.01)` timestamp guards in `memory_service_coverage_steps.py` with a deterministic `_wait_for_clock_advance()` helper to eliminate intermittent timestamp failures on fast CI runners (#9963)."). --- ### Non-Blocking Observations #### Branch Name Convention Deviation The branch `bugfix/m3.6.0-ci-pipeline-flakiness-stabilization` uses `m3.6.0` as the milestone segment. Convention is `bugfix/mN-<name>` where N is just the milestone number digit (e.g., `bugfix/m3-...`, `bugfix/m4-...`). While this does not block merging, future branches should follow the simpler `mN` format. #### Issue #9963 Priority Label Issue #9963 is labelled `Priority/High`. Per CONTRIBUTING.md, all `Type/Bug` issues **must** be `Priority/Critical`. This is a process compliance issue on the issue itself (not this PR), but the reviewer should flag it: please update the priority label to `Priority/Critical`. --- ### Code Quality Assessment (Passes) ✅ **Correctness**: `_wait_for_clock_advance()` correctly solves the root cause. The busy-wait loop uses the monotonic clock for deadline enforcement (immune to wall-clock adjustments) while checking the UTC clock for the actual advancement condition. The 2-second deadline prevents infinite hangs. All 4 call sites correctly capture `datetime.now(UTC)` before the operation that needs the timestamp gap. ✅ **Test Quality**: The new `memory_service_clock_wait.feature` tests both the deadline-exceeded error path and the normal-advance success path. Step definitions are well-documented with clear docstrings. The import of `_wait_for_clock_advance` from the coverage steps module is correct. ✅ **Type Safety**: All new functions are fully type-annotated. No `# type: ignore` present. ✅ **Readability**: Naming is clear, the helper docstring accurately describes behaviour, comments in the modified step functions are updated to match. ✅ **Security**: No security concerns. No hardcoded secrets. ✅ **Code Style**: SOLID principles respected. Files well within 500-line limit. Follows ruff conventions. ✅ **Documentation**: Docstrings present on all new functions. --- Please address the 6 blocking issues above and re-request review. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +13,4 @@
# deadline is exceeded.
@mock_only
Scenario: Clock-advance guard raises AssertionError when deadline is exceeded
Owner

BLOCKING — TDD regression tags missing

This scenario verifies the deadline-exceeded error path introduced to fix bug #9963. It must carry the permanent TDD regression tags so it serves as the regression guard for this bug:

  @tdd_issue @tdd_issue_9963
  Scenario: Clock-advance guard raises AssertionError when deadline is exceeded

Per CONTRIBUTING.md §"TDD Issue Test Tags": once @tdd_expected_fail is removed (i.e., the bug is fixed), @tdd_issue and @tdd_issue_<N> must remain permanently as a regression guard. Without these tags the test is not linked to the bug and the three-tag workflow is incomplete.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — TDD regression tags missing** This scenario verifies the deadline-exceeded error path introduced to fix bug #9963. It must carry the permanent TDD regression tags so it serves as the regression guard for this bug: ```gherkin @tdd_issue @tdd_issue_9963 Scenario: Clock-advance guard raises AssertionError when deadline is exceeded ``` Per CONTRIBUTING.md §"TDD Issue Test Tags": once `@tdd_expected_fail` is removed (i.e., the bug is fixed), `@tdd_issue` and `@tdd_issue_<N>` must remain permanently as a regression guard. Without these tags the test is not linked to the bug and the three-tag workflow is incomplete. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +19,4 @@
Then an AssertionError should be raised mentioning the deadline
@mock_only
Scenario: Clock-advance guard returns successfully once the clock advances
Owner

BLOCKING — TDD regression tags missing

This scenario should also carry @tdd_issue @tdd_issue_9963 as the permanent regression guard for the normal-advance path (proving the fix works). Both scenarios together replace the behaviour that was previously guarded by the four bare time.sleep() calls.

  @tdd_issue @tdd_issue_9963
  Scenario: Clock-advance guard returns successfully once the clock advances

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — TDD regression tags missing** This scenario should also carry `@tdd_issue @tdd_issue_9963` as the permanent regression guard for the normal-advance path (proving the fix works). Both scenarios together replace the behaviour that was previously guarded by the four bare `time.sleep()` calls. ```gherkin @tdd_issue @tdd_issue_9963 Scenario: Clock-advance guard returns successfully once the clock advances ``` --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

--- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / helm (pull_request) Successful in 52s
CI / push-validation (pull_request) Successful in 52s
CI / build (pull_request) Successful in 1m14s
Required
Details
CI / lint (pull_request) Successful in 1m22s
Required
Details
CI / benchmark-regression (pull_request) Failing after 1m33s
CI / quality (pull_request) Successful in 1m44s
Required
Details
CI / typecheck (pull_request) Successful in 1m48s
Required
Details
CI / security (pull_request) Successful in 2m1s
Required
Details
CI / integration_tests (pull_request) Successful in 4m11s
Required
Details
CI / e2e_tests (pull_request) Successful in 4m11s
CI / unit_tests (pull_request) Successful in 5m15s
Required
Details
CI / docker (pull_request) Successful in 1m28s
Required
Details
CI / coverage (pull_request) Successful in 11m17s
Required
Details
CI / status-check (pull_request) Successful in 4s
This pull request doesn't have enough approvals yet. 0 of 1 approvals granted.
This branch is out-of-date with the base branch
You are not authorized to merge this pull request.
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin bugfix/m3.6.0-ci-pipeline-flakiness-stabilization:bugfix/m3.6.0-ci-pipeline-flakiness-stabilization
git switch bugfix/m3.6.0-ci-pipeline-flakiness-stabilization
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!11119
No description provided.