chore: fix CI pipeline flakiness by stabilizing test fixtures and assertions #11119
No reviewers
No milestone
No project
No assignees
3 participants
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!11119
Summary
Replace four time.sleep(0.01) timestamp guards in memory_service_coverage_steps.py with a deterministic _wait_for_clock_advance() helper that busy-waits on the monotonic clock until the UTC clock actually advances past the recorded before timestamp, bounded by a 2-second deadline.
This eliminates four sources of intermittent CI failures in the entity-tracking scenarios where two consecutive datetime.now(UTC) calls could return the same microsecond value on fast CI runners.
Changes
- features/steps/memory_service_coverage_steps.py: Added _wait_for_clock_advance() helper; replaced 4× time.sleep(0.01) in step_track_same_entity(), step_memory_service_entities_over_time() (×2), and step_track_same_entity_with_additional_metadata()
- features/memory_service_clock_wait.feature: New Behave feature verifying the helper raises AssertionError on deadline exceeded and returns normally once the clock advances
- features/steps/memory_service_clock_wait_steps.py: Step definitions for the above feature
Quality Gates
- nox -e lint
- nox -e typecheck
- nox -e unit_tests (15,585 scenarios, 0 failed)
- nox -e integration_tests (1,865 tests, 0 failed/errors)
- nox -e e2e_tests (pre-existing LLM-secret failures unrelated to these changes)
References
Closes #9963
Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor
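For readers without the diff at hand, the helper described in the summary could look roughly like the sketch below. This is not the code from the PR: the public name wait_for_clock_advance, the deadline_seconds parameter, and the injectable now and monotonic arguments are assumptions added here so both paths can be exercised without waiting on a real clock.

```python
import time
from collections.abc import Callable
from datetime import datetime, timezone

UTC = timezone.utc  # equivalent to datetime.UTC on Python 3.11+


def _utc_now() -> datetime:
    """Read the wall clock as a timezone-aware UTC datetime."""
    return datetime.now(UTC)


def wait_for_clock_advance(
    before: datetime,
    deadline_seconds: float = 2.0,  # assumed parameter; the PR states a 2-second bound
    now: Callable[[], datetime] = _utc_now,  # hypothetical injection point for tests
    monotonic: Callable[[], float] = time.monotonic,  # hypothetical injection point
) -> None:
    """Busy-wait until the UTC clock reads strictly later than ``before``.

    The deadline is measured on the monotonic clock so that wall-clock
    adjustments cannot extend or shorten the wait.
    """
    deadline = monotonic() + deadline_seconds
    while now() <= before:
        if monotonic() >= deadline:
            raise AssertionError(
                f"UTC clock did not advance past {before.isoformat()} "
                f"within the {deadline_seconds}s deadline"
            )


# Call-site pattern: capture the timestamp, wait, then do the second operation.
before = _utc_now()
wait_for_clock_advance(before)
after = _utc_now()
assert after > before
```

Unlike a fixed time.sleep(0.01), the loop exits as soon as the clock has visibly advanced and fails loudly if it never does, which is the property the entity-tracking scenarios rely on.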
First Review — PR #11119: chore: fix CI pipeline flakiness by stabilizing test fixtures and assertions
Overall Assessment
The core implementation is well-engineered: _wait_for_clock_advance() is deterministic, correctly bounded, type-annotated, well-documented, and the new Behave feature verifying the helper behavior is a solid addition. The approach correctly follows the established pattern from issue #1542.
However, several blocking process compliance issues prevent approval. This PR addresses a Type/Bug issue (#9963), which triggers the mandatory TDD workflow, and that workflow was not followed. There are also commit footer, PR metadata, and Forgejo dependency direction issues that must be corrected.
CI Status
✅ All 5 required gates pass: lint, typecheck, security, unit_tests, coverage
⚠️ CI / benchmark-regression is failing (1m33s). This check is not a required merge gate, and the failure appears unrelated to the changes in this PR (no benchmark code was modified). The author should verify this failure is pre-existing and not introduced by this PR.
Blocking Issues
1. TDD Workflow Not Followed (BLOCKING)
Issue #9963 is labelled Type/Bug. Per CONTRIBUTING.md §"Bug Fix Workflow", ALL bug fixes must follow the mandatory TDD workflow:
- A companion Type/Testing issue titled TDD: <description> must exist and be closed before this fix PR can land.
- The failing test must be tagged @tdd_issue, @tdd_issue_9963, and @tdd_expected_fail in a separate tdd/mN-ci-pipeline-flakiness-stabilization branch.
- The fix PR must remove @tdd_expected_fail (leaving @tdd_issue and @tdd_issue_9963 permanently).
- memory_service_clock_wait.feature must carry @tdd_issue @tdd_issue_9963 tags to serve as the permanent regression guard.
Neither a TDD companion issue nor a tdd/ branch with the matching suffix exists. The new scenarios in memory_service_clock_wait.feature use @mock_only but lack the required @tdd_issue and @tdd_issue_9963 tags.
How to fix:
a) Create a companion Type/Testing issue titled "TDD: [AUTO-INF-4] Replace time.sleep() timestamp guards in memory_service_coverage_steps.py with monotonic busy-wait", labelled Priority/Critical and MoSCoW/Must Have.
b) Set issue #9963 to depend on (be blocked by) the TDD issue.
c) Add @tdd_issue @tdd_issue_9963 tags to the scenarios in memory_service_clock_wait.feature (the deadline-exceeded scenario is the regression guard for the original flaky behavior; the normal-advance scenario validates the fix).
d) Ensure the TDD issue is closed (its work is represented by this PR).
2. Commit Footer Uses Wrong Format (BLOCKING)
The commit footer reads Closes #9963 but the project requires the ISSUES CLOSED: #9963 format. All recent commits on master use ISSUES CLOSED: #N (e.g., ISSUES CLOSED: #4300, ISSUES CLOSED: #9163). The Closes keyword is a GitHub/Forgejo PR-level autoclose feature; it does not satisfy the commit traceability requirement.
How to fix: Amend the commit footer to read ISSUES CLOSED: #9963.
3. No Milestone Assigned to PR (BLOCKING)
The PR has no milestone assigned. Per CONTRIBUTING.md, every PR must be assigned to the same milestone as its linked issue(s). Issue #9963 has no milestone either — this should be corrected on both the issue and the PR before merging.
How to fix: Assign the appropriate milestone (e.g., v3.6.0 based on the branch name) to both issue #9963 and this PR.
4. No Type/ Label on PR (BLOCKING)
The PR has no Type/ label. Per CONTRIBUTING.md §"Pull Request Requirements" item 12, exactly one Type/ label must be applied. For a bug fix this should be Type/Bug.
How to fix: Add the Type/Bug label to this PR.
5. Forgejo Dependency Direction Missing (BLOCKING)
The PR must "block" issue #9963 in the Forgejo dependency graph (PR → blocks → issue). Currently no dependency link exists between PR #11119 and issue #9963. Without this, the issue cannot auto-close when the PR merges, and the PR/issue lifecycle tracking is broken.
How to fix: On this PR, add issue #9963 under "blocks".
6. CHANGELOG Not Updated (BLOCKING)
No entry for this fix appears in CHANGELOG.md. Per CONTRIBUTING.md §"Pull Request Requirements" item 7, the changelog must be updated with one entry per commit.
How to fix: Add an entry to the [Unreleased] section of CHANGELOG.md describing the fix (e.g., under ### Fixed: "Replaced four time.sleep(0.01) timestamp guards in memory_service_coverage_steps.py with a deterministic _wait_for_clock_advance() helper to eliminate intermittent timestamp failures on fast CI runners (#9963).").
Non-Blocking Observations
Branch Name Convention Deviation
The branch bugfix/m3.6.0-ci-pipeline-flakiness-stabilization uses m3.6.0 as the milestone segment. Convention is bugfix/mN-<name> where N is just the milestone number digit (e.g., bugfix/m3-..., bugfix/m4-...). While this does not block merging, future branches should follow the simpler mN format.
Issue #9963 Priority Label
Issue #9963 is labelled Priority/High. Per CONTRIBUTING.md, all Type/Bug issues must be Priority/Critical. This is a process compliance issue on the issue itself (not this PR), but the reviewer should flag it: please update the priority label to Priority/Critical.
Code Quality Assessment (Passes)
✅ Correctness: _wait_for_clock_advance() correctly solves the root cause. The busy-wait loop uses the monotonic clock for deadline enforcement (immune to wall-clock adjustments) while checking the UTC clock for the actual advancement condition. The 2-second deadline prevents infinite hangs. All 4 call sites correctly capture datetime.now(UTC) before the operation that needs the timestamp gap.
✅ Test Quality: The new memory_service_clock_wait.feature tests both the deadline-exceeded error path and the normal-advance success path. Step definitions are well-documented with clear docstrings. The import of _wait_for_clock_advance from the coverage steps module is correct.
✅ Type Safety: All new functions are fully type-annotated. No # type: ignore present.
✅ Readability: Naming is clear, the helper docstring accurately describes behaviour, comments in the modified step functions are updated to match.
✅ Security: No security concerns. No hardcoded secrets.
✅ Code Style: SOLID principles respected. Files well within 500-line limit. Follows ruff conventions.
✅ Documentation: Docstrings present on all new functions.
Please address the 6 blocking issues above and re-request review.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
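Blocking issue 2 above is mechanical enough to check before pushing. The following is a minimal sketch, assuming the footer is the last non-empty line of the commit message and closes a single issue, as in the examples quoted in the review; the function name has_issues_closed_footer is hypothetical and not part of the project's tooling.

```python
import re

# Matches the single-issue footer form quoted in the review, e.g. "ISSUES CLOSED: #9963".
# Multi-issue footers are not covered because the review does not show their format.
FOOTER_PATTERN = re.compile(r"^ISSUES CLOSED: #\d+$")


def has_issues_closed_footer(commit_message: str) -> bool:
    """Return True if the last non-empty line is an ISSUES CLOSED footer."""
    lines = [line.strip() for line in commit_message.strip().splitlines()]
    return bool(lines) and FOOTER_PATTERN.match(lines[-1]) is not None


good = "chore: stabilize clock-dependent steps\n\nISSUES CLOSED: #9963\n"
bad = "chore: stabilize clock-dependent steps\n\nCloses #9963\n"
print(has_issues_closed_footer(good), has_issues_closed_footer(bad))
```

A check like this only validates the footer's shape; it does not confirm that the referenced issue exists or is linked in Forgejo.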
@ -0,0 +13,4 @@
# deadline is exceeded.
@mock_only
Scenario: Clock-advance guard raises AssertionError when deadline is exceeded
BLOCKING: TDD regression tags missing
This scenario verifies the deadline-exceeded error path introduced to fix bug #9963. It must carry the permanent TDD regression tags so it serves as the regression guard for this bug: @tdd_issue @tdd_issue_9963.
Per CONTRIBUTING.md §"TDD Issue Test Tags": once @tdd_expected_fail is removed (i.e., the bug is fixed), @tdd_issue and @tdd_issue_<N> must remain permanently as a regression guard. Without these tags the test is not linked to the bug and the three-tag workflow is incomplete.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +19,4 @@
Then an AssertionError should be raised mentioning the deadline
@mock_only
Scenario: Clock-advance guard returns successfully once the clock advances
BLOCKING: TDD regression tags missing
This scenario should also carry @tdd_issue @tdd_issue_9963 as the permanent regression guard for the normal-advance path (proving the fix works). Both scenarios together replace the behaviour that was previously guarded by the four bare time.sleep() calls.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
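The tag requirement raised in the two inline comments above can be verified with a small script. The sketch below is an illustration only: scenarios_missing_tags is a hypothetical helper, it considers just the tag lines directly above each Scenario: line, and it ignores feature-level tag inheritance and Scenario Outline blocks. The feature title in the sample text is also made up; the scenario names are the ones quoted in the review.

```python
REQUIRED_TAGS = {"@tdd_issue", "@tdd_issue_9963"}


def scenarios_missing_tags(feature_text: str, required: set[str]) -> dict[str, set[str]]:
    """Map each scenario name to the required tags it lacks (omitting complete ones)."""
    missing: dict[str, set[str]] = {}
    pending_tags: set[str] = set()
    for raw_line in feature_text.splitlines():
        line = raw_line.strip()
        if line.startswith("@"):
            # Tag lines may be stacked; collect them until the scenario header.
            pending_tags.update(line.split())
        elif line.startswith("Scenario:"):
            name = line[len("Scenario:"):].strip()
            gap = required - pending_tags
            if gap:
                missing[name] = gap
            pending_tags = set()
        elif line and not line.startswith("#"):
            # Any other content line (steps, Feature:, ...) ends a tag run.
            pending_tags = set()
    return missing


SAMPLE = """\
Feature: Clock-advance guard

  @mock_only
  Scenario: Clock-advance guard raises AssertionError when deadline is exceeded
    Then an AssertionError should be raised mentioning the deadline

  @tdd_issue @tdd_issue_9963 @mock_only
  Scenario: Clock-advance guard returns successfully once the clock advances
"""

for name, tags in scenarios_missing_tags(SAMPLE, REQUIRED_TAGS).items():
    print(f"{name}: missing {sorted(tags)}")
```

In the sample, only the first scenario is reported, because the second already carries both regression tags alongside @mock_only.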
🌱 Grooming: proceed — PR cleared for processing.
(check no_duplicates, category no_duplicates)
PR #11119 addresses a specific memory_service test flakiness issue caused by microsecond-level timing collisions in datetime comparisons. The solution, a deterministic _wait_for_clock_advance() helper replacing time.sleep() calls, is novel. Scanning 331 open PRs found no overlap: CI optimization PRs address execution time (not flakiness), E2E test fixes target specific failures (OOM, container clone), and infrastructure PRs cover tools/coverage. No other PR references #9963 or memory_service clock timing issues. Unique scope and approach confirm no duplicate.
📋 Estimate: tier 1.
Test-additive work across 3 files in features/: replaces 4x time.sleep with a deterministic clock-advance helper, adds a new Behave feature file, and adds new step definitions. Format-sensitive (Behave .feature syntax) and test-additive changes consistently regress at tier 0 per calibration history. Scope is focused and logic is straightforward, but the new feature + steps files warrant tier 1. The sole CI failure (benchmark-regression, no parser available) is a pre-existing infrastructure issue unrelated to these changes.
(attempt #4, tier 1)
🔧 Implementer attempt - blocked.
Blockers:
e302ddd34b but dispatch base was 87d035bab4. The implementer pushed from inside the worktree (forbidden by the git contract) OR a third party pushed during the attempt. Re-dispatch will re-prefetch and pick up the new head.
(attempt #5, tier 2)
🔧 Implementer attempt - rebase-failed.
Blockers:
e302ddd34b 1a230ee4e4
(attempt #7, tier 2)
🔧 Implementer attempt - ci-not-ready.
(attempt #11, tier 1)
🔧 Implementer attempt - blocked.
Blockers:
1069d36f4e but dispatch base was 1a230ee4e4. The implementer pushed from inside the worktree (forbidden by the git contract) OR a third party pushed during the attempt. Re-dispatch will re-prefetch and pick up the new head.
🌱 Grooming: proceed - PR cleared for processing.
(check no_duplicates, category no_duplicates)
PR #11119 addresses a specialized test fixture timing issue in memory_service scenarios by replacing brittle sleep guards with deterministic clock-advance logic. No other open PR tackles the same clock-synchronization problem or touches memory_service test fixtures with similar intent. Related CI PRs focus on performance optimization, not fixture stability.
📋 Estimate: tier 1.
Multi-file test-layer change: replaces 4 sleep() calls with a busy-wait helper, adds a new Behave feature file, and adds new step definitions. All CI failures are infrastructure DNS failures (forgejo-http.cleverlibre.svc.cluster.local unresolvable on runners) unrelated to the code changes. The PR itself has additive test logic — per calibration notes, test-additive work consistently regresses at tier 0. Scope is isolated to the features/steps layer with no production code changes, but multi-file with new logic branches warrants tier 1.
(attempt #14, tier 1)
🔧 Implementer attempt - blocked.
Blockers:
fa42709fac but dispatch base was 1069d36f4e. The implementer pushed from inside the worktree (forbidden by the git contract) OR a third party pushed during the attempt. Re-dispatch will re-prefetch and pick up the new head.
fa42709fac 217b7ffe9f
✅ Approved
Reviewed at commit 217b7ff.
Confidence: high.
Claimed by merge_drive.py (pid 2329255) until 2026-06-14T22:45:03.449287+00:00.
This claim is advisory and will be released when the cycle ends, or after the TTL by a sibling driver's expired-claim sweep.
217b7ffe9f 9be3d6e169
Approved by the controller reviewer stage (workflow 493).