Flaky test: fix(bench): use unique names per iteration to avoid UNIQUE constraint failures #9548

Open
opened 2026-04-14 22:40:01 +00:00 by HAL9000 · 2 comments
Owner

Metadata

  • Commit message: fix(bench): use unique names per iteration to avoid UNIQUE constraint failures
  • Branch: feature/m3-session-persistence

Background and Context

Benchmark coverage for the session persistence work introduces high-concurrency runs that exercise SQLite-backed storage. The CI workflow for this branch alternates between success and failure without code changes, which suggests the benchmark suite is colliding with its own data rather than hitting a deterministic regression.

Description

The ASV benchmark job intermittently fails with the SQLite error "UNIQUE constraint failed: benchmark_runs.name" while generating benchmark results. Reruns of the same fix(bench) commit succeed within hours, with no code changes in between. This indicates that the test data is not fully isolated between runs.

Run History

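  • Run #162 (https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/162): failure on 2026-02-15 16:49 UTC. Benchmark stage aborted with a UNIQUE constraint error.
  • Run #164 (https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/164): success on 2026-02-16 12:20 UTC. Same commit, benchmark suite completed normally.
  • Run #174 (https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/174): failure on 2026-02-16 18:23 UTC. Reproduced the identical constraint violation just hours later.
  • Run #180 (https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/180): success on 2026-02-16 20:47 UTC. Rerun passed without changes.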

Suspected Cause

The benchmark harness writes results into a shared SQLite database under predictable names (e.g., benchmark_run). When the previous workflow leaves partially populated rows or the DB directory persists between jobs, a subsequent invocation tries to insert the same name and trips the unique index. A teardown guard or randomised run identifiers would prevent residue from previous attempts and remove the flake.
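
The collision described above can be reproduced outside CI. The following is a minimal sketch, assuming a `benchmark_runs` table whose `name` column is UNIQUE. Only the table and column names come from the error message; the rest of the schema and the `record_run` helper are hypothetical and are not taken from the harness.

```python
import sqlite3


def record_run(conn: sqlite3.Connection, name: str) -> None:
    """Insert one benchmark run row (hypothetical helper, not the harness API)."""
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute("INSERT INTO benchmark_runs (name) VALUES (?)", (name,))


def reproduce_collision() -> str:
    """Insert the same predictable name twice and return the error text."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: only the table and column names are known from the error.
    conn.execute("CREATE TABLE benchmark_runs (name TEXT NOT NULL UNIQUE)")
    record_run(conn, "benchmark_run")  # an earlier attempt leaves a row behind
    try:
        record_run(conn, "benchmark_run")  # a later attempt reuses the name
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        conn.close()
    return ""


if __name__ == "__main__":
    print(reproduce_collision())
```

The second insert raises `sqlite3.IntegrityError`, which is the same class of failure the CI job reports. Whether the leftover row in CI comes from a persisted database file or from repeated iterations inside one run still needs to be confirmed against the harness.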

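Duplicate Check

  • Open issues: "benchmark UNIQUE constraint" (https://git.cleverthis.com/cleveragents/cleveragents-core/issues?q=is%3Aopen+benchmark+UNIQUE)
  • Cross-area search: "fix(bench) unique names" (https://git.cleverthis.com/cleveragents/cleveragents-core/issues?q=%22fix(bench)%22+%22unique+names%22)
  • Closed issues: "benchmark UNIQUE constraint" (https://git.cleverthis.com/cleveragents/cleveragents-core/issues?q=is%3Aclosed+benchmark+UNIQUE)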

Expected Behavior

  • Benchmark runs should generate isolated output directories and never see leftover rows from earlier attempts.
  • The ASV job should pass deterministically for the same commit.

Acceptance Criteria

  • Benchmark harness guarantees unique result names per run (via random suffixes or pre-run cleanup).
  • CI demonstrates at least 10 consecutive green runs on the affected branch with the fix applied.
  • The benchmark README documents the required cleanup to avoid future regressions.
  • Coverage remains ≥ 97%.
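
One way to meet the unique-name criterion is a random suffix per inserted row. This is a rough sketch only: `unique_run_name` and `insert_runs` are hypothetical names, the schema is assumed from the error message, and the actual naming scheme used by the harness may differ.

```python
import sqlite3
import uuid


def unique_run_name(base: str = "benchmark_run") -> str:
    """Return the base name plus a random UUID4 suffix (hypothetical scheme)."""
    return f"{base}-{uuid.uuid4().hex}"


def insert_runs(count: int) -> int:
    """Insert `count` uniquely named rows and return how many rows were stored."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: a UNIQUE index on benchmark_runs.name.
    conn.execute("CREATE TABLE benchmark_runs (name TEXT NOT NULL UNIQUE)")
    with conn:
        for _ in range(count):
            conn.execute(
                "INSERT INTO benchmark_runs (name) VALUES (?)",
                (unique_run_name(),),
            )
    (stored,) = conn.execute("SELECT COUNT(*) FROM benchmark_runs").fetchone()
    conn.close()
    return stored
```

A random suffix makes a collision extremely unlikely rather than strictly impossible. If a hard guarantee is required, pre-run cleanup (the other option named in the criteria) or a database-generated key would be the stricter choice.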

Subtasks

  • Reproduce the failure locally by running the benchmark twice without cleanup and capturing the constraint error.
  • Introduce isolation (randomised result name or DB teardown) in the ASV configuration.
  • Add a smoke test that ensures the benchmark directory is empty before each run.
  • Monitor CI for 10 sequential successes after the fix.
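
The isolation and smoke-test subtasks could look roughly like the sketch below. It relies on the ASV convention that a benchmark class may define `setup` and `teardown` methods and that timed methods start with `time_`. The class name `SessionPersistenceSuite`, the `ensure_empty` helper, the database file name, and the schema are all assumptions made for illustration.

```python
import os
import shutil
import sqlite3
import tempfile
import uuid


def ensure_empty(path: str) -> None:
    """Smoke check: fail fast if the benchmark directory holds leftovers."""
    leftovers = os.listdir(path)
    if leftovers:
        raise RuntimeError(f"benchmark directory {path!r} is not empty: {leftovers}")


class SessionPersistenceSuite:
    """ASV-style benchmark class (hypothetical name and schema)."""

    def setup(self):
        # A fresh directory per setup, so no rows survive from earlier attempts.
        self.workdir = tempfile.mkdtemp(prefix="bench-")
        ensure_empty(self.workdir)
        self.conn = sqlite3.connect(os.path.join(self.workdir, "bench.db"))
        self.conn.execute("CREATE TABLE benchmark_runs (name TEXT NOT NULL UNIQUE)")

    def teardown(self):
        # Teardown guard: close the database and remove the whole directory.
        self.conn.close()
        shutil.rmtree(self.workdir, ignore_errors=True)

    def time_record_run(self):
        # The timed body may run several times per setup, so the name is
        # unique per iteration rather than per setup.
        name = f"benchmark_run-{uuid.uuid4().hex}"
        with self.conn:
            self.conn.execute("INSERT INTO benchmark_runs (name) VALUES (?)", (name,))
```

The per-iteration name matters because a fresh directory alone does not help if the timed method is called more than once between `setup` and `teardown`.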

Definition of Done

  1. Benchmark runs no longer hit the UNIQUE constraint failure in CI.
  2. The benchmark harness documents its isolation guarantees.
  3. CI history confirms stability over at least 10 consecutive runs.
  4. Test coverage stays at or above 97%.
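
The "10 consecutive runs" check in item 3 can be made mechanical. This is a small sketch that assumes the run results are already available as a chronological list of status strings; how they are fetched from CI is left out, and the function names are hypothetical.

```python
from typing import Iterable


def trailing_green_streak(statuses: Iterable[str]) -> int:
    """Count consecutive "success" results at the end of a chronological list."""
    streak = 0
    for status in statuses:
        # Any non-success result resets the streak.
        streak = streak + 1 if status == "success" else 0
    return streak


def is_stable(statuses: Iterable[str], required: int = 10) -> bool:
    """Return True once the most recent `required` runs are all green."""
    return trailing_green_streak(statuses) >= required


if __name__ == "__main__":
    # Statuses of runs #162, #164, #174, #180 from the Run History section.
    history = ["failure", "success", "failure", "success"]
    print(trailing_green_streak(history), is_stable(history))
```

Counting only the trailing streak means a single new failure resets the count, which matches the "consecutive" wording in the criteria.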

Automated by CleverAgents Bot
Agent: test-infra-worker

Author
Owner

[AUTO-OWNR-1] Triage Decision: Verified — MoSCoW/Should Have

Flaky test fix improves CI reliability and prevents false failures. Should Have.

Priority: Medium


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

[AUTO-OWNR-1] Triage complete.

Verified: valid flaky test bug. UNIQUE constraint failures in benchmark tests cause intermittent CI failures.

  • Type: Bug
  • Priority: Medium
  • MoSCoW: Should Have (stable CI is important for milestone completion)
  • Milestone: v3.2.0 (test stability)

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Automated by CleverAgents Bot
Agent: automation-tracking-manager

Reference
cleveragents/cleveragents-core#9548