TEST-INFRA: [flaky-tests] Improve tracking and identification of flaky tests #1706

Open
opened 2026-04-02 23:31:34 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: task/flaky-test-tracking-and-identification
  • Commit Message: chore(ci): add flaky test tracking, retry mechanism, and documentation
  • Milestone: v3.2.0
  • Parent Epic: #1678

Background and Context

The CI/CD pipeline currently has no reliable way to identify or track flaky tests. Without one, it is hard to improve the reliability of the test suite, and intermittent failures erode confidence in the test results.

This issue proposes several improvements to the CI/CD pipeline to better track and identify flaky tests.

Proposed Solutions

  1. Add a mechanism to track test results over time: This could be a database or a simple file that stores the result of every test in each run. The history would let us identify tests that alternate between passing and failing.
  2. Implement a test retry mechanism: This would automatically re-run failed tests. A test that fails and then passes on a retry is a strong candidate for being flaky.
  3. Add a "flaky-test" label to the issue tracker: This label should be used exclusively for tracking flaky tests.
  4. Document the process for reporting and fixing flaky tests: This will ensure that everyone on the team knows how to deal with flaky tests.
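
As a rough illustration of item 1, the sketch below stores one pass/fail row per test per run in SQLite and lists tests with mixed outcomes. The table layout and the names `open_store`, `record`, and `mixed_outcome_tests` are hypothetical and are not part of the project; the issue leaves the storage format open.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the result store. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "run_id TEXT NOT NULL, "
        "test_id TEXT NOT NULL, "
        "passed INTEGER NOT NULL, "
        "PRIMARY KEY (run_id, test_id))"
    )
    return conn


def record(conn, run_id, test_id, passed):
    """Store the outcome of one test in one run (last write wins)."""
    conn.execute(
        "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
        (run_id, test_id, int(passed)),
    )
    conn.commit()


def mixed_outcome_tests(conn, min_runs=2):
    """Return test ids that have both passed and failed across recorded runs."""
    rows = conn.execute(
        "SELECT test_id FROM results GROUP BY test_id "
        "HAVING COUNT(*) >= ? AND MIN(passed) = 0 AND MAX(passed) = 1 "
        "ORDER BY test_id",
        (min_runs,),
    ).fetchall()
    return [row[0] for row in rows]


conn = open_store()
record(conn, "run-1", "test_login", True)
record(conn, "run-2", "test_login", False)
record(conn, "run-1", "test_export", True)
record(conn, "run-2", "test_export", True)
print(mixed_outcome_tests(conn))
```

Mixed outcomes over time can also come from a genuine regression followed by a fix, so a list like this is only a starting point for triage.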

Subtasks

  • Design and implement a persistent test result tracking mechanism (e.g., structured log file or lightweight DB) that records pass/fail per test per run
  • Integrate test retry logic into the CI pipeline so that failed tests are automatically re-run at least once before being marked as failures
  • Identify and flag tests that exhibit non-deterministic pass/fail behaviour across retries as flaky
  • Create a flaky-test label in the Forgejo issue tracker for exclusive use in tracking flaky test issues
  • Write documentation describing the process for reporting, labelling, and resolving flaky tests
  • Tests (Behave): Add BDD scenarios covering the retry mechanism and flaky-test detection logic
  • Tests (Robot): Add integration tests verifying the CI pipeline correctly retries and reports flaky tests
  • Verify coverage ≥ 97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors
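
The retry and flagging subtasks above could be sketched as follows. This is a minimal illustration that assumes a test is a plain callable that raises on failure; `run_with_retries` and its return shape are hypothetical and not an existing API of the CI pipeline or of any test runner.

```python
from typing import Callable, List


def run_with_retries(test: Callable[[], None], max_retries: int = 1) -> dict:
    """Run a test, retrying on failure, and classify the overall outcome.

    "passed": succeeded on the first attempt.
    "flaky":  failed at least once, then succeeded on a retry.
    "failed": failed on every attempt.
    """
    outcomes: List[bool] = []
    for _ in range(1 + max_retries):
        try:
            test()
            outcomes.append(True)
        except Exception:
            outcomes.append(False)
        if outcomes[-1]:
            break  # stop retrying once the test passes

    passed = outcomes[-1]
    if passed and len(outcomes) > 1:
        status = "flaky"
    elif passed:
        status = "passed"
    else:
        status = "failed"
    return {"status": status, "attempts": len(outcomes), "outcomes": outcomes}


# Example: a test that fails on its first call only.
calls = {"count": 0}


def sometimes_fails():
    calls["count"] += 1
    if calls["count"] == 1:
        raise AssertionError("first attempt fails")


print(run_with_retries(sometimes_fails)["status"])
```

Reporting "flaky" as a separate status, rather than folding it into "passed", keeps retried tests visible so they can be labelled and tracked.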

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A persistent test result tracking mechanism is implemented and active in CI.
  • The test retry mechanism is operational and correctly identifies flaky tests.
  • The flaky-test label exists in the issue tracker.
  • Documentation for the flaky-test reporting and resolution process is published.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage ≥ 97%.
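
The commit message rule in the list above (exact first line, blank line, then details) can be checked mechanically. The helper below is a hypothetical sketch, not an existing project script; only the expected subject line is taken from the Metadata section.

```python
EXPECTED_SUBJECT = (
    "chore(ci): add flaky test tracking, retry mechanism, and documentation"
)


def check_commit_message(message: str, expected_subject: str = EXPECTED_SUBJECT) -> list:
    """Return a list of problems; an empty list means the message conforms."""
    problems = []
    lines = message.splitlines()
    if not lines or lines[0] != expected_subject:
        problems.append("first line does not match the expected commit message exactly")
    if len(lines) < 2 or lines[1] != "":
        problems.append("second line must be blank")
    if not any(line.strip() for line in lines[2:]):
        problems.append("body with implementation details is missing")
    return problems


good = EXPECTED_SUBJECT + "\n\nRecord per-test results and retry failed tests once.\n"
print(check_commit_message(good))
```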

Automated by CleverAgents Bot
Supervisor: Test Infrastructure | Agent: ca-new-issue-creator

freemo added this to the v3.2.0 milestone 2026-04-02 23:31:53 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • MoSCoW: MoSCoW/Could Have (CI/test infrastructure improvement)

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Reference
cleveragents/cleveragents-core#1706