TEST-INFRA: [flaky-tests] Inability to analyze flaky tests due to tooling limitations #2211

Open
opened 2026-04-03 09:34:54 +00:00 by freemo · 0 comments

Metadata

  • Branch: task/test-infra-flaky-test-tooling-gaps
  • Commit Message: chore(ci): document and track tooling gaps blocking flaky test analysis
  • Milestone: v3.8.0
  • Parent Epic: #1678

Background and Context

Effective flaky test analysis requires access to detailed CI check run data — including per-run pass/fail status, retry counts, and failure logs — for a given pull request or commit. The current tooling available to agents and contributors does not expose this data, making it impossible to systematically identify flaky tests in the CI/CD pipeline.

A test is considered flaky if it passes and fails intermittently without any code changes. Identifying such tests requires analyzing the history of test runs and detecting patterns of intermittent failures and retries. Without the right tooling, this analysis cannot be performed.
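Here is a minimal sketch of that definition. The Python below flags a check as flaky when it both passed and failed on the same commit SHA, because the code under test was identical in both runs. The CheckResult record and its field names are hypothetical illustrations, not a Forgejo or CI schema. In practice the input would come from whatever tooling this issue ends up providing.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class CheckResult(NamedTuple):
    """One CI check outcome; field names are hypothetical, not a Forgejo schema."""
    sha: str    # commit the check ran against
    check: str  # check/context name, e.g. "ci/unit"
    state: str  # "success" or "failure"; other states are ignored here


def find_flaky_checks(results: Iterable[CheckResult]) -> list[str]:
    """Return the names of checks that both passed and failed on the same commit.

    The code under test is identical for a given SHA, so mixed outcomes on
    one SHA point to flakiness rather than to a real regression.
    """
    outcomes: dict[tuple[str, str], set[str]] = defaultdict(set)
    for r in results:
        if r.state in ("success", "failure"):
            outcomes[(r.sha, r.check)].add(r.state)
    flaky = {
        check
        for (_sha, check), states in outcomes.items()
        if states == {"success", "failure"}
    }
    return sorted(flaky)
```

Some checks pass on one commit and fail only on a later one. These are deliberately not flagged, because a change in outcome across different SHAs may be a genuine regression.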

This issue is related to #2207, which documents the environmental blockers that prevented a specific CI history analysis run: SSL errors, failures of the read tool, and the unavailability of git clone. This issue focuses specifically on the tooling gap: the absence of any mechanism to retrieve CI check run data, regardless of environment.

Current Behavior

The current set of available tools does not provide any way to retrieve CI check runs for a given pull request or commit. The only available information is the final status of a pull request (merged, closed, open), which is insufficient to identify flaky tests. Specifically:

  • No MCP tool or Forgejo API wrapper exposes per-commit or per-PR check run results.
  • Retry information (whether a check was retried and how many times) is not accessible.
  • Logs from failed CI checks are not retrievable through any available tool.

Expected Behavior

An agent or contributor should be able to:

  1. Query CI check run results for a given pull request or commit SHA.
  2. Inspect each check's status (success, failure, pending), duration, and a link to its logs.
  3. Determine whether a check was retried and how many times before it passed or was abandoned.
  4. Access the logs of failed checks to understand root causes.
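The four capabilities above imply a per-check record that keeps every attempt, not only the final outcome. The sketch below models that in Python using assumed, hypothetical names (Attempt, CheckRun, duration_s, log_url). It shows one possible shape for the tool output and is not an existing API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attempt:
    """A single execution of a check (hypothetical schema)."""
    status: str                  # "success", "failure", or "pending"
    duration_s: Optional[float]  # None while the attempt is still pending
    log_url: Optional[str]       # link to the CI logs for this attempt


@dataclass
class CheckRun:
    """All attempts of one named check for a given PR or commit SHA."""
    name: str
    attempts: list[Attempt] = field(default_factory=list)

    @property
    def retry_count(self) -> int:
        # The first attempt is not a retry.
        return max(len(self.attempts) - 1, 0)

    @property
    def final_status(self) -> str:
        # A check with no recorded attempts is treated as pending.
        return self.attempts[-1].status if self.attempts else "pending"

    @property
    def failed_log_urls(self) -> list[str]:
        # Logs of failed attempts are the starting point for root-cause analysis.
        return [a.log_url for a in self.attempts
                if a.status == "failure" and a.log_url]

    def passed_after_retry(self) -> bool:
        # The typical flaky signature: it eventually passed, but not first time.
        return self.final_status == "success" and self.retry_count > 0
```

Keeping the full attempt list, rather than separate counters, lets the same record answer items 2 to 4: status and duration per attempt, the retry count, and the logs of every failed attempt.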

Acceptance Criteria

  • A tool or documented API endpoint exists to retrieve CI check runs for a given PR or commit
  • The tool/endpoint exposes: check name, status, duration, and log URL for each run
  • Retry information (retry count, per-retry status) is accessible
  • Log content for failed checks is retrievable (directly or via URL)
  • The tooling is usable by automated agents operating in the standard execution environment

Supporting Information

  • Related issue: #2207 (environmental blockers for CI history analysis)
  • Parent Epic: #1678 (CI Execution Time Optimization — flaky tests inflate CI wall-clock time)
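  • Forgejo exposes per-commit status data (state, context name, target URL, timestamps) via GET /api/v1/repos/{owner}/{repo}/statuses/{sha} and GET /api/v1/repos/{owner}/{repo}/commits/{ref}/statuses. These endpoints may serve as the basis for a new MCP tool. However, commit statuses carry no explicit duration or retry fields, so those would have to be derived.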
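Below is a minimal Python sketch, using only the standard library, of how a tool might call the first endpoint and derive retry information. It rests on three assumptions, each of which should be confirmed during the audit subtasks:

  • The list endpoint returns every status entry posted for the SHA, not just the latest per context.
  • A token can be sent in an "Authorization: token ..." header.
  • Repeated terminal entries for the same context correspond to retries.

The function names are hypothetical, and pagination is not handled.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

# States treated as the end of an attempt; "pending" marks one in progress.
TERMINAL_STATES = {"success", "failure", "error"}


def statuses_url(base_url: str, owner: str, repo: str, sha: str) -> str:
    """Build the commit-status list URL cited in this issue."""
    q = urllib.parse.quote
    return (f"{base_url.rstrip('/')}/api/v1/repos/"
            f"{q(owner, safe='')}/{q(repo, safe='')}/statuses/{q(sha, safe='')}")


def fetch_statuses(base_url, owner, repo, sha, token=None):
    """Fetch one page of raw status entries (pagination is not handled)."""
    req = urllib.request.Request(statuses_url(base_url, owner, repo, sha))
    req.add_header("Accept", "application/json")
    if token:
        req.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def summarize(statuses):
    """Group status entries by context and infer retries.

    Heuristic (an assumption, not documented Forgejo behavior): each
    terminal entry for a context is one attempt, so retries = attempts - 1.
    Entries are ordered by their created_at strings, which assumes a
    uniform timestamp format.
    """
    by_context = defaultdict(list)
    for s in sorted(statuses, key=lambda s: s.get("created_at", "")):
        if s.get("state") in TERMINAL_STATES:
            by_context[s.get("context", "")].append(s)
    return {
        ctx: {
            "final_state": entries[-1]["state"],
            "retries": len(entries) - 1,
            # target_url of failed/errored attempts usually points at the logs.
            "log_urls": [e.get("target_url") for e in entries
                         if e["state"] != "success" and e.get("target_url")],
        }
        for ctx, entries in by_context.items()
    }
```

Typical use would be summarize(fetch_statuses(base, owner, repo, sha, token)). Keeping the HTTP call separate from summarize means the retry heuristic can be unit-tested with canned JSON, without a live Forgejo instance.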

Subtasks

  • Audit existing Forgejo MCP tools to confirm no check-run retrieval capability exists
  • Identify the correct Forgejo API endpoints for retrieving CI check runs and commit statuses
  • Design and implement a new MCP tool (or document the curl-based API call) to retrieve check runs per PR/commit
  • Expose retry count and per-retry status in the tool output
  • Provide a mechanism to retrieve or link to logs for failed checks
  • Write BDD scenarios (Behave) covering the new tool's behavior
  • Write integration tests (Robot) for the new tool against a live Forgejo instance
  • Verify coverage >= 97% via nox -s coverage_report
  • Run nox (all default sessions) and fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox sessions pass
  • Coverage >= 97%
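The commit-message rule above can be checked mechanically. Below is a small Python sketch, with a hypothetical function name, that reports what is wrong with a message. The message text could be obtained with git log -1 --format=%B.

```python
EXPECTED_SUBJECT = (
    "chore(ci): document and track tooling gaps blocking flaky test analysis"
)


def check_commit_message(message: str,
                         expected_subject: str = EXPECTED_SUBJECT) -> list[str]:
    """Return a list of problems; an empty list means the message conforms."""
    lines = message.splitlines()
    problems = []
    # Rule 1: the first line must match the Metadata commit message exactly.
    if not lines or lines[0] != expected_subject:
        problems.append("first line must match the Commit Message in Metadata exactly")
    # Rule 2: a blank line separates the subject from the body.
    if len(lines) < 2 or lines[1] != "":
        problems.append("second line must be blank")
    # Rule 3: at least one non-empty line of implementation details follows.
    if not any(line.strip() for line in lines[2:]):
        problems.append("body must contain implementation details after the blank line")
    return problems
```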

Automated by CleverAgents Bot
Supervisor: Test Infrastructure | Agent: ca-new-issue-creator

freemo added this to the v3.8.0 milestone 2026-04-03 09:35:05 +00:00
Reference
cleveragents/cleveragents-core#2211