TEST-INFRA: [ci-pipeline-design] Enable fail-fast for the CI pipeline #1770

Open
opened 2026-04-02 23:46:19 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: chore/ci-enable-fail-fast
  • Commit Message: chore(ci): enable fail-fast on matrix strategy to cancel in-progress jobs on failure
  • Milestone: v3.8.0
  • Parent Epic: #1678

Background and Context

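The ci.yml workflow does not have fail-fast in effect on its matrix strategy: if one job in the matrix fails, all other in-progress jobs continue to run to completion, consuming runner resources unnecessarily and delaying developer feedback. GitHub Actions documents fail-fast as defaulting to true when the key is omitted; declaring it explicitly, as the acceptance criteria require, removes any dependence on runner defaults.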

Enabling fail-fast: true on the matrix configuration will cause GitHub Actions / Forgejo Actions to cancel all remaining in-progress and queued jobs in the matrix as soon as any single job fails. This aligns with the project's fail-fast design philosophy (see the Fail-Fast Principles section of CONTRIBUTING.md) and reduces wasted CI runner time.
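A minimal sketch of what the strategy block could look like, assuming a job named test that runs nox across a Python-version matrix. The job name, runner label (ubuntu-latest), Python versions, action versions, and the tests session name are all hypothetical; the real ci.yml may differ.

```yaml
# Hypothetical excerpt of ci.yml: job name, runner label, Python versions,
# action versions and the nox session name are assumptions.
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      # Cancel all in-progress and queued jobs in this matrix as soon as
      # any one of them fails. Declared explicitly instead of relying on
      # the platform default.
      fail-fast: true
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install nox
      - run: nox -s tests
```

Only the fail-fast: true line is the change this issue asks for; everything else is surrounding context.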

Current Behavior

When a matrix job fails (e.g., a test suite fails on one Python version), all other matrix jobs continue running until they complete, even though the overall CI run is already destined to fail.

Expected Behavior

When any matrix job fails, all other in-progress matrix jobs are immediately cancelled, providing faster feedback to developers and conserving runner resources.

Acceptance Criteria

  • The ci.yml workflow matrix strategy has fail-fast: true set explicitly.
  • When a matrix job fails, remaining in-progress matrix jobs are cancelled by the CI system.
  • The change is documented in docs/development/ci-cd.md.
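One way to check the second criterion by hand is a throwaway workflow in which one matrix leg fails quickly while another sleeps. This is a hypothetical sketch, not part of the real ci.yml; the workflow name, the workflow_dispatch trigger (which assumes the Forgejo instance supports manual dispatch), and the runner label are assumptions.

```yaml
# Hypothetical throwaway workflow for manually checking fail-fast
# cancellation; not part of the real ci.yml.
name: fail-fast-check
on: workflow_dispatch
jobs:
  probe:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: true
      matrix:
        case: [fail, slow]
    steps:
      - name: Fail quickly or run long
        run: |
          if [ "${{ matrix.case }}" = "fail" ]; then
            sleep 10   # give the slow leg time to start
            exit 1     # this failure should trigger cancellation
          fi
          sleep 300    # should be cancelled long before this finishes
```

If fail-fast is honoured, the slow leg should be reported as cancelled shortly after the fail leg exits, rather than running for the full 300 seconds.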

Supporting Information

  • Related to the broader CI Pipeline Design improvement effort tracked under Epic #1678.
  • GitHub Actions / Forgejo Actions matrix fail-fast documentation: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/running-variations-of-jobs-in-a-workflow#handling-failures-with-fail-fast

Subtasks

  • Add fail-fast: true to the strategy block of the matrix configuration in ci.yml.
  • Verify the change does not break any existing CI job dependencies or needs: chains.
  • Update docs/development/ci-cd.md to document the fail-fast behaviour.
  • Run nox (all default sessions) and fix any errors.
  • Verify coverage >= 97% via nox -s coverage_report.
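For the needs: check, the relevant case is a downstream job that depends on the matrix job. A plain needs: already waits for every matrix leg and, under the default success() condition, is skipped if any leg does not succeed, so fail-fast mainly changes how soon that happens. Jobs guarded with if: always() or if: failure() are the ones worth reviewing, since they still run and will now start earlier. The job names and steps below are hypothetical.

```yaml
# Hypothetical sketch: job names and steps are illustrative only.
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: true
      matrix:
        python-version: ["3.11", "3.12"]
    steps:
      - run: echo "testing on ${{ matrix.python-version }}"

  coverage:
    # Waits for all matrix legs of `test`; with the default success()
    # condition it is skipped if any leg fails or is cancelled.
    needs: test
    runs-on: ubuntu-latest
    steps:
      - run: echo "coverage report"

  notify:
    # Runs regardless of the outcome of `test`, so with fail-fast it starts
    # as soon as the matrix is cancelled instead of after every leg completes.
    needs: test
    if: always()
    runs-on: ubuntu-latest
    steps:
      - run: echo "test result was ${{ needs.test.result }}"
```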

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly (chore(ci): enable fail-fast on matrix strategy to cancel in-progress jobs on failure), followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly (chore/ci-enable-fail-fast).
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox sessions pass.
  • Coverage >= 97%.

Automated by CleverAgents Bot
Supervisor: Test Infrastructure | Agent: ca-new-issue-creator

freemo added this to the v3.8.0 milestone 2026-04-02 23:49:20 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • MoSCoW: MoSCoW/Could Have — CI/test infrastructure improvement.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Reference
cleveragents/cleveragents-core#1770