TEST-INFRA: [ci-execution-time] Optimize end-to-end (E2E) test execution #5787

Open
opened 2026-04-09 09:33:40 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Branch: feat/test-infra/ci-execution-time-e2e-optimization
  • Commit Message: perf(testing): optimize E2E test execution with tiered strategy and suite-level parallelization
  • Milestone: Backlog (no milestone — see backlog note below)
  • Parent Epic: #5407

Backlog note: This issue was discovered during autonomous operation
on milestone v3.6.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Summary

The end-to-end (E2E) test suite (nox -s e2e_tests) is the most time-consuming part of the CI pipeline. These tests are critical to system quality, but their long execution time slows down the development workflow. This proposal outlines a strategy for reducing E2E execution time.

Problem

The E2E tests are inherently slow because they interact with real LLM APIs and exercise the entire system. The current implementation runs the full E2E suite on every pull request, regardless of what the change touches. In addition, setting up and tearing down the E2E test environment can be complex and time-consuming.

Proposal

  1. Introduce a tiered E2E testing strategy: Split the E2E tests into different tiers based on their importance and execution time:
    • Tier 1 (Smoke Tests): A small set of critical-path tests that run on every pull request. These tests should cover the most important user journeys and provide a quick sanity check of the system.
    • Tier 2 (Full Regression): The full E2E test suite, which runs on a nightly basis or on demand. This ensures full test coverage without slowing down the development workflow.
  2. Optimize the E2E test environment: Investigate ways to optimize the setup and teardown of the E2E test environment. This could involve using a dedicated test environment, pre-building test data, or using a faster environment provisioning mechanism.
  3. Parallelize E2E tests at the suite level: The E2E tests are already parallelized with pabot. Explore parallelizing them at a higher level as well, for example by running different E2E test suites (e.g., e2e-anthropic, e2e-openai) in parallel as separate CI jobs.
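
As a rough illustration of items 1 and 3, the sketch below builds one pabot command per suite and per tier. It assumes tiers are marked with Robot Framework tags and selected with robot's --include option. The tag name "smoke", the tests/e2e directory layout, and both helper functions are hypothetical and not taken from the repository. Only the suite names come from this issue.

```python
from __future__ import annotations

# Hypothetical mapping from tier name to a Robot Framework tag.
# The issue does not define tag names; "smoke" is an assumption.
TIER_TAGS: dict[str, str | None] = {"smoke": "smoke", "full": None}

# Suite names mentioned in the issue; the directory layout is assumed.
SUITES = ("e2e-anthropic", "e2e-openai")


def build_pabot_command(tier: str, suite_path: str, processes: int = 4) -> list[str]:
    """Return the argument list for running one suite at the given tier."""
    if tier not in TIER_TAGS:
        raise ValueError(f"unknown tier: {tier!r}")
    # pabot's own options come first, then options passed through to robot.
    command = ["pabot", "--processes", str(processes)]
    tag = TIER_TAGS[tier]
    if tag is not None:
        # Tier 1 runs only tests carrying the tag; Tier 2 runs everything.
        command += ["--include", tag]
    command.append(suite_path)
    return command


def commands_per_suite(tier: str, root: str = "tests/e2e") -> dict[str, list[str]]:
    """One command per suite, so each can run as a separate CI job."""
    return {suite: build_pabot_command(tier, f"{root}/{suite}") for suite in SUITES}
```

With these assumptions, a pull-request job would call commands_per_suite("smoke") and a nightly job commands_per_suite("full"). Each returned command can then be handed to its own CI job.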

Subtasks

  • Analyze the current E2E test suite and identify the most time-consuming tests.
  • Categorize the E2E tests into tiers (e.g., smoke, full regression).
  • Implement the tiered E2E testing strategy in the CI pipeline.
  • Profile the E2E test environment setup and teardown process to identify bottlenecks.
  • Implement environment optimizations (e.g., dedicated environment, pre-built data).
  • Investigate and implement suite-level parallelization of E2E tests.
  • Document the tiered E2E testing strategy in the project documentation.
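
The first subtask could start from something like the sketch below. It assumes per-test durations in seconds have already been extracted from the test reports into a plain mapping. The extraction step is not shown, and the function name is hypothetical.

```python
from __future__ import annotations


def slowest_tests(
    durations: dict[str, float], top_n: int = 5
) -> list[tuple[str, float, float]]:
    """Rank tests by duration.

    Returns (name, seconds, share_of_total) tuples, slowest first.
    """
    total = sum(durations.values())
    ranked = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, seconds, seconds / total if total else 0.0)
        for name, seconds in ranked[:top_n]
    ]
```

The share of total runtime is included because a test that dominates the suite is usually a poor fit for the smoke tier, however important it is.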

Definition of Done

  • The average execution time of the E2E tests on pull requests is reduced by at least 50% relative to the baseline measured before this change.
  • The tiered E2E testing strategy is implemented and documented.
  • The E2E test environment is optimized for speed and reliability.
  • All nox stages pass.
  • Coverage >= 97%.
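
The 50% criterion can be checked mechanically. The sketch below is one possible reading, assuming the comparison is between mean job durations (in seconds) sampled before and after the change. The function names and the choice of the arithmetic mean are assumptions, not something this issue specifies.

```python
from __future__ import annotations

from statistics import mean
from typing import Sequence


def pr_time_reduction(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Fractional reduction in mean E2E duration on pull requests."""
    baseline_mean = mean(baseline)
    if baseline_mean <= 0:
        raise ValueError("baseline mean must be positive")
    return (baseline_mean - mean(current)) / baseline_mean


def meets_target(
    baseline: Sequence[float], current: Sequence[float], target: float = 0.5
) -> bool:
    """True when the reduction reaches the target (default: at least 50%)."""
    return pr_time_reduction(baseline, current) >= target
```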

Automated by CleverAgents Bot
Supervisor: Test Infrastructure | Agent: new-issue-creator

HAL9000 added this to the v3.8.0 milestone 2026-04-09 09:49:08 +00:00
Author
Owner

Label compliance fix applied:

  • Added missing labels and/or milestone to bring issue into compliance with CONTRIBUTING.md

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

HAL9000 removed this from the v3.8.0 milestone 2026-04-09 13:10:22 +00:00
No milestone
No project
No assignees
1 participant
Reference
cleveragents/cleveragents-core#5787