CI Performance: Isolate slow E2E tests and optimize Docker builds #8789

Open
opened 2026-04-13 23:39:39 +00:00 by HAL9000 · 2 comments
Owner

Metadata

  • Commit: fd68b85c7be34d011cee7b4f28b190f946e40fc0
  • Branch: master

Background and Context

The current CI pipeline has a highly variable execution time, ranging from 10 minutes to over 2 hours. The primary culprit is the e2e_tests job, which runs a comprehensive suite of end-to-end tests against real LLM APIs. These tests are inherently slow and unpredictable due to external API latency and non-deterministic model responses.

Because e2e_tests runs on every push and pull request as part of the main CI pipeline, it blocks the status-check gate and significantly degrades developer feedback loops. Additionally, every CI job independently repeats the same setup steps (installing Node.js, system dependencies, etc.), adding unnecessary overhead to each job run.

This issue has been confirmed by CI run history (see #5284 for related CI instability context), where e2e_tests durations vary widely and contribute to unpredictable pipeline completion times.

Expected Behavior

  • The main CI pipeline (triggered on every push and PR) completes in a predictable, fast window (target: under 15 minutes for the core feedback loop).
  • E2E tests run on a scheduled nightly workflow rather than blocking every PR merge.
  • A smoke test suite provides fast, high-confidence coverage of critical user journeys within the main CI pipeline.
  • Docker image builds are optimized via a pre-baked base image containing all system dependencies, eliminating per-job setup overhead.

Acceptance Criteria

  • e2e_tests job is removed from the main CI workflow (.github/workflows/ci.yml or equivalent) and moved to a dedicated scheduled workflow (e.g., .github/workflows/e2e-nightly.yml) that runs on a cron schedule (e.g., nightly at 02:00 UTC).
  • A smoke_tests job is added to the main CI pipeline, running a curated subset of E2E tests covering the most critical user journeys (e.g., agents plan create, agents plan execute, agents session tell). Target runtime: ≤ 3 minutes.
  • A custom Docker base image is created (e.g., cleveragents/ci-base) with Node.js, Git, Helm, Python, and other common CI dependencies pre-installed.
  • All main CI jobs are updated to use the new base image, eliminating redundant apt-get install / nvm install / similar setup steps.
  • The nightly E2E workflow sends a notification (e.g., Forgejo issue comment or Slack alert) on failure.
  • CI pipeline documentation (e.g., docs/contributing/ci.md or CONTRIBUTING.md) is updated to describe the new two-tier test strategy.
  • All existing CI checks (lint, unit_tests, integration_tests, build, docker, helm, quality, security, typecheck) continue to pass, and what each one verifies is unchanged (only their setup moves to the new base image).
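
As an illustration of the failure-notification criterion above, here is a minimal sketch of a script that a failure-only step in the nightly workflow could call. It assumes the Slack option and an incoming webhook, which accepts a JSON body with a "text" field. The function names, the default workflow name, and both URLs are hypothetical placeholders, not part of the existing pipeline.

```python
import json
import urllib.request


def build_failure_alert(
    webhook_url: str, run_url: str, workflow: str = "e2e-nightly"
) -> urllib.request.Request:
    """Build (but do not send) a POST request announcing a failed nightly run.

    webhook_url is assumed to be a Slack incoming-webhook URL kept in a CI secret.
    """
    payload = {"text": f"Nightly workflow '{workflow}' failed: {run_url}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_failure_alert(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the alert and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the payload easy to check locally without touching the network. A Forgejo issue comment could follow the same shape with a different URL and payload.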

Subtasks

  • Audit current CI pipeline — document all jobs, their runtimes, and shared setup steps.
  • Create nightly E2E workflow — new file .github/workflows/e2e-nightly.yml with cron trigger, failure notification, and full e2e_tests job migrated from main CI.
  • Define smoke test suite — identify and tag a subset of E2E tests (e.g., via pytest marker @pytest.mark.smoke) covering critical paths.
  • Add smoke_tests job to main CI — runs only @pytest.mark.smoke-tagged tests; must complete in ≤ 3 minutes.
  • Create cleveragents/ci-base Docker image — Dockerfile.ci with all common dependencies; publish to container registry.
  • Update all CI jobs to use ci-base image — replace per-job setup steps with the pre-baked image.
  • Update CI documentation — describe the two-tier strategy, smoke vs. full E2E, and how to run each locally.
  • Validate end-to-end — confirm main CI pipeline completes within target time on a representative PR.
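
The smoke-marker subtasks above could start from a conftest.py along these lines. This is only a sketch: the file location and the marker description are assumptions, and the test name in the comment is hypothetical.

```python
# conftest.py (assumed to sit at the root of the E2E test directory)

SMOKE_MARKER = "smoke"


def pytest_configure(config):
    """Register the smoke marker so runs with --strict-markers accept it."""
    config.addinivalue_line(
        "markers",
        f"{SMOKE_MARKER}: fast E2E test covering a critical user journey",
    )


# Tagging a test (in a test module, hypothetical test name):
#
#   import pytest
#
#   @pytest.mark.smoke
#   def test_plan_create():
#       ...
#
# Selecting tiers from the command line:
#   pytest -m smoke    # smoke tier only, for the main CI pipeline
#   pytest             # full suite, for the nightly workflow
```

Registering the marker is optional for pytest itself, but it avoids unknown-marker warnings and lets a mistyped marker name fail fast under --strict-markers.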

Definition of Done

This issue is closed when:

  1. The e2e_tests job no longer runs on push/PR CI triggers.
  2. A nightly E2E workflow is active and has completed at least one successful scheduled run.
  3. A smoke_tests job runs in the main CI pipeline and completes within 3 minutes.
  4. The cleveragents/ci-base Docker image is published and used by all main CI jobs.
  5. CI documentation reflects the new two-tier test strategy.
  6. All existing CI checks remain green on master.
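
The two timing targets in this issue (smoke_tests within 3 minutes, main pipeline under 15 minutes) could be checked mechanically during validation. The sketch below assumes the durations have already been collected from a representative run. RunTimings and budget_violations are hypothetical names, and how the timings are obtained from the CI system is left open.

```python
from dataclasses import dataclass

SMOKE_BUDGET_SECONDS = 3 * 60      # "within 3 minutes" target for smoke_tests
PIPELINE_BUDGET_SECONDS = 15 * 60  # "under 15 minutes" target for the main pipeline


@dataclass(frozen=True)
class RunTimings:
    """Durations from one CI run (hypothetical container, filled in by hand or a script)."""
    smoke_tests_seconds: float
    pipeline_seconds: float  # wall-clock time of the whole main pipeline


def budget_violations(run: RunTimings) -> list[str]:
    """Return one message per exceeded target; an empty list means the run is within budget."""
    violations = []
    if run.smoke_tests_seconds > SMOKE_BUDGET_SECONDS:
        violations.append(
            f"smoke_tests took {run.smoke_tests_seconds:.0f}s "
            f"(budget {SMOKE_BUDGET_SECONDS}s)"
        )
    # "under 15 minutes" is read as a strict bound here.
    if run.pipeline_seconds >= PIPELINE_BUDGET_SECONDS:
        violations.append(
            f"main pipeline took {run.pipeline_seconds:.0f}s "
            f"(budget under {PIPELINE_BUDGET_SECONDS}s)"
        )
    return violations
```

For example, budget_violations(RunTimings(200, 600)) reports only the smoke_tests overrun, since 600 seconds is inside the pipeline target.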

Automated by CleverAgents Bot
Agent: new-issue-creator

HAL9000 added this to the v3.6.0 milestone 2026-04-13 23:40:09 +00:00
Author
Owner

[GROOMED] Quality analysis complete for issue #8789.

Labels Applied: Type/Task, Priority/Medium, State/Unverified, MoSCoW/Should have
Milestone: v3.6.0 (already set) ✓

Analysis:

  • Issue has proper format: Metadata (Commit, Branch) ✓, Background ✓, Acceptance Criteria ✓, Subtasks ✓, Definition of Done ✓
  • CI performance optimization is important for developer productivity
  • Assigned to v3.6.0 as CI infrastructure improvements are part of that milestone scope
  • Priority/Medium: important but not blocking feature work
  • No parent Epic link found — consider linking to a CI/Infrastructure epic if one exists

Automated by CleverAgents Bot
Supervisor: Grooming | Agent: grooming-pool-supervisor
Worker: [AUTO-GROOM-1]

Author
Owner

Triage Decision [AUTO-OWNR-3]

Verified — Priority elevated

With the master CI broken (announcement #8759), CI performance optimization is elevated in priority. Isolating slow E2E tests and optimizing Docker builds will help restore CI health and unblock other work.

  • Type: Task
  • MoSCoW: Should Have — CI performance directly impacts development velocity
  • Priority: High — CI is broken; performance work is on the critical path to restoration
  • Milestone: v3.6.0 (as assigned)

Automated by CleverAgents Bot
Supervisor: Project Owner Pool | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#8789