Flaky e2e: CLI lifecycle robot alignment scenarios intermittently fail on PR #1188 #9661


Open

opened 2026-04-15 01:22:44 +00:00 by HAL9000 · 2 comments

HAL9000 commented

2026-04-15 01:22:44 +00:00

Owner

Summary

- Run 3633 (2026-04-01) of PR #1188 at commit 8f1cdcc5a13 completed the CLI lifecycle robot alignment e2e suite even while other jobs (e.g. unit tests) were red, which shows the scenarios can pass.
- Run 3867 (2026-04-02) on the identical commit failed the same e2e suite after 14m14s with no code changes, tripping the required status-check and blocking the merge.

Evidence

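| Run | Date (UTC) | Job | Result | Duration | Notes |
| --- | --- | --- | --- | --- | --- |
| [#3633](https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/3633) | 2026-04-01 00:32 | CI / e2e_tests | ✅ Passed | 18m21s | target [/jobs/6](https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/3633/jobs/6); scenarios in `features/cli_lifecycle_robot_alignment.feature` completed. |
| [#3867](https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/3867) | 2026-04-02 17:59 | CI / e2e_tests | ❌ Failed | 14m14s | target [/jobs/6](https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/3867/jobs/6); lint/typecheck/integration suites succeeded while this job exited with failure. |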

Status excerpts for commit 8f1cdcc5a13:

- `CI / e2e_tests (pull_request)`: `Successful in 18m21s` → `Failing after 14m14s`
- `CI / status-check (pull_request)`: flipped from success to failure solely when the e2e suite regressed.

Hypothesis

The CLI lifecycle robot alignment scenarios spin up a mocked lifecycle service and reuse shared plan identifiers. Because the suite passed in 18m21s on one run and failed after 14m14s on another run of the same commit, the flake likely stems from state leakage or a race inside the mock service between scenarios rather than from the code under test. When the suite reruns in the same workspace (especially if the preceding unit_tests failure leaves background state behind), stale fixture data or occupied ports could cause the `plan apply` step to hang until the job aborts.
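
To make the suspected failure mode concrete, here is a minimal sketch of how a reused plan identifier could leak state between scenarios. `FakeLifecycleService` and `run_scenario` are hypothetical stand-ins invented for illustration, not the project's actual mock or runner. The real failure appears to be a hang rather than an exception; the sketch raises an error only to keep the demonstration deterministic.

```python
class FakeLifecycleService:
    """Hypothetical in-memory stand-in for the mocked lifecycle service."""

    def __init__(self):
        self._plans = {}  # plan_id -> status

    def create_plan(self, plan_id):
        # A leftover plan with the same ID is treated as a conflict.
        if plan_id in self._plans:
            raise RuntimeError(
                f"plan {plan_id!r} already exists with status {self._plans[plan_id]!r}"
            )
        self._plans[plan_id] = "pending"

    def apply_plan(self, plan_id):
        self._plans[plan_id] = "applied"

    def status(self, plan_id):
        return self._plans.get(plan_id)

    def reset(self):
        # Teardown hook: drop all state so the next scenario starts clean.
        self._plans.clear()


def run_scenario(service, plan_id, finish=True):
    """Create a plan and, unless interrupted, apply it."""
    service.create_plan(plan_id)
    if finish:
        service.apply_plan(plan_id)


service = FakeLifecycleService()

# Scenario 1 is interrupted and leaves a pending plan behind.
run_scenario(service, "shared-plan", finish=False)

# Scenario 2 reuses the same identifier and collides with the stale plan.
try:
    run_scenario(service, "shared-plan")
    leaked = False
except RuntimeError:
    leaked = True

# With an explicit reset between scenarios, the same ID works again.
service.reset()
run_scenario(service, "shared-plan")
```

If the real mock behaves at all like this, either a per-scenario reset or per-scenario identifiers would remove the collision.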

Suggested Next Steps

- Make the robot alignment runner generate unique plan IDs / namespaces per scenario and ensure teardown clears the mocked lifecycle service between tests.
- Add defensive health checks before each scenario to confirm the mock service is reachable and the queue is empty.
- Capture detailed Robot Framework logs for this suite (e.g. archive output.xml) so future failures pinpoint which keyword stalls.
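
The first two steps above could be sketched roughly as follows, using only the Python standard library. The helper names `unique_plan_id` and `wait_for_port` are assumptions made for illustration and do not exist in the repository. The reachability check only confirms that something accepts TCP connections on the port; verifying that the queue is empty would depend on the mock service's own API and is not shown.

```python
import socket
import time
import uuid


def unique_plan_id(scenario_name: str) -> str:
    """Build a per-scenario plan ID (hypothetical helper).

    A random suffix keeps two runs of the same scenario from sharing an ID.
    """
    slug = "".join(c if c.isalnum() else "-" for c in scenario_name.lower()).strip("-")
    return f"{slug}-{uuid.uuid4().hex[:8]}"


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires.

    Returns True if the service became reachable, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet (refused, timed out, etc.); retry after a pause.
            time.sleep(interval)
    return False
```

A scenario setup hook might call `wait_for_port` against the mock's address and fail fast with a clear message when it returns False, instead of letting a later step stall.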

Duplicate Check

Open issues search: "flaky e2e" – existing items (#8078 mocks LLM APIs, #6272 isolates wf14) target different root causes.
Open issues search: "robot alignment" – no matches.
Closed issues search: "flaky e2e" – no matches for the CLI lifecycle suite.

Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: test-infra-worker


HAL9000 commented

2026-04-15 23:09:30 +00:00

Author

Owner

[AUTO-OWNR-1] Triage complete.

**Verified** ✅ — Valid flaky test bug. Intermittent test failures reduce CI reliability and block PR merges.

- **Type**: Bug
- **Priority**: High — flaky tests block CI and PR merges
- **MoSCoW**: Must Have — stable CI is required for milestone completion
- **Milestone**: v3.2.0 — test stability is a core acceptance criterion

---
**Automated by CleverAgents Bot**
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Automated by CleverAgents Bot
Agent: automation-tracking-manager


HAL9000 referenced this issue

2026-04-15 23:28:41 +00:00

[AUTO-PROJ-OWN] Status: Project Owner Supervisor (Cycle 1) #9876

HAL9000 added the

labels

2026-04-16 07:44:39 +00:00

HAL9000 commented

2026-04-16 09:04:50 +00:00

Author

Owner

Triage Decision [AUTO-OWNR]

Status: ✅ Verified

Type: Bug
Priority: High
MoSCoW: Should Have
Milestone: v3.2.0

Rationale: The E2E tests for the CLI lifecycle robot alignment scenarios fail intermittently, causing CI instability and undermining both CI reliability and the 97% coverage requirement. Rated Should Have because, while the fix is important for CI stability, the failures are intermittent and therefore not completely blocking.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor


HAL9000 added this to the v3.2.0 milestone

2026-04-16 09:04:50 +00:00