fix(e2e): redesign e2e test chain in m5_acceptance.robot to eliminate cascading tdd_expected_fail dependencies #5965

Open
opened 2026-04-09 12:27:45 +00:00 by HAL9000 · 2 comments
Owner

Background

PR #5659 (fix(testing): add guard logic to Robot tdd_expected_fail listener to prevent flaky CI) exposed 12 cascading e2e test failures in robot/m5_acceptance.robot. These failures are pre-existing design issues in the e2e test chain, not regressions introduced by PR #5659.

The root cause is a chain design problem: tdd_expected_fail-tagged tests set prerequisite variables (e.g., context assembly outputs, session IDs, plan references) that subsequent tests rely on. Once the guard fix in PR #5659 correctly prevents blind inversion of infrastructure failures, those prerequisite variables are never set, so the downstream tests never receive the values they expect and 12 of them fail in sequence.

The immediate workaround (tagging the 12 failing scenarios with tdd_expected_fail) was handled in issue #5863 to unblock PR #5659. This issue tracks the proper fix: redesigning the test chain so each test is fully independent.

Related issues:

  • #5436 — Robot Framework tdd_expected_fail listener missing guard logic (root cause)
  • #5659 — PR: fix(testing): add guard logic to Robot tdd_expected_fail listener
  • #5863 — Temporary workaround: mark 12 cascading e2e tests with tdd_expected_fail

Failing Scenarios

The following 12 scenarios in robot/m5_acceptance.robot fail due to cascading tdd_expected_fail dependencies (context assembly e2e tests):

  1. Context Assembly - Retrieve context snapshot after plan execution
  2. Context Assembly - Verify hot context contains recent plan output
  3. Context Assembly - Verify cold context contains historical plan data
  4. Context Assembly - Verify context snapshot includes resource references
  5. Context Assembly - Verify context snapshot actor state
  6. Context Assembly - Context assembly with multiple concurrent plans
  7. Context Assembly - Context snapshot persists across session restart
  8. Context Assembly - Context assembly respects token budget limits
  9. Context Assembly - Context assembly with empty plan history
  10. Context Assembly - Context assembly error handling for missing plan
  11. Context Assembly - Context snapshot includes decision tree references
  12. Context Assembly - Context assembly integrates with ACMS hot/cold tiers

Note: The exact scenario names should be confirmed from the CI logs of PR #5659. The above are representative based on the context assembly e2e test chain pattern. The implementer must identify the precise 12 failing scenarios from the CI output.

Problem Analysis

The current test chain design violates the principle of test independence:

Test A (tdd_expected_fail) → sets ${CONTEXT_SNAPSHOT_ID}
    ↓ (variable dependency)
Test B → uses ${CONTEXT_SNAPSHOT_ID}  ← FAILS when Test A is guarded/skipped
    ↓ (variable dependency)
Test C → uses output from Test B      ← FAILS in cascade
    ...
Test L → 12th cascading failure

When the tdd_expected_fail guard correctly identifies an infrastructure failure in Test A and does not invert the result, ${CONTEXT_SNAPSHOT_ID} is never set. All downstream tests that depend on it fail with variable-not-found errors, not assertion failures.
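
The cascade can be reproduced in miniature. The following is a toy model in Python, not the Robot suite itself: the function names, the shared dictionary standing in for suite-level variables, and the snapshot ID values are all hypothetical. It only illustrates why one unset variable turns into 12 downstream failures, and why per-test setup removes the coupling.

```python
# Toy model of the test chain. All names are illustrative and do not
# come from robot/m5_acceptance.robot.

def run_chained(first_test_succeeds: bool, chain_length: int = 12) -> list[str]:
    """Run a producer test followed by dependents that share one variable store."""
    suite_vars: dict[str, str] = {}
    results: list[str] = []

    # Test A: the tdd_expected_fail-tagged producer.
    if first_test_succeeds:
        suite_vars["CONTEXT_SNAPSHOT_ID"] = "snap-001"
        results.append("PASS")
    else:
        # Guarded infrastructure failure: the result is not inverted
        # and the variable is never set.
        results.append("FAIL")

    # Tests B onward: each one needs the value Test A left behind.
    for _ in range(chain_length):
        if "CONTEXT_SNAPSHOT_ID" in suite_vars:
            results.append("PASS")
        else:
            results.append("FAIL: variable not found")
    return results


def run_independent(first_test_succeeds: bool, chain_length: int = 12) -> list[str]:
    """Same tests, but every dependent creates its own snapshot in its setup."""
    results = ["PASS" if first_test_succeeds else "FAIL"]
    for i in range(chain_length):
        # Per-test setup: nothing is inherited from Test A.
        local_vars = {"CONTEXT_SNAPSHOT_ID": f"snap-{i:03d}"}
        results.append("PASS" if "CONTEXT_SNAPSHOT_ID" in local_vars else "FAIL")
    return results
```

In this model, `run_chained(False)` reports 12 variable-not-found failures after the producer fails, while `run_independent(False)` confines the failure to the producer.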

Acceptance Criteria

  • Each of the 12 failing scenarios is fully independent — no test relies on variables set by a tdd_expected_fail-tagged test
  • Test data setup for each scenario uses its own dedicated setup keyword or [Setup] block
  • No scenario in m5_acceptance.robot uses a variable that is only set by another test's execution
  • All 12 scenarios pass without tdd_expected_fail tags once the underlying bugs are fixed
  • The tdd_expected_fail tags added as a workaround in #5863 are removed from these 12 scenarios
  • The redesigned tests still correctly capture the intended acceptance criteria for milestone 5
  • All nox quality gates pass: nox -e lint, nox -e typecheck, nox -e integration_tests, nox -e e2e_tests

Metadata

  • Branch: fix/e2e-redesign-m5-acceptance-cascading-deps
  • Commit Message: fix(e2e): redesign e2e test chain in m5_acceptance.robot to eliminate cascading tdd_expected_fail dependencies
  • Milestone: v3.5.0
  • Parent Epic: #739

Subtasks

  • Identify the exact 12 failing scenarios from CI logs of PR #5659 (confirm scenario names)
  • Audit robot/m5_acceptance.robot for all inter-test variable dependencies
  • Design independent test data setup for each of the 12 affected scenarios (dedicated [Setup] keywords or suite-level fixtures that do not depend on tdd_expected_fail tests)
  • Refactor the 12 scenarios to use self-contained setup — each test creates its own prerequisites
  • Remove the tdd_expected_fail tags added as a workaround in #5863 from the 12 scenarios
  • Verify each scenario passes independently (can be run in isolation without other tests)
  • Run full e2e suite to confirm no new cascading failures introduced
  • Run nox -e lint and nox -e typecheck to confirm quality gates pass

Definition of Done

  • All 12 previously-cascading scenarios in robot/m5_acceptance.robot are redesigned to be fully independent
  • No tdd_expected_fail workaround tags remain on the 12 scenarios (they either pass legitimately or carry tdd_issue_<N> tags with proper expected-fail semantics)
  • Each scenario has its own [Setup] block or dedicated setup keyword that does not depend on variables from other tests
  • nox -e e2e_tests passes with all 12 scenarios green
  • nox -e lint passes
  • nox -e typecheck passes
  • Coverage >= 97%
  • PR is linked to this issue and merges cleanly to master

Automated by CleverAgents Bot
Supervisor: Acting on behalf of: Human Request | Agent: new-issue-creator

HAL9000 added this to the v3.5.0 milestone 2026-04-09 12:36:36 +00:00
Author
Owner

🏷️ Label compliance fix applied by backlog groomer (cycle 60)

This issue was missing all labels. The following labels have been added based on issue content analysis:

  • State/Verified — confirmed pre-existing e2e test chain design issue
  • Type/Bug — cascading test failures due to inter-test variable dependencies
  • Priority/High — 12 e2e scenarios broken, blocking M5 acceptance coverage

Automated by CleverAgents Bot
Supervisor: Label Management | Agent: forgejo-label-manager

Author
Owner

MoSCoW classification: MoSCoW/Must have

Rationale: Cascading tdd_expected_fail dependencies in m5_acceptance.robot mean that 12 e2e tests are failing in a chain — if one fails, all subsequent tests fail. This undermines the reliability of the entire e2e test suite for v3.5.0 acceptance. The spec requires a reliable test suite. This is a Must Have for v3.5.0 milestone acceptance — the e2e tests must be independently runnable.

Also adding Points/5 (size L): redesigning the e2e test chain to eliminate cascading dependencies requires careful analysis of test dependencies and restructuring, estimated at 1-2 days.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner

Reference
cleveragents/cleveragents-core#5965