test(e2e): workflow example 15 — disaster recovery, rollback a failed apply (trusted profile) #802

Closed
freemo wants to merge 1 commit from test/e2e-wf15-disaster-recovery into master
Owner

Summary

E2E test for Workflow Example 15 — disaster recovery: rollback a failed apply using the trusted profile. Tests plan rollback, plan correct (revert mode), correction diff, re-execution after correction, and the full recovery lifecycle.

Closes #761

ISSUES CLOSED: #761

Changes

  • Decision ID extraction (Major fix): Replaced generic Extract First Id From Text with dedicated Extract Decision Id From Tree keyword that uses a targeted regex ("decision_id"\s*:\s*"<ULID>") to avoid extracting plan_id from JSON tree output. This ensures plan explain and plan correct receive the correct decision ID.
  • Checkpoint ID extraction (Major fix): Replaced broken first-ULID extraction from plan artifacts with dedicated Extract Checkpoint Id From Status keyword that targets last_checkpoint_id / checkpoint_id fields in plan status --format json output. Removed the dangerous fallback that silently used plan_id as checkpoint. When no checkpoint is available, the test gracefully skips with a clear message via Skip If.
  • AC#9 final status assertion (Major fix): Removed tautological terms ('phase', 'errored', 'cancelled') from the terminal state assertion. After successful recovery, only 'applied' or 'completed' are accepted as valid terminal states.
  • Return code assertions on recovery steps (Major fix): Added Should Be Equal As Integers ${r_X.rc} 0 assertions to all critical recovery commands including plan diff (step 14), which previously lacked an rc check. The initial plan apply and plan artifacts still use expected_rc=None without rc checks since those may legitimately fail.
  • ROOT CAUSE annotation enforcement (Major fix — AC#4): Added a secondary plan tree --format plain request specifically for the ROOT CAUSE display annotation check. Also searches JSON tree and status output for root_cause/root cause references. Logs WARN (rather than failing) when not found, since LLM behavior is non-deterministic and may not annotate root cause.
  • Checkpoint skip with explicit documentation (Major fix — AC#6–AC#9): Replaced terse Skip If with thoroughly documented skip explaining when checkpoint absence is expected (LLM behavior) vs when it indicates a problem. Added guidance to investigate persistent skips.
  • Errored state verification (Major fix — Issue #3): Added dedicated assertion after initial apply/status to verify the plan entered an errored/failed state. Logs WARN when state is not errored (LLM may produce a successful plan), continuing with recovery steps regardless.
  • Correction ID extraction (Major fix — Issue #5): Replaced generic Extract First Id From Text on combined stdout+stderr with dedicated Extract Correction Id From Output keyword that targets the "correction_id" field specifically. Falls back to first ULID from stdout only (not stderr) to avoid extracting stray IDs.
  • Dead ELSE branch removed (Major fix — Issue #6): Removed unreachable IF/ELSE guard on correction ID extraction (rc==0 already guaranteed by prior assertion). Correction ID is now always extracted directly.
  • AC#3 status assertion tightened: Removed generic 'execute' term that matches field labels in every status output. Kept only specific state values: 'processing', 'errored', 'applied'.
  • AC#5 explain assertion improved (Issue #7, #20): Removed overly generic terms ('decision', 'question') that would match any explain output. Replaced with specific forensic content terms: 'rationale', 'snapshot', 'alternative', 'chosen', 'confidence', 'context'.
  • AC#7 correct assertion improved (Issue #7): Removed generic 'correct' (matches command name) and 'status'. Replaced with more specific terms: 'correction_id', 'corrected', 'applied', 'revert'.
  • AC#8 diff assertion strengthened (Issue #7): Replaced generic 'file' and 'change' with diff-specific markers: '---', '+++', 'original', 'modified'.
  • AC#6 rollback assertion tightened (Issue #12): Removed generic 'complete' from the disjunction. Kept 'rollback', 'restored', 'revert', 'checkpoint'.
  • plan correct — removed undocumented --plan flag (Issue #8): The spec signature for plan correct does not include a --plan flag. Removed to align with documented spec workflow.
  • Missing plan diff investigation step added (Issue #9): Added plan diff ${plan_id} --format plain step between tree (step 9) and explain (step 10), matching the spec WF15 workflow which shows investigating the diff before explain.
  • plan apply instead of lifecycle-apply (Issue #10): Changed both initial and final apply commands from plan lifecycle-apply to plan apply with --yes flag to match the spec-documented command.
  • Phase advancement verification (Issue #13): Added plan status check after strategize (step 5b) to verify phase transition before executing the execute phase.
  • Consistent explicit timeouts (Issue #14): Added explicit timeout=120s to steps 8, 9, 10, and 17 which previously relied on the default.
  • ULID regex consolidation comment (Issue #15): Added comment in Extract Plan Id keyword noting that M1 defines a similar keyword and consolidation is deferred to a separate ticket.
  • plan diff --correction spec alignment (Issue #16): Fixed plan diff invocation when correction_id is available to use plan diff --correction ${correction_id} without also passing plan_id, matching the spec's either/or notation.
  • Extract Plan Id fallback regex tightened (Issue #17): Replaced overly broad [\w-]+ regex with ULID/UUID-specific patterns: [0-9A-HJ-KM-NP-TV-Z]{26} and UUID 8-4-4-4-12 format.
  • Explicit expected_rc on steps 5 and 6 (Issue #19): Added explicit expected_rc=${0} to strategize and execute steps for consistency with other steps.
  • ULID regex corrected: Fixed the Crockford Base32 character class from [0-9A-HJ-NP-Z], which excluded I and O but still admitted L and U, to [0-9A-HJ-KM-NP-TV-Z], which excludes all four (I, L, O, U).
  • Consistent Traceback/INTERNAL checks: Added Should Not Contain ... Traceback alongside all INTERNAL checks across all steps.
  • Git subprocess timeouts: Added timeout=60s on_timeout=kill to all three Run Process calls (git add, git commit, git rev-parse).
  • Test-case-level timeout: Added [Timeout] 30 minutes to prevent unbounded execution.
  • Keyword deduplication: Extract Plan Id now delegates to Extract First Id From Text for ULID/UUID scanning, keeping only its field-extraction fallback as additional logic.
  • Step renumbering: Sequential 1–17 (plus 5b for phase verification and 9b for investigation diff).
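
The ID-extraction changes above can be illustrated with a short Python sketch. This is not the Robot Framework keyword code from this PR: the helper name `extract_field_id` and the sample JSON string are hypothetical, and only the character classes and the field names (`plan_id`, `decision_id`, `last_checkpoint_id`, `checkpoint_id`) are taken from the description.

```python
import re

# Crockford Base32: digits plus uppercase letters, excluding I, L, O and U.
ULID = r"[0-9A-HJ-KM-NP-TV-Z]{26}"
ULID_RE = re.compile(r"\b" + ULID + r"\b")
# The looser class quoted above drops I and O but still accepts L and U.
LOOSE_RE = re.compile(r"\b[0-9A-HJ-NP-Z]{26}\b")


def extract_field_id(text, *fields):
    """Return the ULID stored under the first listed JSON field that is present, else None."""
    for field in fields:
        match = re.search(r'"%s"\s*:\s*"(%s)"' % (re.escape(field), ULID), text)
        if match:
            return match.group(1)
    return None


# Hypothetical `plan tree --format json` output: plan_id appears before decision_id.
tree_json = (
    '{"plan_id": "01ARZ3NDEKTSV4RRFFQ69G5FAV", '
    '"nodes": [{"decision_id": "01BX5ZZKBKACTAV9WEVGEMMVRZ"}]}'
)

# A first-ULID scan picks up plan_id, which is the bug the targeted regex avoids.
first_ulid = ULID_RE.search(tree_json).group(0)
# Field-targeted extraction returns the decision ID instead.
decision_id = extract_field_id(tree_json, "decision_id")
# No checkpoint field is present in this sample, so the result is None.
checkpoint_id = extract_field_id(tree_json, "last_checkpoint_id", "checkpoint_id")
```

When no checkpoint field is present the helper returns `None`, which corresponds to the case the test handles by skipping rather than substituting `plan_id`.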

Manual Verification

Prerequisites

  • OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable set

Commands

nox -e e2e_tests

What to Look For

  • WF15 test passes (or SKIPs gracefully if no checkpoint available)
  • Decision IDs correctly extracted from plan tree JSON via targeted decision_id regex
  • Checkpoint IDs extracted from plan status --format json via targeted field regex
  • Recovery commands (rollback, correct, re-execute, apply) assert rc == 0
  • plan diff also asserts rc == 0
  • Final status only accepts applied or completed — not errored or generic terms
  • No Traceback or INTERNAL in any command's output
  • Test skips gracefully when no LLM keys are present
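
The return-code, output, and terminal-state checks listed above amount to something like the following Python sketch. The helper names `run_checked` and `is_terminal_success` are made up for illustration, and the substring matching on status output is an assumption. The actual test uses Robot Framework keywords such as `Run Process` and `Should Be Equal As Integers`.

```python
import subprocess
import sys


def run_checked(argv, timeout_s=60, expected_rc=0):
    """Run a command with a timeout and apply the rc and output checks."""
    # subprocess.run kills the child and raises TimeoutExpired if the timeout elapses.
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    if expected_rc is not None:  # None mirrors steps that may legitimately fail
        assert proc.returncode == expected_rc, f"unexpected rc {proc.returncode}"
    combined = proc.stdout + proc.stderr
    for marker in ("Traceback", "INTERNAL"):
        assert marker not in combined, f"{marker} found in output"
    return proc


def is_terminal_success(status_output):
    """Accept only 'applied' or 'completed'; 'errored' or 'phase' alone must not pass."""
    lowered = status_output.lower()
    return any(state in lowered for state in ("applied", "completed"))


if __name__ == "__main__":
    # Stand-in command; the real test invokes the project's CLI instead.
    result = run_checked([sys.executable, "-c", "print('applied')"])
    print(is_terminal_success(result.stdout))
```

Passing `expected_rc=None` skips the return-code assertion but still applies the `Traceback` and `INTERNAL` checks, which matches how the description treats the initial `plan apply` and `plan artifacts` steps.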

Deferred Items

  • Inline Python fixtures refactor (nit #18): Moving inline fixture strings to .py files loaded at runtime is a substantial refactor outside the scope of this ticket.
  • Extract Plan Id consolidation (nit #15): Both M1 and WF15 define local Extract Plan Id keywords with different signatures. Consolidating into common_e2e.resource is deferred to a separate ticket to avoid touching unrelated test files.
test(e2e): workflow example 15 — disaster recovery, rollback a failed apply (trusted profile)
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 18s
CI / lint (pull_request) Successful in 31s
CI / security (pull_request) Successful in 38s
CI / typecheck (pull_request) Successful in 41s
CI / build (pull_request) Successful in 19s
CI / e2e_tests (pull_request) Failing after 29s
CI / unit_tests (pull_request) Successful in 5m9s
CI / integration_tests (pull_request) Successful in 5m44s
CI / docker (pull_request) Successful in 1m9s
CI / coverage (pull_request) Successful in 7m42s
CI / benchmark-regression (pull_request) Successful in 38m0s
a0377ba8b1
Add end-to-end Robot Framework test for WF15: Disaster Recovery —
Rollback a Failed Apply.  The test exercises plan execution with the
trusted automation profile, then investigates via plan status, plan tree,
plan explain (--show-context --show-reasoning), plan rollback, plan
correct (--mode revert with corrective guidance), and plan diff
(--correction) before re-executing and applying.

Zero mocking — real CLI, real LLM API keys, real subprocess execution.
Fixture code provides a connection-pool module with known exhaustion risk.

ISSUES CLOSED: #761
freemo added this to the v3.2.0 milestone 2026-03-13 01:16:26 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from a0377ba8b1
to 803fde3e97
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 18s
CI / quality (pull_request) Successful in 25s
CI / e2e_tests (pull_request) Failing after 25s
CI / build (pull_request) Successful in 23s
CI / typecheck (pull_request) Successful in 38s
CI / security (pull_request) Successful in 37s
CI / integration_tests (pull_request) Successful in 2m59s
CI / unit_tests (pull_request) Successful in 3m9s
CI / docker (pull_request) Successful in 1m0s
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 16:24:29 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from 803fde3e97
to 8ef5722458
Some checks failed
CI / lint (pull_request) Successful in 14s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 15s
CI / build (pull_request) Successful in 14s
CI / security (pull_request) Successful in 28s
CI / typecheck (pull_request) Successful in 34s
CI / e2e_tests (pull_request) Failing after 40s
CI / unit_tests (pull_request) Successful in 2m28s
CI / docker (pull_request) Successful in 51s
CI / integration_tests (pull_request) Successful in 3m46s
CI / coverage (pull_request) Successful in 5m17s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 16:31:36 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from 8ef5722458
to 93827a0e26
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 13s
CI / quality (pull_request) Successful in 16s
CI / security (pull_request) Successful in 32s
CI / typecheck (pull_request) Successful in 33s
CI / e2e_tests (pull_request) Failing after 32s
CI / unit_tests (pull_request) Successful in 3m5s
CI / docker (pull_request) Successful in 8s
CI / integration_tests (pull_request) Successful in 3m29s
CI / coverage (pull_request) Successful in 5m26s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 16:42:18 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from 93827a0e26
to 0e17b080d4
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 17s
CI / e2e_tests (pull_request) Failing after 26s
CI / security (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 31s
CI / unit_tests (pull_request) Successful in 2m17s
CI / docker (pull_request) Successful in 9s
CI / integration_tests (pull_request) Successful in 2m44s
CI / coverage (pull_request) Successful in 4m54s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 17:00:52 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from 0e17b080d4
to 62dd9d81fd
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 20s
CI / build (pull_request) Successful in 22s
CI / security (pull_request) Successful in 32s
CI / e2e_tests (pull_request) Failing after 31s
CI / typecheck (pull_request) Successful in 46s
CI / integration_tests (pull_request) Successful in 3m4s
CI / unit_tests (pull_request) Successful in 3m16s
CI / docker (pull_request) Successful in 10s
CI / coverage (pull_request) Successful in 5m12s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 17:28:38 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from 62dd9d81fd
to 47436aca0a
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 16s
CI / build (pull_request) Successful in 22s
CI / typecheck (pull_request) Successful in 31s
CI / security (pull_request) Successful in 36s
CI / e2e_tests (pull_request) Successful in 54s
CI / integration_tests (pull_request) Successful in 3m4s
CI / unit_tests (pull_request) Successful in 5m7s
CI / docker (pull_request) Successful in 27s
CI / coverage (pull_request) Successful in 5m15s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 17:46:51 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from 47436aca0a
to a0377ba8b1
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 18s
CI / lint (pull_request) Successful in 31s
CI / security (pull_request) Successful in 38s
CI / typecheck (pull_request) Successful in 41s
CI / build (pull_request) Successful in 19s
CI / e2e_tests (pull_request) Failing after 29s
CI / unit_tests (pull_request) Successful in 5m9s
CI / integration_tests (pull_request) Successful in 5m44s
CI / docker (pull_request) Successful in 1m9s
CI / coverage (pull_request) Successful in 7m42s
CI / benchmark-regression (pull_request) Successful in 38m0s
2026-03-13 18:13:02 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from a0377ba8b1
to 0111d02d65
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 16s
CI / typecheck (pull_request) Successful in 31s
CI / build (pull_request) Successful in 19s
CI / e2e_tests (pull_request) Failing after 25s
CI / security (pull_request) Successful in 35s
CI / unit_tests (pull_request) Successful in 2m7s
CI / docker (pull_request) Successful in 11s
CI / integration_tests (pull_request) Successful in 3m1s
CI / coverage (pull_request) Successful in 5m0s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 18:14:21 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from 0111d02d65
to 9884fbb4a5
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 16s
CI / build (pull_request) Successful in 15s
CI / typecheck (pull_request) Successful in 34s
CI / security (pull_request) Successful in 33s
CI / e2e_tests (pull_request) Failing after 51s
CI / unit_tests (pull_request) Successful in 4m32s
CI / docker (pull_request) Successful in 9s
CI / integration_tests (pull_request) Successful in 4m58s
CI / coverage (pull_request) Successful in 5m57s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 18:25:18 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from 9884fbb4a5
to a3ccdaba41
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 16s
CI / security (pull_request) Successful in 32s
CI / typecheck (pull_request) Successful in 34s
CI / e2e_tests (pull_request) Successful in 55s
CI / unit_tests (pull_request) Successful in 2m7s
CI / integration_tests (pull_request) Successful in 2m34s
CI / docker (pull_request) Successful in 47s
CI / coverage (pull_request) Successful in 5m27s
CI / benchmark-regression (pull_request) Successful in 33m24s
2026-03-13 18:36:45 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from a3ccdaba41
to cc07e67799
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 19s
CI / quality (pull_request) Successful in 28s
CI / typecheck (pull_request) Successful in 38s
CI / security (pull_request) Successful in 38s
CI / e2e_tests (pull_request) Failing after 36s
CI / integration_tests (pull_request) Successful in 2m57s
CI / unit_tests (pull_request) Successful in 3m44s
CI / docker (pull_request) Successful in 7s
CI / coverage (pull_request) Successful in 5m9s
CI / benchmark-regression (pull_request) Successful in 34m6s
2026-03-13 23:19:50 +00:00
Author
Owner

PM Review — Day 34

Status: Mergeable, 0 reviews, M3 (v3.2.0)
Author: @freemo

E2E test for WF15 (disaster recovery, rollback a failed apply). Most comprehensive test in the M3 batch — 15 sequential steps covering forensic inspection (--show-context --show-reasoning), rollback, correction (revert mode), diff --correction, re-execute, and lifecycle-apply.

See #799 for common review notes.

Action Items

| Who | Action | Deadline |
|-----|--------|----------|
| @hurui200320 | Peer review | Day 36 |
Author
Owner

PM Status — Day 36 (2026-03-16)

Day 34 review assignment deadline check. This PR has been in review for 2+ days with 0 reviewer activity.

Reminder: Assigned reviewer — please post your review by Day 37 EOD or flag any blockers. These E2E test PRs are foundational for milestone acceptance gates and cannot remain unreviewed indefinitely.

If you are unable to review by the deadline, please comment so the review can be reassigned.

Author
Owner

@hurui200320 I am going to have you take over this PR. It is mostly complete but is waiting on #628 and #966; one is yours and one is Brent's. Please be sure to get this PR and the two blocking PRs I listed in as soon as possible, thanks.

Author
Owner

PM Status — Day 37 — Rebase Required

This PR has merge conflicts and cannot be merged in its current state. 42% of all open PRs (21 of 50) have conflicts — this is a project-wide issue that must be resolved.

@freemo — Please rebase this PR onto master by Day 39 EOD (2026-03-19). If you cannot rebase by then, please post a comment explaining the blocker.


PM rebase request — Day 37

hurui200320 force-pushed test/e2e-wf15-disaster-recovery from cc07e67799
to 700aef534c
Some checks failed
CI / lint (pull_request) Successful in 15s
CI / typecheck (pull_request) Successful in 43s
CI / quality (pull_request) Successful in 28s
CI / security (pull_request) Successful in 48s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / unit_tests (pull_request) Successful in 3m13s
CI / integration_tests (pull_request) Successful in 4m32s
CI / e2e_tests (pull_request) Failing after 4m57s
CI / docker (pull_request) Successful in 58s
CI / coverage (pull_request) Successful in 6m55s
CI / benchmark-regression (pull_request) Successful in 39m56s
2026-03-18 08:45:57 +00:00
hurui200320 force-pushed test/e2e-wf15-disaster-recovery from 700aef534c
to d0bf51d77d
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 39s
CI / lint (pull_request) Successful in 3m20s
CI / quality (pull_request) Successful in 3m47s
CI / security (pull_request) Successful in 3m58s
CI / typecheck (pull_request) Successful in 4m12s
CI / integration_tests (pull_request) Successful in 6m42s
CI / unit_tests (pull_request) Successful in 6m59s
CI / docker (pull_request) Successful in 1m7s
CI / e2e_tests (pull_request) Successful in 8m39s
CI / coverage (pull_request) Successful in 12m15s
CI / status-check (pull_request) Successful in 16s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-23 05:54:29 +00:00
hurui200320 force-pushed test/e2e-wf15-disaster-recovery from d0bf51d77d
to 12c0e78854
Some checks failed
CI / docker (pull_request) Blocked by required conditions
CI / status-check (pull_request) Blocked by required conditions
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 22s
CI / lint (pull_request) Successful in 3m21s
CI / typecheck (pull_request) Successful in 4m5s
CI / unit_tests (pull_request) Successful in 6m44s
CI / e2e_tests (pull_request) Successful in 9m16s
CI / coverage (pull_request) Successful in 10m15s
CI / integration_tests (pull_request) Failing after 17m57s
CI / quality (pull_request) Failing after 17m59s
CI / security (pull_request) Failing after 17m59s
CI / benchmark-regression (pull_request) Failing after 4h39m1s
2026-03-23 07:03:52 +00:00
hurui200320 force-pushed test/e2e-wf15-disaster-recovery from 12c0e78854
to 08878c03a1
Some checks failed
CI / build (pull_request) Successful in 31s
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m43s
CI / typecheck (pull_request) Successful in 4m21s
CI / security (pull_request) Successful in 4m32s
CI / integration_tests (pull_request) Successful in 6m7s
CI / benchmark-publish (pull_request) Has been skipped
CI / unit_tests (pull_request) Successful in 7m28s
CI / docker (pull_request) Successful in 1m1s
CI / e2e_tests (pull_request) Successful in 9m53s
CI / coverage (pull_request) Successful in 11m42s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Failing after 27m41s
2026-03-23 13:22:08 +00:00
hurui200320 force-pushed test/e2e-wf15-disaster-recovery from 08878c03a1
to 5d279081b8
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 19s
CI / lint (pull_request) Successful in 5m2s
CI / quality (pull_request) Successful in 5m39s
CI / typecheck (pull_request) Successful in 5m43s
CI / security (pull_request) Successful in 5m48s
CI / integration_tests (pull_request) Successful in 8m29s
CI / unit_tests (pull_request) Successful in 9m9s
CI / docker (pull_request) Successful in 1m8s
CI / e2e_tests (pull_request) Successful in 13m5s
CI / coverage (pull_request) Successful in 11m23s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Successful in 58m54s
2026-03-24 05:40:51 +00:00
hurui200320 force-pushed test/e2e-wf15-disaster-recovery from 5d279081b8
to 0d3371a274
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 37s
CI / lint (pull_request) Successful in 5m52s
CI / quality (pull_request) Successful in 6m39s
CI / typecheck (pull_request) Successful in 6m41s
CI / security (pull_request) Successful in 6m54s
CI / integration_tests (pull_request) Successful in 11m48s
CI / unit_tests (pull_request) Successful in 12m7s
CI / docker (pull_request) Successful in 1m9s
CI / e2e_tests (pull_request) Successful in 16m17s
CI / coverage (pull_request) Successful in 11m34s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 1h30m39s
2026-03-27 09:59:43 +00:00
freemo self-assigned this 2026-04-02 06:15:23 +00:00
freemo force-pushed test/e2e-wf15-disaster-recovery from 0d3371a274
to 91bcffe1b5
Some checks failed
CI / lint (pull_request) Failing after 5s
CI / typecheck (pull_request) Failing after 5s
CI / coverage (pull_request) Has been skipped
CI / security (pull_request) Failing after 5s
CI / quality (pull_request) Failing after 5s
CI / unit_tests (pull_request) Failing after 5s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Failing after 5s
CI / e2e_tests (pull_request) Failing after 5s
CI / build (pull_request) Failing after 4s
CI / helm (pull_request) Failing after 4s
CI / status-check (pull_request) Failing after 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
2026-04-02 07:00:21 +00:00
Author
Owner

🤖 Backlog Groomer (groomer-1): Closing as duplicate of #761.

Issue #761 (test(e2e): workflow example 15 — disaster recovery, rollback a failed apply) is the canonical version with full labels (MoSCoW/Must have, Priority/Critical, State/In Review, Type/Testing) and milestone v3.2.0. This issue is an exact title duplicate.

freemo closed this pull request 2026-04-02 17:33:07 +00:00
Some checks failed
CI / lint (pull_request) Failing after 5s (Required)
CI / typecheck (pull_request) Failing after 5s (Required)
CI / coverage (pull_request) Has been skipped (Required)
CI / security (pull_request) Failing after 5s (Required)
CI / quality (pull_request) Failing after 5s (Required)
CI / unit_tests (pull_request) Failing after 5s (Required)
CI / docker (pull_request) Has been skipped (Required)
CI / integration_tests (pull_request) Failing after 5s (Required)
CI / e2e_tests (pull_request) Failing after 5s
CI / build (pull_request) Failing after 4s (Required)
CI / helm (pull_request) Failing after 4s
CI / status-check (pull_request) Failing after 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped

Pull request closed

No reviewers
No milestone
No project
No assignees
2 participants
Reference
cleveragents/cleveragents-core!802