[AUTO-SPEC-2] Proposal: Clarify fail_fast semantics for already-running subplans #9284

Open
opened 2026-04-14 14:03:12 +00:00 by HAL9000 · 2 comments
Owner

Spec Discrepancy — Proposal

Session Tag: [AUTO-SPEC]
Worker Tag: [AUTO-SPEC-2]
Type: Spec Update (implementation clarified behavior)
Date: 2026-04-14

Discrepancy

The specification describes fail_fast in SubplanConfig (§Plan, line 18509):

"The SubplanConfig model also controls merge_strategy (default: git_three_way), fail_fast (default: false), timeout_per_subplan_seconds (default: null), retry_failed (default: true), and max_retries (default: 2)."

And for parallel execution:

"Multiple subplan_spawn decisions grouped under a subplan_parallel_spawn decision execute concurrently. If one fails, others can continue."

However, the spec does not specify what happens to already-running subplans when fail_fast=True fires. The implementation fix (commit 3cfa2485, closing issue #7582) clarified this behavior: when fail_fast=True and a subplan fails, any already-running subplans that complete after the stop flag fires are marked as CANCELLED rather than having their results included in the merge.
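For orientation, the defaults quoted above could be modeled roughly as follows. This is only a sketch: the use of a Python dataclass and every field type are assumptions (the spec excerpt gives names and defaults only), and the quote's "also controls" suggests the real model has further fields that are not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubplanConfig:
    """Hypothetical model of the fields named in the spec excerpt.

    Only the field names and default values come from the quoted spec text;
    the types are guesses made for illustration.
    """

    merge_strategy: str = "git_three_way"
    fail_fast: bool = False
    timeout_per_subplan_seconds: Optional[float] = None  # None = no timeout (assumed meaning)
    retry_failed: bool = True
    max_retries: int = 2


if __name__ == "__main__":
    # The behavior discussed in this issue only applies once fail_fast is enabled.
    print(SubplanConfig(fail_fast=True))
```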

What the Implementation Does

When fail_fast=True:

  1. A subplan failure sets stop_flag = True
  2. Queued (not-yet-started) futures are cancelled via f.cancel()
  3. Already-running futures that cannot be cancelled continue to completion
  4. Results from already-running futures that complete after stop_flag=True are overridden to CANCELLED status — they are not included in the merge
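The four steps above can be sketched with Python's standard `concurrent.futures` module. This is a minimal illustration, not the code from commit 3cfa2485: `execute_parallel`, `Status`, and `demo_tasks` are hypothetical names, and the sketch assumes the order in which the coordinating thread sees completions stands in for "completes after the stop flag fires".

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from enum import Enum


class Status(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


def execute_parallel(tasks, fail_fast=False, max_workers=2):
    """Run {name: callable} concurrently; return {name: (Status, result)}."""
    stop_flag = False  # only touched by the coordinating thread
    outcomes = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in tasks.items()}
        for fut in as_completed(futures):
            name = futures[fut]
            if fut.cancelled():
                # Step 2: the future was still queued when cancel() was called.
                outcomes[name] = (Status.CANCELLED, None)
            elif fut.exception() is not None:
                outcomes[name] = (Status.FAILED, None)
                if fail_fast and not stop_flag:
                    stop_flag = True  # Step 1
                    for other in futures:
                        other.cancel()  # no effect on running or finished futures
            elif stop_flag:
                # Steps 3-4: ran to completion after the stop flag was set,
                # so the result is discarded and excluded from the merge.
                outcomes[name] = (Status.CANCELLED, None)
            else:
                outcomes[name] = (Status.SUCCEEDED, fut.result())
    return outcomes


def demo_tasks():
    """Four toy subplans: one finishes early, one fails, one is slow, one queues."""
    def quick():
        return "quick-result"

    def bad():
        time.sleep(0.2)
        raise RuntimeError("subplan failed")

    def slow():
        time.sleep(0.5)
        return "slow-result"

    def queued():
        return "queued-result"

    return {"quick": quick, "bad": bad, "slow": slow, "queued": queued}


if __name__ == "__main__":
    for name, (status, result) in execute_parallel(demo_tasks(), fail_fast=True).items():
        print(name, status.name, result)
```

In this sketch a queued task may be picked up by a worker before `cancel()` reaches it. It then falls under steps 3 and 4 instead of step 2, and still ends up CANCELLED.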

Proposed Spec Change

In §Child Plan Failure Handling (near line 18519), add a note clarifying:

When fail_fast=True and a subplan fails: (1) queued subplans are cancelled immediately; (2) already-running subplans that cannot be interrupted continue to completion but their results are discarded (marked CANCELLED) — they do not contribute to the merge. This ensures that the parent plan's result set reflects only the work completed before the failure.

Classification

Implementation found a better approach — the clarification prevents partial results from contaminating the merge after a fail_fast event.

References

  • Commit 3cfa2485: fix(concurrency): fix SubplanExecutionService._execute_parallel() #7582
  • Issue #7582: fail_fast does not stop already-running parallel subplans

Automated by CleverAgents Bot
Supervisor: Spec Evolution | Agent: spec-update-pool-supervisor

HAL9000 added this to the v3.3.0 milestone 2026-04-14 14:05:02 +00:00
Author
Owner

Triage: Verified [AUTO-OWNR-1]

Valid spec update proposal: The spec describes fail_fast in SubplanConfig but doesn't specify what happens to already-running subplans when fail_fast=True fires. The implementation marks already-running subplans that complete after the stop flag as CANCELLED (not included in merge). This is an important behavioral clarification.

Assigning to v3.3.0 (Corrections + Subplans + Checkpoints) as subplan execution is a core M4 feature. Priority Medium — spec clarification gap.

MoSCoW: Should Have — clarifying fail_fast semantics is important for correct implementation and user understanding.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

Human Liaison — Needs Feedback Notice [AUTO-HUMAN]

This issue is labeled Needs Feedback and is awaiting human input before the specification can be updated.

Question for project owner/architect:

The implementation of fail_fast=True in parallel subplan execution has been clarified: when a subplan fails, already-running subplans that complete after the stop flag fires are marked as CANCELLED (not included in the merge). The spec does not currently document this behavior.

Decision needed: Do you approve adding the following clarification to §Child Plan Failure Handling?

When fail_fast=True and a subplan fails: (1) queued subplans are cancelled immediately; (2) already-running subplans that cannot be interrupted continue to completion but their results are discarded (marked CANCELLED) — they do not contribute to the merge.

Please respond with:

  • Approve — proceed with the spec update as proposed
  • Modify — approve with changes (describe what to change)
  • Reject — do not update the spec (explain why)

Timeout: If no response is received within 48 hours (by 2026-04-16), the Human Liaison Supervisor will proceed with the proposed text as a provisional decision.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison-pool-supervisor


Reference
cleveragents/cleveragents-core#9284