bug(subplan): SubplanService.validate_spawn incorrectly rejects spawn requests where total entries exceed max_parallel; max_parallel is a concurrency cap, not a total spawn cap #3567

Open
opened 2026-04-05 19:48:39 +00:00 by freemo · 1 comment

Metadata

  • Branch: fix/subplan-validate-spawn-max-parallel
  • Commit Message: fix(subplan): remove incorrect total-entry cap from validate_spawn — max_parallel is a concurrency bound, not a spawn count limit
  • Milestone: (none — backlog)
  • Parent Epic: #368

Backlog note: This issue was discovered during autonomous operation
on milestone v3.4.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Description

SubplanService.validate_spawn() contains a validation check that rejects spawn requests where the number of spawn entries exceeds max_parallel in PARALLEL mode. However, the spec defines max_parallel as a cap on concurrent execution, not on the total number of subplans that may be spawned.

SubplanExecutionService._execute_parallel() already handles this correctly. It sizes its worker pool as min(max_parallel, len(statuses)), so spawning more subplans than max_parallel is perfectly valid: at most max_parallel run at once, and the remainder wait until a worker becomes free. The validation in validate_spawn() is therefore overly restrictive and directly contradicts the execution service's behaviour.

Spec Reference

Parallel execution is bounded by SubplanConfig.max_parallel (default: 5, range: 1–50). This cap prevents runaway resource consumption when a large number of child plans are spawned simultaneously. The runtime uses a ThreadPoolExecutor with min(max_parallel, len(subplans)) workers.

The spec explicitly states max_parallel caps concurrent workers, not total subplan count.
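To make the distinction concrete, here is a minimal sketch of the only constraint the spec excerpt places on max_parallel: the value of the cap itself (default 5, range 1–50). SubplanConfig and ExecutionMode below are hypothetical stand-ins modelled solely on that excerpt; the real classes may be defined differently, and the SEQUENTIAL member name is an assumption. Nothing here limits how many subplans may be spawned.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(Enum):
    # Hypothetical stand-in; member names are assumptions.
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"


@dataclass(frozen=True)
class SubplanConfig:
    """Hypothetical stand-in for SubplanConfig, based only on the spec excerpt."""

    execution_mode: ExecutionMode
    max_parallel: int = 5  # concurrency cap: how many subplans may run at once

    def __post_init__(self) -> None:
        # The spec bounds the cap itself (1-50). It says nothing about the
        # total number of subplans that may be spawned.
        if not 1 <= self.max_parallel <= 50:
            raise ValueError(
                f"max_parallel must be between 1 and 50, got {self.max_parallel}"
            )


config = SubplanConfig(ExecutionMode.PARALLEL)
print(config.max_parallel)  # prints 5 (the spec default)
```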

Faulty Code

src/cleveragents/application/services/subplan_service.py, lines 362–368:

```python
if config.execution_mode == ExecutionMode.PARALLEL:
    entry_count: int = len(spawn_entries)
    if entry_count > config.max_parallel:
        errors.append(
            f"Number of spawn entries ({entry_count}) exceeds "
            f"max_parallel bound ({config.max_parallel})"
        )
```

Correct Execution-Time Behaviour (for contrast)

src/cleveragents/application/services/subplan_execution_service.py, _execute_parallel() (~line 180):

```python
max_workers = min(self._config.max_parallel, len(statuses))
```

This correctly handles more subplans than max_parallel: no more than max_parallel run concurrently, and the rest are queued until a worker frees up.
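The following standalone sketch illustrates why the execution-time sizing rule is sufficient on its own. run_parallel is a hypothetical helper, not the real _execute_parallel(); it only borrows the min(max_parallel, len(...)) sizing rule and uses the standard library's ThreadPoolExecutor to show that every task completes while the number running at once never exceeds the cap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_parallel(task_ids: list[int], max_parallel: int) -> int:
    """Run every task on a pool of min(max_parallel, len(task_ids)) workers.

    Returns the peak number of tasks that were running at the same moment.
    """
    if not task_ids:
        return 0  # ThreadPoolExecutor rejects max_workers=0

    lock = threading.Lock()
    running = 0
    peak = 0

    def work(task_id: int) -> int:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # simulate a subplan doing some work
        with lock:
            running -= 1
        return task_id

    # Same sizing rule as _execute_parallel(): never more workers than tasks.
    max_workers = min(max_parallel, len(task_ids))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(work, task_ids))

    # Every task ran; none were rejected for exceeding max_parallel.
    assert results == task_ids
    return peak


# 12 subplans with a cap of 5: all 12 complete, never more than 5 at once.
print(run_parallel(list(range(12)), max_parallel=5) <= 5)
```

Note that the pool does not run strict, lock-step batches: a queued task starts as soon as any worker finishes, which is exactly what a concurrency cap should do.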

Impact

Any attempt to spawn more than max_parallel (default: 5) subplans in PARALLEL mode fails with a SpawnValidationError, even though the execution engine fully supports it. This severely limits the usefulness of parallel subplan execution for large decompositions.

Steps to Reproduce

  1. Create a SubplanConfig with execution_mode=PARALLEL and max_parallel=5
  2. Create 6 spawn entries
  3. Call SubplanService.validate_spawn(config, spawn_entries)
  4. Observe: SpawnValidationError: Number of spawn entries (6) exceeds max_parallel bound (5)
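The steps above can be reproduced in isolation with the sketch below, which mirrors the faulty PARALLEL-mode check quoted under Faulty Code. SpawnValidationError, FakeParallelConfig and faulty_validate_spawn are hypothetical stand-ins, and raising all collected errors as one joined SpawnValidationError is an assumption about how validate_spawn() surfaces them.

```python
from dataclasses import dataclass


class SpawnValidationError(Exception):
    """Hypothetical stand-in for the project's SpawnValidationError."""


@dataclass(frozen=True)
class FakeParallelConfig:
    """Hypothetical config; execution_mode is assumed to be PARALLEL."""

    max_parallel: int = 5


def faulty_validate_spawn(config: FakeParallelConfig, spawn_entries: list[str]) -> None:
    """Mirror of the faulty PARALLEL-mode check from subplan_service.py."""
    errors: list[str] = []
    entry_count: int = len(spawn_entries)
    if entry_count > config.max_parallel:  # the incorrect total-entry cap
        errors.append(
            f"Number of spawn entries ({entry_count}) exceeds "
            f"max_parallel bound ({config.max_parallel})"
        )
    # Assumption: collected errors are raised together as one exception.
    if errors:
        raise SpawnValidationError("; ".join(errors))


entries = [f"entry-{i}" for i in range(6)]  # max_parallel + 1 entries
try:
    faulty_validate_spawn(FakeParallelConfig(max_parallel=5), entries)
except SpawnValidationError as exc:
    # prints: SpawnValidationError: Number of spawn entries (6) exceeds max_parallel bound (5)
    print(f"SpawnValidationError: {exc}")
```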

Suggested Fix

Remove the entry_count > config.max_parallel guard from validate_spawn(). The max_parallel bound is enforced at execution time by SubplanExecutionService, not at spawn time. Optionally add a comment clarifying that max_parallel is a concurrency cap, not a spawn-count cap.
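A minimal sketch of what validate_spawn() could look like after the fix, assuming simplified stand-ins for ExecutionMode, SpawnValidationError and SubplanConfig (all hypothetical here). The empty-list check is purely illustrative and is not taken from subplan_service.py; the point is that the PARALLEL-mode entry count is no longer compared against max_parallel, and the clarifying comment sits where the removed block used to be.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(Enum):
    """Hypothetical stand-in; member names are assumptions."""

    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"


class SpawnValidationError(Exception):
    """Hypothetical stand-in for the project's exception type."""


@dataclass(frozen=True)
class SubplanConfig:
    execution_mode: ExecutionMode
    max_parallel: int = 5


def validate_spawn(config: SubplanConfig, spawn_entries: list[str]) -> None:
    errors: list[str] = []

    # Illustrative check only; not taken from subplan_service.py.
    if not spawn_entries:
        errors.append("At least one spawn entry is required")

    # max_parallel is a concurrency cap, not a spawn-count limit. It is
    # enforced at execution time by SubplanExecutionService, which sizes its
    # worker pool as min(max_parallel, len(statuses)). The number of spawn
    # entries is therefore deliberately not compared against it here.

    if errors:
        raise SpawnValidationError("; ".join(errors))


# Six entries with max_parallel=5 now pass validation without raising.
validate_spawn(
    SubplanConfig(ExecutionMode.PARALLEL, max_parallel=5),
    [f"entry-{i}" for i in range(6)],
)
```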

Subtasks

  • Write a TDD issue-capture Behave scenario (tagged @tdd_expected_fail) demonstrating the bug: spawning max_parallel + 1 entries in PARALLEL mode must not raise SpawnValidationError
  • Remove the entry_count > config.max_parallel validation block from SubplanService.validate_spawn() (lines 362–368 of subplan_service.py)
  • Add an inline comment in validate_spawn() clarifying that max_parallel is a concurrency cap enforced at execution time, not a spawn-count limit
  • Update or add a Behave scenario confirming that spawning N > max_parallel entries in PARALLEL mode succeeds validation and executes with at most max_parallel concurrent workers
  • Verify SubplanExecutionService._execute_parallel() continues to use min(max_parallel, len(statuses)) workers (no change expected, just confirm)
  • Run nox -e unit_tests — all scenarios pass
  • Run nox -e integration_tests — all Robot Framework tests pass
  • Run nox -e typecheck — Pyright reports no errors
  • Run nox -e lint — no linting violations
  • Run nox -e coverage_report — coverage remains ≥ 97%
  • Run full nox (all default sessions) — all stages pass

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • The TDD issue-capture scenario (tagged @tdd_expected_fail) is committed first, demonstrating the bug, then removed/converted to a passing scenario in the fix commit.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the fix.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly (fix/subplan-validate-spawn-max-parallel).
  • The commit is submitted as a pull request to master, reviewed by at least two non-author contributors, and merged before this issue is marked done.
  • All nox stages pass (lint, typecheck, unit_tests, integration_tests, coverage_report).
  • Coverage ≥ 97%.

Automated by CleverAgents Bot
Supervisor: Acting on behalf of: UAT Testing | Agent: ca-new-issue-creator

freemo added this to the v3.4.0 milestone 2026-04-05 20:04:17 +00:00

Issue triaged by project owner:

  • State: Verified
  • Priority: Medium — validate_spawn() incorrectly rejects valid spawn requests when entry count exceeds max_parallel. The spec explicitly defines max_parallel as a concurrency cap, not a total spawn cap. The execution engine already handles batching correctly.
  • Milestone: v3.4.0
  • Story Points: 1 — XS — Remove ~6 lines of incorrect validation code and add a clarifying comment. The fix is trivial and well-defined.
  • MoSCoW: Should Have — Correct subplan spawning behavior is important for parallel execution of large decompositions. The spec is unambiguous that max_parallel caps concurrent workers, not total spawn count.
  • Parent Epic: #368 (dependency link already exists)

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

freemo removed this from the v3.4.0 milestone 2026-04-06 23:38:27 +00:00
Blocks: #368 Epic: Subplans & Parallelism
Reference: cleveragents/cleveragents-core#3567