fix(plan-executor): persist strategy decisions during Strategize phase #11138

Open
freemo wants to merge 2 commits from fix/issue-10813-strategize-decision-persistence into master
Owner

Summary

  • Fixes plan tree and plan correct CLI commands finding an empty decision tree
  • Wires StrategizeDecisionHook into the production PlanExecutor.run_strategize() path
  • Maintains JSON fallback in error_details""strategy_decisions_json"" for backward compatibility

Bug Description

PlanExecutor.run_strategize() produces StrategyDecision objects but was only storing them as JSON in
plan.error_details["strategy_decisions_json"]. Downstream CLI commands like plan tree and
plan correct query via DecisionService.list_decisions() which finds an empty tree because
decisions are never inserted into the database.

Fix

Wire StrategizeDecisionHook.record_strategy_choice() into the production execution path so every
strategy decision is persistently recorded during the Strategize phase. The JSON fallback in
error_details""strategy_decisions_json""` remains as a safety net for backward compatibility with _build_decisions`` which reconstructs the tree for the Execute phase.

Changes

  • src/cleveragents/application/services/plan_executor.py: Added decision_hook constructor parameter;
    persistence loop in run_strategize() that calls StrategizeDecisionHook.record_strategy_choice() for
    every decision with best-effort error handling (logs warning on failure, falls back to JSON)
  • features/plan_executor_decision_persistence.feature: 12 BDD scenarios covering positive path,
    graceful degradation on hook failure, and backward compatibility
  • features/steps/plan_executor_decision_persistence_steps.py: All step definitions exercising the
    persistence hooks via DecisionService queries
  • CHANGELOG.md / CONTRIBUTORS.md: Documentation updates
  • Parent issue: #10813
  • Fix type: bug fix

ISSUES CLOSES: #10813

## Summary - Fixes `plan tree` and `plan correct` CLI commands finding an empty decision tree - Wires `StrategizeDecisionHook` into the production `PlanExecutor.run_strategize()` path - Maintains JSON fallback in `error_details""strategy_decisions_json""` for backward compatibility ## Bug Description `PlanExecutor.run_strategize()` produces `StrategyDecision` objects but was only storing them as JSON in `plan.error_details["strategy_decisions_json"]`. Downstream CLI commands like ``plan tree`` and ``plan correct`` query via ``DecisionService.list_decisions()`` which finds an empty tree because decisions are never inserted into the database. ## Fix Wire ``StrategizeDecisionHook.record_strategy_choice()`` into the production execution path so every strategy decision is persistently recorded during the Strategize phase. The JSON fallback in ``error_details""strategy_decisions_json""` remains as a safety net for backward compatibility with ``_build_decisions`` which reconstructs the tree for the Execute phase. ## Changes - **`src/cleveragents/application/services/plan_executor.py`**: Added `decision_hook` constructor parameter; persistence loop in ``run_strategize()`` that calls ``StrategizeDecisionHook.record_strategy_choice()`` for every decision with best-effort error handling (logs warning on failure, falls back to JSON) - **`features/plan_executor_decision_persistence.feature`**: 12 BDD scenarios covering positive path, graceful degradation on hook failure, and backward compatibility - **`features/steps/plan_executor_decision_persistence_steps.py`**: All step definitions exercising the persistence hooks via ``DecisionService`` queries - **`CHANGELOG.md`** / **`CONTRIBUTORS.md`**: Documentation updates ## Related - Parent issue: #10813 - Fix type: bug fix ISSUES CLOSES: #10813
fix(plan-executor): persist strategy decisions during Strategize phase
Some checks failed
CI / push-validation (pull_request) Successful in 57s
CI / helm (pull_request) Successful in 59s
CI / build (pull_request) Successful in 1m18s
CI / lint (pull_request) Failing after 1m36s
CI / tdd_quality_gate (pull_request) Successful in 1m31s
CI / quality (pull_request) Successful in 2m3s
CI / security (pull_request) Successful in 2m7s
CI / typecheck (pull_request) Successful in 2m7s
CI / e2e_tests (pull_request) Successful in 4m3s
CI / integration_tests (pull_request) Successful in 7m14s
CI / unit_tests (pull_request) Failing after 10m35s
CI / docker (pull_request) Has been skipped
CI / coverage (pull_request) Has been skipped
CI / status-check (pull_request) Failing after 3s
c42b989f51
PlanExecutor.run_strategize() produces StrategyDecision objects but was only
storing them as JSON in plan.error_details["strategy_decisions_json"]. Downstream
CLI commands like "plan tree" and "plan correct" query via DecisionService which
finds an empty tree because decisions are never inserted.

Wire StrategizeDecisionHook into the production execution path so every strategy
decision is persistently recorded during the Strategize phase, while maintaining
the JSON fallback for backward compatibility with _build_decisions.

Co-authored-by: <auto-generated>

ISSUES CLOSED: #10813
HAL9001 left a comment

Code Review — PR #11138

Review type: First review
CI status: FAILING — lint and unit_tests are failing; coverage was skipped as a downstream consequence of unit_tests failure.


Summary

This PR wires StrategizeDecisionHook into PlanExecutor.run_strategize() to fix decision persistence for Issue #10813. The approach and intent are correct, but there are several blocking defects across lint, test correctness, and semantic correctness that must be fixed before this can be merged.


CI Failures (Blocking)

Check Status
lint FAILING
unit_tests FAILING
coverage ⏭️ SKIPPED (depends on unit_tests)
typecheck passing
security passing
integration_tests passing
e2e_tests passing

Blocking Issues Found

1. Lint failure — unused noqa: BLE001 suppression (RUF100)

plan_executor.py line 816:

except Exception:  # noqa: BLE001

The ruff select list in pyproject.toml is ["E", "F", "W", "B", "UP", "I", "SIM", "RUF"]. BLE is not in the selected rule set, so BLE001 is never checked. This makes # noqa: BLE001 an unused noqa directive, which ruff flags with RUF100 (unused-noqa). Fix: Remove the # noqa: BLE001 comment entirely. The broad except Exception: will not trigger any active ruff rule once the comment is removed.

2. Unit tests failing — multiple missing Behave step definitions

The feature file features/plan_executor_decision_persistence.feature references step texts that have no matching @given/@then decorator in features/steps/plan_executor_decision_persistence_steps.py or any other step file. Behave will raise StepNotImplementedError for each:

  • Given a dp plan in Strategize-Queued state with definition "Build feature\nAdd tests" — used in Background and two scenarios, never defined.
  • Given a dp plan in Strategize-Queued state with definition "Do work" — used in two scenarios, never defined.
  • Given a dp plan with no steps ("") in Strategize-Queued state — used in one scenario, never defined.
  • Given the dp mock lifecycle service (with the article "the", not "a") — used in Scenario 5, never defined (only "a dp mock lifecycle service" is defined).
  • Then the dp JSON fallback in error_details should **still** be populated — used in Scenario 4 (graceful degradation), never defined (only the variant without "still" is defined).

These missing step definitions mean every scenario that uses them will fail with a step-not-implemented error at runtime. These must all be added to the step file before the unit tests can pass.

3. Semantic bug — question and chosen_option both set to decision.step_text

In plan_executor.py at the new persistence loop:

self._decision_hook.record_strategy_choice(
    question=decision.step_text,
    chosen_option=decision.step_text,   # <-- identical to question
    confidence_score=None,
    rationale="",
)

record_strategy_choice() raises ValidationError if question or chosen_option is empty or whitespace. But more critically, using the same value for both question and chosen_option is semantically incorrect — a question and a chosen answer are distinct concepts. The StrategyDecision.step_text is the step description, which should map to chosen_option ("what was decided"), while question should capture what strategic question drove this decision. Passing step_text as both loses all decision provenance. At minimum, use a sensible placeholder for question (e.g., f"Strategy decision for step: {decision.step_text}") or derive both from richer decision metadata if available.

4. Semantic bug — tree structure lost; all decisions recorded with parent_decision_id=None

The StrategizeDecisionHook is constructed once with parent_decision_id=None (the default), and then used for every decision in the loop. The hook stores self.parent_decision_id and passes it unchanged to record_decision() for every call. This means all decisions are recorded as root-level nodes (no parent), regardless of the actual StrategyDecision.parent_id from the StrategizeResult.

The actual parent structure produced by StrategyActor (stored in StrategyDecision.parent_id) is never consulted. The Scenario "persisted decisions form a valid tree with correct parent-child relationships" will consequently fail because non-root decisions will have parent_decision_id=None instead of the root decisions ID.

Fix: Either (a) create a new hook instance per decision using the correct parent_id from StrategyDecision.parent_id, or (b) modify the hooks record_strategy_choice method to accept an optional parent_decision_id override per call.

5. Mock class defined inside step file — must be in features/mocks/

Per CONTRIBUTING.md: "All mocks, fakes, stubs, test doubles → features/mocks/ exclusively. Never in steps files, never in src/."

FailingDecisionSvc (a subclass of DecisionService) is defined twice inside features/steps/plan_executor_decision_persistence_steps.py (lines 144 and 339). This violates the mock placement rule. Move it to features/mocks/ as a named class and import it from there.


Non-Blocking Observations

A. Branch name does not follow project naming conventions

Branch fix/issue-10813-strategize-decision-persistence does not match the required format bugfix/mN-<description>. Issue #10813 is in milestone v3.2.0 (M3), so the expected branch name would be bugfix/m3-strategize-decision-persistence. Additionally, the issues own ## Metadata section specifies Branch: test/restore-e2e-tests, which conflicts. This is a process concern that should be resolved.

B. PR missing milestone assignment

The PR has no milestone set, but the linked issue #10813 is in v3.2.0. Per CONTRIBUTING.md, the PR must be assigned to the same milestone as the linked issue.

The PR body ends with ISSUES CLOSES: #10813 — this should be ISSUES CLOSED: #10813 (missing "D"). While this does not affect Forgejos auto-close parsing (which uses Closes #N or Fixes #N in the body), the commit footer uses the correct form.

D. Coverage skipped — gate not actually verified

Because unit_tests fails, the coverage CI job is skipped. Once tests are fixed, verify that coverage remains ≥ 97%.


Checklist Results

Category Result Notes
1. Correctness ⚠️ Partial Intent is correct; semantic bugs in parent_id + question/option undermine correctness
2. Spec Alignment Pass Decision hook approach aligns with spec design
3. Test Quality Fail Multiple missing step definitions; mock in wrong location; {decision_count} / {num_steps} literals
4. Type Safety Pass All annotations present; no # type: ignore added
5. Readability Pass Code is clear and well-commented
6. Performance Pass No N+1 or scalability concerns
7. Security Pass No secrets, no unsafe patterns
8. Code Style Fail Unused # noqa: BLE001 triggers RUF100; mock in steps file
9. Documentation Pass Docstrings updated; CHANGELOG updated
10. Commit & PR Quality ⚠️ Partial Commit message good; PR missing milestone; typo in footer; branch name non-conforming


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## Code Review — PR #11138 **Review type:** First review **CI status:** ❌ FAILING — `lint` and `unit_tests` are failing; `coverage` was skipped as a downstream consequence of `unit_tests` failure. --- ### Summary This PR wires `StrategizeDecisionHook` into `PlanExecutor.run_strategize()` to fix decision persistence for Issue #10813. The approach and intent are correct, but there are several blocking defects across lint, test correctness, and semantic correctness that must be fixed before this can be merged. --- ### CI Failures (Blocking) | Check | Status | |---|---| | `lint` | ❌ FAILING | | `unit_tests` | ❌ FAILING | | `coverage` | ⏭️ SKIPPED (depends on unit_tests) | | `typecheck` | ✅ passing | | `security` | ✅ passing | | `integration_tests` | ✅ passing | | `e2e_tests` | ✅ passing | --- ### Blocking Issues Found #### 1. Lint failure — unused `noqa: BLE001` suppression (RUF100) `plan_executor.py` line 816: ```python except Exception: # noqa: BLE001 ``` The ruff `select` list in `pyproject.toml` is `["E", "F", "W", "B", "UP", "I", "SIM", "RUF"]`. `BLE` is **not** in the selected rule set, so `BLE001` is never checked. This makes `# noqa: BLE001` an **unused noqa directive**, which ruff flags with `RUF100` (unused-noqa). **Fix:** Remove the `# noqa: BLE001` comment entirely. The broad `except Exception:` will not trigger any active ruff rule once the comment is removed. #### 2. Unit tests failing — multiple missing Behave step definitions The feature file `features/plan_executor_decision_persistence.feature` references step texts that have **no matching `@given`/`@then` decorator** in `features/steps/plan_executor_decision_persistence_steps.py` or any other step file. Behave will raise `StepNotImplementedError` for each: - `Given a dp plan in Strategize-Queued state with definition "Build feature\nAdd tests"` — used in Background and two scenarios, **never defined**. - `Given a dp plan in Strategize-Queued state with definition "Do work"` — used in two scenarios, **never defined**. - `Given a dp plan with no steps ("") in Strategize-Queued state` — used in one scenario, **never defined**. - `Given the dp mock lifecycle service` (with the article **"the"**, not "a") — used in Scenario 5, **never defined** (only `"a dp mock lifecycle service"` is defined). - `Then the dp JSON fallback in error_details should **still** be populated` — used in Scenario 4 (graceful degradation), **never defined** (only the variant without "still" is defined). These missing step definitions mean every scenario that uses them will fail with a step-not-implemented error at runtime. These must all be added to the step file before the unit tests can pass. #### 3. Semantic bug — `question` and `chosen_option` both set to `decision.step_text` In `plan_executor.py` at the new persistence loop: ```python self._decision_hook.record_strategy_choice( question=decision.step_text, chosen_option=decision.step_text, # <-- identical to question confidence_score=None, rationale="", ) ``` `record_strategy_choice()` raises `ValidationError` if `question` or `chosen_option` is empty or whitespace. But more critically, using the same value for both `question` and `chosen_option` is semantically incorrect — a question and a chosen answer are distinct concepts. The `StrategyDecision.step_text` is the step description, which should map to `chosen_option` ("what was decided"), while `question` should capture what strategic question drove this decision. Passing `step_text` as both loses all decision provenance. At minimum, use a sensible placeholder for `question` (e.g., `f"Strategy decision for step: {decision.step_text}"`) or derive both from richer decision metadata if available. #### 4. Semantic bug — tree structure lost; all decisions recorded with `parent_decision_id=None` The `StrategizeDecisionHook` is constructed **once** with `parent_decision_id=None` (the default), and then used for **every** decision in the loop. The hook stores `self.parent_decision_id` and passes it unchanged to `record_decision()` for every call. This means **all decisions are recorded as root-level nodes** (no parent), regardless of the actual `StrategyDecision.parent_id` from the `StrategizeResult`. The actual parent structure produced by `StrategyActor` (stored in `StrategyDecision.parent_id`) is **never consulted**. The Scenario `"persisted decisions form a valid tree with correct parent-child relationships"` will consequently fail because non-root decisions will have `parent_decision_id=None` instead of the root decisions ID. **Fix:** Either (a) create a new hook instance per decision using the correct `parent_id` from `StrategyDecision.parent_id`, or (b) modify the hooks `record_strategy_choice` method to accept an optional `parent_decision_id` override per call. #### 5. Mock class defined inside step file — must be in `features/mocks/` Per CONTRIBUTING.md: *"All mocks, fakes, stubs, test doubles → `features/mocks/` exclusively. Never in steps files, never in src/."* `FailingDecisionSvc` (a subclass of `DecisionService`) is defined **twice** inside `features/steps/plan_executor_decision_persistence_steps.py` (lines 144 and 339). This violates the mock placement rule. Move it to `features/mocks/` as a named class and import it from there. --- ### Non-Blocking Observations #### A. Branch name does not follow project naming conventions Branch `fix/issue-10813-strategize-decision-persistence` does not match the required format `bugfix/mN-<description>`. Issue #10813 is in milestone `v3.2.0` (M3), so the expected branch name would be `bugfix/m3-strategize-decision-persistence`. Additionally, the issues own `## Metadata` section specifies `Branch: test/restore-e2e-tests`, which conflicts. This is a process concern that should be resolved. #### B. PR missing milestone assignment The PR has no milestone set, but the linked issue #10813 is in `v3.2.0`. Per CONTRIBUTING.md, the PR must be assigned to the same milestone as the linked issue. #### C. PR body has a typo in the footer keyword The PR body ends with `ISSUES CLOSES: #10813` — this should be `ISSUES CLOSED: #10813` (missing "D"). While this does not affect Forgejos auto-close parsing (which uses `Closes #N` or `Fixes #N` in the body), the commit footer uses the correct form. #### D. Coverage skipped — gate not actually verified Because `unit_tests` fails, the `coverage` CI job is skipped. Once tests are fixed, verify that coverage remains ≥ 97%. --- ### Checklist Results | Category | Result | Notes | |---|---|---| | 1. Correctness | ⚠️ Partial | Intent is correct; semantic bugs in parent_id + question/option undermine correctness | | 2. Spec Alignment | ✅ Pass | Decision hook approach aligns with spec design | | 3. Test Quality | ❌ Fail | Multiple missing step definitions; mock in wrong location; `{decision_count}` / `{num_steps}` literals | | 4. Type Safety | ✅ Pass | All annotations present; no `# type: ignore` added | | 5. Readability | ✅ Pass | Code is clear and well-commented | | 6. Performance | ✅ Pass | No N+1 or scalability concerns | | 7. Security | ✅ Pass | No secrets, no unsafe patterns | | 8. Code Style | ❌ Fail | Unused `# noqa: BLE001` triggers RUF100; mock in steps file | | 9. Documentation | ✅ Pass | Docstrings updated; CHANGELOG updated | | 10. Commit & PR Quality | ⚠️ Partial | Commit message good; PR missing milestone; typo in footer; branch name non-conforming | --- --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +5,4 @@
Background:
Given a dp mock lifecycle service
And a dp in-memory decision service
Owner

BLOCKING — Multiple missing Behave step definitions (unit_tests CI failure)

The following step texts used in this feature file have no matching @given / @then decorator in features/steps/plan_executor_decision_persistence_steps.py or any other step file. Behave will raise StepNotImplementedError for each, causing the unit_tests CI job to fail:

  1. Given a dp plan in Strategize-Queued state with definition "Build feature\nAdd tests" — used in the Background (line 9) and in 2 explicit scenarios, never defined.
  2. Given a dp plan in Strategize-Queued state with definition "Do work" — used in 2 scenarios, never defined.
  3. Given a dp plan with no steps ("") in Strategize-Queued state — used in Scenario 2, never defined.
  4. Given the dp mock lifecycle service (article "the", not "a") — used in Scenario 5 (line 52), never defined (only "a dp mock lifecycle service" exists).
  5. Then the dp JSON fallback in error_details should **still** be populated — used in Scenario 4 (graceful degradation), never defined (only the variant without "still" is defined).

Fix: Add @given / @then decorators for all five missing step texts in features/steps/plan_executor_decision_persistence_steps.py.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — Multiple missing Behave step definitions (unit_tests CI failure)** The following step texts used in this feature file have **no matching `@given` / `@then` decorator** in `features/steps/plan_executor_decision_persistence_steps.py` or any other step file. Behave will raise `StepNotImplementedError` for each, causing the `unit_tests` CI job to fail: 1. `Given a dp plan in Strategize-Queued state with definition "Build feature\nAdd tests"` — used in the Background (line 9) and in 2 explicit scenarios, **never defined**. 2. `Given a dp plan in Strategize-Queued state with definition "Do work"` — used in 2 scenarios, **never defined**. 3. `Given a dp plan with no steps ("") in Strategize-Queued state` — used in Scenario 2, **never defined**. 4. `Given the dp mock lifecycle service` (article **"the"**, not "a") — used in Scenario 5 (line 52), **never defined** (only `"a dp mock lifecycle service"` exists). 5. `Then the dp JSON fallback in error_details should **still** be populated` — used in Scenario 4 (graceful degradation), **never defined** (only the variant without "still" is defined). **Fix:** Add `@given` / `@then` decorators for all five missing step texts in `features/steps/plan_executor_decision_persistence_steps.py`. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +16,4 @@
Given the dp plan executor has a decision hook for the plan
When I dp call run_strategize with decision hook
Then all dp strategize decisions should be persisted to the decision service
And the dp decision service tree for the plan should have {decision_count} nodes
Owner

BLOCKING — {decision_count} is a literal string, not a Scenario Outline parameter

And the dp decision service tree for the plan should have {decision_count} nodes

This step is inside a plain Scenario (not a Scenario Outline with an Examples: table). Behave will pass the literal string "{decision_count}" to the step function:

@then("the dp decision service tree for the plan should have {decision_count} nodes")
def step_then_dp_tree_has_nodes(context, decision_count: str) -> None:
    assert len(tree) == int(decision_count)  # int("{decision_count}") → ValueError!

This will raise a ValueError at runtime when int("{decision_count}") is called. The same problem exists at line 73 with {num_steps}.

Fix: Either:

  • Convert to a Scenario Outline with an Examples: table supplying concrete values, or
  • Replace {decision_count} with a hard-coded integer literal (e.g., 2 based on the 2 decisions the test injects).

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — `{decision_count}` is a literal string, not a Scenario Outline parameter** ```gherkin And the dp decision service tree for the plan should have {decision_count} nodes ``` This step is inside a plain `Scenario` (not a `Scenario Outline` with an `Examples:` table). Behave will pass the literal string `"{decision_count}"` to the step function: ```python @then("the dp decision service tree for the plan should have {decision_count} nodes") def step_then_dp_tree_has_nodes(context, decision_count: str) -> None: assert len(tree) == int(decision_count) # int("{decision_count}") → ValueError! ``` This will raise a `ValueError` at runtime when `int("{decision_count}")` is called. The same problem exists at line 73 with `{num_steps}`. **Fix:** Either: - Convert to a `Scenario Outline` with an `Examples:` table supplying concrete values, **or** - Replace `{decision_count}` with a hard-coded integer literal (e.g., `2` based on the 2 decisions the test injects). --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +50,4 @@
state: ProcessingState = ProcessingState.QUEUED,
definition_of_done: str | None = "Build feature\nAdd tests",
decision_root_id: str | None = None,
invariants: list[PlanInvariant] | None = None,
Owner

BLOCKING — Mock class (FailingDecisionSvc) must be in features/mocks/, not in the step file

Per CONTRIBUTING.md: "All mocks, fakes, stubs, and test doubles must go in features/mocks/ exclusively. Never in step files, never in src/."

FailingDecisionSvc is a subclass of DecisionService acting as a test double. It is defined inline in this step file — which violates the projects mock placement rule. It is also defined twice (here at line ~144 and again at ~339), creating redundancy.

Fix:

  1. Create features/mocks/failing_decision_service.py containing FailingDecisionSvc.
  2. Import it from there in the step file:
    from features.mocks.failing_decision_service import FailingDecisionSvc
    
  3. Remove both inline class definitions from this step file.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — Mock class (`FailingDecisionSvc`) must be in `features/mocks/`, not in the step file** Per CONTRIBUTING.md: *"All mocks, fakes, stubs, and test doubles must go in `features/mocks/` exclusively. Never in step files, never in `src/`.*" `FailingDecisionSvc` is a subclass of `DecisionService` acting as a test double. It is defined inline in this step file — which violates the projects mock placement rule. It is also defined **twice** (here at line ~144 and again at ~339), creating redundancy. **Fix:** 1. Create `features/mocks/failing_decision_service.py` containing `FailingDecisionSvc`. 2. Import it from there in the step file: ```python from features.mocks.failing_decision_service import FailingDecisionSvc ``` 3. Remove both inline class definitions from this step file. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

BLOCKING — Semantic bug: question and chosen_option set to the same value

self._decision_hook.record_strategy_choice(
    question=decision.step_text,
    chosen_option=decision.step_text,  # identical to question!
    confidence_score=None,
    rationale="",
)

Passing the same value for both question and chosen_option is semantically incorrect. question should represent the strategic question being answered, while chosen_option should represent the decision that was made. Using step_text for both collapses these two distinct concepts into one and loses all decision provenance information.

Additionally, record_strategy_choice() raises ValidationError if question or chosen_option is empty or whitespace. This means any StrategyDecision with an empty step_text will silently fail (the exception is swallowed by the except Exception: below), resulting in that decision not being persisted with no indication to the caller.

Fix: Use a meaningful question string separate from the chosen option:

self._decision_hook.record_strategy_choice(
    question=f"Strategy step {decision.sequence}: what approach to take?",
    chosen_option=decision.step_text,
    confidence_score=None,
    rationale="",
)

And add a guard for empty step_text before calling the hook to avoid silent failures.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — Semantic bug: `question` and `chosen_option` set to the same value** ```python self._decision_hook.record_strategy_choice( question=decision.step_text, chosen_option=decision.step_text, # identical to question! confidence_score=None, rationale="", ) ``` Passing the same value for both `question` and `chosen_option` is semantically incorrect. `question` should represent the strategic question being answered, while `chosen_option` should represent the decision that was made. Using `step_text` for both collapses these two distinct concepts into one and loses all decision provenance information. Additionally, `record_strategy_choice()` raises `ValidationError` if `question` or `chosen_option` is empty or whitespace. This means any `StrategyDecision` with an empty `step_text` will silently fail (the exception is swallowed by the `except Exception:` below), resulting in that decision not being persisted with no indication to the caller. **Fix:** Use a meaningful question string separate from the chosen option: ```python self._decision_hook.record_strategy_choice( question=f"Strategy step {decision.sequence}: what approach to take?", chosen_option=decision.step_text, confidence_score=None, rationale="", ) ``` And add a guard for empty `step_text` before calling the hook to avoid silent failures. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

BLOCKING — Semantic bug: tree structure is lost; all decisions recorded with parent_decision_id=None

The StrategizeDecisionHook is constructed once with parent_decision_id=None (the default) and then reused for every decision in the loop. The hook stores a single self.parent_decision_id and passes it unchanged to record_decision() on every call. As a result, all decisions are recorded as root-level nodes — no parent-child tree structure is preserved.

The actual parent relationship is available in StrategyDecision.parent_id (from the StrategizeResult) but is never passed to the hook:

# Current: ignores decision.parent_id completely
self._decision_hook.record_strategy_choice(
    question=decision.step_text,
    chosen_option=decision.step_text,
    ...
)

# Fix option A: create a per-decision hook with the correct parent_id
per_decision_hook = StrategizeDecisionHook(
    decision_service=self._decision_hook.decision_service,
    plan_id=self._decision_hook.plan_id,
    parent_decision_id=decision.parent_id,
)
per_decision_hook.record_strategy_choice(...)

# Fix option B: add a parent_decision_id override parameter to record_strategy_choice()

The Scenario "persisted decisions form a valid tree with correct parent-child relationships" will fail exactly because of this bug — non-root decisions will show parent_decision_id=None instead of the root decisions ID.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — Semantic bug: tree structure is lost; all decisions recorded with `parent_decision_id=None`** The `StrategizeDecisionHook` is constructed once with `parent_decision_id=None` (the default) and then reused for every decision in the loop. The hook stores a single `self.parent_decision_id` and passes it unchanged to `record_decision()` on every call. As a result, **all decisions are recorded as root-level nodes** — no parent-child tree structure is preserved. The actual parent relationship is available in `StrategyDecision.parent_id` (from the `StrategizeResult`) but is **never passed** to the hook: ```python # Current: ignores decision.parent_id completely self._decision_hook.record_strategy_choice( question=decision.step_text, chosen_option=decision.step_text, ... ) # Fix option A: create a per-decision hook with the correct parent_id per_decision_hook = StrategizeDecisionHook( decision_service=self._decision_hook.decision_service, plan_id=self._decision_hook.plan_id, parent_decision_id=decision.parent_id, ) per_decision_hook.record_strategy_choice(...) # Fix option B: add a parent_decision_id override parameter to record_strategy_choice() ``` The Scenario `"persisted decisions form a valid tree with correct parent-child relationships"` will fail exactly because of this bug — non-root decisions will show `parent_decision_id=None` instead of the root decisions ID. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

BLOCKING — Lint failure (RUF100): Unused noqa directive

The # noqa: BLE001 comment on this line suppresses ruff rule BLE001 (blind exception catch). However, BLE is not in the ruff select list defined in pyproject.toml (["E", "F", "W", "B", "UP", "I", "SIM", "RUF"]), so BLE001 is never checked. This makes the noqa comment unused, which triggers RUF100 (unused-noqa) — the actual cause of the lint CI failure.

Fix: Remove the # noqa: BLE001 comment. The broad except Exception: bare clause will not trigger any active ruff rule once the comment is gone:

# Before (failing):
except Exception:  # noqa: BLE001

# After (correct):
except Exception:

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKING — Lint failure (RUF100): Unused `noqa` directive** The `# noqa: BLE001` comment on this line suppresses ruff rule BLE001 (blind exception catch). However, `BLE` is **not in the ruff `select` list** defined in `pyproject.toml` (`["E", "F", "W", "B", "UP", "I", "SIM", "RUF"]`), so BLE001 is never checked. This makes the noqa comment unused, which triggers **RUF100** (unused-noqa) — the actual cause of the `lint` CI failure. **Fix:** Remove the `# noqa: BLE001` comment. The broad `except Exception:` bare clause will not trigger any active ruff rule once the comment is gone: ```python # Before (failing): except Exception: # noqa: BLE001 # After (correct): except Exception: ``` --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

--- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
fix: persist strategy decisions via DecisionService during strategize (#10813
Some checks failed
CI / helm (pull_request) Successful in 1m5s
CI / lint (pull_request) Failing after 1m35s
CI / build (pull_request) Successful in 1m31s
CI / push-validation (pull_request) Successful in 1m40s
CI / quality (pull_request) Successful in 2m2s
CI / security (pull_request) Successful in 2m7s
CI / tdd_quality_gate (pull_request) Successful in 2m8s
CI / typecheck (pull_request) Successful in 2m16s
CI / unit_tests (pull_request) Failing after 3m7s
CI / coverage (pull_request) Has been skipped
CI / docker (pull_request) Has been skipped
CI / e2e_tests (pull_request) Failing after 5m0s
CI / integration_tests (pull_request) Successful in 5m23s
CI / status-check (pull_request) Failing after 3s
8368681c5b
Fix bugs in the StrategizeDecisionHook wiring that prevented proper
decision tree persistence:

- Add parent_decision_id parameter to all four StrategizeDecisionHook
  methods (record_strategy_choice, record_resource_selection,
  record_subplan_spawn, record_invariant_enforced) so each call can
  specify its own tree position. Per-call value overrides the instance
  default for flexible tree construction.

- In PlanExecutor.run_strategize(), pass decision.parent_id from
  StrategizeResult so persisted decisions form a correct parent-child
  hierarchy instead of all having no parent (broken tree).

- Change chosen_option from step_text duplication to "Yes" which is
  semantically correct for stub decisions.

- Add missing step definitions matching Gherkin feature scenarios:
  dp_plan_with_definition, dp_empty_plan, dp_json_still_populated,
  and aliases for common plan definitions.

- Fix feature file template parameters ({decision_count}, {num_steps})
  that were unresolved - replaced with concrete values (2).

All existing behavior preserved as backward-compatible defaults.
Some checks failed
CI / helm (pull_request) Successful in 1m5s
CI / lint (pull_request) Failing after 1m35s
Required
Details
CI / build (pull_request) Successful in 1m31s
Required
Details
CI / push-validation (pull_request) Successful in 1m40s
CI / quality (pull_request) Successful in 2m2s
Required
Details
CI / security (pull_request) Successful in 2m7s
Required
Details
CI / tdd_quality_gate (pull_request) Successful in 2m8s
CI / typecheck (pull_request) Successful in 2m16s
Required
Details
CI / unit_tests (pull_request) Failing after 3m7s
Required
Details
CI / coverage (pull_request) Has been skipped
Required
Details
CI / docker (pull_request) Has been skipped
Required
Details
CI / e2e_tests (pull_request) Failing after 5m0s
CI / integration_tests (pull_request) Successful in 5m23s
Required
Details
CI / status-check (pull_request) Failing after 3s
This pull request has changes conflicting with the target branch.
  • CONTRIBUTORS.md
  • src/cleveragents/application/services/plan_executor.py
View command line instructions

Manual merge helper

Use this merge commit message when completing the merge manually.

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin fix/issue-10813-strategize-decision-persistence:fix/issue-10813-strategize-decision-persistence
git switch fix/issue-10813-strategize-decision-persistence
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!11138
No description provided.