cleveragents/cleveragents-core

Fork 3

BUG-HUNT: [consistency] Redundant plan fetching in `try_auto_run` #1285

New issue

Open

opened 2026-04-02 08:48:50 +00:00 by freemo · 1 comment

freemo commented

2026-04-02 08:48:50 +00:00

Owner

Bug Report: [consistency] — Redundant plan fetching in `try_auto_run`

Severity Assessment

Impact: Low. The code is inefficient and harder to reason about. It could potentially mask bugs related to in-memory state not being persisted correctly, as it immediately re-fetches the state from the source of truth.
Likelihood: High. This code path is executed every time try_auto_run is called.
Priority: Low

Location

File: src/cleveragents/application/services/plan_lifecycle_service.py
Function/Class: PlanLifecycleService.try_auto_run
Lines: 1815-1816 and 1832-1833

Description

In the try_auto_run method, after calling a phase completion method (e.g., self.complete_strategize(plan_id)), the returned plan object is immediately discarded and re-fetched from the database/cache with plan = self.get_plan(plan_id).

This is redundant. The phase completion methods already return the updated plan object. The immediate re-fetch is unnecessary and makes the data flow confusing. If the completion method were to perform an in-memory modification without committing, this pattern would hide that bug by overwriting the modified object with a fresh copy.

Evidence

# Relevant code snippet showing the issue
# In try_auto_run:
# ...
            # complete_strategize() calls auto_progress() which may
            # transition the plan to Execute/QUEUED.
            plan = self.complete_strategize(plan_id) # plan object is returned...
            plan = self.get_plan(plan_id) # ...and then immediately overwritten

# ...

            # complete_execute() calls auto_progress() which may
            # transition the plan to Apply/QUEUED.
            plan = self.complete_execute(plan_id) # plan object is returned...
            plan = self.get_plan(plan_id) # ...and then immediately overwritten

Expected Behavior

The code should use the plan object returned by the phase completion methods directly. The redundant get_plan calls should be removed.

Actual Behavior

The code performs unnecessary database/cache lookups, making it less efficient and harder to understand.

Suggested Fix

Remove the redundant get_plan calls:

# In try_auto_run:
# ...
            plan = self.complete_strategize(plan_id)
            # plan = self.get_plan(plan_id) # REMOVE THIS LINE

# ...

            plan = self.complete_execute(plan_id)
            # plan = self.get_plan(plan_id) # REMOVE THIS LINE

Metadata

Branch: fix/consistency-try-auto-run-redundant-plan-fetch
Commit Message: fix(consistency): remove redundant get_plan calls after phase completion in try_auto_run
Milestone: v3.3.0
Parent Epic: #362

Subtasks

Write a TDD issue-capture Behave scenario tagged @tdd_issue, @tdd_expected_fail that asserts the plan object returned by complete_strategize is used directly (i.e., get_plan is not called immediately after)
Write a TDD issue-capture Behave scenario tagged @tdd_issue, @tdd_expected_fail that asserts the plan object returned by complete_execute is used directly (i.e., get_plan is not called immediately after)
Remove the redundant plan = self.get_plan(plan_id) line at line 1816 in plan_lifecycle_service.py
Remove the redundant plan = self.get_plan(plan_id) line at line 1833 in plan_lifecycle_service.py
Remove the @tdd_expected_fail tags from both TDD scenarios once the fix is in place and verify they pass
Run nox -e typecheck to confirm no type regressions
Run nox -e unit_tests to confirm all Behave tests pass
Run nox -e coverage_report to confirm coverage remains ≥ 97%
Run nox (all default sessions) and fix any errors

Definition of Done

Both @tdd_expected_fail TDD capture scenarios are committed first and demonstrate the redundant get_plan calls
The redundant plan = self.get_plan(plan_id) call after complete_strategize is removed
The redundant plan = self.get_plan(plan_id) call after complete_execute is removed
Both TDD scenarios pass without @tdd_expected_fail after the fix
No regressions in existing try_auto_run or plan lifecycle tests
All nox stages pass (nox -e lint, nox -e typecheck, nox -e unit_tests, nox -e integration_tests, nox -e coverage_report)
Coverage >= 97%
A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.

## Bug Report: [consistency] — Redundant plan fetching in `try_auto_run` ### Severity Assessment - **Impact**: Low. The code is inefficient and harder to reason about. It could potentially mask bugs related to in-memory state not being persisted correctly, as it immediately re-fetches the state from the source of truth. - **Likelihood**: High. This code path is executed every time `try_auto_run` is called. - **Priority**: Low ### Location - **File**: `src/cleveragents/application/services/plan_lifecycle_service.py` - **Function/Class**: `PlanLifecycleService.try_auto_run` - **Lines**: 1815-1816 and 1832-1833 ### Description In the `try_auto_run` method, after calling a phase completion method (e.g., `self.complete_strategize(plan_id)`), the returned `plan` object is immediately discarded and re-fetched from the database/cache with `plan = self.get_plan(plan_id)`. This is redundant. The phase completion methods already return the updated plan object. The immediate re-fetch is unnecessary and makes the data flow confusing. If the completion method were to perform an in-memory modification without committing, this pattern would hide that bug by overwriting the modified object with a fresh copy. ### Evidence ```python # Relevant code snippet showing the issue # In try_auto_run: # ... # complete_strategize() calls auto_progress() which may # transition the plan to Execute/QUEUED. plan = self.complete_strategize(plan_id) # plan object is returned... plan = self.get_plan(plan_id) # ...and then immediately overwritten # ... # complete_execute() calls auto_progress() which may # transition the plan to Apply/QUEUED. plan = self.complete_execute(plan_id) # plan object is returned... plan = self.get_plan(plan_id) # ...and then immediately overwritten ``` ### Expected Behavior The code should use the `plan` object returned by the phase completion methods directly. The redundant `get_plan` calls should be removed. ### Actual Behavior The code performs unnecessary database/cache lookups, making it less efficient and harder to understand. ### Suggested Fix Remove the redundant `get_plan` calls: ```python # In try_auto_run: # ... plan = self.complete_strategize(plan_id) # plan = self.get_plan(plan_id) # REMOVE THIS LINE # ... plan = self.complete_execute(plan_id) # plan = self.get_plan(plan_id) # REMOVE THIS LINE ``` ### Category consistency --- ## Metadata - **Branch**: `fix/consistency-try-auto-run-redundant-plan-fetch` - **Commit Message**: `fix(consistency): remove redundant get_plan calls after phase completion in try_auto_run` - **Milestone**: v3.3.0 - **Parent Epic**: #362 ## Subtasks - [ ] Write a TDD issue-capture Behave scenario tagged `@tdd_issue`, `@tdd_expected_fail` that asserts the `plan` object returned by `complete_strategize` is used directly (i.e., `get_plan` is not called immediately after) - [ ] Write a TDD issue-capture Behave scenario tagged `@tdd_issue`, `@tdd_expected_fail` that asserts the `plan` object returned by `complete_execute` is used directly (i.e., `get_plan` is not called immediately after) - [ ] Remove the redundant `plan = self.get_plan(plan_id)` line at line 1816 in `plan_lifecycle_service.py` - [ ] Remove the redundant `plan = self.get_plan(plan_id)` line at line 1833 in `plan_lifecycle_service.py` - [ ] Remove the `@tdd_expected_fail` tags from both TDD scenarios once the fix is in place and verify they pass - [ ] Run `nox -e typecheck` to confirm no type regressions - [ ] Run `nox -e unit_tests` to confirm all Behave tests pass - [ ] Run `nox -e coverage_report` to confirm coverage remains ≥ 97% - [ ] Run `nox` (all default sessions) and fix any errors ## Definition of Done - [ ] Both `@tdd_expected_fail` TDD capture scenarios are committed first and demonstrate the redundant `get_plan` calls - [ ] The redundant `plan = self.get_plan(plan_id)` call after `complete_strategize` is removed - [ ] The redundant `plan = self.get_plan(plan_id)` call after `complete_execute` is removed - [ ] Both TDD scenarios pass without `@tdd_expected_fail` after the fix - [ ] No regressions in existing `try_auto_run` or plan lifecycle tests - [ ] All nox stages pass (`nox -e lint`, `nox -e typecheck`, `nox -e unit_tests`, `nox -e integration_tests`, `nox -e coverage_report`) - [ ] Coverage >= 97% - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done.

freemo added the

labels

2026-04-02 08:50:42 +00:00

freemo added this to the v3.3.0 milestone

2026-04-02 08:50:52 +00:00

freemo commented

2026-04-02 08:50:58 +00:00

Author

Owner

Dependency Note: This issue is a child of Epic #362 (Security & Safety Hardening). Per the project's ticket hierarchy, this issue blocks #362 — the Epic cannot be considered complete until this child issue is resolved and merged.

**Dependency Note:** This issue is a child of Epic #362 (Security & Safety Hardening). Per the project's ticket hierarchy, this issue **blocks** #362 — the Epic cannot be considered complete until this child issue is resolved and merged.

freemo self-assigned this

2026-04-02 18:45:26 +00:00

freemo removed the