BUG-HUNT: [concurrency] cost_budget_service.py check_budget_hierarchy + record_plan_cost TOCTOU race allows budget overrun #7493

Open
opened 2026-04-10 20:49:19 +00:00 by HAL9000 · 1 comment
Owner

Bug Report: Concurrency — TOCTOU Race in Budget Check + Record Allows Budget Overrun

Severity Assessment

  • Impact: Cost budget exceeded — multiple concurrent requests can each pass the budget check, causing total spend to exceed the configured limit
  • Likelihood: High — any concurrent multi-request scenario
  • Priority: Critical

Location

  • File: src/cleveragents/application/services/cost_budget_service.py
  • Functions: check_budget_hierarchy, record_plan_cost
  • Lines: 158–215 and 219–244
  • Category: concurrency

Description

check_budget_hierarchy and record_plan_cost are separate methods with separate lock acquisitions. Between the moment Thread A's check returns allowed=True and the moment it calls record_plan_cost, Thread B can also pass the same check against the same (not-yet-updated) balance. Both threads then record their costs, causing the total to exceed the configured budget.

Evidence

# These two method calls happen separately, with a window between them:
result = service.check_budget_hierarchy(cost=30)  # allowed=True (budget: $100, spent: $60)
# ← WINDOW: Thread B also calls check_budget_hierarchy(30) → also allowed=True
service.record_plan_cost(30)                       # spent=90
# Thread B calls record_plan_cost(30)              # spent=120 → BUDGET EXCEEDED

Each method acquires self._lock internally and releases it on return, so nothing holds the lock across the check and the subsequent record. Any other thread can run its own check inside that window.
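The interleaving above can be reproduced deterministically with a small stand-in. This is only a sketch: ToyBudgetService, check_budget, record_cost, and run_racy_pair are hypothetical names, not the real cost_budget_service.py API, and a threading.Barrier is used purely to force both checks to finish before either record runs.

```python
import threading


class ToyBudgetService:
    """Hypothetical stand-in for the real service (assumption, not the actual API)."""

    def __init__(self, limit: float, spent: float) -> None:
        self._lock = threading.Lock()
        self.limit = limit
        self.spent = spent

    def check_budget(self, cost: float) -> bool:
        # Lock is taken and released inside the check only.
        with self._lock:
            return self.spent + cost <= self.limit

    def record_cost(self, cost: float) -> None:
        # A second, independent lock acquisition.
        with self._lock:
            self.spent += cost


def run_racy_pair(service: ToyBudgetService, cost: float) -> float:
    """Run two workers that both check before either one records."""
    barrier = threading.Barrier(2)

    def worker() -> None:
        allowed = service.check_budget(cost)
        barrier.wait()  # both checks are complete once this returns
        if allowed:
            service.record_cost(cost)

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return service.spent


service = ToyBudgetService(limit=100.0, spent=60.0)
final_spent = run_racy_pair(service, 30.0)
print(final_spent)  # 120.0, above the limit of 100.0
```

Both workers see spent=60 during their check, so both are allowed, and the final total is 120 against a limit of 100. In production the same ordering depends on scheduler timing rather than a barrier, which is why the overrun is intermittent.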

Expected Behavior

Check and record should be atomic — no other thread should be able to change the budget state between them.

Actual Behavior

Concurrent calls can both pass the check and both record, causing budget overruns.

Suggested Fix

Add an atomic check_and_record_cost method that performs the check and the record under a single lock acquisition:

def check_and_record_cost(self, plan_id: str, cost: float) -> BudgetCheckResult:
    with self._lock:
        result = self._check_budget_hierarchy_locked(plan_id, cost)
        if result.allowed:
            self._record_plan_cost_locked(plan_id, cost)
        return result
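A self-contained sketch of this pattern follows, under stated assumptions: ToyBudgetService, the shape of BudgetCheckResult, and the _check_locked / _record_locked helpers are hypothetical stand-ins for the real classes, and the plan_id hierarchy is omitted for brevity. It shows that with a single lock acquisition only one of several concurrent requests is approved.

```python
import threading
from dataclasses import dataclass


@dataclass
class BudgetCheckResult:
    """Hypothetical result shape; the real class may carry more fields."""
    allowed: bool
    remaining: float


class ToyBudgetService:
    """Hypothetical stand-in for the real service (assumption, not the actual API)."""

    def __init__(self, limit: float, spent: float = 0.0) -> None:
        self._lock = threading.Lock()
        self._limit = limit
        self._spent = spent

    def _check_locked(self, cost: float) -> BudgetCheckResult:
        # Caller must already hold self._lock.
        remaining = self._limit - self._spent
        return BudgetCheckResult(allowed=cost <= remaining, remaining=remaining)

    def _record_locked(self, cost: float) -> None:
        # Caller must already hold self._lock.
        self._spent += cost

    def check_and_record_cost(self, cost: float) -> BudgetCheckResult:
        # Check and record happen under one lock acquisition.
        with self._lock:
            result = self._check_locked(cost)
            if result.allowed:
                self._record_locked(cost)
            return result

    @property
    def spent(self) -> float:
        with self._lock:
            return self._spent


service = ToyBudgetService(limit=100.0, spent=60.0)
outcomes = []
outcomes_lock = threading.Lock()


def worker() -> None:
    result = service.check_and_record_cost(30.0)
    with outcomes_lock:
        outcomes.append(result.allowed)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

approved = sum(outcomes)
print(approved, service.spent)  # 1 90.0
```

The _locked helpers matter because threading.Lock is not reentrant: if check_and_record_cost called the existing public methods while holding self._lock, and those methods acquire the same lock, the call would deadlock. Splitting out lock-free helpers (or switching to threading.RLock) avoids that.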

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Detection Pool | Agent: bug-hunt-pool-supervisor

HAL9000 added this to the v3.5.0 milestone 2026-04-10 21:39:19 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: High — Concurrency/data integrity bug in autonomy hardening components that impacts M6 milestone functionality
  • Milestone: v3.5.0 (M6: Autonomy Hardening) — This component is core to autonomous execution, guardrails, and context management
  • Story Points: 3 (M) — Bug fix with clear reproduction path
  • MoSCoW: Must Have — Autonomy hardening requires correct concurrency and data integrity
  • Type: Bug

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#7493