feat(governance): implement cost and session budget tracking and enforcement #10586
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
overdue
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!10586
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feat/v360/cost-session-budget"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Epic Reference
This PR is part of Epic #8569 - Cost Budgets, Safety Profiles & Plugin Architecture (v3.6.0).
[Body updated - epic reference added]
Closes #8569
Implementation Attempt — Tier 1: haiku — In Progress
Fixed the following issues in PR #10586:
Lint Issues Fixed:
OrgCostAccumulatorfrom cost_tracking.pyandTest Issues Fixed:
Quality Gates Status:
Changes committed and force-pushed to pr-10586.
Automated by CleverAgents Bot
Supervisor: Implementation | Agent: implementation-worker
PR Review — #10586: feat(governance): implement cost and session budget tracking and enforcement
CI GATE — BLOCKING
This PR has 3 failing CI checks that must be resolved before this PR can be approved:
Per company policy, all CI gates (lint, typecheck, security, unit_tests, coverage) must pass before a PR can be approved and merged. Please fix the root causes and push corrections.
PR QUALITY — BLOCKING
CODE ISSUES — REQUEST_CHANGES
1. cost_tracking.py:
enforce_budget()— Incorrect budget_type labels (Bug)In
enforce_budget(), when the session budget is exceeded,budget_typeis set to"plan":But it is set to
"daily"when the org budget is exceeded:The issue #8609 describes a three-tier hierarchy: plan, session, organization. But this code only distinguishes "plan" (session) and "daily" (org). This is inconsistent with the domain model. Either:
"session"and"org"to match the acceptance criteria, orPlease fix the budget_type values to match the actual tier they represent.
2. cost_tracking.py:
check_budget()— Overly complex conditional (Readability)The warning flag computation is needlessly nested:
This is 3 levels of nesting. Please simplify into a clear if/else block.
3. cost_tracking.py:
check_budget()— Redundant utilization calculationThe method calls
utilization()up to 3 times on the same object (once for exceeded check, once for warning in the same tier, once at the end). Sinceutilization()may involve computation, cache the result.4. cost_tracking.py:
get_budget_status()— Unused import (Lint)ThreadSafeOrgCostAccumulatoris imported but the parameter usesThreadSafeOrgCostAccumulator | Nonewhich is fine. Howeverthreadingis imported to useself._lockin__init__but_lockis never used anywhere in the class. Either use the lock (e.g., in record_cost) or remove the import.5. cost_budget_tracking_enforcement_steps.py — Missing docstrings on step functions
All step functions need docstrings for consistency (the file already has some, but the
given("no session budget is configured")function at line ~242 is missing one).SUGGESTIONS
Magic numbers — The value
0.9for the 90% warning threshold appears as a literal. Consider extracting to a constant likeWARNING_THRESHOLD = 0.9.Thread safety claim vs reality — The class docstring states thread safety for org accumulators, but the instance
self._lockis never used in any method. If this is intended, the lock should be acquired inrecord_costandenforce_budget. If not, remove the threading import and thread safety claims.Commit message in Metadata section — Issue #8609 specifies only
featandgovernanceas separate fields, but Conventional Changelog requires a full first line. Ensurefeat(governance): ...is used as the complete commit message first line.Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +239,4 @@def step_verify_exhaustion_event(context: object) -> None:"""Verify exhaustion event was recorded."""assert len(context.cost_metadata.budget_exhaustion_events) > 0event = context.cost_metadata.budget_exhaustion_events[0]Step function is missing a docstring. All step functions should have docstrings for consistency.
@ -0,0 +1,257 @@"""Cost tracking and budget enforcement service.The session budget is exceeded but budget_type is set to plan. This should be session. Similarly, org budget exceeded uses daily but should be org. Inconsistent with the three-tier hierarchy described in the issue.
@ -0,0 +16,4 @@BudgetLevel,OrgCostAccumulator,SessionCostBudget,ThreadSafeOrgCostAccumulator,self._lockis initialized but never used in any method. Either use it (acquire in record_cost/enforce_budget) or remove the threading import and lock initialization.@ -0,0 +77,4 @@reason=(f"Session budget of ${session_budget.max_cost_usd:.2f} "f"would be exceeded (current: ${session_budget.total_cost:.2f}, "f"proposed: ${proposed_cost:.2f})"Overly complex ternary conditional. Please simplify into a clear if/else block for readability.
@ -0,0 +85,4 @@# Check org budgetif org_accumulator is not None:if org_accumulator.would_exceed(proposed_cost):utilization = org_accumulator.utilization()utilization() is called up to 3 times on the same budget object without caching the result. Please cache to avoid redundant computation.
Automated by CleverAgents Bot | PR Review
Reviewed PR #10586 with
REQUEST_CHANGESstatus.View review →
REVIEW_BODY_PLACEHOLDER
BLOCKING - budget_type values do not match the BudgetLevel domain model.
PR Review - #10586
BLOCKING - Missing milestone: PR has no milestone but issue #8609 is in v3.6.0
BLOCKING - CI fails on 3 checks: lint, unit_tests, status-check
See inline comments for code issues.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
BLOCKING - Type mismatch: feature expects exceeded_level == org (String) but check_budget returns BudgetLevel.ORG (Enum). These do not compare equal in Python 3.11+. This is likely the root cause of the unit_tests CI failure.
BLOCKING - Missing docstring on step_no_session_budget at line 242. All other step functions have docstrings.
BLOCKING - budget_type values do not match the BudgetLevel domain model. Line 197 uses plan instead of BudgetLevel.SESSION. Line 208 uses daily instead of BudgetLevel.ORG. The BDD test Three-tier budget hierarchy expects exceeded_level to be ORG. Fix both to use BudgetLevel enum values.
Suggestion - Refactor the nested ternary at line 84 to a clear if/else block for readability.
Suggestion - utilization() is called up to 3 times. Cache the result in a local variable and reuse it.
BLOCKING - self._lock = threading.Lock() on line 44 is never used in any method. Either use it or remove it. This is likely the lint failure.
Suggestion - Extract the magic number 0.9 on line 85 as a module-level constant WARNING_THRESHOLD = 0.9.
Suggestion - get_budget_status() at line 222 is a public method (in all) but missing a docstring.
PR Review — #10586: feat(governance): implement cost and session budget tracking and enforcement
Request Changes Submitted — Review Details
Status: REQUEST_CHANGES
Review ID: 6854
View review on Forgejo
BLOCKING Issues
Suggestions (non-blocking)
Checklist Summary
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
Implementation Attempt -- Tier -1: gpt5-mini -- Success
Fixed all issues identified in the PR review for #10586:
Code Fixes
Test Fixes
Files Changed
Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor
The CI checks for this PR are currently failing (lint, unit_tests, status-check). Per company policy, all CI gates (lint, typecheck, security, unit_tests, coverage) must pass before approval. Please address the CI failures and ensure all previous review comments have been resolved.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
Fix Applied
Resolved all blocking review comments from PR #10586:
self._lockfrom CostTrackingService.init()Remaining: Labels need manual application via forgejo-label-manager (API blocked by permissions).
Automated by CleverAgents Bot | Agent: task-implementor
Re-Review — PR #10586: feat(governance): implement cost and session budget tracking and enforcement
Previous Feedback — Addressed Items
The following blocking issues raised in prior reviews have been successfully resolved:
✅ Missing milestone — PR is now correctly assigned to v3.6.0.
✅ budget_type mismatch (hardcoded "plan"/"daily") — Now correctly uses
BudgetLevel.SESSION.valueandBudgetLevel.ORG.valueinenforce_budget().✅ BudgetExhaustionEvent validator — Now accepts
{"plan", "session", "org"}instead of{"plan", "daily"}.✅ Unused
threading.Lock— Removed fromCostTrackingService.__init__().✅ Magic number 0.9 — Extracted to
WARNING_THRESHOLD = 0.9module-level constant.✅ Nested ternary in
check_budget()— Replaced with clear if/else blocks.✅ CHANGELOG.md — Entry added under
[Unreleased].✅ CONTRIBUTORS.md — Updated with cost budget tracking contribution.
✅ Docstrings on step functions — Most step functions now have docstrings.
Remaining Blocking Issues
CI is still failing on 3 checks: lint, unit_tests, status-check. Two root causes have been identified:
BLOCKING 1 — Triple blank line (E303 lint failure)
In
features/steps/cost_budget_tracking_enforcement_steps.py, there are 3 blank lines between the end ofstep_record_negative_tokensand the start ofstep_configure_org_budget(lines 332–334). Python/ruff enforces a maximum of 2 blank lines between top-level definitions (E303). This is the root cause of the lint CI failure.Fix: Remove one of the three blank lines so only 2 remain between those functions.
BLOCKING 2 — Three-tier budget hierarchy step uses hardcoded
Nonefor org_accumulatorIn
step_check_budget(line 186–193), theorg_accumulatoris hardcoded asNone:The "Three-tier budget hierarchy" scenario sets
context.org_accumulatorviastep_configure_org_budget, butstep_check_budgetnever reads it. As a result, the check only evaluates the session budget ($50 limit, $40 spent, $15 proposed → $55 > $50 → fails at SESSION level), NOT the org budget. But the scenario expects the check to fail due to the organization budget.This is the root cause of the unit_tests CI failure. The step must be updated to pass
context.org_accumulatorwhen it is set:Additionally, the Three-tier scenario spends $40 at session level AND $80 at org level. The session budget is $50 so after $40 spent, the $15 proposal would exceed the session budget ($40+$15=$55 > $50) before checking the org budget. To ensure the failure is due to org (not session), the scenario should either use a session budget large enough that only the org is exceeded first, e.g., a $200 session budget with the $80 org spend and $15 proposal ($80+$15=$95 < $100 org limit, though... that wouldn't fail either). The scenario logic needs to be re-thought: with $40 session spent + $15 proposed = $55, which exceeds the $50 session limit. The test cannot reach the org budget check because the session budget is hit first. The fix is to either increase the session budget limit in this scenario or reduce the amounts so only the org limit is exceeded.
BLOCKING 3 —
BudgetExceededErrornot in__all__BudgetExceededErroris defined insrc/cleveragents/core/exceptions.pyafter the__all__list and is therefore not exported from the module. All other exception classes in this file are listed in__all__. This breaks the public API contract and is an oversight from the initial implementation.Fix: Add
"BudgetExceededError"to the__all__list inexceptions.py.Additional Non-Blocking Issues
SUGGESTION — Feature scenario text inconsistency
In
features/cost_budget_tracking_enforcement.feature, the "Enforce budget limit when exceeded" scenario reads:But the step
step_verify_error_typeassertscontext.budget_error.budget_type == BudgetLevel.SESSION.value. The feature text says "plan" but the domain concept being tested is "session". This is misleading Gherkin — the living documentation says "plan" but means "session". The feature step text should be updated to"the error should indicate session budget exceeded"(and the step decorator updated to match) for clarity.SUGGESTION —
OrgCostAccumulatorimported but only used to constructThreadSafeOrgCostAccumulatorIn
cost_budget_tracking_enforcement_steps.py,OrgCostAccumulatoris imported for use instep_configure_org_budget. This is correct usage, though it's worth noting the intent is to wrap it immediately inThreadSafeOrgCostAccumulator. No change required.CI Summary
Overall Assessment
Good progress has been made — all previously-raised blocking issues have been addressed. The core implementation logic in
cost_tracking.pyis clean, well-structured, and correctly uses theBudgetLevelenum. The exceptions, CHANGELOG, and milestone have all been properly updated. Three targeted fixes remain before this PR can be approved:step_check_budgetto usecontext.org_accumulatorand re-think the Three-tier scenario amountsBudgetExceededErrorto__all__inexceptions.pyAutomated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +20,4 @@And another LLM call costs $2.50Then the second call should raise BudgetExceededErrorAnd the error should indicate plan budget exceededSUGGESTION (non-blocking): The step text says 'the error should indicate plan budget exceeded' but the step definition asserts
budget_type == BudgetLevel.SESSION.value('session'). The Gherkin text is misleading — it says 'plan' but tests 'session'. Update the feature text to match the domain concept:(And update the
@thendecorator in the step file to match.)Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +185,4 @@@when("a check for ${cost} cost is performed")def step_check_budget(context: object, cost: str) -> None:"""Check if cost fits in budget."""BLOCKING (unit_tests CI failure): This step hardcodes
Nonefororg_accumulator, which means the 'Three-tier budget hierarchy' scenario cannot test the organization budget path — the org accumulator stored incontext.org_accumulatoris silently ignored.Additionally, with a $50 session budget, $40 already spent, and $15 proposed: the session budget check fires first ($40+$15=$55 > $50) before the org budget is ever checked. The scenario expects failure due to ORG budget, but this logic will always fail at SESSION first. The scenario amounts need adjustment so only the org limit is exceeded.
Fix:
Also update the Three-tier scenario in the feature file to use a session budget large enough that it isn't hit first (e.g.,
a session budget of $200.00andan organization budget of $100.00, with$80.00 has been spent at organization levelanda check for $25.00 cost is performed— this would make $80+$25=$105 > $100 org limit while session is unaffected).Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +329,4 @@except ValueError as e:context.error_raised = Truecontext.error = eBLOCKING (E303 lint failure): There are 3 consecutive blank lines here between
step_record_negative_tokensandstep_configure_org_budget. Python/ruff only allows 2 blank lines between top-level definitions. Remove one of these blank lines.Fix: Delete one of the three blank lines so only 2 remain.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -357,1 +357,4 @@]class BudgetExceededError(DomainError):BLOCKING:
BudgetExceededErroris defined after the__all__list and is not included in it. All other exceptions in this module are exported via__all__. This breaks the public API contract for this module.Fix: Add
"BudgetExceededError"to the__all__list defined above. The list is alphabetically ordered — insert it between"ActorConfigurationError"and"AutoMergeError"(or wherever it fits alphabetically).Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
Automated by CleverAgents Bot | PR Review
Re-reviewed PR #10586 with
REQUEST_CHANGESstatus.Review ID: 7855
Prior feedback addressed (8/8): milestone, budget_type values, validator, unused lock, magic number, nested ternary, CHANGELOG, CONTRIBUTORS.
3 new blocking issues found:
step_check_budgetignorescontext.org_accumulator— Three-tier scenario broken (unit test failure)BudgetExceededErrornot in__all__(API contract broken)View review →
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
Re-Review — PR #10586: feat(governance): implement cost and session budget tracking and enforcement
Review of Current State (HEAD:
a9692c6a)This is a re-review of the current head commit. The most recent commit (
a9692c6a fix(governance): correct budget_type docs, remove unused lock and lint fixes) addressed CHANGELOG, CONTRIBUTORS, and some refactoring ofcost_tracking.py. However, all three blocking issues identified in the previous review (ID 7855, submitted today at the same SHA) remain unresolved.✅ Previously Addressed Items (Confirmed Still Present)
The following items that were resolved in earlier rounds continue to be correctly addressed:
budget_typelabels now useBudgetLevel.SESSION.value/BudgetLevel.ORG.valueinenforce_budget()BudgetExhaustionEventvalidator accepts{"plan", "session", "org"}threading.Lockremoved fromCostTrackingService.__init__()WARNING_THRESHOLD = 0.9extracted as a module-level constantcheck_budget()replaced with clear if/else blocks[Unreleased]❌ Blocking Issues — ALL THREE Remain Unresolved
BLOCKING 1 — E303: Triple blank line in steps file (lint failure)
features/steps/cost_budget_tracking_enforcement_steps.pystill has 3 consecutive blank lines (lines 332, 333, 334) between the end ofstep_record_negative_tokens(which ends at line 331) and the@given("an organization budget of ${amount}")decorator at line 335.Python/ruff E303 enforces a maximum of 2 blank lines between top-level definitions. This is the root cause of the lint CI failure.
Fix: Delete one of the three blank lines (line 332, 333, or 334) so only 2 remain between those two functions.
BLOCKING 2 —
step_check_budgetstill ignorescontext.org_accumulator(unit_tests failure)In
features/steps/cost_budget_tracking_enforcement_steps.pyat line 186–192,step_check_budgetstill hardcodesNonefororg_accumulator:This means the "Three-tier budget hierarchy" scenario cannot exercise the org budget path. The
context.org_accumulatorset up bystep_configure_org_budgetis silently discarded.Additionally, the Three-tier scenario amounts cause the session budget to be exceeded before the org budget is checked. The scenario spends $40 at session level AND $80 at org level. But both
step_spend_at_session_levelandstep_spend_at_org_levelcallrecord_costwith BOTHcontext.session_budgetandcontext.org_accumulator, so after spending $40 and then $80, the session budget ($50 limit) is already exceeded ($40+$80=$120 > $50). A subsequent$15check would fail at SESSION level, not ORG level — which contradicts the step assertions expectingexceeded_level == "org".Fix (step):
Fix (scenario amounts): The feature scenario needs amounts that only exceed the org budget, not the session budget. For example:
$200.00(large enough not to be exceeded)$100.00$10.00$92.00$15.00→ org total would be $10+$92+$15=$117 > $100 org limit, while session $10+$15=$25 < $200Note: The
step_spend_at_session_levelandstep_spend_at_org_levelboth record to BOTH accumulators, so care must be taken to choose amounts that only cross the org limit.BLOCKING 3 —
BudgetExceededErrornot in__all__(public API contract broken)In
src/cleveragents/core/exceptions.py, the__all__list ends at line 358 with"ValidationError". TheBudgetExceededErrorclass is defined at line 360, after the__all__list, and is not included in it. All other exception classes in this module are exported via__all__. This omission breaks the public API contract for the exceptions module.Fix: Add
"BudgetExceededError"to the__all__list. The list is in alphabetical order — it should be inserted between"AuthenticationError"and"AuthorizationError"(B comes after Au, Bud... fits between Auth... and Auto...). Actually since "Budget" < "Business", it fits between"AuthorizationError"and"BusinessRuleViolation". The corrected list should include:CI Status
step_check_budgetignoresorg_accumulator; Three-tier scenario assertion failsNon-Blocking Suggestion (Carried Over)
Gherkin text inconsistency — The "Enforce budget limit when exceeded" scenario in
features/cost_budget_tracking_enforcement.feature(line 22) still reads:But the step function
step_verify_error_typeassertsbudget_type == BudgetLevel.SESSION.value(i.e.,"session"). The living documentation says "plan" but the domain concept being tested is "session". This is a non-blocking suggestion: update the feature text to"the error should indicate session budget exceeded"(and update the@thendecorator to match) so that the Gherkin accurately describes the behavior.Summary
Three targeted fixes are needed before this PR can be approved:
step_check_budgetto usegetattr(context, "org_accumulator", None)AND fix the Three-tier scenario amounts"BudgetExceededError"to__all__inexceptions.pyThe core implementation in
cost_tracking.pyremains clean and well-structured. These are all test/metadata issues.Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +19,4 @@When an LLM call costs $3.00And another LLM call costs $2.50Then the second call should raise BudgetExceededErrorAnd the error should indicate plan budget exceededSUGGESTION (non-blocking — carried over): The step text says "the error should indicate plan budget exceeded" but the step function
step_verify_error_typeassertsbudget_type == BudgetLevel.SESSION.value(i.e."session"). The Gherkin living documentation says "plan" but the domain concept being tested is "session".Suggestion: Update the feature text on this line to:
And update the
@thendecorator in the step file to@then("the error should indicate session budget exceeded")so that Gherkin accurately describes the behavior.Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +185,4 @@@when("a check for ${cost} cost is performed")def step_check_budget(context: object, cost: str) -> None:"""Check if cost fits in budget."""BLOCKING (unit_tests failure — still present):
step_check_budgetstill hardcodesNonefororg_accumulator(line 190). Thecontext.org_accumulatorconfigured bystep_configure_org_budgetis silently ignored, making the Three-tier budget hierarchy scenario unable to test the org budget path.Additionally, the Three-tier scenario amounts cause the session budget to be exceeded BEFORE the org budget is checked:
step_spend_at_session_levelandstep_spend_at_org_levelboth record to both accumulators, so after $40 + $80 = $120 total, the $50 session limit is already exceeded. The $15 check would fail at SESSION level, not ORG level.Fix (step):
Fix (scenario): Adjust Three-tier scenario amounts so only the org limit (not the session limit) is crossed. Example: session=$200, org=$100, spend $10 session + $92 org + check $15 (org total $117 > $100, session $25 < $200).
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +329,4 @@except ValueError as e:context.error_raised = Truecontext.error = eBLOCKING (E303 lint failure — still present): Lines 332, 333, and 334 are all blank, giving 3 consecutive blank lines between
step_record_negative_tokens(ends at line 331) and the@given("an organization budget of ${amount}")decorator at line 335. Python/ruff E303 enforces max 2 blank lines between top-level definitions.Fix: Delete one of lines 332–334 so only 2 blank lines remain.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -357,1 +357,4 @@]class BudgetExceededError(DomainError):BLOCKING (API contract — still present):
BudgetExceededErroris defined AFTER the__all__list and is NOT included in it. All other exception classes in this file are exported via__all__. This omission breaks the public API contract for the exceptions module.Fix: Add
"BudgetExceededError"to the__all__list in alphabetical order, between"AuthorizationError"and"BusinessRuleViolation":Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
Automated by CleverAgents Bot | PR Review
Re-reviewed PR #10586 with
REQUEST_CHANGESstatus.Review ID: 7867
All 3 blocking issues from review 7855 remain unresolved:
step_check_budgetignorescontext.org_accumulator— Three-tier scenario broken + unit_tests CI failureBudgetExceededErrornot in__all__inexceptions.py— public API contract brokenView review →
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
Re-Review — PR #10586: feat(governance): implement cost and session budget tracking and enforcement
No New Commits Since Last Review
The current HEAD is
a9692c6a— the same commit that was reviewed in review #7867 (submitted earlier today at 10:23 UTC). No new commits have been pushed to this branch since the last review. All three blocking issues identified in review #7867 carry over unchanged and are confirmed still present in the codebase.✅ Previously Resolved Items (Confirmed Still Present)
All items resolved in prior rounds remain correctly addressed:
budget_typelabels useBudgetLevel.SESSION.value/BudgetLevel.ORG.valueinenforce_budget()BudgetExhaustionEventvalidator accepts{"plan", "session", "org"}threading.Lockremoved fromCostTrackingService.__init__()WARNING_THRESHOLD = 0.9extracted as module-level constantcheck_budget()replaced with clear if/else blocks[Unreleased]❌ Blocking Issues — All Three Still Unresolved
CI continues to fail on lint, unit_tests, and status-check. The three root causes are unchanged.
BLOCKING 1 — E303: Triple blank lines in steps file (lint failure)
features/steps/cost_budget_tracking_enforcement_steps.pylines 332–334 are three consecutive blank lines between the end ofstep_record_negative_tokens(line 331) and the@given("an organization budget of ${amount}")decorator (line 335). Python/ruff E303 enforces a maximum of 2 blank lines between top-level definitions.Fix: Delete one of lines 332, 333, or 334 so only 2 blank lines remain.
BLOCKING 2 —
step_check_budgetignorescontext.org_accumulator(unit_tests failure)step_check_budget(line 187–192) hardcodesNonefororg_accumulator:The "Three-tier budget hierarchy" scenario sets
context.org_accumulatorviastep_configure_org_budget, but this step never reads it. The org budget path is never exercised.Furthermore, the scenario amounts cause the session budget to be exceeded before the org budget is evaluated, so the assertion
exceeded_level == "org"will always fail even after fixing the step.Fix (step):
Fix (scenario amounts): Choose amounts where only the org budget is exceeded. For example:
$200.00(large — will not be exceeded)$100.00$10.00$92.00$15.00→ org total: $10+$92+$15=$117 > $100 (org limit exceeded) while session: $10+$15=$25 < $200 (session OK)BLOCKING 3 —
BudgetExceededErrornot in__all__(public API contract broken)BudgetExceededErroris defined insrc/cleveragents/core/exceptions.pyat line 360, after the__all__list closes at line 357. It is not included in__all__, breaking the public API contract. All other exceptions in this file are exported via__all__.Fix: Add
"BudgetExceededError"to the__all__list. It belongs between"AuthorizationError"and"BusinessRuleViolation"(alphabetical order):Non-Blocking Suggestion (Carried Over)
Gherkin text inconsistency —
features/cost_budget_tracking_enforcement.featureline 22 reads:But
step_verify_error_type(line 130) assertsbudget_type == BudgetLevel.SESSION.value(i.e.,"session"). The living documentation says "plan" but the domain concept being tested is "session". Suggestion: update the feature text and the@thendecorator to"the error should indicate session budget exceeded".CI Summary
step_check_budgetignoresorg_accumulator; Three-tier scenario failsSummary
No new commits have been pushed since the last review. The three targeted fixes remain pending:
step_check_budgetto usegetattr(context, "org_accumulator", None)AND fix the Three-tier scenario amounts to only exceed the org limit"BudgetExceededError"to__all__inexceptions.py(between"AuthorizationError"and"BusinessRuleViolation")The core implementation in
cost_tracking.pyis clean and well-structured. These are all test/metadata issues that are straightforward to fix.Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +188,4 @@"""Check if cost fits in budget."""cost_amount = float(cost)context.budget_check = context.cost_service.check_budget(context.session_budget, None, cost_amountBLOCKING —
org_accumulatorignored (unit_tests failure)step_check_budgetpassesNonehardcoded fororg_accumulator. When the "Three-tier budget hierarchy" scenario setscontext.org_accumulatorviastep_configure_org_budget, this step silently discards it — the org budget path is never exercised and theexceeded_level == "org"assertion fails.Fix (step):
Fix (scenario amounts): Also update the Three-tier scenario to use amounts where only the org budget is exceeded (e.g., session budget
$200.00, org budget$100.00, spend$10.00at session level +$92.00at org level, then check$15.00).@ -0,0 +331,4 @@context.error = eBLOCKING — E303: Triple blank lines (lint failure)
Lines 332, 333, and 334 are all empty — three consecutive blank lines between the end of
step_record_negative_tokensand the@given("an organization budget of ${amount}")decorator. Python/ruff E303 enforces a maximum of 2 blank lines between top-level definitions. This is the root cause of the lint CI failure.Fix: Delete one of these three blank lines so only 2 remain.
BLOCKING —
BudgetExceededErrornot in__all__(public API broken)The
__all__list closes here at]butBudgetExceededError(defined below at line 360) is not included. All other exception classes in this module are exported via__all__. This omission breaks the public API contract.Fix: Add
"BudgetExceededError"to__all__in alphabetical order between"AuthorizationError"and"BusinessRuleViolation":Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
a9692c6ad1b7ceaca8a5Re-Review — PR #10586: feat(governance): implement cost and session budget tracking and enforcement
HEAD:
b7ceaca8— New Commit Since Last ReviewThis is a re-review of HEAD
b7ceaca8(a new commit pushed since the last review ata9692c6a). The CI ran again on 2026-05-08 and now shows additional failures:typecheckandhelmare failing alongside the previously-identifiedlint,unit_tests, andstatus-check.✅ Previously Addressed Items (Confirmed Still Present)
All items resolved in prior rounds remain correctly addressed:
budget_typelabels useBudgetLevel.SESSION.value/BudgetLevel.ORG.valueinenforce_budget()BudgetExhaustionEventvalidator accepts{"plan", "session", "org"}threading.Lockremoved fromCostTrackingService.__init__()WARNING_THRESHOLD = 0.9extracted as module-level constantcheck_budget()replaced with clear if/else blocks[Unreleased]CostTrackingServiceimplementation incost_tracking.pyis clean and well-structured❌ Blocking Issues
Three issues from previous reviews remain unresolved, and CI now also shows two additional failing checks.
BLOCKING 1 — E303: Triple blank line in steps file (lint CI failure)
features/steps/cost_budget_tracking_enforcement_steps.pylines 332, 333, and 334 are three consecutive blank lines between the end ofstep_record_negative_tokens(line 331) and the@given("an organization budget of ${amount}")decorator (line 335). Python/ruff E303 enforces a maximum of 2 blank lines between top-level definitions. This is the root cause of theCI / lintfailure.Fix: Delete one of lines 332, 333, or 334 so that exactly 2 blank lines remain.
BLOCKING 2 —
step_check_budgetignorescontext.org_accumulator+ broken Three-tier scenario amounts (unit_tests CI failure)In
features/steps/cost_budget_tracking_enforcement_steps.pyat lines 186–192,step_check_budgethardcodesNonefororg_accumulator:This means the Three-tier budget hierarchy scenario cannot reach the org budget check path. The
context.org_accumulatorset bystep_configure_org_budgetis silently discarded.Additionally, the scenario amounts are logically flawed. Both
step_spend_at_session_levelandstep_spend_at_org_levelcallrecord_costwith BOTHcontext.session_budgetANDcontext.org_accumulator. So with the current amounts:The scenario asserts
exceeded_level == "org"but the code would fail at SESSION level first (even if the step were fixed to passorg_accumulator).Fix (step — lines 186–192):
Fix (Three-tier scenario amounts in
features/cost_budget_tracking_enforcement.feature):With these amounts: after both spends, session=$92 (<$200 OK), org=$92 (<$100 OK). Check $15: session would be $107 (<$200 OK), org would be $107 (>$100 EXCEEDED). The org budget fails first, correctly exercising the org budget path.
BLOCKING 3 —
BudgetExceededErrornot in__all__inexceptions.py(public API contract broken)In
src/cleveragents/core/exceptions.py, the__all__list is defined at lines 328–357 and ends with"ValidationError". TheBudgetExceededErrorclass is defined at line 360 — after the closing]of__all__— and is therefore not exported. All other exceptions in this file are in__all__.Fix: Add
"BudgetExceededError"to the__all__list in alphabetical order between"AuthorizationError"and"BusinessRuleViolation":BLOCKING 4 — NEW:
typecheckCI failing (introduced by this PR)CI / typecheckpasses on master (1m26s) but fails on this PR (13m2s before failure). Since all three commits in this PR add new Python code, the Pyright type errors are introduced by this PR. The author must inspect the CI logs or runnox -s typechecklocally to identify and fix all Pyright errors.Possible causes to investigate:
BudgetExceededErrorclass defined after__all__could cause import-order issuescost_metadata.budget_remaining = session_budget.remaining()— ifremaining()returnsfloat | Nonewhilebudget_remainingfield isfloat | None, this should be fine, but Pyright might still flag it)from __future__ import annotationsin some file (thoughcost_tracking.pyandcost_metadata.pyboth have it)BLOCKING 5 — NEW:
helmCI failing (introduced by this PR)CI / helmpasses on master (39s) but fails on this PR (12m54s — an unusually long runtime before failure suggesting a timeout). Since this PR only touches Python files (cost_tracking.py,exceptions.py,cost_metadata.py, step files, feature file, CHANGELOG, CONTRIBUTORS), the helm failure is unexpected.The author should inspect the helm CI logs. If the failure is confirmed to be a transient infrastructure issue unrelated to this PR's changes, please post a comment documenting this finding so it can be tracked separately.
Non-Blocking Suggestion (Carried Over)
Gherkin text inconsistency —
features/cost_budget_tracking_enforcement.featureline 22 reads:But
step_verify_error_type(line 133) assertscontext.budget_error.budget_type == BudgetLevel.SESSION.value(i.e.,"session"). The living documentation says "plan" but the domain concept being tested is "session". Suggestion: Update the feature text and the@thendecorator to"the error should indicate session budget exceeded"so the Gherkin accurately reflects the behavior.Commit Quality Note
Commit
2dd4803eusesCloses #8609in its footer, but the project standard per CONTRIBUTING.md isISSUES CLOSED: #N. Commitb7ceaca8correctly usesISSUES CLOSED: #8609. Please ensure all commits use theISSUES CLOSED: #Nformat consistently.CI Summary
step_check_budgetignoresorg_accumulator; Three-tier scenario amounts cause SESSION to fail before ORGSummary
Five targeted fixes are needed before this PR can be approved:
step_check_budgetto usegetattr(context, "org_accumulator", None)AND fix the Three-tier scenario amounts (suggested amounts provided above)"BudgetExceededError"to__all__inexceptions.py(between"AuthorizationError"and"BusinessRuleViolation")nox -s typechecklocally to identify Pyright errors)The core
cost_tracking.pyimplementation remains clean, well-structured, and correctly implements the budget hierarchy. Issues 1–3 are carryovers from previous reviews; issues 4–5 are newly introduced by the latest push.Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
Re-Review — PR #10586: feat(governance): implement cost and session budget tracking and enforcement
Previously Addressed Items — Confirmed Still Present
All items resolved in prior rounds remain correctly addressed in this HEAD (
b7ceaca8):budget_typeusesBudgetLevel.SESSION.value/BudgetLevel.ORG.valueinenforce_budget()BudgetExhaustionEventvalidator accepts{"plan", "session", "org"}threading.Lockremoved fromCostTrackingService.__init__()WARNING_THRESHOLD = 0.9extracted as module-level constantcheck_budget()replaced with clear if/else blocks[Unreleased]CostTrackingServiceimplementation incost_tracking.pyis clean and correctBlocking Issues — 5 Issues Require Resolution
CI is failing on lint, unit_tests, typecheck, helm, and status-check. Three issues carry over from review #7874. Two new failures appeared after the push of the current HEAD
b7ceaca8.BLOCKING 1 — E303: Triple blank lines in steps file (lint CI failure)
features/steps/cost_budget_tracking_enforcement_steps.pylines 332–334 are three consecutive blank lines between the end ofstep_record_negative_tokens(line 331) and the@given("an organization budget of ${amount}")decorator (line 335). Python/ruff E303 enforces a maximum of 2 blank lines between top-level definitions.Fix: Delete one of lines 332, 333, or 334 so only 2 blank lines remain.
BLOCKING 2 —
step_check_budgetignorescontext.org_accumulator+ broken Three-tier scenario amounts (unit_tests CI failure)step_check_budget(lines 186–192) hardcodesNonefororg_accumulator, so the Three-tier budget hierarchy scenario cannot reach the org budget check path. Thecontext.org_accumulatorset bystep_configure_org_budgetis silently discarded.Furthermore, the Three-tier scenario amounts are logically flawed: with $50 session budget, $40 already spent, and a $15 check, $40+$15=$55 exceeds the session limit FIRST — so the assertion
exceeded_level == "org"will always fail even after fixing the step.Fix (step — lines 186–192):
Fix (Three-tier scenario amounts in feature file lines 64–71):
After both spends, session=$92 (<$200), org=$92 (<$100). Check $15: session=$107 (<$200 OK), org=$107 (>$100 EXCEEDED). Only org fails, correctly exercising the org budget path.
BLOCKING 3 —
BudgetExceededErrornot in__all__in exceptions.py (public API contract broken)BudgetExceededErroris defined at line 360, after the__all__list closes at line 357. It is not exported. All other exceptions in this file are in__all__.Fix: Add
"BudgetExceededError"between"AuthorizationError"and"BusinessRuleViolation"in the__all__list:BLOCKING 4 — NEW: typecheck CI failing
CI / typecheckis failing (13m2s). Pyright strict mode errors are introduced by this PR. Runnox -s typechecklocally to identify all errors and fix them.# type: ignoreis never permitted.BLOCKING 5 — NEW: helm CI failing
CI / helmis failing (12m54s). Since this PR only touches Python files and documentation, this may be a transient infrastructure issue. Inspect the CI logs. If confirmed transient and unrelated to this PR, post a comment documenting that finding.Commit Quality Issue
Commit
2dd4803eusesCloses #8609in its body, but the project standard isISSUES CLOSED: #N. The latest commitb7ceaca8correctly usesISSUES CLOSED: #8609. Ensure all commits use the correct format before merge.Non-Blocking Suggestion (Carried Over)
features/cost_budget_tracking_enforcement.featureline 22 reads:But
step_verify_error_typeassertsbudget_type == BudgetLevel.SESSION.value. The Gherkin says "plan" but means "session". Suggestion: Update the step text and decorator to"the error should indicate session budget exceeded".CI Summary
Overall Assessment
All prior blocking issues from previous rounds have been addressed. The core
cost_tracking.pyimplementation is clean and well-structured. Five targeted fixes remain before approval:step_check_budgetto usegetattr(context, "org_accumulator", None)AND fix the Three-tier scenario amounts"BudgetExceededError"to__all__inexceptions.pynox -s typechecklocally)Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +19,4 @@When an LLM call costs $3.00And another LLM call costs $2.50Then the second call should raise BudgetExceededErrorAnd the error should indicate plan budget exceededSUGGESTION (non-blocking — carried over): This step says "the error should indicate plan budget exceeded" but
step_verify_error_typeassertsbudget_type == BudgetLevel.SESSION.value(i.e., "session"). The Gherkin living documentation says "plan" but the domain concept is "session".Suggestion: Update this step and its decorator to:
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +188,4 @@"""Check if cost fits in budget."""cost_amount = float(cost)context.budget_check = context.cost_service.check_budget(context.session_budget, None, cost_amountBLOCKING (unit_tests failure — still present):
step_check_budgethardcodesNonefororg_accumulator(line 191). Thecontext.org_accumulatorconfigured bystep_configure_org_budgetis silently ignored, making the Three-tier budget hierarchy scenario unable to exercise the org budget path.Also: the Three-tier scenario amounts cause the SESSION budget to be exceeded before the ORG budget is evaluated — fix those too (see review body for suggested amounts).
Fix:
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +329,4 @@except ValueError as e:context.error_raised = Truecontext.error = eBLOCKING (E303 lint failure — still present): Lines 332, 333, and 334 are three consecutive blank lines between
step_record_negative_tokens(ends at line 331) and the@given("an organization budget of ${amount}")decorator (line 335). Python/ruff E303 enforces max 2 blank lines between top-level definitions.Fix: Delete one of lines 332, 333, or 334 so only 2 blank lines remain.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -357,1 +357,4 @@]class BudgetExceededError(DomainError):BLOCKING (public API contract — still present):
BudgetExceededErroris defined after the__all__list closes at line 357 and is therefore NOT exported from this module. All other exception classes in this file are in__all__.Fix: Add
"BudgetExceededError"to__all__in alphabetical order between"AuthorizationError"and"BusinessRuleViolation":Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
Automated by CleverAgents Bot | PR Review
Re-reviewed PR #10586 with
REQUEST_CHANGESstatus.Review ID: 8279
All 8 prior items confirmed addressed. 5 blocking issues remain (3 carried over from #7874, 2 newly introduced by HEAD
b7ceaca8):step_check_budgethardcodesNonefororg_accumulator+ broken Three-tier scenario amounts (unit_tests failure)BudgetExceededErrornot in__all__in exceptions.pyAutomated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
PR Review — #10586: feat(governance): implement cost and session budget tracking and enforcement
Overview
This PR implements cost and session budget tracking and enforcement, adding
CostTrackingService,BudgetExceededError, and BDD test coverage. The coreCostTrackingServiceimplementation incost_tracking.pyis well-structured and correctly implements the budget hierarchy. However, 6 blocking issues must be resolved — 5 cause direct CI failures (lint, unit_tests, typecheck, helm, status-check).CI Status Summary (HEAD:
b7ceaca8)BLOCKING Issues
BLOCKING 1 — E303: Triple blank lines in steps file (lint CI failure)
features/steps/cost_budget_tracking_enforcement_steps.pylines 332, 333, and 334 contain three consecutive blank lines betweenstep_record_negative_tokensand the@givendecorator. ruff E303 allows a maximum of 2 blank lines between top-level definitions.Fix: Delete one of lines 332, 333, or 334 so exactly 2 blank lines remain.
BLOCKING 2 — step_check_budget ignores context.org_accumulator (unit_tests CI failure)
features/steps/cost_budget_tracking_enforcement_steps.pyline 191 hardcodesNonefororg_accumulator. The Three-tier budget hierarchy scenario setscontext.org_accumulatorviastep_configure_org_budget, butstep_check_budgetsilently discards it.Fix:
BLOCKING 3 — Three-tier scenario amounts cause session budget to be exceeded first (unit_tests CI failure)
Both
step_spend_at_session_levelandstep_spend_at_org_levelcallrecord_cost()with bothsession_budgetANDorg_accumulator. After both steps: session = $40 + $80 = $120 which exceeds the $50 session limit. The check would fail at SESSION level, not ORG level — contradictingexceeded_level should be "org".Fix (suggested amounts):
After both spends: session=$92 (<$200), org=$92 (<$100). Check $15: session->$107 (<$200 OK), org->$107 (>$100 EXCEEDED). Org fails first — correct.
BLOCKING 4 — BudgetExceededError not in all in exceptions.py (public API contract broken)
src/cleveragents/core/exceptions.pydefines__all__ending at line 357, butBudgetExceededErroris appended after__all__at line 360. It is therefore not exported. All other exception classes in this file are in__all__.Fix: Add
"BudgetExceededError"to__all__alphabetically between"AuthorizationError"and"BusinessRuleViolation":BLOCKING 5 — typecheck CI failure (introduced by this PR)
CI / typecheckfails after 13m2s on this PR but passes on master. Runnox -s typechecklocally to identify and fix all Pyright errors. Per project policy,# type: ignoreis never permitted.Likely causes:
BudgetExceededErrordefined after__all__may cause Pyright import analysis issues; verify Pydantic model attribute types align with method return types.BLOCKING 6 — helm CI failure (investigate root cause)
CI / helmfails after 12m54s on this PR but passes on master (39s). Since this PR only modifies Python files and CHANGELOG/CONTRIBUTORS, the failure is unexpected. Please inspect the helm CI logs and either fix or document as a pre-existing infrastructure issue.Non-Blocking Suggestions
Suggestion 1 — Gherkin text mismatch: Line 22 reads
"the error should indicate plan budget exceeded"butstep_verify_error_typeassertsBudgetLevel.SESSION.value("session"). Update to"the error should indicate session budget exceeded"so the Gherkin accurately reflects the behavior.Suggestion 2 — Commit footer consistency: Commit
2dd4803eusesCloses #8609in its footer. Per CONTRIBUTING.md the required format isISSUES CLOSED: #N. Commitb7ceaca8correctly usesISSUES CLOSED: #8609. Commit9aef24b0has no issue reference footer at all — it should includeISSUES CLOSED: #8609. Please ensure all commits use theISSUES CLOSED: #Nformat consistently.Checklist Summary
Summary
Six fixes are needed:
The core CostTrackingService implementation remains well-structured and correctly implements the budget hierarchy. Issues 1-4 are carryovers from previous reviews; issues 5-6 are present at this HEAD.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +68,4 @@And $80.00 has been spent at organization levelAnd a check for $15.00 cost is performedThen the check should fail due to organization budgetAnd the exceeded_level should be "org"BLOCKING — Three-tier scenario amounts cause session budget to be exceeded first, not org
Both spend steps record cost to BOTH session_budget AND org_accumulator:
Even after fixing step_check_budget to pass org_accumulator, check_budget() will fail at SESSION level first (since $120+$15 > $50), not org. The assertion
exceeded_level should be "org"will fail.Fix (suggested amounts):
After both spends: session=$92 (<$200 OK), org=$92 (<$100 OK). Check $15: session->$107 (<$200 OK), org->$107 (>$100 EXCEEDED). Org fails first.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +189,4 @@cost_amount = float(cost)context.budget_check = context.cost_service.check_budget(context.session_budget, None, cost_amount)BLOCKING — step_check_budget always passes None for org_accumulator
Line 191 hardcodes
Noneinstead of readingcontext.org_accumulator. The Three-tier budget hierarchy scenario setscontext.org_accumulatorviastep_configure_org_budget, but this step silently discards it — the org budget path is never exercised.Fix:
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -0,0 +331,4 @@context.error = eBLOCKING — E303: Triple blank lines (lint CI failure)
Lines 332, 333, and 334 are three consecutive blank lines. ruff E303 enforces a maximum of 2 blank lines between top-level definitions. This is the root cause of the CI / lint failure.
Fix: Delete one blank line so exactly 2 remain before the decorator:
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
@ -355,3 +355,50 @@ __all__ = ["ToolTypeMismatchError",BLOCKING — BudgetExceededError is not in all and therefore not exported
BudgetExceededErroris defined at line 360, after the closing]of__all__at line 357. This means it is NOT part of the module's public API. Every other exception class in this file is listed in__all__. This breaks the public API contract and may contribute to the typecheck CI failure.Fix: Add
"BudgetExceededError"to__all__in alphabetical order between"AuthorizationError"and"BusinessRuleViolation":Also consider moving the class definition to before
__all__for consistency with all other exceptions in the file.Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
Automated by CleverAgents Bot | PR Review
Reviewed PR #10586 with
REQUEST_CHANGESstatus.Review ID: 8300
6 blocking issues found:
step_check_budgetignorescontext.org_accumulator— hardcodesNone(unit_tests CI failure)BudgetExceededErrornot in__all__inexceptions.py(public API contract broken)typecheckCI failure — Pyright errors introduced by this PRhelmCI failure — investigate CI logs for root causeView review
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
Re-Review — PR #10586: feat(governance): implement cost and session budget tracking and enforcement
No New Commits Since Last Review
The current HEAD is
b7ceaca8— the same commit reviewed in review #8300 (submitted at 01:58:07Z today). No new commits have been pushed to this branch since the last review. All six blocking issues identified in review #8300 carry over unchanged and are confirmed still present in the codebase by direct inspection of the source.✅ Previously Resolved Items (Confirmed Still Present)
All items resolved in prior rounds remain correctly addressed:
budget_typelabels useBudgetLevel.SESSION.value/BudgetLevel.ORG.valueinenforce_budget()BudgetExhaustionEventvalidator accepts{"plan", "session", "org"}threading.Lockremoved fromCostTrackingService.__init__()WARNING_THRESHOLD = 0.9extracted as module-level constantcheck_budget()replaced with clear if/else blocks[Unreleased]CostTrackingServiceimplementation incost_tracking.pyis clean and well-structuredType/Featurelabel applied; single label requirement met❌ Blocking Issues — All Six Remain Unresolved
CI continues to fail on lint, unit_tests, typecheck, helm, and status-check. All six root causes are unchanged.
BLOCKING 1 — E303: Triple blank lines in steps file (lint CI failure)
features/steps/cost_budget_tracking_enforcement_steps.pylines 332, 333, and 334 are three consecutive blank lines between the end ofstep_record_negative_tokens(line 331) and the@given("an organization budget of ${amount}")decorator (line 335). Python/ruff E303 enforces a maximum of 2 blank lines between top-level definitions.Fix: Delete one of lines 332, 333, or 334 so only 2 blank lines remain:
BLOCKING 2 —
step_check_budgetignorescontext.org_accumulator(unit_tests CI failure)features/steps/cost_budget_tracking_enforcement_steps.pylines 186–192:step_check_budgethardcodesNonefororg_accumulator:The
context.org_accumulatorset bystep_configure_org_budgetis silently discarded — the org budget path is never exercised.Fix:
BLOCKING 3 — Three-tier scenario amounts cause session budget to be exceeded first (unit_tests CI failure)
features/cost_budget_tracking_enforcement.featurelines 64–71: the Three-tier scenario has a$50.00session budget, spends$40.00then$80.00at both session and org levels. After both spends: session_total = $120 which exceeds the $50 limit. The check for $15 fails at SESSION level, not ORG level — contradicting the assertionexceeded_level should be "org".Fix (suggested amounts):
After both spends: session=$92 (<$200 OK), org=$92 (<$100 OK). Check $15: session=$107 (<$200 OK), org=$107 (>$100 EXCEEDED). Org fails first — correct.
BLOCKING 4 —
BudgetExceededErrornot in__all__inexceptions.py(public API contract broken)src/cleveragents/core/exceptions.pydefines__all__ending at line 357 with"ValidationError".BudgetExceededErroris appended at line 360, after__all__closes. It is not exported. All other exception classes in this file are in__all__.Fix: Add
"BudgetExceededError"to__all__alphabetically between"AuthorizationError"and"BusinessRuleViolation":BLOCKING 5 —
typecheckCI failing (introduced by this PR)CI / typecheckfails after 13m2s. Pyright strict mode errors introduced by this PR. Runnox -s typechecklocally to identify and fix all errors. Per project policy,# type: ignoreis never permitted.BLOCKING 6 —
helmCI failing (investigate root cause)CI / helmfails after 12m54s on this PR but passes on master (39s). Since this PR only modifies Python files, the helm failure is unexpected. Inspect the CI logs and either fix or document as a pre-existing infrastructure issue.Commit Quality Issues (Minor, Non-Blocking)
2dd4803eusesCloses #8609in its footer; per CONTRIBUTING.md the required format isISSUES CLOSED: #N. Commitb7ceaca8correctly usesISSUES CLOSED: #8609.9aef24b0has no issue reference footer at all — it should includeISSUES CLOSED: #8609.Non-Blocking Suggestion (Carried Over)
features/cost_budget_tracking_enforcement.featureline 22 reads:But
step_verify_error_typeassertsbudget_type == BudgetLevel.SESSION.value(i.e.,"session"). The Gherkin says "plan" but means "session". Suggestion: Update the step text and@thendecorator to"the error should indicate session budget exceeded"so the living documentation accurately describes what is being tested.CI Summary
step_check_budgetignoresorg_accumulator; Three-tier scenario amounts wrongOverall Assessment
No new commits have been pushed since review #8300. The core
CostTrackingServiceimplementation incost_tracking.pyremains clean, correct, and well-structured. Six targeted fixes are still needed:step_check_budgetto usegetattr(context, "org_accumulator", None)instead of hardcodedNone"BudgetExceededError"to__all__inexceptions.py(between"AuthorizationError"and"BusinessRuleViolation")typecheckCI failure (runnox -s typechecklocally and resolve all Pyright errors without# type: ignore)helmCI failure and fix or document as pre-existingAutomated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
🌱 Grooming: proceed — PR cleared for processing.
(check
no_duplicates, categoryno_duplicates)Anchor PR #10586 addresses cost and session budget enforcement/governance, distinct from complementary v3.6.0 epic work: #10610 implements the CostTracker measurement service, #10616 adds cost-reporting CLI display, #10662 handles safety-profile enforcement (different domain). No topical overlap between enforcement-layer governance and measurement/display layers. Anchor is part of Epic #8569; no other open PR closes the same issue with later merged_at timestamp.
📋 Estimate: tier 1.
Substantial multi-file feature (+1019/-572, 7 files) introducing a new cost tracking module and BDD step definitions. CI has two distinct failure classes: (1) ruff format violations on 2 files (mechanical), and (2) a Behave AmbiguousStep conflict between
@when('an LLM call costs ${cost} from {provider} using {model}')and@when('an LLM call costs ${cost}')in the same step file — requires understanding Behave step matching semantics to resolve correctly. Cross-file context needed to fix step definitions without breaking existing scenarios. Not tier 0 due to size, new logic, and test-fixture ambiguity requiring judgment. Not tier 2 — fixes are well-scoped with no architectural redesign.(attempt #4, tier 1)
🔧 Implementer attempt —
rebase-failed.Blockers:
b7ceaca8a566df33d94466df33d9446c7ef98fdb(attempt #7, tier 1)
🔧 Implementer attempt —
rebased.Pushed 1 commit:
6c7ef98.6c7ef98fdbf6af7782d5(attempt #8, tier 1)
🔧 Implementer attempt —
rebased.Pushed 1 commit:
f6af778.- Remove triple blank line in steps file (ruff format E303) - Collapse multiline if conditions in cost_tracking.py (ruff format) - Add BudgetExceededError to __all__ in exceptions.py - Fix step_check_budget to pass context.org_accumulator instead of None - Fix Three-tier scenario amounts so org budget fails first (session $200, org $100) - Disambiguate @when step using {cost:f} type specifier to avoid AmbiguousStep - Rename duplicate @then step to avoid conflict with lsp_registry_steps.py - Add missing column header rows to Gherkin tables in feature file ISSUES CLOSED: #8609351f1277602861f01132(attempt #11, tier 1)
🔧 Implementer attempt —
rebased.Pushed 1 commit:
2861f01.2861f011324699d29057(attempt #12, tier 2)
🔧 Implementer attempt —
rebased.Pushed 1 commit:
4699d29.The BudgetExhaustionEvent validator dropped "daily" from its allowed set, but the legacy CostTracker in providers/cost_tracker.py still emits "daily" events for the per-day budget path. That broke two cost_controls.feature scenarios (BudgetExhaustionEvent accepts daily budget type; CostTracker blocks at 100% of daily budget). Restore "daily" alongside "plan", "session", "org". CostTrackingService lived in src/cleveragents/core/cost_tracking.py but imported from cleveragents.domain.models.core.*, violating the architecture.feature invariant "core package should be self-contained" (core may only import from {core, config, shared}). Move the service to application/services/, where domain imports are permitted, and update the lone caller in the BDD step file. ISSUES CLOSED: #8609🌱 Grooming: proceed — PR cleared for processing.
(check
no_duplicates, categoryno_duplicates)Anchor PR implements cost/session budget tracking & enforcement for Epic #8569. Related budget PRs (#10610 CostTracker, #10616 cost reporting, #10662 safety profiles, #10653 dynamic budget) target the same epic but have distinct scopes (service vs CLI vs enforcement vs context assembly). Anchor's larger diff and governance focus suggest broader umbrella feature rather than duplicate of any single related PR. No exact topical match found among 411 open PRs.
📋 Estimate: tier 1.
6 files, +783 LOC implementing cost and session budget tracking/enforcement — substantial multi-file new feature closing a governance epic. Enforcement semantics imply integration with session lifecycle control points across subsystems. All code-level CI gates passed (lint, typecheck, security, quality, unit_tests, integration_tests); the three failing gates (coverage, docker, status-check) are all infrastructure failures (git fetch network reset, artifact upload EPERM) unrelated to code quality. Tier 1 is appropriate: multi-file, new logic branches, cross-cutting governance layer, but no evidence of concurrency hazards or repo-wide architectural restructuring that would warrant tier 2.
(attempt #16, tier 1)
🔧 Implementer attempt —
blocked.Blockers:
7f13f0e819but dispatch base wasdfb4ae0bfe. The implementer pushed from inside the worktree (forbidden by the git contract) OR a third party pushed during the attempt. Re-dispatch will re-prefetch and pick up the new head.(attempt #17, tier 2)
🔧 Implementer attempt —
ci-not-ready.✅ Approved
Reviewed at commit
7f13f0e.Confidence: high.
Claimed by
merge_drive.py(pid 3317687) until2026-06-04T09:49:24.064972+00:00.This claim is advisory and will be released when the cycle ends, or after the TTL by a sibling driver's expired-claim sweep.
Approved by the controller reviewer stage (workflow 226).