feat(guardrails): implement Per-Session and Per-Org Cost Budgets #584

Closed
opened 2026-03-04 23:45:53 +00:00 by freemo · 2 comments
Owner

Metadata

Commit Message: feat(guardrails): implement Per-Session and Per-Org Cost Budgets
Branch: feature/m6plus-per-session-per-org-cost-budgets

Summary

Implement per-session and per-organization budget caps that limit the aggregate cost of multiple plans. Currently, only per-plan budgets are supported, and only partially. The spec defines a hierarchy: per-plan → per-session → per-org, with the tightest limit always winning.

Spec Reference

Section: Core Concepts > Guardrails > Cost and Rate Limits
Lines: ~28242-28252

Current State

  • Per-plan budget enforcement exists in AutonomyGuardrailService (wall-clock limits, step limits).
  • Per-plan max_cost_usd is referenced in automation profiles.
  • No per-session budget aggregation: No tracking of cumulative cost across all plans in a session.
  • No per-org budget enforcement: No organization-level spend caps for server mode.
  • No integration with LLM provider billing APIs.

Description

The spec defines two budget levels beyond per-plan, along with the hierarchy that ties all three levels together:

  1. Per-session budgets: Aggregate limits across all plans in a session. Prevents a single interactive session from consuming excessive resources. When exceeded, all plans in the session pause.

  2. Per-org budgets: Server-enforced limits for multi-user deployments. Administrators set organization-wide spend caps that cannot be overridden by individual users. When exceeded, no new plans can start.

  3. Budget hierarchy: Per-plan → per-session → per-org. The tightest applicable limit always wins.
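
The "tightest applicable limit wins" rule can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Level` type and `tightest_level` function are hypothetical names, and "tightest" is assumed here to mean the level with the least remaining headroom (cap minus spend), which the issue does not define explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Level:
    """Hypothetical view of one budget level (not a type from the codebase)."""
    name: str                      # "plan", "session", or "org"
    max_cost_usd: Optional[float]  # None means no cap is configured at this level
    spent_usd: float


def tightest_level(levels: list[Level]) -> Optional[Level]:
    """Return the capped level with the least remaining headroom, or None if uncapped."""
    capped = [lv for lv in levels if lv.max_cost_usd is not None]
    if not capped:
        return None
    return min(capped, key=lambda lv: lv.max_cost_usd - lv.spent_usd)


levels = [
    Level("plan", max_cost_usd=5.0, spent_usd=1.0),      # 4.00 remaining
    Level("session", max_cost_usd=10.0, spent_usd=9.5),  # 0.50 remaining
    Level("org", max_cost_usd=1000.0, spent_usd=200.0),  # 800.00 remaining
]
binding = tightest_level(levels)
print(binding.name)  # session
```

In this example the session cap binds even though the plan cap is numerically smaller, because the session has less headroom left.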

The spec notes: "Cost and rate limits are future concerns that require integration with LLM provider billing APIs and internal metering. The system should define configuration surfaces for these limits but may initially implement only the per-plan and per-actor limits, with per-session, per-org, and billing integration added later."

Implementation approach:

  • Define configuration surfaces: session.max_cost_usd, org.max_cost_usd
  • Implement session-level cost accumulator (sum of all plan costs in session)
  • Implement org-level cost accumulator (server mode, stored in DB)
  • Budget check before each LLM invocation
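
The session-level accumulator and the pre-invocation check listed above could look roughly like this. It is a sketch under stated assumptions: `SessionCostAccumulator` and its methods are hypothetical names, an unset cap is modeled as `None`, and reaching the cap exactly is treated as exceeding it.

```python
from typing import Optional


class SessionCostAccumulator:
    """Hypothetical accumulator: sums the cost of every plan in one session."""

    def __init__(self, max_cost_usd: Optional[float]) -> None:
        self.max_cost_usd = max_cost_usd  # corresponds to session.max_cost_usd
        self._cost_by_plan: dict[str, float] = {}

    def record(self, plan_id: str, cost_usd: float) -> None:
        """Add the cost of one LLM invocation to the plan's running total."""
        self._cost_by_plan[plan_id] = self._cost_by_plan.get(plan_id, 0.0) + cost_usd

    @property
    def total_usd(self) -> float:
        """Session cost is the sum of all plan costs in the session."""
        return sum(self._cost_by_plan.values())

    def may_invoke_llm(self) -> bool:
        """Check to run before each LLM call; False means the session's plans should pause."""
        if self.max_cost_usd is None:
            return True
        return self.total_usd < self.max_cost_usd


session = SessionCostAccumulator(max_cost_usd=1.00)
session.record("plan-a", 0.50)
session.record("plan-b", 0.25)
print(session.may_invoke_llm())  # True: 0.75 spent of 1.00
session.record("plan-a", 0.25)
print(session.may_invoke_llm())  # False: 1.00 spent, cap reached
```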

Acceptance Criteria

  • Configuration keys: session.max_cost_usd, org.max_cost_usd (in addition to existing per-plan)
  • Session-level cost tracking: accumulate costs across all plans in a session
  • Per-session budget enforcement: pause all plans when session budget exceeded
  • Per-org budget tracking (server mode): accumulate costs across all sessions in an org
  • Per-org budget enforcement (server mode): reject new plans when org budget exceeded
  • Budget hierarchy: tightest limit wins (per-plan, per-session, per-org)
  • BUDGET_WARNING event emitted at configurable threshold (e.g., 80% utilization)
  • BUDGET_EXCEEDED event emitted when limit hit
  • CLI display: agents session show includes session cost and budget utilization
  • Unit tests for budget aggregation and enforcement at each level
  • Configuration validation: per-session must be >= per-plan, per-org must be >= per-session

Related Issues

  • Extends: existing AutonomyGuardrailService budget enforcement
  • Related: LLM trace cost tracking in trace_service.py
  • Used by: Diagnostic Dashboard cost summary section
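
The configuration validation rule from the acceptance criteria (per-session >= per-plan, per-org >= per-session) can be sketched as a small pure function. `validate_budget_config` is a hypothetical name, and treating `None` as "not configured" so that the comparison is skipped is an assumption.

```python
from typing import Optional


def validate_budget_config(
    plan_max: Optional[float],
    session_max: Optional[float],
    org_max: Optional[float],
) -> list[str]:
    """Return validation errors; an empty list means the configuration is consistent."""
    errors: list[str] = []
    # A broader level must not be tighter than the level it contains.
    if plan_max is not None and session_max is not None and session_max < plan_max:
        errors.append("per-session budget must be >= per-plan budget")
    if session_max is not None and org_max is not None and org_max < session_max:
        errors.append("per-org budget must be >= per-session budget")
    return errors


print(validate_budget_config(5.0, 10.0, 100.0))  # []
print(validate_budget_config(5.0, 2.0, 100.0))
# ['per-session budget must be >= per-plan budget']
```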

Suggested Milestone

v3.6.0

Priority

Low

Suggested Assignee

@freemo — Architecture/budget system design

Subtasks

  • Code: Implement configuration keys session.max_cost_usd, org.max_cost_usd and budget hierarchy (per-plan → per-session → per-org, tightest wins)
  • Code: Implement session-level cost accumulator and per-session budget enforcement (pause all plans when exceeded)
  • Code: Implement per-org cost tracking and enforcement for server mode (reject new plans when exceeded)
  • Code: Emit BUDGET_WARNING and BUDGET_EXCEEDED events at configurable thresholds; add budget display to agents session show
  • Docs: Document budget hierarchy, configuration, and enforcement behavior
  • Behave tests: Add BDD feature file features/guardrails/cost_budgets.feature covering per-session and per-org budget enforcement
  • Robot tests: Add Robot Framework integration test for budget aggregation and enforcement at each level
  • ASV benchmarks: Add ASV benchmark for budget check overhead per LLM invocation (benchmarks/bench_budget_check.py)
  • Quality: coverage ≥97%: Verify via nox -s coverage_report
  • Quality: nox full suite: Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
freemo self-assigned this 2026-03-05 00:30:27 +00:00
freemo added this to the v3.6.0 milestone 2026-03-05 00:30:27 +00:00
Author
Owner

Implementation Started

Starting work on per-session and per-org cost budgets. Codebase exploration complete.

Architecture Plan:

  1. Settings: Add session_max_cost_usd, org_max_cost_usd, budget_warning_threshold to Settings
  2. Domain Model: Add cost_budget fields to Session model; add OrgCostAccumulator for per-org tracking
  3. Budget Service: New CostBudgetService in application/services/ for hierarchy enforcement
  4. Guardrail Integration: Extend AutonomyGuardrailService with budget hierarchy check methods
  5. Events: Use existing BUDGET_WARNING and BUDGET_EXCEEDED event types
  6. CLI: Update session show to display budget utilization
  7. Tests: Behave BDD, Robot Framework, and ASV benchmarks

Branch: feature/m6plus-per-session-per-org-cost-budgets

Author
Owner

Implementation Complete

PR #675 implements the three-tier budget hierarchy (per-plan → per-session → per-org) with the tightest limit winning.

Key Components

Domain Models (src/cleveragents/domain/models/core/cost_budget.py):

  • BudgetLevel enum, BudgetCheckResult (frozen), SessionCostBudget, OrgCostAccumulator, ThreadSafeOrgCostAccumulator
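
To illustrate why a thread-safe org accumulator is useful in server mode, here is a minimal sketch of a lock-guarded accumulator. It does not reproduce the actual `ThreadSafeOrgCostAccumulator` API, which is not shown in this issue; the class name `LockedOrgAccumulator` and its methods are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedOrgAccumulator:
    """Hypothetical org-level accumulator guarded by a single lock."""

    def __init__(self, max_cost_usd: float) -> None:
        self._max = max_cost_usd
        self._spent = 0.0
        self._lock = threading.Lock()

    def add_cost(self, cost_usd: float) -> None:
        # The read-modify-write must be atomic across concurrent sessions.
        with self._lock:
            self._spent += cost_usd

    def can_start_plan(self) -> bool:
        """New plans are rejected once the org cap is reached."""
        with self._lock:
            return self._spent < self._max

    @property
    def spent_usd(self) -> float:
        with self._lock:
            return self._spent


org = LockedOrgAccumulator(max_cost_usd=100.0)
with ThreadPoolExecutor(max_workers=8) as pool:
    for _ in range(400):
        pool.submit(org.add_cost, 0.25)  # 400 concurrent updates of $0.25
print(org.spent_usd)         # 100.0
print(org.can_start_plan())  # False
```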

Service (src/cleveragents/application/services/cost_budget_service.py):

  • CostBudgetService — thread-safe budget management, hierarchy enforcement, BUDGET_WARNING (once per session) and BUDGET_EXCEEDED event emission
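
The "once per session" warning behavior can be sketched like this. The `BudgetEventEmitter` class, its `emit` callback signature, and the choice to re-emit `BUDGET_EXCEEDED` on every update past the cap are assumptions made for illustration; only the event names and the configurable threshold come from the issue. A positive `max_cost_usd` is assumed.

```python
from typing import Callable


class BudgetEventEmitter:
    """Hypothetical emitter: warns once per session, then reports exceeded."""

    def __init__(
        self,
        max_cost_usd: float,
        warning_threshold: float,
        emit: Callable[[str, str], None],
    ) -> None:
        self._max = max_cost_usd
        self._threshold = warning_threshold  # e.g. 0.8 for 80% utilization
        self._emit = emit
        self._warned: set[str] = set()       # sessions that already got a warning

    def on_cost_update(self, session_id: str, spent_usd: float) -> None:
        utilization = spent_usd / self._max
        if utilization >= 1.0:
            self._emit("BUDGET_EXCEEDED", session_id)
        elif utilization >= self._threshold and session_id not in self._warned:
            self._warned.add(session_id)
            self._emit("BUDGET_WARNING", session_id)


events: list[tuple[str, str]] = []
emitter = BudgetEventEmitter(10.0, 0.8, lambda kind, sid: events.append((kind, sid)))
for spent in (5.0, 8.0, 9.0, 10.0):
    emitter.on_cost_update("s1", spent)
print(events)
# [('BUDGET_WARNING', 's1'), ('BUDGET_EXCEEDED', 's1')]
```

The update at 9.0 produces no event because the warning for session `s1` was already emitted at 8.0.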

Integration:

  • AutonomyGuardrailService extended with associate_plan_with_session(), check_budget_hierarchy(), record_plan_cost_to_session()
  • Settings gains session_max_cost_usd, org_max_cost_usd, budget_warning_threshold
  • Session model gains cost_budget field; as_cli_dict() includes budget data
  • DI container wires CostBudgetService as Singleton
  • CLI session show displays a cost budget panel

Verification

  • nox -s lint ✓
  • nox -s typecheck ✓ (pyright standard mode, 0 errors)
  • nox -s unit_tests ✓ (9809 scenarios, 37792 steps pass)
  • nox -s coverage_report ✓ (98% ≥ 97% threshold)
  • nox -s integration_tests — all 11 new cost budget Robot tests pass; pre-existing failures unchanged

Test Coverage

  • 54 Behave scenarios covering domain models, service methods, validation, hierarchy enforcement, event emission, and guardrail integration
  • 11 Robot Framework integration tests
  • ASV benchmarks for performance regression tracking
Reference
cleveragents/cleveragents-core#584