feat(estimation): build historical plan statistics query service for estimation context assembly #652

Open
opened 2026-03-09 20:16:25 +00:00 by freemo · 9 comments
Owner

Metadata

  • Commit Message: feat(estimation): build historical plan statistics query service for estimation context assembly
  • Branch: feature/m6-estimation-historical-stats

Background and Context

The specification (lines 19000-19004) states that the estimation actor analyzes three inputs: (1) the initial prompt/request, (2) the strategy produced by Strategize, and (3) "Historical data from similar plans (if available)." The database already stores cost_actual_usd, token_count_input, token_count_output, created_at, and completed_at on plans (lines 43456-43459), which provides the raw data for historical analysis. However, no service queries and aggregates this data for the estimation actor's context.

The CostTracker (in providers/cost_tracker.py) tracks live budget consumption but has no historical query capability: it works only within a single plan execution. No service exists to answer "What was the average cost/duration of previous plans using action X?", which the estimation actor needs for informed estimates.

Expected Behavior

A HistoricalPlanStatsService exists that queries completed plans by action name and returns aggregated statistics (mean/median cost, duration, step count, success rate) for use in the estimation actor's context assembly.

Acceptance Criteria

  1. HistoricalPlanStatsService exists in src/cleveragents/application/services/ with method get_stats_for_action(action_name: str, limit: int = 50) -> HistoricalPlanStats.
  2. A repository method get_completed_plans_by_action(action_name, limit) returns the N most recent completed plans for a given action.
  3. A HistoricalPlanStats Pydantic value object packages the following metrics: mean/median/p90 cost, mean/median duration, average step count, average child plan count, and success rate.
  4. The service integrates with the ACMS context assembly pipeline so historical stats are injected into the estimation actor's hot context tier.
  5. Empty history (first run) returns a valid empty stats object, not an error.
  6. All existing tests continue to pass.

Subtasks

  • Create HistoricalPlanStats Pydantic model in src/cleveragents/domain/models/core/
  • Add get_completed_plans_by_action() method to plan repository
  • Implement HistoricalPlanStatsService with statistical aggregation logic
  • Integrate with ACMS context assembly for estimation actor context injection
  • Tests (Behave): Add scenarios for empty history, normal history, and outlier handling
  • Tests (Robot): Add integration test for stats query with real DB
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
freemo added this to the v3.5.0 milestone 2026-03-09 20:16:31 +00:00
freemo self-assigned this 2026-04-02 06:13:57 +00:00
Member

Implementation Notes

Design Decisions

1. HistoricalPlanStats (domain model)

  • Located at src/cleveragents/domain/models/core/historical_plan_stats.py
  • Frozen Pydantic v2 value object (immutable once constructed)
  • Provides an empty() class-method factory for first-run scenarios; it returns valid stats with sample_size=0 and all metrics at 0.0
  • as_context_dict() method produces a compact dict that omits zero-valued metrics, suitable for ACMS context fragment injection
  • Exported from domain.models.core.__init__ alongside EstimationResult
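
As a rough illustration of the value object described above, here is a minimal sketch. It uses a frozen stdlib dataclass as a stand-in for the frozen Pydantic v2 model so that it runs without third-party packages. The class name `HistoricalPlanStatsSketch` and every field name other than `sample_size` are assumptions, as is the choice to always keep `sample_size` in the context dict.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HistoricalPlanStatsSketch:
    """Illustrative stand-in for the frozen Pydantic value object."""

    # Field names below (except sample_size) are hypothetical.
    sample_size: int = 0
    mean_cost: float = 0.0
    median_cost: float = 0.0
    p90_cost: float = 0.0
    mean_duration_s: float = 0.0
    median_duration_s: float = 0.0
    avg_step_count: float = 0.0
    avg_child_plan_count: float = 0.0
    success_rate: float = 0.0

    @classmethod
    def empty(cls) -> "HistoricalPlanStatsSketch":
        # First-run case: a valid object with no samples and all metrics at 0.0.
        return cls()

    def as_context_dict(self) -> dict:
        # Assumption: sample_size is always kept; zero-valued metrics are dropped.
        data = asdict(self)
        return {k: v for k, v in data.items() if k == "sample_size" or v != 0}


first_run = HistoricalPlanStatsSketch.empty()
some_history = HistoricalPlanStatsSketch(sample_size=3, mean_cost=1.5)
```

In this sketch `first_run.as_context_dict()` contains only `sample_size`, so an empty history adds almost nothing to the context.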

2. get_completed_plans_by_action() (repository method)

  • Added to LifecyclePlanRepository in infrastructure/database/repositories.py
  • Filters on terminal states: complete, applied, errored, cancelled
  • Uses @database_retry decorator consistent with other repository methods
  • Returns plans ordered by created_at DESC (most recent first)
  • Returns empty list for limit < 1 without hitting the database
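
The query behavior listed above can be sketched against an in-memory SQLite database. This is not the project's repository code: the `plans` table layout, its column names, and the non-terminal state `executing` are assumptions for illustration, and the `@database_retry` decorator is omitted.

```python
import sqlite3

# Terminal states as listed in the notes above.
TERMINAL_STATES = ("complete", "applied", "errored", "cancelled")


def get_completed_plans_by_action(conn, action_name, limit):
    # A limit below 1 returns early without touching the database.
    if limit < 1:
        return []
    placeholders = ", ".join("?" for _ in TERMINAL_STATES)
    query = (
        "SELECT id, action_name, state, created_at FROM plans "
        f"WHERE action_name = ? AND state IN ({placeholders}) "
        "ORDER BY created_at DESC LIMIT ?"
    )
    return conn.execute(query, (action_name, *TERMINAL_STATES, limit)).fetchall()


# Hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plans ("
    "id INTEGER PRIMARY KEY, action_name TEXT, state TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO plans (action_name, state, created_at) VALUES (?, ?, ?)",
    [
        ("deploy", "complete", "2026-03-01T10:00:00"),
        ("deploy", "executing", "2026-03-02T10:00:00"),  # not terminal, skipped
        ("deploy", "errored", "2026-03-03T10:00:00"),
        ("build", "applied", "2026-03-04T10:00:00"),  # different action
    ],
)
rows = get_completed_plans_by_action(conn, "deploy", limit=10)
```

Errored and cancelled plans are returned alongside successful ones, which is what lets a caller compute a success rate from the same result set.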

3. HistoricalPlanStatsService (application service)

  • Located at src/cleveragents/application/services/historical_plan_stats_service.py
  • Stateless service (thread-safe) with injected LifecyclePlanRepository
  • get_stats_for_action(action_name, limit=50) is the primary entry point
  • Extracts four values per plan: cost from cost_metadata.total_cost; duration as the interval from timestamps.created_at to timestamps.applied_at (or execute_completed_at); step count from last_completed_step; and child plan count from subplan_statuses
  • Uses statistics.mean() and statistics.median() from stdlib; custom _percentile() helper for p90
  • Gracefully handles plans with missing cost metadata, timestamps, or other data
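
The aggregation step might look roughly like the sketch below. Plans are modeled as plain dicts with hypothetical keys (`cost`, `created_at`, `finished_at`, `state`). The nearest-rank percentile method and the rule that only `complete` and `applied` count as successes are assumptions; the notes above do not specify either.

```python
import math
import statistics
from datetime import datetime


def _percentile(values, pct):
    # Nearest-rank percentile (assumed method; the real helper may differ).
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def aggregate(plans):
    if not plans:
        return {"sample_size": 0}
    # Skip plans with missing cost or timestamps instead of failing.
    costs = [p["cost"] for p in plans if p.get("cost") is not None]
    durations = [
        (p["finished_at"] - p["created_at"]).total_seconds()
        for p in plans
        if p.get("created_at") and p.get("finished_at")
    ]
    # Assumption: only these two terminal states count as success.
    successes = sum(1 for p in plans if p.get("state") in ("complete", "applied"))
    stats = {"sample_size": len(plans), "success_rate": successes / len(plans)}
    if costs:
        stats["mean_cost"] = statistics.mean(costs)
        stats["median_cost"] = statistics.median(costs)
        stats["p90_cost"] = _percentile(costs, 90)
    if durations:
        stats["mean_duration_s"] = statistics.mean(durations)
        stats["median_duration_s"] = statistics.median(durations)
    return stats


start = datetime(2026, 3, 1, 10, 0)
plans = [
    {"state": "complete", "cost": 1.0, "created_at": start,
     "finished_at": datetime(2026, 3, 1, 10, 10)},
    {"state": "applied", "cost": 2.0, "created_at": start,
     "finished_at": datetime(2026, 3, 1, 10, 20)},
    {"state": "errored", "cost": None, "created_at": start, "finished_at": None},
    {"state": "complete", "cost": 6.0, "created_at": start,
     "finished_at": datetime(2026, 3, 1, 10, 30)},
]
stats = aggregate(plans)
```

The errored plan has no cost or end timestamp, so it is left out of the cost and duration metrics but still counts toward `sample_size` and `success_rate`.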

4. EstimationHistoricalStatsProvider (ACMS integration)

  • Located at src/cleveragents/application/services/estimation_context_provider.py
  • Implements the ContextStrategy protocol from acms_service.py
  • Strategy name: estimation-historical-stats
  • Creates a ContextFragment with uko_node pointing to historical-stats://<action_name>
  • Returns 0.8 confidence when an action name is configured, 0.0 otherwise
  • Gracefully handles: service exceptions (returns unchanged fragments), empty history (returns unchanged fragments), budget exceeded (skips fragment)
  • Can be registered with ACMSPipeline.register_strategy() or ContextAssemblyPipeline
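
The provider's fallback behavior can be sketched as follows. The real `ContextStrategy` protocol and `ContextFragment` type are not shown in this thread, so the method names `confidence()` and `apply()`, the `FragmentSketch` fields other than `uko_node`, and the whitespace-based token estimate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentSketch:
    """Hypothetical stand-in for ContextFragment."""

    uko_node: str
    content: str
    token_cost: int


class HistoricalStatsProviderSketch:
    name = "estimation-historical-stats"

    def __init__(self, fetch_stats, action_name=None):
        self._fetch_stats = fetch_stats  # callable: action_name -> stats dict
        self._action_name = action_name

    def confidence(self):
        # 0.8 when an action name is configured, 0.0 otherwise.
        return 0.8 if self._action_name else 0.0

    def apply(self, fragments, remaining_budget):
        if not self._action_name:
            return fragments
        try:
            stats = self._fetch_stats(self._action_name)
        except Exception:
            return fragments  # service failure: fragments unchanged
        if not stats or stats.get("sample_size", 0) == 0:
            return fragments  # empty history: fragments unchanged
        content = ", ".join(f"{k}={v}" for k, v in sorted(stats.items()))
        cost = len(content.split())  # crude token estimate (assumption)
        if cost > remaining_budget:
            return fragments  # over budget: skip the fragment
        node = f"historical-stats://{self._action_name}"
        return [*fragments, FragmentSketch(node, content, cost)]


provider = HistoricalStatsProviderSketch(
    lambda name: {"sample_size": 3, "mean_cost": 1.5}, action_name="deploy"
)
result = provider.apply([], remaining_budget=100)
```

Every failure path returns the input list untouched, so in this sketch the provider can only add context and never removes or reorders what earlier strategies produced.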

Test Coverage

  • Behave: 27 scenarios across model validation, service logic (empty/normal/edge cases), ACMS provider integration, and repository methods in features/historical_plan_stats.feature
  • Robot: 7 integration tests in robot/historical_plan_stats.robot with helper robot/helper_historical_plan_stats.py
  • Overall coverage: 98.6% (threshold: 97%)

Quality Gate Results

| Stage | Result |
|-------|--------|
| nox -s lint | Pass |
| nox -s typecheck | Pass (0 errors) |
| nox -s unit_tests | Pass (556 features, 13798 scenarios, 0 failures) |
| nox -s integration_tests | Pass (1875 tests, 0 failures) |
| nox -s e2e_tests | Pass (64 tests, 63 passed, 1 skipped) |
| nox -s coverage_report | Pass (98.6%, threshold 97%) |
Author
Owner

PR #1295 reviewed, approved, and merge scheduled (will merge when CI checks pass).

Review summary: All 6 acceptance criteria met. Implementation includes HistoricalPlanStats domain model, get_completed_plans_by_action() repository method, HistoricalPlanStatsService with statistical aggregation, and EstimationHistoricalStatsProvider ACMS integration. 27 Behave scenarios + 7 Robot integration tests provide comprehensive coverage.

Author
Owner

PR #1295 reviewed, approved, and merged.

All 6 acceptance criteria met. Implementation includes HistoricalPlanStats domain model, HistoricalPlanStatsService application service, EstimationHistoricalStatsProvider ACMS integration, and get_completed_plans_by_action() repository method. 27 Behave scenarios + 7 Robot integration tests. Coverage at 98.6%.

Author
Owner

🤖 Backlog Groomer (groomer-1): Closing — this issue is labeled State/Completed, indicating the work has been finished. Open issues with State/Completed should be closed to keep the backlog accurate.

Author
Owner

🤖 Backlog Groomer — Closeable Issue Detected

This issue carries the State/Completed label and all subtasks are checked off, indicating the work has been fully implemented and merged. However, the issue is still open.

Closing this issue as the work is complete.


Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: ca-backlog-groomer

Author
Owner

Closing — this issue is marked State/Completed with all subtasks and Definition of Done criteria checked off, indicating the work has been implemented and merged. Issues with State/Completed should be closed per project conventions.


Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: ca-backlog-groomer

Author
Owner

Closing this issue — it carries the State/Completed label, indicating all work has been completed. The issue should be closed to keep the backlog clean.

If this was closed prematurely, please reopen and update the state label accordingly.


Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: ca-backlog-groomer

Author
Owner

⚠️ Backlog Groomer Notice: This issue is marked State/Completed but is still open and cannot be closed due to open dependencies. Please review the dependency chain and close this issue once all dependencies are resolved.


Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: ca-backlog-groomer

Author
Owner

Closing — this issue is marked State/Completed with all subtasks checked off. The work has been implemented and merged.


Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: ca-backlog-groomer

Reference
cleveragents/cleveragents-core#652