UAT: OperationalMetricKey enum has 6 wrong/missing metric keys and wrong metric types — spec defines 14 distinct named metrics #6844

Open
opened 2026-04-10 02:59:59 +00:00 by HAL9000 · 0 comments

## Bug Report

### What Was Tested

Code-level analysis of `src/cleveragents/domain/models/observability/metrics.py` against the Metrics Collection table in `docs/specification.md` §Observability → Metrics Collection (lines ~46151–46166).

### Expected Behavior (from spec)

The specification defines exactly 14 operational metrics, each with a specific string key and metric type:

| Spec Key | Spec Type |
|---|---|
| `plan.duration_seconds` | Histogram |
| `plan.cost_usd` | Counter |
| `plan.decisions_count` | Counter |
| `plan.child_plans_count` | Counter |
| `actor.invocation_duration_ms` | Histogram |
| `actor.token_usage` | Counter |
| `tool.invocation_duration_ms` | Histogram |
| `tool.error_rate` | Counter |
| `context.build_duration_ms` | Histogram |
| `context.tokens_used` | Gauge |
| `index.query_duration_ms` | Histogram |
| `sandbox.operation_duration_ms` | Histogram |
| `validation.duration_seconds` | Histogram |
| `validation.pass_rate` | Counter |

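### Actual Behavior

The `OperationalMetricKey` enum in `src/cleveragents/domain/models/observability/metrics.py` uses different key names, uses the wrong metric type for at least one metric, and does not track 6 spec-required metrics at all: 5 have no counterpart, and `tool.invocation_duration_ms` is matched only by `tool_invocation_count`, which counts invocations rather than measuring their duration. None of the implementation keys matches its spec key exactly, because the implementation uses underscore-separated names (e.g. `plan_duration_ms`) where the spec uses dotted `<component>.<measure>` names (e.g. `plan.duration_seconds`).

| Spec Key | Implementation Key | Status |
|---|---|---|
| `plan.duration_seconds` | `plan_duration_ms` | ❌ Wrong name + wrong unit (seconds → ms) |
| `plan.cost_usd` | `plan_total_cost_usd` | ❌ Wrong name |
| `plan.decisions_count` | `plan_decision_count` | ❌ Wrong name (`decision` vs `decisions`) |
| `plan.child_plans_count` | `subplan_count` | ❌ Wrong name |
| `actor.invocation_duration_ms` | `actor_latency_ms` | ❌ Wrong name |
| `actor.token_usage` | *(missing)* | ❌ **MISSING** |
| `tool.invocation_duration_ms` | `tool_invocation_count` | ❌ Wrong name + wrong kind (duration vs count) |
| `tool.error_rate` | `tool_error_rate` (Gauge) | ❌ Wrong type: spec says Counter, implementation uses Gauge |
| `context.build_duration_ms` | `context_build_time_ms` | ❌ Wrong name |
| `context.tokens_used` | `context_token_count` | ❌ Wrong name |
| `index.query_duration_ms` | *(missing)* | ❌ **MISSING** |
| `sandbox.operation_duration_ms` | *(missing)* | ❌ **MISSING** |
| `validation.duration_seconds` | *(missing)* | ❌ **MISSING** |
| `validation.pass_rate` | *(missing)* | ❌ **MISSING** |

The `METRIC_DEFINITIONS` registry (also in `metrics.py`) is keyed by these incorrect implementation keys, so the naming errors propagate into downstream metric emission.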
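All 14 spec keys share one shape: a lowercase component, a dot, then a snake_case measure. A regex check along these lines could flag non-conforming keys in CI. The pattern and the helper name `is_spec_style_key` are assumptions inferred from the spec table, not a rule quoted from the specification, and the key list assumes the enum values are the strings shown in the Actual Behavior table.

```python
import re

# Assumed convention, inferred from the spec table: lowercase component,
# exactly one dot, then a snake_case measure name (no digits).
SPEC_KEY_PATTERN = re.compile(r"[a-z]+\.[a-z]+(?:_[a-z]+)*")


def is_spec_style_key(key: str) -> bool:
    """Return True if the whole key matches the assumed `<component>.<measure>` shape."""
    return SPEC_KEY_PATTERN.fullmatch(key) is not None


# Implementation keys as listed in the Actual Behavior table.
implementation_keys = [
    "plan_duration_ms",
    "plan_total_cost_usd",
    "plan_decision_count",
    "subplan_count",
    "actor_latency_ms",
    "tool_invocation_count",
    "tool_error_rate",
    "context_build_time_ms",
    "context_token_count",
]

nonconforming = [k for k in implementation_keys if not is_spec_style_key(k)]
print(f"{len(nonconforming)} of {len(implementation_keys)} keys are not spec-style")
```

With the keys from the table, every implementation key fails the check, since none contains the component separator.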

### Steps to Reproduce

```python
from cleveragents.domain.models.observability.metrics import OperationalMetricKey

keys = {k.value for k in OperationalMetricKey}

# Each assertion passes, confirming that these spec-required keys are absent:
assert "actor.token_usage" not in keys
assert "tool.invocation_duration_ms" not in keys
assert "index.query_duration_ms" not in keys
assert "sandbox.operation_duration_ms" not in keys
assert "validation.duration_seconds" not in keys
assert "validation.pass_rate" not in keys
```
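The assertions above spot-check only six keys. A fuller check compares all 14 spec keys against the enum's values. This is a sketch: `SPEC_KEYS` and `missing_spec_keys` are hypothetical helpers, and the stand-in input assumes the enum values are exactly the implementation keys listed in the Actual Behavior table.

```python
# The 14 metric keys from the spec's Metrics Collection table, in spec order.
SPEC_KEYS: tuple[str, ...] = (
    "plan.duration_seconds",
    "plan.cost_usd",
    "plan.decisions_count",
    "plan.child_plans_count",
    "actor.invocation_duration_ms",
    "actor.token_usage",
    "tool.invocation_duration_ms",
    "tool.error_rate",
    "context.build_duration_ms",
    "context.tokens_used",
    "index.query_duration_ms",
    "sandbox.operation_duration_ms",
    "validation.duration_seconds",
    "validation.pass_rate",
)


def missing_spec_keys(implemented: set[str]) -> list[str]:
    """Return spec keys (in spec order) that have no exact match in `implemented`."""
    return [key for key in SPEC_KEYS if key not in implemented]


# Stand-in for {k.value for k in OperationalMetricKey}, assuming the enum
# values match the Actual Behavior table.
implemented = {
    "plan_duration_ms",
    "plan_total_cost_usd",
    "plan_decision_count",
    "subplan_count",
    "actor_latency_ms",
    "tool_invocation_count",
    "tool_error_rate",
    "context_build_time_ms",
    "context_token_count",
}

missing = missing_spec_keys(implemented)
print(f"{len(missing)} of {len(SPEC_KEYS)} spec keys are missing")
```

Against the real package, pass `{k.value for k in OperationalMetricKey}` instead of the stand-in set. Under the stated assumption, all 14 spec keys are reported missing, because even the renamed metrics use underscores rather than the spec's dotted names.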

### Code Location

- **File**: `src/cleveragents/domain/models/observability/metrics.py`
- **Class**: `OperationalMetricKey` (StrEnum)
- **Registry**: `METRIC_DEFINITIONS`

### Impact

- Metrics emitted to Prometheus (server mode) or to structured logs use the wrong key names, which breaks any dashboards or alerts configured against the spec-defined metric names.
- 6 spec-required metrics are not tracked at all:
  - `actor.token_usage`: per-provider/per-model token counts are not emitted
  - `tool.invocation_duration_ms`: no tool execution latency histogram is recorded
  - `index.query_duration_ms`: index query latency is not tracked
  - `sandbox.operation_duration_ms`: sandbox operation latency is not tracked
  - `validation.duration_seconds`: validation latency is not tracked
  - `validation.pass_rate`: the validation pass/fail ratio is not tracked
- `tool.error_rate` is declared as a Gauge instead of the spec's Counter. A Gauge may decrease, so the series is not monotonic and Prometheus counter functions such as `rate()` and `increase()` cannot be applied to it reliably.
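To illustrate why the Gauge type matters: Prometheus's `rate()` and `increase()` treat any drop between consecutive samples as a counter reset. The sketch below is a simplified model of that behavior; it ignores Prometheus's extrapolation to the range boundaries, and `counter_increase` is a hypothetical name. It shows that a monotonic counter yields a meaningful increase even across a restart, while a gauge-style error rate that dips and recovers does not.

```python
def counter_increase(samples: list[float]) -> float:
    """Sum the growth across consecutive samples, treating any drop as a reset.

    Simplified model of Prometheus counter-reset handling: when a sample is
    lower than its predecessor, the counter is assumed to have restarted at
    zero, so the new sample's full value is counted as growth.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # Drop interpreted as a reset to zero.
    return total


# A true counter (cumulative tool errors) with one process restart.
error_counts = [10, 12, 15, 2, 4]
print(counter_increase(error_counts))  # 2 + 3 + 2 + 2 = 9.0

# A gauge-style error rate that simply went down and then up again.
error_rate_gauge = [0.5, 0.25, 0.75]
# The dip is misread as a reset, so the 0.75 result is not a meaningful quantity.
print(counter_increase(error_rate_gauge))
```

One common way to keep a rate-like metric compatible with these functions is to emit monotonic counters (for example, errors and total invocations) and derive the ratio at query time.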

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

HAL9000 added this to the v3.6.0 milestone 2026-04-10 03:00:38 +00:00
HAL9000 self-assigned this 2026-04-10 06:07:50 +00:00
Reference
cleveragents/cleveragents-core#6844