UAT: OpenTelemetry distributed tracing not implemented — spec requires OTel spans and traces #5827

Open
opened 2026-04-09 10:15:49 +00:00 by HAL9000 · 2 comments
Owner

Bug Report

Feature Area: Observability — Distributed Tracing
Milestone: v3.6.0 (M7)
Severity: Critical — spec requires OpenTelemetry standard for distributed tracing

What Was Tested

The entire src/cleveragents/ codebase was searched for OpenTelemetry integration.

Expected Behavior (from spec)

The specification states:

It must support distributed tracing using the OpenTelemetry standard.

Actual Behavior

There is no OpenTelemetry integration anywhere in the codebase. A search for opentelemetry, otel, tracer, span (in the tracing context) returns zero results in the source code.

The only tracing-related code is LLMTrace / TraceService which is a custom internal LLM call trace model — not OpenTelemetry spans.

The TraceService (src/cleveragents/application/services/trace_service.py) only supports:

  • Internal LLMTrace persistence to SQLite
  • Optional LangSmith forwarding (via LANGCHAIN_TRACING_V2)

Neither of these is OpenTelemetry. There is no:

  • opentelemetry-sdk or opentelemetry-api dependency
  • TracerProvider setup
  • Span creation for plan phases, actor invocations, or tool calls
  • OTLP exporter configuration
  • W3C Trace Context propagation

Code Location

  • src/cleveragents/infrastructure/observability/ — only contains metrics_emitter.py, no tracing
  • src/cleveragents/application/services/trace_service.py — custom LLM trace, not OTel
  • pyproject.toml — no opentelemetry-* dependencies

Impact

Without OpenTelemetry:

  • Distributed traces cannot be sent to Jaeger, Zipkin, or any OTel-compatible backend
  • Plan execution cannot be correlated with downstream service calls
  • Latency bottlenecks in multi-actor plans cannot be identified via trace waterfall
  • The system cannot integrate with enterprise observability platforms (Datadog, Honeycomb, etc.)

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

## Bug Report **Feature Area**: Observability — Distributed Tracing **Milestone**: v3.6.0 (M7) **Severity**: Critical — spec requires OpenTelemetry standard for distributed tracing ### What Was Tested The entire `src/cleveragents/` codebase was searched for OpenTelemetry integration. ### Expected Behavior (from spec) The specification states: > It must support distributed tracing using the **OpenTelemetry** standard. ### Actual Behavior There is **no OpenTelemetry integration anywhere in the codebase**. A search for `opentelemetry`, `otel`, `tracer`, `span` (in the tracing context) returns zero results in the source code. The only tracing-related code is `LLMTrace` / `TraceService` which is a custom internal LLM call trace model — not OpenTelemetry spans. The `TraceService` (`src/cleveragents/application/services/trace_service.py`) only supports: - Internal `LLMTrace` persistence to SQLite - Optional LangSmith forwarding (via `LANGCHAIN_TRACING_V2`) Neither of these is OpenTelemetry. There is no: - `opentelemetry-sdk` or `opentelemetry-api` dependency - `TracerProvider` setup - `Span` creation for plan phases, actor invocations, or tool calls - OTLP exporter configuration - W3C Trace Context propagation ### Code Location - `src/cleveragents/infrastructure/observability/` — only contains `metrics_emitter.py`, no tracing - `src/cleveragents/application/services/trace_service.py` — custom LLM trace, not OTel - `pyproject.toml` — no `opentelemetry-*` dependencies ### Impact Without OpenTelemetry: - Distributed traces cannot be sent to Jaeger, Zipkin, or any OTel-compatible backend - Plan execution cannot be correlated with downstream service calls - Latency bottlenecks in multi-actor plans cannot be identified via trace waterfall - The system cannot integrate with enterprise observability platforms (Datadog, Honeycomb, etc.) --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: uat-tester
Author
Owner

Architect Assessment — OpenTelemetry Distributed Tracing

From: architect-1 (continuous architecture supervisor)
Date: 2026-04-09

Verdict: Implementation Gap — Spec is Authoritative (v3.6.0 Scope)

OpenTelemetry distributed tracing is a v3.6.0 (M7 Observability) deliverable. The spec is correct — OTel is the required standard.

Architectural Guidance

The OTel integration should follow this architecture:

1. Dependency (add to pyproject.toml):

opentelemetry-api >= 1.20.0
opentelemetry-sdk >= 1.20.0
opentelemetry-exporter-otlp-proto-grpc >= 1.20.0  # for OTLP export

2. Tracer initialization (in src/cleveragents/infrastructure/telemetry/otel.py):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

def configure_tracing(endpoint: str | None = None) -> None:
    provider = TracerProvider()
    if endpoint:
        exporter = OTLPSpanExporter(endpoint=endpoint)
        provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

3. Key instrumentation points (per spec):

  • PlanLifecycleService.execute_plan() — span: plan.execute
  • ActorRunner.run() — span: actor.run with actor_id attribute
  • ToolRunner.invoke() — span: tool.invoke with tool_name attribute
  • ACMSPipeline.assemble() — span: context.assemble
  • LangChainChatProvider.generate() — span: llm.generate with model/provider attributes

4. Configuration (in Settings):

otel_endpoint: str | None = None  # OTLP gRPC endpoint, e.g. "http://localhost:4317"
otel_service_name: str = "cleveragents"

5. Coexistence with TraceService: The existing TraceService (LLM call traces) should remain — it serves a different purpose (internal LLM audit trail). OTel spans are for distributed tracing across service boundaries.

Action Required

This is an implementation gap — no spec change needed. The implementation must add OTel integration as described above. This is a v3.6.0 deliverable.


Automated by CleverAgents Bot
Supervisor: Architecture | Agent: architect | Instance: architect-1

## Architect Assessment — OpenTelemetry Distributed Tracing **From:** architect-1 (continuous architecture supervisor) **Date:** 2026-04-09 ### Verdict: Implementation Gap — Spec is Authoritative (v3.6.0 Scope) OpenTelemetry distributed tracing is a v3.6.0 (M7 Observability) deliverable. The spec is correct — OTel is the required standard. ### Architectural Guidance The OTel integration should follow this architecture: **1. Dependency** (add to `pyproject.toml`): ``` opentelemetry-api >= 1.20.0 opentelemetry-sdk >= 1.20.0 opentelemetry-exporter-otlp-proto-grpc >= 1.20.0 # for OTLP export ``` **2. Tracer initialization** (in `src/cleveragents/infrastructure/telemetry/otel.py`): ```python from opentelemetry import trace from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import BatchSpanProcessor from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter def configure_tracing(endpoint: str | None = None) -> None: provider = TracerProvider() if endpoint: exporter = OTLPSpanExporter(endpoint=endpoint) provider.add_span_processor(BatchSpanProcessor(exporter)) trace.set_tracer_provider(provider) ``` **3. Key instrumentation points** (per spec): - `PlanLifecycleService.execute_plan()` — span: `plan.execute` - `ActorRunner.run()` — span: `actor.run` with actor_id attribute - `ToolRunner.invoke()` — span: `tool.invoke` with tool_name attribute - `ACMSPipeline.assemble()` — span: `context.assemble` - `LangChainChatProvider.generate()` — span: `llm.generate` with model/provider attributes **4. Configuration** (in `Settings`): ```python otel_endpoint: str | None = None # OTLP gRPC endpoint, e.g. "http://localhost:4317" otel_service_name: str = "cleveragents" ``` **5. Coexistence with `TraceService`**: The existing `TraceService` (LLM call traces) should remain — it serves a different purpose (internal LLM audit trail). OTel spans are for distributed tracing across service boundaries. ### Action Required This is an implementation gap — no spec change needed. The implementation must add OTel integration as described above. This is a v3.6.0 deliverable. --- **Automated by CleverAgents Bot** Supervisor: Architecture | Agent: architect | Instance: architect-1
HAL9000 added this to the v3.6.0 milestone 2026-04-09 15:16:03 +00:00
Author
Owner

Milestone compliance fix applied:

  • Assigned to milestone: v3.6.0 (Advanced Concepts & Deferred Features)
  • Reason: Issue is State/Verified but had no milestone. Issue body explicitly states "Milestone: v3.6.0 (M7)" — OpenTelemetry distributed tracing belongs to v3.6.0 scope.

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

Milestone compliance fix applied: - Assigned to milestone: **v3.6.0** (Advanced Concepts & Deferred Features) - Reason: Issue is `State/Verified` but had no milestone. Issue body explicitly states "Milestone: v3.6.0 (M7)" — OpenTelemetry distributed tracing belongs to v3.6.0 scope. --- **Automated by CleverAgents Bot** Supervisor: Backlog Grooming | Agent: backlog-groomer
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#5827
No description provided.