Bug: Silent except Exception: return 0 in LangChainChatProvider token counting causes inaccurate cost tracking #10486

Open
opened 2026-04-18 10:06:56 +00:00 by HAL9000 · 2 comments
Owner

Metadata

Commit message (first line): fix(providers): propagate token-counting exceptions instead of silently returning 0
Branch: bugfix/mN-token-count-silent-failure
Blocked by: #10395 (TDD issue — must be merged first)

Background and Context

LangChainChatProvider in src/cleveragents/providers/llm/langchain_chat_provider.py contains a bare except Exception: return 0 clause in its token-counting logic (approximately line 272). When the underlying LLM client raises any exception during token estimation, the method silently swallows it and returns 0. This causes the cost tracker to record zero input or output tokens for the call, producing systematically under-reported cost estimates and violating the specification requirement that "cost tracking must be accurate."

Exact Code Evidence

File: src/cleveragents/providers/llm/langchain_chat_provider.py, approximately line 272

except Exception:
    return 0   # ← silently swallows ALL exceptions; returns 0 tokens

This pattern:

  1. Hides the root cause of token-counting failures from operators
  2. Records 0 tokens in the cost tracker, making cost estimates inaccurate
  3. Violates the project error-handling rule: "NEVER use bare except Exception: without re-raising unless you have SPECIFIC recovery logic"
  4. Violates the specification: "Cost tracking must be accurate"

Impact

  • Severity: HIGH
  • Affected module: cleveragents.providers.llm.langchain_chat_provider
  • Class: LangChainChatProvider
  • Reproducible in: Any scenario where the LLM client raises during token counting (e.g. model not loaded, API error during token estimation)
  • Consequence: Cost tracker records 0 tokens → cost estimates are wrong → budget enforcement is unreliable

Specification Violations

  1. "Cost tracking must be accurate" — returning 0 tokens on error produces incorrect cost records
  2. Error handling rule — bare except Exception: return 0 without re-raising or specific recovery logic is prohibited

Proposed Fix

Replace the silent swallow with either:

  • Re-raise the exception (let callers handle it), or
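  • Log a warning and raise a specific, typed exception that callers can catch

For example, the second option:

except Exception as exc:
    # NOTE: confirm str(exc) cannot contain API keys or credentials before
    # interpolating it into the message (see Acceptance Criteria).
    raise TokenCountingError(
        f"Failed to count tokens for model {self._model_id}: {exc}"
    ) from exc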
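The second option can be sketched in a self-contained form. This is only an illustration under stated assumptions: `TokenCountingError` is not confirmed to exist in the codebase yet, and `FakeProvider`, `BrokenClient`, and the client method name `get_num_tokens` are stand-ins chosen for the example. Unlike the snippet above, the sketch puts only the exception type name in the message, which is one possible way to satisfy the "no credentials in error messages" criterion; the original exception stays reachable through `__cause__`.

```python
import logging

logger = logging.getLogger(__name__)


class TokenCountingError(RuntimeError):
    """Hypothetical typed error for token-counting failures."""


class FakeProvider:
    """Stand-in for LangChainChatProvider; models only the error path."""

    def __init__(self, model_id, client):
        self._model_id = model_id
        self._client = client

    def _count_tokens(self, text):
        try:
            return self._client.get_num_tokens(text)
        except Exception as exc:
            # Log and re-raise as a typed error instead of returning 0.
            # Only the exception type is reported, never str(exc), so a
            # secret embedded in the upstream message is not repeated.
            logger.warning(
                "Token counting failed for model %s (%s)",
                self._model_id,
                type(exc).__name__,
            )
            raise TokenCountingError(
                f"Failed to count tokens for model {self._model_id}: "
                f"{type(exc).__name__}"
            ) from exc


class BrokenClient:
    """Illustrative client whose token counting always fails."""

    def get_num_tokens(self, text):
        raise ConnectionError("request failed; api_key=sk-secret")
```

Whether to drop `str(exc)` entirely or to redact it is a design choice for the implementer; the sketch takes the more conservative route.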

Expected Behavior

When LangChainChatProvider._count_tokens() encounters an exception during token counting, the exception should propagate to the caller rather than being silently swallowed. Cost tracking should never record a zero-token usage entry as a result of a suppressed error.
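From the caller's side, the expected behavior might look like the sketch below. The real cost tracker's API is not shown in this issue, so `CostTracker`, `record_call`, and the two counter functions are hypothetical names used only to illustrate that a failed count produces no usage entry at all, rather than an entry of 0.

```python
class TokenCountingError(RuntimeError):
    """Hypothetical typed error raised when token counting fails."""


class CostTracker:
    """Illustrative tracker, not the project's actual implementation."""

    def __init__(self):
        self.entries = []       # token counts recorded for successful calls
        self.failed_counts = 0  # failures are visible, not hidden as zeros

    def record_call(self, count_tokens, text):
        try:
            tokens = count_tokens(text)
        except TokenCountingError:
            # Note the failure and propagate; do not append a 0 entry.
            self.failed_counts += 1
            raise
        self.entries.append(tokens)
        return tokens


def working_counter(text):
    """Toy counter: one token per whitespace-separated word."""
    return len(text.split())


def failing_counter(text):
    """Toy counter that always fails with the typed error."""
    raise TokenCountingError("Failed to count tokens for model demo-model")
```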

Acceptance Criteria

  • No except Exception: return 0 pattern remains in langchain_chat_provider.py
  • Token counting failures propagate as typed exceptions (e.g. TokenCountingError)
  • Exception messages do not expose API keys or credentials
  • @tdd_issue_10395 scenario passes without @tdd_expected_fail
  • All nox sessions pass: unit_tests, typecheck, coverage_report
  • Coverage remains ≥ 97%

Subtasks

  • Identify all except Exception: return 0 occurrences in langchain_chat_provider.py
  • Replace each with a typed exception raise or explicit error propagation
  • Ensure no API keys or credentials appear in exception messages (spec: "API keys must NEVER be logged or exposed in error messages")
  • Remove @tdd_expected_fail from @tdd_issue_10395 scenario
  • Verify nox -s unit_tests passes
  • Verify nox -s typecheck passes (Pyright strict)
  • Verify nox -s coverage_report reports coverage ≥ 97%

Definition of Done

  • No except Exception: return 0 pattern remains in langchain_chat_provider.py
  • Token counting failures propagate as typed exceptions
  • @tdd_issue_10395 scenario passes without @tdd_expected_fail
  • All nox sessions green
  • PR closes this issue with "Closes #<this_issue_number>"

Automated by CleverAgents Bot
Agent: new-issue-creator

Author
Owner

[GROOMED] Quality Analysis Report

Issue Validity Assessment

Status: VALID AND ACTIONABLE

This is a legitimate Priority/Critical bug with clear evidence:

  • Exact location: src/cleveragents/providers/llm/langchain_chat_provider.py line ~272
  • Problem: Bare except Exception: return 0 silently swallows all token-counting exceptions
  • Impact: Cost tracker records 0 tokens on error → inaccurate cost estimates → unreliable budget enforcement
  • Specification violations:
    1. "Cost tracking must be accurate" — violated by returning 0 on error
    2. Error handling rule — bare except Exception: without re-raising is prohibited

Label Verification

All required labels present:

  • State/Unverified (id: 846) — appropriate for new issue
  • Type/Bug (id: 849) — correct classification
  • Priority/Critical (id: 858) — appropriate severity
  • MoSCoW/Must have (id: 883) — correctly applied

⚠️ Milestone Assignment

Status: NOT ASSIGNED

  • Current milestone: null
  • Recommendation: Assign to active milestone (v3.2.0 or v3.5.0) for prioritization
  • This is a critical bug affecting cost tracking accuracy and should be scheduled

🚨 CRITICAL PRIORITY FLAG

This issue requires immediate attention:

  • Priority/Critical with no State/In Progress
  • Blocks accurate cost tracking across all LLM provider calls
  • Affects budget enforcement reliability
  • Should be assigned to a developer and moved to State/In Progress immediately

📋 Acceptance Criteria Review

Well-defined acceptance criteria present:

  • No except Exception: return 0 pattern remains
  • Token counting failures propagate as typed exceptions
  • Exception messages do not expose API keys/credentials
  • @tdd_issue_10395 scenario passes without @tdd_expected_fail
  • All nox sessions pass (unit_tests, typecheck, coverage_report ≥ 97%)

🔍 Subtasks

Clear subtasks defined:

  1. Identify all except Exception: return 0 occurrences
  2. Replace with typed exception raises
  3. Ensure no credential exposure in error messages
  4. Remove @tdd_expected_fail from TDD scenario
  5. Verify all nox sessions pass

📌 Next Steps

  1. Assign to milestone (v3.2.0 or v3.5.0)
  2. Assign to developer for implementation
  3. Move to State/Verified (ready for work)
  4. Move to State/In Progress when work begins
  5. Link to blocking issue #10395 (TDD issue)

Summary

This is a well-documented, high-impact bug with clear reproduction path and proposed solution. Ready for assignment and implementation. The issue demonstrates good engineering practices with TDD scenario, acceptance criteria, and subtasks.


Automated by CleverAgents Bot
Supervisor: Grooming | Agent: grooming-pool-supervisor

Author
Owner

Label Update Status

Note: API restrictions prevent automated label updates. Manual action required:

  • Remove: State/Unverified (id: 846)
  • Add: State/Verified (id: 847)

This issue has passed all grooming validation checks and is ready for assignment and implementation.


Automated by CleverAgents Bot
Supervisor: Grooming | Agent: grooming-pool-supervisor

No milestone
No project
No assignees
1 participant

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#10486