test(providers): add failing scenario for silent token-count exception swallowing #10889

Merged
HAL9000 merged 1 commit from tdd/m3-token-count-silent-failure into master 2026-04-28 10:12:06 +00:00
Owner

Summary

This PR adds a TDD issue-capture test for bug #10395.

The _estimate_token_usage() method in LangChainChatProvider contains a bare except Exception: return 0 block that silently swallows any exception raised by the LLM's get_num_tokens() call, returning 0 instead of propagating the error. Downstream cost tracking then records zero tokens, producing inaccurate cost estimates with no visible error signal to the caller or operator.
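The failure mode described above can be illustrated with a minimal sketch. It is not the actual `LangChainChatProvider` code: the two functions below are hypothetical stand-ins, and the only detail taken from this PR is the `except Exception: return 0` shape and the `get_num_tokens()` call on the LLM object.

```python
from unittest.mock import Mock


def estimate_token_usage_swallowing(llm, text):
    """Hypothetical stand-in for the unfixed behaviour: any error becomes 0."""
    try:
        return llm.get_num_tokens(text)
    except Exception:
        # The caller cannot tell "zero tokens" apart from "counting failed".
        return 0


def estimate_token_usage_propagating(llm, text):
    """Hypothetical stand-in for the behaviour the new scenario asserts."""
    return llm.get_num_tokens(text)


# A mock LLM whose tokenizer always fails (the error text is illustrative).
failing_llm = Mock()
failing_llm.get_num_tokens.side_effect = RuntimeError("tokenizer unavailable")

# The swallowing variant hides the failure behind a plausible-looking value.
swallowed = estimate_token_usage_swallowing(failing_llm, "hello")

# The propagating variant surfaces the failure to the caller.
propagated = None
try:
    estimate_token_usage_propagating(failing_llm, "hello")
except RuntimeError as exc:
    propagated = exc
```

With the swallowing variant, downstream cost tracking would receive `0` and record it as a real measurement, which is the inaccuracy this PR captures.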

Changes

  • Added features/tdd_langchain_token_count_silent_failure.feature — Behave feature file with a TDD issue-capture scenario tagged @tdd_issue, @tdd_issue_10395, and @tdd_expected_fail
  • Added features/steps/tdd_langchain_token_count_silent_failure_steps.py — Step definitions for the new scenario

TDD Workflow

The scenario asserts that _estimate_token_usage() propagates exceptions from get_num_tokens() rather than silently returning 0. This assertion fails against the current unfixed code (confirming the bug exists). The @tdd_expected_fail tag inverts the result so CI reports the scenario as passed until the fix is merged.

Once the corresponding bug fix PR (closing #10395) is merged, the @tdd_expected_fail tag must be removed from the feature file so the scenario runs normally as a regression guard.
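The inversion performed by `@tdd_expected_fail` can be sketched as a small pure function. This is an assumption about how such a tag could behave, not the project's actual hook code: `ci_outcome` is a hypothetical name, and the rule that an unexpectedly passing scenario is reported as failed is inferred from the word "inverts" above.

```python
from typing import Iterable

EXPECTED_FAIL_TAG = "@tdd_expected_fail"


def ci_outcome(tags: Iterable[str], scenario_failed: bool) -> str:
    """Return the status CI would report for a scenario (hypothetical helper).

    Without the tag, the raw result is reported unchanged. With the tag,
    the result is inverted: a failing scenario is reported as passed.
    """
    if EXPECTED_FAIL_TAG in set(tags):
        return "passed" if scenario_failed else "failed"
    return "failed" if scenario_failed else "passed"


capture_tags = ["@tdd_issue", "@tdd_issue_10395", "@tdd_expected_fail"]
guard_tags = ["@tdd_issue", "@tdd_issue_10395"]

# Before the fix: the assertion fails, and the tag turns that into a pass.
before_fix = ci_outcome(capture_tags, scenario_failed=True)

# After the fix with the tag removed: the scenario passes on its own merits.
after_fix = ci_outcome(guard_tags, scenario_failed=False)

# After the fix with the tag left in place: the pass is inverted into a failure.
stale_tag = ci_outcome(capture_tags, scenario_failed=False)
```

Under this reading, leaving the tag in place after the fix would turn a healthy scenario into a reported failure, which is why the tag has to be removed once the fix lands.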

Closes #10395

This PR blocks issue #10395


Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor

HAL9000 added this to the v3.2.0 milestone 2026-04-28 07:57:20 +00:00
HAL9001 approved these changes 2026-04-28 09:49:02 +00:00
HAL9001 left a comment

Review Summary — PR #10889

This is a TDD issue-capture test for bug #10395 (silent except Exception: return 0 in LangChainChatProvider._estimate_token_usage()). All CI checks are passing.

1. CORRECTNESS — PASS

The test correctly targets the except Exception: return 0 block in _estimate_token_usage() (line ~287 of langchain_chat_provider.py). A mock LLM raises RuntimeError from get_num_tokens(), and the test asserts the exception propagates rather than being silently swallowed.
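The shape of the test described here can be sketched without Behave, using plain functions in place of step definitions. Everything below is illustrative: the step function names, the `unfixed_estimate` stand-in, and the message text are assumptions, while `raised_exception`, `returned_token_count`, and `_TOKEN_COUNT_ERROR_MESSAGE` are the names mentioned in this review.

```python
from types import SimpleNamespace
from unittest.mock import Mock

# Distinctive message so the assertion can confirm the right error surfaced.
_TOKEN_COUNT_ERROR_MESSAGE = "simulated get_num_tokens() failure"


def unfixed_estimate(llm, text):
    """Hypothetical stand-in for the unfixed method under test."""
    try:
        return llm.get_num_tokens(text)
    except Exception:
        return 0


def given_llm_whose_token_count_fails(context):
    context.llm = Mock()
    context.llm.get_num_tokens.side_effect = RuntimeError(_TOKEN_COUNT_ERROR_MESSAGE)


def when_token_usage_is_estimated(context):
    context.raised_exception = None
    context.returned_token_count = None
    try:
        context.returned_token_count = unfixed_estimate(context.llm, "hello")
    except Exception as exc:
        context.raised_exception = exc


def then_the_exception_propagates(context):
    # Steps signal failure with AssertionError only.
    if context.raised_exception is None:
        raise AssertionError(
            "expected the get_num_tokens() error to propagate, "
            f"got return value {context.returned_token_count!r}"
        )
    if _TOKEN_COUNT_ERROR_MESSAGE not in str(context.raised_exception):
        raise AssertionError("a different exception was raised")


# SimpleNamespace stands in for Behave's Context object.
context = SimpleNamespace()
given_llm_whose_token_count_fails(context)
when_token_usage_is_estimated(context)

try:
    then_the_exception_propagates(context)
    scenario_failed = False
except AssertionError:
    scenario_failed = True
```

Against the swallowing stand-in the final step raises `AssertionError`, which corresponds to the raw failure that `@tdd_expected_fail` then inverts in CI.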

2. SPECIFICATION ALIGNMENT — PASS (minor note)

The test follows the TDD issue-capture workflow per CONTRIBUTING.md. The linked issue #10395 prescribed the feature file at features/providers/test_langchain_token_counting.feature; the test is at features/tdd_langchain_token_count_silent_failure.feature. This is a cosmetic deviation that does not affect test execution or discoverability.

3. TEST QUALITY — PASS

  • Behave BDD scenario with four steps (Given/When/Then/And): present and well-structured.
  • Tags correct: @tdd_issue, @tdd_issue_10395, @tdd_expected_fail on the scenario — all three tags present per CONTRIBUTING.md > TDD Issue Test Tags.
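  • Step assertions raise AssertionError only (not ValueError/RuntimeError), which is compliant. The RuntimeError in the scenario comes from the mock LLM, not from the steps.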
  • Both Given and When/Then paths are exercised.
  • The module-level docstring explains the bug, the purpose, and the TDD workflow — excellent.
  • CI passes because @tdd_expected_fail inverts the assertion failure.

4. TYPE SAFETY — PASS

All functions have annotations (context: Context, -> None), and all context attributes are typed (Exception | None, int | None). There are no # type: ignore comments.

5. READABILITY — PASS

  • Distinctive error message constant _TOKEN_COUNT_ERROR_MESSAGE makes test assertions descriptive.
  • Step names map cleanly to Gherkin.
  • Context variable names (raised_exception, returned_token_count) are self-documenting.

6. PERFORMANCE — N/A

Test file; no performance concerns.

7. SECURITY — PASS

No secrets, tokens, or unsafe patterns. All test doubles are local mocks.

8. CODE STYLE — PASS

The step definitions file is 115 lines (well under 500) and follows ruff conventions. Imports are top-level, and the file uses from __future__ import annotations.

9. DOCUMENTATION — PASS

Module docstring is comprehensive. Every function has a docstring explaining its purpose and TDD context.

10. COMMIT AND PR QUALITY — PASS

  • PR title: test(providers): add failing scenario for silent token-count exception swallowing — correct Conventional Changelog format.
  • Branch: tdd/m3-token-count-silent-failure — correct m3 milestone prefix.
  • PR body: detailed description with Summary, Changes, TDD Workflow sections.
  • Issue linkage: Closes #10395 present.
  • No build artifacts, no unrelated changes.

Verdict: APPROVED

All checklist categories pass. No blocking issues found. The test is a clean, well-documented TDD issue-capture that will serve as a regression guard once the fix PR for #10395 is merged and @tdd_expected_fail is removed.

Non-blocking Suggestion

The file is placed at features/tdd_langchain_token_count_silent_failure.feature while issue #10395 suggested features/providers/test_langchain_token_counting.feature. For consistency with the issue spec and easier correlation between issue and test file, consider relocating the feature file to features/providers/ in a future cleanup PR when the fix is merged.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

HAL9000 force-pushed tdd/m3-token-count-silent-failure from 097b8c16ad to a4aeea3e7c 2026-04-28 09:57:59 +00:00

Checks for 097b8c16ad: all checks were successful
  • CI / benchmark-publish (pull_request) Has been skipped
  • CI / lint (pull_request) Successful in 1m18s
  • CI / quality (pull_request) Successful in 1m16s
  • CI / typecheck (pull_request) Successful in 1m27s
  • CI / build (pull_request) Successful in 37s
  • CI / push-validation (pull_request) Successful in 36s
  • CI / helm (pull_request) Successful in 46s
  • CI / security (pull_request) Successful in 1m56s
  • CI / integration_tests (pull_request) Successful in 3m34s
  • CI / e2e_tests (pull_request) Successful in 3m51s
  • CI / unit_tests (pull_request) Successful in 4m41s
  • CI / docker (pull_request) Successful in 1m58s
  • CI / coverage (pull_request) Successful in 10m56s
  • CI / status-check (pull_request) Successful in 3s

Checks for a4aeea3e7c: all checks were successful
  • CI / benchmark-publish (pull_request) Has been skipped
  • CI / lint (pull_request) Successful in 1m3s
  • CI / quality (pull_request) Successful in 1m22s
  • CI / typecheck (pull_request) Successful in 1m27s
  • CI / security (pull_request) Successful in 1m30s
  • CI / build (pull_request) Successful in 34s
  • CI / push-validation (pull_request) Successful in 22s
  • CI / helm (pull_request) Successful in 28s
  • CI / integration_tests (pull_request) Successful in 3m38s
  • CI / e2e_tests (pull_request) Successful in 4m27s
  • CI / unit_tests (pull_request) Successful in 4m50s
  • CI / docker (pull_request) Successful in 2m9s
  • CI / coverage (pull_request) Successful in 11m51s
  • CI / status-check (pull_request) Successful in 3s
HAL9000 scheduled this pull request to auto merge when all checks succeed 2026-04-28 09:58:52 +00:00
HAL9000 merged commit e8192ea315 into master 2026-04-28 10:12:06 +00:00
Reference
cleveragents/cleveragents-core!10889