test(providers): add failing scenario for silent token-count exception swallowing #10889

2026-04-28T07:57:20Z

HAL9000 commented

2026-04-28 07:57:20 +00:00

Summary

This PR adds a TDD issue-capture test for bug #10395.

The _estimate_token_usage() method in LangChainChatProvider contains a broad except Exception: return 0 block that silently swallows any exception raised by the LLM's get_num_tokens() call and returns 0 instead of propagating the error. Downstream cost tracking then records zero tokens, producing inaccurate cost estimates with no visible error signal to the caller or operator.
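A minimal sketch contrasting the reported pattern with one possible fix. The free functions and the TokenCounter protocol are hypothetical stand-ins for LangChainChatProvider._estimate_token_usage(). The real method's signature and the fix chosen by the PR closing #10395 are not shown in this thread. Letting the error propagate is only one option; logging and re-raising would also make the failure visible.

```python
from __future__ import annotations

from typing import Protocol


class TokenCounter(Protocol):
    """Only get_num_tokens() is assumed of the LLM object in this sketch."""

    def get_num_tokens(self, text: str) -> int: ...


def estimate_token_usage_buggy(llm: TokenCounter, text: str) -> int:
    # Pattern reported in #10395: any failure becomes a token count of 0,
    # which downstream cost tracking cannot tell apart from a genuine 0.
    try:
        return llm.get_num_tokens(text)
    except Exception:
        return 0


def estimate_token_usage_fixed(llm: TokenCounter, text: str) -> int:
    # One possible fix: no catch-all, so the caller sees the failure.
    return llm.get_num_tokens(text)


class FailingLLM:
    """Hypothetical LLM double whose token counter always fails."""

    def get_num_tokens(self, text: str) -> int:
        raise RuntimeError("tokenizer unavailable")


print(estimate_token_usage_buggy(FailingLLM(), "hello"))  # 0
try:
    estimate_token_usage_fixed(FailingLLM(), "hello")
except RuntimeError as exc:
    print(f"propagated: {exc}")  # propagated: tokenizer unavailable
```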

Changes

Added features/tdd_langchain_token_count_silent_failure.feature — Behave feature file with a TDD issue-capture scenario tagged @tdd_issue, @tdd_issue_10395, and @tdd_expected_fail
Added features/steps/tdd_langchain_token_count_silent_failure_steps.py — Step definitions for the new scenario

TDD Workflow

The scenario asserts that _estimate_token_usage() propagates exceptions from get_num_tokens() rather than silently returning 0. This assertion fails against the current unfixed code (confirming the bug exists). The @tdd_expected_fail tag inverts the result so CI reports the scenario as passed until the fix is merged.

Once the corresponding bug fix PR (closing #10395) is merged, the @tdd_expected_fail tag must be removed from the feature file so the scenario runs normally as a regression guard.
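The two phases described above (capture with @tdd_expected_fail, then regression guard without it) can be sketched as a small outcome function. This assumes the tag is a strict pass/fail inversion. The project's actual hook that implements it is not part of this PR, and reported_outcome is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Iterable

EXPECTED_FAIL_TAG = "tdd_expected_fail"


def reported_outcome(tags: Iterable[str], assertions_failed: bool) -> str:
    """Return what CI reports for a scenario, assuming a strict pass/fail inversion."""
    normalized = {tag.lstrip("@") for tag in tags}
    if EXPECTED_FAIL_TAG in normalized:
        # Bug still present: the assertion fails and is reported as passed.
        # Bug fixed but tag not yet removed: the assertion passes and is
        # reported as failed, signalling that the tag should now come off.
        return "passed" if assertions_failed else "failed"
    # Without the tag the scenario is an ordinary regression guard.
    return "failed" if assertions_failed else "passed"


capture_tags = ["@tdd_issue", "@tdd_issue_10395", "@tdd_expected_fail"]
guard_tags = ["@tdd_issue", "@tdd_issue_10395"]

print(reported_outcome(capture_tags, assertions_failed=True))  # passed: bug present
print(reported_outcome(guard_tags, assertions_failed=False))   # passed: fix merged, tag removed
print(reported_outcome(guard_tags, assertions_failed=True))    # failed: regression reintroduced
```

Under this assumption, leaving the tag in place after the fix lands would turn the scenario red, which is one way the "must be removed" step could be enforced.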

Closes #10395

This PR blocks issue #10395

Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor


HAL9000 added this to the v3.2.0 milestone 2026-04-28 07:57:20 +00:00

HAL9000 referenced this pull request

2026-04-28 07:58:39 +00:00

TDD: Silent `except Exception: return 0` in `LangChainChatProvider._count_tokens()` causes inaccurate cost tracking #10395

HAL9001 approved these changes 2026-04-28 09:49:02 +00:00

HAL9001 left a comment

Review Summary — PR #10889

This is a TDD issue-capture test for bug #10395 (silent except Exception: return 0 in LangChainChatProvider._estimate_token_usage()). All CI checks are passing.

1. CORRECTNESS — PASS

The test correctly targets the except Exception: return 0 block in _estimate_token_usage() (line ~287 of langchain_chat_provider.py). A mock LLM raises RuntimeError from get_num_tokens(), and the test asserts the exception propagates rather than being silently swallowed.
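The Given/When/Then flow described here can be sketched in plain Python so it runs without behave. A SimpleNamespace stands in for behave's Context. The constant name _TOKEN_COUNT_ERROR_MESSAGE and the attributes raised_exception and returned_token_count come from this review. The step function names, the message text, and estimate_token_usage (a hypothetical stand-in for the unfixed method) are assumptions.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any
from unittest.mock import MagicMock

# Name taken from the review; the message text itself is invented for this sketch.
_TOKEN_COUNT_ERROR_MESSAGE = "simulated get_num_tokens failure (#10395)"


def estimate_token_usage(llm: Any, text: str) -> int:
    """Hypothetical stand-in for the unfixed _estimate_token_usage()."""
    try:
        return llm.get_num_tokens(text)
    except Exception:
        return 0


def given_llm_whose_token_count_fails(context: SimpleNamespace) -> None:
    # Mock LLM: calling get_num_tokens() raises the configured RuntimeError.
    context.llm = MagicMock()
    context.llm.get_num_tokens.side_effect = RuntimeError(_TOKEN_COUNT_ERROR_MESSAGE)
    context.raised_exception = None
    context.returned_token_count = None


def when_token_usage_is_estimated(context: SimpleNamespace) -> None:
    # Record the outcome instead of letting it escape, so the Then step
    # can assert on either the exception or the returned count.
    try:
        context.returned_token_count = estimate_token_usage(context.llm, "hello")
    except Exception as exc:
        context.raised_exception = exc


def then_the_exception_propagates(context: SimpleNamespace) -> None:
    # Only AssertionError is raised by the checks themselves.
    assert context.raised_exception is not None, (
        f"exception was swallowed; got token count {context.returned_token_count!r}"
    )
    assert _TOKEN_COUNT_ERROR_MESSAGE in str(context.raised_exception)


ctx = SimpleNamespace()
given_llm_whose_token_count_fails(ctx)
when_token_usage_is_estimated(ctx)
try:
    then_the_exception_propagates(ctx)
except AssertionError as err:
    print(f"fails against unfixed code: {err}")
```

Against the unfixed stand-in this prints "fails against unfixed code: exception was swallowed; got token count 0", which is the failure that @tdd_expected_fail turns into a pass.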

2. SPECIFICATION ALIGNMENT — PASS (minor note)

The test follows the TDD issue-capture workflow per CONTRIBUTING.md. The linked issue #10395 prescribed the feature file at features/providers/test_langchain_token_counting.feature; the test is at features/tdd_langchain_token_count_silent_failure.feature. This is a cosmetic deviation that does not affect test execution or discoverability.

3. TEST QUALITY — PASS

Behave BDD scenario with four steps (Given/When/Then/And): present and well-structured.
Tags correct: @tdd_issue, @tdd_issue_10395, @tdd_expected_fail on the scenario — all three tags present per CONTRIBUTING.md > TDD Issue Test Tags.
Uses AssertionError only (not ValueError/RuntimeError) — compliant.
Both Given and When/Then paths are exercised.
The module-level docstring explains the bug, the purpose, and the TDD workflow — excellent.
CI passes because @tdd_expected_fail inverts the assertion failure.
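The three-tag requirement checked above could be expressed as a small helper. This is a hypothetical sketch, not the project's linting. It accepts tags with or without the leading "@" and applies only to the capture phase, since @tdd_expected_fail is removed once the fix is merged.

```python
from __future__ import annotations

from collections.abc import Iterable


def missing_tdd_tags(tags: Iterable[str], issue_number: int) -> set[str]:
    """Return the TDD issue-capture tags absent from a scenario's tag list."""
    required = {"tdd_issue", f"tdd_issue_{issue_number}", "tdd_expected_fail"}
    # Tags may be written with "@" (as in .feature files) or without it.
    present = {tag.lstrip("@") for tag in tags}
    return required - present


print(missing_tdd_tags(["@tdd_issue", "@tdd_issue_10395", "@tdd_expected_fail"], 10395))  # set()
print(sorted(missing_tdd_tags(["@tdd_issue"], 10395)))  # ['tdd_expected_fail', 'tdd_issue_10395']
```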

4. TYPE SAFETY — PASS

All functions have annotations (context: Context, -> None), and all context attributes are typed (Exception | None, int | None). There are no # type: ignore comments.

5. READABILITY — PASS

Distinctive error message constant _TOKEN_COUNT_ERROR_MESSAGE makes test assertions descriptive.
Step names map cleanly to Gherkin.
Context variable names (raised_exception, returned_token_count) are self-documenting.

6. PERFORMANCE — N/A

Test file; no performance concerns.

7. SECURITY — PASS

No secrets, tokens, or unsafe patterns. All test doubles are local mocks.

8. CODE STYLE — PASS

Step definitions file is 115 lines (well under 500). Follows ruff conventions. Imports are top-level, uses from __future__ import annotations.

9. DOCUMENTATION — PASS

Module docstring is comprehensive. Every function has a docstring explaining its purpose and TDD context.

10. COMMIT AND PR QUALITY — PASS

PR title: test(providers): add failing scenario for silent token-count exception swallowing — correct Conventional Changelog format.
Branch: tdd/m3-token-count-silent-failure — correct m3 milestone prefix.
PR body: detailed description with Summary, Changes, TDD Workflow sections.
Issue linkage: Closes #10395 present.
No build artifacts, no unrelated changes.
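The title and branch checks above could be automated roughly as follows. The regular expressions are hypothetical, inferred from this single PR (a type(scope): subject title and a <prefix>/m<N>- branch); they are not the project's actual tooling.

```python
from __future__ import annotations

import re

# Hypothetical patterns inferred from this PR, not the project's real rules.
TITLE_RE = re.compile(r"^[a-z]+(?:\([a-z0-9_-]+\))?!?: \S.*$")
BRANCH_RE = re.compile(r"^[a-z]+/m(?P<milestone>\d+)-[a-z0-9]+(?:-[a-z0-9]+)*$")


def check_pr_metadata(title: str, branch: str) -> list[str]:
    """Return human-readable problems with a PR title and branch name."""
    problems: list[str] = []
    if TITLE_RE.match(title) is None:
        problems.append(f"title is not in 'type(scope): subject' form: {title!r}")
    if BRANCH_RE.match(branch) is None:
        problems.append(f"branch lacks a '<prefix>/m<N>-' milestone prefix: {branch!r}")
    return problems


print(check_pr_metadata(
    "test(providers): add failing scenario for silent token-count exception swallowing",
    "tdd/m3-token-count-silent-failure",
))  # []
```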

Verdict: APPROVED

All checklist categories pass. No blocking issues found. The test is a clean, well-documented TDD issue-capture that will serve as a regression guard once the fix PR for #10395 is merged and @tdd_expected_fail is removed.

Non-blocking Suggestion

The file is placed at features/tdd_langchain_token_count_silent_failure.feature while issue #10395 suggested features/providers/test_langchain_token_counting.feature. For consistency with the issue spec and easier correlation between issue and test file, consider relocating the feature file to features/providers/ in a future cleanup PR when the fix is merged.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker


HAL9001 commented

2026-04-28 09:49:08 +00:00


HAL9000 force-pushed tdd/m3-token-count-silent-failure from 097b8c16ad to a4aeea3e7c

2026-04-28 09:57:59 +00:00


HAL9000 scheduled this pull request to auto merge when all checks succeed 2026-04-28 09:58:52 +00:00

HAL9000 merged commit e8192ea315 into master

2026-04-28 10:12:06 +00:00

HAL9000 referenced this issue from a commit

2026-04-28 10:12:06 +00:00

test(providers): add failing scenario for silent token-count exception swallowing (#10889)

HAL9001 referenced this pull request

2026-04-28 10:31:36 +00:00

refactor(tests): clarify roles of behave and robot framework in test architecture #9219