test(session): add TDD failing tests for session list DI error #653
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No milestone
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!653
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "tdd/session-list-di-error"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
TDD bug-capture tests for bug #554 —
agents session listfails because_get_session_service()callscontainer.db()which does not exist on the DI Container class (AttributeError).Changes
features/tdd_session_list_di.feature: 3 Behave BDD scenarios tagged@tdd_bug @tdd_bug_554 @tdd_expected_failthat exercise the real DI path (no mocks)features/steps/tdd_session_list_di_steps.py: Step definitions usingCliRunneragainst the realsession_appfeatures/environment.py:_handle_tdd_expected_fail()infrastructure — tag validation + status inversion inafter_scenariohooknoxfile.py: behave-parallel exit logic changed to summary-based failure detection (compatible with TDD status inversion)robot/tdd_session_list_di.robot+robot/helper_tdd_session_list_di.py: Robot Framework integration smoke tests with self-inverting helperbenchmarks/tdd_session_list_di_bench.py: ASV benchmark baseline for session list command throughputHow It Works
The
@tdd_expected_failtag on scenarios tells the Behaveafter_scenariohook to invert the test result:Verification
nox -e typecheck— 0 errorsnox -e unit_tests— 350 features, 9702 scenarios, 37348 steps passednox -e lint— All checks passednox -e coverage_report— 99% (≥97% threshold met)Dependencies
ISSUES CLOSED: #630
brent.edwards referenced this pull request2026-03-09 21:24:15 +00:00
e91f2ac2fb8a9fb12ba8Review: TDD Bug #554 — Session List DI Error — Commit
2c8acf3cReviewer: Hamza Khyari
7 findings total: 1 P1 merge blocker, 3 P2 should-fix, 3 P3 nice-to-have.
F1 ·
P1:must-fix· Removingfailedfrom noxfile exit logic masks non-summary failuresFile:
noxfile.py:415(line 415 on master:if failed or _has_failures(total):→ changed toif _has_failures(total):)The
failedboolean fromrunner.run()captures failure signals that are not reflected in the scenario summary:before_allcrashes, import errors, hook exceptions, and cleanup failures._has_failures(total)at line 245-251 only checksfeatures.failed,features.errors,scenarios.failed, andscenarios.errors— all of which come from the summary counters.Concrete scenario: If
before_allinenvironment.pycrashes (e.g., import error), zero features/scenarios run. The summary dict is empty or has all-zero counters._has_failures({})returnsFalse. CI exits 0 — silently passing a totally broken test suite.This change also affects the parallel mode at line 401-415 where
failedis accumulated across worker processes. The comment says this is needed becauserunner.run()tracks step failures that hooks can't update, but the fix should be narrower — only suppress thefailedsignal for@tdd_expected_failscenarios, not globally.Impact: Affects ALL behave runs in CI, not just TDD tests. A broken
environment.pyor missing import would pass CI silently.Fix: Restore
failed or _has_failures(total)and instead fix the root cause: have the_handle_tdd_expected_failhook also update the runner's internal failure tracking, or filter thefailedsignal by checking if ALL failures came from@tdd_expected_failscenarios.F2 ·
P2:should-fix·Status.undefinedsteps silently masked as passedFile:
features/environment.py:167(new code in_handle_tdd_expected_fail)When a scenario fails AND has
Status.undefinedsteps (missing step definitions), this code overwritesundefinedtopassed. If a step definition file is deleted or renamed, the scenario fails, gets inverted to "passed", and the undefined steps are completely masked. This is especially dangerous for PR #654 which depends on the sharedGivenstep from this PR — if that step file is missing, all #654 scenarios will silently pass.Fix: Only mask
Status.failedandStatus.skipped. LeaveStatus.undefinedalone (or fail the scenario unconditionally when undefined steps exist):F3 ·
P2:should-fix· Bareexcept Exceptionin step doesn't validate specific error typeFile:
features/steps/tdd_session_list_di_steps.py:62-67The test is specifically for bug #554 where
container.db()raisesAttributeError. Catching bareExceptionmeans any unrelated error (e.g.,ImportError,TypeError) also gets captured and the scenario still "fails" (which the@tdd_expected_failhook inverts to "pass"). The test can't distinguish the expected bug from a completely different failure.Compare: the Robot helper
service_resolution()correctly catches onlyAttributeError.Fix: Catch
AttributeErrorspecifically, or at minimum validatecontext.service_errortype in theThenstep.F4 ·
P2:should-fix·importlib.reload(cleveragents)is ineffectiveFile:
benchmarks/tdd_session_list_di_bench.py:30importlib.reload(cleveragents)only re-executescleveragents/__init__.py. It does not reload submodules likecleveragents.cli.commands.sessionorcleveragents.domain.models.core.session— those retain whatever version was previously imported. Since the benchmark explicitly imports those submodules on lines 35-40, the reload is a no-op in practice.Fix: Remove the
importlib.reloadcall (and theimport importlib). Thesys.path.inserton line 26 is sufficient to ensure the local source tree is used.F5 ·
P3:nice-to-have· Benchmark docstring says "DI resolution overhead" but DI is mocked outFile:
benchmarks/tdd_session_list_di_bench.py:7The module docstring says "session list DI resolution overhead" but the
setup()method mockssession_mod._servicedirectly, bypassing DI entirely. The benchmark actually measures CLI/rendering throughput with a pre-injected mock service.Fix: Update docstring to "CLI/rendering throughput" or similar.
F6 ·
P3:nice-to-have· Robot dispatcher usesdict[str, object]instead of typed callableFile:
robot/helper_tdd_session_list_di.py:136This forces
cmd()on line 147 to need# type: ignore[operator]becauseobjectis not callable. Usingdict[str, Callable[[], None]]would eliminate the suppression.Fix:
F7 ·
P3:nice-to-have· Robot test doc says "resolves correctly" but test verifies failureFile:
robot/tdd_session_list_di.robot:28This test actually verifies that the service resolution fails (the helper checks for
AttributeError). The documentation is misleading.Fix: Update to something like
"Verify that _get_session_service() triggers the DI db error".2c8acf3cabb7d68ef6a8All 7 review findings addressed. Branch rebased onto current master (merge commit removed). Force-pushed.
Changes made:
_no_scenarios_ran()safety net in noxfile.py — fails CI if features were requested but zero scenarios ran (catchesbefore_allcrashes)Status.undefinedfrom step status inversion in_handle_tdd_expected_fail()— undefined steps are no longer silently maskedexcept Exceptiontoexcept AttributeErrorinstep_request_service()— now catches only the expected errorimport importlibandimportlib.reload(cleveragents)from benchmark (ineffective for submodules)dict[str, object]todict[str, Callable[[], None]]in robot helper dispatcher, removed# type: ignore[operator]Merge-gate results:
Review — PR #653 (Issue #630)
Review scope: Full review per
docs/development/review_playbook.md— lint, typecheck, security scan, manual logic review, cross-PR consistency, summary output verification.Merge-gate checks
ruff check(lint)ruff format --checkpyright(typecheck)nox -s security_scannox -s unit_tests(TDD features)nox -s integration_tests --include tdd_bugISSUES CLOSED: #630presentfcdf80f3)Findings
P2:should-fix —
failedvariable is dead code innoxfile.pyAfter replacing
if failed or _has_failures(total):withif _has_failures(total):, thefailedvariable (assigned at line 386 in sequential mode and lines 413/420 in parallel mode) is never read.worker_failedat line 415 is also only consumed byfailed. This is dead code.Suggested fix: Replace
failed, total = _run_features_inprocess(...)with_, total = ...in sequential mode, and remove thefailedaccumulation in parallel mode (keepworker_failedonly for destructuring the tuple). This makes the intent explicit and avoids confusion for future readers who might wonder whyfailedisn't checked.Everything else in PR #653 looks correct. The
_handle_tdd_expected_faillogic is sound, the_no_scenarios_ransafety net covers the right edge case, theexcept AttributeErrornarrowing is appropriate, theCallabletyping is correct, and the robot test docs accurately describe failure behavior. Well-structured implementation.P2:should-fix —
failedis assigned here but never read after the exit-logic change at line 434. This is dead code. Use_, total = _run_features_inprocess(...)to document the intentional discard.b7d68ef6a8b32a0e5d85Self-review: P2 fix applied
During self-review I identified a P2 finding: the
failedvariable innoxfile.pywas accumulated but never used for the exit decision (we use the summary-based_has_failures()check instead).Changes in this force-push (
b32a0e5d)noxfile.py— removed deadfailedvariable from both code paths:failed, total = ...→_, total = ...failed = Falseinitialization, changedworker_failed→_worker_failedin the loop destructuring, removedfailed = failed or worker_failedaccumulationMerge-gate verification (all passing)
nox -e typecheck)nox -e lint)nox -e unit_tests)nox -e security_scan)Stacked branches #654 and #655 have been rebased and force-pushed as well.
b32a0e5d8506bbe48a9cNew commits pushed, approval review dismissed automatically according to repository settings
agents session listcauses an error. #554agents session listcauses an error. #554