test(cli): add failing tests for session list DI container error (#554) #596
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
6 participants
Notifications
Due date
No due date set.
Blocks
#554
agents session list causes an error.
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core!596
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feature/m3-fix-session-list-error"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
TDD regression tests for
agents session listDI container wiring error._get_session_service()callscontainer.db()but theContainerclass has nodbprovider, raisingAttributeError.Changes
Tests & benchmarks (no production code changes):
features/session_list_error.feature) —@tdd_bug @tdd_bug_554 @tdd_expected_failbehave.runner.Contexttyping,scoped_sessionfor pre-populated data, all imports at module level, private API access documentedrobot/session_list_error.robot) —tdd_bug tdd_bug_554 tdd_expected_failbenchmarks/session_list_bench.py) with per-method setup/teardown isolation@tdd_expected_failinfrastructure (duplicated from PR #595 for independent buildability):features/environment.py—after_scenariohook inverts FAIL→PASS / PASS→FAIL for@tdd_expected_failscenariosrobot/tdd_expected_fail_listener.py— Robot Listener API v3 with identical semanticsnoxfile.py—--listener robot/tdd_expected_fail_listener.pyregistrationTag migration (18 existing TDD scenarios across 5 feature files):
@tdd @bug522→@tdd_bug @tdd_bug_522(5 scenarios incli_init_yes_flag.feature)@tdd @bug523→@tdd_bug @tdd_bug_523(3 scenarios inresource_type_bootstrap_fs.feature)@tdd @bug524→@tdd_bug @tdd_bug_524(3 scenarios inresource_type_bootstrap_git.feature)@tdd @bug589→@tdd_bug @tdd_bug_589(4 scenarios inproject_create_persist.feature)@tdd @bug590→@tdd_bug @tdd_bug_590(3 scenarios inproject_show_after_create.feature)Process
051e96d0, rebased ontomaster(83f2f3a0)test(cli): add failing TDD tests for session list DI container errorwithRefs: #554footernox -s lint✓ /nox -s typecheck✓ (0 errors)Refs: #554
agents session listcauses an error. #554Self-review of TDD failing-test PR for bug #554.
Test design: Tests exercise the real
_get_session_service()DI path by resetting_service = None. A file-based SQLite database is prepared with all tables, andCLEVERAGENTS_DATABASE_URLis overridden so the commands can reach the database once the fix is applied.Step pattern prefixes: All step patterns use the
session-list-errorprefix to avoid AmbiguousStep collisions with existingsession_cli_steps.py.Expected behavior: All 3 scenarios are expected to FAIL because
container.db()raisesAttributeError. The fix author should branch from this TDD branch so the tests are inherited.Quality gates:
nox -s lintPASS (0 errors),nox -s typecheckPASS (0 errors).agents session listcauses an error. #554PM Review — Day 25
Status
Context
TDD test PR for #554 (session list DI container error). Same root cause as #570 — missing
dbprovider in the DI container. Tests exercise the real_get_session_service()path.Action Item
@CoreRasurae: Please review this PR alongside #595. Same DI container root cause, sibling test suites.
Priority
Medium — blocks M3 acceptance gate. Linked to #570/#595.
Code Review —
test(cli): add failing tests for session list DI container errorCommit:
652b21f| Issue: #554 | Branch:feature/m3-fix-session-list-errorReviewed against the issue #554 acceptance criteria, the specification (
docs/specification.mdsession commands and output rendering sections), project conventions (CONTRIBUTING.md,docs/development/testing.md), and the production source code (session.py,container.py,repositories.py).The TDD intent is sound — writing expected-behavior tests before the fix — but there are two high-severity issues that need to be addressed before merge, plus several medium and low items.
Findings Summary
@wipscenarios not excluded from CI — will breakunit_testsandcoverage_reportfor all devsfix/m3-session-list-errorvsfeature/m3-fix-session-list-error)_get_session_service()entirely@unittag on Behave scenarios (project uses@unit @domain)[Tags]on Robot Framework test casessys.pathpattern (missingimportlib.reload)Anytype for context parameter instead ofContextfrombehave.runnerM1 — Branch Name Mismatch (MEDIUM)
Issue #554 metadata specifies
Branch: fix/m3-session-list-error, but this PR usesfeature/m3-fix-session-list-error. Two discrepancies:feature/instead offix/— semantically incorrect for a bug fix.m3-fix-session-list-errorvsm3-session-list-error.The issue's Definition of Done states: "The commit is pushed to the remote on the branch matching the Branch in Metadata exactly."
L1–L4 — Convention Deviations (LOW)
@unit @<domain>tags (e.g.@unit @cost). The new scenarios use@tdd @bug554 @wipwithout@unit.[Tags]per test case for filtering. Omitted here.try/exceptimport fallback forsys.path. Project convention (seeaction_cli_bench.py,actor_loading_bench.py) uses module-level_SRCandimportlib.reload(cleveragents)to guarantee the local source is loaded.context: Anyinstead ofcontext: Context(frombehave.runner) per project convention in other step files.@ -0,0 +1,92 @@"""ASV benchmarks for session list DI wiring (bug #554).M3 — Benchmark does not exercise the DI wiring bug path (MEDIUM)
The docstring says this benchmarks "session list DI wiring (bug #554)", but
_make_service()constructsPersistentSessionServicedirectly — manually creating the engine, sessionmaker, and repositories. It never calls_get_session_service()or touchescontainer.db(), which is the actual root cause of bug #554.This benchmark will pass successfully even while the bug exists, making the "bug #554" claim misleading.
Recommendation: Either:
_get_session_service()through the DI container path.@ -0,0 +7,4 @@Background:Given a session-list-error CLI runner using the real DI path@tdd @bug554 @wipH1 —
@wipscenarios will break CI (HIGH)The
@wiptag has no automatic exclusion anywhere in the test infrastructure:behave.ini— no--tagsconfignoxfile.pyunit_testssession (line ~476) — no--tags=-wipnoxfile.pycoverage_reportsession (line ~664) — only excludes@discoveryfeatures/environment.py— no@wiphandlingSince these scenarios are designed to fail until the fix is applied, pushing this commit will cause
nox -s unit_testsandnox -s coverage_reportto fail project-wide, blocking CI for all developers — not just this feature branch.Recommendation: Either:
"--tags=-wip"to theunit_testsandcoverage_reportnox sessions, or@ -0,0 +21,4 @@And the session-list-error output should contain "Sessions"@tdd @bug554 @wipScenario: Session list works with JSON output formatM4 — JSON format scenario hits a separate production bug (MEDIUM)
This scenario has no pre-populated data. After the #554 fix,
service.list()will return[], hitting the early-return path atsession.py:181-184:This path outputs plain text regardless of
--format json— thefmtparameter is never consulted on the empty-list path. Sojson.loads(result.output)will raiseJSONDecodeError, and this scenario will fail for a different reason than the DI bug.If the fix is also meant to address empty-list format handling, that should be documented. Otherwise, either:
_session_list_dict()is reached and JSON is actually emitted, or@ -0,0 +24,4 @@Scenario: Session list works with JSON output formatWhen I invoke session-list-error list with format "json"Then the session-list-error command should exit successfullyAnd the session-list-error output should be valid JSON containing "sessions"M2 — Missing output format coverage (MEDIUM)
Issue #554 acceptance criteria states: "All output formats (rich, json, yaml, plain) work correctly."
Only 2 out of 6 formats are tested:
rich)jsonMissing:
yaml,plain,table,color.Per the specification, every CLI command must work across all 6 output formats. Consider adding at minimum
yamlandplainscenarios to match the acceptance criteria.@ -0,0 +106,4 @@engine = create_engine(db_url, echo=False)factory = sessionmaker(bind=engine, expire_on_commit=False)repo = SessionRepository(session_factory=factory)msg_repo = SessionMessageRepository(session_factory=factory)H2 — Data persistence bug: session data is never committed (HIGH)
svc.create()callsSessionRepository.create()which doesflush()but notcommit()(per the repo's documented contract in ADR-007: "All mutating methods flush but do NOT commit").The subsequent
db_session = factory()creates a new SQLAlchemy session on a different connection. This new session has no pending changes, sodb_session.commit()is a no-op. The flushed data fromsvc.create()remains uncommitted and is lost whenengine.dispose()is called.The "Session list returns sessions after creation via service" scenario will fail even after the fix is applied because no session data ever reaches the database.
Fix: Ensure the same SQLAlchemy session used by the repository is committed:
Or better, obtain and commit via a
UnitOfWork, or callfactory()once and pass it through so repo and commit use the same session instance.Addressed review #1981 findings —
cfcbab5eThanks @CoreRasurae for the thorough review. Here's what was addressed:
Fixed
@wipscenarios will break CI@wip. Added a TDD workflow comment in the feature file explaining that@wiptags should be removed once the fix lands. This is a TDD branch that stays open — it won't merge to master in a failing state.sessionmakertoscoped_sessionso the service and the commit share the same underlying DB session.plainandyamlformat scenarios (with pre-populated sessions). Remaining formats (table,color) deferred to follow-up.[Tags]on Robot tests[Tags] tdd bug554 wipto both Robot test cases.sys.pathpattern in benchmark_SRC/importlib.reload(cleveragents)pattern.Anytype for context parameterAnyannotations withbehave.runner.Context.Acknowledged (no code change)
fix/vsfeature/)feature/prefix was chosen. Cannot rename without recreating the PR. Will coordinate with PM if needed.@unittagCI status:
nox -s lint✓ /nox -s typecheck✓ (0 errors, 0 warnings)Code Review —
test(cli): add failing tests for session list DI container error(Follow-up)Commit:
cfcbab5e| Issue: #554 | Branch:feature/m3-fix-session-list-errorReviewer: Aditya Chhabra | Review Date: Day 26
Reviewed against issue #554 acceptance criteria, project conventions (
CONTRIBUTING.md,docs/development/testing.md), ADR-021 (CLI and output rendering), and the specification (docs/reference/session_cli.md). This is a follow-up review after Core's review #1981 and Brent's response in commitcfcbab5e.Overall, the response to Core's review is thorough and the majority of high-severity issues have been addressed. However, there remain a few items that need attention before merge.
Findings Summary
|| ID | Severity | Category | Description |
||----|----------|----------|-------------|
|| M1 | MEDIUM | Test Coverage | Acceptance criteria requires "all output formats" — only 4 of 6 formats tested (missing
tableandcolor) ||| M2 | MEDIUM | Convention | Branch name still doesn't match issue metadata (
feature/vsfix/) — acknowledged but not resolved ||| L1 | LOW | Convention | Missing
@unittag on Behave scenarios — project convention shows@unit @<domain>pattern ||| L2 | LOW | Test Design | Robot test validates JSON format output but doesn't verify structure — only checks exit code |
Detailed Findings
M1 — Incomplete Output Format Coverage (MEDIUM)
Finding:
Issue #554 acceptance criteria (line 24) states: "All output formats (rich, json, yaml, plain) work correctly"
However, per ADR-021 and
docs/reference/output_rendering.md, the CLI supports six formats:rich,color,table,plain,json,yaml.The current test suite covers:
json(line 31-35 of feature file)plain(line 37-42)yaml(line 44-49)rich(line 14-17, 20-24)table— not testedcolor— not testedBrent's response (line 211 of PR-review-596) acknowledges this: "Remaining formats (
table,color) deferred to follow-up."Impact:
The acceptance criteria says "all output formats" but the issue description only lists 4 formats. This is ambiguous. If the acceptance criteria means "all formats defined in ADR-021," then 2 formats are untested. If it means "the 4 formats explicitly mentioned in the issue," then coverage is complete.
Recommendation:
Either:
tableandcolorformats to match ADR-021, ORtableandcolorare deferredThis should be clarified with the PM before merge.
M2 — Branch Name Mismatch Not Resolved (MEDIUM)
Finding:
Core's review finding M1 (line 184-190 of PR-review-596) noted:
Brent's response (line 221): "Deliberate — this is a test-only PR so
feature/prefix was chosen. Cannot rename without recreating the PR. Will coordinate with PM if needed."Impact:
The issue's Definition of Done (line 30) explicitly requires: "The commit is pushed to the remote on the branch matching the Branch in Metadata exactly."
This is a TDD test PR, not the fix itself. The branch name mismatch may be acceptable if the PM agrees, but the Definition of Done as written is not satisfied.
Recommendation:
This should be explicitly acknowledged and approved by the PM (@jeffrey) before merge. If the branch naming convention for TDD test PRs differs from the convention for fix PRs, this should be documented somewhere (perhaps in
CONTRIBUTING.mdor the issue template).L1 — Missing
@unitTag (LOW)Finding:
Brent's response (line 222-223): "Deferred — project convention is inconsistent across step files. Will address in a tagging cleanup pass."
However, checking existing feature files (e.g.,
features/cost_controls.featurelines 8, 14, 21, 27), the@unit @<domain>pattern is widely used for Behave scenarios.The new scenarios use
@tdd @bug554 @wipbut omit@unit.Impact:
Low — the scenarios will run in CI once
@wipis removed, but they won't be included in filtered@unitruns. This may cause confusion during selective test runs.Recommendation:
Add
@unitalongside the existing tags, e.g.:This aligns with established project convention and makes the scenarios discoverable via
@unitfiltering.L2 — Robot Test JSON Validation Incomplete (LOW)
Finding:
robot/session_list_error.robotline 30-42 testssession list --format json, but only verifies exit code 0. It does not validate that the output is actually valid JSON.Compare this to the Behave scenario (line 159-167 of
session_list_error_steps.py), which parses the JSON and checks for the expected"sessions"key.Impact:
Low — the Robot test is a smoke test and may not require deep validation. However, the test name "Session List JSON Format Returns Valid JSON" (line 30) promises JSON validation that isn't delivered.
Recommendation:
Either:
Positive Observations
behave.runner.Contexttyping (L4 fixed)scoped_sessionto ensure commit visibility (H2 fixed)[Tags](L2 fixed)_SRC/importlib.reloadpattern (L3 fixed)Summary
This is a well-constructed TDD test suite that faithfully exercises the broken DI path and establishes correct expected behavior. The response to Core's review is comprehensive.
Blocking items:
Non-blocking items:
Recommendation: Request PM clarification on M1 and M2, then approve for merge once those are resolved.
CI status per PR description:
nox -s lint✓ /nox -s typecheck✓Code Review —
test(cli): add failing tests for session list DI container error(Round 3)Commit:
26c1f4dc| Issue: #554 | Branch:feature/m3-fix-session-list-errorReviewer: Rui Hu (@hurui200320) | Review Date: Day 26
Reviewed against issue #554 acceptance criteria, project conventions (
CONTRIBUTING.md), specification (docs/specification.mdsession commands and output rendering sections), ADR-021, and the production source code (session.py,container.py). This is a third-round review following Core's review #1981, Brent's response incfcbab5e, and Aditya's follow-up.The TDD intent is sound — writing expected-behavior tests before the fix — and the test design correctly exercises the real
_get_session_service()DI path. However, there are several critical process and structural violations that block merge.Findings Summary
ISSUES CLOSED: #554in PR body but bug is not fixed@wipscenarios not excluded from CI — will breakunit_testsfor all devsType/Testingis not a recognized label per CONTRIBUTING.mdtableandcolor)# type: ignore[attr-defined]in benchmark file@unittag on Behave scenarios_make_service()helperDetailed Findings
C1 — Merge Commit in Branch History (CRITICAL)
Finding:
Commit
26c1f4dcisMerge branch 'master-latest' into feature/m3-fix-session-list-error. CONTRIBUTING.md explicitly states:Recommendation:
Interactive rebase to remove the merge commit. Squash the two feature commits first (see C2), then rebase onto current
master(09d92ac6).C2 — Multiple Commits for Same Issue (CRITICAL)
Finding:
The branch has two commits both addressing #554:
652b21f1—test(cli): add failing tests for session list DI container errorcfcbab5e—fix(test): address CoreRasurae review #1981 findings on session list error testsCONTRIBUTING.md states: "Every commit must completely implement an issue and close it. At no point should a branch contain multiple commits addressing the same issue, nor should it have commits that fix earlier commits present in the same branch."
The second commit is explicitly a fix-up of the first.
Recommendation:
Squash
652b21f1andcfcbab5einto a single commit.C3 —
ISSUES CLOSED: #554Is Incorrect (CRITICAL)Finding:
The PR body contains
ISSUES CLOSED: #554, but this PR only adds failing tests. None of the 4 acceptance criteria are met:agents session liststill errors afteragents initThe actual bug (missing
dbprovider in the DI container) is not fixed by this PR.Recommendation:
Change
ISSUES CLOSED: #554toRefs: #554in both the commit message footer and the PR body. This PR references the issue but does not close it.C4 —
@wipScenarios Will Break CI (CRITICAL)Finding:
The noxfile's
unit_testssession invokes Behave without any@wipexclusion:There is no
--tags=-wipargument.behave.inialso has no tag filtering. This means all 5@wipscenarios infeatures/session_list_error.featurewill execute and fail duringnox -s unit_tests, breaking CI for every developer.The PR description mentions
nox -s lint✓ andnox -s typecheck✓ but notably does not mentionnox -s unit_testspassing.Recommendation:
Either:
--tags=-wipto the Behave invocation innoxfile.py(preferred — establishes a project-wide convention for TDD work), orH1 — Branch Name Mismatch (HIGH)
Finding:
Issue #554 metadata specifies
Branch: fix/m3-session-list-errorbut this PR usesfeature/m3-fix-session-list-error. Two discrepancies:feature/instead offix/m3-fix-session-list-errorvsm3-session-list-errorThe Definition of Done states: "The commit is pushed to the remote on the branch matching the Branch in Metadata exactly."
This was raised in Core's review (M1) and Aditya's follow-up (M2). Brent acknowledged but did not resolve.
Recommendation:
Either recreate the branch with the correct name, or (if this TDD work is split into a separate issue) update the new issue's metadata to match this branch name.
H2 — Commit Message Doesn't Match Issue Metadata (HIGH)
Finding:
Issue #554 specifies
Commit Message: fix(cli): handle missing database in session list commandbut the commit usestest(cli): add failing tests for session list DI container error.The Definition of Done requires: "the first line of the commit message matches the Commit Message in Metadata exactly."
Recommendation:
Same resolution path as H1 — if this TDD work is split into a separate issue, that issue should carry the appropriate commit message. If kept under #554, the commit message must match exactly.
H3 —
Type/TestingIs Not a Recognized Label (HIGH)Finding:
The PR carries
Type/Testing, but CONTRIBUTING.md only defines:Type/Bug,Type/Feature,Type/Task,Type/Epic,Type/Legendary.Recommendation:
Replace with
Type/Bug(matching the issue) orType/Task(if treating test-only work as technical work).H4 — PR Is Not Mergeable (HIGH)
Finding:
Forgejo reports
mergeable: false. Master has advanced to09d92ac6while the branch's merge base isd990fc1b.Recommendation:
Rebase onto current master (addresses both this and C1).
M1 — Incomplete Output Format Coverage (MEDIUM)
Finding:
The specification (ADR-021) and
OutputFormatenum define 6 formats:rich,color,table,plain,json,yaml. The test suite covers 4 (missingtableandcolor).Issue #554 acceptance criteria says "All output formats (rich, json, yaml, plain)" — which lists only 4. This is ambiguous.
Recommendation:
Clarify with PM whether "all output formats" means the 4 listed or all 6 per ADR-021. Either add the missing 2 scenarios or update the acceptance criteria to explicitly scope to 4.
M2 — Unnecessary
# type: ignore[attr-defined](MEDIUM)Finding:
benchmarks/session_list_error_bench.pyline 100:CONTRIBUTING.md prohibits
# type: ignore. Moreover,benchmarks/is not included in the Pyright type checking scope — bothpyrightconfig.jsonandpyproject.tomlsetinclude: ["src"]only. This suppression comment is inert and serves no purpose. Having it present sets a bad precedent and may mislead future contributors.Recommendation:
Remove the
# type: ignore[attr-defined]comment entirely. Since the file is outside the type checking scope, it has no effect.M3 — Robot Test Name Misleading (MEDIUM)
Finding:
robot/session_list_error.robotline 30: test case "Session List JSON Format Returns Valid JSON" only checks exit code 0. It does not parse or validate JSON output. Compare to the Behave step (session_list_error_steps.py:159-167) which properly parses JSON and checks for the"sessions"key.Recommendation:
Either add JSON validation to the Robot test or rename to "Session List JSON Format Does Not Error".
M4 — Missing
@unitTag (MEDIUM)Finding:
Scenarios use
@tdd @bug554 @wipbut omit@unit. Many existing feature files (e.g.,repositories_coverage.feature,cost_controls.feature,database_models_new_coverage.feature) use the@unit @<domain>pattern.Recommendation:
Add
@unitalongside existing tags:@unit @tdd @bug554 @wip.L1 — Environment Variable Pollution Risk (LOW)
Finding:
features/steps/session_list_error_steps.pyline 60:_setup_real_di_pathsetsCLEVERAGENTS_DATABASE_URLat the process level. In parallel test execution (Behave with--parallel-processes), this could affect other tests running concurrently.Recommendation:
Consider using a more isolated approach if parallel Behave execution is enabled. The current
context.add_cleanup()is good for sequential runs.L2 — Engine Not Disposed in Benchmark Helper (LOW)
Finding:
benchmarks/session_list_error_bench.pylines 46-54:_make_service()creates a new SQLAlchemy engine per call but never disposes it. In timed benchmarks (time_list_empty,time_list_after_create), this leaks file handles across iterations.Recommendation:
Either dispose the engine after use or restructure to reuse a single engine per benchmark setup.
Positive Observations
_get_session_service()DI path by resetting_service = Nonecontext.add_cleanup()for teardownsession-list-error) correctly avoidsAmbiguousStepcollisionsscoped_sessionfor commit visibility (H2 from Core's review fixed)[Tags](L2 from Core's review fixed)_SRC/importlib.reloadpattern (L3 from Core's review fixed)behave.runner.Contexttyping (L4 from Core's review fixed)Summary
This is a well-designed TDD test suite that faithfully exercises the broken DI path. The response to Core's review was thorough and most prior findings have been addressed. However, the 4 critical items (C1–C4) are blocking:
@wipscenarios will break CI for the entire teamRecommendation: Resolve C1–C4 and H1–H3 before re-requesting review. The H1/H2 issues may be best resolved by splitting TDD test work into a separate issue from the fix itself, with its own branch name and commit message metadata.
agents session listcauses an error.26c1f4dcc2252cf1293cSquashed into single commit
252cf129, rebased ontomaster— all review findings addressed.Disposition of findings
hurui200320's review (#2009)
ISSUES CLOSED: #554→Refs: #554in both PR body and commit footer.@wipCI exclusion is handled by PR #566 (behave.iniaddstags = ~@wip; noxfile adds--exclude wipfor Robot). Both PRs are in the merge queue.Type/Testinglabel kept — accurately describes test-only PR.master— PR is now mergeable.table/colordeferred.# type: ignorefrom benchmark (outside Pyright scope).@unitto all 5 scenarios.setup(), disposed inteardown().Aditya's review (comment #55067)
table/colordeferred to follow-up.@unittag to all scenarios.Additional
_serviceviacontext.sle_original_serviceVerification
nox -s lint— passed (All checks passed!)nox -s typecheck— passed (0 errors, 0 warnings, 0 informations)PM Note (Day 26 — M3 PR Triage):
TDD test suite for session list DI container error. All review findings addressed across ~4 rounds. Latest review stale after squash. Has a merge conflict.
@brent.edwards — Please rebase onto
master. Needs fresh approval after rebase.Note: PM clarification on table/color output format coverage (4/6 formats tested) — this is acceptable for v3.2.0 scope. The remaining formats can be covered in a follow-up if needed.
PR #596 Unified Review —
test(cli): add failing tests for session list DI container error (#554)Reviewer: hamza-2 | Commit:
252cf129| Author: brent.edwardsPrior reviews: CoreRasurae (#1981 REQUEST_CHANGES), hurui200320 (#2009 REQUEST_CHANGES)
P1 — Must Fix
fix(cli): handle missing database in session list commandbut this PR is test-only (TDD). Thefix(cli)type signals a production fix and will generate a misleading CHANGELOG entry — readers will think bug #554 is fixed. The PR title itself correctly saystest(cli). Change commit subject totest(cli): add failing TDD tests for session list DI container error. Reserve the prescribedfix(cli)message for the actual fix PR.23803f14; current master isfde86186. Needs rebase. (hurui200320 H4 re-regression.)session listworks without error afteragents init". The Robot file tests this (line 14), but the Behave feature file — the canonical spec artifact — has no scenario that runsagents initfirst. Add a scenario exercising this path.richformat not explicitly tested. AC says "All output formats (rich, json, yaml, plain) work correctly." The feature file tests json, plain, yaml explicitly and exercises rich only implicitly via "default format". No rich-specific assertion (e.g., table markup) exists. The claim "4/6 formats tested" is actually 3 explicit + 1 implicit. Add an explicit--format richscenario or at minimum assert rich-specific output markers.time_list_after_createaccumulates state across ASV iterations.setup()/teardown()run once per suite, not pertime_*call. Each iteration inserts another row, making later iterations progressively slower.track_list_after_create_countwill also return 2, 3, ... instead of 1. Fix: use per-method setup (setup_time_list_after_create), or setnumber = 1+repeat = 1.P2 — Should Fix
importlib.reload(cleveragents)in benchmark is fragile. (session_list_error_bench.py:23-25). Reloading the top-level package at import time can break module identity for other benchmark suites running in the same process. Thesys.path.insertalready ensures correct resolution. Remove the reload or add a defensive comment explaining why it's necessary.session_list_error_steps.py:68-72)._cleanup_sle()restores_serviceand pops the env var but does not callreset_container(). Ifget_container()was triggered during the scenario, the singleton persists with the test DB URL, polluting subsequent scenarios.Should Not Contain ${list.stderr} AttributeError(line 25), but the Behave feature has zero assertions about stderr or error messages. Add at minimum a@thenstep checking noAttributeErrorin stderr. Ideally, add a characterization test proving the bug exists before the fix.@unittag is misleading. All 5 scenarios are tagged@unitbut use "the real DI path" with actual SQLite databases — these are integration-level tests. Use@integrationor@regressioninstead.benchmarks/session_list_bench.py; actual file isbenchmarks/session_list_error_bench.py. Rename to match.step_pre_populate_session. (session_list_error_steps.py:106-115). Ifsvc.create()throws,engine.dispose()is never called (no try/finally). Wrap in try/finally.P3 — Nit / Improvement
context.sle_list_result = None(line 65) — never read anywhere. Remove.Scenario OutlinewithExamplestable.session-list-errorprefix in step patterns is verbose. Consider tag-based scoping.@wipon all scenarios — acceptable for TDD workflow but needs explicit tracking. Add a subtask on #554: "Remove @wip tags after fix lands."Prior Review Verification
Refs: #554notISSUES CLOSED# type: ignorein benchmark@uniton all 5 scenariosSummary: 5 P1 + 6 P2 + 4 P3 = 15 findings. F1 (commit type) and F2 (stale rebase) are process-blocking. F3-F5 are correctness/coverage gaps that should be fixed before merge.
Unified TDD Test Toggle Convention
After auditing all 7 PRs that add failing BDD tests (#566, #567, #568, #591, #593, #594, #596), we are standardizing on a single
@wiptag mechanism.The standard
.feature).robot)@wiptag above the Scenario[Tags] wipto the test case@wiptagwipfrom[Tags]behave features/foo.feature:42(by line number)robot --include wip robot/foo.robotThe infrastructure (
tags = ~@wipinbehave.ini+--exclude wipinnoxfile.py) is established by PR #566. This PR must be merged after #566.Status of this PR
Behave: Correct. All 5 scenarios are tagged
@wip(the bug is not fixed in this PR).Robot: Correct. Both test cases have
[Tags] ... wip.Dependency: This PR does NOT modify
behave.iniornoxfile.py. It relies entirely on PR #566 being merged first. If #566 is not merged, the@wiptags are inert and all 5 Behave scenarios + 2 Robot tests will fail in CI.Required action
Either:
behave.ini+noxfile.pychanges from #566 to this PR to make it self-contained.Fixes: review findings, CI failures, and master merge
Root cause of CI failures
Both unit test (5 Behave failures) and integration test (2 Robot failures) were caused by the same issue: the branch was missing the
@wipexclusion infrastructure from PR #566.behave.inilackedtags = ~@wip, so all 7@wip-tagged Behave scenarios ran and hit the realcontainer.db()bugnoxfile.pylacked--exclude wipin the pabot args, so both[Tags] wipRobot tests ran and hit the sameAttributeError: 'DynamicContainer' object has no attribute 'db'Fix: Merged latest
master(5348e3dd), which brings in the@wipexclusion from PR #566. The 7 Behave scenarios are now properly skipped (1 feature skipped) and the 2 Robot tests are excluded.Review findings addressed (12 of 15 from #55540)
Commit
9f94a0ea:test(cli):prefix (notfix(cli):)--format richscenario with rich-specific assertiontime_list_after_createandtrack_list_after_create_countnow use ASV per-methodsetup_<name>/teardown_<name>hooks with isolated engines, so each iteration gets a fresh DBimportlib.reload(cleveragents)from benchmarkreset_container()call to_cleanup_sle()so the DI singleton doesn't leak between scenarios"should not contain 'AttributeError'"and"should not contain 'INTERNAL'"assertions to init-then-list and rich format scenarios@unittag to@regressionon all scenariosbenchmarks/session_list_error_bench.py→benchmarks/session_list_bench.py(matches ticket spec)step_pre_populate_sessionengine usage in try/finally soengine.dispose()always runscontext.sle_list_result = NoneassignmentNot addressed (cosmetic/process): F13 (Scenario Outline refactor), F14 (step prefix style), F15 (
@wiptracking subtask on #554).Nox verification (all pass)
unit_tests@wip)typechecklintcoverage_reportsecurity_scanCode Review Report — PR #596 / Commit
9f94a0eaReviewed commit:
9f94a0ea— test(cli): address review findings for session list TDD tests (#554)Branch:
feature/m3-fix-session-list-errorIssue: #554 —
agents session listcauses an errorReviewed against:
docs/specification.md(session list spec, output format spec), issue #554 acceptance criteriaReview scope: Test coverage, test flaws, bug detection, performance, security, spec compliance
Summary
The commit addresses 12 of 15 prior review findings (F1–F12) and demonstrates good engineering discipline:
try/finallyfor resource cleanup, DI container reset in teardown, dead code removal, tag corrections, and benchmark isolation via per-method setup/teardown. However, 4 rounds of iterative analysis uncovered 16 findings across 5 categories, including 1 critical, 2 high, 5 medium, and 8 low-severity items.The most critical issue is that all 7 Behave scenarios carry
@wiptags whilebehave.inihastags = ~@wip— these tests are excluded from CI and provide zero regression coverage. A secondary high-severity finding reveals a production bug in the empty-list code path that bypasses--format, which the current test suite cannot detect because empty-list scenarios only exercise the default format.CRITICAL — P1
F1:
@wiptags exclude all Behave scenarios from CICategory: Test Flaw | Files:
features/session_list_error.feature:11,18,25,32,40,47,54All 7 scenarios are tagged
@wip. The project'sbehave.ini(line 8) containstags = ~@wip, which globally excludes@wipscenarios from every Behave run, including CI. These TDD regression tests will never execute in CI, providing zero automated coverage for bug #554.The feature file header (line 1–2) says "expected to fail until the DI container fix lands. Once the fix is applied, remove the @wip tags". Since this is a TDD branch, the
@wiptags are intentional for the pre-fix phase. However, the PR description and commit message do not address a plan for removing them. If this PR is merged before the fix commit, or if the fix commit forgets to remove@wip, the tests become permanent dead code.Recommendation: Either (a) add a
@wipremoval commit to this branch alongside the fix commit before merge, or (b) document explicitly in the PR that the fix commit (separate PR) MUST remove@wip, and track this with a subtask on issue #554. Consider adding a CI lint rule that warns on@wipscenarios merged tomaster.HIGH — P2
F2: Empty session list bypasses
--formatparameter (untested production bug)Category: Bug Detection / Test Coverage Gap | Files:
src/cleveragents/cli/commands/session.py:181-184When no sessions exist,
list_sessions()unconditionally prints via Rich console:This code path executes regardless of
--format. Runningagents session list --format jsonwith an empty database returns Rich-marked text with ANSI color codes — not valid JSON. Same for--format yamland--format plain.The Behave test suite only tests the empty-list path with the default format (scenarios at lines 12 and 19), so this bug is invisible to the current tests.
Recommendation: Add Behave scenarios for
session list --format json(and yaml/plain) when no sessions exist. These should assert valid JSON/YAML output or at minimum assert the absence of ANSI escape codes. The production fix should route the empty case throughformat_output()just like the non-empty case.F3: No empty-list scenarios for non-default output formats
Category: Test Coverage Gap | Files:
features/session_list_error.featureIssue #554 AC #4 states "All output formats (rich, json, yaml, plain) work correctly." The current scenarios test the empty list only with the default (rich) format. All format-specific scenarios (rich, json, plain, yaml) use the
Given a session-list-error service with a pre-populated sessionstep, meaning they only test the non-empty code path.This gap directly enables F2 above to go undetected.
Recommendation: Add at least one empty-list scenario per structured format (json, yaml) to validate that the empty case produces valid structured output.
MEDIUM — P3
F4:
Namecolumn missing from session list table vs specCategory: Spec Compliance | Files:
src/cleveragents/cli/commands/session.py:194-197,docs/specification.md:1610The specification shows 5 columns in the session list table: ID, Name, Actor, Messages, Updated. The implementation only renders 4 columns: ID, Actor, Messages, Updated — the
Namecolumn is absent. The Behave tests do not assert on table structure, so this discrepancy is untested. While this is a production issue (not introduced by this commit), the TDD tests should be written to catch it.F5: Summary panel incomplete vs spec
Category: Spec Compliance | Files:
src/cleveragents/cli/commands/session.py:210-214,docs/specification.md:1616-1622The specification shows a summary with: Total, Most Recent, Oldest, Total Messages, Storage. The implementation only renders Total Sessions and Total Messages. The
_session_list_dict()function at line 107-110 returns{"sessions": items, "total": N}but the spec's JSON envelope expectsdata.sessionsanddata.summarywith 5 fields. The tests do not validate summary completeness.F6: JSON/YAML assertions are structurally shallow
Category: Test Flaw | Files:
features/steps/session_list_error_steps.py:178-198The JSON step (
step_output_json_key) only asserts that a top-level key"sessions"exists in the parsed output. It does not validate:id,actor,messages,updated)command,status,exit_code,data,timing,messages)data.totalordata.summaryis presentThe YAML step (
step_output_yaml_key) similarly only checks for the key's presence. This means malformed output (e.g.,{"sessions": "not a list"}) would pass both assertions.Recommendation: Add at minimum a structural assertion that
data["sessions"]is a list and that each element contains the expected keys.F7: Missing
tableandcolorformat test coverageCategory: Test Coverage Gap | Files:
features/session_list_error.featureThe specification defines 6 output formats:
rich,color,table,plain,json,yaml. Issue #554 AC lists 4 (rich, json, yaml, plain). The Behave test covers 5 (default/rich, explicit rich, json, plain, yaml). Thetableandcolorformats are untested. Sinceformat_output()handles both (tablemaps to_format_table,colormaps to_format_plain), they should have minimal coverage.F8:
os.environmanipulation is not thread-safeCategory: Security / Reliability | Files:
features/steps/session_list_error_steps.py:61,72os.environ["CLEVERAGENTS_DATABASE_URL"] = db_urlmodifies the process-global environment. If Behave scenarios run with a parallel runner (e.g.,--parallel), concurrent scenarios could overwrite each other's database URLs, causing flaky test failures or data corruption. The cleanup at line 72 (os.environ.pop(...)) is also race-prone.Recommendation: If parallel test execution is ever planned, switch to patching the environment via
unittest.mock.patch.dict(os.environ, ...)scoped to each scenario, or inject the database URL through the DI container override rather than the environment.F9: Robot test asserts on
stderrbut error may appear instdoutCategory: Test Flaw | Files:
robot/session_list_error.robot:25The assertion
Should Not Contain ${list.stderr} AttributeErrorchecks only stderr. However, Typer's exception wrapping (the"Wrapping unexpected exception"message from bug #554) may be written to stdout by the CLI's error handler. The test should also assertShould Not Contain ${list.stdout} AttributeError. Additionally, the Robot test does not verify thatstdoutactually contains "No sessions found" in the success case.LOW — P4
F10: Duplicate scenarios with marginal value difference
Category: Test Flaw | Files:
features/session_list_error.feature:11-16,18-23Scenario "Session list returns empty list when no sessions exist" (line 12) and "Session list after init does not raise DI error" (line 19) share the same background, the same
Whenstep, and 2 of 3Thenassertions. The first addsshould contain "No sessions found"andshould not contain "AttributeError"; the second addsshould not contain "AttributeError"andshould not contain "INTERNAL". These could be merged into a single scenario with all 3 distinct assertions, reducing redundant test execution.F11: Substring assertion
"Sessions"is fragileCategory: Test Flaw | Files:
features/session_list_error.feature:30,37,52The assertion
the session-list-error output should contain "Sessions"is a substring match. It matches the table title "Sessions (1 total)", but could also false-match on "No sessions found" (note: capitalization differs, so this specific case is safe, but the pattern is fragile for future changes). Consider asserting on a more specific string like"Sessions ("or on the actual session ID.F12: Benchmark recreates service on every
time_list_emptyiterationCategory: Performance | Files:
benchmarks/session_list_bench.py:124-127time_list_emptycalls_make_service()every iteration, which constructs newSessionRepositoryandSessionMessageRepositoryobjects. This measures service construction + list cost combined, adding noise to the list-only measurement. Consider creating the service once in asetup_time_list_emptymethod.F13: File-based SQLite in benchmarks adds I/O variance
Category: Performance | Files:
benchmarks/session_list_bench.py:54The shared engine uses
sqlite:///{path}(file-based) rather thansqlite://(in-memory). File I/O introduces disk-dependent variance in benchmark measurements. For a pure service-layer performance baseline, in-memory SQLite would be more stable.F14: YAML/JSON tests don't validate against spec envelope structure
Category: Test Coverage Gap | Files:
features/steps/session_list_error_steps.py:178-198Closely related to F6. The spec (lines 1650-1682) shows JSON output wrapped in an envelope:
{command, status, exit_code, data: {sessions, summary}, timing, messages}. Theformat_output()function does NOT add this envelope — it serializes the raw dict. So the actual JSON output is{"sessions": [...], "total": N}, which matches neither the spec nor what the tests validate. This is a systemic spec compliance gap.F15: Dynamic attribute assignment for ASV
unitCategory: Test Flaw (minor) | Files:
benchmarks/session_list_bench.py:147SessionListDISuite.track_list_after_create_count.unit = "sessions"sets a dynamic attribute on a method object. This works for ASV but causes type-checker warnings. Consider using ASV's class-levelunitattribute or a type-safe annotation pattern.F16: Benchmark explicitly bypasses the DI container path
Category: Test Coverage Gap | Files:
benchmarks/session_list_bench.py:1-8,44-45The benchmark docstring acknowledges it does "not exercise
_get_session_service()or the DI container wiring." This means the benchmark does not measure the actual code path that triggers bug #554. While the Behave and Robot tests cover the DI path, the benchmark provides no performance baseline for the DI resolution itself.Findings Summary
@wiptags exclude all 7 Behave scenarios from CI--format(untested production bug)Namecolumn vs spectableandcolorformat coverageos.environmanipulation is not thread-safeunitannotationVerdict: Requesting changes primarily for F1 (critical — tests don't run in CI) and F2/F3 (high — untested production bug in the empty-list format path). The medium-severity items (F4–F9) should be tracked as follow-ups. Low-severity items (F10–F16) are advisory.
Reviewed by: automated code review agent — 4 iterative analysis rounds against commit
9f94a0ea, issue #554, anddocs/specification.md.Code Review Report v2 — PR #596 / Commit
9f94a0ea(SUPERSEDES previous review)Reviewed commit:
9f94a0ea— test(cli): address review findings for session list TDD tests (#554)Branch:
feature/m3-fix-session-list-error(3 commits:252cf129initial TDD tests,5348e3ddmerge,9f94a0eareview fixes)Issue: #554 —
agents session listcauses an errorReviewed against:
docs/specification.md, issue #554 acceptance criteria, production source codeReview scope: Test coverage, test flaws, bug detection, performance, security, spec compliance
Rounds: 8 iterative analysis passes
Summary
The commit addresses 12 of 15 prior review findings (F1-F12) and demonstrates good engineering discipline:
try/finallyfor resource cleanup, DI container reset in teardown, dead code removal, tag corrections, and benchmark isolation via per-method setup/teardown. However, 8 rounds of iterative analysis uncovered 21 findings across 6 categories, including 1 critical, 2 high, 7 medium, and 11 low-severity items.The most critical issue is that all 7 Behave scenarios AND both Robot test cases carry
@wip/wiptags, excluded from CI by bothbehave.ini(tags = ~@wip) andnoxfile.py(--exclude wip). This means zero tests from this PR run in CI. A secondary high-severity finding reveals an untested production bug in the empty-list code path.CRITICAL — P1
F1:
@wiptags exclude ALL tests from CI (Behave AND Robot)Category: Test Flaw
Files:
features/session_list_error.feature:11,18,25,32,40,47,54|robot/session_list_error.robot:17,32All 7 Behave scenarios are tagged
@wip. Both Robot test cases have[Tags] tdd bug554 wip. The project excludes these from CI at two levels:behave.iniline 8 —tags = ~@wipnoxfile.pyline 577-578 —"--exclude", "wip"This means no test from this entire PR executes in CI, across both Behave and Robot layers. The tests provide zero automated regression coverage for bug #554.
The feature file header says "expected to fail until the DI container fix lands", and the branch contains no production code fix (only TDD tests). If this PR merges before the fix, or the fix commit forgets to remove
@wip, these tests become permanent dead code.Recommendation: (a) Track
@wipremoval as an explicit subtask on issue #554 and reference it in the PR description, or (b) add the production fix to this branch so@wipcan be removed before merge. Consider a CI lint rule that flags@wipscenarios merged tomaster.HIGH — P2
F2: Empty session list bypasses
--formatparameter (untested production bug)Category: Bug Detection / Test Coverage Gap
Files:
src/cleveragents/cli/commands/session.py:181-184When no sessions exist,
list_sessions()unconditionally prints via Rich console:This executes regardless of
--format. Runningagents session list --format jsonwith an empty DB returns Rich-marked text — not valid JSON. Same for--format yamland--format plain.The Behave suite only tests the empty-list path with the default format (scenarios at lines 12 and 19), so this bug is invisible to the current tests.
Recommendation: Add scenarios for empty-list with
--format jsonand--format yaml. The production fix should route the empty case throughformat_output().F3: No empty-list scenarios for non-default output formats
Category: Test Coverage Gap
Files:
features/session_list_error.featureIssue #554 AC #4 states "All output formats (rich, json, yaml, plain) work correctly." All format-specific scenarios use
Given a session-list-error service with a pre-populated session, meaning they only test the non-empty code path. The empty list is tested only with the default format.This gap directly enables F2 to go undetected.
MEDIUM — P3
F4:
Namecolumn missing from session list table vs specCategory: Spec Compliance
Files:
src/cleveragents/cli/commands/session.py:194-197|docs/specification.md:1610The spec shows 5 columns: ID, Name, Actor, Messages, Updated. The implementation renders 4: ID, Actor, Messages, Updated —
Nameis absent. Tests do not assert on table structure.F5: Summary panel incomplete vs spec
Category: Spec Compliance
Files:
src/cleveragents/cli/commands/session.py:210-214|docs/specification.md:1616-1622The spec shows summary with: Total, Most Recent, Oldest, Total Messages, Storage. Implementation only renders Total Sessions and Total Messages. The
_session_list_dict()returns{"sessions": items, "total": N}but the spec expectsdata.summarywith 5 fields.F6: JSON/YAML assertions are structurally shallow
Category: Test Flaw
Files:
features/steps/session_list_error_steps.py:178-198The JSON step only asserts a top-level key
"sessions"exists. It does not validate:id,actor,messages,updated)command,status,data,timing,messages)data["sessions"]is a list (malformed{"sessions": "not a list"}would pass)The YAML step has the same weakness.
Recommendation: Add
assert isinstance(data["sessions"], list)and check at least one element's keys.F7: Missing
tableandcolorformat test coverageCategory: Test Coverage Gap
Files:
features/session_list_error.featureThe spec defines 6 output formats. The Behave tests cover 5 (default/rich, explicit rich, json, plain, yaml).
tableandcolorare untested. Both are handled byformat_output()and should have minimal smoke coverage.F8:
os.environmanipulation is not thread-safeCategory: Security / Reliability
Files:
features/steps/session_list_error_steps.py:61,72os.environ["CLEVERAGENTS_DATABASE_URL"] = db_urlmodifies the process-global environment. If Behave scenarios ever run in parallel, concurrent scenarios could overwrite each other's database URLs. The cleanup pop is also race-prone.F9: Robot test asserts on
stderrbut error may appear instdoutCategory: Test Flaw
Files:
robot/session_list_error.robot:25Should Not Contain ${list.stderr} AttributeErrorchecks only stderr. Typer's exception wrapping may write to stdout. The test should also assert on${list.stdout}. Additionally, the Robot test doesn't verify stdout contains "No sessions found" in the success case.F19: Robot tests don't verify
agents initexit code (NEW)Category: Test Flaw
Files:
robot/session_list_error.robot:19-20,34-35In both test cases,
Run Processforagents initis executed but its return code is never checked. Ifinitfails, the subsequentsession listmay fail for a different reason than the DI bug, producing a misleading test failure.Recommendation: Capture the init result and assert
Should Be Equal As Integers ${init.rc} 0before proceeding.LOW — P4
F10: Duplicate scenarios with marginal value difference
Category: Test Flaw
Files:
features/session_list_error.feature:11-16,18-23"Session list returns empty list when no sessions exist" and "Session list after init does not raise DI error" share the same background and
Whenstep. They differ only in which specific text is checked (No sessions found+AttributeErrorvsAttributeError+INTERNAL). Could be merged into a single scenario.F11: Substring assertion
"Sessions"is fragileCategory: Test Flaw
Files:
features/session_list_error.feature:30,37,52should contain "Sessions"is a substring match that could false-match on future output changes. Consider"Sessions ("or the actual session ID for more precision.F12: Benchmark recreates service on every
time_list_emptyiterationCategory: Performance
Files:
benchmarks/session_list_bench.py:124-127_make_service()on every iteration adds construction overhead to the list-only measurement. Consider creating the service in asetup_time_list_emptymethod.F13: File-based SQLite in benchmarks adds I/O variance
Category: Performance
Files:
benchmarks/session_list_bench.py:54Uses
sqlite:///{path}(file-based) instead ofsqlite://(in-memory). File I/O introduces disk-dependent measurement variance.F14: JSON/YAML output doesn't match spec envelope structure
Category: Test Coverage Gap
Files:
features/steps/session_list_error_steps.py:178-198The spec shows JSON wrapped in
{command, status, exit_code, data: {sessions, summary}, timing, messages}. Theformat_output()function does NOT add this envelope — it serializes the raw dict directly. So actual output is{"sessions": [...], "total": N}, which doesn't match the spec. This is a systemic gap.F15: Dynamic attribute assignment for ASV
unitCategory: Test Flaw (minor)
Files:
benchmarks/session_list_bench.py:147SessionListDISuite.track_list_after_create_count.unit = "sessions"sets a dynamic attribute on a method, causing type-checker warnings.F16: Benchmark explicitly bypasses the DI container path
Category: Test Coverage Gap
Files:
benchmarks/session_list_bench.py:1-8,44-45The benchmark docstring acknowledges it "does not exercise
_get_session_service()or the DI container wiring." No performance baseline exists for the DI path itself.F17: Behave setup does not reset DI container before setting env var (NEW)
Category: Test Flaw
Files:
features/steps/session_list_error_steps.py:40-66_setup_real_di_path()setsCLEVERAGENTS_DATABASE_URLat line 61 but does not callreset_container()first. If a stale container singleton exists from a prior test suite, any singleton providers that cached the database URL would use the old value. Mitigated by the fact thatdatabase_urlis aproviders.Callable(lazy), but singleton dependents could still freeze a stale URL.F18: Benchmark
_dispose_freshlacks defensive error handling (NEW)Category: Test Flaw
Files:
benchmarks/session_list_bench.py:96-98If
_fresh_engine()fails after creatingself._mut_tmpdirbut before assigningself._mut_engine(e.g.,create_engineraises), the ASV teardown calls_dispose_fresh()which raisesAttributeErroronself._mut_engine.dispose(), preventingshutil.rmtreefrom cleaning up the temp directory.F20: Robot JSON format test lacks
AttributeErrorassertion (NEW)Category: Test Flaw
Files:
robot/session_list_error.robot:29-40The second test case ("Session List JSON Format Does Not Error") only asserts
rc == 0but does not checkShould Not Contain ${list.stderr} AttributeErrorlike the first test case does (line 25). Inconsistent error-path coverage.F21: Temp directory leak if
_setup_real_di_path()fails aftermkdtemp(NEW)Category: Test Flaw
Files:
features/steps/session_list_error_steps.py:45,88-90tempfile.mkdtemp()at line 45 creates a temp directory, butcontext.add_cleanup()is registered at line 90 — outside_setup_real_di_path(). If_setup_real_di_pathraises aftermkdtempbut before returning (e.g.,Base.metadata.create_all()fails), the cleanup is never registered and the temp directory leaks.Recommendation: Register cleanup immediately after
mkdtemp, or wrap the setup body in atry/exceptthat cleans up on failure.Findings Summary
@wipexcludes ALL tests (Behave + Robot) from CI--format(untested production bug)Namecolumn vs spectableandcolorformat coverageos.environmanipulation is not thread-safeagents initexit codeunitannotationAttributeErrorassertion_setup_real_di_path()fails after mkdtempTotals: 1 Critical, 2 High, 7 Medium, 11 Low = 21 findings
Verdict: Requesting changes primarily for F1 (critical — no tests run in CI) and F2/F3 (high — untested production bug in empty-list format path). The medium-severity items (F4-F9, F19) should be addressed or tracked. Low-severity items are advisory.
Reviewed by: automated code review agent — 8 iterative analysis rounds against commit
9f94a0ea, branchfeature/m3-fix-session-list-error, issue #554, anddocs/specification.md.9f94a0ea26cd900fddb4Disposition of CoreRasurae review #2038 — commit
cd900fddRebased onto master (
620adfee), single commit, all prior review fixes preserved plus 11 new code fixes.P1 — F1:
@wiptags exclude all tests from CINo code change. The
@wipmechanism is working exactly as designed for a TDD test PR:behave.inihastags = ~@wip(PR #567) andnoxfile.pyhas--exclude wip(PR #568). Both are merged.@wiptags serve as a handoff signal to the fix author: when they branch from this TDD branch to write the production fix, they remove@wipand the tests activate in CI automatically.Tracking: The fix author's PR must remove
@wipas part of closing issue #554. This is implicit in the feature file header comment (lines 1–2) and the PR description.P2 — Fixed
--format@wipBehave scenarios exercising the empty-list code path with--format json,--format yaml, and--format plain. These document the expected behaviour and will catch the format-bypass bug once the fix lands.P3 — Fixed / Acknowledged
Namecolumn missing from table vs specisinstance(data, dict)andisinstance(data[key], list)checks to both JSON and YAML steps. Malformed output (e.g.,{"sessions": "not a list"}) now fails.tableandcolorformat coverageos.environmanipulation not thread-safeShould Not Containon both${list.stderr}and${list.stdout}forAttributeError.agents initexit codeShould Be Equal As Integers ${init.rc} 0.P4 — Fixed / Acknowledged
"Sessions"is fragile"Sessions ("which is more specific and matches the table title pattern"Sessions (N total)".time_list_emptyiterationsetup_time_list_emptymethod that builds the service once, reused across iterations.format_output()doesn't wrap in spec envelope). Out of scope for test PR.unit_setup_real_di_path()now callsreset_container()before setting the env var, ensuring no stale DI singleton carries over._dispose_freshlacks defensive handlinghasattrchecks for_mut_engineand_mut_tmpdirso partial setup failures don't cascade.AttributeErrorassertionShould Not Containfor both stdout and stderr.context.add_cleanup()is now registered immediately aftermkdtemp, before any further setup that could fail.Summary
21 findings: 11 fixed in code, 1 responded (F1 — no change by design), 9 acknowledged/deferred.
Verification
nox -s lintnox -s typechecknox -s unit_tests@wip)cd900fdd— no merge commits620adfeeagents session listDI container missingdbprovider test #630Planning Authority Review — PR #596
Branch Naming Convention:
This PR uses the
feature/prefix (feature/m3-tdd-session-list-test), but per CONTRIBUTING.md, TDD-related issues should use thetdd/prefix (e.g.,tdd/m3-session-list-test). This is flagged for future compliance — not blocking this PR, but please use the correct prefix on subsequent TDD branches.TDD Counterpart Tracking:
TDD counterpart issue #630 has been created for the originating bug #554. This ensures the TDD cycle is properly tracked alongside the fix.
Review Status:
This PR is in advanced review with 14+ comments already addressed. No additional blocking concerns from planning at this time.
PM Status Check — Day 29
Author: @brent.edwards | Milestone: v3.2.0 (M3) | Mergeable: Yes | Assigned Reviewer: @CoreRasurae
Current State
TDD test suite for
agents session listDI container error (#554). All review findings addressed across rounds from hurui200320 (#2009), Aditya (#55067), and CoreRasurae (#1981). Rebased onto master, single squashed commit.Status: READY FOR MERGE
Same pattern as PR #595 — all findings resolved, previous approvals stale after rebase.
Merge Order
This should merge after PR #595 (session create TDD) to avoid conflicts in shared test infrastructure.
Action Required
This is a TDD test PR for bug #554 (session list DI error, M3). Bug fixes are top priority.
Code Review Report — PR #596 / Issue #554
Reviewer: Automated Deep Review (test coverage, test flaws, performance, bug detection, security, spec compliance)
Commit:
cd900fdd—test(cli): add failing TDD tests for session list DI container errorBranch:
feature/m3-fix-session-list-errorFiles reviewed:
features/session_list_error.feature,features/steps/session_list_error_steps.py,benchmarks/session_list_bench.py,robot/session_list_error.robot,CHANGELOG.mdCross-referenced against: Issue #554 acceptance criteria,
docs/specification.md(session list spec §1591-1714),src/cleveragents/cli/commands/session.py,src/cleveragents/application/container.py,behave.ini,noxfile.py, CI workflowsSummary
The PR delivers well-structured TDD tests with solid cleanup patterns and thorough documentation. However, there are 2 critical issues that must be resolved before merge — one will break CI, and the other represents a significant discrepancy between the PR description and the actual change scope.
Critical
C1:
@wipTag Not Excluded from Behave CI — Will Breakunit_testsCategory: CI / Test Infrastructure
Files:
behave.ini,noxfile.py(line 476-483),.forgejo/workflows/ci.ymlThe PR body claims "C4:
@wipCI exclusion is handled by PR #566 (addstags = ~@wiptobehave.ini)". However,behave.inicontains no tags configuration:The
noxfile.pyunit_testssession runs behave with no--tagsargument. Thecoverage_reportsession only excludes@discovery. The CI workflow delegates directly to these nox sessions. This means the 10@wip-tagged scenarios will execute and fail, breaking theunit_testspipeline.Fix: Add
tags = ~@wiptobehave.ini, or pass--tags=~@wipin the nox behave invocations.C2: PR Body Claims "5 Behave BDD scenarios" — Actual Count Is 10
Category: Process / Documentation
The PR body "Changes" section states:
The feature file contains 10 scenarios (the commit message correctly says 10). Additionally, the PR body references
@unittags but the actual file uses@regression— these claims appear stale from before the F2/F3 review feedback added 5 empty-list format scenarios. Reviewers relying on the PR body will have an incorrect understanding of the change scope.Fix: Update the PR body to reflect 10 scenarios with
@regression @tdd @bug554 @wiptags.High
H1: Scenarios 8-10 (Empty-List Format) Will Fail Even After the DI Fix
Category: Test Correctness
Files:
features/session_list_error.feature:65-84,src/cleveragents/cli/commands/session.py:181-184Scenarios 8-10 assert that the empty-list code path respects
--format(e.g., valid JSON for--format json). However,session.py:181-184always prints viaconsole.print("[yellow]No sessions found.[/yellow]")regardless of--format, then returns without callingformat_output. This means:@wiptag, making it impossible to distinguish which fix unblocks which tests.The feature file acknowledges this (lines 61-63), but conflating two distinct bugs under one
@wipgate is a design concern.Recommendation: Consider separate tags (e.g.,
@wip_divs@wip_format) or split into two feature files so each bug's fix can be validated independently.H2: Robot Tests May Hang Without
--yesFlag onagents initCategory: Test Flaws / Robustness
File:
robot/session_list_error.robot:19, 38Both Robot test cases run
agents init sle-test(andagents init sle-json) without the--yesflag. Whileinitlikely won't prompt in a fresh directory, the absence of--yesin a headless CI/subprocess environment (stdin is/dev/null) is fragile. Ifinitever adds confirmation prompts for new scenarios, these tests will hang until the 60s timeout.Fix: Change to
agents init --yes sle-testin both test cases.H3: Branch Name Deviates from Issue #554 Metadata
Category: Process
Files: Issue #554 metadata, branch name
Issue #554 metadata prescribes branch
fix/m3-session-list-error, but the actual branch isfeature/m3-fix-session-list-error. While usingfeature/for a TDD test PR is reasonable, the deviation from issue metadata may cause tracking confusion.H4: Global
os.environMutation — Parallel Test Interference RiskCategory: Test Flaws / Test Isolation
File:
features/steps/session_list_error_steps.py:72_setup_real_di_pathsetsos.environ["CLEVERAGENTS_DATABASE_URL"]globally. Thenoxfile.pyunit_testssession usesbehave-parallel, which runs scenarios in parallel subprocesses. While subprocesses get a copy of the parent environment at fork time, if any other scenario in the same subprocess readsCLEVERAGENTS_DATABASE_URLduring the window where this test has overridden it, cross-scenario contamination is possible.Recommendation: Consider using
runner.invoke(session_app, ["list"], env={"CLEVERAGENTS_DATABASE_URL": db_url})ormonkeypatch-style isolation if supported.Medium
M1: Scenarios 1 and 2 Are Nearly Duplicated
Category: Test Design
File:
features/session_list_error.feature:11-23Both scenarios invoke
list with default format, assertexit code 0, and assert no"AttributeError". Scenario 1 additionally checks"No sessions found", Scenario 2 additionally checks no"INTERNAL". These could be consolidated into one scenario with both positive and negative assertions, reducing test execution time without losing coverage.M2: Robot Tests Don't Verify Positive Output ("No sessions found")
Category: Test Coverage Gap
File:
robot/session_list_error.robot:14-31The Behave tests verify that empty-list output contains
"No sessions found", but the Robot testSession List After Init Should Not Erroronly verifies the absence of"AttributeError"— it doesn't check for the expected output. A positive assertion (Should Contain ${list.stdout} No sessions found) would strengthen the Robot coverage.M3: Benchmark
time_list_after_createIncludes Service Construction in Timed RegionCategory: Performance / Benchmark Accuracy
File:
benchmarks/session_list_bench.py:133-137time_list_after_createcallsself._make_mut_service()inside the timed method, meaning the benchmark measures service object construction +create()+list(), not just the session operations. For an accurate performance baseline, service construction should be moved tosetup_time_list_after_create.M4: Missing
teardown_time_list_emptyBenchmark HookCategory: Benchmark Design
File:
benchmarks/session_list_bench.py:124-125setup_time_list_emptycreatesself._empty_svcbut there is no correspondingteardown_time_list_empty. While the shared engine is disposed in the suite-levelteardown(), this is asymmetric withtime_list_after_createandtrack_list_after_create_count, which both have explicit teardown hooks.M5:
@regressionTag Is Premature on TDD TestsCategory: Test Design / Tagging
File:
features/session_list_error.featureAll 10 scenarios are tagged
@regression @tdd. The@regressiontag conventionally denotes tests for already-fixed bugs. Since these are TDD tests written before the fix,@regressionis premature and could confuse tag-based test selection. Consider using only@tdd @bug554 @wipnow, then replacing@tdd @wipwith@regressionafter the fix lands.M6:
tableandcolorOutput Formats Not TestedCategory: Spec Compliance / Test Coverage Gap
File:
features/session_list_error.featureThe specification (
docs/specification.md) defines 6 output formats:rich,color,table,plain,json,yaml. The PR tests 4 of these. While the issue AC only lists 4, the spec gap should be tracked as a follow-up item.Low
L1: PR Body Claims
@unitTag — Actual File Uses@regressionCategory: Documentation
File: PR #596 body vs
features/session_list_error.featureThe PR body states "M4: Added
@unittag to all 5 scenarios" and "L1: Added@unittag to all scenarios". The actual file uses@regression, not@unit.L2: Inconsistent Test Actor Names Across Test Suites
Category: Test Consistency
Files:
features/steps/session_list_error_steps.py:131,benchmarks/session_list_bench.py:146Behave pre-populated sessions use
actor_name="openai/gpt-4", benchmarks useactor_name="bench/test". Consistent test data aids debugging.L3: Robot
Cleanup Test EnvironmentIs a No-OpCategory: Test Design (pre-existing)
File:
robot/common.resource:30-32The suite-level
Cleanup Test Environmentkeyword only logs a message. While each test case has its own[Teardown], the suite teardown could unsetCLEVERAGENTS_HOME. This is a pre-existing pattern, not introduced by this PR.L4: Benchmark
sys.pathManipulationCategory: Code Quality
File:
benchmarks/session_list_bench.py:18-20Direct
sys.path.insert(0, _SRC)is standard for ASV benchmarks but bypasses normal package installation. Noted for awareness — no action needed.L5: PR Body Claims "No Merge Commits" — Branch Contains One
Category: Documentation
File: PR body vs git log
The PR process section states "Single squashed commit, rebased onto master (no merge commits)". The branch contains commit
c1e77a30 Merge branch 'master' into feature/m3-fix-session-list-error.Recommendations
@wipexclusion) — this is a CI-breaking issue.@wiptags for distinct bugs) and H2 (add--yesto Robot init).@ -0,0 +130,4 @@"""List sessions when DB is empty."""self._empty_svc.list()def time_list_after_create(self) -> None:[M3]
_make_mut_service()is called inside the timed method, so the benchmark measures service construction + create + list, not just the session operations. Consider moving service construction tosetup_time_list_after_createfor a more accurate performance baseline.[M4] There is no
teardown_time_list_emptycorresponding tosetup_time_list_empty(line 124). While the shared engine is disposed in suite-levelteardown(), this is asymmetric with the other benchmarks that have explicit teardown hooks.@ -0,0 +8,4 @@Background:Given a session-list-error CLI runner using the real DI path@regression @tdd @bug554 @wip[C1] All 10 scenarios use
@wipbutbehave.inihas notags = ~@wipconfiguration andnoxfile.pyunit_testssession passes no--tagsargument to behave. These scenarios will execute and fail in CI, breaking theunit_testspipeline. The PR body claims PR #566 handles this, but the currentbehave.inidoes not contain the exclusion.[M5] The
@regressiontag is conventionally for already-fixed bugs. Since these are TDD tests written before the fix, consider using only@tdd @bug554 @wipnow and adding@regressionafter the fix lands.@ -0,0 +16,4 @@And the session-list-error output should not contain "AttributeError"@regression @tdd @bug554 @wipScenario: Session list after init does not raise DI error[M1] Scenarios 1 and 2 are nearly duplicated. Both invoke list with default format, assert exit 0, and assert no "AttributeError". The only differences are "No sessions found" (S1) and no "INTERNAL" (S2). Consider consolidating into a single scenario to reduce redundancy.
@ -0,0 +63,4 @@# --format for empty lists, so these document the expected behaviour.@regression @tdd @bug554 @wipScenario: Empty session list with JSON format produces valid JSON[H1] Scenarios 8-10 assert that the empty-list code path respects
--format. However,session.py:181-184always prints viaconsole.print()regardless of--formatand returns before callingformat_output. After the DI fix, these 3 scenarios will still fail because they test a separate bug (format bypass). Consider using distinct@wipsub-tags (e.g.,@wip_divs@wip_format) so each fix can be validated independently.@ -0,0 +69,4 @@engine.dispose()# Override the container's database_url so real DI can find the DB.os.environ["CLEVERAGENTS_DATABASE_URL"] = db_url[H4] Setting
os.environ["CLEVERAGENTS_DATABASE_URL"]globally is risky whenbehave-parallelruns scenarios in parallel subprocesses. If another scenario in the same subprocess reads this env var during the test window, cross-contamination is possible. Consider isolating the env var more tightly (e.g., passing it through CliRunner'senvparameter if supported, or using a process-local override mechanism).@ -0,0 +16,4 @@... "No sessions found" rather than a DI AttributeError.[Tags] tdd bug554 wip${tmpdir}= Evaluate __import__('tempfile').mkdtemp(prefix='sle_554_')${init}= Run Process ${PYTHON} -m cleveragents init sle-test[H2]
agents init sle-testis missing the--yesflag. Whileinitlikely won't prompt in a fresh directory, running without--yesin a headless subprocess environment is fragile. Add--yesfor robustness.[M2] This test only verifies the absence of
AttributeErrorbut does not verify the expected output (No sessions found). AddingShould Contain ${list.stdout} No sessions foundwould strengthen coverage.PM Status Check — Day 29 (Update)
Author: @brent.edwards | Milestone: v3.2.0 (M3) | Reviews: Stale (needs re-approval after rebase)
Current State
All review findings addressed across multiple rounds. Jeff marked this READY FOR MERGE on Day 26 pending re-approval after rebase. PR has a merge conflict — needs rebase onto current master.
Action Required
This TDD PR blocks the bug fix for #554 (session list DI error). High priority — same category as PR #595.
c1e77a306241438eb195Rebase onto master —
41438eb1Rebased onto current master (
a808c395) per PM request (#56798). Single commit, no merge commits. CHANGELOG conflict resolved (PR entry first, then master's #590 and #589 entries).No code changes — only the CHANGELOG conflict resolution.
Verification
nox -s lintnox -s typecheckReady for re-approval from @CoreRasurae.
Code Review — PR #596
feature/m3-fix-session-list-errorReviewed commit:
41438eb1on basea808c395Review type: Test review + Architecture review
Verdict: REQUEST CHANGES — 3× P1, 3× P2, 2× P3
P1 — Must fix before merge
F1 ·
P1:must-fix— TDD bug tags use wrong convention (feature file)features/session_list_error.featureTags
@regression @tdd @bug554 @wipmust use@tdd_bug @tdd_bug_554 @tdd_expected_failper CONTRIBUTING.md §TDD Bug Test Tags (lines 1176-1221). The@wiptag causesbehave.ini'stags = ~@wipfilter to silently exclude all 10 scenarios from CI, meaning these tests are never run in the pipeline.F2 ·
P1:must-fix— TDD bug tags use wrong convention (Robot file)robot/session_list_error.robotSame issue in the Robot test metadata/tags. Should use the
@tdd_bug/@tdd_bug_554convention for consistency with CONTRIBUTING.md.F3 ·
P1:must-fix— Six import blocks buried inside functionsfeatures/steps/session_list_error_steps.pyThere are six separate import blocks inside function bodies throughout the step definitions file. CONTRIBUTING.md §Import Guidelines (lines 1287-1296) requires all imports at the top of the file, with the sole exception of
if TYPE_CHECKING:blocks. Each of these must be moved to module level.P2 — Should fix (follow-up within 3 days)
F4 ·
P2:should-fix— CHANGELOG references wrong tag namesThe CHANGELOG entry mentions
@tdd @bug554 @wiptags. Once F1 is fixed, the CHANGELOG should reference the corrected tag names (@tdd_bug,@tdd_bug_554,@tdd_expected_fail).F5 ·
P2:should-fix— Feature file header comment references@wipfeatures/session_list_error.featureThe feature file has a header comment explaining the TDD workflow that references
@wipas the tag to use for expected-fail scenarios. This should reference@tdd_expected_failinstead, to match the project convention documented in CONTRIBUTING.md.F6 ·
P2:should-fix— Commit body references wrong tagsThe commit message body mentions the old-style tags (
@tdd,@bug554,@wip). Should be updated to match the corrected convention if the commit is rewritten as part of fixing F1-F3.P3 — Nits (author discretion)
F7 ·
P3:nit—@regressiontag arguably redundant with@tdd_bugThe
@regressiontag is applied alongside the TDD bug tags. If@tdd_bugalready implies this is a regression test for a known bug, the@regressiontag may be redundant. However, keeping it may be useful for filtering regression tests specifically. Author's call.F8 ·
P3:nit— Benchmark sys.path style differs from sibling PRsbenchmarks/session_list_bench.pyThe sys.path manipulation pattern differs slightly from the pattern used in PR #594's benchmark. Consider aligning with the team's emerging convention for benchmark imports.
Summary
This PR adds comprehensive TDD test coverage for the session-list error path with 10 BDD scenarios, Robot smoke tests, and ASV benchmarks. The test design is thorough. However, the three P1 findings are critical: the wrong TDD tags mean all 10 BDD scenarios are silently excluded from CI, and the six buried import blocks are a clear CONTRIBUTING.md violation. Both issues must be resolved before merge.
PM Note (Day 29) — Reassignment
Changes:
Rationale: Brent has the active branch
feature/m3-fix-session-list-errorand the open PR for this work. The issue assignee should match the developer who has the active branch and PR. Luis was nominally assigned but Brent is doing the actual implementation.agents session listDI container missingdbprovider test #630PM Compliance Audit — CONTRIBUTING.md Checklist
Closes #554Merge blockers: 5 stale REQUEST_CHANGES reviews. Brent addressed all findings (commit
41438eb1). Reviewers need to re-review or withdraw.41438eb195051e96d0c1Disposition of all review findings — commit
051e96d0Rebased onto master (
83f2f3a0), single squashed commit, all review findings addressed. This commit implements the@tdd_expected_failinfrastructure (duplicated from PR #595 for independent buildability), migrates all 18 existing TDD scenarios to the new tag convention, and fixes all self-review findings.Self-review (#56917) — 8 findings, all fixed
@regression @tdd @bug554 @wipto@tdd_bug @tdd_bug_554 @tdd_expected_fail. The@tdd_expected_failtag replaces@wipwith proper result inversion (FAIL→PASS when bug exists, PASS→FAIL when bug is fixed but tag not removed).[Tags] tdd bug554 wipto[Tags] tdd_bug tdd_bug_554 tdd_expected_fail.reset_container,create_engine,Base,scoped_session/sessionmaker,PersistentSessionService,SessionRepository/SessionMessageRepository) to top-level imports. Added private API access documentation to module docstring explaining whysession_mod._serviceaccess is intentional.@tdd_bug @tdd_bug_554 @tdd_expected_failand added description of infrastructure + tag migration.@wip@tdd_expected_fail.@tdd_bug @tdd_bug_554 @tdd_expected_failand describes infrastructure + migration.@regressiontag arguably redundant@regressionsince@tdd_bugalready conveys the regression-test intent._SRC/sys.path.insertpattern is the standard ASV convention used across all benchmark files in this project. No change needed.CoreRasurae review #2038 — 21 findings
@wipexcludes all tests from CI@wipwith@tdd_expected_fail. Implemented the@tdd_expected_failinfrastructure: Behaveafter_scenariohook inenvironment.pyinverts test results, Robot listener intdd_expected_fail_listener.pydoes the same, registered via--listenerinnoxfile.py. Tests now run in CI — failures are inverted to passes (expected since bug #554 is not fixed).--formatNamecolumn vs specisinstancechecks added in earlier round.table/colorformat coverageos.environnot thread-safeShould Be Equal As Integers ${init.rc} 0added."Sessions (".setup_time_list_emptyadded.reset_container()added at top of_setup_real_di_path()._dispose_freshlacks defensive handlinghasattrguards added.AttributeErrorassertioncontext.add_cleanup()registered immediately aftermkdtemp.CoreRasurae review #2057 — 17 findings
@wipnot excluded from Behave CI@wipwith@tdd_expected_failinfrastructure. No longer depends onbehave.iniexclusion — tests run and results are inverted.@tdd_expected_failtag handles the inversion cleanly. When the DI fix and format fix land, both sets of tests will start passing and the tag must be removed.--yesagents initdoes not prompt when run with a project name argument. The--yesflag is itself the subject of bug #522 and does not exist yet.os.environparallel risk@regression, format gap)@regressiontag removed in this commit.hurui200320 review #2009 — all findings previously addressed
All C1-C4, H1-H4, M1-M4, L1-L2 findings were addressed in earlier rounds and remain addressed.
Aditya review #55067 — all findings previously addressed
M1-M2 and L1-L2 were addressed in earlier rounds.
hamza.khyari review #55540 — 15 findings previously addressed
F1-F12 fixed in code. F13-F15 acknowledged as cosmetic.
Additional changes in this commit
@tdd_expected_failinfrastructure — Behaveafter_scenariohook + Robot listener + noxfile registration. Replaces@wipwith proper result inversion.@tdd @bugNNNto@tdd_bug @tdd_bug_NNNper CONTRIBUTING.md § TDD Bug Test Tags.Verification
nox -s lintnox -s typecheck051e96d0— no merge commits83f2f3a0051e96d0c16221daf225Self-Review — PR #596
test(cli): add failing TDD tests for session list DI container errorReviewed commit:
6221daf2on base83f2f3a0Review type: Self-review (test + architecture + infrastructure)
Verdict: APPROVE with comments — 0 P0, 0 P1 (fixed), 3 P2, 2 P3
P1 findings — Fixed in this push
Two P1 issues identified during self-review (shared with PR #595 infrastructure) have been fixed in commit
6221daf2:F1 (was P1, now fixed) —
slow_integration_testsmissing--listenernoxfile.py— Theslow_integration_testsnox session ranrobotoverrobot/without--listener robot/tdd_expected_fail_listener.py. Theintegration_testssession had it, butslow_integration_testsdid not.@tdd_expected_failtests would fail unexpectedly in that session.Resolution: Added
--listener robot/tdd_expected_fail_listener.pyto theslow_integration_testssession.F2 (was P1, now fixed) — "Unexpected pass" branch sets no error message
features/environment.py— When a@tdd_expected_failscenario unexpectedly passes, the Behave hook calledscenario.set_status(Status.failed)with no diagnostic text. Developer would seeFAILEDwith all stepsPASSEDand zero explanation. The Robot listener correctly setsresult.messagebut the Behave hook didn't.Resolution: Added
scenario.error_messagewith the same diagnostic message pattern used by the Robot listener.P2 — Should fix (follow-up within 3 days)
F3 ·
P2:should-fix— Benchmark bypasses DI container entirelybenchmarks/session_list_bench.pyThe benchmark constructs
PersistentSessionServicedirectly using a hand-builtsessionmaker, bypassing_get_session_service()and the DI container. The class-level docstring correctly notes this measures service-layer performance, but the method-level docstrings could be clearer about not exercising the DI path.F4 ·
P2:should-fix—importlib.reload()in step setup is fragilefeatures/steps/session_list_error_steps.pyUsing
importlib.reload()to reset module state between tests is fragile. Works for forcing_service = Noneto exercise the real DI path, but if the module grows side effects on import, it may not fully reset state. A brief comment explaining the rationale would help.F5 ·
P2:should-fix— RobotShould Containcase sensitivityrobot/session_list_error.robotRobot Framework's
Should Containis case-sensitive by default. If the production code changes output casing, tests will false-fail. Consider usingShould Containwith explicit case-aware matching or documenting the expected output format.P3 — Nits
F6 ·
P3:nit— Step definitions access private_serviceattributefeatures/steps/session_list_error_steps.pySteps access
session_mod._serviceand callsession_mod._reset_session_service(). Defensible for DI integration testing, but a brief comment would improve maintainability.F7 ·
P3:nit— 10 scenarios is thorough but some overlapScenarios 3-6 (format validation for JSON/YAML/plain/rich) all follow the same pattern — invoke with
--format X, check exit code, check output keyword. Consider a Scenario Outline to reduce duplication. Author's discretion since the current form is explicit and readable.Quality Gates
nox -s lintnox -s typecheck83f2f3a0)Refs: #554(not ISSUES CLOSED)Summary
This TDD test PR provides comprehensive coverage of the session list DI wiring bug (#554) with 10 BDD scenarios, Robot smoke tests, and ASV benchmarks. The latest push fixes both P1 infrastructure gaps shared with PR #595 (missing
--listenerinslow_integration_testsand missing diagnostic message in the Behave unexpected-pass branch). The remaining P2/P3 findings are minor and can be addressed in follow-up. Ready for merge (after PR #595 merges first per merge order).6221daf2258301a21888Disposition — Self-Review #2073 Findings
Addressed all P2/P3 findings from self-review in commit
8301a218. Rebased onto latestmaster(fcdf80f3).F1 (was P1, fixed previously) —
slow_integration_testsmissing--listenerAlready fixed in prior push. Confirmed still present in
8301a218.F2 (was P1, fixed previously) — "Unexpected pass" branch sets no error message
Already fixed in prior push. Confirmed still present in
8301a218.F3 (P2) — Benchmark bypasses DI container entirely
Fixed. Updated method-level docstrings for
time_list_after_createandtrack_list_after_create_countto explicitly state they bypass_get_session_service()/ DI container and measure only the service-layer round-trip cost, not the DI wiring path.track_list_after_create_countnow notes it "does not reproduce bug #554."F4 (P2) —
importlib.reload()/ module reset is fragileFixed. Added inline comment at the
session_mod._service = Noneline in_setup_real_di_path()explaining the fragility and referencing the module docstring for full rationale.F5 (P2) — Robot
Should Containcase sensitivityFixed. Added a
*** Comments ***section to the Robot file documenting thatShould Contain/Should Not Containare case-sensitive by default and that assertions rely on exact output format (e.g."AttributeError"is Python exception class name, title-case).F6 (P3) — Private
_serviceaccessAcknowledged. Already documented in module docstring (lines 16-22) with a dedicated "Private API access" section. No further action needed.
F7 (P3) — 10 scenarios overlap
Acknowledged. Author's discretion — the current explicit form is readable and each scenario tests a distinct format/lifecycle combination. A Scenario Outline could reduce duplication but would obscure individual test intent.
Quality gates
nox -s typecheck— 0 errors ✓nox -s lint— 0 errors ✓fcdf80f3) ✓Review — PR #596 (ticket #554)
Scope: TDD test-only PR — 10 Behave scenarios, Robot smoke tests, ASV benchmarks,
@tdd_expected_failinfrastructure, 18-scenario tag migration.Prior review concerns
This PR has been through 6+ review rounds with 80+ findings from multiple reviewers. I have verified the latest commit (
8301a218) against all previously raised concerns. The vast majority have been addressed in code. Remaining deferred items (branch naming, 4/6 format coverage) were PM-approved.Findings
F1 (BLOCKING): Rebase required
The PR's merge base is
fcdf80f3but master has advanced to1b4e0216. Forgejo marks the PR as not mergeable. Please rebase onto current master and force-push.F2 (LOW): Conditional import in
features/environment.py:309from behave.model import Statusis imported inside theif "tdd_expected_fail" in scenario.tags:block rather than at module level. The rest of the file uses module-level imports exclusively, and the round-7 response stated imports were moved to module level — but this one was not. Minor style inconsistency; no functional impact.Recommendation: Move
from behave.model import Statusto the top-level imports.Items verified as correct
@tdd_expected_failBehave hook (environment.py:298–327): Correctly inverts FAIL→PASS and PASS→FAIL with proper step-status reset.@tdd_expected_failRobot listener (tdd_expected_fail_listener.py): Clean Listener API v3 implementation with correct status-string comparisons.noxfile.py:579–580, 603–604): Listener added to bothintegration_testsandslow_integration_tests.session_list_error_steps.py):scoped_sessionfor pre-populated data,context.add_cleanupregistered immediately aftermkdtemp, engine disposed infinallyblocks, container reset in both setup and cleanup.@tdd @bugNNNto@tdd_bug @tdd_bug_NNN. None incorrectly carry@tdd_expected_fail.session_list_bench.py): Per-method setup/teardown prevents row accumulation. Documented DI bypass (by design).test(cli):),Refs: #554(correct — TDD PR does not close the bug).Verdict
APPROVED — conditional on F1 (rebase onto current master). F2 is a minor nit at author's discretion.
8301a21888283054d122New commits pushed, approval review dismissed automatically according to repository settings
283054d1224c5589da4eagents session listcauses an error. #554agents session listcauses an error. #554