test(cli): add failing tests for session create DI container error (#570) #595
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
6 participants
Notifications
Due date
No due date set.
Blocks
#570
agents session create causes an error.
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core!595
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feature/m3-fix-session-create-error"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Refs #570
Add TDD regression tests for
agents session createDI container wiring error._get_session_service()callscontainer.db()but the DIContainerclass has nodbprovider, raisingAttributeError. Same root cause as #554.This is a test-only PR — the bug fix will be implemented separately by the fix assignee, who will remove the
@tdd_expected_failtags.Changes
TDD regression tests:
features/session_create_error.feature) tagged@tdd_bug @tdd_bug_570 @tdd_expected_failrobot/session_create_error.robot) with--format plainandtdd_expected_failtagsbenchmarks/session_create_error_bench.py)--format plainfor reliableCliRunnercapture (addresses F1)@tdd_expected_failinfrastructure (implements CONTRIBUTING.md § TDD Bug Test Tags):after_scenariohook infeatures/environment.pyinverts pass/fail for@tdd_expected_failscenariosrobot/tdd_expected_fail_listener.py(Listener API v3) performs the same inversionnoxfile.py: registers listener via--listenerinintegration_testssessionTag migration (18 existing scenarios across 5 feature files):
features/cli_init_yes_flag.feature— 5 scenarios:@tdd @bug522→@tdd_bug @tdd_bug_522features/resource_type_bootstrap_fs.feature— 3 scenarios:@tdd @bug523→@tdd_bug @tdd_bug_523features/resource_type_bootstrap_git.feature— 3 scenarios:@tdd @bug524→@tdd_bug @tdd_bug_524features/project_create_persist.feature— 4 scenarios:@tdd @bug589→@tdd_bug @tdd_bug_589features/project_show_after_create.feature— 3 scenarios:@tdd @bug590→@tdd_bug @tdd_bug_590Review feedback addressed
From CoreRasurae (Review #1980)
--format plain--format plainavoids Rich output path_counterremovedsetup()tdd_bug tdd_bug_570 tdd_expected_failFrom hurui200320 (Review #2008)
Refs: #570(notISSUES CLOSED)Type/Testingis appropriate for test-only PR--format plaintrack_create_persistsdocstring updatedcontext.sce_original_service# type: ignorein benchmarkFrom hamza.khyari (Review #2034)
@tdd_expected_failinversion infrastructure replaces@wipexclusion need; scenarios pass CI via inversiontdd_expected_faillistener handles inversion;--exclude wipalready in noxfiletest(cli):prefixtotal:_cleanup_scecalls_reset_session_service()thenreset_container()Type/Testinglabel is appropriate for test-only PR_setup_real_di_pathFrom freemo (Review #2060)
@tdd_expected_failtag added with full inversion infrastructure_serviceaccess rationale# type: ignorein benchmarkRefs: #570Process
master(83f2f3a0)Refs: #570footer (test-only, does not close issue)nox -s lint✓ |nox -s typecheck0 errors ✓Refs: #570
agents session createcauses an error. #570Self-review of TDD failing-test PR for bug #570.
Test design: Tests exercise the real
_get_session_service()DI path by resetting_service = None. A file-based SQLite database is prepared with all tables, andCLEVERAGENTS_DATABASE_URLis overridden so the commands can reach the database once the fix is applied.Step pattern prefixes: All step patterns use the
session-create-errorprefix to avoid AmbiguousStep collisions with existingsession_cli_steps.py.Expected behavior: All 3 scenarios are expected to FAIL because
container.db()raisesAttributeError. Same root cause as #554 - the fix for both issues is the same code change. The fix author should branch from this TDD branch so the tests are inherited.Quality gates:
nox -s lintPASS (0 errors),nox -s typecheckPASS (0 errors).agents session createcauses an error. #570PM Review — Day 25
Status
Context
TDD test PR for #570 (session create DI container error). Same root cause as #554 — missing
dbprovider in the DI container. Tests exercise the real_get_session_service()path.Action Item
@CoreRasurae: Please review this PR. The fix will be implemented by @freemo on the container side.
Priority
Medium — blocks M3 acceptance gate. Linked to #554/#596.
Code Review --
test(cli): add failing tests for session create DI container errorCommit:
9161f6c7Issue: #570
Spec reference:
docs/specification.md--agents session create(line 1457+)Summary
The TDD approach is sound -- writing failing tests first before the fix is the right methodology. The Behave, Robot, and ASV test scaffolding covers the core happy-path scenarios. However, there are two high-severity test design flaws that will cause the tests to produce unreliable signals once the fix is applied, and several medium/low findings around spec alignment, coverage gaps, and process.
Findings
F1 -- Rich Console Output Will Not Be Captured by CliRunner [HIGH -- Test Flaw]
Files:
features/steps/session_create_error_steps.py:92,features/session_create_error.feature:14The Behave steps invoke the session CLI without a
--formatflag:This means the
createcommand takes the defaultfmt="rich"path (session.py:144), which uses the module-levelconsole = Console()(session.py:35) to print output viaconsole.print(Panel(...)).The problem:
Console()is constructed at module import time, capturing a reference tosys.stdoutat that point. WhenCliRunnerlater replacessys.stdoutwith a capture buffer, theConsoleinstance still writes to the original stdout. This meansresult.outputwill not contain the Rich Panel output ("Session Created"), and the assertion on feature line 14 will fail even after the bug is fixed.Recommendation: Invoke with
--format plainor--format jsonso that the output goes throughtyper.echo()(which CliRunner captures reliably), and adjust assertions to match the plain/json output format. This affects all three Behave scenarios.F2 -- Benchmarks Bypass the Actual Bug Path [HIGH -- Test Flaw]
File:
benchmarks/session_create_error_bench.py:44-52The docstring states these benchmarks are "expected to error or produce anomalous timings until the fix is applied", but
_make_service()constructsPersistentSessionServicedirectly using a hand-builtsessionmaker. This completely bypasses_get_session_service()and the DI container -- the exact code path where the bug lives. All three benchmark methods will pass successfully regardless of whether bug #570 is present, directly contradicting the docstring.Recommendation: Either (a) restructure the benchmark to exercise the real
_get_session_service()path (resetting_service = Nonelike the Behave tests do), or (b) update the docstring to accurately state that these benchmarks measure the service layer performance baseline and are not gated on the bug fix.F3 -- Missing Error Handling Scenario [MEDIUM -- Test Coverage]
File:
features/session_create_error.featureIssue #570 acceptance criteria explicitly require:
The feature file has 3 scenarios (create success, persistence, create-with-actor), but none covers error handling. Missing scenarios include:
F4 -- Branch Name Mismatch With Issue Metadata [MEDIUM -- Process]
Issue #570 metadata specifies branch
fix/m3-session-create-error. The actual branch isfeature/m3-fix-session-create-error. The issue's Definition of Done states the commit must be pushed to the branch matching the Metadata exactly.Recommendation: Either rename the branch or update the issue metadata.
F5 -- Specification Alignment: Panel Title and Output String [MEDIUM -- Spec Compliance]
File:
features/session_create_error.feature:14The test asserts
"Session Created"which matches the current code (session.py:154panel title). However, the spec (specification.md:1475) shows the Rich panel title as"Session"(not"Session Created"), and the status message as"Session created"(lowercasec). The test asserts against current code rather than the spec. If the code is later aligned with the spec, these tests break.Recommendation: Clarify which is canonical and ensure consistency.
F6 -- Dead Code in Benchmark [LOW -- Code Quality]
File:
benchmarks/session_create_error_bench.py:66self._counter = 0is initialized insetup()but never read or modified anywhere inSessionCreateDISuite.F7 -- Benchmark Recreates Engine and Schema Per Call [LOW -- Performance]
File:
benchmarks/session_create_error_bench.py:46-47Each
time_*method calls_make_service()which runscreate_engine()+Base.metadata.create_all()on every invocation. Sincesetup()already creates the schema, the repeatedcreate_all()calls are redundant and inflate timing measurements with engine construction + DDL introspection overhead.Recommendation: Create the engine/sessionmaker once in
setup()and store as instance attributes.F8 -- Robot Tests Missing
[Tags][LOW -- Test Organization]File:
robot/session_create_error.robotThe Behave scenarios are tagged
@tdd @bug570 @wipfor selective execution. The Robot test cases have no[Tags], making it impossible to selectively skip these known-failing tests in CI.Recommendation: Add
[Tags] tdd bug570 wipto each test case.F9 -- Benchmark Filename Mismatch With Issue Subtask [LOW -- Process]
Issue #570 subtask specifies
benchmarks/session_create_bench.py. Actual filename isbenchmarks/session_create_error_bench.py. Minor naming discrepancy.Severity Summary
_counterin benchmark[Tags]Requesting changes on F1 and F2 -- without addressing these, the tests will produce false failures (F1) or false passes (F2) once the fix is applied, defeating the purpose of the TDD cycle.
@ -0,0 +1,93 @@"""ASV benchmarks for session create DI wiring (bug #570).F2 [HIGH]:
_make_service()constructsPersistentSessionServicedirectly by building its owncreate_engine()+sessionmaker(), completely bypassing_get_session_service()and the DI container. The bug (#570) is thatcontainer.db()raisesAttributeError-- but this function never callscontainer.db(). All three benchmark methods (time_create_session,time_create_with_actor,track_create_persists) will pass successfully regardless of whether the bug is present, contradicting the module docstring claim that they are "expected to error or produce anomalous timings until the fix is applied".The Behave tests correctly exercise the broken DI path by resetting
_service = None. These benchmarks should either do the same, or the docstring should be corrected to state they measure the service-layer baseline only.@ -0,0 +1,93 @@"""ASV benchmarks for session create DI wiring (bug #570).Measures the cost of creating a session through the ``PersistentSessionService``,F7 [LOW -- Performance]:
_make_service()runscreate_engine()+Base.metadata.create_all()on everytime_*call. Sincesetup()already creates the schema (lines 63-64), the repeatedcreate_all()calls are redundant and inflate timing measurements with engine construction + DDL introspection overhead.Consider building the engine and
sessionmakeronce insetup()and storing them as instance attributes, then using them in eachtime_*method.@ -0,0 +20,4 @@from sqlalchemy import create_enginefrom sqlalchemy.orm import sessionmakertry:F6 [LOW -- Dead code]:
self._counteris initialized here but never read or modified anywhere in the class. Remove it.@ -0,0 +11,4 @@Scenario: Session create produces a new sessionWhen I invoke session-create-error create with no argumentsThen the session-create-error command should exit successfullyAnd the session-create-error output should contain "Session Created"F5 [MEDIUM -- Spec Compliance]: The assertion checks for
"Session Created"which matches the current code's panel title (session.py:154). However, the spec atspecification.md:1475shows the Rich panel title as"Session"(not"Session Created"), and the status message as"Session created"(lowercasec). If the code is later aligned with the spec, this assertion breaks.Also see F1: even setting aside the string value, this assertion will likely never match because Rich Console output isn't captured by CliRunner (see comment on
session_create_error_steps.py).@ -0,0 +23,4 @@Scenario: Session create with custom actor succeedsWhen I invoke session-create-error create with actor "openai/gpt-4"Then the session-create-error command should exit successfullyAnd the session-create-error output should contain "openai/gpt-4"F3 [MEDIUM -- Test Coverage]: Issue #570 acceptance criteria require this feature to cover "create, list after create, error handling". There is no error-handling scenario. Consider adding at least:
@ -0,0 +8,4 @@~~~~~~~~~~~~~~~~The bug is that ``_get_session_service()`` calls ``container.db()`` but theDI ``Container`` class has no ``db`` provider. This raises``AttributeError: 'DynamicContainer' object has no attribute 'db'`` on everyF1 [HIGH]: This invocation uses the default
fmt="rich"path, which writes output via the module-levelconsole = Console()insession.py:35. ThatConsolecapturessys.stdoutat import time -- beforeCliRunnerreplaces it with a capture buffer. As a result,result.outputwill not contain the Rich Panel text ("Session Created"), and the assertion in the feature file (line 14) will produce a false failure even after the bug is fixed.Same issue applies to the
create with actorstep at line 97.Fix: Pass
--format plain(orjson) so the output goes throughtyper.echo(), which CliRunner captures reliably:Then adjust the feature assertions to match the plain output format (e.g.,
"[OK] Session created"per the spec).@ -0,0 +12,4 @@Suite Teardown Cleanup Test Environment*** Test Cases ***Session Create After Init Should Not ErrorF8 [LOW -- Test Organization]: The Behave scenarios are tagged
@tdd @bug570 @wipfor selective execution/filtering. These Robot test cases have no[Tags], making it impossible to selectively skip these known-failing tests in CI.Suggestion:
Thanks @CoreRasurae for the thorough review — all findings addressed in commit
da92f2e6.Disposition of findings (review #1980)
Fixed
--format plainto allrunner.invoke()calls (create,list). Rich Console capturessys.stdoutat import time, before CliRunner replaces it. Plain format usestyper.echo()which CliRunner captures reliably. Feature assertions updated to match plain output format (session_id:instead ofSession Created).total:and absence oftotal: 0.self._counter = 0fromsetup().setup(), stored as instance attributes._make_service()is now an instance method reusing them.teardown()disposes the engine.[Tags] tdd bug570 wipto both Robot test cases._SRC/importlib.reload(cleveragents)pattern.Acknowledged (no code change)
fix/vsfeature/). Deliberate — this is a test-only PR sofeature/prefix was chosen. Cannot rename without recreating the PR. Will coordinate with PM if needed.session_create_error_bench.pyis more descriptive than the issue subtask'ssession_create_bench.py. Kept as-is.Verification
nox -s lint— passed (All checks passed!)nox -s typecheck— passed (0 errors, 0 warnings, 0 informations)Code Review --
test(cli): add failing tests for session create DI container errorCommit:
da92f2e6(post-resolution of Core's review #1980)Issue: #570
Branch:
feature/m3-fix-session-create-errorReviewer: Aditya Chhabra
Summary
Test PR follows solid TDD methodology with good responses to Core's initial review. Tests properly exercise the DI path with
_service = Nonereset and use--format plainto ensure CliRunner can capture output. Benchmark docstring was clarified, and Robot tests now have proper tags. However, there are two critical test design issues that will cause false signals once the fix is applied, plus minor coverage and documentation gaps.Findings
F1 -- Test Scenario #4 Assumes Non-Existent Actor Validation [HIGH -- Test Logic Error]
File:
features/session_create_error.feature:29-32The fourth scenario asserts:
The scenario name says "fails gracefully" but the assertions require exit code 0 and the actor name to appear in output. This is internally contradictory.
More critically:
PersistentSessionService.create()(src/cleveragents/application/services/session_service.py:57-71) does not validate whether the actor exists. It accepts anyactor_namestring and creates the session. The CLI command (session.py:138-140) callsservice.create(actor_name=actor)with no validation wrapper.Result: The test will pass (exit 0, actor name echoed) regardless of whether the actor exists, making the scenario name misleading and the test uninformative about error handling.
Recommendation: Either (a) remove this scenario entirely since actor validation is not part of the session create contract, or (b) rename it to "Session create with arbitrary actor name succeeds" and update the description to clarify that actor existence is not validated at session creation time.
F2 -- Missing Exit Code Assertion in Scenario #2 [MEDIUM -- Test Rigor]
File:
features/session_create_error.feature:17-20Scenario "Created session persists and can be retrieved" calls
createthenlist, but only asserts on the list result. If thecreatecommand fails (non-zero exit), the test will still proceed to list and may pass vacuously if there are leftover sessions from prior test runs or shared test database state.Recommendation: Add
Then the session-create-error command should exit successfullyafter line 18 to assert the create succeeded before checking persistence.F3 -- Robot Tests Do Not Use
--format plain[MEDIUM -- Test Consistency]File:
robot/session_create_error.robot:23, 38The Robot Framework tests invoke
session createwithout--formatflag, meaning they will use the default Rich output format. This is inconsistent with the Behave tests (which use--format plainper Core's F1 fix) and may produce flaky assertions if Rich panel rendering changes.Line 27 asserts
Should Not Contain ${create.stdout} AttributeErrorwhich is fragile — if the bug is present, the error message may appear in stderr, not stdout.Recommendation:
--format plainto bothsession createinvocations (lines 23, 38)Should Not Contain ${create.stderr} AttributeErrorF4 -- Benchmark
track_create_persistsDocstring Misleading [LOW -- Documentation]File:
benchmarks/session_create_error_bench.py:84-89The tracker docstring states:
But the benchmark constructs
PersistentSessionServicedirectly, bypassing the DI container. Bug #570 is in_get_session_service() → container.db()which this code never touches. The tracker will always return 1 (number of sessions created), regardless of whether the DI bug is present.Core's F2 fix updated the class docstring to clarify the benchmark measures service-layer performance, but the
track_create_persistsdocstring was not updated and still implies it detects the bug.Recommendation: Revise the docstring:
F5 -- Scenario #3 Does Not Assert on
session_id:[LOW -- Test Coverage]File:
features/session_create_error.feature:23-26Scenario "Session create with custom actor succeeds" only asserts that the output contains the actor name. It does not verify that a session was actually created (i.e., no
session_id:check like Scenario #1 line 14).This makes the test weaker — if the command prints the actor name in an error message but doesn't create a session, the test would still pass.
Recommendation: Add a second assertion:
F6 -- Cleanup Does Not Restore Original
_serviceState [LOW -- Test Isolation]File:
features/steps/session_create_error_steps.py:72-76_cleanup_sce()setssession_mod._service = Noneunconditionally. If a previous test had set_serviceto a mock, this cleanup will clobber it rather than restoring the original state.Impact: Low — only matters if tests run in unexpected order and share module state, which is unlikely given the
@wiptag isolation.Recommendation: Store the original
_servicevalue in_setup_real_di_path()and restore it in_cleanup_sce():Severity Summary
--format plainsession_id:assertion_serviceRequesting changes on F1 (misleading test scenario) and F2 (missing assertion that could cause false passes). F3 recommended for consistency with Behave approach.
Code Review —
test(cli): add failing tests for session create DI container errorCommit:
e46266ae(HEAD)Issue: #570
Branch:
feature/m3-fix-session-create-errorReviewer: Rui Hu
Summary
The TDD approach is methodologically sound and the Behave/Robot/ASV scaffolding is well-structured. The response to CoreRasurae's review #1980 addressed most of the original findings. However, there are four HIGH-severity process violations that must be resolved before this can be approved, plus several unresolved findings from Aditya's review.
Findings
F1 — Merge commit on branch [HIGH — Process Violation]
File: branch history, commit
e46266aeThe branch contains a merge commit:
Merge branch 'master-latest' into feature/m3-fix-session-create-error. Per CONTRIBUTING.md: "Each branch must not contain merge commits — instead as master drifts the branches should always align with master via rebase."Recommendation: Rebase onto
origin/masterto linearize the branch history, eliminating the merge commit.F2 — Multiple commits for the same issue [HIGH — Process Violation]
File: branch history, commits
9161f6c7andda92f2e6The branch has two non-merge commits addressing the same issue:
9161f6c7 test(cli): add failing tests for session create DI container errorda92f2e6 fix(test): address CoreRasurae review #1980 findings on session create error testsPer CONTRIBUTING.md: "Every commit must completely implement an issue and close it. At no point should a branch contain multiple commits addressing the same issue, nor should it have commits that fix earlier commits present in the same branch." The review fixes should be squashed into the original commit.
Recommendation: Squash
9161f6c7andda92f2e6into a single commit with the correct commit message.F3 — Branch name mismatch with issue metadata [HIGH — Process Violation]
Issue #570 Metadata specifies branch
fix/m3-session-create-error. The actual branch isfeature/m3-fix-session-create-error. The issue's Definition of Done states the commit must be pushed to the branch "matching the Branch in Metadata exactly." This was acknowledged in the response to Core's F4 but dismissed as intentional — the DoD rule does not have an exception for test-only PRs.Recommendation: Either rename the branch to
fix/m3-session-create-error, or update the issue Metadata to match the current branch name.F4 — PR claims
ISSUES CLOSED: #570but does not fix the bug [HIGH — Scope/Completeness]File: PR description body
The PR is a TDD-only PR that adds failing tests but does not implement the actual bug fix (the DI container wiring). Yet the PR description states
ISSUES CLOSED: #570. Per CONTRIBUTING.md: "One issue = one commit. Atomic, self-contained, buildable, testable" and "Do not commit half-done work. Only commit when fully implemented and tested."All acceptance criteria in issue #570 remain unchecked:
[ ]Fix DI container wiring[ ]Verify session create command[ ]Fix should address #554 too[ ]Docs, Tests, QualityRecommendation: Change
ISSUES CLOSED: #570toRefs: #570in both the PR description and commit message footer. The issue should only be closed by the PR that actually fixes the bug.F5 — PR Type label mismatch [MEDIUM — Process]
Issue #570 has
Type/Bug. The PR hasType/Testing. Per CONTRIBUTING.md: "AType/label matching the issue type." If this PR is linked to close #570, the label should beType/Bug. If it is a standalone test PR not closing #570 (per F4 recommendation), thenType/Testingis acceptable, butISSUES CLOSED: #570must be removed.Recommendation: Align the label with the decision on F4.
F6 — Scenario #4 is misleading ("fails gracefully" but asserts success) [MEDIUM — Test Logic]
File:
features/session_create_error.feature:29-32The scenario name says "fails gracefully" but the assertions expect exit code 0.
PersistentSessionService.create()does not validate actor names — it accepts any string. This scenario is functionally identical to Scenario #3 and provides no additional error handling coverage. (Also raised by Aditya, F1.)Recommendation: Either rename to "Session create with arbitrary actor name succeeds" to accurately reflect what it tests, or replace with a genuine error handling scenario (e.g., database unreachable).
F7 — Scenario #2 missing create exit code assertion [MEDIUM — Test Rigor]
File:
features/session_create_error.feature:17-20The "Created session persists and can be retrieved" scenario calls
createthenlist, but never asserts thatcreatesucceeded. Ifcreatefails silently, the test could pass vacuously. (Also raised by Aditya, F2.)Recommendation: Add
Then the session-create-error command should exit successfullybetween lines 18 and 19.F8 — Robot tests don't use
--format plain[MEDIUM — Test Consistency]File:
robot/session_create_error.robot:23, 38The Behave tests were fixed (per Core's F1) to use
--format plain, but the Robot tests still invokesession createwithout--format. This inconsistency means Robot tests will hit the Rich output path. Additionally, line 27 checks${create.stdout}forAttributeError, but the error will likely appear instderr. (Also raised by Aditya, F3.)Recommendation:
--formatplainto both Robotsession createinvocations.${create.stderr}forAttributeError.F9 — Benchmark
track_create_persistsdocstring still misleading [LOW — Documentation]File:
benchmarks/session_create_error_bench.py:85-88The benchmark constructs
PersistentSessionServicedirectly, bypassing the DI container entirely. The tracker will always return the session count regardless of bug #570 status. The class-level docstring was correctly updated (per Core's F2), but this method-level docstring was missed. (Also raised by Aditya, F4.)Recommendation: Update to: "Track session create persistence at the service layer. Returns the count of sessions created. Does not exercise the DI container path."
F10 — Cleanup does not restore original
_servicestate [LOW — Test Isolation]File:
features/steps/session_create_error_steps.py:72-74_cleanup_sce()setssession_mod._service = Noneunconditionally rather than restoring the original value. (Also raised by Aditya, F6.)Recommendation: Store the original
_servicevalue in_setup_real_di_path()and restore it in_cleanup_sce():F11 — Unnecessary
# type: ignore[attr-defined]in benchmark [LOW — Code Quality]File:
benchmarks/session_create_error_bench.py:96Pyright is configured with
include = ["src"](in bothpyrightconfig.jsonandpyproject.toml). Thebenchmarks/directory is outside the type-checking scope entirely — Pyright never processes this file. The# type: ignorecomment is therefore dead noise and should be removed.Recommendation: Remove the
# type: ignore[attr-defined]comment since it has no effect.Severity Summary
--format plain_service# type: ignorein benchmarkRequesting changes on F1–F4 (process violations and scope mismatch). F6–F8 are unresolved findings from Aditya's review that should also be addressed in the same pass.
agents session createcauses an error.e46266ae47d79ceff177Squashed into single commit
d79ceff1, rebased ontomaster— all review findings addressed.Disposition of findings
Aditya's review (comment #55060)
Then the session-create-error command should exit successfullyafter the create step in scenario #2.--format plainon bothsession createinvocations.AttributeErrorcheck moved to${create.stderr}.track_create_persistsdocstring: "Track session create persistence at the service layer. Returns the count of sessions created. Does not exercise the DI container path."session_id:in output._serviceviacontext.sce_original_service.hurui200320's review (#2008)
ISSUES CLOSED: #570→Refs: #570in both PR body and commit message footer.Type/Testinglabel kept — this is a TDD-only PR usingRefs:notCloses:.# type: ignorefrom benchmark (outside Pyright scope).Additional
@unittag to all 4 scenarios for selective test filteringVerification
nox -s lint— passed (All checks passed!)nox -s typecheck— passed (0 errors, 0 warnings, 0 informations)PM Note (Day 26 — M3 PR Triage):
TDD test suite for session create DI container error. All review findings addressed across ~4 rounds. Latest review stale after squash. Has a merge conflict.
@brent.edwards — Please rebase onto
master. Needs fresh approval after rebase.PR #595 Review —
test(cli): add failing tests for session create DI container error (#570)Commit:
d79ceff1| 5 files, +340/-0 | Issue: #570 | Test-only PRThe TDD approach is sound and most prior review findings have been addressed. However, there are CI-blocking gaps (
@wipexclusion missing), a commit message type mismatch, a Robot assertion bug that will produce false failures after the fix, and a resource leak in the test setup that will manifest once the DI fix lands.Prior Review Resolution
CoreRasurae #1980 — 6/9 resolved:
--format plain_counterin benchmarksetup()[Tags]hurui200320 #2008 — 8/11 resolved:
feature/m3-fix-session-create-error— see PROC-4ISSUES CLOSEDbut doesn't fixRefs: #570Type/Testing— see PROC-5--format plain--format plain_servicecontext.sce_original_service# type: ignorein benchmarkFindings
PROC-1 (Major) —
behave.inimissingtags = ~@wip:@wipscenarios will fail in CIAll 4 BDD scenarios are tagged
@wip(features/session_create_error.feature:11,17,24,31).behave.inicontains:No
tags = ~@wipdirective. Whennox -s unit_testsruns, these scenarios execute and fail, blocking CI. This is the same cross-cutting issue as PR #567 (which adds the exclusion but isn't merged yet). Either add the exclusion here or declare an explicit merge-order dependency on #567.PROC-2 (Major) —
noxfile.pymissing--exclude wip: Robotwiptests will fail in CIRobot tests at
robot/session_create_error.robot:18,32are tagged[Tags] tdd bug570 wip. Thenoxfile.pyintegration_testssession excludesslow,discovery,code_blocksbut notwip. Same cross-cutting gap as PROC-1.PROC-3 (Major) — Commit message
fix(cli):on test-only PRCommit subject:
fix(cli): resolve DI container wiring for session create commandThis PR contains zero production code changes — only test files. Per Conventional Changelog,
fixdenotes a bug fix in production code; a test-only commit should usetest(cli):. The issue #570 metadata prescribesfix(cli):for the actual fix commit. CONTRIBUTING.md requires: "When an issue specifies a commit message in its Metadata section, that prescribed text is the first line and must be used exactly as written." However, the prescribed message was intended for a commit that includes the fix, not a TDD-only PR that ships failing tests ahead of it. Using the fix commit message on a test-only PR misrepresents the scope ingit logand CHANGELOG tooling.TEST-1 (Medium) — Robot
Should Containcase mismatch: will fail after fixrobot/session_create_error.robot:44:But
--format plainoutput producessessions:(lowercase). Confirmed via_session_list_dict()atsession.py:108— the dict key is"sessions"(lowercase), and_format_plain_dict()renders keys verbatim. Robot'sShould Containis case-sensitive by default. After the DI bug is fixed, this test will still fail due to the case mismatch, producing a false negative unrelated to bug #570.Fix: Change
Sessionstosessionson line 44.SEC-1 (Medium) — DI-created engine never disposed after fix lands
features/steps/session_create_error_steps.py:70-74_cleanup_sce()restores_serviceand removes the env var, but never disposes the engine that_get_session_service()creates viacontainer.db()atsession.py:66. The setup engine (line 56-58) is properly disposed, but the DI container creates a separate engine when the CLI command executes. After the fix lands, each scenario will leak a SQLAlchemy engine + open SQLite file handle.shutil.rmtreeat line 74 will attempt to delete the DB file while the leaked engine holds it open.Fix: In
_cleanup_sce(), reset the container or dispose the DI-created engine before removing the temp dir.SEC-2 (Medium) —
_get_session_service()passesdbwheresession_factorycallable expectedsession.py:66-69:SessionRepository.__init__expectssession_factory: Callable[[], Session](asessionmaker). The benchmark does it correctly:SessionRepository(session_factory=self._session_factory). Whencontainer.db()eventually returns a value, it must be asessionmakercallable, not a raw engine or session. This is a latent type mismatch that the fix author must handle — worth flagging here since the TDD tests will need to validate this contract.PROC-4 (Medium) — Branch name mismatch (still open)
Issue #570 prescribes
fix/m3-session-create-error. Actual:feature/m3-fix-session-create-error. Two discrepancies: prefix (feature/vsfix/) and name ordering (m3-fix-session-create-errorvsm3-session-create-error). (Originally CoreRasurae F4 + hurui200320 F3.)PROC-5 (Medium) —
Type/Testinglabel not in CONTRIBUTING.md taxonomyThe valid
Type/labels per CONTRIBUTING.md are:Type/Bug,Type/Feature,Type/Task,Type/Epic,Type/Legendary.Type/Testingis not defined. If the PR references issue #570 (Type/Bug), the label should beType/Bug. (Originally hurui200320 F5.)TEST-2 (Medium) — No error handling scenarios
Issue #570 subtasks require: "Tests (Behave): Add
features/session_create_error.featurecovering: create, list after create, error handling." The feature file has 4 success-path scenarios but no error handling scenario (e.g., verifying the error message when the DI container is misconfigured, or a non-zero exit code on failure). (Originally CoreRasurae F3.)SEC-3 (Low) — Env var set before cleanup registered
features/steps/session_create_error_steps.py:61,85—CLEVERAGENTS_DATABASE_URLis set at line 61 inside_setup_real_di_path(), butcontext.add_cleanupis registered at line 85 after the function returns. If any line between 62-84 raises, cleanup is never registered and the env var leaks to subsequent scenarios.BENCH-1 (Low) —
importlib.reloadpartial: submodules not reloadedbenchmarks/session_create_error_bench.py:29—importlib.reload(cleveragents)only reloads top-level__init__.py. Submodules likecleveragents.application.services.session_servicemay still reference stale bytecode from an installed package.PROC-6 (Low) — Benchmark filename mismatch with issue subtask
Issue #570 prescribes
benchmarks/session_create_bench.py. Actual:benchmarks/session_create_error_bench.py. (Originally CoreRasurae F9.)Summary
12 findings — 3 Major, 6 Medium, 3 Low. The test design is solid after multiple review rounds. The major blockers are the
@wipexclusion gaps (PROC-1/2) and the commit message type mismatch (PROC-3). TEST-1 (Robot case mismatch) will produce a false failure after the fix lands and should be corrected now.d79ceff1771c9fe9e454Rebase + Review Fixes —
1c9fe9e4Rebased onto
master(272f9078) as a single squashed commit with no merge commits.Disposition of hamza.khyari review #2034
Fixed
@wipscenarios now excluded bybehave.initags = ~@wipinherited from master (merged via PR #567). All 4@wipscenarios are correctly skipped in CI.noxfile.pynow has--exclude wipfor Robot tests, inherited from master (merged via PR #568). Bothwip-tagged Robot tests are correctly excluded.fix(cli): resolve DI container wiring for session create commandtotest(cli): add failing tests for session create DI container error— this is a test-only PR with no production code.Should Contain ${list.stdout} Sessions→sessions(lowercase) to match--format plainoutput from_session_list_dict()._cleanup_sce()now callssession_mod._reset_session_service()before restoring original_service, releasing any DI-created engine beforeshutil.rmtreeremoves the temp dir.context.add_cleanup()now registered immediately after storing original state in_setup_real_di_path(), before env var override and DB creation. If any subsequent setup line raises, cleanup still runs.Acknowledged (no code change needed)
_get_session_service()—container.db()must return asessionmakercallable, not a raw engine. This is a note for the fix author (PR implementing thedbprovider). Not actionable in this test-only PR.feature/m3-fix-session-create-errorvs issue metadatafix/m3-session-create-error. Cannot rename without recreating PR. This is a test-only PR usingRefs:notCloses:, so thefeature/prefix is appropriate.Type/Testinglabel is correct — this is a TDD-only PR that usesRefs: #570, notCloses: #570. The issueType/Buglabel applies to the fix PR, not this test PR.@wiptag is removed after the fix. Adding an explicit "verify the error message" scenario would test the broken state, not the fixed state, contradicting TDD methodology.session_create_error_bench.pyis more descriptive than the issue'ssession_create_bench.py. Kept as-is.importlib.reloadonly reloads top-level__init__.py. This is a known ASV limitation — submodule caching is unavoidable without restarting the interpreter.Quality gates
nox -s lintnox -s typechecknox -s unit_tests@wipfeature)PR is
mergeable: true. Single commit, rebased onto master, no merge commits.agents session createDI container missingdbprovider test #631Planning Authority Review — PR #595
Branch Naming Convention:
This PR uses the
feature/prefix (feature/m3-tdd-session-create-test), but per CONTRIBUTING.md, TDD-related issues should use thetdd/prefix (e.g.,tdd/m3-session-create-test). This is flagged for future compliance — not blocking this PR, but please use the correct prefix on subsequent TDD branches.TDD Counterpart Tracking:
TDD counterpart issue #631 has been created for the originating bug #570. This ensures the TDD cycle is properly tracked alongside the fix.
Review Status:
This PR is in advanced review with 10+ comments already addressed. No additional blocking concerns from planning at this time.
PM Status Check — Day 29
Author: @brent.edwards | Milestone: v3.2.0 (M3) | Mergeable: Yes | Assigned Reviewer: @CoreRasurae
Current State
Brent has addressed all findings from 4 review rounds (CoreRasurae #1980, Aditya #55060, hurui200320 #2008, hamza.khyari #2034). Final commit
1c9fe9e4is rebased onto master, single squashed commit, no merge conflicts. Quality gates: lint pass, typecheck 0 errors, 9,029 scenarios 0 failures.Status: READY FOR MERGE
All review findings addressed. No outstanding objections. Previous approvals are stale after rebase.
Action Required
This is a TDD test PR for bug #570 (session create DI error, M3). The fix PR cannot proceed until these tests are merged. Bug fixes are top priority per CONTRIBUTING.md.
Code Review — PR #595:
test(cli): add failing tests for session create DI container errorCommit:
test(cli): add failing tests for session create DI container errorIssue: #570
Branch:
feature/m3-fix-session-create-errorReviewer: Aditya Chhabra
Previous Review Resolution — F1–F6 (#55060)
All six findings from the prior review cycle have been correctly resolved:
createin Scenario #2Then the session-create-error command should exit successfullyadded before the list step--format plain; AttributeError checked on stdout instead of stderr--format plainon both invocations; check moved to${create.stderr}track_create_persistsdocstring implied it detected bug #570session_id:assertionAnd the session-create-error output should contain "session_id:"added_servicecontext.sce_original_serviceand restored; additionally improved with_reset_session_service()call to release DI-created engine before restoreScope
TDD-only PR — adds failing tests that will pass once the DI container fix (#554 root cause) lands.
Findings
F1 —
@unitTag Misclassifies Integration-Level Tests [HIGH]File:
features/session_create_error.feature:11,17,24,31All four scenarios are tagged
@unit @tdd @bug570 @wip. This is incorrect. The tests:tempfile.mkdtemp()directorycreate_engine+Base.metadata.create_allCLEVERAGENTS_DATABASE_URLenvironment variable_get_session_service()(by design)These are integration-level tests.
@unitis reserved for in-memory, isolated, fast unit scenarios. Using@unithere misleads other developers about test characteristics, and will pull these (inherently slower, filesystem-dependent) tests into unit test budget expectations.Evidence from repo: A directly analogous TDD scenario in
features/resource_type_bootstrap_git.feature(bug #524, same TDD pattern) uses only@tdd @bug524 @wip— no@unit.Recommendation: Remove
@unitfrom all four scenarios. The correct tagging is:F2 — Robot Tests Do Not Assert on
agents initExit Code [MEDIUM]File:
robot/session_create_error.robot:20–21, 34–35Both test cases call
agents initbut silently discard the result:If
initfails (database setup error, migration failure, disk full, pre-existing path conflict), the test continues and the subsequentsession createfailure becomes ambiguous — the test would either fail on the RC assertion or, worse,initmay return RC 0 but produce a corrupted state.Evidence from repo: Every other Robot test that calls
initcaptures the result and asserts. Fromrobot/core_cli_commands.robot:34–37:Recommendation: Capture and assert on the
initresult in both test cases:F3 — Robot
listAssertion Is Too Weak to Verify Persistence [MEDIUM]File:
robot/session_create_error.robot:44The string
"sessions"will appear in plain-format output even for an empty list because_format_plain_dictrenders the keysessions:regardless of content (it would still outputsessions:with an empty list body if the create had failed silently). The assertion gives a false green ifsession createactually failed but the list command still ran and printed its header.Compare with the Behave equivalent (
step_list_shows_sessions), which correctly checks both:assert "total:" in result.outputassert "total: 0" not in result.outputRecommendation: Replace the weak string check with assertions that verify an actual session was created:
F4 — DI Container Not Reset Between Scenarios [MEDIUM]
File:
features/steps/session_create_error_steps.py:42–74_setup_real_di_path()setsCLEVERAGENTS_DATABASE_URLand resetssession_mod._service = None, but never callsreset_container()fromcleveragents.application.container. TheContaineris a global singleton accessed viaget_container(). If an earlier test has already calledget_container()and initialized anySingletonproviders against a different database URL (e.g.,context_tier_service,autonomy_controller, or any other Singleton-scoped provider), those cached instances persist for the lifetime of the process.After the DI fix lands, if the implementation resolves the session factory through the container — even if
CLEVERAGENTS_DATABASE_URLis correctly set — any Singleton providers already resolved against the old URL will not be re-initialized. While the test currently passes (the bug causes immediate failure before any Singleton is resolved), this will be a latent isolation problem once the fix lands.Recommendation: Add
reset_container()to both setup and cleanup:F5 —
_cleanup_sceComment Implies Engine Disposal That Does Not Occur [LOW]File:
features/steps/session_create_error_steps.py:251–252_reset_session_service()doesglobal _service; _service = None. It sets the cached reference toNone— it does not call.dispose()on any SQLAlchemy engine, and the current_get_session_service()implementation never stores its result in_service(the DI path returns without caching). So the comment is misleading: no engine is released by this call; Python's garbage collector handles cleanup if and when the refcount drops to zero.After the DI fix lands, if
_get_session_service()is also corrected to cache the result in_service, then this call will correctly clear the cached reference. But the comment overstates the guarantee today.Recommendation: Revise the comment to be accurate:
F6 — Benchmark
timeoutIs Float Instead of Int [LOW]File:
benchmarks/session_create_error_bench.py:76Every other benchmark suite in this repo uses
timeout = 60(integer). The float works in ASV, but the inconsistency is a style deviation. Additionally, 30 seconds is half the standard timeout. For a file-based SQLite round-trip that includes engine construction (done insetup()), 30 seconds may be tight in CI under load.Recommendation: Align with the codebase standard:
F7 —
list_sessionsEmpty-Path Bug Not Covered by Any Test [LOW]File:
src/cleveragents/cli/commands/session.py:181–184(pre-existing, not introduced by PR)When the session list is empty,
list_sessionsusesconsole.printregardless of--format:With
--format plain, this outputs Rich markup text[yellow]No sessions found.[/yellow]to the captured output instead of plain text. Only the non-empty path correctly routes throughtyper.echo(format_output(data, fmt)).The PR test avoids this path entirely (it always creates a session before listing), so this is not a failure introduced by this PR. However, the subtask "Tests (Behave): Add features/session_create_error.feature covering: create, list after create, error handling" (issue #570) implies error handling should be covered. An explicit scenario for the empty-list path would both document this behavior and surface the format inconsistency.
Recommendation: Add a fifth scenario (tagged
@tdd @bug570 @wipwithout@unit):This is a separate bug report concern and can be tracked separately if the team prefers to keep this PR narrowly scoped.
F8 — Robot Tests Use
Evaluate __import__('tempfile').mkdtemp()Anti-Pattern [LOW]File:
robot/session_create_error.robot:19, 33This is a code smell. The Robot Framework standard for creating temporary directories is via a helper keyword or the
OperatingSystemlibrary. TheEvaluatewith__import__is brittle (depends on string-level Python eval), harder to read, and doesn't follow the pattern used elsewhere in the robot suite.Evidence from repo: Other tests that need temporary directories use
Create Directorywith explicit paths or delegate temp-dir creation tocommon.resourcekeywords.Recommendation: Create a helper keyword in
common.resourceor use a Python helper file (e.g.,robot/helper_session_create_error.py) with aCreate Temp Dirkeyword:Or at minimum, use a named variable to make the intent clear.
Severity Summary
@unittag misclassifies integration-style testssession_create_error.featureagents initexit code not assertedsession_create_error.robotlistassertion too weak to verify persistencesession_create_error.robotsession_create_error_steps.py_cleanup_scecomment overstates engine disposalsession_create_error_steps.pytimeout = 30.0inconsistent and possibly shortsession_create_error_bench.pysession.pyEvaluate __import__('tempfile')...anti-pattern in Robotsession_create_error.robotPositive Observations
@wipcorrectly excludes scenarios from CI viabehave.ini(tags = ~@wip); Robot excludeswiptagged tests in noxintegration_tests. No CI breakage risk._setup_real_di_pathcorrectly stores original_servicebefore modification (F6 from prior review addressed).createandlistin Scenario 2 (F2 from prior review addressed).--format plainconsistently applied to all Behave and Robotsession createinvocations (F3 from prior review addressed)._cleanup_scecallscontext.add_cleanupimmediately after capturing original state — ensures cleanup runs even on partial setup failure.session_id:(F5 from prior review addressed).Verdict
Requesting changes on F1 (incorrect
@unittag — misleads developers about test nature and breaks classification assumptions), F2 (Robot silent init failure), and F3 (Robot persistence assertion gives false-green). F4 is a time-bomb that will cause subtle isolation failures after the DI fix lands and is strongly recommended.PM Status Check — Day 29 (Update)
Author: @brent.edwards | Milestone: v3.2.0 (M3) | Reviews: REQUEST_CHANGES (Aditya, new review posted today)
New Review — Aditya (#56627)
Aditya confirmed all prior F1-F6 findings resolved, then identified 8 new findings in the latest commit:
@unittag misclassifies integration-style tests (real DB, real DI, real temp dirs)agents initexit code not assertedlistassertion too weak (matches header, not actual session)Action Required
@unit), F2-F4 (Robot/DI fixes)F1 is the highest priority — the
@unittag violates test classification standards since these tests use real SQLite, real temp dirs, and real DI paths. The fix is a one-line tag removal per scenario.Note: This TDD PR blocks the bug fix for #570 (session create DI error). Bug fixes remain top priority.
1c9fe9e4540248d2220dDisposition of Aditya review (comment #56627) — commit
0248d222Single commit rebased onto master (
a808c395). All blocking findings fixed plus recommended items.Fixed
@unittag misclassifies integration-level tests@unitfrom all 4 scenarios. Tags are now@tdd @bug570 @wip, consistent withresource_type_bootstrap_git.featurepattern.agents initexit code not asserted${init}result and assertShould Be Equal As Integers ${init.rc} 0with diagnostic error message.listassertion too weakShould Contain ${list.stdout} sessionswithShould Contain ${list.stdout} total:andShould Not Contain ${list.stdout} total: 0. Mirrors the Behave step logic.from cleveragents.application.container import reset_containerat module level._setup_real_di_path()callsreset_container()before setting env var._cleanup_sce()callsreset_container()after restoring_serviceto release any DI-created connections._cleanup_scecomment overstates engine disposaltimeout = 30.0inconsistenttimeout = 60(int, matching all other benchmark suites).Acknowledged (no code change)
Evaluate __import__('tempfile')anti-patternVerification
nox -s lintnox -s typechecknox -s unit_tests@wip)0248d222— no merge commitsa808c395PM Status — Day 29 (2026-03-09)
@brent.edwards — Acknowledged your disposition of Aditya's review findings (commit
0248d222). Excellent turnaround on all 6 findings (F1–F6), especially the@unittag reclassification and DI container reset.Next step: @aditya — Please re-review this PR at your earliest convenience. All findings from your latest review have been addressed. This TDD PR is blocking the bug fix for #570, which is on the M3 critical path.
Timeline: We need approval by EOD 2026-03-10 to stay on track for M3 closure.
agents session createcauses an error. #570Code Review — Day 29 (2026-03-09)
PR: #595 — test(cli): add failing tests for session create DI container error (#570)
Author: brent.edwards | Milestone: v3.2.0 (M3) | Refs: #570
Findings:
F1 (Medium) —
@tdd_expected_failtag missing: The review playbook requires TDD PRs to use@tdd_expected_failtags to mark tests that are expected to fail until the fix lands. The scenarios use@tdd @bug570 @wipbut not@tdd_expected_fail. If the project convention uses@wipas the equivalent, this should be documented; otherwise, add@tdd_expected_failfor CI gate compatibility. This matters because automated test runners may not know to exclude these from pass/fail reporting without the canonical tag.F2 (Low) — PR body claims
@unittag but diff doesn't include it: The PR description states "Added@unittag to all scenarios for discoverability" butfeatures/session_create_error.featureshows only@tdd @bug570 @wipon each scenario — no@unittag. Either add the tag to match the description or correct the PR body.F3 (Low) — Coupling to private module internals: Step definitions access
session_mod._serviceand callsession_mod._reset_session_service()— both private APIs. This is defensible for DI integration testing (you need to force the real code path), but it creates fragile tests that will break if the module's internal caching mechanism changes. Consider documenting this coupling in a comment at the top of the step file so future maintainers understand why the private access is necessary.F4 (Info) — Benchmark
track_create_persistsdocstring clarity: The docstring correctly notes this doesn't exercise the DI path (addressing Aditya's F4), which is good. Thetype: ignoreremoval (hurui200320's F11) is confirmed absent from the diff — the benchmark has notype: ignorecomments. Clean.F5 (Info) — Cleanup robustness:
_cleanup_scecallssession_mod._reset_session_service()then restorescontext.sce_original_service. This is well-ordered — clearing the cache first, then restoring the original. Thecontext.add_cleanup()registration happens early in_setup_real_di_pathbefore any exception-prone lines (addressing the SEC-3 comment in the code). Good defensive pattern.F6 (Info) —
Refs:vsISSUES CLOSED:: Correctly usesRefs: #570since this TDD PR does not fix the bug — it only adds failing tests. This properly addresses hurui200320's F4 finding.Checklist:
Refs:(notISSUES CLOSED:), implementation notes present, CI status documentedengine.dispose()calledVerdict: COMMENT
Summary: This is a well-structured TDD test PR. Brent has thoroughly addressed all 6 of Aditya's findings (F1-F6) as well as hurui200320's and CoreRasurae's earlier review feedback. The test design is sound — resetting
_serviceto force the real DI path, using file-based SQLite for realistic testing, and proper cleanup with early registration. The two actionable items are F1 (confirm@tdd_expected_failvs@wipconvention for CI gating) and F2 (missing@unittag claimed in PR body). Neither blocks merge. This PR is ready to serve as the TDD harness for the #570 fix.@ -0,0 +9,4 @@Given a session-create-error CLI runner using the real DI path@tdd @bug570 @wipScenario: Session create produces a new sessionF1 (Medium): Review playbook requires
@tdd_expected_failtag for TDD PRs where tests are expected to fail until the fix lands. These scenarios use@tdd @bug570 @wip— if@wipserves the same CI gating purpose, document the equivalence; otherwise add@tdd_expected_failhere.@ -0,0 +14,4 @@Then the session-create-error command should exit successfullyAnd the session-create-error output should contain "session_id:"@tdd @bug570 @wipF2 (Low): PR description states
@unittag was added to all scenarios, but this line only has@tdd @bug570 @wip. Add@unitfor consistency with the stated change, or update the PR body.@ -0,0 +48,4 @@"""# Store original _service so cleanup can restore it.context.sce_original_service = session_mod._servicecontext.sce_tmpdir = tempfile.mkdtemp(prefix="session_create_err_570_")F3 (Low): Accessing
session_mod._service(private attribute) andsession_mod._reset_session_service()is necessary here to force the real DI path, but it creates coupling to internal implementation. Consider adding a brief comment here noting why the private access is required (e.g.# Access private _service to force _get_session_service() to re-resolve via DI) so future maintainers don't mistake this for an oversight.Code Review — PR #595
feature/m3-fix-session-create-errorReviewed commit:
0248d222on basea808c395Review type: Test review + Architecture review
Verdict: REQUEST CHANGES — 2× P1, 5× P2, 2× P3
P1 — Must fix before merge
F1 ·
P1:must-fix— TDD bug tags use wrong conventionfeatures/session_create_error.featureTags
@tdd @bug570 @wipmust be@tdd_bug @tdd_bug_570 @tdd_expected_failper CONTRIBUTING.md §TDD Bug Test Tags (lines 1176-1221). The@wiptag is especially problematic becausebehave.inihastags = ~@wipwhich excludes these scenarios from CI entirely — meaning these tests are never actually run.F2 ·
P1:must-fix— Imports buried inside function bodyfeatures/steps/session_create_error_steps.py—_setup_real_di_path()Two imports (
from cleveragents.infrastructure.persistence.connection import reset_containerandimport importlib) are inside the_setup_real_di_path()function body. CONTRIBUTING.md §Import Guidelines (lines 1287-1296) requires all imports at the top of the file. The only exception is insideif TYPE_CHECKING:blocks.P2 — Should fix (follow-up within 3 days)
F3 ·
P2:should-fix— CHANGELOG overstates DI path coverageThe CHANGELOG entry says "Add full DI path coverage" but the PR only tests the session-create error path. "Full DI path coverage" implies comprehensive coverage of all DI paths, which isn't the case. Consider rewording to "Add DI path test coverage for session creation error handling."
F4 ·
P2:should-fix—importlib.reload()is a fragile patternfeatures/steps/session_create_error_steps.pyUsing
importlib.reload()to reset module state between tests is fragile and can cause subtle test pollution. If the module has side effects on import, reloading may not fully reset state. Consider documenting why this approach is necessary or using dependency injection to make the code more testable.F5 ·
P2:should-fix— Robot test #2 missingAttributeErrorstderr checkrobot/session_create_error.robotThe second Robot test case verifies the exit code but doesn't assert on the expected error message in stderr. If the CLI fails for a different reason (e.g., import error), the test would still pass.
F6 ·
P2:should-fix— Missing@regressiontagSibling feature files (e.g.,
session_list_error.featurein PR #596) include a@regressiontag. This feature file is missing it. If this is a regression test for bug #570, it should carry the@regressiontag for consistent test suite filtering.F7 ·
P2:should-fix—track_create_persistsstep always returns 1features/steps/session_create_error_steps.pyThe step that verifies session creation persistence appears to always assert a count of 1 regardless of context. If the test is meant to verify that creation succeeded, the expected count should be parameterized or clearly documented.
P3 — Nits (author discretion)
F8 ·
P3:nit— Commit message tag order doesn't match feature fileThe commit body lists tags in a different order than they appear in the feature file. Minor inconsistency but worth aligning for grep-ability.
F9 ·
P3:nit— No Robot test for custom actor scenariosThe Robot tests cover default actor scenarios but don't exercise the custom actor session creation path. This would improve integration-level confidence.
Summary
The session creation error handling test coverage is a solid addition. The production path through the DI container is properly exercised. However, the P1 findings are significant: the wrong TDD tags mean these tests are silently excluded from CI (via
behave.ini's~@wipfilter), and the buried imports violate the project's import guidelines. Both must be fixed before merge.PM Note (Day 29) — Reassignment
Changes:
Rationale: Brent has the active branch
feature/m3-fix-session-create-errorand the open PR for this work. The issue assignee should match the developer who has the active branch and PR. Luis was nominally assigned but Brent is doing the actual implementation.agents session createDI container missingdbprovider test #631PM Compliance Audit — CONTRIBUTING.md Checklist
Closes #570Merge blockers: 3 stale REQUEST_CHANGES reviews. Brent addressed all findings (commits
0248d222,1c9fe9e4). Reviewers need to re-review or withdraw.0248d2220d9ef1e9c000Disposition of all review findings — commit
9ef1e9c0Rebased as a single commit on master (
83f2f3a0). All quality gates pass:nox -s lint✓,nox -s typecheck0 errors ✓.Key structural changes in this revision
@tdd_expected_failinfrastructure implemented — Behaveafter_scenariohook and Robot listener both invert pass/fail for tagged scenarios. This replaces the@wipexclusion approach entirely.@tdd @bugNNNto@tdd_bug @tdd_bug_NNNacross 5 feature files.@tdd_bug @tdd_bug_570 @tdd_expected_fail.CoreRasurae — Review #1980
--format plainon all invocations--format plainavoids Rich path entirely_countersetup()[Tags]tdd_bug tdd_bug_570 tdd_expected_failhurui200320 — Review #2008
9ef1e9c0ISSUES CLOSEDon test-only PRRefs: #570Type/Testingappropriate for test-only PR--format plain_servicesce_original_service# type: ignorein benchmarkhamza.khyari — Review #2034
behave.inimissing~@wip@tdd_expected_failinversion replaces@wipexclusion; scenarios pass CI via inversionnoxfile.pymissing--exclude wipfor Robot--exclude wipalready present; Robot listener handlestdd_expected_failinversionfix(cli):on test-only PRtest(cli):prefixSessionscase mismatchtotal:_cleanup_scecalls_reset_session_service()thenreset_container()dbvssession_factorytype mismatchType/Testingnot in taxonomyType/Testingexists as a label and is appropriate_setup_real_di_pathimportlib.reloaddoesn't cascadefreemo — Review #2060
@tdd_expected_failtag missing@unitbut not in diff_serviceaccess# type: ignoreRefs:vsISSUES CLOSED:Refs: #570All actionable findings resolved. Items marked ⏭️ are out of scope for this TDD PR or acknowledged process items that don't block merge.
9ef1e9c00037cd34c18eSelf-Review — PR #595
test(cli): add failing tests for session create DI container errorReviewed commit:
37cd34c1on base83f2f3a0Review type: Self-review (test + architecture + infrastructure)
Verdict: APPROVE with comments — 0 P0, 0 P1 (fixed), 4 P2, 1 P3
P1 findings — Fixed in this push
Two P1 issues were identified during self-review and fixed in commit
37cd34c1:F1 (was P1, now fixed) —
slow_integration_testsmissing--listenernoxfile.py— Theslow_integration_testsnox session ranrobotoverrobot/without--listener robot/tdd_expected_fail_listener.py. Theintegration_testssession had it, butslow_integration_testsdid not.@tdd_expected_failtests would fail unexpectedly in that session.Resolution: Added
--listener robot/tdd_expected_fail_listener.pyto theslow_integration_testssession.F2 (was P1, now fixed) — "Unexpected pass" branch sets no error message
features/environment.py:322— When a@tdd_expected_failscenario unexpectedly passes, the Behave hook calledscenario.set_status(Status.failed)with no diagnostic text. Developer would seeFAILEDwith all stepsPASSEDand zero explanation. The Robot listener correctly setsresult.messagebut the Behave hook didn't.Resolution: Added
scenario.error_messagewith the same diagnostic message pattern used by the Robot listener.P2 — Should fix (follow-up within 3 days)
F3 ·
P2:should-fix— CHANGELOG entry says "infrastructure" but could be more specificThe CHANGELOG entry is somewhat vague about what
@tdd_expected_failinfrastructure entails. Consider expanding to mention both the Behave hook and Robot listener specifically.F4 ·
P2:should-fix—importlib.reload()in step setup is fragilefeatures/steps/session_create_error_steps.pyUsing
importlib.reload()to reset module state is a known fragile pattern. It works for the current use case (forcing_service = Noneto exercise the real DI path), but if the module grows side effects on import, it may not fully reset state. Documenting the rationale in a comment would help future maintainers.F5 ·
P2:should-fix— Robot test #2 missing stderr check forAttributeErrorrobot/session_create_error.robotThe second Robot test case verifies exit code but doesn't assert on the expected error message in stderr. A different failure mode could produce the same exit code without the specific
AttributeError.F6 ·
P2:should-fix— Benchmarktrack_create_persistsalways returns session count regardless of DI bugbenchmarks/session_create_error_bench.pyThe benchmark constructs
PersistentSessionServicedirectly, bypassing the DI container. The tracker always returns the session count regardless of bug #570. The class-level docstring correctly notes this, but the method-level docstring could be clearer.P3 — Nits
F7 ·
P3:nit— Step definitions access private_serviceattributefeatures/steps/session_create_error_steps.pySteps access
session_mod._serviceand callsession_mod._reset_session_service(). This is defensible for DI integration testing, but a brief comment explaining why would improve maintainability.Quality Gates
nox -s lintnox -s typecheck83f2f3a0)Refs: #570(not ISSUES CLOSED)Summary
This TDD test PR is solid after 5 review rounds from CoreRasurae, hurui200320, hamza.khyari, and freemo. The latest push fixes the two P1 infrastructure gaps (missing
--listenerinslow_integration_testsand missing diagnostic message in the Behave unexpected-pass branch). The remaining P2/P3 findings are minor and can be addressed in follow-up. Ready for merge.37cd34c18e0f629b2613Disposition — Self-Review #2072 Findings
Addressed all P2/P3 findings from self-review in commit
0f629b26. Rebased onto latestmaster(fcdf80f3).F1 (was P1, fixed previously) —
slow_integration_testsmissing--listenerAlready fixed in prior push. Confirmed still present in
0f629b26.F2 (was P1, fixed previously) — "Unexpected pass" branch sets no error message
Already fixed in prior push. Confirmed still present in
0f629b26.F3 (P2) — CHANGELOG entry vague about infrastructure
Fixed. Expanded CHANGELOG to explicitly describe "a Behave
after_scenariohook infeatures/environment.pythat flips pass/fail for@tdd_expected_failscenarios, and a Robot Framework Listener API v3 plugin (robot/tdd_expected_fail_listener.py) with identical semantics."F4 (P2) —
importlib.reload()/ module reset is fragileFixed. Added inline comment at the
session_mod._service = Noneline explaining the fragility and pointing to the module docstring for full rationale. The step file's module docstring (lines 20-28) already documents why private_serviceaccess is necessary.F5 (P2) — Robot test #2 missing stderr check for
AttributeErrorFixed. Added
Should Not Contain ${create.stderr} AttributeErrorassertion to the "Session Create Then List Shows Created Session" test case, matching the pattern in test case #1.F6 (P2) — Benchmark
track_create_persistsdocstring unclearFixed. Expanded method docstring to explicitly state: "This benchmark constructs
PersistentSessionServicedirectly, bypassing the DI container (_get_session_service/container.db()). It therefore does not reproduce bug #570 — its purpose is to establish a service-layer persistence baseline."F7 (P3) — Private
_serviceaccessAcknowledged. Already documented in module docstring (lines 20-28) with a dedicated "Private API access" section explaining the rationale and maintenance implications. No further action needed.
Quality gates
nox -s typecheck— 0 errors ✓nox -s lint— 0 errors ✓fcdf80f3) ✓Code Review —
test(cli): add failing tests for session create DI container error (#570)Commit:
0f629b26| Branch:feature/m3-fix-session-create-error| Base:masterReviewer: Rui Hu — follow-up to my previous REQUEST_CHANGES review #2008
Prior Review Resolution (my review #2008)
Refs: #570Type/Testingappropriate for test-only PR--format plain--format plain_servicecontext.sce_original_service# type: ignorein benchmark10 of 11 findings resolved. The only remaining item (F3, branch name) was acknowledged and waived by PM (freemo, review #2042).
Verification of Other Reviewers' Findings
CoreRasurae #1980: 9/9 addressed (F1–F2 HIGH fixed, F3/F4/F5/F9 acknowledged, F6–F8 fixed)
hamza.khyari #2034: 12/12 addressed (PROC-1/2/3 major fixed via
@tdd_expected_failinfrastructure, TEST-1 fixed, SEC-1/SEC-3 fixed, remaining acknowledged)freemo #2060: 6/6 addressed (F1 fixed via
@tdd_expected_failtags, F2–F6 fixed/confirmed clean)New Findings (This Review)
N1 — Branch name mismatch (MEDIUM — Process, carried forward)
Issue #570 prescribes
fix/m3-session-create-error. Actual branch:feature/m3-fix-session-create-error. CONTRIBUTING.md § TDD Bug Test Tags specifies TDD branches should usetdd/prefix.Status: Acknowledged by 3 reviewers and the PM. PM explicitly said "not blocking." I accept this waiver and will not block on it.
N2 —
_get_session_service()does not cache to_service(LOW — Note for fix author)File:
src/cleveragents/cli/commands/session.py:69_get_session_service()returns a newPersistentSessionService(...)without assigning to_service. The singleton cache at line 53 (if _service is not None: return _service) is never populated. Currently moot becausecontainer.db()raisesAttributeErrorbefore reaching line 69. The fix developer should add_service = PersistentSessionService(...)before the return statement.Recommendation: Note for the bug-fix PR author — not a concern for this TDD PR.
N3 — PR description mentions
--listenerinintegration_testssession only (LOW — Documentation)The PR description says the listener is registered "in
integration_testssession" but the diff shows it was added to bothintegration_tests(noxfile.py:579–580) andslow_integration_tests(noxfile.py:603–604). The self-review (#2072) notes this was a P1 fix.Recommendation: Update PR description to mention both nox sessions.
Code Quality Assessment
0f629b26, rebased onmastertest(cli):prefix,Refs: #570footer@tdd_bug @tdd_bug_570 @tdd_expected_failtags--format plain@tdd_expected_failinfraenvironment.py:308–327) + Robot listener (tdd_expected_fail_listener.py)integration_testsandslow_integration_tests@tdd @bugNNN→@tdd_bug @tdd_bug_NNN, zero remnants in codebase_servicerestored, container reset, env vars restored, tmpdir removed## Unreleased, references #570# type: ignoresession_id:andtotal:match_session_summary_dict/_session_list_dictoutput keysVerdict: APPROVE
All 10 previously blocking findings from my review #2008 are resolved. The
@tdd_expected_failinversion infrastructure is well-implemented and correctly addresses the CI-blocking concerns raised by hamza.khyari (PROC-1/PROC-2). The tag migration is complete with zero remnants. Test design is sound — Behave scenarios exercise the real DI path via_service = Nonereset, and Robot tests exercise the fullinit→session createsubprocess flow.The two new findings (N2, N3) are LOW severity and informational. N1 (branch name) was waived by PM.
This PR is ready to serve as the TDD regression harness for the #570 bug fix.
0f629b2613c9037a00b5New commits pushed, approval review dismissed automatically according to repository settings
CI Fix: False-positive exit code 1 from behave-parallel runner
Commit:
26b7a0c2Root Cause
The CI failure was not in
repositories_coverage_r2.feature:272(checkpoint prune test) — the garbled parallel output made it look that way. The actual problem was that the@tdd_expected_failinversion inafter_scenariochanged scenario statuses but could not change behave's internal failure tracking.Here's the mechanism:
session_create_error.featurehas 4 scenarios tagged@tdd_expected_failthat are expected to fail (bug #570 is unfixed)Scenario.run()sets its localfailed = Truevariable (line 1202 inbehave/model.py)after_scenariohook fires and flipsscenario.statusfromfailedtopassed✅Scenario.run()returns the localfailedboolean, which is stillTrue— the hook can't modify a local variable in the caller's stack frameFeature.run()incrementsfailed_count,runner.run()returnsTrue_worker_run_featurespropagatedrunner.run()→worker_failed=True→ overallfailed=True→sys.exit(1)Meanwhile,
_extract_summary()readsscenario.status.name(the post-hook value), so the summary correctly showed 0 failures — but the exit code was still 1.Fix
Changed both
_run_features_inprocessand_worker_run_featuresin the embedded_BEHAVE_PARALLEL_CLI_SOURCEto derive thefailedflag from the post-hook summary (via_has_failures(summary)) instead of fromrunner.run(). This ensures the final, post-hook scenario status is authoritative.Verification
nox -e unit_tests: 10,099 scenarios passed, 0 failed — exit code 0 ✅nox -e typecheck: 0 errors ✅nox -e coverage_report: 98% (threshold: 97%) ✅26b7a0c201d0689573e0CI Fix: Intermittent checkpoint test failure under high parallelism
Commit:
4fff7f67Root Cause
The
repositories_coverage_r2.featurecheckpoint scenarios (:249list_by_plan,:272prune) fail intermittently when the behave-parallel runner uses ≥32 worker processes but pass reliably with ≤16.Mechanism — GC-induced implicit ROLLBACK on a shared SQLite connection:
SingletonThreadPool— one connection per thread, shared by all Sessionscreate,prune, etc.) callsself._session_factory()to get a newSession, does work, callsflush(), then returns — the Session goes out of scopeprune()queries the DB and finds 0 rows instead of 5 → returns[]→ assertion failsVerified with instrumented reproduction:
Fix
Replace
sessionmaker(bind=engine)withscoped_session(sessionmaker(bind=engine))in thestep_fresh_dbBackground step (features/steps/repositories_coverage_r2_steps.py:237).scoped_sessionensures everyfactory()call within the same thread returns the same Session instance, so no Session goes out of scope prematurely and no implicit ROLLBACK occurs.Verification
behave-parallel --processes 32: 3 consecutive green runs (10,099 scenarios, 0 failures each)nox -e typecheck: 0 errors ✅nox -e coverage_report: 98% (threshold: 97%) ✅agents session createcauses an error. #570agents session listcauses an error. #554agents session createcauses an error. #570