feat(service): add decision recording and snapshot store #433
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Blocks
#172 feat(service): add decision recording and snapshot store
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core!433
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feature/m4-decision-service"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Implements
DecisionServicewith record/list/tree helpers andSnapshotStorefor Forgejo issue #172.What
DecisionService— dual-mode (in-memory / DB-backed) service with:record_decision()with auto-sequencing, auto-snapshot capture, string-to-enum coercionget_decision(),list_decisions(),list_by_type()retrievalget_tree()(BFS),get_path_to_root()tree navigationmark_superseded(),get_superseded()supersession supportdelete_decision()with snapshot cleanupcount_decisions(),get_next_sequence()statisticsSnapshotStore— hash-based deduplication index forContextSnapshotobjectsDuplicateDecisionError,DecisionNotFoundError,SequenceConflictErrorTests
features/decision_recording.feature(all passing)robot/decision_recording.robotwith helper scriptDocs
docs/reference/decision_service.mdcovering all public API, dual-mode persistence, and exceptionsNote on cherry-picked commits
This branch includes 2 cherry-picked commits from PR #131 (decision persistence) to provide
DecisionRepositoryfor the UoW code paths. When #131 merges to master, a rebase will make these cherry-picks disappear cleanly.Closes #172
Dependencies
cd5ad396420f4b8a967fCode Review — PR #433 / Issue #172:
feat(service): add decision recording and snapshot storeBranch:
feature/m4-decision-serviceReviewed commit:
cd641a8(last by Hamza, 7-commit chain from5539fd0..cd641a8)Scope: 15 files changed, ~2600 insertions, ~160 deletions
Review focus: bugs, security, performance, test coverage/quality, spec conformance
1. Issue #172 Acceptance Criteria Compliance
DecisionServicewith record/list/tree helpers + snapshot storagedecision_service.pydocs/reference/decision_service.mdfeatures/decision_recording.featurerobot/decision_recording.robotbenchmarks/decision_recording_bench.pynox -s coverage_reportnox(all sessions), fix any errors2. Bugs
BUG-1 [HIGH] — Sequence numbers not rehydrated from DB on restart
File:
decision_service.py:203, 622-626_plan_sequenceis an in-memorydictinitialized to{}. In persisted mode, if the service restarts against an existing database with decisions already recorded,_next_sequence()starts from 0 again, causingsequence_numbercollisions. The spec (line 18398, 18470) requiressequence_numberto be monotonically increasing per plan.BUG-2 [HIGH] —
SequenceConflictErrordefined but never raisedFile:
decision_service.py:74-82This exception is defined, exported in
__all__, and whitelisted in vulture — but never actually raised anywhere. If BUG-1 causes a collision, there is no guard to catch it. The error exists as dead code suggesting uniqueness enforcement was intended but never implemented.BUG-3 [MEDIUM] —
delete_decisioninconsistent between modesFile:
decision_service.py:537-560In persisted mode, if the underlying repository's
delete()silently succeeds for a non-existent ID (does not raise), the method returnsTruewithout error. In in-memory mode, it correctly raisesDecisionNotFoundError. The behavior diverges depending on the backend.BUG-4 [MEDIUM] —
mark_supersededdoes not validatenew_decision_idexistsFile:
decision_service.py:483-523The spec (line 18196) states "the original and all its descendants are marked
superseded_by = <new_decision_id>." The service only marks the single targeted decision but never verifies the replacement decision actually exists — this can create dangling references.BUG-5 [MEDIUM] —
get_treesilently drops orphaned subtreesFile:
decision_service.py:383-425If a decision's
parent_decision_idreferences a deleted parent, BFS from roots never reaches it. Theif not rootsfallback only triggers when there are zero roots. Orphaned subtrees are silently omitted from the returned tree, violating the spec's "Reachability" invariant (spec line 18203).BUG-6 [LOW] —
SnapshotStoreis purely in-memory, lost on restartFile:
decision_service.py:90-164Snapshots stored in
SnapshotStore(the hash-indexed secondary store) are never persisted to the database. After a service restart,get_snapshot()andget_snapshots_for_plan()returnNone/empty even though the decisions (with embedded snapshots) still exist in the DB. The spec (line 18421-18425) describes the context snapshot as a replay-critical artifact.BUG-7 [LOW] — Dead code in
PlanLifecycleServiceFile:
plan_lifecycle_service.py:177-188_decision_seqdict and_next_seq()method are never called anywhere in the service._try_record_decisiondelegates sequence numbering toDecisionService.record_decision(). These are dead code, masked by vulture whitelist entries atvulture_whitelist.py:339-340.3. Security Issues
SEC-1 [MEDIUM] — No size limits on
actor_reasoningfieldFile:
decision_service.py:249Raw LLM reasoning traces can be arbitrarily large. No bounds are enforced before storage or logging. This creates risk for log injection and storage exhaustion attacks.
SEC-2 [LOW] — Truncated SHA-256 hash with misleading prefix
File:
decision_service.py:672_auto_capture_snapshottruncates the SHA-256 to 16 hex chars (64 bits) but prefixes it withsha256:, implying a full cryptographic hash. If any downstream code relies on this for integrity verification, the collision resistance is far lower than expected.SEC-3 [LOW] —
SnapshotStore._hash_indexgrows without boundFile:
decision_service.py:99No eviction policy. A caller generating many unique snapshots causes unbounded memory growth.
SEC-4 [LOW] —
tempfile.mktempusage in test step fileFile:
features/steps/decision_recording_steps.py:966tempfile.mktempis deprecated due to a TOCTOU race condition. This is test-only but is a pattern repeated across 17 places in the test suite.4. Performance Issues
PERF-1 [MEDIUM] —
count_decisionsloads all objects just to count themFile:
decision_service.py:593-602count_decisions(plan_id)callslist_decisions(plan_id)which deserializes every decision from the DB, builds a Python list, then takeslen(). Should delegate toCOUNT(*)in persisted mode.PERF-2 [MEDIUM] — No pagination on any list method
Files:
decision_service.py—list_decisions,list_by_type,get_tree,get_superseded,get_snapshots_for_planNone of these support
limit/offset. For plans with hundreds or thousands of decisions, this returns unbounded result sets.PERF-3 [LOW] —
get_treeandget_supersededload full decision setFile:
decision_service.py:395, 476Both call
list_decisions()to load the entire plan's decisions into memory, then perform in-memory filtering/traversal.PERF-4 [LOW] —
SnapshotStore._hash_indexuseslistinstead ofsetFile:
decision_service.py:99, 145list.remove(decision_id)is O(n) for each removal. Using asetwould be O(1).5. Test Coverage and Quality Issues
TEST-1 [HIGH] — BFS order not actually verified in tree tests
Files:
features/decision_recording.feature:103,robot/helper_decision_recording.py:99The "Get tree returns BFS order from root" scenario asserts count and root-first, but never verifies actual BFS ordering. The level-order property (all depth-1 children before depth-2 grandchildren) is not checked. A broken BFS that returns DFS order would pass.
TEST-2 [HIGH] — No test for sequence rehydration after restart
Files: All test files
No test creates decisions, destroys the service, recreates it against the same DB, and verifies that sequence numbers continue from where they left off. This is exactly the scenario where BUG-1 manifests.
TEST-3 [HIGH] —
MagicMock()withoutspec=hides bugsFiles:
decision_di_wiring_steps.py:111,helper_decision_di.py:87,decision_di_bench.py:153All use
MagicMock()forsettingswithoutspec=Settings. This silently creates auto-attributes on access, meaning code that readssettings.nonexistent_propertygets a silent mock object instead of failing. Should usecreate_autospec(Settings).TEST-4 [MEDIUM] —
UnitOfWork.__new__()anti-pattern in 4 placesFiles:
decision_di_wiring_steps.py:82,helper_decision_di.py:37,decision_di_bench.py:67,decision_recording_bench.pyAll bypass
UnitOfWork.__init__()to directly set private attributes. If__init__changes, all 4 break silently. Should be a shared fixture factory.TEST-5 [MEDIUM] — DI wiring assertions are too weak (
>= 1)Files:
decision_di_wiring_steps.py:145, 194len(strategy_decisions) >= 1passes even if 100 spurious decisions are recorded. Should assert the exact expected count.TEST-6 [MEDIUM] — No test for invalid
decision_typestringNo scenario passes an invalid string like
"garbage"torecord_decision(decision_type=...)and asserts aValueErroris raised.TEST-7 [MEDIUM] —
confidence_scoreboundary values never testedTests only use
0.85. Never exercises0.0,1.0, negative, or>1.0. The domain model hasge=0.0, le=1.0validators — untested at boundaries.TEST-8 [LOW] —
get_decisionreturn not validated beyonddecision_idFile:
decision_recording_steps.py:652step_match_recordedonly comparesdecision_id, notquestion,chosen_option,decision_type, or any other field. A buggyget_decisionreturning a corrupted object would pass.TEST-9 [LOW] — Robot tests are strict subset of Behave tests
File:
robot/decision_recording.robot7 Robot test cases duplicate existing Behave scenarios with no unique coverage. No error-path, persisted-mode, or negative tests in Robot.
TEST-10 [LOW] — No benchmarks for persisted (DB-backed) mode
Files:
benchmarks/decision_recording_bench.py,benchmarks/decision_di_bench.pyAll benchmarks use in-memory
DecisionService(). The DB-backed path (with SQLAlchemy overhead, transactions, and serialization) has fundamentally different performance characteristics and is not benchmarked.6. Specification Conformance Issues
SPEC-1 [MEDIUM] — Missing uniqueness enforcement per spec schema
The spec (line 18470) defines
sequence_number INTEGER NOT NULLin the decisions table. The existence ofSequenceConflictErrorin the codebase implies uniqueness enforcement was planned. It is not implemented, and BUG-1 makes collisions possible.SPEC-2 [LOW] —
__init__.pymissingSequenceConflictErrorexportFile:
src/cleveragents/application/services/__init__.pySequenceConflictErroris indecision_service.__all__but not re-exported from the services package__init__.py. Also,PlanLifecycleServiceand its exceptions are absent from__init__.pydespite being a peer service.SPEC-3 [LOW] — Doc claim that retrieval methods raise
DecisionNotFoundErrorFile:
docs/reference/decision_service.md:75States "All raise
DecisionNotFoundErrorwhen the target decision does not exist (where applicable)." However,list_decisionsandlist_by_typereturn empty lists, not exceptions.SPEC-4 [LOW] —
artifacts_producedtyped aslist[Any]vs spec'slist[ArtifactRef]File:
decision_service.py:230The
record_decisionmethod acceptslist[Any]forartifacts_produced, but the spec (line 18442) and domain model both define it aslist[ArtifactRef]. The loose typing defeats compile-time checks.7. Last Commit Review (
cd641a8)The final commit ("fix(service): align test and doc refs with DecisionService API") is a clean-up commit that fixes stale references to old method names (
get_decisions_for_plan->list_decisions,get_decision_tree->get_tree) across 4 files. The diff is correct and no stale references remain.One note: the commit message contains
ISSUES CLOSED: #172, which will auto-close the issue on merge, but the issue's acceptance criteria (coverage >= 97%, fullnoxpass) are not confirmed in the commit chain.Summary — Priority Ranking
_plan_sequencenot rehydrated from DB — sequence collisions after restartSequenceConflictErrordead code — no uniqueness guard existsMagicMock()withoutspec=silently hides bugs in 3 filesdelete_decisionbehavior diverges between persisted/in-memory modesmark_supersededdoesn't validate replacement decision existsget_treesilently drops orphaned decisionsactor_reasoning(log injection / storage exhaustion)count_decisionsloads all objects just to countUnitOfWork.__new__()anti-pattern in 4 places>= 1instead of exact countsdecision_typestring never testedconfidence_scoreboundaries never testedSnapshotStorelost on restart (not persisted)_decision_seq/_next_seqinPlanLifecycleServicetempfile.mktempRecommendation: The HIGH-severity items (BUG-1, BUG-2, TEST-1, TEST-2, TEST-3) should be addressed before merge. BUG-1 in particular is a data integrity risk in any deployment that restarts the service, and the lack of a test for it (TEST-2) means it would go undetected until production.
cd641a89312a9606cb722a9606cb72c2f8bc0eafNew commits pushed, approval review dismissed automatically according to repository settings
c2f8bc0eaf33a9d1e7d833a9d1e7d80e36755db9