UAT: Performance - N+1 query pattern in DecisionRepository.get_tree() causes O(N) database round-trips for decision trees #4063

Open
opened 2026-04-06 09:50:05 +00:00 by freemo · 0 comments
Owner

Bug Report

What was tested: Database query efficiency of DecisionRepository.get_tree() in src/cleveragents/infrastructure/database/repositories.py (lines 5238–5287)

Expected behavior:
Fetching a decision tree should use a single SQL query (or at most a small constant number of queries) to retrieve all nodes, regardless of tree depth or breadth.

Actual behavior:
get_tree() implements a BFS traversal that issues one database query per node in the decision tree. For a tree with N nodes, this results in N+1 database round-trips (1 for the root + 1 per parent node to fetch its children).

Code location: src/cleveragents/infrastructure/database/repositories.py, lines 5238–5287:

def get_tree(self, root_decision_id: str) -> list[Decision]:
    session = self._session()
    # Fetch root — 1 query
    root_row = session.query(DecisionModel).filter_by(decision_id=root_decision_id).first()
    
    result: list[Decision] = [root_row.to_domain()]
    bfs_queue: deque[str] = deque([root_decision_id])
    while bfs_queue:
        parent_id = bfs_queue.popleft()
        # ONE QUERY PER NODE — N+1 pattern
        children = (
            session.query(DecisionModel)
            .filter_by(parent_decision_id=parent_id)
            .order_by(DecisionModel.sequence_number)
            .all()
        )
        for child in children:
            result.append(child.to_domain())
            bfs_queue.append(cast(str, child.decision_id))
    return result

Performance impact:

  • A decision tree with 50 nodes causes 51 database round-trips
  • A decision tree with 500 nodes causes 501 database round-trips
  • Each round-trip has network latency overhead (even for local SQLite, there is syscall overhead)
  • get_tree() is called during plan phase transitions and invariant reconciliation — it is on the critical path for plan execution
  • Decision trees grow with plan complexity; deeply nested correction/decision trees are common in long-running plans

Steps to reproduce:

  1. Create a plan with a complex decision tree (50+ nodes)
  2. Call get_tree(root_decision_id)
  3. Enable SQLAlchemy query logging and observe N+1 queries

Fix options:

Option 1 (Recommended): Single recursive CTE query (PostgreSQL/SQLite 3.35+)

WITH RECURSIVE tree AS (
    SELECT * FROM decisions WHERE decision_id = :root_id
    UNION ALL
    SELECT d.* FROM decisions d
    JOIN tree t ON d.parent_decision_id = t.decision_id
)
SELECT * FROM tree ORDER BY sequence_number;

Option 2: Fetch all descendants in one query, then reconstruct tree in Python

def get_tree(self, root_decision_id: str) -> list[Decision]:
    session = self._session()
    # Fetch root first to validate it exists
    root_row = session.query(DecisionModel).filter_by(decision_id=root_decision_id).first()
    if root_row is None:
        raise DecisionNotFoundError(root_decision_id)
    
    # Fetch ALL decisions for this plan in ONE query
    plan_id = root_row.plan_id
    all_rows = (
        session.query(DecisionModel)
        .filter_by(plan_id=plan_id)
        .order_by(DecisionModel.sequence_number)
        .all()
    )
    
    # Reconstruct tree order in Python (O(N) with dict lookup)
    by_id = {row.decision_id: row for row in all_rows}
    children_map: dict[str | None, list] = {}
    for row in all_rows:
        parent = row.parent_decision_id
        children_map.setdefault(parent, []).append(row)
    
    # BFS using in-memory data (no more DB queries)
    result = []
    queue = deque([root_decision_id])
    while queue:
        current_id = queue.popleft()
        if current_id in by_id:
            result.append(by_id[current_id].to_domain())
            for child in children_map.get(current_id, []):
                queue.append(child.decision_id)
    return result

Severity: High — this is a classic N+1 query anti-pattern on a hot path. Decision trees are central to the plan lifecycle and invariant reconciliation system.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

## Bug Report **What was tested:** Database query efficiency of `DecisionRepository.get_tree()` in `src/cleveragents/infrastructure/database/repositories.py` (lines 5238–5287) **Expected behavior:** Fetching a decision tree should use a single SQL query (or at most a small constant number of queries) to retrieve all nodes, regardless of tree depth or breadth. **Actual behavior:** `get_tree()` implements a BFS traversal that issues **one database query per node** in the decision tree. For a tree with N nodes, this results in N+1 database round-trips (1 for the root + 1 per parent node to fetch its children). **Code location:** `src/cleveragents/infrastructure/database/repositories.py`, lines 5238–5287: ```python def get_tree(self, root_decision_id: str) -> list[Decision]: session = self._session() # Fetch root — 1 query root_row = session.query(DecisionModel).filter_by(decision_id=root_decision_id).first() result: list[Decision] = [root_row.to_domain()] bfs_queue: deque[str] = deque([root_decision_id]) while bfs_queue: parent_id = bfs_queue.popleft() # ONE QUERY PER NODE — N+1 pattern children = ( session.query(DecisionModel) .filter_by(parent_decision_id=parent_id) .order_by(DecisionModel.sequence_number) .all() ) for child in children: result.append(child.to_domain()) bfs_queue.append(cast(str, child.decision_id)) return result ``` **Performance impact:** - A decision tree with 50 nodes causes 51 database round-trips - A decision tree with 500 nodes causes 501 database round-trips - Each round-trip has network latency overhead (even for local SQLite, there is syscall overhead) - `get_tree()` is called during plan phase transitions and invariant reconciliation — it is on the critical path for plan execution - Decision trees grow with plan complexity; deeply nested correction/decision trees are common in long-running plans **Steps to reproduce:** 1. Create a plan with a complex decision tree (50+ nodes) 2. Call `get_tree(root_decision_id)` 3. Enable SQLAlchemy query logging and observe N+1 queries **Fix options:** **Option 1 (Recommended): Single recursive CTE query (PostgreSQL/SQLite 3.35+)** ```sql WITH RECURSIVE tree AS ( SELECT * FROM decisions WHERE decision_id = :root_id UNION ALL SELECT d.* FROM decisions d JOIN tree t ON d.parent_decision_id = t.decision_id ) SELECT * FROM tree ORDER BY sequence_number; ``` **Option 2: Fetch all descendants in one query, then reconstruct tree in Python** ```python def get_tree(self, root_decision_id: str) -> list[Decision]: session = self._session() # Fetch root first to validate it exists root_row = session.query(DecisionModel).filter_by(decision_id=root_decision_id).first() if root_row is None: raise DecisionNotFoundError(root_decision_id) # Fetch ALL decisions for this plan in ONE query plan_id = root_row.plan_id all_rows = ( session.query(DecisionModel) .filter_by(plan_id=plan_id) .order_by(DecisionModel.sequence_number) .all() ) # Reconstruct tree order in Python (O(N) with dict lookup) by_id = {row.decision_id: row for row in all_rows} children_map: dict[str | None, list] = {} for row in all_rows: parent = row.parent_decision_id children_map.setdefault(parent, []).append(row) # BFS using in-memory data (no more DB queries) result = [] queue = deque([root_decision_id]) while queue: current_id = queue.popleft() if current_id in by_id: result.append(by_id[current_id].to_domain()) for child in children_map.get(current_id, []): queue.append(child.decision_id) return result ``` **Severity:** High — this is a classic N+1 query anti-pattern on a hot path. Decision trees are central to the plan lifecycle and invariant reconciliation system. --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: ca-uat-tester
HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:11:30 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#4063
No description provided.