UAT: Performance - BFS graph traversal uses list.pop(0) O(n) instead of deque.popleft() O(1) in ResourceRepository #4062

Open
opened 2026-04-06 09:49:19 +00:00 by freemo · 0 comments
Owner

Bug Report

What was tested: Performance of resource graph traversal in ResourceRepository._get_ancestors() and ResourceRepository._build_cycle_path() in src/cleveragents/infrastructure/database/repositories.py

Expected behavior:
BFS (Breadth-First Search) traversal should use collections.deque with popleft() for O(1) front-of-queue removal, as is standard practice for BFS algorithms.

Actual behavior:
Both _get_ancestors() (line 2701) and _build_cycle_path() (line 2724) implement BFS using a plain Python list with queue.pop(0). This is an O(n) operation per dequeue because Python lists must shift all remaining elements left after removing from index 0.

Code locations:

_get_ancestors() at line 2709–2721:

queue: list[str] = [resource_id]
while queue:
    current = queue.pop(0)  # O(n) — should be deque.popleft() O(1)
    ...
    queue.append(pid)

_build_cycle_path() at line 2737–2751:

queue: list[str] = [parent_id]
while queue and not found:
    current = queue.pop(0)  # O(n) — should be deque.popleft() O(1)
    ...
    queue.append(pid)

Performance impact:

  • For a resource graph with N nodes, the total cost of BFS is O(N²) instead of O(N)
  • _get_ancestors() is called on every add_child() call for cycle detection — this is on the hot path for resource graph construction
  • _build_cycle_path() is called when a cycle is detected to generate error messages
  • At scale (e.g., 1000+ resources in a DAG), this degrades from ~1000 operations to ~1,000,000 operations for a single cycle check
  • The deque import is already present at line 62 (from collections import deque) but is only used in DecisionRepository.get_tree(), not in these BFS methods

Steps to reproduce:

  1. Create a resource DAG with 500+ nodes
  2. Call add_child() repeatedly — each call triggers _get_ancestors() with O(n²) BFS
  3. Observe quadratic slowdown as the graph grows

Fix:
Replace list with deque in both methods:

from collections import deque

def _get_ancestors(session: Session, resource_id: str) -> set[str]:
    visited: set[str] = set()
    queue: deque[str] = deque([resource_id])  # Use deque
    while queue:
        current = queue.popleft()  # O(1) instead of O(n)
        if current in visited:
            continue
        visited.add(current)
        parent_links = (
            session.query(ResourceLinkModel).filter_by(child_id=current).all()
        )
        for link in parent_links:
            pid = cast(str, link.parent_id)
            if pid not in visited:
                queue.append(pid)
    return visited

Same fix applies to _build_cycle_path().

Severity: Medium — affects all resource graph operations at scale. The deque is already imported in the file, making this a trivial fix.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

## Bug Report **What was tested:** Performance of resource graph traversal in `ResourceRepository._get_ancestors()` and `ResourceRepository._build_cycle_path()` in `src/cleveragents/infrastructure/database/repositories.py` **Expected behavior:** BFS (Breadth-First Search) traversal should use `collections.deque` with `popleft()` for O(1) front-of-queue removal, as is standard practice for BFS algorithms. **Actual behavior:** Both `_get_ancestors()` (line 2701) and `_build_cycle_path()` (line 2724) implement BFS using a plain Python `list` with `queue.pop(0)`. This is an O(n) operation per dequeue because Python lists must shift all remaining elements left after removing from index 0. **Code locations:** `_get_ancestors()` at line 2709–2721: ```python queue: list[str] = [resource_id] while queue: current = queue.pop(0) # O(n) — should be deque.popleft() O(1) ... queue.append(pid) ``` `_build_cycle_path()` at line 2737–2751: ```python queue: list[str] = [parent_id] while queue and not found: current = queue.pop(0) # O(n) — should be deque.popleft() O(1) ... queue.append(pid) ``` **Performance impact:** - For a resource graph with N nodes, the total cost of BFS is O(N²) instead of O(N) - `_get_ancestors()` is called on every `add_child()` call for cycle detection — this is on the hot path for resource graph construction - `_build_cycle_path()` is called when a cycle is detected to generate error messages - At scale (e.g., 1000+ resources in a DAG), this degrades from ~1000 operations to ~1,000,000 operations for a single cycle check - The `deque` import is already present at line 62 (`from collections import deque`) but is only used in `DecisionRepository.get_tree()`, not in these BFS methods **Steps to reproduce:** 1. Create a resource DAG with 500+ nodes 2. Call `add_child()` repeatedly — each call triggers `_get_ancestors()` with O(n²) BFS 3. Observe quadratic slowdown as the graph grows **Fix:** Replace `list` with `deque` in both methods: ```python from collections import deque def _get_ancestors(session: Session, resource_id: str) -> set[str]: visited: set[str] = set() queue: deque[str] = deque([resource_id]) # Use deque while queue: current = queue.popleft() # O(1) instead of O(n) if current in visited: continue visited.add(current) parent_links = ( session.query(ResourceLinkModel).filter_by(child_id=current).all() ) for link in parent_links: pid = cast(str, link.parent_id) if pid not in visited: queue.append(pid) return visited ``` Same fix applies to `_build_cycle_path()`. **Severity:** Medium — affects all resource graph operations at scale. The `deque` is already imported in the file, making this a trivial fix. --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: ca-uat-tester
HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:11:31 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#4062
No description provided.