UAT: Performance - PlanLifecycleService.list_plans() fetches ALL plans from DB then filters in Python, bypassing server-side filtering #4064

Open
opened 2026-04-06 09:50:38 +00:00 by freemo · 0 comments
Owner

Bug Report

What was tested: Query efficiency of PlanLifecycleService.list_plans() in src/cleveragents/application/services/plan_lifecycle_service.py (lines 1249–1295)

Expected behavior:
list_plans() with filter parameters (namespace, phase, project_name) should push those filters down to the database query, returning only matching rows. The LifecyclePlanRepository.list_plans() method already supports server-side filtering with these parameters.

Actual behavior:
PlanLifecycleService.list_plans() calls ctx.lifecycle_plans.list_all() (which fetches ALL plans with no filters), then applies namespace/phase/project_name filtering in Python after loading all rows into memory.

Code location: src/cleveragents/application/services/plan_lifecycle_service.py, lines 1249–1295:

def list_plans(
    self,
    namespace: str | None = None,
    phase: PlanPhase | None = None,
    project_name: str | None = None,
) -> list[Plan]:
    if self._persisted and self.unit_of_work is not None:
        try:
            with self.unit_of_work.transaction() as ctx:
                plans = ctx.lifecycle_plans.list_all()  # ← Fetches ALL plans, no filters!
            # Refresh the in-memory cache with persisted plans.
            for p in plans:
                pid = str(p.identity.plan_id)
                if pid not in self._plans:
                    self._plans[pid] = p
        except DatabaseError:
            plans = list(self._plans.values())
    else:
        plans = list(self._plans.values())

    # Filtering happens in Python AFTER loading all rows
    if namespace:
        plans = [p for p in plans if p.namespaced_name.namespace == namespace]

    if phase:
        plans = [p for p in plans if p.phase == phase]

    if project_name:
        plans = [
            p
            for p in plans
            if any(link.project_name == project_name for link in p.project_links)
        ]

    return plans

The repository already supports server-side filtering:
LifecyclePlanRepository.list_plans() (lines 1543–1591) accepts phase, processing_state, action_name, project_name, namespace, limit, and offset parameters and pushes them to the SQL query. However, the service layer never calls this method — it always calls list_all().

Performance impact:

  • With 10,000 plans in the database, list_plans(namespace="local") loads all 10,000 plans into memory, deserializes them all (including JSON parsing of tags, arguments, invariants, etc.), then discards 9,900 of them
  • Each plan deserialization involves multiple json.loads() calls (tags, arguments, invariants, sandbox_refs, automation_profile, error_details, etc.)
  • The in-memory cache grows unboundedly — every list_plans() call populates self._plans with ALL plans, creating a memory leak in long-running server processes
  • The project_name filter uses any(link.project_name == project_name for link in p.project_links) — this requires loading all project_links relationships for every plan, causing additional N+1 lazy-load queries if relationships aren't eagerly loaded

Steps to reproduce:

  1. Create 1000+ plans across multiple namespaces
  2. Call list_plans(namespace="local")
  3. Observe that all plans are loaded from DB regardless of namespace filter

Fix:
Use the repository's list_plans() method with server-side filters instead of list_all():

def list_plans(
    self,
    namespace: str | None = None,
    phase: PlanPhase | None = None,
    project_name: str | None = None,
) -> list[Plan]:
    if self._persisted and self.unit_of_work is not None:
        try:
            with self.unit_of_work.transaction() as ctx:
                # Use server-side filtering instead of list_all()
                plans = ctx.lifecycle_plans.list_plans(
                    namespace=namespace,
                    phase=phase.value if phase else None,
                    project_name=project_name,
                )
            return plans
        except DatabaseError:
            pass  # Fall through to in-memory cache

    # In-memory fallback
    plans = list(self._plans.values())
    if namespace:
        plans = [p for p in plans if p.namespaced_name.namespace == namespace]
    if phase:
        plans = [p for p in plans if p.phase == phase]
    if project_name:
        plans = [
            p for p in plans
            if any(link.project_name == project_name for link in p.project_links)
        ]
    return plans

Severity: High — this is a full-table scan on every list_plans() call with any filter. At scale (thousands of plans), this will cause severe memory pressure and slow response times for all plan listing operations including the CLI agents plan list command.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

## Bug Report **What was tested:** Query efficiency of `PlanLifecycleService.list_plans()` in `src/cleveragents/application/services/plan_lifecycle_service.py` (lines 1249–1295) **Expected behavior:** `list_plans()` with filter parameters (`namespace`, `phase`, `project_name`) should push those filters down to the database query, returning only matching rows. The `LifecyclePlanRepository.list_plans()` method already supports server-side filtering with these parameters. **Actual behavior:** `PlanLifecycleService.list_plans()` calls `ctx.lifecycle_plans.list_all()` (which fetches ALL plans with no filters), then applies namespace/phase/project_name filtering in Python after loading all rows into memory. **Code location:** `src/cleveragents/application/services/plan_lifecycle_service.py`, lines 1249–1295: ```python def list_plans( self, namespace: str | None = None, phase: PlanPhase | None = None, project_name: str | None = None, ) -> list[Plan]: if self._persisted and self.unit_of_work is not None: try: with self.unit_of_work.transaction() as ctx: plans = ctx.lifecycle_plans.list_all() # ← Fetches ALL plans, no filters! # Refresh the in-memory cache with persisted plans. for p in plans: pid = str(p.identity.plan_id) if pid not in self._plans: self._plans[pid] = p except DatabaseError: plans = list(self._plans.values()) else: plans = list(self._plans.values()) # Filtering happens in Python AFTER loading all rows if namespace: plans = [p for p in plans if p.namespaced_name.namespace == namespace] if phase: plans = [p for p in plans if p.phase == phase] if project_name: plans = [ p for p in plans if any(link.project_name == project_name for link in p.project_links) ] return plans ``` **The repository already supports server-side filtering:** `LifecyclePlanRepository.list_plans()` (lines 1543–1591) accepts `phase`, `processing_state`, `action_name`, `project_name`, `namespace`, `limit`, and `offset` parameters and pushes them to the SQL query. However, the service layer never calls this method — it always calls `list_all()`. **Performance impact:** - With 10,000 plans in the database, `list_plans(namespace="local")` loads all 10,000 plans into memory, deserializes them all (including JSON parsing of tags, arguments, invariants, etc.), then discards 9,900 of them - Each plan deserialization involves multiple `json.loads()` calls (tags, arguments, invariants, sandbox_refs, automation_profile, error_details, etc.) - The in-memory cache grows unboundedly — every `list_plans()` call populates `self._plans` with ALL plans, creating a memory leak in long-running server processes - The `project_name` filter uses `any(link.project_name == project_name for link in p.project_links)` — this requires loading all `project_links` relationships for every plan, causing additional N+1 lazy-load queries if relationships aren't eagerly loaded **Steps to reproduce:** 1. Create 1000+ plans across multiple namespaces 2. Call `list_plans(namespace="local")` 3. Observe that all plans are loaded from DB regardless of namespace filter **Fix:** Use the repository's `list_plans()` method with server-side filters instead of `list_all()`: ```python def list_plans( self, namespace: str | None = None, phase: PlanPhase | None = None, project_name: str | None = None, ) -> list[Plan]: if self._persisted and self.unit_of_work is not None: try: with self.unit_of_work.transaction() as ctx: # Use server-side filtering instead of list_all() plans = ctx.lifecycle_plans.list_plans( namespace=namespace, phase=phase.value if phase else None, project_name=project_name, ) return plans except DatabaseError: pass # Fall through to in-memory cache # In-memory fallback plans = list(self._plans.values()) if namespace: plans = [p for p in plans if p.namespaced_name.namespace == namespace] if phase: plans = [p for p in plans if p.phase == phase] if project_name: plans = [ p for p in plans if any(link.project_name == project_name for link in p.project_links) ] return plans ``` **Severity:** High — this is a full-table scan on every `list_plans()` call with any filter. At scale (thousands of plans), this will cause severe memory pressure and slow response times for all plan listing operations including the CLI `agents plan list` command. --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: ca-uat-tester
HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:11:30 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#4064
No description provided.