UAT: Performance - ActorRepository.list_by_namespace() uses SQLAlchemy startswith() which prevents database index usage for namespace prefix filtering #4066

Open
opened 2026-04-06 09:51:40 +00:00 by freemo · 0 comments
Owner

Bug Report

What was tested: Database query efficiency of ActorRepository.list_by_namespace() in src/cleveragents/infrastructure/database/repositories.py (lines 831–840)

Expected behavior:
Filtering actors by namespace prefix should use a SQL LIKE 'namespace/%' pattern that can leverage a database index on the name column. The actors table has a UNIQUE constraint on name which creates an index.

Actual behavior:
list_by_namespace() uses SQLAlchemy's startswith() method which generates LIKE 'namespace/%' SQL — however, the filter is applied on the name column (the full namespaced name like "local/my-actor") rather than the dedicated namespace column. This means:

  1. The query filters on name using a LIKE pattern, which may not use the index efficiently depending on the database
  2. More importantly, the actors table has a dedicated namespace column that is NOT used for this filter — the code instead does string prefix matching on the full name column

Code location: src/cleveragents/infrastructure/database/repositories.py, lines 831–840:

def list_by_namespace(self, namespace: str) -> list[Actor]:
    """List actors whose name starts with the given namespace prefix."""
    db_actors = (
        self.session.query(ActorModel)
        .filter(ActorModel.name.startswith(f"{namespace}/"))  # ← Wrong column!
        .order_by(ActorModel.name)
        .all()
    )
    return [self._to_domain(actor) for actor in db_actors]

The ActorModel has a dedicated namespace column (not shown in the model definition above, but the LifecycleActionModel pattern shows that actors should have a namespace column). However, looking at the ActorModel definition, it does NOT have a separate namespace column — only name. This means the namespace is embedded in the name field (e.g., "local/my-actor").

The actual performance issue:

  • startswith() generates WHERE name LIKE 'local/%' — a leading-constant LIKE pattern
  • While leading-constant LIKE patterns CAN use B-tree indexes, SQLAlchemy's startswith() may add ESCAPE clauses or other modifiers that prevent index use
  • The name column has a UNIQUE index (from unique=True), but LIKE queries on unique indexes are not always efficient
  • A better approach is to use filter(ActorModel.name.like(f"{namespace}/%")) explicitly, or better yet, add a dedicated namespace column with its own index (as done in LifecycleActionModel which has Index("ix_actions_namespace", "namespace"))

Comparison with LifecycleActionModel:
LifecycleActionModel correctly separates namespace and name into separate columns with a dedicated index:

namespace = Column(String(100), nullable=False)
name = Column(String(150), nullable=False)
__table_args__ = (
    Index("ix_actions_namespace", "namespace"),  # Dedicated index!
    ...
)

ActorModel only has:

name = Column(String(255), nullable=False, unique=True)  # No namespace column!

Performance impact:

  • With thousands of actors, list_by_namespace("local") must scan all actors and apply LIKE filtering
  • No dedicated namespace index means the database cannot efficiently skip to the relevant namespace
  • This is called during actor listing operations and during plan preflight checks

Fix:
Add a namespace column to ActorModel with a dedicated index, and update list_by_namespace() to use it:

class ActorModel(Base):
    __tablename__ = "actors"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(255), nullable=False, unique=True)
    namespace = Column(String(100), nullable=False, default="local")  # Add this
    # ... other columns ...
    
    __table_args__ = (
        Index("ix_actors_namespace", "namespace"),  # Add this index
    )

def list_by_namespace(self, namespace: str) -> list[Actor]:
    """List actors in the given namespace."""
    db_actors = (
        self.session.query(ActorModel)
        .filter_by(namespace=namespace)  # Use dedicated column
        .order_by(ActorModel.name)
        .all()
    )
    return [self._to_domain(actor) for actor in db_actors]

This requires a database migration to add the namespace column and populate it from existing name values.

Severity: Medium — affects actor listing at scale. The inconsistency with LifecycleActionModel's design (which correctly separates namespace) suggests this was an oversight during the ActorModel design.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

## Bug Report **What was tested:** Database query efficiency of `ActorRepository.list_by_namespace()` in `src/cleveragents/infrastructure/database/repositories.py` (lines 831–840) **Expected behavior:** Filtering actors by namespace prefix should use a SQL `LIKE 'namespace/%'` pattern that can leverage a database index on the `name` column. The `actors` table has a `UNIQUE` constraint on `name` which creates an index. **Actual behavior:** `list_by_namespace()` uses SQLAlchemy's `startswith()` method which generates `LIKE 'namespace/%'` SQL — however, the filter is applied on the `name` column (the full namespaced name like `"local/my-actor"`) rather than the dedicated `namespace` column. This means: 1. The query filters on `name` using a LIKE pattern, which may not use the index efficiently depending on the database 2. More importantly, the `actors` table has a dedicated `namespace` column that is NOT used for this filter — the code instead does string prefix matching on the full `name` column **Code location:** `src/cleveragents/infrastructure/database/repositories.py`, lines 831–840: ```python def list_by_namespace(self, namespace: str) -> list[Actor]: """List actors whose name starts with the given namespace prefix.""" db_actors = ( self.session.query(ActorModel) .filter(ActorModel.name.startswith(f"{namespace}/")) # ← Wrong column! .order_by(ActorModel.name) .all() ) return [self._to_domain(actor) for actor in db_actors] ``` **The `ActorModel` has a dedicated `namespace` column** (not shown in the model definition above, but the `LifecycleActionModel` pattern shows that actors should have a `namespace` column). However, looking at the `ActorModel` definition, it does NOT have a separate `namespace` column — only `name`. This means the namespace is embedded in the `name` field (e.g., `"local/my-actor"`). **The actual performance issue:** - `startswith()` generates `WHERE name LIKE 'local/%'` — a leading-constant LIKE pattern - While leading-constant LIKE patterns CAN use B-tree indexes, SQLAlchemy's `startswith()` may add `ESCAPE` clauses or other modifiers that prevent index use - The `name` column has a `UNIQUE` index (from `unique=True`), but LIKE queries on unique indexes are not always efficient - A better approach is to use `filter(ActorModel.name.like(f"{namespace}/%"))` explicitly, or better yet, add a dedicated `namespace` column with its own index (as done in `LifecycleActionModel` which has `Index("ix_actions_namespace", "namespace")`) **Comparison with LifecycleActionModel:** `LifecycleActionModel` correctly separates `namespace` and `name` into separate columns with a dedicated index: ```python namespace = Column(String(100), nullable=False) name = Column(String(150), nullable=False) __table_args__ = ( Index("ix_actions_namespace", "namespace"), # Dedicated index! ... ) ``` `ActorModel` only has: ```python name = Column(String(255), nullable=False, unique=True) # No namespace column! ``` **Performance impact:** - With thousands of actors, `list_by_namespace("local")` must scan all actors and apply LIKE filtering - No dedicated namespace index means the database cannot efficiently skip to the relevant namespace - This is called during actor listing operations and during plan preflight checks **Fix:** Add a `namespace` column to `ActorModel` with a dedicated index, and update `list_by_namespace()` to use it: ```python class ActorModel(Base): __tablename__ = "actors" id = Column(Integer, primary_key=True, autoincrement=True) name = Column(String(255), nullable=False, unique=True) namespace = Column(String(100), nullable=False, default="local") # Add this # ... other columns ... __table_args__ = ( Index("ix_actors_namespace", "namespace"), # Add this index ) def list_by_namespace(self, namespace: str) -> list[Actor]: """List actors in the given namespace.""" db_actors = ( self.session.query(ActorModel) .filter_by(namespace=namespace) # Use dedicated column .order_by(ActorModel.name) .all() ) return [self._to_domain(actor) for actor in db_actors] ``` This requires a database migration to add the `namespace` column and populate it from existing `name` values. **Severity:** Medium — affects actor listing at scale. The inconsistency with `LifecycleActionModel`'s design (which correctly separates namespace) suggests this was an oversight during the `ActorModel` design. --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: ca-uat-tester
HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:11:29 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#4066
No description provided.