Proposal: update specification — document context tier hydration from linked project resources #7365

Closed
opened 2026-04-10 18:11:26 +00:00 by HAL9000 · 0 comments

Spec Update Proposal

Triggered by: Merged PR #4219 (fix(acms): wire ACMS indexing pipeline into CLI so ContextTierService is populated during context operations)

Merged at: 2026-04-09


What Changed in the Implementation

PR #4219 introduced a new module src/cleveragents/application/services/context_tier_hydrator.py that bridges the gap between the resource registry (files on disk) and the ACMS context tier (in-memory fragments). Without this, ContextTierService started empty on every CLI process invocation and the LLM received zero file context during plan execution.

Key implementation details:

  • hydrate_tiers_from_project(): Reads files from a resource directory and stores them as TieredFragment objects in the ContextTierService
  • hydrate_tiers_for_plan(): Hydrates tiers for all projects linked to a plan, called automatically before context assembly in LLMExecuteActor.execute()
  • File listing strategy: Uses git ls-files for git-checkout resources and os.walk for all other resource types
  • Budget constraints: Max file size 256 KB, max total 10 MB per project
  • Skip rules: Skips binary extensions, hidden directories, .git, node_modules, __pycache__, etc.
  • Metadata: Fragments stored with detail_depth: 1, relevance_score: 0.5, tier: HOT
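The file listing strategy above can be sketched roughly as follows. This is a minimal illustration, not the code from PR #4219: the function names, the SKIP_DIRS set, and the fallback to a directory walk when git fails are all assumptions made for the example. Only the git ls-files flags and the named skip directories come from this proposal.

```python
import os
import subprocess
from pathlib import Path

# Hypothetical skip set: the proposal names these four and then says "etc."
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def list_files_walk(root: Path) -> list[Path]:
    """Walk the tree, pruning hidden and known non-code directories."""
    found: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [
            d for d in dirnames if d not in SKIP_DIRS and not d.startswith(".")
        ]
        found.extend(Path(dirpath) / name for name in filenames)
    return sorted(found)


def list_files_git(root: Path) -> list[Path]:
    """List tracked files plus untracked files that are not ignored."""
    result = subprocess.run(
        # -z gives NUL-separated, unquoted paths.
        ["git", "ls-files", "-z", "--cached", "--others", "--exclude-standard"],
        cwd=root,
        capture_output=True,
        text=True,
        check=True,
    )
    return sorted(root / rel for rel in result.stdout.split("\0") if rel)


def list_resource_files(root: Path, resource_type: str) -> list[Path]:
    """Pick the listing strategy from the resource type."""
    if resource_type == "git-checkout":
        try:
            return list_files_git(root)
        except (OSError, subprocess.CalledProcessError):
            pass  # Assumption: walk the directory if git is missing or fails.
    return list_files_walk(root)
```

Pruning dirnames in place is what keeps os.walk out of skipped directories, rather than filtering their files after the fact.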

The LLMExecuteActor was updated to accept tier_service, project_repository, and resource_registry dependencies, and calls hydrate_tiers_for_plan() before context assembly.
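The ordering described above (hydrate first, then assemble context) might look like the sketch below. The class name LLMExecuteActorSketch, the hydrate and assemble_context callables, and the argument order passed to the hydration function are hypothetical; the proposal only states which three dependencies the actor accepts and that hydrate_tiers_for_plan() runs before context assembly.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class LLMExecuteActorSketch:
    """Illustrative stand-in for LLMExecuteActor; not the real class."""

    tier_service: Any
    project_repository: Any
    resource_registry: Any
    # Stand-in for hydrate_tiers_for_plan; its real signature is not given here.
    hydrate: Callable[[str, Any, Any, Any], int]
    # Stand-in for whatever assembles LLM context from the tier service.
    assemble_context: Callable[[Any], str]

    def execute(self, plan_id: str) -> str:
        # Hydrate first so the in-memory tier is populated in this process.
        self.hydrate(
            plan_id,
            self.tier_service,
            self.project_repository,
            self.resource_registry,
        )
        # Only then assemble context from the populated tier.
        return self.assemble_context(self.tier_service)
```

Passing the dependencies in through the constructor keeps the hydration step easy to replace in tests with a fake that records the call order.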


What Spec Sections Need Updating

Current Spec Text (§ACMS Architecture — Critical Design Decision)

Critical Design Decision: All indexing happens immediately when resources are added to projects or when code changes. There is no "on-demand" indexing during agent execution. This ensures that agents always have instant access to search capabilities without any indexing delays. The computational cost is paid once upfront, not repeatedly during agent operations.

Problem

The spec says there is "no on-demand indexing during agent execution", but the implementation now does on-demand work at exactly that point: it hydrates the context tier from linked resources at the start of each LLMExecuteActor.execute() call. This is a deliberate architectural decision to solve the problem of ContextTierService starting empty on each CLI process invocation (the service is in-memory and is not persisted across processes).

Proposed Text Addition

Add a new subsection under §ACMS Architecture (after the Critical Design Decision note):


Context Tier Hydration

Context Tier Hydration is the process of populating the in-memory ContextTierService from linked project resources at the start of each plan execution. This is necessary because the ContextTierService is an in-memory service that does not persist across CLI process invocations.

When hydration runs: Automatically before context assembly in LLMExecuteActor.execute(), for every project linked to the plan being executed.

Hydration algorithm:

  1. For each project linked to the plan, retrieve the project's linked resources from the resource registry.
  2. For each linked resource with a valid filesystem location:
    • If the resource is a git-checkout type: list tracked files, plus untracked files that are not ignored, via git ls-files --cached --others --exclude-standard
    • Otherwise: walk the directory tree via os.walk, skipping hidden directories and known non-code directories (.git, node_modules, __pycache__, .venv, etc.)
  3. For each file:
    • Skip binary file extensions (.pyc, .so, .png, .pdf, etc.)
    • Skip files larger than 256 KB
    • Stop once the total bytes read for the project would exceed 10 MB
    • Read file content as UTF-8 (skip on decode error)
    • Store as a TieredFragment in the HOT tier with detail_depth=1, relevance_score=0.5
  4. Log the number of fragments stored.
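Step 3 of the algorithm above can be sketched as a single filtering loop. This is a minimal sketch under stated assumptions: FragmentSketch and hydrate_files are hypothetical names standing in for TieredFragment and the real hydrator, 1 KB is taken as 1024 bytes, the extension set holds only the four examples listed above, and the loop stops before reading the file that would push the total over budget. The proposal does not specify these details.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable

MAX_FILE_BYTES = 256 * 1024         # 256 KB per file (assuming 1 KB = 1024 bytes)
MAX_TOTAL_BYTES = 10 * 1024 * 1024  # 10 MB per project
# Only the examples named in the proposal; the real list is longer ("etc.").
BINARY_EXTENSIONS = {".pyc", ".so", ".png", ".pdf"}


@dataclass(frozen=True)
class FragmentSketch:
    """Hypothetical stand-in for TieredFragment with the documented defaults."""

    path: str
    content: str
    tier: str = "HOT"
    detail_depth: int = 1
    relevance_score: float = 0.5


def hydrate_files(
    paths: Iterable[Path],
    max_file_bytes: int = MAX_FILE_BYTES,
    max_total_bytes: int = MAX_TOTAL_BYTES,
) -> list[FragmentSketch]:
    """Apply the skip rules and byte budgets, returning one fragment per file."""
    fragments: list[FragmentSketch] = []
    total = 0
    for path in paths:
        if path.suffix.lower() in BINARY_EXTENSIONS:
            continue  # Skip binary file extensions.
        try:
            size = path.stat().st_size
        except OSError:
            continue  # Assumption: unreadable or vanished files are skipped.
        if size > max_file_bytes:
            continue  # Skip files over the per-file limit.
        if total + size > max_total_bytes:
            break  # Stop once the project budget would be exceeded.
        try:
            content = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # Skip files that are not valid UTF-8.
        total += size
        fragments.append(FragmentSketch(path=str(path), content=content))
    return fragments
```

A caller would store each returned fragment in the tier service and then log len(fragments), which corresponds to step 4.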

Relationship to the Critical Design Decision: The "no on-demand indexing" principle applies to the full UKO/ACMS indexing pipeline (which builds ontology graphs, computes embeddings, etc.). Context tier hydration is a lightweight file-read operation that populates the hot tier with raw file content — it is not the same as full ACMS indexing. Full ACMS indexing (when implemented) will replace this hydration step.

Module: cleveragents.application.services.context_tier_hydrator


Rationale

Implementation work surfaced a practical problem: the spec's "no on-demand indexing" principle was interpreted too broadly, so nothing populated ContextTierService at execution time and it was empty during plan execution. The hydration module is a pragmatic bridge until the full ACMS indexing pipeline (which would persist index state across process invocations) is implemented. The spec should document this bridge and clarify the distinction between "full ACMS indexing" and "context tier hydration."

Scope

  • Section affected: §ACMS Architecture — Critical Design Decision (around line 25336)
  • Change type: Addition of new subsection (minor clarification, not architectural change)
  • ADR required: No — this is an implementation detail of the existing ACMS architecture

This is a spec update proposal. A human must approve before the spec PR is created.
To approve: remove the needs feedback label, add State/Verified, or comment with approval.


Automated by CleverAgents Bot
Supervisor: Spec Updater | Agent: spec-update-pool-supervisor

HAL9000 added this to the v3.4.0 milestone 2026-04-10 18:37:26 +00:00
Reference
cleveragents/cleveragents-core#7365