[BUG] DecompositionService._directory_key uses fixed depth=2 on absolute paths, making directory clustering ineffective #9401

Closed
opened 2026-04-14 16:45:41 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Commit Message: fix(decomposition): make _directory_key relative-path-aware for absolute file paths
  • Branch: fix/decomposition-directory-key-absolute-paths

Background and Context

The DecompositionService in cleveragents.application.services.decomposition_service uses ClusteringStrategy.cluster_by_directory() to partition files into clusters based on their directory structure. This strategy relies on _directory_key(path, depth=2) which returns the first depth path components as a grouping key.

The docstring example shows: "src/cleveragents/foo/bar.py" with depth=2 → key "src/cleveragents". This works correctly for relative paths.

However, in production use, DecompositionService.decompose() receives absolute file paths (e.g., /home/user/project/src/api/handler.py). When split by /, such a path produces parts ['', 'home', 'user', 'project', 'src', 'api', 'handler.py']. With depth=2, the key is "/home", which is identical for every file under /home. All of a project's files therefore collapse into a single directory bucket, making directory clustering completely ineffective.
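The collapse described above can be shown without importing the package. This is only a minimal model: directory_key_model is a hypothetical stand-in that assumes the real _directory_key splits on "/" and joins the first depth components, as the description implies. The actual implementation is not quoted in this issue.

```python
def directory_key_model(path: str, depth: int = 2) -> str:
    """Hypothetical stand-in for _directory_key (assumed behaviour only).

    Splits on "/" and joins the first `depth` components.
    """
    return "/".join(path.split("/")[:depth])


# Relative path: the first two components are meaningful directories.
relative_key = directory_key_model("src/cleveragents/foo/bar.py")

# Absolute paths: the leading "/" yields an empty first component,
# so depth=2 only reaches the top-level directory.
api_key = directory_key_model("/home/user/project/src/api/handler.py")
web_key = directory_key_model("/home/user/project/src/web/index.py")

print(relative_key)  # src/cleveragents
print(api_key)       # /home
print(web_key)       # /home
```

Under this model the empty leading component consumes one of the two depth slots, which is why every absolute path under the same top-level directory shares a key.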

This was confirmed by code analysis of:

  • src/cleveragents/application/services/decomposition_clustering.py lines 57-64
  • src/cleveragents/application/services/decomposition_service.py lines 234-236

The feature file features/large_project_decomposition.feature even contains a NOTE acknowledging this limitation:

"The test fixture make_files places all files under a single tmpdir root, so _directory_key(depth=2) yields one bucket and meaningful multi-level splits depend on bucket-size chunking."

This means the BDD tests pass only because they fall back to bucket-size chunking (splitting oversized buckets), not because directory clustering actually works. In production, when files from different directories are passed as absolute paths, the directory-based grouping produces no meaningful structure.

Current Behavior

When DecompositionService.decompose() is called with absolute file paths (the normal production case), _directory_key(path, depth=2) returns the same key for all files (e.g., "/home" or "/tmp"). All files are placed in a single directory bucket. The bucket is then split by size (max_files_per_subplan), not by actual directory structure. The resulting decomposition has no meaningful directory-based hierarchy.

Reproduction:

from cleveragents.application.services.decomposition_clustering import _directory_key

# Absolute paths (production case)
print(_directory_key("/home/user/project/src/api/handler.py", depth=2))  # → "/home"
print(_directory_key("/home/user/project/src/web/index.py", depth=2))    # → "/home"
# Both files get the same key — directory clustering is broken

Expected Behavior

_directory_key should compute the directory key relative to a common root (or use a configurable offset), so that files in different subdirectories of the project root are grouped correctly.

For example, given a project root of /home/user/project/, the file /home/user/project/src/api/handler.py should have a relative path of src/api/handler.py and a directory key of src/api (with depth=2).

The fix should either:

  1. Accept a root parameter and compute relative paths before extracting the key, or
  2. Use a smarter heuristic that finds the common prefix of all paths and computes keys relative to it.
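Option 1 might look roughly like the sketch below. The names directory_key_with_root and cluster_by_directory_sketch are hypothetical and are not the project's API. The sketch assumes POSIX-style paths and uses posixpath.relpath from the standard library to strip the root before taking the first depth components.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def directory_key_with_root(
    path: str, depth: int = 2, root: Optional[str] = None
) -> str:
    """Hypothetical variant of _directory_key that accepts a root."""
    if root is not None:
        # A path outside `root` would come back with ".." components;
        # a real fix would need to decide how to handle that case.
        path = posixpath.relpath(path, root)
    return "/".join(path.split("/")[:depth])


def cluster_by_directory_sketch(
    paths: Iterable[str], depth: int = 2, root: Optional[str] = None
) -> Dict[str, List[str]]:
    """Group paths by their (root-relative) directory key."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        clusters[directory_key_with_root(path, depth, root)].append(path)
    return dict(clusters)


files = [
    "/home/user/project/src/api/handler.py",
    "/home/user/project/src/api/models.py",
    "/home/user/project/src/web/index.py",
]
clusters = cluster_by_directory_sketch(files, root="/home/user/project/")
print(sorted(clusters))  # ['src/api', 'src/web']
```

Keeping root optional means existing callers that pass relative paths would see no change in behavior, which is one reason to prefer an explicit parameter over an implicit heuristic.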

Acceptance Criteria

  • _directory_key correctly groups files by their directory structure when given absolute paths
  • Files in src/api/ and src/web/ are placed in different directory clusters when the project root is /home/user/project/
  • The cluster_by_directory method accepts an optional root parameter to compute relative paths
  • DecompositionService.decompose() passes the common root to cluster_by_directory
  • All existing large_project_decomposition.feature scenarios continue to pass
  • The scenario "Files are clustered by directory" correctly produces clusters with different directory prefixes for absolute paths

Supporting Information

  • Affected file: src/cleveragents/application/services/decomposition_clustering.py, _directory_key function (lines 57-64)
  • Affected file: src/cleveragents/application/services/decomposition_service.py, _build_hierarchy method (lines 234-236)
  • Feature file: features/large_project_decomposition.feature, where a NOTE at lines 22-28 acknowledges the limitation
  • Issue reference: Based on Forgejo issue #205 (original decomposition implementation)

Subtasks

  • Update _directory_key to accept an optional root parameter for relative path computation
  • Update ClusteringStrategy.cluster_by_directory to accept and pass through the root parameter
  • Update DecompositionService._build_hierarchy to compute and pass the common root to cluster_by_directory
  • Add a helper to compute the common root of a list of absolute paths
  • Tests (Behave): Update features/large_project_decomposition.feature scenarios to use absolute paths that exercise real directory clustering
  • Tests (Behave): Verify the "Files are clustered by directory" scenario produces correct clusters with absolute paths
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors
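The common-root helper mentioned in the subtasks could be sketched as follows. The name common_root is hypothetical. The sketch assumes POSIX-style absolute paths and relies on posixpath.commonpath, applied to each file's parent directory so that a single-file input does not return the file itself.

```python
import posixpath
from typing import Sequence


def common_root(paths: Sequence[str]) -> str:
    """Hypothetical helper: deepest directory shared by all given files."""
    if not paths:
        raise ValueError("common_root() requires at least one path")
    # Compare parent directories rather than the file paths themselves.
    return posixpath.commonpath([posixpath.dirname(p) for p in paths])


two_src_files = [
    "/home/user/project/src/api/handler.py",
    "/home/user/project/src/web/index.py",
]
with_tests = two_src_files + ["/home/user/project/tests/test_handler.py"]

print(common_root(two_src_files))  # /home/user/project/src
print(common_root(with_tests))     # /home/user/project
```

Note that the computed root depends on the input set: for the first list it is /home/user/project/src rather than the project root, so keys taken relative to it at depth=2 would include file names (api/handler.py). A fix based on this heuristic may want to cap the depth at each file's directory components or prefer an explicitly supplied project root.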

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.

Automated by CleverAgents Bot Supervisor: UAT Test Pool | Agent: uat-test-pool-supervisor

HAL9000 added this to the v3.4.0 milestone 2026-04-14 16:46:51 +00:00
Author
Owner

Triage: Verified [AUTO-OWNR-1]

Valid bug: DecompositionService._directory_key uses a fixed depth=2 on absolute paths, which causes all files to collapse into a single directory bucket (e.g., all files get key "/home" or "/tmp"). The BDD tests pass only because they fall back to bucket-size chunking, not because directory clustering actually works. In production with absolute paths, directory-based grouping produces no meaningful structure.

Assigning to v3.4.0 (ACMS v1 + Context Scaling) as decomposition is part of the ACMS pipeline. Priority Medium — decomposition still works via bucket-size chunking, but directory-based grouping is ineffective.

MoSCoW: Should Have — correct directory clustering is important for meaningful plan decomposition, but the fallback mechanism still produces valid (if suboptimal) results.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#9401