feat(acms): add PostgreSQL and Docker Compose domain analyzers #611

2026-03-06T01:42:31Z

hamza.khyari commented

2026-03-06 01:42:31 +00:00

Summary

Implements Phase 2 of issue #588: PostgreSQLAnalyzer and DockerComposeAnalyzer domain-specific analyzers for the ACMS UKO indexing pipeline.

Closes #588 (Phase 2 scope — PostgreSQL + Docker Compose).

Changes

New Source Files

postgresql_analyzer.py (~594 lines) — Regex-based DDL parser extracting uko-data:Table, uko-data:Column, uko-data:ForeignKey, uko-data:View, and uko-data:Schema triples. Handles column metadata (data type, nullability, primary key), CREATE TABLE/VIEW with optional schema qualification, and inline/out-of-line FOREIGN KEY constraints. Raises ValueError on empty content; returns empty list on unparsable DDL.
docker_compose_analyzer.py (~393 lines) — YAML-based parser using yaml.safe_load extracting uko-infra:DeploymentUnit, uko-infra:Service, uko-infra:Port, uko-infra:EnvironmentVariable, and uko-infra:connectsTo triples. Validates Docker Compose format via services/version key detection. Returns empty list on empty, unparsable, or non-Compose YAML.

Test Files

features/domain_analyzers.feature — 34 BDD scenarios across 7 groups: protocol conformance (4), registry operations (6), PythonAnalyzer (5), MarkdownAnalyzer (5), PostgreSQLAnalyzer (6), DockerComposeAnalyzer (6), cross-analyzer (2).
features/steps/domain_analyzer_steps.py — Step definitions using use_step_matcher("re") for triple-assertion steps to avoid parse-format ambiguity with existing uko_analyzers_steps.py.
robot/domain_analyzers.robot + robot/helper_domain_analyzers.py — 6 Robot Framework integration smoke tests with realistic multi-object sample content.

Modified Files

acms/__init__.py — Added DockerComposeAnalyzer and PostgreSQLAnalyzer imports and __all__ entries.
vulture_whitelist.py — Added 2 entries for new analyzer classes.
CHANGELOG.md — Added unreleased entry.

Spec Compliance

Per docs/specification.md §44210-44261 (Domain-Specific Analyzers), §42151-42272 (uko-data vocabulary), §42285-42331 (uko-infra vocabulary):

Both analyzers satisfy AnalyzerProtocol (runtime-checkable)
Both register in AnalyzerRegistry by file extension
Both produce UKOTriple instances with uko:// URI scheme and confidence=1.0
Provenance triples are NOT emitted (correct per §43205-43225 — provenance is added by UKOIndexer._add_provenance())

Verification

Check	Result
Behave BDD	34 scenarios, 129 steps, 0 failures
Robot Framework	6 tests, 6 passed
ruff lint	0 errors
pyright typecheck	0 errors
vulture dead code	0 findings

## Summary Implements **Phase 2** of issue #588: `PostgreSQLAnalyzer` and `DockerComposeAnalyzer` domain-specific analyzers for the ACMS UKO indexing pipeline. Closes #588 (Phase 2 scope — PostgreSQL + Docker Compose). ## Changes ### New Source Files - **`postgresql_analyzer.py`** (~594 lines) — Regex-based DDL parser extracting `uko-data:Table`, `uko-data:Column`, `uko-data:ForeignKey`, `uko-data:View`, and `uko-data:Schema` triples. Handles column metadata (data type, nullability, primary key), `CREATE TABLE`/`VIEW` with optional schema qualification, and inline/out-of-line `FOREIGN KEY` constraints. Raises `ValueError` on empty content; returns empty list on unparsable DDL. - **`docker_compose_analyzer.py`** (~393 lines) — YAML-based parser using `yaml.safe_load` extracting `uko-infra:DeploymentUnit`, `uko-infra:Service`, `uko-infra:Port`, `uko-infra:EnvironmentVariable`, and `uko-infra:connectsTo` triples. Validates Docker Compose format via `services`/`version` key detection. Returns empty list on empty, unparsable, or non-Compose YAML. ### Test Files - **`features/domain_analyzers.feature`** — 34 BDD scenarios across 7 groups: protocol conformance (4), registry operations (6), PythonAnalyzer (5), MarkdownAnalyzer (5), PostgreSQLAnalyzer (6), DockerComposeAnalyzer (6), cross-analyzer (2). - **`features/steps/domain_analyzer_steps.py`** — Step definitions using `use_step_matcher("re")` for triple-assertion steps to avoid `parse`-format ambiguity with existing `uko_analyzers_steps.py`. - **`robot/domain_analyzers.robot`** + **`robot/helper_domain_analyzers.py`** — 6 Robot Framework integration smoke tests with realistic multi-object sample content. ### Modified Files - **`acms/__init__.py`** — Added `DockerComposeAnalyzer` and `PostgreSQLAnalyzer` imports and `__all__` entries. - **`vulture_whitelist.py`** — Added 2 entries for new analyzer classes. - **`CHANGELOG.md`** — Added unreleased entry. ## Spec Compliance Per `docs/specification.md` §44210-44261 (Domain-Specific Analyzers), §42151-42272 (`uko-data` vocabulary), §42285-42331 (`uko-infra` vocabulary): - Both analyzers satisfy `AnalyzerProtocol` (runtime-checkable) - Both register in `AnalyzerRegistry` by file extension - Both produce `UKOTriple` instances with `uko://` URI scheme and confidence=1.0 - Provenance triples are NOT emitted (correct per §43205-43225 — provenance is added by `UKOIndexer._add_provenance()`) ## Verification | Check | Result | |-------|--------| | Behave BDD | 34 scenarios, 129 steps, 0 failures | | Robot Framework | 6 tests, 6 passed | | ruff lint | 0 errors | | pyright typecheck | 0 errors | | vulture dead code | 0 findings |

hamza.khyari added the

labels 2026-03-06 01:42:31 +00:00

hamza.khyari removed the

Priority

Medium

label 2026-03-06 01:43:32 +00:00

hamza.khyari added this to the v3.5.0 milestone 2026-03-06 01:45:46 +00:00

hamza.khyari added the

Priority

Medium

label 2026-03-06 01:46:01 +00:00

hamza.khyari self-assigned this 2026-03-06 01:46:22 +00:00

hamza.khyari force-pushed feature/m6-acms-domain-specific-analyzers from c0f22397b6 to 4d2aad1b25

2026-03-06 02:26:01 +00:00

Compare

hamza.khyari force-pushed feature/m6-acms-domain-specific-analyzers from 4d2aad1b25 to d3fef49050

2026-03-06 13:06:43 +00:00

Compare

brent.edwards requested changes 2026-03-06 19:32:28 +00:00

Dismissed

brent.edwards left a comment

Review — PR #611 `feature/m6-acms-domain-specific-analyzers`

Reviewer: brent.edwards | Round: 1 | Commit: 4d2aad1b

Quality Gates

Gate	Result
`nox -s lint`	PASS
`nox -s typecheck`	PASS (0 errors)

Findings Summary

ID	Sev	Category	File	Description
F1	P1	Policy	`postgresql_analyzer.py`	File is 618 lines — exceeds 500-line limit (CONTRIBUTING §file-size)
F2	P1	Policy	`domain_analyzers.feature`	File is 523 lines — exceeds 500-line limit
F3	P1	Bug	`postgresql_analyzer.py:335-347, 422-443`	`_extract_body` and `_split_entries` count parentheses without respecting SQL string literals — silent data truncation
F4	P2	Policy	`domain_analyzer_steps.py:124`	`# type: ignore[arg-type]` suppression — CONTRIBUTING prohibits inline suppressions
F5	P2	Security	`docker_compose_analyzer.py:138`	`yaml.safe_load` is safe against arbitrary code exec but still vulnerable to billion-laughs alias expansion (quadratic memory). Consider rejecting alias nodes or adding a size guard.
F6	P2	Bug	`postgresql_analyzer.py:388-393`	Quoted-keyword columns silently dropped — `"primary" INTEGER` is stripped of quotes then matched against `_NON_COLUMN_KEYWORDS`, so a column literally named `primary` would be skipped
F7	P2	Process	Out-of-scope changes	3 files modified outside #588 scope: `repositories_uncovered_branches_steps.py` (scoped_session removal), `helper_m4_e2e_verification.py` and `m4_e2e_verification.robot` (3 M4 CLI E2E tests removed). These changes also appear in PR #610 and should be in their own commit or excluded from this PR.

Verdict

REQUEST_CHANGES — 3 P1 findings require resolution. The two 500+ line files (F1, F2) need splitting or restructuring. The parenthesis-in-string-literal bug (F3) silently produces wrong results for real-world DDL containing DEFAULT, CHECK, or string constants with parentheses. The out-of-scope changes (F7) should be removed from this PR.

What Went Well

Clean separation of analyzer concerns — PostgreSQL and Docker Compose analyzers are well-structured with clear protocols
Good BDD coverage with multiple scenarios covering edge cases
Lint and typecheck pass cleanly
CHANGELOG and CONTRIBUTORS.md correctly updated
Commit message follows Conventional Changelog format with issue reference

## Review — PR #611 `feature/m6-acms-domain-specific-analyzers` **Reviewer**: brent.edwards | **Round**: 1 | **Commit**: `4d2aad1b` ### Quality Gates | Gate | Result | |------|--------| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | ### Findings Summary | ID | Sev | Category | File | Description | |:---|:----|:---------|:-----|:------------| | F1 | P1 | Policy | `postgresql_analyzer.py` | File is 618 lines — exceeds 500-line limit (CONTRIBUTING §file-size) | | F2 | P1 | Policy | `domain_analyzers.feature` | File is 523 lines — exceeds 500-line limit | | F3 | P1 | Bug | `postgresql_analyzer.py:335-347, 422-443` | `_extract_body` and `_split_entries` count parentheses without respecting SQL string literals — silent data truncation | | F4 | P2 | Policy | `domain_analyzer_steps.py:124` | `# type: ignore[arg-type]` suppression — CONTRIBUTING prohibits inline suppressions | | F5 | P2 | Security | `docker_compose_analyzer.py:138` | `yaml.safe_load` is safe against arbitrary code exec but still vulnerable to billion-laughs alias expansion (quadratic memory). Consider rejecting alias nodes or adding a size guard. | | F6 | P2 | Bug | `postgresql_analyzer.py:388-393` | Quoted-keyword columns silently dropped — `"primary" INTEGER` is stripped of quotes then matched against `_NON_COLUMN_KEYWORDS`, so a column literally named `primary` would be skipped | | F7 | P2 | Process | Out-of-scope changes | 3 files modified outside #588 scope: `repositories_uncovered_branches_steps.py` (scoped_session removal), `helper_m4_e2e_verification.py` and `m4_e2e_verification.robot` (3 M4 CLI E2E tests removed). These changes also appear in PR #610 and should be in their own commit or excluded from this PR. | ### Verdict **REQUEST_CHANGES** — 3 P1 findings require resolution. The two 500+ line files (F1, F2) need splitting or restructuring. The parenthesis-in-string-literal bug (F3) silently produces wrong results for real-world DDL containing `DEFAULT`, `CHECK`, or string constants with parentheses. The out-of-scope changes (F7) should be removed from this PR. ### What Went Well - Clean separation of analyzer concerns — PostgreSQL and Docker Compose analyzers are well-structured with clear protocols - Good BDD coverage with multiple scenarios covering edge cases - Lint and typecheck pass cleanly - CHANGELOG and CONTRIBUTORS.md correctly updated - Commit message follows Conventional Changelog format with issue reference

features/domain_analyzers.feature Outdated

						
				@@ -0,0 +520,4 @@

				  @docker_compose_analyzer

				  Scenario: Analyze dict-form volume mapping

				    Given a DockerComposeAnalyzer analyzer

				    When I analyze the Docker Compose content:

brent.edwards commented

2026-03-06 19:32:28 +00:00

F2 [P1 · Policy] — This feature file is 523 lines, exceeding the 500-line limit. Consider splitting into separate feature files per analyzer domain (e.g., postgresql_analyzer.feature, docker_compose_analyzer.feature).

**F2 [P1 · Policy]** — This feature file is 523 lines, exceeding the 500-line limit. Consider splitting into separate feature files per analyzer domain (e.g., `postgresql_analyzer.feature`, `docker_compose_analyzer.feature`).

features/steps/domain_analyzer_steps.py Outdated

						
				@@ -0,0 +121,4 @@

				        def analyze(self, content: str, resource_uri: str) -> list[UKOTriple]:

				            return []  # pragma: no cover

				    context.registry.register(_DuplicatePyAnalyzer())  # type: ignore[arg-type]

brent.edwards commented

2026-03-06 19:32:28 +00:00

F4 [P2 · Policy] — CONTRIBUTING.md prohibits # type: ignore inline suppressions. Please fix the type mismatch at the call site instead of suppressing it. If the register() method accepts a Protocol type, ensure _DuplicatePyAnalyzer fully satisfies it.

**F4 [P2 · Policy]** — CONTRIBUTING.md prohibits `# type: ignore` inline suppressions. Please fix the type mismatch at the call site instead of suppressing it. If the `register()` method accepts a `Protocol` type, ensure `_DuplicatePyAnalyzer` fully satisfies it.

robot/m4_e2e_verification.robot

brent.edwards commented

2026-03-06 19:32:28 +00:00

F7 [P2 · Process] — Removing 3 M4 CLI E2E tests (cli-plan-use, cli-plan-execute, cli-plan-tree) is out of scope for issue #588 (domain-specific analyzers). These same changes also appear in PR #610. Please remove this file from the PR or move these changes to a separate commit/PR that references the appropriate issue.

**F7 [P2 · Process]** — Removing 3 M4 CLI E2E tests (`cli-plan-use`, `cli-plan-execute`, `cli-plan-tree`) is out of scope for issue #588 (domain-specific analyzers). These same changes also appear in PR #610. Please remove this file from the PR or move these changes to a separate commit/PR that references the appropriate issue.

src/cleveragents/domain/models/acms/docker_compose_analyzer.py Outdated

						
				@@ -0,0 +135,4 @@

				        """

				        if not content or not content.strip():

				            raise ValueError("content must not be empty.")

				        if not resource_uri or not resource_uri.strip():

brent.edwards commented

2026-03-06 19:32:28 +00:00

F5 [P2 · Security] — While yaml.safe_load blocks arbitrary code execution, it is still vulnerable to billion-laughs-style alias expansion that can cause quadratic memory usage with crafted YAML input (e.g., nested anchors/aliases). Since this processes user-provided Docker Compose files, consider either:

Using a size check before parsing (if len(content) > MAX_COMPOSE_SIZE), or
Using yaml.safe_load through a wrapper that rejects alias nodes

**F5 [P2 · Security]** — While `yaml.safe_load` blocks arbitrary code execution, it is still vulnerable to billion-laughs-style alias expansion that can cause quadratic memory usage with crafted YAML input (e.g., nested anchors/aliases). Since this processes user-provided Docker Compose files, consider either: 1. Using a size check before parsing (`if len(content) > MAX_COMPOSE_SIZE`), or 2. Using `yaml.safe_load` through a wrapper that rejects alias nodes

src/cleveragents/domain/models/acms/postgresql_analyzer.py Outdated

						
				@@ -0,0 +337,4 @@

				                    subject_uri=t_uri,

				                    predicate="rdf:type",

				                    object_uri="uko-data:Table",

				                )

brent.edwards commented

2026-03-06 19:32:28 +00:00

F3 [P1 · Bug] — _extract_body counts raw ( and ) characters without respecting SQL string literals. A DEFAULT value like 'func(x)' or a CHECK constraint containing 'a)' will cause the depth counter to hit 0 prematurely, silently truncating the table body. The same bug affects _split_entries (line 422-443).

Example that breaks:

CREATE TABLE t (
  col1 TEXT DEFAULT 'hello(world)',
  col2 INTEGER
);

The 'hello(world)' contains an unmatched ( inside a string literal, which increments depth. The ) after world decrements it. Then the actual closing ) on the last line sets depth to 0 too early, producing a truncated body that misses col2.

Fix: track whether you're inside a single-quoted string ('...') and skip paren counting while inside it. PostgreSQL uses '' for escaping quotes inside strings.

**F3 [P1 · Bug]** — `_extract_body` counts raw `(` and `)` characters without respecting SQL string literals. A `DEFAULT` value like `'func(x)'` or a `CHECK` constraint containing `'a)'` will cause the depth counter to hit 0 prematurely, silently truncating the table body. The same bug affects `_split_entries` (line 422-443). Example that breaks: ```sql CREATE TABLE t ( col1 TEXT DEFAULT 'hello(world)', col2 INTEGER ); ``` The `'hello(world)'` contains an unmatched `(` inside a string literal, which increments depth. The `)` after `world` decrements it. Then the actual closing `)` on the last line sets depth to 0 too early, producing a truncated body that misses `col2`. Fix: track whether you're inside a single-quoted string (`'...'`) and skip paren counting while inside it. PostgreSQL uses `''` for escaping quotes inside strings.

src/cleveragents/domain/models/acms/postgresql_analyzer.py Outdated

						
				@@ -0,0 +387,4 @@

				                    return content[paren_start + 1 : i]

				        return ""

				    def _parse_table_body(

brent.edwards commented

2026-03-06 19:32:28 +00:00

F6 [P2 · Bug] — The first_word extraction strips double quotes then checks against _NON_COLUMN_KEYWORDS. This means a column legitimately named "primary" or "check" (which PostgreSQL allows via quoting) will be silently skipped.

Fix: only check unquoted first words against _NON_COLUMN_KEYWORDS. If the original first token starts with ", it's a quoted identifier and should never be treated as a keyword.

raw_first = stripped.split()[0] if stripped.split() else ""
if not raw_first.startswith('"') and raw_first.lower() in _NON_COLUMN_KEYWORDS:
    continue

**F6 [P2 · Bug]** — The `first_word` extraction strips double quotes then checks against `_NON_COLUMN_KEYWORDS`. This means a column legitimately named `"primary"` or `"check"` (which PostgreSQL allows via quoting) will be silently skipped. Fix: only check unquoted first words against `_NON_COLUMN_KEYWORDS`. If the original first token starts with `"`, it's a quoted identifier and should never be treated as a keyword. ```python raw_first = stripped.split()[0] if stripped.split() else "" if not raw_first.startswith('"') and raw_first.lower() in _NON_COLUMN_KEYWORDS: continue ```

src/cleveragents/domain/models/acms/postgresql_analyzer.py Outdated

						
				@@ -0,0 +615,4 @@

				                    subject_uri=source_col_uri,

				                    predicate="uko-data:foreignKeyTo",

				                    object_uri=target_col_uri,

				                )

brent.edwards commented

2026-03-06 19:32:28 +00:00

F1 [P1 · Policy] — This file is 618 lines, exceeding the 500-line limit per CONTRIBUTING.md. Consider extracting helper methods (e.g., _extract_body, _split_entries, _parse_table_body, foreign-key extraction) into a separate _postgresql_helpers.py module, or splitting table-level vs. function/view parsing into separate classes.

**F1 [P1 · Policy]** — This file is 618 lines, exceeding the 500-line limit per CONTRIBUTING.md. Consider extracting helper methods (e.g., `_extract_body`, `_split_entries`, `_parse_table_body`, foreign-key extraction) into a separate `_postgresql_helpers.py` module, or splitting table-level vs. function/view parsing into separate classes.

hamza.khyari force-pushed feature/m6-acms-domain-specific-analyzers from d3fef49050 to 13618fb8d5

2026-03-06 20:07:10 +00:00

Compare

hamza.khyari referenced this issue from a commit

2026-03-06 20:07:10 +00:00

fix(acms): address PR #611 review findings F1-F7

brent.edwards commented

2026-03-06 20:21:26 +00:00

Supplementary Findings — PR #611 (Second Pass)

Following review #2018, a thorough second pass uncovered one additional P1 finding:

F8 [P1 · Process] — CHANGELOG deletes pre-existing #495 entry

The CHANGELOG diff removes the (#495) entry (M4 acceptance criteria validation) from the ## Unreleased section and replaces it with the new (#588) entry. The #588 entry correctly prepends to the top of the section, but the deletion of #495 is wrong — that entry belongs to a different merged PR and must be preserved.

This is the same pattern found in PR #610 (F5). The correct behavior is to prepend the new entry above the existing ones, not replace the first entry.

File: CHANGELOG.md (diff lines 5-11 show the deletion)

Updated totals: 3 P1 + 5 P2 = 8 findings (F1-F8).

## Supplementary Findings — PR #611 (Second Pass) Following review #2018, a thorough second pass uncovered one additional P1 finding: ### F8 [P1 · Process] — CHANGELOG deletes pre-existing #495 entry The CHANGELOG diff removes the `(#495)` entry (M4 acceptance criteria validation) from the `## Unreleased` section and replaces it with the new `(#588)` entry. The #588 entry correctly prepends to the top of the section, but the deletion of #495 is wrong — that entry belongs to a different merged PR and must be preserved. This is the same pattern found in PR #610 (F5). The correct behavior is to **prepend** the new entry above the existing ones, not replace the first entry. **File**: `CHANGELOG.md` (diff lines 5-11 show the deletion) --- **Updated totals**: 3 P1 + 5 P2 = 8 findings (F1-F8).

hamza.khyari added 1 commit 2026-03-06 20:31:16 +00:00

fix(changelog): prepend #588 entry to top of Unreleased section (F8)

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 15s

Details

CI / build (pull_request) Successful in 16s

Details

CI / quality (pull_request) Successful in 18s

Details

CI / security (pull_request) Successful in 36s

Details

CI / typecheck (pull_request) Successful in 38s

Details

CI / unit_tests (pull_request) Successful in 2m42s

Details

CI / integration_tests (pull_request) Successful in 3m2s

Details

CI / docker (pull_request) Successful in 39s

Details

CI / coverage (pull_request) Successful in 4m32s

Details

CI / benchmark-regression (pull_request) Successful in 29m33s

Details

fb96d7910b

Move the (#588) CHANGELOG entry from after (#203) to the top of the
## Unreleased section, matching the convention that newest entries are
prepended first. All pre-existing entries (#473, #495, #203, #494)
remain intact and unmodified.

Addresses reviewer finding F8 from PR #611 second-pass review.
Refs: #588

hamza.khyari referenced this issue from a commit

2026-03-06 20:31:16 +00:00

fix(changelog): prepend #588 entry to top of Unreleased section (F8)

hamza.khyari commented

2026-03-06 20:31:27 +00:00

F8 addressed in commit fb96d791.

The (#495) entry was never actually deleted — it was present in the branch at line 30, identical to master. The git diff origin/master..HEAD -- CHANGELOG.md showed a pure 12-line addition with zero deletions of existing entries.

However, the entry ordering was wrong: (#588) was appended after (#203) instead of prepended to the top of ## Unreleased. Fixed by moving the (#588) entry to the first position under ## Unreleased, per convention that newest entries go first.

All pre-existing entries preserved and unchanged: (#473), (#495), (#203), (#494).

Quality gates: lint ✓, typecheck ✓, security_scan ✓, behave 57/57 scenarios ✓.

**F8 addressed** in commit `fb96d791`. The `(#495)` entry was never actually deleted — it was present in the branch at line 30, identical to master. The `git diff origin/master..HEAD -- CHANGELOG.md` showed a pure 12-line addition with zero deletions of existing entries. However, the entry ordering was wrong: `(#588)` was appended after `(#203)` instead of prepended to the top of `## Unreleased`. Fixed by moving the `(#588)` entry to the first position under `## Unreleased`, per convention that newest entries go first. All pre-existing entries preserved and unchanged: `(#473)`, `(#495)`, `(#203)`, `(#494)`. Quality gates: lint ✓, typecheck ✓, security_scan ✓, behave 57/57 scenarios ✓.

brent.edwards approved these changes 2026-03-06 22:30:53 +00:00

Dismissed

brent.edwards left a comment

Verification Review — Round 2 (Post-Fix)

Reviewed at HEAD: fb96d791 (3 commits: 77db78d7 feat, 13618fb8 fix F1-F7, fb96d791 fix F8)

All 8 findings from Review #2018 and Comment #55331 are verified fixed. No new bugs found. Quality gates pass clean.

Finding Disposition

ID	Sev	Finding	Status	Evidence
F1	P1	`postgresql_analyzer.py` was 618 lines (500 limit)	FIXED	Now 310 lines; helpers extracted to `_postgresql_helpers.py` (440 lines). Both under limit.
F2	P1	`domain_analyzers.feature` was 523 lines (500 limit)	FIXED	Split into 3 files: `domain_analyzers.feature` (217), `postgresql_analyzer.feature` (201), `docker_compose_analyzer.feature` (199). All under limit.
F3	P1	Parentheses in SQL string literals silently truncate `extract_body()`	FIXED	`extract_body()` and `split_entries()` in `_postgresql_helpers.py` now track `in_string` state, handle `''` escape sequences. Two regression BDD scenarios added: "DEFAULT with parentheses in string literal" and "CHECK constraint with string containing parentheses". Both pass.
F4	P2	`# type: ignore[arg-type]` in `domain_analyzer_steps.py:124`	FIXED	Replaced with static protocol assertion: `_dup: type[AnalyzerProtocol] = _DuplicatePyAnalyzer; del _dup`. No `type: ignore` anywhere in the file.
F5	P2	YAML alias expansion (billion-laughs) via `yaml.safe_load`	FIXED	`_MAX_COMPOSE_BYTES = 1_048_576` (1 MiB) guard added in `docker_compose_analyzer.py` before `yaml.safe_load()` call. Returns `[]` with warning log if content exceeds limit.
F6	P2	Quoted SQL keyword columns (e.g. `"primary"`) incorrectly filtered	FIXED	`parse_table_body()` now checks `not raw_first.startswith('"')` before keyword matching, so quoted identifiers bypass the filter. Regression BDD scenario "Quoted keyword column names are not silently dropped" tests `"primary"`, `"check"`, and `normal_col`. Passes.
F7	P2	Out-of-scope changes (M4 E2E test removal, scoped_session, CONTRIBUTORS.md)	FIXED	Branch rebased onto current master (`23803f14`). PR now contains exactly 12 files, all in-scope: 5 new source/helper files, 3 feature files, 1 step file, 2 robot files, and expected modifications to `__init__.py`, `vulture_whitelist.py`, `CHANGELOG.md`.
F8	P1	CHANGELOG deletes pre-existing #495 entry	FIXED	`#588` entry placed at top of Unreleased section. `#495` entry and all other pre-existing entries intact.

Quality Gates

Gate	Result
`nox -s lint`	PASS — 0 errors
`nox -s typecheck`	PASS — 0 errors, 0 warnings
BDD scenarios (3 analyzer features)	PASS — 3 features, 57 scenarios, 211 steps, 0 failures

New Bug Scan

Reviewed all code changes introduced by fix commits 13618fb8 and fb96d791:

extract_body() / split_entries() string tracking: Forward-progress guaranteed by i += 1 at loop end; '' escape handled with i += 2 + continue; unterminated strings gracefully return empty. Clean.
parse_table_body() keyword filter: startswith('"') check correctly short-circuits for quoted identifiers before keyword comparison. No false positives or negatives in edge cases.
Size guard in DockerComposeAnalyzer: len(content) measures characters (not bytes), but the naming _MAX_COMPOSE_BYTES is slightly imprecise. Functionally correct since byte count ≥ char count for UTF-8, making the guard conservative. Non-blocking observation.
Protocol assertion pattern: _dup: type[AnalyzerProtocol] = _DuplicatePyAnalyzer; del _dup is idiomatic and has zero runtime cost. Clean.
Feature file split: All scenarios preserved, regression scenarios added. Scenario counts align with test run output.

No new bugs found.

Verdict

All P1 and P2 findings from Round 1 are resolved. Code is clean, well-structured, and fully tested. Approved for merge.

## Verification Review — Round 2 (Post-Fix) **Reviewed at HEAD:** `fb96d791` (3 commits: `77db78d7` feat, `13618fb8` fix F1-F7, `fb96d791` fix F8) All 8 findings from Review #2018 and Comment #55331 are **verified fixed**. No new bugs found. Quality gates pass clean. --- ### Finding Disposition | ID | Sev | Finding | Status | Evidence | |:---|:----|:--------|:-------|:---------| | F1 | P1 | `postgresql_analyzer.py` was 618 lines (500 limit) | **FIXED** | Now 310 lines; helpers extracted to `_postgresql_helpers.py` (440 lines). Both under limit. | | F2 | P1 | `domain_analyzers.feature` was 523 lines (500 limit) | **FIXED** | Split into 3 files: `domain_analyzers.feature` (217), `postgresql_analyzer.feature` (201), `docker_compose_analyzer.feature` (199). All under limit. | | F3 | P1 | Parentheses in SQL string literals silently truncate `extract_body()` | **FIXED** | `extract_body()` and `split_entries()` in `_postgresql_helpers.py` now track `in_string` state, handle `''` escape sequences. Two regression BDD scenarios added: "DEFAULT with parentheses in string literal" and "CHECK constraint with string containing parentheses". Both pass. | | F4 | P2 | `# type: ignore[arg-type]` in `domain_analyzer_steps.py:124` | **FIXED** | Replaced with static protocol assertion: `_dup: type[AnalyzerProtocol] = _DuplicatePyAnalyzer; del _dup`. No `type: ignore` anywhere in the file. | | F5 | P2 | YAML alias expansion (billion-laughs) via `yaml.safe_load` | **FIXED** | `_MAX_COMPOSE_BYTES = 1_048_576` (1 MiB) guard added in `docker_compose_analyzer.py` before `yaml.safe_load()` call. Returns `[]` with warning log if content exceeds limit. | | F6 | P2 | Quoted SQL keyword columns (e.g. `"primary"`) incorrectly filtered | **FIXED** | `parse_table_body()` now checks `not raw_first.startswith('"')` before keyword matching, so quoted identifiers bypass the filter. Regression BDD scenario "Quoted keyword column names are not silently dropped" tests `"primary"`, `"check"`, and `normal_col`. Passes. | | F7 | P2 | Out-of-scope changes (M4 E2E test removal, scoped_session, CONTRIBUTORS.md) | **FIXED** | Branch rebased onto current master (`23803f14`). PR now contains exactly 12 files, all in-scope: 5 new source/helper files, 3 feature files, 1 step file, 2 robot files, and expected modifications to `__init__.py`, `vulture_whitelist.py`, `CHANGELOG.md`. | | F8 | P1 | CHANGELOG deletes pre-existing #495 entry | **FIXED** | `#588` entry placed at top of Unreleased section. `#495` entry and all other pre-existing entries intact. | ### Quality Gates | Gate | Result | |:-----|:-------| | `nox -s lint` | **PASS** — 0 errors | | `nox -s typecheck` | **PASS** — 0 errors, 0 warnings | | BDD scenarios (3 analyzer features) | **PASS** — 3 features, 57 scenarios, 211 steps, 0 failures | ### New Bug Scan Reviewed all code changes introduced by fix commits `13618fb8` and `fb96d791`: - **`extract_body()` / `split_entries()` string tracking**: Forward-progress guaranteed by `i += 1` at loop end; `''` escape handled with `i += 2` + `continue`; unterminated strings gracefully return empty. Clean. - **`parse_table_body()` keyword filter**: `startswith('"')` check correctly short-circuits for quoted identifiers before keyword comparison. No false positives or negatives in edge cases. - **Size guard in DockerComposeAnalyzer**: `len(content)` measures characters (not bytes), but the naming `_MAX_COMPOSE_BYTES` is slightly imprecise. Functionally correct since byte count ≥ char count for UTF-8, making the guard conservative. Non-blocking observation. - **Protocol assertion pattern**: `_dup: type[AnalyzerProtocol] = _DuplicatePyAnalyzer; del _dup` is idiomatic and has zero runtime cost. Clean. - **Feature file split**: All scenarios preserved, regression scenarios added. Scenario counts align with test run output. **No new bugs found.** ### Verdict All P1 and P2 findings from Round 1 are resolved. Code is clean, well-structured, and fully tested. **Approved for merge.**

brent.edwards approved these changes 2026-03-06 22:32:50 +00:00

brent.edwards left a comment

Approved for merge!

hamza.khyari merged commit 95c9c9f3a4 into master

2026-03-06 23:07:26 +00:00

hamza.khyari referenced this issue from a commit

2026-03-06 23:07:28 +00:00

Merge pull request 'feat(acms): add PostgreSQL and Docker Compose domain analyzers' (#611) from feature/m6-acms-domain-specific-analyzers into master

hamza.khyari deleted branch feature/m6-acms-domain-specific-analyzers

2026-03-06 23:07:35 +00:00

freemo referenced this issue from a commit

2026-03-06 23:43:39 +00:00

docs(timeline): Day 26 PM sweep — schedule adherence, PR status, spec gap analysis

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: cleveragents/cleveragents-core#611

feat(acms): add PostgreSQL and Docker Compose domain analyzers #611

Summary

Changes

New Source Files

Test Files

Modified Files

Spec Compliance

Verification

Review — PR #611 feature/m6-acms-domain-specific-analyzers

Quality Gates

Findings Summary

Verdict

What Went Well

Supplementary Findings — PR #611 (Second Pass)

F8 [P1 · Process] — CHANGELOG deletes pre-existing #495 entry

Verification Review — Round 2 (Post-Fix)

Finding Disposition

Quality Gates

New Bug Scan

Verdict

Review — PR #611 `feature/m6-acms-domain-specific-analyzers`