feat(acms): add PostgreSQL and Docker Compose domain analyzers #611

Merged
hamza.khyari merged 3 commits from feature/m6-acms-domain-specific-analyzers into master 2026-03-06 23:07:26 +00:00
Member

Summary

Implements Phase 2 of issue #588: PostgreSQLAnalyzer and DockerComposeAnalyzer domain-specific analyzers for the ACMS UKO indexing pipeline.

Closes #588 (Phase 2 scope — PostgreSQL + Docker Compose).

Changes

New Source Files

  • postgresql_analyzer.py (~594 lines) — Regex-based DDL parser extracting uko-data:Table, uko-data:Column, uko-data:ForeignKey, uko-data:View, and uko-data:Schema triples. Handles column metadata (data type, nullability, primary key), CREATE TABLE/VIEW with optional schema qualification, and inline/out-of-line FOREIGN KEY constraints. Raises ValueError on empty content; returns empty list on unparsable DDL.
  • docker_compose_analyzer.py (~393 lines) — YAML-based parser using yaml.safe_load extracting uko-infra:DeploymentUnit, uko-infra:Service, uko-infra:Port, uko-infra:EnvironmentVariable, and uko-infra:connectsTo triples. Validates Docker Compose format via services/version key detection. Returns empty list on empty, unparsable, or non-Compose YAML.

Test Files

  • features/domain_analyzers.feature — 34 BDD scenarios across 7 groups: protocol conformance (4), registry operations (6), PythonAnalyzer (5), MarkdownAnalyzer (5), PostgreSQLAnalyzer (6), DockerComposeAnalyzer (6), cross-analyzer (2).
  • features/steps/domain_analyzer_steps.py — Step definitions using use_step_matcher("re") for triple-assertion steps to avoid parse-format ambiguity with existing uko_analyzers_steps.py.
  • robot/domain_analyzers.robot + robot/helper_domain_analyzers.py — 6 Robot Framework integration smoke tests with realistic multi-object sample content.

Modified Files

  • acms/__init__.py — Added DockerComposeAnalyzer and PostgreSQLAnalyzer imports and __all__ entries.
  • vulture_whitelist.py — Added 2 entries for new analyzer classes.
  • CHANGELOG.md — Added unreleased entry.

Spec Compliance

Per docs/specification.md §44210-44261 (Domain-Specific Analyzers), §42151-42272 (uko-data vocabulary), §42285-42331 (uko-infra vocabulary):

  • Both analyzers satisfy AnalyzerProtocol (runtime-checkable)
  • Both register in AnalyzerRegistry by file extension
  • Both produce UKOTriple instances with uko:// URI scheme and confidence=1.0
  • Provenance triples are NOT emitted (correct per §43205-43225 — provenance is added by UKOIndexer._add_provenance())

Verification

Check Result
Behave BDD 34 scenarios, 129 steps, 0 failures
Robot Framework 6 tests, 6 passed
ruff lint 0 errors
pyright typecheck 0 errors
vulture dead code 0 findings
## Summary Implements **Phase 2** of issue #588: `PostgreSQLAnalyzer` and `DockerComposeAnalyzer` domain-specific analyzers for the ACMS UKO indexing pipeline. Closes #588 (Phase 2 scope — PostgreSQL + Docker Compose). ## Changes ### New Source Files - **`postgresql_analyzer.py`** (~594 lines) — Regex-based DDL parser extracting `uko-data:Table`, `uko-data:Column`, `uko-data:ForeignKey`, `uko-data:View`, and `uko-data:Schema` triples. Handles column metadata (data type, nullability, primary key), `CREATE TABLE`/`VIEW` with optional schema qualification, and inline/out-of-line `FOREIGN KEY` constraints. Raises `ValueError` on empty content; returns empty list on unparsable DDL. - **`docker_compose_analyzer.py`** (~393 lines) — YAML-based parser using `yaml.safe_load` extracting `uko-infra:DeploymentUnit`, `uko-infra:Service`, `uko-infra:Port`, `uko-infra:EnvironmentVariable`, and `uko-infra:connectsTo` triples. Validates Docker Compose format via `services`/`version` key detection. Returns empty list on empty, unparsable, or non-Compose YAML. ### Test Files - **`features/domain_analyzers.feature`** — 34 BDD scenarios across 7 groups: protocol conformance (4), registry operations (6), PythonAnalyzer (5), MarkdownAnalyzer (5), PostgreSQLAnalyzer (6), DockerComposeAnalyzer (6), cross-analyzer (2). - **`features/steps/domain_analyzer_steps.py`** — Step definitions using `use_step_matcher("re")` for triple-assertion steps to avoid `parse`-format ambiguity with existing `uko_analyzers_steps.py`. - **`robot/domain_analyzers.robot`** + **`robot/helper_domain_analyzers.py`** — 6 Robot Framework integration smoke tests with realistic multi-object sample content. ### Modified Files - **`acms/__init__.py`** — Added `DockerComposeAnalyzer` and `PostgreSQLAnalyzer` imports and `__all__` entries. - **`vulture_whitelist.py`** — Added 2 entries for new analyzer classes. - **`CHANGELOG.md`** — Added unreleased entry. ## Spec Compliance Per `docs/specification.md` §44210-44261 (Domain-Specific Analyzers), §42151-42272 (`uko-data` vocabulary), §42285-42331 (`uko-infra` vocabulary): - Both analyzers satisfy `AnalyzerProtocol` (runtime-checkable) - Both register in `AnalyzerRegistry` by file extension - Both produce `UKOTriple` instances with `uko://` URI scheme and confidence=1.0 - Provenance triples are NOT emitted (correct per §43205-43225 — provenance is added by `UKOIndexer._add_provenance()`) ## Verification | Check | Result | |-------|--------| | Behave BDD | 34 scenarios, 129 steps, 0 failures | | Robot Framework | 6 tests, 6 passed | | ruff lint | 0 errors | | pyright typecheck | 0 errors | | vulture dead code | 0 findings |
hamza.khyari added this to the v3.5.0 milestone 2026-03-06 01:45:46 +00:00
hamza.khyari force-pushed feature/m6-acms-domain-specific-analyzers from c0f22397b6
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 38s
CI / unit_tests (pull_request) Failing after 41s
CI / docker (pull_request) Has been skipped
CI / coverage (pull_request) Failing after 47s
CI / integration_tests (pull_request) Successful in 3m4s
CI / benchmark-regression (pull_request) Successful in 29m41s
to 4d2aad1b25
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 2m24s
CI / docker (pull_request) Successful in 40s
CI / integration_tests (pull_request) Successful in 3m11s
CI / coverage (pull_request) Successful in 5m12s
CI / benchmark-regression (pull_request) Successful in 29m5s
2026-03-06 02:26:01 +00:00
Compare
hamza.khyari force-pushed feature/m6-acms-domain-specific-analyzers from 4d2aad1b25
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 2m24s
CI / docker (pull_request) Successful in 40s
CI / integration_tests (pull_request) Successful in 3m11s
CI / coverage (pull_request) Successful in 5m12s
CI / benchmark-regression (pull_request) Successful in 29m5s
to d3fef49050
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 17s
CI / security (pull_request) Successful in 33s
CI / typecheck (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 2m11s
CI / docker (pull_request) Successful in 1m8s
CI / integration_tests (pull_request) Successful in 4m19s
CI / coverage (pull_request) Successful in 4m33s
CI / benchmark-regression (pull_request) Successful in 30m29s
2026-03-06 13:06:43 +00:00
Compare
brent.edwards requested changes 2026-03-06 19:32:28 +00:00
Dismissed
brent.edwards left a comment

Review — PR #611 feature/m6-acms-domain-specific-analyzers

Reviewer: brent.edwards | Round: 1 | Commit: 4d2aad1b

Quality Gates

Gate Result
nox -s lint PASS
nox -s typecheck PASS (0 errors)

Findings Summary

ID Sev Category File Description
F1 P1 Policy postgresql_analyzer.py File is 618 lines — exceeds 500-line limit (CONTRIBUTING §file-size)
F2 P1 Policy domain_analyzers.feature File is 523 lines — exceeds 500-line limit
F3 P1 Bug postgresql_analyzer.py:335-347, 422-443 _extract_body and _split_entries count parentheses without respecting SQL string literals — silent data truncation
F4 P2 Policy domain_analyzer_steps.py:124 # type: ignore[arg-type] suppression — CONTRIBUTING prohibits inline suppressions
F5 P2 Security docker_compose_analyzer.py:138 yaml.safe_load is safe against arbitrary code exec but still vulnerable to billion-laughs alias expansion (quadratic memory). Consider rejecting alias nodes or adding a size guard.
F6 P2 Bug postgresql_analyzer.py:388-393 Quoted-keyword columns silently dropped — "primary" INTEGER is stripped of quotes then matched against _NON_COLUMN_KEYWORDS, so a column literally named primary would be skipped
F7 P2 Process Out-of-scope changes 3 files modified outside #588 scope: repositories_uncovered_branches_steps.py (scoped_session removal), helper_m4_e2e_verification.py and m4_e2e_verification.robot (3 M4 CLI E2E tests removed). These changes also appear in PR #610 and should be in their own commit or excluded from this PR.

Verdict

REQUEST_CHANGES — 3 P1 findings require resolution. The two 500+ line files (F1, F2) need splitting or restructuring. The parenthesis-in-string-literal bug (F3) silently produces wrong results for real-world DDL containing DEFAULT, CHECK, or string constants with parentheses. The out-of-scope changes (F7) should be removed from this PR.

What Went Well

  • Clean separation of analyzer concerns — PostgreSQL and Docker Compose analyzers are well-structured with clear protocols
  • Good BDD coverage with multiple scenarios covering edge cases
  • Lint and typecheck pass cleanly
  • CHANGELOG and CONTRIBUTORS.md correctly updated
  • Commit message follows Conventional Changelog format with issue reference
## Review — PR #611 `feature/m6-acms-domain-specific-analyzers` **Reviewer**: brent.edwards | **Round**: 1 | **Commit**: `4d2aad1b` ### Quality Gates | Gate | Result | |------|--------| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | ### Findings Summary | ID | Sev | Category | File | Description | |:---|:----|:---------|:-----|:------------| | F1 | P1 | Policy | `postgresql_analyzer.py` | File is 618 lines — exceeds 500-line limit (CONTRIBUTING §file-size) | | F2 | P1 | Policy | `domain_analyzers.feature` | File is 523 lines — exceeds 500-line limit | | F3 | P1 | Bug | `postgresql_analyzer.py:335-347, 422-443` | `_extract_body` and `_split_entries` count parentheses without respecting SQL string literals — silent data truncation | | F4 | P2 | Policy | `domain_analyzer_steps.py:124` | `# type: ignore[arg-type]` suppression — CONTRIBUTING prohibits inline suppressions | | F5 | P2 | Security | `docker_compose_analyzer.py:138` | `yaml.safe_load` is safe against arbitrary code exec but still vulnerable to billion-laughs alias expansion (quadratic memory). Consider rejecting alias nodes or adding a size guard. | | F6 | P2 | Bug | `postgresql_analyzer.py:388-393` | Quoted-keyword columns silently dropped — `"primary" INTEGER` is stripped of quotes then matched against `_NON_COLUMN_KEYWORDS`, so a column literally named `primary` would be skipped | | F7 | P2 | Process | Out-of-scope changes | 3 files modified outside #588 scope: `repositories_uncovered_branches_steps.py` (scoped_session removal), `helper_m4_e2e_verification.py` and `m4_e2e_verification.robot` (3 M4 CLI E2E tests removed). These changes also appear in PR #610 and should be in their own commit or excluded from this PR. | ### Verdict **REQUEST_CHANGES** — 3 P1 findings require resolution. The two 500+ line files (F1, F2) need splitting or restructuring. The parenthesis-in-string-literal bug (F3) silently produces wrong results for real-world DDL containing `DEFAULT`, `CHECK`, or string constants with parentheses. The out-of-scope changes (F7) should be removed from this PR. ### What Went Well - Clean separation of analyzer concerns — PostgreSQL and Docker Compose analyzers are well-structured with clear protocols - Good BDD coverage with multiple scenarios covering edge cases - Lint and typecheck pass cleanly - CHANGELOG and CONTRIBUTORS.md correctly updated - Commit message follows Conventional Changelog format with issue reference
@ -0,0 +520,4 @@
@docker_compose_analyzer
Scenario: Analyze dict-form volume mapping
Given a DockerComposeAnalyzer analyzer
When I analyze the Docker Compose content:
Member

F2 [P1 · Policy] — This feature file is 523 lines, exceeding the 500-line limit. Consider splitting into separate feature files per analyzer domain (e.g., postgresql_analyzer.feature, docker_compose_analyzer.feature).

**F2 [P1 · Policy]** — This feature file is 523 lines, exceeding the 500-line limit. Consider splitting into separate feature files per analyzer domain (e.g., `postgresql_analyzer.feature`, `docker_compose_analyzer.feature`).
@ -0,0 +121,4 @@
def analyze(self, content: str, resource_uri: str) -> list[UKOTriple]:
return [] # pragma: no cover
context.registry.register(_DuplicatePyAnalyzer()) # type: ignore[arg-type]
Member

F4 [P2 · Policy] — CONTRIBUTING.md prohibits # type: ignore inline suppressions. Please fix the type mismatch at the call site instead of suppressing it. If the register() method accepts a Protocol type, ensure _DuplicatePyAnalyzer fully satisfies it.

**F4 [P2 · Policy]** — CONTRIBUTING.md prohibits `# type: ignore` inline suppressions. Please fix the type mismatch at the call site instead of suppressing it. If the `register()` method accepts a `Protocol` type, ensure `_DuplicatePyAnalyzer` fully satisfies it.
Member

F7 [P2 · Process] — Removing 3 M4 CLI E2E tests (cli-plan-use, cli-plan-execute, cli-plan-tree) is out of scope for issue #588 (domain-specific analyzers). These same changes also appear in PR #610. Please remove this file from the PR or move these changes to a separate commit/PR that references the appropriate issue.

**F7 [P2 · Process]** — Removing 3 M4 CLI E2E tests (`cli-plan-use`, `cli-plan-execute`, `cli-plan-tree`) is out of scope for issue #588 (domain-specific analyzers). These same changes also appear in PR #610. Please remove this file from the PR or move these changes to a separate commit/PR that references the appropriate issue.
@ -0,0 +135,4 @@
"""
if not content or not content.strip():
raise ValueError("content must not be empty.")
if not resource_uri or not resource_uri.strip():
Member

F5 [P2 · Security] — While yaml.safe_load blocks arbitrary code execution, it is still vulnerable to billion-laughs-style alias expansion that can cause quadratic memory usage with crafted YAML input (e.g., nested anchors/aliases). Since this processes user-provided Docker Compose files, consider either:

  1. Using a size check before parsing (if len(content) > MAX_COMPOSE_SIZE), or
  2. Using yaml.safe_load through a wrapper that rejects alias nodes
**F5 [P2 · Security]** — While `yaml.safe_load` blocks arbitrary code execution, it is still vulnerable to billion-laughs-style alias expansion that can cause quadratic memory usage with crafted YAML input (e.g., nested anchors/aliases). Since this processes user-provided Docker Compose files, consider either: 1. Using a size check before parsing (`if len(content) > MAX_COMPOSE_SIZE`), or 2. Using `yaml.safe_load` through a wrapper that rejects alias nodes
@ -0,0 +337,4 @@
subject_uri=t_uri,
predicate="rdf:type",
object_uri="uko-data:Table",
)
Member

F3 [P1 · Bug]_extract_body counts raw ( and ) characters without respecting SQL string literals. A DEFAULT value like 'func(x)' or a CHECK constraint containing 'a)' will cause the depth counter to hit 0 prematurely, silently truncating the table body. The same bug affects _split_entries (line 422-443).

Example that breaks:

CREATE TABLE t (
  col1 TEXT DEFAULT 'hello(world)',
  col2 INTEGER
);

The 'hello(world)' contains an unmatched ( inside a string literal, which increments depth. The ) after world decrements it. Then the actual closing ) on the last line sets depth to 0 too early, producing a truncated body that misses col2.

Fix: track whether you're inside a single-quoted string ('...') and skip paren counting while inside it. PostgreSQL uses '' for escaping quotes inside strings.

**F3 [P1 · Bug]** — `_extract_body` counts raw `(` and `)` characters without respecting SQL string literals. A `DEFAULT` value like `'func(x)'` or a `CHECK` constraint containing `'a)'` will cause the depth counter to hit 0 prematurely, silently truncating the table body. The same bug affects `_split_entries` (line 422-443). Example that breaks: ```sql CREATE TABLE t ( col1 TEXT DEFAULT 'hello(world)', col2 INTEGER ); ``` The `'hello(world)'` contains an unmatched `(` inside a string literal, which increments depth. The `)` after `world` decrements it. Then the actual closing `)` on the last line sets depth to 0 too early, producing a truncated body that misses `col2`. Fix: track whether you're inside a single-quoted string (`'...'`) and skip paren counting while inside it. PostgreSQL uses `''` for escaping quotes inside strings.
@ -0,0 +387,4 @@
return content[paren_start + 1 : i]
return ""
def _parse_table_body(
Member

F6 [P2 · Bug] — The first_word extraction strips double quotes then checks against _NON_COLUMN_KEYWORDS. This means a column legitimately named "primary" or "check" (which PostgreSQL allows via quoting) will be silently skipped.

Fix: only check unquoted first words against _NON_COLUMN_KEYWORDS. If the original first token starts with ", it's a quoted identifier and should never be treated as a keyword.

raw_first = stripped.split()[0] if stripped.split() else ""
if not raw_first.startswith('"') and raw_first.lower() in _NON_COLUMN_KEYWORDS:
    continue
**F6 [P2 · Bug]** — The `first_word` extraction strips double quotes then checks against `_NON_COLUMN_KEYWORDS`. This means a column legitimately named `"primary"` or `"check"` (which PostgreSQL allows via quoting) will be silently skipped. Fix: only check unquoted first words against `_NON_COLUMN_KEYWORDS`. If the original first token starts with `"`, it's a quoted identifier and should never be treated as a keyword. ```python raw_first = stripped.split()[0] if stripped.split() else "" if not raw_first.startswith('"') and raw_first.lower() in _NON_COLUMN_KEYWORDS: continue ```
@ -0,0 +615,4 @@
subject_uri=source_col_uri,
predicate="uko-data:foreignKeyTo",
object_uri=target_col_uri,
)
Member

F1 [P1 · Policy] — This file is 618 lines, exceeding the 500-line limit per CONTRIBUTING.md. Consider extracting helper methods (e.g., _extract_body, _split_entries, _parse_table_body, foreign-key extraction) into a separate _postgresql_helpers.py module, or splitting table-level vs. function/view parsing into separate classes.

**F1 [P1 · Policy]** — This file is 618 lines, exceeding the 500-line limit per CONTRIBUTING.md. Consider extracting helper methods (e.g., `_extract_body`, `_split_entries`, `_parse_table_body`, foreign-key extraction) into a separate `_postgresql_helpers.py` module, or splitting table-level vs. function/view parsing into separate classes.
hamza.khyari force-pushed feature/m6-acms-domain-specific-analyzers from d3fef49050
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 17s
CI / security (pull_request) Successful in 33s
CI / typecheck (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 2m11s
CI / docker (pull_request) Successful in 1m8s
CI / integration_tests (pull_request) Successful in 4m19s
CI / coverage (pull_request) Successful in 4m33s
CI / benchmark-regression (pull_request) Successful in 30m29s
to 13618fb8d5
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 17s
CI / security (pull_request) Successful in 33s
CI / typecheck (pull_request) Successful in 47s
CI / unit_tests (pull_request) Successful in 2m35s
CI / integration_tests (pull_request) Successful in 3m13s
CI / docker (pull_request) Successful in 39s
CI / coverage (pull_request) Successful in 4m26s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-06 20:07:10 +00:00
Compare
Member

Supplementary Findings — PR #611 (Second Pass)

Following review #2018, a thorough second pass uncovered one additional P1 finding:

F8 [P1 · Process] — CHANGELOG deletes pre-existing #495 entry

The CHANGELOG diff removes the (#495) entry (M4 acceptance criteria validation) from the ## Unreleased section and replaces it with the new (#588) entry. The #588 entry correctly prepends to the top of the section, but the deletion of #495 is wrong — that entry belongs to a different merged PR and must be preserved.

This is the same pattern found in PR #610 (F5). The correct behavior is to prepend the new entry above the existing ones, not replace the first entry.

File: CHANGELOG.md (diff lines 5-11 show the deletion)


Updated totals: 3 P1 + 5 P2 = 8 findings (F1-F8).

## Supplementary Findings — PR #611 (Second Pass) Following review #2018, a thorough second pass uncovered one additional P1 finding: ### F8 [P1 · Process] — CHANGELOG deletes pre-existing #495 entry The CHANGELOG diff removes the `(#495)` entry (M4 acceptance criteria validation) from the `## Unreleased` section and replaces it with the new `(#588)` entry. The #588 entry correctly prepends to the top of the section, but the deletion of #495 is wrong — that entry belongs to a different merged PR and must be preserved. This is the same pattern found in PR #610 (F5). The correct behavior is to **prepend** the new entry above the existing ones, not replace the first entry. **File**: `CHANGELOG.md` (diff lines 5-11 show the deletion) --- **Updated totals**: 3 P1 + 5 P2 = 8 findings (F1-F8).
fix(changelog): prepend #588 entry to top of Unreleased section (F8)
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 38s
CI / unit_tests (pull_request) Successful in 2m42s
CI / integration_tests (pull_request) Successful in 3m2s
CI / docker (pull_request) Successful in 39s
CI / coverage (pull_request) Successful in 4m32s
CI / benchmark-regression (pull_request) Successful in 29m33s
fb96d7910b
Move the (#588) CHANGELOG entry from after (#203) to the top of the
## Unreleased section, matching the convention that newest entries are
prepended first. All pre-existing entries (#473, #495, #203, #494)
remain intact and unmodified.

Addresses reviewer finding F8 from PR #611 second-pass review.
Refs: #588
Author
Member

F8 addressed in commit fb96d791.

The (#495) entry was never actually deleted — it was present in the branch at line 30, identical to master. The git diff origin/master..HEAD -- CHANGELOG.md showed a pure 12-line addition with zero deletions of existing entries.

However, the entry ordering was wrong: (#588) was appended after (#203) instead of prepended to the top of ## Unreleased. Fixed by moving the (#588) entry to the first position under ## Unreleased, per convention that newest entries go first.

All pre-existing entries preserved and unchanged: (#473), (#495), (#203), (#494).

Quality gates: lint ✓, typecheck ✓, security_scan ✓, behave 57/57 scenarios ✓.

**F8 addressed** in commit `fb96d791`. The `(#495)` entry was never actually deleted — it was present in the branch at line 30, identical to master. The `git diff origin/master..HEAD -- CHANGELOG.md` showed a pure 12-line addition with zero deletions of existing entries. However, the entry ordering was wrong: `(#588)` was appended after `(#203)` instead of prepended to the top of `## Unreleased`. Fixed by moving the `(#588)` entry to the first position under `## Unreleased`, per convention that newest entries go first. All pre-existing entries preserved and unchanged: `(#473)`, `(#495)`, `(#203)`, `(#494)`. Quality gates: lint ✓, typecheck ✓, security_scan ✓, behave 57/57 scenarios ✓.
brent.edwards approved these changes 2026-03-06 22:30:53 +00:00
Dismissed
brent.edwards left a comment

Verification Review — Round 2 (Post-Fix)

Reviewed at HEAD: fb96d791 (3 commits: 77db78d7 feat, 13618fb8 fix F1-F7, fb96d791 fix F8)

All 8 findings from Review #2018 and Comment #55331 are verified fixed. No new bugs found. Quality gates pass clean.


Finding Disposition

ID Sev Finding Status Evidence
F1 P1 postgresql_analyzer.py was 618 lines (500 limit) FIXED Now 310 lines; helpers extracted to _postgresql_helpers.py (440 lines). Both under limit.
F2 P1 domain_analyzers.feature was 523 lines (500 limit) FIXED Split into 3 files: domain_analyzers.feature (217), postgresql_analyzer.feature (201), docker_compose_analyzer.feature (199). All under limit.
F3 P1 Parentheses in SQL string literals silently truncate extract_body() FIXED extract_body() and split_entries() in _postgresql_helpers.py now track in_string state, handle '' escape sequences. Two regression BDD scenarios added: "DEFAULT with parentheses in string literal" and "CHECK constraint with string containing parentheses". Both pass.
F4 P2 # type: ignore[arg-type] in domain_analyzer_steps.py:124 FIXED Replaced with static protocol assertion: _dup: type[AnalyzerProtocol] = _DuplicatePyAnalyzer; del _dup. No type: ignore anywhere in the file.
F5 P2 YAML alias expansion (billion-laughs) via yaml.safe_load FIXED _MAX_COMPOSE_BYTES = 1_048_576 (1 MiB) guard added in docker_compose_analyzer.py before yaml.safe_load() call. Returns [] with warning log if content exceeds limit.
F6 P2 Quoted SQL keyword columns (e.g. "primary") incorrectly filtered FIXED parse_table_body() now checks not raw_first.startswith('"') before keyword matching, so quoted identifiers bypass the filter. Regression BDD scenario "Quoted keyword column names are not silently dropped" tests "primary", "check", and normal_col. Passes.
F7 P2 Out-of-scope changes (M4 E2E test removal, scoped_session, CONTRIBUTORS.md) FIXED Branch rebased onto current master (23803f14). PR now contains exactly 12 files, all in-scope: 5 new source/helper files, 3 feature files, 1 step file, 2 robot files, and expected modifications to __init__.py, vulture_whitelist.py, CHANGELOG.md.
F8 P1 CHANGELOG deletes pre-existing #495 entry FIXED #588 entry placed at top of Unreleased section. #495 entry and all other pre-existing entries intact.

Quality Gates

Gate Result
nox -s lint PASS — 0 errors
nox -s typecheck PASS — 0 errors, 0 warnings
BDD scenarios (3 analyzer features) PASS — 3 features, 57 scenarios, 211 steps, 0 failures

New Bug Scan

Reviewed all code changes introduced by fix commits 13618fb8 and fb96d791:

  • extract_body() / split_entries() string tracking: Forward-progress guaranteed by i += 1 at loop end; '' escape handled with i += 2 + continue; unterminated strings gracefully return empty. Clean.
  • parse_table_body() keyword filter: startswith('"') check correctly short-circuits for quoted identifiers before keyword comparison. No false positives or negatives in edge cases.
  • Size guard in DockerComposeAnalyzer: len(content) measures characters (not bytes), but the naming _MAX_COMPOSE_BYTES is slightly imprecise. Functionally correct since byte count ≥ char count for UTF-8, making the guard conservative. Non-blocking observation.
  • Protocol assertion pattern: _dup: type[AnalyzerProtocol] = _DuplicatePyAnalyzer; del _dup is idiomatic and has zero runtime cost. Clean.
  • Feature file split: All scenarios preserved, regression scenarios added. Scenario counts align with test run output.

No new bugs found.

Verdict

All P1 and P2 findings from Round 1 are resolved. Code is clean, well-structured, and fully tested. Approved for merge.

## Verification Review — Round 2 (Post-Fix) **Reviewed at HEAD:** `fb96d791` (3 commits: `77db78d7` feat, `13618fb8` fix F1-F7, `fb96d791` fix F8) All 8 findings from Review #2018 and Comment #55331 are **verified fixed**. No new bugs found. Quality gates pass clean. --- ### Finding Disposition | ID | Sev | Finding | Status | Evidence | |:---|:----|:--------|:-------|:---------| | F1 | P1 | `postgresql_analyzer.py` was 618 lines (500 limit) | **FIXED** | Now 310 lines; helpers extracted to `_postgresql_helpers.py` (440 lines). Both under limit. | | F2 | P1 | `domain_analyzers.feature` was 523 lines (500 limit) | **FIXED** | Split into 3 files: `domain_analyzers.feature` (217), `postgresql_analyzer.feature` (201), `docker_compose_analyzer.feature` (199). All under limit. | | F3 | P1 | Parentheses in SQL string literals silently truncate `extract_body()` | **FIXED** | `extract_body()` and `split_entries()` in `_postgresql_helpers.py` now track `in_string` state, handle `''` escape sequences. Two regression BDD scenarios added: "DEFAULT with parentheses in string literal" and "CHECK constraint with string containing parentheses". Both pass. | | F4 | P2 | `# type: ignore[arg-type]` in `domain_analyzer_steps.py:124` | **FIXED** | Replaced with static protocol assertion: `_dup: type[AnalyzerProtocol] = _DuplicatePyAnalyzer; del _dup`. No `type: ignore` anywhere in the file. | | F5 | P2 | YAML alias expansion (billion-laughs) via `yaml.safe_load` | **FIXED** | `_MAX_COMPOSE_BYTES = 1_048_576` (1 MiB) guard added in `docker_compose_analyzer.py` before `yaml.safe_load()` call. Returns `[]` with warning log if content exceeds limit. | | F6 | P2 | Quoted SQL keyword columns (e.g. `"primary"`) incorrectly filtered | **FIXED** | `parse_table_body()` now checks `not raw_first.startswith('"')` before keyword matching, so quoted identifiers bypass the filter. Regression BDD scenario "Quoted keyword column names are not silently dropped" tests `"primary"`, `"check"`, and `normal_col`. Passes. | | F7 | P2 | Out-of-scope changes (M4 E2E test removal, scoped_session, CONTRIBUTORS.md) | **FIXED** | Branch rebased onto current master (`23803f14`). PR now contains exactly 12 files, all in-scope: 5 new source/helper files, 3 feature files, 1 step file, 2 robot files, and expected modifications to `__init__.py`, `vulture_whitelist.py`, `CHANGELOG.md`. | | F8 | P1 | CHANGELOG deletes pre-existing #495 entry | **FIXED** | `#588` entry placed at top of Unreleased section. `#495` entry and all other pre-existing entries intact. | ### Quality Gates | Gate | Result | |:-----|:-------| | `nox -s lint` | **PASS** — 0 errors | | `nox -s typecheck` | **PASS** — 0 errors, 0 warnings | | BDD scenarios (3 analyzer features) | **PASS** — 3 features, 57 scenarios, 211 steps, 0 failures | ### New Bug Scan Reviewed all code changes introduced by fix commits `13618fb8` and `fb96d791`: - **`extract_body()` / `split_entries()` string tracking**: Forward-progress guaranteed by `i += 1` at loop end; `''` escape handled with `i += 2` + `continue`; unterminated strings gracefully return empty. Clean. - **`parse_table_body()` keyword filter**: `startswith('"')` check correctly short-circuits for quoted identifiers before keyword comparison. No false positives or negatives in edge cases. - **Size guard in DockerComposeAnalyzer**: `len(content)` measures characters (not bytes), but the naming `_MAX_COMPOSE_BYTES` is slightly imprecise. Functionally correct since byte count ≥ char count for UTF-8, making the guard conservative. Non-blocking observation. - **Protocol assertion pattern**: `_dup: type[AnalyzerProtocol] = _DuplicatePyAnalyzer; del _dup` is idiomatic and has zero runtime cost. Clean. - **Feature file split**: All scenarios preserved, regression scenarios added. Scenario counts align with test run output. **No new bugs found.** ### Verdict All P1 and P2 findings from Round 1 are resolved. Code is clean, well-structured, and fully tested. **Approved for merge.**
brent.edwards left a comment

Approved for merge!

Approved for merge!
hamza.khyari deleted branch feature/m6-acms-domain-specific-analyzers 2026-03-06 23:07:35 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!611
No description provided.