feat(registry): implement canonicalization engine — NFC, key sorting, RFC-8785, SHA-1 #25

Open
opened 2026-06-05 17:34:02 +00:00 by CoreRasurae · 1 comment
Member

Metadata

Commit Message: feat(registry): implement Canonicalizer with NFC normalization and SHA-1 hashing
Branch: feature/m1-registry-canonicalization

Background and context

The Package Registry Standard v1.0.0 (§6) defines a strict canonicalization process for producing deterministic SHA-1 hashes from package content. This ensures content-addressed immutability: the same content always produces the same PackageId. The process consists of NFC normalization, lexicographic key sorting, lifecycle field stripping, dependency resolution, and RFC-8785 JSON serialization.

Part of Epic: Package Registry Client — Support Package Registry Standard v1.0.0

Current behavior

No canonicalization or SHA-1 hashing exists.

Expected behavior

  • Canonicalizer.canonicalize(content: dict) -> str produces deterministic RFC-8785 JSON
  • Canonicalizer.compute_package_id(content: dict, package_type: PackageType) -> PackageId computes SHA-1 + formats ID
  • NFC normalization applied to all string values
  • Dictionary keys sorted lexicographically
  • Lifecycle fields (version, release_date) stripped before hashing
  • Internal references resolved to concrete PackageIds in canonical form
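The behaviors listed above can be sketched roughly as follows. This is a minimal illustration, not the Canonicalizer implementation. The module-level function names, the top-level-only stripping of lifecycle fields, and the choice to normalize values but not keys are all assumptions. `json.dumps` with `sort_keys=True` only approximates RFC 8785, which also prescribes ECMAScript number formatting and key ordering by UTF-16 code units. Internal reference resolution and PackageId formatting are omitted because their rules live in the standard and are not stated in this issue.

```python
import hashlib
import json
import unicodedata

# Assumption: lifecycle fields are stripped at the top level only.
LIFECYCLE_FIELDS = frozenset({"version", "release_date"})


def _nfc(value):
    """Recursively apply NFC normalization to every string value."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, dict):
        return {key: _nfc(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_nfc(item) for item in value]
    return value


def canonicalize(content: dict) -> str:
    """Return a deterministic JSON string (an approximation of RFC 8785)."""
    stripped = {k: v for k, v in content.items() if k not in LIFECYCLE_FIELDS}
    return json.dumps(
        _nfc(stripped),
        sort_keys=True,          # lexicographic key order at every nesting level
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # emit non-ASCII characters as UTF-8, not \u escapes
        allow_nan=False,         # NaN and Infinity are not valid JSON
    )


def content_sha1(content: dict) -> str:
    """Hash the canonical form; formatting a PackageId from it is out of scope."""
    return hashlib.sha1(canonicalize(content).encode("utf-8")).hexdigest()


a = {"name": "demo", "version": "1.0.0", "deps": ["x"]}
b = {"deps": ["x"], "release_date": "2026-01-01", "name": "demo"}
print(canonicalize(a))                     # {"deps":["x"],"name":"demo"}
print(content_sha1(a) == content_sha1(b))  # True
```

The two dictionaries differ only in key order and lifecycle fields, so they share one canonical form and therefore one digest.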

Acceptance criteria

  • Same content always produces the same PackageId
  • Key ordering does not affect the hash
  • Lifecycle fields do not affect the hash
  • NFC normalization eliminates encoding variants
  • Test vectors from §14 produce expected SHA-1 values
  • Different content produces different PackageIds
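To illustrate the NFC criterion, the sketch below hashes two Unicode encodings of the same visible text before and after normalization. The sample string and the `sha1_hex` helper are illustrative assumptions and are not taken from the §14 test vectors.

```python
import hashlib
import unicodedata

composed = "caf\u00e9"     # "é" as a single code point (U+00E9)
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent (U+0301)


def sha1_hex(text: str) -> str:
    """Hypothetical helper: SHA-1 hex digest of the UTF-8 encoding of text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Without normalization the byte sequences differ, so the digests differ.
raw_match = sha1_hex(composed) == sha1_hex(decomposed)

# After NFC both strings collapse to the composed form and hash identically.
nfc_match = sha1_hex(unicodedata.normalize("NFC", composed)) == sha1_hex(
    unicodedata.normalize("NFC", decomposed)
)

print(raw_match, nfc_match)  # False True
```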

Subtasks

  • Create src/cleveractors/registry/canonical.py with Canonicalizer class
  • Implement NFC normalization for all string values
  • Implement lexicographic key sorting
  • Implement lifecycle field stripping
  • Implement RFC-8785 JSON serialization
  • Implement SHA-1 hashing via hashlib.sha1()
  • Internal reference resolution in canonical form
  • Tests (Behave): features/registry_canonicalization.feature — test vectors, edge cases
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line matches the Commit Message in Metadata exactly.
  • The commit is pushed to the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a PR to master, reviewed, and merged.
CoreRasurae added this to the v2.1.0 milestone 2026-06-05 17:34:02 +00:00
CoreRasurae changed title from Canonicalization engine — NFC, key sorting, RFC-8785, SHA-1 to feat(registry): implement canonicalization engine — NFC, key sorting, RFC-8785, SHA-1 2026-06-05 17:47:17 +00:00
Author
Member

Implemented in PR #36: https://git.cleverthis.com/cleveragents/cleveractors-core/pulls/36

PR blocks this issue.

Implementation Summary

  • Created src/cleveractors/registry/canonical.py with Canonicalizer class
  • NFC normalization, key sorting, lifecycle stripping, RFC-8785 JSON, SHA-1 hashing
  • 23 Behave BDD scenarios passing
  • canonical.py: 100% line coverage
  • Overall: 97.04% coverage

All quality gates:

  • lint: pass
  • typecheck: pass
  • security_scan: pass
  • dead_code: pass
  • unit_tests: 1926 scenarios pass (0 failed)
  • coverage_report: 97.04% (threshold: 96.5%)
Reference
cleveragents/cleveractors-core#25