
Full Report · GIMS Compliance Relay · refactor + Part 11 hardening

From cosmetic compliance to a tamper-evident 21 CFR Part 11 relay

A ground-up rebuild of a copy-paste-grown FastAPI + Tauri app — whose compliance guarantees were largely decorative — into a lean, defensible Part 11 logging service: one append-only HMAC-chained trail, non-forgeable attribution, real two-component e-signatures, provider-neutral storage, and a two-tier edge → server-of-record custody model. Then a second pass that closed the twelve lagging §11.x edges the design described but the code hadn't yet delivered.

The constraint that shaped it: Part 11 defensibility first. The integrity spine was hand-built and hand-tested; breadth was built by multi-agent workflows; and every claim of "this control actually holds" was put through an adversarial verification pass whose only job was to refute it — which is how the two deployment-breaking bugs in this report were caught before shipping.

Package gims_relay/ · standardized on port 8400 · 134 tests pass on live Postgres (0 skipped) · 122 + 12 PG-skip on the zero-dependency SQLite stack · 2026-06-23 → 2026-06-27

01 Snapshot

The product captures regulated operational events (primarily instrument/NMR file outputs) into an append-only, tamper-evident, attributable trail. The refactor kept what worked — the storage resolver, the file watcher, the desktop shell, the dormant AWS stack — and rebuilt what didn't actually deliver its claims: identity, the checksum chain, write enforcement, and e-signatures.

  • 134 tests pass on live Postgres (0 skipped)
  • 12 / 12 lagging Part 11 points closed (H1–H4, M1–M6, L1–L3)
  • 5 neutral storage ports (core imports no cloud SDK)
  • 2 deployment tiers: edge · server-of-record
  • SQLSTATE 42501: writes denied by GRANT on live PG
  • 3 adversarial workflows (19 · 42 · 13 agents)

Where it landed

A self-contained gims_relay/ package: one unified event_log trail, HMAC-SHA256 chained over prev_checksum + chain_seq; server-side non-forgeable attribution; real signed-JWT login + two-component e-signatures bound to the records they sign; sealed, offline-verifiable exports; a hardened file watcher with no silent capture gaps; provider-neutral ports with local (Postgres/SQLite + MinIO/FS) and AWS adapters live and Azure/GCP stubs proving the seam; a durable idempotent edge→server forwarder; scheduled notarization; per-provider Terraform; and CI (including a LocalStack AWS-adapter job and a real-Postgres job).

02 The starting point: compliance that was cosmetic

Derived from the parent GIMS-Project and grown by iterative copy-paste, the app looked like a Part 11 system. Underneath, the load-bearing guarantees didn't hold. Every item below was a real defect in the shipped code, not a hypothetical.

Attribution was forgeable · crit

  • Unsigned fake JWTs (alg:none) decoded without verification (_decode_jwt_no_verify)
  • X-User-Id header trusted raw from the client
  • A no-token append silently wrote user_id="unknown"

The chain was decorative · crit

  • checksum was computed without prev_checksum — records weren't actually chained
  • Unkeyed SHA-256: anyone with file access could recompute a "valid" chain
  • Immutability rested only on SQLite triggers, in a file the operator owns

Enforcement & signing were absent · high

  • /compliance/log/append bypassed the guard; the watcher posted to it server-side
  • The e-signature node re-authed against a login that didn't exist in the wired build
  • Two rules nodes registered the same route; the RBAC guard was unregistered

Confidentiality & hygiene · med

  • Read/export endpoints fully public and unlogged; audit_log never written
  • Local-clock timestamps; schema DDL defined twice, divergently
  • Committed secrets, DBs, and the full build bundle in the repo

Seven locked decisions (2026-06-23):

  • Central Postgres server-of-record target with a working local/offline mode
  • OS-bound identity for automatic capture, but real password re-auth for human e-signatures (§11.200)
  • Simplify to a lean service (delete the orchestration framework)
  • Part 11 defensibility first
  • Merge the two logs into one append-only signed event_log
  • The realistic unit is one workstation / one instrument
  • A provider-neutral storage seam: ship local + AWS now, Azure/GCP as documented stubs

03 Timeline: how it unfolded

Five days, in three movements: build the integrity spine and the surrounding service, audit it hard for robustness, then close the compliance edges the first pass had left behind.

06-23
CORE

Integrity spine + breadth build

The HMAC chain (over prev_checksum + chain_seq), canonical encoder, and verifier were hand-built and unit-tested (16 tests); the rest of the package — attribution, e-signatures, access control, sealed exports, the file watcher, the provider-neutral ports — was built by a 19-agent workflow. An adversarial verification pass found 6 real defects (1 refuted); all 6 fixed and locked with regression tests. 52 pass / 8 skip, live-verified under uvicorn including a forged-identity rejection.

06-23
SVC

Service, infra, and legacy retirement

Git hygiene (secrets/DBs/bundle untracked + rotated); a durable, ordered, resumable edge→server forwarder; per-provider Terraform + a CI pipeline with a LocalStack AWS-adapter job; a legacy compliance_log → event_log migration tool; the Tauri shell re-pointed at the new app; a served single-page UI; and the entire legacy tree retired so the package is self-contained.

06-23
HARDEN

Robustness audit — 42 agents

A vertical-depth workflow surfaced 36 candidates; 30 were confirmed worth fixing (3 critical). 29 were fixed, including a notary deadlock, a SQLite file-descriptor leak (verified at zero leaked descriptors over 600 operations), path traversal, secret-file perms, shared PG DSN + TLS, threadpool-offloaded async auth, truncated-capture guards, and lifespan migration. 1 LOW finding was deferred. 63 pass / 8 skip.

06-26
AUDIT

Fresh adversarial audit — the edges are behind the design

A clean audit found the integrity core solid but the custody / forwarding / notarization edges materially behind: the restricted DB role was never implemented, notarization was unscheduled, forwarding dropped attribution and never verified the edge chain (and couldn't even authenticate), and several §11.x account/time/key controls were stubbed. Twelve evidence-backed findings (H1–H4, M1–M6, L1–L3) recorded as a hardening punch-list.

06-27
P11

Part 11 hardening — all 12 closed, then adversarially re-verified

~1,490 insertions across 38 files (10 new) closed every finding. A 13-agent adversarial verification then refuted the first M3 fix and surfaced a deployment-breaking self-event/forwarded-trail collision; both were fixed before this report. Final suite: 134 pass on live Postgres (0 skipped); 122 pass + 12 PG-only skips on SQLite. The parent GIMS-Project is adopting this integrity core.

04 What got built

Integrity core

  • One unified event_log — event kind is a column, not a table
  • HMAC-SHA256 over the canonical record including prev_checksum + chain_seq
  • Standalone verifier module + CLI; gap-detectable per-trail ordinals
  • Append-only via DB triggers plus a least-privilege role (defense in depth)
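
The chaining idea in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not the gims_relay implementation: the JSON canonical encoding, the all-zero genesis checksum, the record layout, and the function names (canonical, seal, append, verify) are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed sentinel prev_checksum for the first record


def canonical(body: dict) -> bytes:
    """Deterministic encoding: sorted keys, no insignificant whitespace."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(key: bytes, payload: dict, prev_checksum: str, chain_seq: int) -> dict:
    # The MAC covers the payload AND the link fields, so a record cannot be
    # reordered, dropped, or re-parented without invalidating its checksum.
    body = {"payload": payload, "prev_checksum": prev_checksum, "chain_seq": chain_seq}
    checksum = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return {**body, "checksum": checksum}


def append(key: bytes, trail: list, payload: dict) -> dict:
    prev = trail[-1]["checksum"] if trail else GENESIS
    record = seal(key, payload, prev, len(trail) + 1)
    trail.append(record)
    return record


def verify(key: bytes, trail: list) -> bool:
    prev = GENESIS
    for expected_seq, rec in enumerate(trail, start=1):
        # Gap detection: ordinals must be contiguous and links must match.
        if rec["chain_seq"] != expected_seq or rec["prev_checksum"] != prev:
            return False
        resealed = seal(key, rec["payload"], rec["prev_checksum"], rec["chain_seq"])
        if not hmac.compare_digest(resealed["checksum"], rec["checksum"]):
            return False
        prev = rec["checksum"]
    return True
```

Because the checksum is keyed, someone with file access but no key cannot recompute a valid chain after editing a record, which is the difference from the unkeyed SHA-256 chain described earlier.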

Identity & signatures

  • Server-side OS-bound attribution; alg:none / raw X-User-Id trust removed
  • Unattributed events are rejected, never written as "unknown"
  • Real signed-JWT password login (Argon2/PBKDF2-600k)
  • Two-component e-signatures (§11.200) capturing signer, meaning, reason, and time — bound to the target record's checksum
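
As a rough sketch of the two-component signature and its binding check: the signer is identified by account name, re-authenticated by password, and the signature is refused unless the target checksum exists in the same trail. The account store shape, the SignatureError type, and the exact fields are assumptions for illustration; PBKDF2 is used here only because it is in the standard library.

```python
import hashlib
import hmac
from datetime import datetime, timezone


class SignatureError(Exception):
    """Raised when a signature must be refused (hypothetical name)."""


def hash_password(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def sign(trail, accounts, username, password, target_checksum, meaning, reason):
    # Component 1: identification (the account must exist).
    account = accounts.get(username)
    if account is None:
        raise SignatureError("unknown signer")
    # Component 2: password re-authentication at signing time.
    candidate = hash_password(password, account["salt"], account["iterations"])
    if not hmac.compare_digest(candidate, account["hash"]):
        raise SignatureError("re-authentication failed")
    # Binding: the signed record must be present in this trail.
    if not any(rec["checksum"] == target_checksum for rec in trail):
        raise SignatureError("target_checksum not present in this trail")
    return {
        "kind": "SIGNATURE",
        "signer": username,
        "meaning": meaning,
        "reason": reason,
        "signed_at": datetime.now(timezone.utc).isoformat(),
        "target_checksum": target_checksum,
    }
```

In a chained trail the returned dict would itself be appended as an event, so the signature inherits the trail's tamper evidence.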

Capture, access & export

  • File watcher: content-addressed dedup, initial downtime-window scan, crash auto-restart with backoff, CAPTURE_FAILED markers — no silent gaps
  • Auth + ACCESS/EXPORT logging on every read/export/verify
  • Sealed exports binding the exact rows (rows_digest) and CSV bytes (rendered_sha256); offline verifier re-checks the package
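
One way to picture the sealed export: a digest over the canonical rows, a digest over the exact CSV bytes, and a keyed seal over both. Only the field names rows_digest and rendered_sha256 come from the report; the manifest layout, the HMAC seal, and the function names below are assumptions.

```python
import csv
import hashlib
import hmac
import io
import json

MANIFEST_FIELDS = ("row_count", "rows_digest", "rendered_sha256")


def rows_digest(rows: list) -> str:
    encoded = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def render_csv(rows: list, columns: list) -> bytes:
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def _seal(key: bytes, manifest: dict) -> str:
    encoded = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, encoded, hashlib.sha256).hexdigest()


def seal_export(key: bytes, rows: list, columns: list):
    rendered = render_csv(rows, columns)
    manifest = {
        "row_count": len(rows),
        "rows_digest": rows_digest(rows),
        "rendered_sha256": hashlib.sha256(rendered).hexdigest(),
    }
    return rendered, {**manifest, "seal": _seal(key, manifest)}


def verify_export(key: bytes, rows: list, rendered: bytes, package: dict) -> bool:
    # Offline check: recompute everything from the delivered artifacts.
    manifest = {name: package[name] for name in MANIFEST_FIELDS}
    return (
        hmac.compare_digest(_seal(key, manifest), package["seal"])
        and manifest["row_count"] == len(rows)
        and manifest["rows_digest"] == rows_digest(rows)
        and manifest["rendered_sha256"] == hashlib.sha256(rendered).hexdigest()
    )
```

Binding both digests means a reviewer can detect either a changed row set or a CSV that was re-rendered or edited after export.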

Provider-neutral platform

  • Ports: SecretProvider · RecordStore · ObjectStore · Notary · TimeSource
  • Adapters: local (Postgres/SQLite + MinIO/FS) & AWS (Secrets Manager, RDS, S3 Object-Lock) live; Azure/GCP stubs prove the seam
  • Two tiers from one codebase via RELAY_MODE=local|server; durable idempotent forwarder
  • Notarization (filesystem + S3 object-lock); per-provider Terraform; docker-compose dev = valid on-prem deploy
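
The seam can be illustrated with one port and two adapters. This is a sketch only: ObjectStore is a port name from the report, but the method names, the in-memory adapter, and the stub class name are hypothetical. It also mirrors the L2 fix described later, where a stub can be constructed and only method use raises.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Port: the core depends on this interface, never on a cloud SDK."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def list_keys(self, prefix: str = "") -> list[str]: ...


class InMemoryObjectStore(ObjectStore):
    """Stand-in for a local adapter; a real one would wrap MinIO or the filesystem."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


class AzureObjectStoreStub(ObjectStore):
    """Documented stub: constructible, but every operation raises."""

    def put(self, key: str, data: bytes) -> None:
        raise NotImplementedError("Azure adapter is a stub")

    def get(self, key: str) -> bytes:
        raise NotImplementedError("Azure adapter is a stub")

    def list_keys(self, prefix: str = "") -> list[str]:
        raise NotImplementedError("Azure adapter is a stub")


def conforms(adapter_cls: type) -> bool:
    # A class conforms if it subclasses the port and leaves nothing abstract.
    return issubclass(adapter_cls, ObjectStore) and not getattr(
        adapter_cls, "__abstractmethods__", frozenset()
    )
```

A conformance test can then build every adapter, stubs included, and assert that each one satisfies the port before any behavior is exercised.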

05 Part 11 hardening: the twelve lagging edges

The first build's honest "Remaining" list was understated. A fresh adversarial audit enumerated exactly where a compliance claim in the design was not yet matched by the code — stubbed, unscheduled, unverified, or weaker than described. All twelve were closed in phases R1–R4.

ID | Sev | Finding → what closed it | Status
H1 | high | Restricted DB role never implemented; the app ran with full write. → Two-role custody: runtime role holds SELECT,INSERT only; a separate admin DSN runs DDL. Grant-denial proven on live PG (SQLSTATE 42501). | done
H2 | high | Notarization was append-count-triggered, unscheduled, operator-owned. → NotaryScheduler anchors every trail head on an interval and on shutdown; failures emit a SYSTEM event; local-without-WORM logs a startup warning. | done
H3 | high | Forwarding dropped attribution/time/signature and never verified the edge chain. → Forward the whole signed record; a new authenticated ingest recomputes the edge HMAC, enforces a gapless link, and persists verbatim. | done
H4 | high | Forwarder couldn't authenticate to a server-mode ingest. → Service credential (GIMS_FORWARD_TOKEN/GIMS_INGEST_TOKEN), compared in constant time, fail-closed; startup warning if a forward URL is set without a token. | done
M1 | med | Local-mode keys sat in a plaintext 0600 file. → Opt-in KeyringSecretProvider (DPAPI/Keychain/libsecret) with runtime fallback + a loud warning so the downgrade is never silent; the 0600 file is the documented residual. | done
M2 | med | §11.70 signature binding was cosmetic. → sign() rejects any target_checksum not present in the same trail. | done
M3 | med | A poison event head-of-line-blocked the forwarder. → Stop-on-poison: a genuine 4xx is dead-lettered + raises a SYSTEM event and forwarding halts (never skips, which would gap the replica); 401/403/408/429/5xx are retryable. | done
M4 | med | Trusted time was just the local clock. → NtpValidatedTimeSource labels system-vs-validated time + offset; a startup posture event; edge↔server skew beyond a threshold emits a SYSTEM warning. | done
M5 | med | Open self-service registration. → Bootstrap-then-authenticate: first account is open, then registration needs an authenticated actor; every creation is an audited REGISTER event; a flag can lock it entirely. | done
M6 | med | Postgres, the server-of-record engine, was entirely unverified. → A CI Postgres job runs the conformance + grant-denial suites; GIMS_REQUIRE_PG=1 makes the PG fixtures fail, not skip. | done
L1 | low | No lockout on login / sign (§11.300). → In-memory LoginThrottle: per-account attempt throttling + temporary lockout (429 + Retry-After); a lockout emits a SYSTEM event. | done
L2 | low | Azure/GCP stub constructors raised, so the seam was untested. → Construction allowed (only method use raises); a port-conformance test builds each stub against its ABC. (L3: the "periodic" docstring corrected.) | done
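
To give the L1 row some shape, here is a minimal in-memory throttle of the kind described: per-account failure counting with a temporary lockout. The class name LoginThrottle comes from the table; the constructor parameters, method names, and injectable clock are assumptions, and the real class may track attempts differently.

```python
import time
from typing import Callable


class LoginThrottle:
    """Per-account failed-attempt counter with a temporary lockout."""

    def __init__(
        self,
        max_attempts: int = 5,
        lockout_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_attempts = max_attempts
        self._lockout_seconds = lockout_seconds
        self._clock = clock
        self._failures: dict[str, int] = {}
        self._locked_until: dict[str, float] = {}

    def retry_after(self, account: str) -> float:
        """Seconds until the account may try again; 0.0 means not locked."""
        until = self._locked_until.get(account)
        if until is None:
            return 0.0
        return max(0.0, until - self._clock())

    def record_failure(self, account: str) -> bool:
        """Count a failed attempt; return True if this one triggered a lockout."""
        count = self._failures.get(account, 0) + 1
        if count >= self._max_attempts:
            self._locked_until[account] = self._clock() + self._lockout_seconds
            self._failures.pop(account, None)
            return True
        self._failures[account] = count
        return False

    def record_success(self, account: str) -> None:
        self._failures.pop(account, None)
        self._locked_until.pop(account, None)
```

A login or sign handler would call retry_after first and, if it is positive, answer 429 with a Retry-After header; a True return from record_failure is the natural point to emit the SYSTEM lockout event. Being in-memory, this state resets on restart.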

06 How the claims were verified

The whole product is integrity, so verification was adversarial by construction: every "this control holds" claim faced an agent whose only job was to refute it, reading the real code and tests. That is how the two most dangerous bugs in the build were caught.

The two bugs the adversarial pass caught

  • Server self-events collided with forwarded trails (deployment-breaking). The server wrote its own startup/notary/time events, including the M4 clock_skew event, into the forwarded trail. Under H3's strictly gapless ingest check, each such write advanced the server head between forwarded records, so every later forwarded record was rejected with 422: the M4 feature broke the H3/M3 forward stream on its own. Fix: all relay-process events now go to a reserved _relay trail; the server never writes into a forwarded trail, so an edge trail stays a verbatim, gapless replica.
  • M3 "advance past poison" cascaded a silent gap. Skipping a rejected record breaks the gapless replica for every successor. Fix: stop-on-poison + dead-letter + SYSTEM event; the test that "passed" used a gap-accepting mock and now faithfully proves stop-not-cascade.
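
The stop-on-poison rule can be sketched as a status classifier plus a loop that never advances past a failure. The retryable status set follows the M3 row in the table; the function names, the callable-based transport, and the list used as a dead-letter queue are hypothetical, and the real forwarder also persists its position and raises a SYSTEM event.

```python
from typing import Callable, Iterable

RETRYABLE_4XX = {401, 403, 408, 429}


def classify(status: int) -> str:
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE_4XX or status >= 500:
        return "retry"
    if 400 <= status < 500:
        return "poison"
    # Anything unexpected (e.g. a redirect): do not advance, do not dead-letter.
    return "retry"


def forward_batch(
    records: Iterable[dict],
    send: Callable[[dict], int],
    dead_letter: list,
) -> tuple:
    """Forward records in order; halt at the first record that is not accepted."""
    forwarded = 0
    for record in records:
        outcome = classify(send(record))
        if outcome == "ok":
            forwarded += 1
            continue
        if outcome == "poison":
            dead_letter.append(record)
        # Halt on retry AND on poison: skipping would gap the replica.
        return forwarded, outcome
    return forwarded, "ok"
```

The key design choice is that a poison record stops the stream rather than being skipped, so an operator resolves it explicitly and every successor still links to an unbroken predecessor.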

Live, not mocked

  • Restricted role denies UPDATE/DELETE/TRUNCATE by grant on real Postgres 16 (SQLSTATE 42501); the denial fires before, and is distinct from, the append-only trigger (P0001)
  • End-to-end under uvicorn, including a forged-identity rejection
  • AWS adapter exercised against LocalStack in CI — secrets, connection shaping, object put/get/list — with no real account

Test baseline, tracked

  • 52 → 63 → 134 passing; the final 134 run on live Postgres with 0 skipped (the 52 and 63 baselines each carried 8 skips)
  • 122 passing + 12 Postgres-only tests skipped on the zero-dependency SQLite-only stack
  • Adapter-conformance suite runs the same tests against every RecordStore/ObjectStore/SecretProvider
  • Multi-agent passes: 19 (breadth) · 42 (robustness, 30 confirmed) · 13 (hardening verification)

Parent-project cross-check

The parent GIMS-Project (a multi-user web LIMS) is adopting this Relay's integrity core and independently implemented the same two-role custody (H1 / its P8) and NTP time (M4 / its P9) — so this work confirms its direction rather than contradicting it. The forwarding items and the _relay trail are Relay-only edge concerns the parent scopes out.

07 The road not taken: OS-delegated signing

One design decision is worth recording because it shows the compliance reasoning, not just the code: how to let more than one human sign. We shipped Option 1 (app-managed accounts + a Manage-Operators surface) and deliberately deferred Option 2 (sign with your OS/Windows credential). The trigger for revisiting it is a specific customer request, not a technical whim.

Why a customer might want it

  • One credential — the e-signature password is the Windows password they already type
  • No manual provisioning; offboarding is owned by IT/Active Directory
  • Single source of truth for identity on AD/SSO-invested sites

Why it's off by default

  • It inverts the load-bearing invariant: OS identity is resolved server-side and never asserted over the wire — Option 2 opens a brand-new trust boundary
  • Unsafe on shared/local-login workstations — collapses two humans into one identity (violates §11.300(a) / §11.200)
  • Platform fragility (LogonUser/PAM privilege needs) and an inherited §11.300 password-policy burden that must be validated per deployment

Decision: not building now. If requested, start from the hybrid shape: keep app_user as the canonical signing identity, add an explicit audited os_identity ⇄ app_user binding, and gate an OS-reauth branch behind a per-deployment flag (GIMS_SIGN_OS_DELEGATED, default off), treating the network-trust inversion and the shared-workstation exclusion as non-negotiable acceptance criteria.

08 Residual risks & what's left

Stated plainly, because a compliance product that hides its residuals isn't defensible.

Documented residual risks (not bugs)

  • Forwarded-path attribution trust rests on the shared trail HMAC key — the server proves a key-holder sealed a record, not which human.
  • Local-mode notary is an operator-owned filesystem anchor unless S3/MinIO object-lock is configured (no external WORM).
  • Audit/integrity SYSTEM events follow the codebase's best-effort pattern — but their absence is itself detectable via chain verification.

Status: defensible end-to-end

The integrity core, the twelve Part 11 hardening points (H1–L3), and the two-tier custody model are complete and adversarially verified. The product remains — accurately — "designed to support 21 CFR Part 11"; formal computer-system validation (CSV) is the customer's responsibility, as intended.