Full Report · GIMS Compliance Relay · refactor + Part 11 hardening
A ground-up rebuild of a copy-paste-grown FastAPI + Tauri app — whose compliance guarantees were largely decorative — into a lean, defensible Part 11 logging service: one append-only HMAC-chained trail, non-forgeable attribution, real two-component e-signatures, provider-neutral storage, and a two-tier edge → server-of-record custody model. Then a second pass that closed the twelve lagging §11.x edges the design described but the code hadn't yet delivered.
The product captures regulated operational events (primarily instrument/NMR file outputs) into an append-only, tamper-evident, attributable trail. The refactor kept what worked — the storage resolver, the file watcher, the desktop shell, the dormant AWS stack — and rebuilt what didn't actually deliver its claims: identity, the checksum chain, write enforcement, and e-signatures.
A self-contained gims_relay/ package: one unified event_log
trail, HMAC-SHA256 chained over prev_checksum + chain_seq; server-side
non-forgeable attribution; real signed-JWT login + two-component e-signatures bound to the records
they sign; sealed, offline-verifiable exports; a hardened file watcher with no silent capture gaps;
provider-neutral ports with local (Postgres/SQLite + MinIO/FS) and AWS adapters live and
Azure/GCP stubs proving the seam; a durable idempotent edge→server forwarder; scheduled notarization;
per-provider Terraform; and CI (including a LocalStack AWS-adapter job and a real-Postgres job).
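
To make the chaining claim concrete, here is a minimal sketch of an HMAC-SHA256 chain over prev_checksum + chain_seq with a matching verifier. It is not the package's code: the names `canonical`, `link_checksum`, `append`, `verify`, and `GENESIS`, the JSON canonical encoding, and the length-prefixed message layout are all assumptions made for illustration.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous record"


def canonical(payload: dict) -> bytes:
    # Deterministic encoding (assumed): sorted keys, no insignificant whitespace.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def link_checksum(key: bytes, prev_checksum: str, chain_seq: int, payload: dict) -> str:
    # Length-prefix every field so field boundaries cannot be shifted.
    parts = (prev_checksum.encode(), str(chain_seq).encode(), canonical(payload))
    msg = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def append(trail: list, key: bytes, payload: dict) -> dict:
    # Each record commits to the previous checksum and its own position.
    prev = trail[-1]["checksum"] if trail else GENESIS
    seq = len(trail) + 1
    record = {
        "chain_seq": seq,
        "prev_checksum": prev,
        "payload": payload,
        "checksum": link_checksum(key, prev, seq, payload),
    }
    trail.append(record)
    return record


def verify(trail: list, key: bytes) -> bool:
    # Walk from the start; any edit, reorder, or deletion breaks a link.
    prev = GENESIS
    for expected_seq, rec in enumerate(trail, start=1):
        if rec["chain_seq"] != expected_seq or rec["prev_checksum"] != prev:
            return False
        want = link_checksum(key, prev, expected_seq, rec["payload"])
        if not hmac.compare_digest(want, rec["checksum"]):
            return False
        prev = rec["checksum"]
    return True
```

Because every checksum covers the previous checksum and the sequence number, editing a payload, reordering records, or deleting one from the middle makes `verify` return False from that point on.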
Derived from the parent GIMS-Project and grown by iterative copy-paste, the app
looked like a Part 11 system. Underneath, the load-bearing guarantees didn't hold. Every item
below was a real defect in the shipped code, not a hypothetical.
- … alg:none) decoded without verification (_decode_jwt_no_verify)
- X-User-Id header trusted raw from the client
- … user_id="unknown"
- checksum was computed without prev_checksum, so records weren't actually chained
- /compliance/log/append bypassed the guard; the watcher posted to it server-side
- audit_log never written

… event_log; the realistic unit is one workstation / one instrument; and a provider-neutral storage seam: ship local + AWS now, Azure/GCP as documented stubs.

Five days, in three movements: build the integrity spine and the surrounding service, audit it hard for robustness, then close the compliance edges the first pass had left behind.
The HMAC chain (over prev_checksum + chain_seq), canonical encoder, and
verifier were hand-built and unit-tested (16 tests); the rest of the package — attribution,
e-signatures, access control, sealed exports, the file watcher, the provider-neutral ports — was
built by a 19-agent workflow. An adversarial verification pass found 6 real defects (1 refuted);
all 6 fixed and locked with regression tests. 52 pass / 8 skip, live-verified under uvicorn
including a forged-identity rejection.
Git hygiene (secrets/DBs/bundle untracked + rotated); a durable, ordered, resumable edge→server
forwarder; per-provider Terraform + a CI pipeline with a LocalStack AWS-adapter job; a legacy
compliance_log → event_log migration tool; the Tauri shell re-pointed at the new app;
a served single-page UI; and the entire legacy tree retired so the package is self-contained.
A vertical-depth workflow surfaced 36 candidates; 30 were confirmed worth fixing (3 critical). 29 were fixed, including a notary deadlock, a SQLite fd leak (now proven zero over 600 ops), path traversal, secret-file perms, shared PG DSN + TLS, threadpool-offloaded async auth, truncated-capture guards, and lifespan migration. 1 LOW was deferred. 63 pass / 8 skip.
A clean audit found the integrity core solid but the custody / forwarding / notarization edges materially behind: the restricted DB role was never implemented, notarization was unscheduled, forwarding dropped attribution and never verified the edge chain (and couldn't even authenticate), and several §11.x account/time/key controls were stubbed. Twelve evidence-backed findings (H1–H4, M1–M6, L1–L3) recorded as a hardening punch-list.
~1,490 insertions across 38 files (10 new) closed every finding. A 13-agent adversarial
verification then refuted M3 and surfaced a deployment-breaking self-event/forwarded-trail
collision; both fixed before this report. Final suite 134 pass on live Postgres (0 skipped);
122 + 12 PG-skip on SQLite. The parent GIMS-Project is adopting this integrity core.
- … event_log: event kind is a column, not a table
- … prev_checksum + chain_seq
- … alg:none / raw X-User-Id trust removed
- … "unknown"
- … checksum
- … CAPTURE_FAILED markers: no silent gaps
- ACCESS/EXPORT logging on every read/export/verify
- … rows_digest) and CSV bytes (rendered_sha256); offline verifier re-checks the package
- SecretProvider · RecordStore · ObjectStore · Notary · TimeSource
- RELAY_MODE=local|server; durable idempotent forwarder
- docker-compose dev = valid on-prem deploy

The first build's honest "Remaining" list was understated. A fresh adversarial audit enumerated exactly where a compliance claim in the design was not yet matched by the code: stubbed, unscheduled, unverified, or weaker than described. All twelve were closed in phases R1–R4.
| ID | Sev | Finding → what closed it | Status |
|---|---|---|---|
| H1 | high | Restricted DB role never implemented; app ran with full write. → Two-role custody: runtime role holds SELECT,INSERT only; a separate admin DSN runs DDL. Grant-denial proven on live PG (SQLSTATE 42501). | done |
| H2 | high | Notarization was append-count-triggered, unscheduled, operator-owned. → NotaryScheduler anchors every trail head on an interval and on shutdown; failures emit a SYSTEM event; local-without-WORM logs a startup warning. | done |
| H3 | high | Forwarding dropped attribution/time/signature and never verified the edge chain. → Forward the whole signed record; a new authenticated ingest recomputes the edge HMAC, enforces a gapless link, and persists verbatim. | done |
| H4 | high | Forwarder couldn't authenticate to a server-mode ingest. → Service credential (GIMS_FORWARD_TOKEN/GIMS_INGEST_TOKEN), constant-time, fail-closed; startup warning if a forward URL is set without a token. | done |
| M1 | med | Local-mode keys sat in a plaintext 0600 file. → Opt-in KeyringSecretProvider (DPAPI/Keychain/libsecret) with runtime fallback + a loud silent-downgrade warning; the 0600 file is the documented residual. | done |
| M2 | med | §11.70 signature binding was cosmetic. → sign() rejects any target_checksum not present in the same trail. | done |
| M3 | med | A poison event head-of-line-blocked the forwarder. → Stop-on-poison: a genuine 4xx is dead-lettered + raises a SYSTEM event and forwarding halts (never skips, which would gap the replica); 401/403/408/429/5xx are retryable. | done |
| M4 | med | Trusted time was just the local clock. → NtpValidatedTimeSource labels system-vs-validated time + offset; a startup posture event; edge↔server skew beyond a threshold emits a SYSTEM warning. | done |
| M5 | med | Open self-service registration. → Bootstrap-then-authenticate: first account is open, then registration needs an authenticated actor; every creation is an audited REGISTER event; a flag can lock it entirely. | done |
| M6 | med | Postgres, the server-of-record engine, was entirely unverified. → A CI Postgres job runs the conformance + grant-denial suites; GIMS_REQUIRE_PG=1 makes the PG fixtures fail, not skip. | done |
| L1 | low | No lockout on login / sign (§11.300). → In-memory LoginThrottle: per-account attempt throttling + temporary lockout (429 + Retry-After); a lockout emits a SYSTEM event. | done |
| L2 | low | Azure/GCP stub constructors raised, so the seam was untested. → Construction allowed (only method use raises); a port-conformance test builds each stub against its ABC. (L3: the "periodic" docstring corrected.) | done |
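
The stop-on-poison rule in M3 can be sketched as a status classifier plus an ordered drain loop. The status sets come from the table row; the names `Outcome`, `classify`, and `drain`, the callback shapes, and the treatment of 1xx/3xx codes as transient are assumptions made for illustration.

```python
from enum import Enum


class Outcome(Enum):
    DELIVERED = "delivered"
    RETRY = "retry"    # transient: try the same event again later
    POISON = "poison"  # genuine 4xx: dead-letter and halt


# 4xx codes that M3 treats as retryable rather than poison.
RETRYABLE_4XX = frozenset({401, 403, 408, 429})


def classify(status: int) -> Outcome:
    if 200 <= status < 300:
        return Outcome.DELIVERED
    if status in RETRYABLE_4XX or status >= 500:
        return Outcome.RETRY
    if 400 <= status < 500:
        return Outcome.POISON
    # Assumption: anything else (1xx/3xx) is treated as transient.
    return Outcome.RETRY


def drain(queue, send, dead_letter) -> int:
    """Forward events in order; return how many were delivered.

    Halts at the first event that is not delivered, so the replica
    never receives a later record ahead of an earlier one.
    """
    delivered = 0
    for event in queue:
        outcome = classify(send(event))
        if outcome is Outcome.DELIVERED:
            delivered += 1
            continue
        if outcome is Outcome.POISON:
            dead_letter(event)  # the real relay also raises a SYSTEM event
        break  # never skip ahead: skipping would gap the replica
    return delivered
```

Halting instead of skipping trades availability for integrity: one bad record stops the stream, but the server-side trail never acquires a hole.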
The whole product is integrity, so verification was adversarial by construction: every "this control holds" claim faced an agent whose only job was to refute it, reading the real code and tests. That is how the two most dangerous bugs in the build were caught.
… clock_skew event, into the forwarded trail. Under H3's strictly-gapless ingest check, that bumped the server head between forwarded records and cascaded every later record to 422: the M4 feature literally self-destructed the H3/M3 forward stream. Fix: all relay-process events now go to a reserved _relay trail; the server never writes into a forwarded trail, so an edge trail stays a verbatim, gapless replica.

… SYSTEM event; the test that "passed" used a gap-accepting mock and now faithfully proves stop-not-cascade.

- … UPDATE/DELETE/TRUNCATE by grant on real Postgres 16 (SQLSTATE 42501), earlier than and distinct from the append-only trigger (P0001)
- … RecordStore/ObjectStore/SecretProvider

The parent GIMS-Project (a multi-user web LIMS) is adopting this
Relay's integrity core and independently implemented the same two-role custody (H1 / its P8) and
NTP time (M4 / its P9) — so this work confirms its direction rather than contradicting it. The
forwarding items and the _relay trail are Relay-only edge concerns the parent scopes out.
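
The self-event collision and its _relay fix are easier to see in a toy model of the gapless ingest check. This sketch leaves out the HMAC recomputation and all persistence; the `Replica` class, its method names, the `edge-1` trail name, and the literal checksum strings are hypothetical.

```python
GENESIS = "0" * 64  # assumed placeholder for an empty trail's head checksum


class Replica:
    """Toy server-of-record that keeps one ordered list per trail."""

    def __init__(self):
        self.trails = {}

    def head(self, trail):
        records = self.trails.get(trail, [])
        if not records:
            return 0, GENESIS
        return records[-1]["chain_seq"], records[-1]["checksum"]

    def ingest(self, trail, record) -> int:
        # Gapless link: the record must extend the current head exactly.
        seq, checksum = self.head(trail)
        if record["chain_seq"] != seq + 1 or record["prev_checksum"] != checksum:
            return 422
        self.trails.setdefault(trail, []).append(record)
        return 200

    def write_local(self, trail, checksum):
        # A server-originated event (for example a clock-skew warning).
        seq, prev = self.head(trail)
        self.trails.setdefault(trail, []).append(
            {"chain_seq": seq + 1, "prev_checksum": prev, "checksum": checksum}
        )


edge = [
    {"chain_seq": 1, "prev_checksum": GENESIS, "checksum": "c1"},
    {"chain_seq": 2, "prev_checksum": "c1", "checksum": "c2"},
    {"chain_seq": 3, "prev_checksum": "c2", "checksum": "c3"},
]

# Buggy layout: the server writes its own event into the forwarded trail.
buggy = Replica()
buggy_codes = [buggy.ingest("edge-1", edge[0])]
buggy.write_local("edge-1", "srv")
buggy_codes += [buggy.ingest("edge-1", r) for r in edge[1:]]

# Fixed layout: server events go to the reserved _relay trail.
fixed = Replica()
fixed_codes = [fixed.ingest("edge-1", edge[0])]
fixed.write_local("_relay", "srv")
fixed_codes += [fixed.ingest("edge-1", r) for r in edge[1:]]
```

In the buggy layout the server's own record moves the head, so every later forwarded record fails the link check; in the fixed layout the edge trail on the server stays identical to the edge's.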
One design decision is worth recording because it shows the compliance reasoning, not just the code: how to let more than one human sign. We shipped Option 1 (app-managed accounts + a Manage-Operators surface) and deliberately deferred Option 2 (sign with your OS/Windows credential). The trigger for revisiting it is a specific customer request, not a technical whim.
- … LogonUser/PAM privilege needs) and an inherited §11.300 password-policy burden that must be validated per deployment
- … app_user as the canonical signing identity, add an explicit audited os_identity ⇄ app_user binding, and gate an OS-reauth branch behind a per-deployment flag (GIMS_SIGN_OS_DELEGATED, default off), treating the network-trust inversion and the shared-workstation exclusion as non-negotiable acceptance criteria.

Stated plainly, because a compliance product that hides its residuals isn't defensible.
- … SYSTEM events follow the codebase's best-effort pattern, but their absence is itself detectable via chain verification.
- … pyinstaller gims_relay.spec → stage into src-tauri/bin/ → tauri build (reproduced in CI as an artifact; not committed).
- … .dev_jwt_secret (deliberately deferred: destructive).

The integrity core, the twelve Part 11 hardening points (H1–L3), and the two-tier custody model are complete and adversarially verified. The product remains, accurately, "designed to support 21 CFR Part 11"; formal computer-system validation (CSV) is the customer's responsibility, as intended.