GIMS Backend Refactor — Full Report (the whole branch)

01 Snapshot

Despite enormous activity, the codebase ended up smaller and provable — the work was dominated by collapse and dedup, gated on a green baseline at every commit.

152 commits (over 4 days)
533 unique files touched
86,896 lines of cumulative churn (sum of every commit)
−3,100 net lines (the tree shrank)
563 tests pass (from ~382 early on)
8 / 8 backend phases done (0–8 + R1–R21 + Part 11)

Where it landed

A layered backend — api/routers/ (HTTP) · core/ (logic) · nodes/ (orchestration) · modules/ (registration) · utils/ (kernel) — with a unified SQL record store behind a pluggable provider registry, one part-of-speech engine, a hardened-container execution backend, an HMAC-chained Part 11 audit/compliance core, a uniform error contract, and CI. App entry is api.app:app; the live route surface is stable at 320 OpenAPI paths / 368 routes (order-hash b361335f…).

02 Timeline — how it unfolded

The branch moved in waves: build the kernel & contracts, unify the domain model, cut storage over to SQL, harden execution, then reorganize the whole HTTP layer, collapse the duplication, and finish with compliance + the last cohesion items.

06-23

0–2

Foundation — hygiene · shared kernel · error contract

Pinned requirements.txt, consolidated utils/{logger,paths,config,atomic} as the kernel, and introduced core/errors.py AppError + a central handler so every failure renders one envelope. (The error-contract tail and CI were finished on 06-27.)
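A minimal sketch of the "one envelope" idea, assuming a plain exception class and a framework-free handler. The field names (`code`, `message`, `status`, `details`) and the `render_envelope` helper are hypothetical; the report names only `core/errors.py`, `AppError`, and a central handler.

```python
from __future__ import annotations

from typing import Any


class AppError(Exception):
    """Application error that carries everything the envelope needs."""

    def __init__(self, code: str, message: str, status: int = 400,
                 details: dict[str, Any] | None = None) -> None:
        super().__init__(message)
        self.code = code          # machine-readable code (assumed naming)
        self.message = message    # human-readable summary
        self.status = status      # HTTP status the handler should return
        self.details = details or {}


def render_envelope(exc: Exception) -> tuple[int, dict[str, Any]]:
    """Central handler: every failure maps to the same response shape."""
    if isinstance(exc, AppError):
        return exc.status, {"error": {"code": exc.code,
                                      "message": exc.message,
                                      "details": exc.details}}
    # Unknown failures use the same shape and do not leak internals.
    return 500, {"error": {"code": "internal_error",
                           "message": "Internal server error",
                           "details": {}}}
```

In a FastAPI app this function would typically be wired in through an exception handler, so routers raise `AppError` and never build error responses themselves.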

tag pre-refactor b162595

06-24

Cohesion core — one part-of-speech engine

Built core/words/: WordType, WordHandler, behaviors, the unified Descriptor, validation, WordRegistry, a single normalize-on-read choke point (reader.read_types tolerates list or keyed-dict on disk), and one read-only /pos surface. Editor + workbench + audit now validate through one engine.
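The normalize-on-read choke point can be illustrated as below. This is a sketch under assumptions: the function name `normalize_types` and the `name` key are hypothetical, and the real `reader.read_types` may carry more fields or validation.

```python
from __future__ import annotations

from typing import Any


def normalize_types(raw: Any) -> list[dict[str, Any]]:
    """Accept either on-disk shape and always return a list of dicts."""
    if isinstance(raw, dict):
        # Keyed-dict shape: {"sample": {...}} becomes [{"name": "sample", ...}]
        return [{**body, "name": name} for name, body in raw.items()]
    if isinstance(raw, list):
        # List shape is already canonical; copy so callers cannot mutate the source.
        return [dict(item) for item in raw]
    raise ValueError(f"unsupported word-type payload: {type(raw).__name__}")
```

Because every consumer reads through one function, the on-disk shape can vary without each caller growing its own `isinstance` checks.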

8556c74 · db6e772 · 1d6324e · 1bc2583

06-24

Dispatch foundation — typed RunContext + executor registry

A typed RunContext at the GUI boundary and a one-lookup kind→executor registry, replacing inline if kind=="parser" branching.
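A sketch of a one-lookup kind→executor registry, assuming a decorator-based registration. The `RunContext` fields and the `register`/`dispatch` helpers are illustrative, not the project's actual signatures.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RunContext:
    """Typed context built once at the boundary (fields are assumed)."""
    kind: str
    tool_name: str
    user: str


Executor = Callable[[RunContext], str]
_EXECUTORS: dict[str, Executor] = {}


def register(kind: str) -> Callable[[Executor], Executor]:
    """Decorator that binds an executor to a kind."""
    def decorator(fn: Executor) -> Executor:
        _EXECUTORS[kind] = fn
        return fn
    return decorator


@register("parser")
def run_parser(ctx: RunContext) -> str:
    return f"parsed by {ctx.tool_name}"


def dispatch(ctx: RunContext) -> str:
    """Single dictionary lookup instead of an if/elif chain on ctx.kind."""
    try:
        executor = _EXECUTORS[ctx.kind]
    except KeyError:
        raise ValueError(f"no executor registered for kind {ctx.kind!r}") from None
    return executor(ctx)
```

Adding a new kind then means registering one function; the dispatch path itself does not change.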

e3b1322

06-24

Storage unification — JSONL/folders → one SQL `instances` table

Provider-neutral ports (core/storage/ports.py, zero cloud SDK) + local adapter + a unified SqlRecordStore + a JSONL→SQL migrator. A reconciliation gate proved the old SQL store and items.jsonl had diverged at the field level; owner chose SQL-authoritative. Then the live cutover on the LIMS data (md5-verified backup + tag pre-phase5-cutover): 222 rows into instances, images relocated, nouns/ folders retired, and the whole app rewired to read/write through factory.get_record_store().
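A field-level reconciliation check of the kind described could look like this. It is a sketch only: the function name, the record shapes, and the `MISSING` marker are assumptions, and the real gate compared the old SQL store against `items.jsonl`.

```python
from __future__ import annotations

from typing import Any

MISSING = "<missing>"  # marker for a field present on one side only


def field_diffs(sql_rows: dict[str, dict[str, Any]],
                jsonl_rows: dict[str, dict[str, Any]]) -> list[tuple[str, str, Any, Any]]:
    """Return (record_id, field, sql_value, jsonl_value) for every mismatch."""
    diffs = []
    for rec_id in sorted(sql_rows.keys() | jsonl_rows.keys()):
        a = sql_rows.get(rec_id, {})
        b = jsonl_rows.get(rec_id, {})
        for key in sorted(a.keys() | b.keys()):
            left, right = a.get(key, MISSING), b.get(key, MISSING)
            if left != right:
                diffs.append((rec_id, key, left, right))
    return diffs
```

An empty result means the two stores agree field for field; any entry is a divergence that needs an explicit decision about which side is authoritative.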

8c99db9 · bcefc76 · c8dc301 → app-rewire 91a985c…2bc5183

06-25

Backends & hardening — R15 execution + R21 front door

One ExecutionBackend seam with the hardened container as the default (in-process gated behind a flag), a hardened command builder (utils/container_run.py: non-root, --cap-drop=ALL, read-only rootfs + tmpfs, no-new-privileges, pid/mem/cpu caps, --network=none), and a host-gateway artifact broker (path-containment, no-symlink, ext+magic-byte allow-list, size/count caps). Verified end-to-end on real rootless podman. Plus the R21 page-node/module factories + single registry-mounted front door.
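The hardening flags listed above map onto standard `podman run` options. The sketch below assembles such an argv; the function name, the default limits, the uid, and the tmpfs options are assumptions, not the contents of `utils/container_run.py`.

```python
from __future__ import annotations


def build_run_argv(image: str, command: list[str], *, uid: int = 65534,
                   pids: int = 128, memory: str = "512m", cpus: str = "1.0") -> list[str]:
    """Build a hardened `podman run` argv (limits here are illustrative)."""
    return [
        "podman", "run", "--rm",
        f"--user={uid}:{uid}",                     # non-root inside the container
        "--cap-drop=ALL",                          # drop every Linux capability
        "--read-only",                             # read-only root filesystem
        "--tmpfs=/tmp:rw,noexec,nosuid,size=64m",  # writable scratch space only
        "--security-opt=no-new-privileges",        # block setuid privilege gain
        f"--pids-limit={pids}",
        f"--memory={memory}",
        f"--cpus={cpus}",
        "--network=none",                          # no network access
        image,
        *command,
    ]
```

Returning a list rather than a shell string keeps the command safe to pass to `subprocess.run` without shell interpolation.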

5da3c20…32463a7

06-25

Orchestration-engine hardening

Fail-fast/loud startup validation (no silent duplicate-name overwrite), guards fail-closed, the live trigger chain de-silenced + per-handler timeout (an EventDispatcher down-payment), and a shared-singleton mount dedup that collapsed 947→375 route entries with paths unchanged.
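Fail-fast registration, as opposed to a silent duplicate-name overwrite, can be sketched as follows. `NodeRegistry` and `DuplicateNodeError` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from typing import Any


class DuplicateNodeError(RuntimeError):
    """Raised at startup when two nodes claim the same name."""


class NodeRegistry:
    def __init__(self) -> None:
        self._nodes: dict[str, Any] = {}

    def register(self, name: str, node: Any) -> None:
        # Refuse to overwrite: a duplicate is a configuration bug, not a merge.
        if name in self._nodes:
            raise DuplicateNodeError(f"node name already registered: {name!r}")
        self._nodes[name] = node

    def get(self, name: str) -> Any:
        return self._nodes[name]
```

With this shape the second registration stops the process at startup instead of replacing the first node unnoticed.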

39043af…ce6a416

06-25→26

Backend reorganization — passes 1–4 (the "god files")

The big structural sweep, one file at a time, each move proven verbatim. Pass 1: fixed a secretly broken guard (an incidental FastAPI 0.138 upgrade had hidden 311/337 routes), reaped dead code, moved gui/*_gui.py→api/routers/ and gui_main.py→api/app.py, split 6 god files. Pass 2: the other 6 god files + _normalize_for_psycopg×9 deduped + full prefix cleanup. Pass 3: verb-log dedup. Pass 4: the make_compliance_node factory collapsed 13 clones (−5,893 lines) + the archive_workbench split.

38a8481…e3b4aa1 · 6b3ce7b…55040ce · 552acf5 · 3527229…d7d6cee

06-26

R18

Twin-tree Descriptor collapse

Cut the live consumers onto the unified Descriptor and deleted the two legacy twin trees (12 files, −1,032 lines), proven neutral by an old-vs-new equivalence harness before deletion. The backend reorg deferred-list was now empty.

5132ff3

06-27

P11

21 CFR Part 11 compliance core

Replaced the cosmetic unkeyed checksum with a real HMAC tamper-evidence chain (over prev_checksum + chain_seq + signature fields), server-side two-component e-signatures at the append chokepoint, sealed signed exports on every compliance/audit read, a least-privilege gims_compliance_writer DB role, and trusted/validated time. P8 was live-verified on real Postgres 16 — the restricted role is denied UPDATE/DELETE/TRUNCATE by GRANT (SQLSTATE 42501). A 6-agent reconciliation audit then re-verified every phase against live code.
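A minimal sketch of an HMAC tamper-evidence chain, using only the three inputs the report names (`prev_checksum`, `chain_seq`, signature fields). The entry layout, helper names, and canonical JSON encoding are assumptions; a production chain would presumably also cover the record content and keep the key outside the database.

```python
from __future__ import annotations

import hashlib
import hmac
import json
from typing import Any


def _mac(key: bytes, prev_checksum: str, chain_seq: int, signature: dict[str, Any]) -> str:
    # Canonical JSON so the same fields always produce the same bytes.
    payload = json.dumps(
        {"prev_checksum": prev_checksum, "chain_seq": chain_seq, "signature": signature},
        sort_keys=True, separators=(",", ":"),
    ).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append(chain: list[dict[str, Any]], key: bytes, signature: dict[str, Any]) -> dict[str, Any]:
    """Append an entry whose checksum binds it to the previous entry."""
    prev = chain[-1]["checksum"] if chain else ""
    seq = len(chain)
    entry = {"chain_seq": seq, "prev_checksum": prev, "signature": signature,
             "checksum": _mac(key, prev, seq, signature)}
    chain.append(entry)
    return entry


def verify(chain: list[dict[str, Any]], key: bytes) -> bool:
    """Recompute every link; any edit, reorder, or wrong key fails."""
    prev = ""
    for seq, entry in enumerate(chain):
        if entry["prev_checksum"] != prev or entry["chain_seq"] != seq:
            return False
        expected = _mac(key, prev, seq, entry["signature"])
        if not hmac.compare_digest(entry["checksum"], expected):
            return False
        prev = entry["checksum"]
    return True
```

Unlike an unkeyed checksum, an attacker who can rewrite rows cannot recompute valid checksums without the key, which is what makes the chain evidence rather than decoration.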

70b3b76…56f6215 · backup-integrity guard 8c96b44

06-27

0/2

Phase 0 + Phase 2 tail — CI, error-contract close-out

Added pyproject.toml (ruff/pyright) + a CI pipeline (pytest + ruff), converted the last 45 HTTPException sites to AppError, audited all 163 silent except blocks (156 intentional, 7 fixed), and fixed 4 latent NameError bugs in untested branches — including a 500 on every successful image upload.
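An audit of silent `except` blocks can be started mechanically with the standard `ast` module. The sketch below is an assumption about approach, not the tool actually used: it flags handlers whose body is only `pass` or `...`, leaving the intentional-versus-bug decision to a reviewer.

```python
from __future__ import annotations

import ast


def _is_noop(stmt: ast.stmt) -> bool:
    """True for `pass` and for a bare `...` expression statement."""
    if isinstance(stmt, ast.Pass):
        return True
    return (isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and stmt.value.value is Ellipsis)


def find_silent_excepts(source: str) -> list[int]:
    """Return the line numbers of except handlers that swallow the error."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and all(_is_noop(s) for s in node.body):
            hits.append(node.lineno)
    return sorted(hits)
```

Handlers that log, re-raise, or return a fallback are not reported, so the output is a review queue rather than a verdict.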

fc5af9e…90baaf3

06-27

3/4/5

Final cohesion / dispatch / storage-ergonomics — backend COMPLETE

The wordtype migration + reader-bypass data-loss fix, the pluggable storage-provider registry, the ExecutionService extraction from the 426-line run_custom_tool (proven by a golden harness + a real container run), and the adjective/adverb descriptor-router collapse. See the companion session report for this day's detail.

9768602…888a295 — 8 commits

03 Cumulative churn

Two numbers answer two questions: the net diff (how the tree differs end-to-end) and the cumulative churn (total activity, counting a line re-edited across N commits N times).

Net diff (start → end)

+38,205 ins / −41,305 del

560 files · net −3,100 lines. The refactor was net-reductive: god-file splits, the −5,893 compliance-factory collapse, the −1,032 twin-tree collapse and ~770 lines of dead-code reaping removed more than was added.

Cumulative churn (all 152 commits)

+42,693 ins / −44,203 del

86,896 lines churned across the branch — the true measure of work. Roughly 572 lines of churn per commit, 533 distinct files touched.
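The difference between the two measures can be checked with a few lines of arithmetic. The toy three-commit example is invented for illustration; the other figures are the ones quoted in this section.

```python
def churn_and_net(per_commit):
    """per_commit is a list of (insertions, deletions) pairs, one per commit."""
    ins = sum(i for i, _ in per_commit)
    dels = sum(d for _, d in per_commit)
    return ins + dels, ins - dels


# Toy case: one line rewritten in three successive commits (+1/-1 each time).
# Churn counts it three times over; the net change to the tree is zero.
toy_churn, toy_net = churn_and_net([(1, 1)] * 3)   # (6, 0)

# Figures from the report.
net_lines = 38_205 - 41_305           # -3100 (net diff, start to end)
total_churn = 42_693 + 44_203         # 86896 (cumulative, all commits)
churn_per_commit = total_churn / 152  # about 571.7, rounded to 572 above
```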

Commits per day

(Bar chart; labels and counts in extraction order, one count not recoverable.) 06-24 · 74 · 06-25 · 21 · 06-26 · 39 · 06-27

Highest-churn files (cumulative ins+del)

These are exactly the god files that were split and the deduped engines — high churn here reflects collapse, not feature sprawl.

File (often since split / renamed)	churn
core/core_run_customs.py → `core/run_custom/*`	3,038
api/routers/runlog_workbench.py → package	2,311
api/i_o.py → `api/iostore/*`	1,817
api/routers/nodes_compliance.py → factory + configs	1,764
api/routers/account_roles.py	1,523
gui/archive_workbench_gui.py → `api/routers/archive_workbench/*`	1,498
api/routers/backup.py	1,297
core/run_custom/runner.py	1,263
core/core_audit.py → `core/audit/*`	1,219
api/routers/verb.py	1,100

04 What got built

Layered structure

api/routers/ — HTTP/JSON, decorator-defined routes
core/ — pure logic (no cloud SDK; layering-guarded)
nodes/ — orchestration nodes · modules/ — registration
utils/ — the kernel (logger, paths, config, atomic)
No *_gui.py; no core_*/*_module/*_node filename prefixes

Storage

One unified SQL instances table; per-noun tables / items.jsonl / nouns/ retired
Provider-neutral ports + a pluggable provider registry (local/aws built-in, entry-point discovery)
Blobs via ObjectStore; boto3 isolated to api/
Transactional unit-of-work (R4); locked IdService (R1)

Domain & execution

One part-of-speech engine: WordType/Descriptor/WordRegistry, twin trees deleted
Hardened-container execution backend (default) + artifact broker
ExecutionService extracted from the custom-tool monolith
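The artifact broker's checks (path containment, no symlinks, extension plus magic-byte allow-list, size cap) can be sketched as a single validation function. The allow-list, the size limit, and the name `check_artifact` are assumptions, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical allow-list: extension mapped to the bytes the file must start with.
ALLOWED = {".png": b"\x89PNG\r\n\x1a\n", ".pdf": b"%PDF-"}
MAX_BYTES = 10 * 1024 * 1024  # illustrative size cap


def check_artifact(root: Path, candidate: Path) -> Path:
    """Return the resolved path if the artifact passes every check, else raise."""
    if candidate.is_symlink():
        raise ValueError("symlinks are not accepted")
    resolved = candidate.resolve(strict=True)
    if not resolved.is_relative_to(root.resolve()):
        raise ValueError("path escapes the artifact root")
    magic = ALLOWED.get(resolved.suffix.lower())
    if magic is None:
        raise ValueError("extension not on the allow-list")
    if resolved.stat().st_size > MAX_BYTES:
        raise ValueError("artifact exceeds the size cap")
    with resolved.open("rb") as fh:
        if fh.read(len(magic)) != magic:
            raise ValueError("content does not match the extension")
    return resolved
```

Checking the magic bytes as well as the extension means a renamed executable is rejected even though its filename looks harmless.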

Safety & contracts

21 CFR Part 11: HMAC-chained audit/compliance, two-component e-sign, sealed exports, least-privilege role, trusted time
Uniform AppError error envelope; silent-except audit
CI (pytest + ruff) on every push/PR; ruff-clean tree
Request-scoped EventDispatcher (R7); server-side gate sign-off (R6)

05 Risk catalog & Part 11

The refactor was scoped against a risk catalog R1–R21 and a 21 CFR Part 11 parity gap P1–P10. Headline outcomes below; full file:line evidence lives in proposals/gims_project_refactor.md.

Risk catalog — R1–R21 done

R1 locked id service · R4 transactional record store
R6 server-side gate sign-off · R7 request-scoped EventDispatcher
R9 resilient scheduler + run-history · R10 upload hardening
R15 hardened-container execution + artifact broker
R17 atomic archive (DB+FS or rollback) · R18 twin-tree collapse · R19 audit engine

Part 11 — P1–P10 (done: P1–P6, P8–P10)

P1–P3, P5 HMAC tamper-evidence chain + verifier
P4 server-side two-component e-sign at the append chokepoint
P6 auth + sealed signed exports on every read
P8 least-privilege DB role — live-verified on Postgres (GRANT denies writes, SQLSTATE 42501)
P9 trusted/validated time · P10 audit log HMAC-chained too
P7 is the one intentional open-logging deviation (documented)

06 How "nothing changed" was proven

The discipline that let 152 commits land without regressions: layered, independent checks, run after every commit.

Structural guards

Ordered-route fingerprint — a hash of every path::methods in registration order; byte-identical for wiring-neutral moves
Three byte-pinned baselines — OpenAPI paths, all-routes, route-order
Function-level AST check — proves a body moved verbatim across a split
Layering guard — core/ never imports a cloud SDK; factory imports stay lazy
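An ordered-route fingerprint of the kind listed above can be sketched as a hash over `path::methods` lines in registration order. SHA-256 and the exact line format are assumptions; the report does not state which hash produced `b361335f…`.

```python
from __future__ import annotations

import hashlib


def route_fingerprint(routes: list[tuple[str, set[str]]]) -> str:
    """Hash every route as 'path::METHODS' in the order it was registered."""
    digest = hashlib.sha256()
    for path, methods in routes:
        # Methods are sorted so set ordering cannot change the hash;
        # route order is deliberately NOT sorted, because order is what we pin.
        line = f"{path}::{','.join(sorted(methods))}\n"
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()
```

For a FastAPI app the input list could be built from `app.routes`, and the resulting hex string compared against a pinned baseline after each wiring-neutral move.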

Behavioural proof where the suite was thin

Behaviour-golden harnesses — pin endpoint responses / side effects PRE vs POST (compliance envelopes, the custom-tool runner, the descriptor routers)
Equivalence harnesses — old-vs-new call-for-call before deleting legacy trees
Live runs — real hardened-container parser on rootless podman; P8 on real Postgres
Lesson banked: AST proves bodies, not module-scope imports — a full-suite run catches the dropped import
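The banked lesson can be reproduced in a few lines. This sketch assumes the function-level check compares `ast.dump` output per function, which ignores line numbers; the helper names are hypothetical, and nested functions sharing a name would collide in this simplified version.

```python
from __future__ import annotations

import ast


def function_dumps(source: str) -> dict[str, str]:
    """Map each function name to a position-independent dump of its AST."""
    return {
        node.name: ast.dump(node)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def moved_verbatim(old_source: str, new_source: str, name: str) -> bool:
    """True when the named function is structurally identical in both files."""
    old = function_dumps(old_source).get(name)
    new = function_dumps(new_source).get(name)
    return old is not None and old == new


# The body moved verbatim, but the module-scope import was left behind.
OLD = "import json\n\ndef dump(x):\n    return json.dumps(x)\n"
NEW = "def dump(x):\n    return json.dumps(x)\n"
body_check_passes = moved_verbatim(OLD, NEW, "dump")  # True, despite the bug
```

Calling `dump` from the new module raises `NameError` at runtime, which is why the AST check has to be paired with a full test-suite run.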

Test baseline grew the whole way

~382 → 406 → 471 → 484 → 534 → 563 passing, with a stable set of pre-existing environment/isolation failures and zero new regressions introduced by the refactor.

07 Token / effort estimate (whole refactor)

Not measured — extrapolated. There is no token-accounting tool exposed inside a session, so these are methodology-based estimates with deliberately wide bands. The usage dashboard is the only authoritative source for cumulative spend; /cost gives the per-session truth.

~13–18 work sessions (several with multi-agent workflows)
~1.5–3M output tokens (everything actually written)
~100–300M total tokens processed (rough order-of-magnitude)

Why the "processed" total is so large

Per-turn context re-reads. Every turn re-sends the growing conversation (system prompt + all prior messages + tool results). ~13–18 sessions each processing on the order of a few million tokens (mostly cache) already lands around ~75–160M.
Multi-agent fan-outs. The refactor leaned on workflows & audits — a 6-agent reconciliation audit, an 8-agent engine scan, read-only split-map Workflows, per-file split subagents. Each subagent carries its own context, adding tens of millions more.
It's mostly cache. The large majority of "processed" is prompt-cache hits, billed at a fraction of fresh input — so dollar cost is far below what the processed count implies.

Scope	output (written)	total processed	basis
A single focused session (e.g. the 06-27 backend close-out, 8 commits)	~55–80k	~4–8M	measured churn + ~90 turns × growing context
Whole refactor (152 commits)	~1.5–3M	~100–300M	~13–18 sessions scaled + subagent/workflow fan-out

Bands could be off by 2–3×. Treat the whole-refactor figure as "order tens-to-hundreds of millions processed, low-single-digit millions written" — and confirm against the usage dashboard.

08 What's left

The backend refactor is complete. The remaining work is front-end only.

Phase 8 — front end: the "Watery" restyle of the 22 gui/components/*.html (only the launcher is done), a real build step, and grid consolidation.
Optional UI follow-ups: signature-meaning dropdown on the reason-sign modal; a viewer for the sealed-export headers.

Backend status: complete

Phases 0–8 (backend), the risk catalog R1–R21, and the 21 CFR Part 11 track are done. handoff.md at the repo root is the canonical, up-to-date record.
