Skip to content

sec(parser): BoundedSegment + audit trail for untrusted asSlice#27

Open
dfa1 wants to merge 6 commits into
mainfrom
worktree-asslice-audit
Open

sec(parser): BoundedSegment + audit trail for untrusted asSlice#27
dfa1 wants to merge 6 commits into
mainfrom
worktree-asslice-audit

Conversation

@dfa1

@dfa1 dfa1 commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Summary

Replaces every untrusted MemorySegment.asSlice call in the file-open and decode path with a typed BoundedSegment wrapper, so malformed input throws VortexException (the documented SECURITY.md contract) rather than IndexOutOfBoundsException.

The win is not "no MemorySegment ever exists" — it's every trust transfer is grep-discoverable:

$ git grep unwrapForSubParser

35 hits, one per encoder/inspector boundary. Each entry documents "this code consumes a bounded buffer." Future regressions get caught in code review (or by a Checkstyle ban on raw .asSlice( outside MemorySegments.java / BoundedSegment.java — see follow-ups).

Design

BoundedSegment (a record holding MemorySegment + context label) exposes only:

  • slice(off, len, childContext) — bounds-checked, throws VortexException on bad input
  • getByte/getIntLE/getLongLE(off) — bounds-checked primitive reads
  • asByteBufferLE() — for FlatBuffer parsers (which validate offsets independently)
  • unwrapForSubParser(reason) — the audited escape hatch. Every call documents the trust transfer and surfaces reason in error messages.

MemorySegments.slice / .checkRange remain as the implementation detail that BoundedSegment delegates to. Prior art for this pattern: Lucene IndexInput.slice (LUCENE-5827), Trail of Bits bounded_buffer, C++ gsl::span, Rust &[u8] / bytes::Bytes. The JDK FFM team has not formally recommended it; this is the first articulation of the pattern for MemorySegment.

Commits (5)

  1. b6a9c24 — original MemorySegments static helper (covers the immediate asSlice audit)
  2. b7d583a — Phase 1: introduce BoundedSegment at the mmap boundary (VortexReader.parse, Trailer.parse, PostscriptParser.parse)
  3. 413edb8 — Phase 2: thread through DecodeContext.buffer() and FlatSegmentDecoder (14 encoder consumers + 71 test sites migrated)
  4. 4352be0 — Phase 3: VortexHandle.slice signature → BoundedSegment; ScanIterator/InspectorTree/TUI/integration-test callers updated
  5. a9a5189 — Phase 4: VortexReader internals migrated; MemorySegments becomes implementation detail of BoundedSegment

Out of scope (follow-ups)

Eliminating the remaining 35 unwrapForSubParser sites would require:

  • Migrating core/array (BoolArray, IntArray, etc.) to hold BoundedSegment — broad blast radius
  • Migrating ProtoReader + every generated proto record decode(MemorySegment, ...) factory
  • Adding hot-loop primitive readers on BoundedSegment for encoder buffer math
  • Wrapping ZstdEncoding.toArray() + similar

Each is mechanically straightforward but expensive (~7-10 hours total). Deferred — the current audit trail is the actionable property; deeper migration is purity for purity's sake at alpha stage.

Test plan

  • ./mvnw verify — 938 unit + 243 integration tests pass (incl. Rust round-trip suite)
  • MemorySegmentsTest (7 cases) + BoundedSegmentTest (4 cases) cover in-range, out-of-range, negative offset/length, overflow-prone Long.MAX_VALUE-1, primitive reads, slice context propagation, unwrap identity
  • Reviewer: confirm the "left as unwrapForSubParser" categories are actually trust-bounded buffers (FlatSegmentDecoder slices them before they reach encoders)

🤖 Generated with Claude Code

dfa1 and others added 5 commits June 10, 2026 21:20
…ents helper

SECURITY.md promises that malformed input throws VortexException, never an
IndexOutOfBoundsException or other unchecked JDK exception. Several call
sites in the file-open and scan paths violated this when fed adversarial
offsets/lengths read from the on-disk schema:

- VortexReader.parse: file shorter than the 8-byte trailer made
  `bodyBytes = size - TRAILER_SIZE` negative, then `seg.asSlice(negative, 8)`
  threw IOOBE.
- VortexReader.parse: `trailer.postscriptLen()` greater than the body made
  `postscriptOffset` negative, same outcome.
- VortexReader.slice (public, exposed via VortexHandle) forwarded any
  caller-supplied offset/length straight to asSlice with no bounds check.
- VortexReader.readFlatStats / ScanIterator.readFlatStats: segment offsets
  taken from footer.segmentSpecs() were sliced unchecked.
- VortexHttpReader: trailer + postscript offsets in the prefetched HTTP
  tail were sliced unchecked.
- FlatSegmentDecoder: per-buffer dataOffset accumulated from
  attacker-controlled `Buffer.padding()` and `Buffer.length()` was sliced
  unchecked.
- PostscriptParser.slice: local helper called asSlice without converting
  IOOBE to VortexException (checkBlobBounds already runs upstream, but the
  helper should defend the same contract on its own).

Introduces `io.github.dfa1.vortex.core.MemorySegments.slice(MemorySegment,
long, long, String)`. The static helper throws VortexException — with the
caller-supplied context label baked into the message — for negative
offsets, negative lengths, and overflow-prone offset+length combinations.
Replaces eight asSlice call sites across reader/scan/encoding.

Each surviving raw asSlice call now operates only on already-validated
internal offsets (already-decoded array buffer projection in VarBinArray,
ArraySegments slicing of post-decode buffers in ScanIterator's
narrowToRows, ProtoReader's own bounds-checked cursor, the constant-offset
magic check in Trailer, and the decoded-output buffer in ZstdEncoding).
A Checkstyle ban on raw asSlice in io/scan/encoding packages is a planned
follow-up so future regressions are caught at build time.

MemorySegmentsTest covers in-range slicing, the zero-length-at-end edge,
negative offsets, negative lengths, offset+length past segment size,
length alone larger than the segment, and an overflow-prone
(Long.MAX_VALUE - 1) + 100 input that would defeat a naive
`off + len > segSize` check.

./mvnw verify — all unit + integration tests pass (incl. Rust round-trips).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hase 1/4]

Adds io.github.dfa1.vortex.core.BoundedSegment, a record wrapping a
MemorySegment with a context label. The type does not expose
MemorySegment.asSlice; the only way to derive a sub-region is
BoundedSegment.slice(off, len, childContext), which routes through
MemorySegments.checkRange and throws VortexException — not
IndexOutOfBoundsException — on malformed input.

Raw access is exposed only through unwrapForSubParser(String reason),
which both documents the trust transfer and surfaces the reason string in
any propagated error, so trust transfers are greppable for audit.

Phase 1 covers the file-open boundary:

- VortexReader.parse: wraps the mmap'd segment as
  `new BoundedSegment(seg, "vortex file")` immediately after `channel.map`.
  Trailer + postscript slices go through `file.slice(off, len, ctx)`.
- Trailer.parse: signature changed from (MemorySegment, long) to
  (BoundedSegment, long). The single internal `asSlice` on the magic
  bytes uses `unwrapForSubParser("trailer parser")` since the offset is
  a compile-time constant (4) on a segment whose length has already been
  validated to 8 bytes.
- PostscriptParser.parse: signature changed from
  (ByteBuffer, MemorySegment, long) to (ByteBuffer, BoundedSegment). The
  local `slice` helper goes away; callers use `file.slice(...).asByteBufferLE()`.
  Explicit checkBlobBounds calls are retained so the per-blob error
  message ("postscript footer blob out of bounds") is more specific than
  BoundedSegment's generic "vortex file" parent context.
- VortexHttpReader: tail array now wrapped as
  `BoundedSegment(MemorySegment.ofArray(tail), "http tail")` so the
  changed Trailer.parse signature is satisfied. Full HTTP-aware
  refactor lands in Phase 3.

MemorySegments.slice is refactored to call a new
MemorySegments.checkRange helper, which BoundedSegment's primitive
readers (getByte, getIntLE, getLongLE) reuse to share the same
bounds-check path without producing a slice.

VortexHandle.slice (the public interface method) is NOT touched in
this phase. Its signature change to return BoundedSegment lands in
Phase 2 alongside the DecodeContext.buffer migration.

BoundedSegmentTest covers in-range slice, bad slice carrying parent
context label, bounds-checked primitive reads, and unwrapForSubParser
identity. The 7 MemorySegmentsTest cases from PR #27 still pass.

./mvnw verify — all unit + integration tests pass (incl. Rust round-trips).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ase 2/4]

DecodeContext.segmentBuffers becomes BoundedSegment[]; DecodeContext.buffer(i)
returns BoundedSegment instead of MemorySegment. The encoded-buffer arithmetic
in FlatSegmentDecoder now wraps the parent flat segment as BoundedSegment and
slices each per-buffer region with a "encoded buffer i" context label, so a
crafted Buffer.padding()/Buffer.length() pair in the FlatBuffer schema throws
VortexException at slice time rather than IndexOutOfBoundsException at first
read.

Encoder consumers (14 files, ~25 call sites) keep their existing MemorySegment-
based decode logic and add an explicit `.unwrapForSubParser("<encoding> encoding")`
at each `ctx.buffer(i)` site. Trust transfers are greppable: `git grep
unwrapForSubParser` lists every encoder boundary.

A new convenience factory `DecodeContext.ofRawBuffers(...)` wraps raw
MemorySegment arrays as BoundedSegments for synthetic test inputs that produce
their own trusted buffers; the 71 test sites that previously called
`new DecodeContext(..., new MemorySegment[]{buf}, ...)` migrated mechanically
to `DecodeContext.ofRawBuffers(..., new MemorySegment[]{buf}, ...)`. Production
callers (only FlatSegmentDecoder) keep using the canonical constructor with
already-bounded buffers.

DictEncoding.java:381 was the one site `ctx.buffer(i)` did not cover —
it indexed `ctx.segmentBuffers()[...]` directly. That site now calls
`.unwrapForSubParser("dict encoding values")` for symmetry.

VortexHandle.slice (the public deprecated escape hatch) is NOT touched in
this phase; it still returns MemorySegment. Phase 3 covers VortexHttpReader's
HTTP-aware wrapping and ScanIterator.readFlatStats.

./mvnw verify — all 938 unit + 243 integration tests pass (incl. Rust round-trips).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e 3/4]

VortexHandle.slice (the internal escape hatch used by ScanIterator,
InspectorTree, VortexInspectorTui, and integration-test code) now returns
BoundedSegment instead of MemorySegment. Callers that genuinely need raw
MemorySegment access add an explicit `.unwrapForSubParser("<purpose>")` at
the call site, making every cross-package escape-hatch usage greppable for
audit.

Sites migrated:
- VortexReader.slice: wraps the file-segment slice as BoundedSegment with
  context "file slice".
- VortexHttpReader.slice: wraps the freshly-fetched HTTP range as
  BoundedSegment with context "http range <off>..<end>" so error messages
  attribute bounds-check failures to the specific HTTP request.
- ScanIterator.readFlat: `.unwrapForSubParser("flat segment decoder")`
  before handing off to FlatSegmentDecoder.decode (which still takes
  MemorySegment; full FlatSegmentDecoder migration would be a separate
  follow-up).
- ScanIterator.readFlatStats: keeps a BoundedSegment for the parent stats
  region, slices the stats flatbuffer via `statsRegion.slice(off, len,
  "stats flatbuffer").asByteBufferLE()` instead of the raw
  MemorySegments.slice helper.
- InspectorTree.peekFlat / collectEncodingsAndStats: unwrap with
  "inspector flat segment decoder" before peekFlatRoot.
- VortexInspectorTui.previewSegment / hexPeekSegment: unwrap with
  "inspector tui flat segment" and "inspector tui hex peek".
- PcoFixtureInspectionIntegrationTest.walkLayoutInner: unwrap with
  "integration test inspector".

The slice method is still marked @deprecated(forRemoval = true) on
VortexHandle; the long-term direction is to remove cross-package raw
segment access entirely and route everything through Scan + typed
accessors. That demolition happens in Phase 4.

./mvnw verify — all unit + integration tests pass (incl. Rust round-trips).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…4/4]

VortexReader.slice and VortexReader.readFlatStats — the last two
file-open / scan-adjacent sites still calling MemorySegments.slice
directly — now construct a BoundedSegment from fileSegment and slice
through it:

- VortexReader.slice: returns `new BoundedSegment(fileSegment, "vortex
  file").slice(offset, length, "file slice")`. The intermediate label
  flows into VortexHandle.slice's BoundedSegment return value, so a
  downstream `.slice(off, len, "...")` on it carries both parent context
  ("file slice") and the child label in error messages.
- VortexReader.readFlatStats: builds a `statsRegion` BoundedSegment for
  the per-flat stats slice, then uses `statsRegion.getIntLE(...)` for the
  trailing fbLen read and `statsRegion.slice(fbStart, fbLen, "stats
  flatbuffer").asByteBufferLE()` for the FlatBuffer payload. The
  bounds-check helper calls (MemorySegments.slice / .checkRange) are no
  longer reachable from reader-layer code.

MemorySegments class javadoc updated to describe its new role as the
implementation detail behind BoundedSegment — application code should
prefer BoundedSegment, and direct MemorySegments use is reserved for
constructing a BoundedSegment from a raw segment at the mmap boundary.

Phase 4 closeout — the remaining unwrapForSubParser sites (35) are all
in encoders that consume `ctx.buffer(i)` and pass to sub-parsers
(ProtoReader, FlatBuffer runtime) which take raw MemorySegment. Future
work to drop those would mean adapting ProtoReader to take BoundedSegment
directly (it already does its own bounds-checked cursor internally) and
adding a BoundedSegment-aware FlatBuffer helper. Both are mechanically
straightforward but out of scope here; the audit point — every trust
transfer is greppable via unwrapForSubParser — is satisfied.

./mvnw verify — all unit + integration tests pass (incl. Rust round-trips).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dfa1 dfa1 changed the title sec(parser): wrap untrusted asSlice calls in MemorySegments helper sec(parser): BoundedSegment + audit trail for untrusted asSlice Jun 10, 2026
@dfa1

dfa1 commented Jun 10, 2026

Copy link
Copy Markdown
Owner Author

Audit: unwrapForSubParser call sites (33 total)

git grep unwrapForSubParser enumerates every trust transfer. Listed here for review traceability — each entry is a place where a BoundedSegment is converted to a raw MemorySegment because the downstream API does not accept the typed wrapper. Future removal is purely a matter of migrating those downstream APIs.

encoders → MemorySegment-typed array constructors or buffer math (24)

File Line Reason Downstream
BitpackedEncoding.java 273 bitpacked encoding fastlanes unpack into pre-allocated output segment
BoolEncoding.java 71 bool encoding BoolArray stores MemorySegment
ByteBoolEncoding.java 47 bytebool encoding byte-level scan + BoolArray construction
ConstantEncoding.java 136 constant encoding ProtoReader(MemorySegment, ...) for scalar payload
DecimalEncoding.java 112 decimal encoding per-row decimal byte parsing
DictEncoding.java 381 dict encoding values dict-values direct buffer indexing
DictEncoding.java 438 dict encoding dict-bytes for VarBinArray
DictEncoding.java 439 dict encoding dict-offsets for VarBinArray
DictEncoding.java 440 dict encoding codes for child decode
FsstEncoding.java 209 fsst encoding FSST symbol table (8B per symbol)
FsstEncoding.java 210 fsst encoding FSST symbol lengths (1B per symbol)
FsstEncoding.java 211 fsst encoding FSST-compressed heap
PcoEncoding.java 144 pco encoding chunk metadata ProtoReader
PcoEncoding.java 163, 189, 205, 228 pco encoding per-page Pco decode
PrimitiveEncoding.java 315 primitive encoding MemorySegment.copy into output
Registry.java 80 registry Registry.decodeAsSegment adapter
SparseEncoding.java 213 sparse encoding fill-value ProtoReader
VarBinEncoding.java 143 varbin encoding VarBinArray constructor
VarBinViewEncoding.java 123 varbinview encoding views buffer for VarBinViewArray
VarBinViewEncoding.java 126 varbinview encoding per-data-buffer
ZstdEncoding.java 317 zstd encoding .toArray() for native Zstd dict
ZstdEncoding.java 322 zstd encoding .toArray() for native Zstd frame
ZstdEncoding.java 351 zstd encoding frame slice for ZstdJavaDecompressor

scan / inspector / TUI → FlatSegmentDecoder.decode(MemorySegment, ...) (7)

File Line Reason
ScanIterator.java 483 flat segment decoder
ScanIterator.java 637 stats segment fbLen read
InspectorTree.java 150 inspector flat segment decoder
InspectorTree.java 205 inspector flat segment decoder
VortexInspectorTui.java 574 inspector tui flat segment
VortexInspectorTui.java 763 inspector tui hex peek
Trailer.java 33 trailer parser (constant-offset magic read)

integration test (1)

File Line Reason
PcoFixtureInspectionIntegrationTest.java 104 integration test inspector

Removal cost map (follow-up PRs)

  • FlatSegmentDecoder.decode takes BoundedSegment → kills 6 of the scan/inspector/TUI unwraps. ~30 min.
  • Trailer.parse uses getByte family instead of asSlice → kills 1. ~5 min.
  • ProtoReader(BoundedSegment, long, long) + regenerated proto records → kills the 7 ProtoReader-bound unwraps (Constant, Pco×5, Sparse). Requires CodeGen update + 42 file regen. ~1 hour.
  • core/array hierarchy holds BoundedSegment → kills the ~12 Array-constructor unwraps. Broad blast radius. ~3-5 hours.
  • BoundedSegment.toArray() proxy → kills 2 Zstd unwraps. ~10 min.

Easy wins (FlatSegmentDecoder + Trailer + Zstd proxy) cumulatively eliminate 9 unwraps in under 1 hour. The remaining 24 require the deeper migrations.

@dfa1

dfa1 commented Jun 11, 2026

Copy link
Copy Markdown
Owner Author

Tradeoffs

What we gain

  1. SECURITY.md contract is honored at the read-side entry points. Before this PR, opening a file with a malformed trailer or postscript threw IndexOutOfBoundsException (and similar) — bypassing every catch (VortexException) in caller code. After, malformed input at every wrapped site throws VortexException carrying a context label naming the on-disk structure that was being parsed.
  2. Audit trail by construction. git grep unwrapForSubParser lists every trust transfer the codebase performs. Future PRs that add a new sub-parser or decoder must either accept BoundedSegment or insert a labelled unwrapForSubParser("reason") — and that line shows up in code review.
  3. Error messages name the structure, not the offset. "malformed trailer: offset+length 0+8 exceeds segment size 4" is more debuggable than the previous IndexOutOfBoundsException: ... for a corrupted 4-byte truncated file.

What we pay

  1. API churn. VortexHandle.slice signature changed from returning MemorySegment to BoundedSegment. Even though it's @Deprecated(forRemoval = true), anyone using it has to add .unwrapForSubParser(...) at the call site. Inspector, TUI, and integration-test callers all needed an update in this PR. External users — none yet, but a real cost once there are.
  2. 33 ugly unwraps in encoder code. Each ctx.buffer(i).unwrapForSubParser("foo encoding") is more verbose than ctx.buffer(i). The verbosity is the point (audit trail), but a maintainer who doesn't know the design will read it as noise and consider deleting it.
  3. Conceptual gap with "pure" parse-don't-validate. The pattern is not affine like Rust &[u8]. BoundedSegment.unwrapForSubParser returns the raw segment; nothing stops a caller from calling seg.asSlice(badOff, badLen) on the unwrapped result. The enforcement is review-time, not compile-time. A genuinely complete version would (a) remove unwrapForSubParser and migrate core/array, ProtoReader, FlatBuffer integration to BoundedSegment (~7–10 hours of work) or (b) layer Checker Framework / ErrorProne on top.
  4. Test-code adapter. The 71 test sites that built a DecodeContext directly now have to route through DecodeContext.ofRawBuffers(...). Mechanical mapping today; one more thing for a contributor to learn tomorrow.
  5. Two paths in the codebase. MemorySegments (static helper) still exists as the implementation detail behind BoundedSegment. Some surviving direct uses are documented and intentional, but a reader has to know which to reach for. The doc on MemorySegments covers this; still a paper cut.

Perf

Negligible. BoundedSegment.slice is a record allocation + MemorySegments.slice (one bounds check + one MemorySegment.asSlice). JIT inlines through. Hot read paths (decode loops) operate on the unwrapped MemorySegment once unwrapForSubParser runs, so the inner loops are byte-identical to pre-PR. ./mvnw verify integration suite — all 938 unit + 243 integration tests pass, no regression observed in the integration test wallclock.

When this design is the wrong fit

  • A contributor proposes adding a 34th unwrapForSubParser site for a brand-new encoder that handles a different layout shape. The audit trail logic stops scaling once the list is long enough that nobody reads it.
  • The lib goes past alpha and gains external users. Then VortexHandle.slice returning BoundedSegment is a breaking change for them; promoting BoundedSegment to public API has a documentation cost.
  • Someone runs JFR and finds BoundedSegment.slice allocations show up in a hot decode loop. Then escape analysis isn't doing what we assumed and the record needs to become a primitive value type (Valhalla) or a static helper after all.

Net

For an alpha-stage library where the threat model is "untrusted columnar files from S3 / HTTP" and the contract is documented in SECURITY.md, the audit-trail property is worth the verbosity. The remaining 33 unwraps are documented (see previous comment) with concrete removal costs; they can be picked off opportunistically in future PRs rather than holding this one back.

After Phase 4 nothing outside BoundedSegment called MemorySegments.slice
or MemorySegments.checkRange any more, so the static helper was a
one-line proxy used only by BoundedSegment itself — two parallel APIs
saying the same thing.

Moves the bounds-check logic directly into BoundedSegment as a private
instance method that closes over `context`, so each call no longer
re-passes the label. slice now does the check inline and calls
seg.asSlice directly; getByte/getIntLE/getLongLE share the same private
checkRange.

Deletes MemorySegments.java and MemorySegmentsTest.java. The seven test
cases that lived in MemorySegmentsTest (zero-length-at-end, negative
offset, negative length, len > segSize, overflow-prone
Long.MAX_VALUE - 1 + 100) were absorbed into BoundedSegmentTest so the
coverage doesn't regress.

Net: one type, one API, ~93 lines fewer code. No behavioural change;
./mvnw verify — all unit + integration tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dfa1 added a commit that referenced this pull request Jun 11, 2026
The 33 `unwrapForSubParser` audit-trail sites in PR #27 are a strong
signal that the current module layout is wrong: `core` hosts the read
runtime, not just the data model. `reader` is a thin orchestration
shell around `core`; cross-module byte hand-offs force public
escape-hatch APIs (`VortexHandle.slice`, `Registry.decodeAsSegment`)
that no architectural wrapper can really cure.

Proposes splitting `Encoding` into separate read and write interfaces
and moving the corresponding runtime into `reader` / `writer`, leaving
`core` as the model-only module it was originally intended to be.
PR #27's escape hatches collapse as a side effect of co-locating the
decoder runtime with the byte source.

Includes:
- Root-cause analysis of the current `core` layout (5 concrete smells)
- Target module split with the `Array` hierarchy moving to `reader`
  (writer only touches `NullableData`, confirmed by audit) — only
  `NullableData` and `BoundedSegment` stay in `core` proper
- Registry split into distinct `ReadRegistry` and `WriteRegistry` types
  passed alongside the entry points (not folded into options records)
  — read-only callers never pull the `writer` module onto their
  classpath
- 6-phase migration plan with effort estimates (~9 person-days plus
  CI fallout)
- Consequences (positive and negative) and risks-to-manage
- Rejected alternatives (JPMS hide, move FlatSegmentDecoder alone,
  adopt Arrow codec SPI, status-quo + docs)

Status: Proposed. No code changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dfa1 added a commit that referenced this pull request Jun 11, 2026
The 33 `unwrapForSubParser` audit-trail sites in PR #27 are a strong
signal that the current module layout is wrong: `core` hosts the read
runtime, not just the data model. `reader` is a thin orchestration
shell around `core`; cross-module byte hand-offs force public
escape-hatch APIs (`VortexHandle.slice`, `Registry.decodeAsSegment`)
that no architectural wrapper can really cure.

Proposes splitting `Encoding` into separate read and write interfaces
and moving the corresponding runtime into `reader` / `writer`, leaving
`core` as the model-only module it was originally intended to be.
PR #27's escape hatches collapse as a side effect of co-locating the
decoder runtime with the byte source.

Includes:
- Root-cause analysis of the current `core` layout (5 concrete smells)
- Target module split with the `Array` hierarchy moving to `reader`
  (writer only touches `NullableData`, confirmed by audit) — only
  `NullableData` and `BoundedSegment` stay in `core` proper
- Registry split into distinct `ReadRegistry` and `WriteRegistry` types
  passed alongside the entry points (not folded into options records)
  — read-only callers never pull the `writer` module onto their
  classpath
- 6-phase migration plan with effort estimates (~9 person-days plus
  CI fallout)
- Consequences (positive and negative) and risks-to-manage
- Rejected alternatives (JPMS hide, move FlatSegmentDecoder alone,
  adopt Arrow codec SPI, status-quo + docs)

Status: Proposed. No code changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant