Overnight Handoff — Binary DSL Phase 1.5+ (follow-ups + Phase 2)
Baseline: 954550a9 on trunk (Phase 1.4 complete — parser + typecheck + MIR + real codegen + .with {} shipped)
Design doc (canonical): docs/design/BINARY_DSL.md — 335 lines, 12 locked decisions, 5 worked examples
Prior handoffs:
- docs/handoff/overnight-binary-dsl-phase-1.md: Phase 1.1-1.3 discoveries D1-D5
- docs/handoff/overnight-binary-dsl-phase-1-4.md: Phase 1.4 scope
Session status: Phase 1.4 complete. 6 commits on trunk, 61 binary-DSL tests green, fixpoint 2072 functions, smoke clean.
What’s done (this session — Phase 1.4)
| STEP | Commit | Spec | Status |
|---|---|---|---|
| 1 type-name resolution | 4207e4d7 | binary_types_spec.qz (5) | ✅ green |
| 2 method signatures + MIR divert | 642e4a20 | binary_methods_spec.qz (3) | ✅ green |
| 3 `as` operator bitcast | 4b5388e1 | binary_bitcast_spec.qz (3) | ✅ green |
| 4 real codegen | 470f3cb1 | (reuses bitcast + methods) | ✅ green |
| 5 .with {} postfix | 954550a9 | binary_with_spec.qz (3) | ✅ green |
| 6 roundtrip coverage | 86cfed2b | binary_roundtrip_spec.qz (4) | ✅ green |
Total: 61 binary-DSL tests (parse 14 + typecheck 19 + mir 10 + types 5 + methods 3 + bitcast 3 + with 3 + roundtrip 4). Fixpoint 2072. Smoke green (brainfuck, style_demo, expr_eval).
What user code can now do
```
import * from binary
import * from bytes

type PngIhdr = binary {
  width: u32be
  height: u32be
  bit_depth: u8
  color_type: u8
  compression: u8
  filter: u8
  interlace: u8
}

packed struct(u32) GpioModer
  pin0_mode: 2
  # ... pin1_mode through pin14_mode elided (16 × u2 fields total)
  pin15_mode: 2
end

# Construct, encode, decode, pattern-match
var h = PngIhdr { width: 1920, height: 1080, bit_depth: 8, color_type: 2,
                  compression: 0, filter: 0, interlace: 1 }
var bytes = h.encode()  # Bytes, 13 wire bytes (4 + 4 + 5 × 1)
match PngIhdr.decode(bytes)
  Ok(h2) => puts("width=#{h2.width}")  # 1920
  Err(_) => puts("decode error")
end

# Packed struct: `as` bitcast + .with {}
var m = GpioModer { pin5_mode: 1, ... }  # all 16 fields are required; the rest are elided here
var raw: Int = m as u32                   # packed integer
var m2 = raw as GpioModer                 # back to struct
var tweaked = m.with { pin5_mode = 2, pin7_mode = 1 }  # immutable update: returns a new struct, m is unchanged
```
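The following cross-checks what `m as u32` and `raw as GpioModer` compute. It is a language-neutral Python model, not Quartz and not the compiler's codegen. It assumes the MSB-first default that the Phase 2 notes attribute to the spec, so `pin0_mode` sits in bits 31..30. `to_int`, `from_int`, and the `lsb_first` switch are hypothetical names; `lsb_first` stands in for the reserved per-field `lsb` annotation.

```python
# Language-neutral model of `packed struct(u32) GpioModer` (16 fields x 2 bits).
# Assumption: MSB-first layout, so pin0_mode occupies bits 31..30.
# `lsb_first=True` is a hypothetical stand-in for the reserved `lsb` annotation.

FIELD_NAMES = [f"pin{i}_mode" for i in range(16)]
FIELD_WIDTH = 2
BACKING_BITS = 32  # 16 fields * 2 bits fill the u32 exactly


def _shift(index, lsb_first):
    """Bit offset of field `index` within the backing integer."""
    if lsb_first:
        return index * FIELD_WIDTH
    return BACKING_BITS - (index + 1) * FIELD_WIDTH


def to_int(fields, lsb_first=False):
    """Model of `m as u32`: pack every named field into one integer."""
    raw = 0
    for index, name in enumerate(FIELD_NAMES):
        value = fields[name]  # KeyError if a field is missing (all are required)
        if not 0 <= value < (1 << FIELD_WIDTH):
            raise ValueError(f"{name}={value} does not fit in {FIELD_WIDTH} bits")
        raw |= value << _shift(index, lsb_first)
    return raw


def from_int(raw, lsb_first=False):
    """Model of `raw as GpioModer`: unpack the integer back into fields."""
    mask = (1 << FIELD_WIDTH) - 1
    return {
        name: (raw >> _shift(index, lsb_first)) & mask
        for index, name in enumerate(FIELD_NAMES)
    }


m = {name: 0 for name in FIELD_NAMES}
m["pin5_mode"] = 1
print(hex(to_int(m)))                  # 0x100000 (pin5_mode in bits 21..20)
print(hex(to_int(m, lsb_first=True)))  # 0x400 (pin5_mode in bits 11..10)
assert from_int(to_int(m)) == m
```

The round trip holds for any in-range field values. The layout choice only changes which end of the integer `pin0_mode` lands in. That is why real STM32 MODER access would need `lsb`.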
What’s NOT yet done — follow-ups
Phase 1.4 gaps (file-and-fill)
These are compiler bugs and missing coverage in the Phase 1.4 implementation (typecheck and codegen), not future-phase features. Each has a short description plus a minimal repro or fix sketch.
1. Straddling sub-byte fields (IPv4 `frag_off: u13be`). Fields whose bit width crosses a byte boundary (e.g., a 13-bit field starting at bit offset 51) aren't yet packed/unpacked: `cg_emit_binary_pack` and `cg_emit_binary_unpack` in `cg_intrinsic_binary.qz` emit a comment marker but no code. IPv4Header can't round-trip until this lands.
   Repro:
   ```
   type IPv4Mini = binary {
     flags: u3
     frag_off: u13be  # straddles bytes 0..1 here (bytes 6..7 in the full IPv4Header)
   }
   # encode/decode produces garbage values for frag_off.
   ```
   Implementation sketch: for sub-byte fields where `bit_in_byte + width > 8`, emit a loop (or unrolled chunked shifts) that writes/reads MSB-first, bit by bit, across the N involved bytes. BE vs LE matters here.
2. Variable-width fields (`bytes` / `bytes(n)` / `cstring` / `pstring(uN)` / arrays). `tc_register_binary_block_def` maps these to `Int` placeholders; codegen emits a comment and returns 0. This blocks IPv4Header's `payload: bytes`, PE/ELF chunk parsers, and DNS/TLS record parsers. Implementation path:
   - `_tc_bin_field_annotation` in typecheck.qz: map `bytes`/`bytes(n)` to `Bytes`, `cstring`/`pstring(...)` to `String`, and `[T; n]`/`[T; field]`/`[T]` to `Vec<T>`.
   - `cg_intrinsic_binary.qz`: PACK walks the struct field, reads the Bytes/String/Vec handle, and appends its bytes to the output buffer. For `pstring` the length prefix comes first; for `cstring` the NUL terminator comes last. UNPACK slices from the input buffer and constructs the handle.
   - Phase 9 of the design (zero-copy rest-of-stream): `Bytes` gets an owned/borrowed flag for reference-into-input semantics. See design doc lines 332-335.
3. UnexpectedEof bounds checks in UNPACK. `cg_emit_binary_unpack` currently loads from the Bytes data pointer without checking the buffer length. When bytes are missing at the end, it reads past the buffer (silently yielding zeros) instead of returning `Err(ParseError::UnexpectedEof)`. Fix: at the top of the emitter, emit `if bytes.size() < expected_bytes return Result::Err(ParseError::UnexpectedEof)`.
4. Float fields (`f32le`/`f32be`, `f64le`/`f64be`). Width info is parsed correctly in `_cg_bin_parse_width_info`, but the pack/unpack emitters treat these fields as integers. None of the 5 worked examples uses floats, so this is low priority. However, the spec lists them as valid Phase 1 types.
5. Packed-struct `as` checking is too lax at typecheck. `tc_expr_type_cast` accepts `foo as u32` for any struct, not only packed ones, and doesn't verify that `target_backing == declared_backing`. MIR lowering does the right thing by checking `mir_find_binary_layout`. Typecheck should still reject `regular_struct as u32` with a friendly error (design decision #12 says `as` is "strict, compile-time-checked").
6. `.with {}` doesn't validate field names. Typecheck's NODE_BINARY_WITH branch only visits the receiver and each value expression; it doesn't verify that the named fields exist on the packed struct. Typos like `m.with { pinXYZ = 1 }` silently do nothing, because the MIR lowering loop never matches the name. Fix: in `typecheck_walk`, after resolving the receiver type, iterate the children and check each `str1` against the struct's field list.
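Gaps #1 and #3 can be prototyped together. The Python model below packs and unpacks the `IPv4Mini` repro bit by bit, MSB-first. It rejects short input up front with a stand-in for `ParseError::UnexpectedEof`. This is a sketch of the intended semantics, not the LLVM the emitters would produce. `pack_bits`, `unpack_bits`, and `layout_bytes` are hypothetical helpers. Little-endian straddling fields are left out because the handoff does not pin down their bit order.

```python
# Bit-level model of gap #1 (straddling fields) plus gap #3 (EOF check).
# Fields are (name, width_in_bits) and are written MSB-first, bit by bit,
# which gives big-endian order for a field that crosses a byte boundary.
# Little-endian straddling fields are NOT modeled here.


class UnexpectedEof(Exception):
    """Python stand-in for ParseError::UnexpectedEof."""


def layout_bytes(fields):
    """Bytes needed for the layout, rounding the total bit count up."""
    total_bits = sum(width for _, width in fields)
    return (total_bits + 7) // 8


def pack_bits(fields, values):
    out = bytearray(layout_bytes(fields))
    bit_pos = 0  # absolute bit offset from the start of the buffer
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        for i in range(width - 1, -1, -1):  # most significant bit first
            if (value >> i) & 1:
                out[bit_pos // 8] |= 0x80 >> (bit_pos % 8)
            bit_pos += 1
    return bytes(out)


def unpack_bits(fields, data):
    expected = layout_bytes(fields)
    if len(data) < expected:  # the up-front check proposed in gap #3
        raise UnexpectedEof(f"need {expected} bytes, got {len(data)}")
    values = {}
    bit_pos = 0
    for name, width in fields:
        value = 0
        for _ in range(width):
            bit = (data[bit_pos // 8] >> (7 - bit_pos % 8)) & 1
            value = (value << 1) | bit
            bit_pos += 1
        values[name] = value
    return values


IPV4_MINI = [("flags", 3), ("frag_off", 13)]  # frag_off occupies bits 3..15
wire = pack_bits(IPV4_MINI, {"flags": 0b010, "frag_off": 0x1ABC})
print(wire.hex())                    # 5abc
print(unpack_bits(IPV4_MINI, wire))  # {'flags': 2, 'frag_off': 6844}
```

An unrolled, chunked-shift emitter should produce the same bytes. The bit loop is simply the easiest version to check against.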
Phase 2 — Bidirectionality + missing semantics
Per docs/design/BINARY_DSL.md Phasing section:
- Computed fields. `value: u16be = checksum(payload)`: a declarative derivation, common in TCP/UDP/IP/PNG/gzip. This needs two pieces. First, typecheck must recognize the `= <expr>` suffix after a type in binary blocks. Second, a codegen pass must evaluate the expression before encoding (PACK) and, on decode, validate that the stored value matches (UNPACK).
- Discriminated unions inside binary blocks. `match` on a discriminator field to pick a variant layout. Required for TCP options, ELF sections, PE chunks, and USB descriptors.
- UTF-8-aware string types. `utf8(n)`, `pstring_utf8(uN)`, with codepoint validation at parse time.
- Versioning / multi-format dispatch. Composable from the discriminated unions above.
- Per-field `lsb` annotation. Design decision #6 reserves it; no current consumer needs it. STM32 GPIO MODER on real hardware would want it (MODER0 occupies bits 1..0), but the spec is MSB-first by default.
- Bijection enforcement: `unpack(pack(x)) == x` proven structurally (Nail-style, no SAT solver). This needs Phase 1 to be stable first, which is close now, modulo the gaps above.
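The PACK/UNPACK contract for a computed field can be sketched as follows. This Python model assumes a block shaped like `checksum: u16be = checksum(payload)` followed by a rest-of-input payload. It uses the RFC 1071 Internet checksum as the derivation. Several details are assumptions for illustration: the field layout, the helper names, and reporting a mismatch through an `InvalidValue`-style error (the variant `std/binary.qz` exports).

```python
import struct

# Model of a hypothetical binary block with a computed field:
#   checksum: u16be = checksum(payload)
#   payload:  bytes   (rest of the input)
# `internet_checksum` is the RFC 1071 ones'-complement sum used by IP/UDP/TCP.


class InvalidValue(Exception):
    """Python stand-in for ParseError::InvalidValue(field, expected, got)."""

    def __init__(self, field, expected, got):
        super().__init__(f"{field}: expected {expected:#06x}, got {got:#06x}")
        self.field, self.expected, self.got = field, expected, got


def internet_checksum(data):
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # 16-bit big-endian words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def encode(payload):
    """PACK: evaluate the derivation; the caller never supplies `checksum`."""
    return struct.pack(">H", internet_checksum(payload)) + payload


def decode(data):
    """UNPACK: read the stored value and validate it against the derivation."""
    (stored,) = struct.unpack_from(">H", data, 0)
    payload = data[2:]
    expected = internet_checksum(payload)
    if stored != expected:
        raise InvalidValue("checksum", expected, stored)
    return payload


print(hex(internet_checksum(bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7]))))  # 0x220d
```

On decode, the stored value is never trusted as free input; it is only compared. That is what makes such a field safe to hide from constructors.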
Phase 3 — Dogfood
Migrate compiler internals to the DSL:
- `cg_intrinsic_intmap.qz`'s manual `getelementptr` loads → `IntMapHeader.decode()`.
- Channel layout, Future state-machine frame layout, MIR-instruction encoding, AST-node layout.

Gated on Phase 1.4 gap #2 (variable-width fields) and Phase 2 computed fields being stable. The IntMapHeader roundtrip test already passes, so a direct dogfood of just that header is the cleanest first target.
Discoveries — Phase 1.4 session notes
(Append D6-D10 to the original D1-D5 discoveries in phase-1.md.)
D6 — Binary blocks register as structs under resolver tag 13, packed structs under tag 14
Chosen over reusing RESOLVE_TAG_TYPE_ALIAS (6) because tc_register_type_alias_def
expects the alias’s target string in str2, but NODE_BINARY_BLOCK’s str2 is unused
— the fields live in the children vector. Treating them as struct-like types
(new tags + tc_register_binary_block_def / tc_register_packed_struct_def
that synthesize a tc_register_struct call under the hood) gets us struct
literal construction, field access, and pattern-match destructuring for free.
Parallel vectors struct_dsl_kind / struct_dsl_backing in TcRegistry
tag which structs are actually binary-DSL types and remember the backing
width for packed ones.
D7 — Method synthesis runs in a dedicated Phase 4.0f after all types are registered
tc_synth_binary_block_methods registers TypeName$encode /
TypeName$decode function signatures via tc_register_function. It
needs Bytes and ParseError in scope to produce accurate return
types — Phase 4.0a (when binary blocks themselves register) is too
early. Put it right after the global-var registration (Phase 4.0e) and
before the user-function signature registration (Phase 4.1). Falls back
to TYPE_INT if Bytes/ParseError aren’t imported.
D8 — MIR diversion at the CALL node, not at a new opcode
mir_lower_call detects TypeName$encode / TypeName$decode by
checking if the part before the last $ matches a registered binary
layout (mir_find_binary_layout). If so, emits MIR_BINARY_PACK /
MIR_BINARY_UNPACK directly instead of chasing a non-existent function
body. This keeps the synthesized function signatures purely a typecheck
concern — no fake mir_lower_function_body to generate.
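The D8 name test is small enough to model directly. In this Python sketch, `lower_call` and the `BINARY_LAYOUTS` set are hypothetical stand-ins for `mir_lower_call` and `mir_find_binary_layout`. The returned tuples only name the MIR opcodes from the text; they are not the real MIR data structure.

```python
# Model of the D8 check in mir_lower_call: split the callee name on its LAST
# `$`, and divert only when the prefix is a registered binary layout.
BINARY_LAYOUTS = {"PngIhdr", "IPv4Mini"}  # hypothetical registry contents

DIVERTED = {"encode": "MIR_BINARY_PACK", "decode": "MIR_BINARY_UNPACK"}


def lower_call(callee):
    type_name, sep, method = callee.rpartition("$")
    if sep and type_name in BINARY_LAYOUTS and method in DIVERTED:
        return (DIVERTED[method], type_name)  # no function body is ever needed
    return ("MIR_CALL", callee)  # ordinary call to a real function


print(lower_call("PngIhdr$encode"))  # ('MIR_BINARY_PACK', 'PngIhdr')
print(lower_call("Point$encode"))    # ('MIR_CALL', 'Point$encode')
```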
D9 — as operator is infix postfix at the parser level
ps_parse_postfix picks up expr as IDENT between the ! unwrap
branch and the terminating else break. Result is a new
NODE_TYPE_CAST (97) node: left = source, str1 = target type name.
MIR lowering tries to find a registered binary layout for either the
target (integer-to-struct direction) or the inferred source type
(struct-to-integer), falling through to a passthrough if neither
matches. Typecheck is deliberately permissive — see follow-up #5.
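D9's lowering choice and the stricter check requested in follow-up #5 can be modeled side by side. The Python below is a sketch, and its registries are hypothetical. Checking the target before the source is an assumption: the text only says MIR lowering tries either direction before falling through to a passthrough.

```python
# Model of D9's lowering order for `expr as T`, plus the stricter typecheck
# that follow-up #5 asks for. All registry contents are hypothetical.
PACKED_BACKING = {"GpioModer": 32}  # packed struct -> declared backing width
REGULAR_STRUCTS = {"Point"}         # ordinary, non-packed struct


def lower_type_cast(source_type, target_type):
    """Target layout first (int -> struct), then source (struct -> int)."""
    if target_type in PACKED_BACKING:
        return ("bitcast", "int_to_struct", target_type)
    if source_type in PACKED_BACKING:
        return ("bitcast", "struct_to_int", source_type)
    return ("passthrough", None, None)


def check_type_cast(source_type, target_type):
    """Return an error message, or None if the cast is acceptable."""
    if source_type in REGULAR_STRUCTS:
        return f"cannot use `as` on non-packed struct {source_type}"
    backing = PACKED_BACKING.get(source_type)
    if backing is not None and target_type != f"u{backing}":
        return f"{source_type} is backed by u{backing}, not {target_type}"
    return None


print(lower_type_cast("GpioModer", "u32"))  # ('bitcast', 'struct_to_int', 'GpioModer')
print(check_type_cast("GpioModer", "u16"))  # GpioModer is backed by u32, not u16
```

Putting `check_type_cast` in typecheck leaves lowering unchanged. By the time MIR sees a cast, every struct-to-integer case would already have been validated against the declared backing width.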
D10 — .with {} lowers to struct clone, not integer round-trip
value.with { field = expr } allocates a new struct of the same field
count, then either stores the override expression or copies the
corresponding slot from the receiver per field. No integer packing
involved — the value stays in struct-of-Int representation until an
explicit as uN converts it. Let MIR_PACKED_BITCAST handle the
integer boundary; .with is pure struct surgery.
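D10's struct surgery, together with the field-name check from gap #6, looks roughly like this when modeled in Python with structs as plain lists of slot values. `lower_with` and `unknown_with_fields` are hypothetical names. An unmatched override name simply never fires, which is the silent-typo behavior gap #6 describes.

```python
# Model of D10: `value.with { field = expr }` builds a new struct with the same
# field count, taking each slot from the overrides or copying it from the receiver.
# Structs are modeled as plain lists of ints (struct-of-Int representation).


def lower_with(receiver, field_names, overrides):
    result = [0] * len(field_names)  # fresh allocation; receiver is untouched
    for slot, name in enumerate(field_names):
        if name in overrides:
            result[slot] = overrides[name]  # store the override expression
        else:
            result[slot] = receiver[slot]   # copy the receiver's slot
    return result  # override names that match no field are silently dropped


def unknown_with_fields(field_names, overrides):
    """The typecheck-side check proposed in gap #6: names not on the struct."""
    known = set(field_names)
    return [name for name in overrides if name not in known]


names = [f"pin{i}_mode" for i in range(16)]
m = [0] * 16
m[5] = 1
tweaked = lower_with(m, names, {"pin5_mode": 2, "pin7_mode": 1})
print(tweaked[5], tweaked[7], m[5])               # 2 1 1
print(unknown_with_fields(names, {"pinXYZ": 1}))  # ['pinXYZ']
```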
Pointers for the next session
- `cg_intrinsic_binary.qz` is ~720 lines. The pack/unpack emitters are straight-line byte stores with a `data_reg` string fixed at entry, which keeps the byte-offset arithmetic readable. Extending to straddles means adding a third branch (neither byte-aligned nor single-byte).
- `std/binary.qz` already exports ParseError with the variants `UnexpectedEof`, `InvalidValue(field, expected, got)`, and `LengthOverflow(field, declared, remaining)`. Use those exact variants when wiring follow-up #3, because the 1.2 typecheck tests pattern-match against them.
- `_tc_bin_parse_numeric_width` handles `u<N>`/`i<N>`/`f<N>` with an optional `le`/`be` suffix generically, for any N in 1..64.
- `_cg_bin_parse_width_info` exposes all (width, float, signed, le, has_endian) fields for codegen.
- The `; === Binary DSL Layouts ===` IR manifest from STEP 1.3 is still emitted; keep it. `binary_mir_spec.qz` tests assert on it.
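For reference, the numeric type-name grammar described above can be modeled with a regular expression: `u`/`i`/`f`, a width N in 1..64, and an optional `le`/`be`. This is a Python sketch, not a port of `_tc_bin_parse_numeric_width` or `_cg_bin_parse_width_info`. `WidthInfo`, its field names, and treating only `i` types as signed are assumptions.

```python
import re
from typing import NamedTuple, Optional

# Model of the numeric type-name grammar: {u,i,f}<N>[le|be], N in 1..64.
# Field names mirror the (width, float, signed, le, has_endian) tuple;
# `is_float` is used instead of `float` to avoid shadowing the builtin.
_NUMERIC = re.compile(r"([uif])([0-9]+)(le|be)?")


class WidthInfo(NamedTuple):
    width: int
    is_float: bool
    signed: bool
    le: bool
    has_endian: bool


def parse_width_info(type_name: str) -> Optional[WidthInfo]:
    match = _NUMERIC.fullmatch(type_name)
    if match is None:
        return None  # not a numeric type (e.g. `bytes`, `cstring`)
    kind, digits, endian = match.groups()
    width = int(digits)
    if not 1 <= width <= 64:
        return None
    return WidthInfo(
        width=width,
        is_float=kind == "f",
        signed=kind == "i",  # assumption: floats are not flagged as signed
        le=endian == "le",
        has_endian=endian is not None,
    )


print(parse_width_info("u13be"))
# WidthInfo(width=13, is_float=False, signed=False, le=False, has_endian=True)
```

The sketch does not restrict float widths to 32 or 64. Whether the real parser does is not stated here.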
Safety reminders (same as prior session — verify)
- Quake guard before every commit touching `self-hosted/*.qz`.
- Smoke tests after every guard: brainfuck / style_demo / expr_eval.
- A fix-specific backup exists at `self-hosted/bin/backups/quartz-pre-binary-codegen-golden`. Keep it until all follow-ups are done; the next session can replace it with a fresh `quartz-pre-binary-phase2-golden` when starting new risky work.
- Never `--no-verify`. If pre-commit fails, fix the real issue.
- Never compromise design under context pressure. All 6 STEPs in this session shipped or were explicitly scoped; nothing is half-shipped.
Test status summary
| File | Tests | Status |
|---|---|---|
| binary_parse_spec.qz | 14 | 🟢 green |
| binary_typecheck_spec.qz | 19 | 🟢 green |
| binary_mir_spec.qz | 10 | 🟢 green |
| binary_types_spec.qz | 5 | 🟢 green |
| binary_methods_spec.qz | 3 | 🟢 green |
| binary_bitcast_spec.qz | 3 | 🟢 green |
| binary_roundtrip_spec.qz | 4 | 🟢 green |
| binary_with_spec.qz | 3 | 🟢 green |
| Total | 61 | 🟢 all green |
Full QSpec suite NOT run from Claude Code (CLAUDE.md protocol). Run
`./self-hosted/bin/quake qspec` in a terminal before calling Phase 1
truly complete, to catch any cross-spec regressions.