Quartz v5.25

Overnight Handoff — Binary DSL Phase 1 Foundation

Baseline: b06abab6 (trunk, design doc landed, fixpoint at 2026 functions, guard stamp valid)
Design doc (canonical): docs/design/BINARY_DSL.md (335 lines, 12 locked decisions, 5 worked examples)
Estimated effort: Phase 1 is ~1500 lines total. The realistic overnight scope is the parser, typecheck, and MIR opcodes (sub-phases 1.1, 1.2, 1.3). Codegen (1.4) is multi-session and should be left for the next handoff.
Session goal: Land the parser/AST surface, the typecheck pass, and the MIR opcodes, with stub codegen that panics with a clear “not yet implemented” message. The compiler must still build and reach fixpoint after every commit.


Copy-paste handoff prompt (paste this into a fresh session)

Read `docs/handoff/overnight-binary-dsl-phase-1.md` and execute the work
described. The canonical design is `docs/design/BINARY_DSL.md` (335 lines,
all 12 design decisions locked). Do not re-litigate design choices —
implement what's specified.

Target: complete sub-phases 1.1 (parser+AST), 1.2 (typecheck), 1.3 (MIR
opcodes) with stub codegen. Phase 1.4 (real codegen) is explicitly out of
scope for tonight; leave a clean handoff for the next session.

Workflow per sub-phase:
1. Write QSpec tests FIRST (red phase) — they should fail in a specific way.
2. Implement the minimum to make them pass (green phase).
3. Run quake guard before EVERY commit. Never skip. Never use --no-verify.
4. Commit each sub-phase as a single coherent commit.
5. After each commit: run smoke tests (brainfuck + style_demo + expr_eval).
   Stop and recover if any regress.

Prime Directives v2 compact:
1. Pick highest-impact, not easiest.
2. Design before building; we already did this — see BINARY_DSL.md.
3. Pragmatism = sequencing correctly. Shortcut = wrong thing because right
   is hard. Name the path.
4. Work spans sessions. Don't compromise design because context is ending.
5. Report reality. No weasel words. If a sub-phase is half-done, say so.
6. Holes get filled or filed (in this doc's "Discoveries" section).
7. Delete freely. Pre-launch, zero users.
8. Binary discipline: quake guard mandatory, fixpoint not optional, smoke
   tests not optional after guard, fix-specific backups before risky work.
9. Quartz-time estimation: traditional ÷ 4.
10. Corrections are calibration, not conflict.

Stop conditions:
- Sub-phase 1.3 complete with all tests green and fixpoint stable → done,
  write handoff for next session covering 1.4 codegen.
- Get blocked on a real compiler bug → file in this doc's "Discoveries"
  section with minimal repro, commit what works so far, write handoff.
- Approaching context limit → stop at the next clean commit boundary,
  write handoff for next session.

Tree is clean at b06abab6. Guard stamp valid. Smoke green.
Save the session-wide backup BEFORE any compiler source changes:
  cp self-hosted/bin/quartz self-hosted/bin/backups/quartz-pre-binary-dsl-golden

The implementation plan

Sub-phase 1.1 — Parser + AST (target: 1 commit, 300-500 lines)

Goal: Surface syntax compiles to AST. No semantics yet — the typechecker will reject everything.

Scope:

  • Lexer: tokenize the new keywords (binary, packed, and as if it is not already a keyword) and the type-vocabulary literals (u1-u7, i1-i7, u16le, u16be, …, u64be, f32le, f32be, f64le, f64be, bytes, cstring, pstring).
  • Parser: parse type Foo = binary { field: type, ... } and packed struct(uN) Name ... end into new AST node kinds:
    • NODE_BINARY_BLOCK (children: vec of NODE_BINARY_FIELD, str1: type name)
    • NODE_PACKED_STRUCT (children: vec of NODE_BINARY_FIELD, str1: type name, int1: backing-integer width N)
    • NODE_BINARY_FIELD (str1: field name, str2: type spec serialized as string, int1: bit width when known)
  • Parser: array syntax [T; n], [T; field_name], [T] inside binary blocks (gated on the _in_binary_block parser flag; these forms are illegal outside).
  • Parser: .with { field = value; field = value } block syntax (only legal as a method on packed-struct values; gate at typecheck, parse permissively).

Tests (spec/qspec/binary_parse_spec.qz):

  • All 5 worked examples from BINARY_DSL.md parse cleanly.
  • Sub-byte type at top level (var x: u4 = ...) → parse OK, typecheck error in 1.2.
  • [T; n] outside binary block → parse error.
  • Empty binary {} → parse OK, typecheck error in 1.2 (zero width).
  • Round-trip parse → format → parse stable.

Files touched (estimated):

  • self-hosted/frontend/lexer.qz: ~80 lines (new keyword + type-literal recognizers)
  • self-hosted/frontend/parser.qz: ~250 lines (block parsing + field parsing + array syntax)
  • self-hosted/frontend/ast.qz: ~50 lines (new NODE_* constants and accessors)
  • self-hosted/shared/node_constants.qz: 3 new constants
  • self-hosted/shared/token_constants.qz: ~5 new tokens
  • New: spec/qspec/binary_parse_spec.qz

Commit message: Binary DSL Phase 1.1: parser + AST surface

Sub-phase 1.2 — Typecheck (target: 1-2 commits, 400-600 lines)

Goal: All design-doc invariants are enforced at compile time with friendly error messages.

Scope:

  • Register binary {} and packed struct(uN) types in the type registry; assign each a unique layout-id.
  • Parse the type-spec string in each NODE_BINARY_FIELD into a structured field descriptor (kind, width, endianness, optional length expression).
  • Width-sum check for binary {}: total bits must be multiple of 8. Friendly error per worked example in BINARY_DSL.md (QZ0950).
  • Width-sum check for packed struct(uN): total bits must equal exactly N. Friendly error (QZ0951).
  • Multi-byte alignment check: any u16le/u32be/etc. field must start at a byte-aligned bit offset. Error (QZ0952) suggests _: uN pad.
  • Sub-byte type usage check: u1-u7 and i1-i7 outside binary {} / packed struct(uN) → error (QZ0953).
  • Synthesize decode(bytes: Bytes): Result<T, ParseError> and encode(self): Bytes methods on each binary type.
  • Synthesize .with { ... } block on packed-struct types.
  • Register ParseError enum (in std/error.qz or std/binary.qz — pick one).
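The layout checks above (QZ0950-QZ0952) and the sub-byte rule (QZ0953) reduce to one walk over the field widths. The sketch below uses Python as a stand-in for the Quartz typecheck code. Field, check_binary_block, check_packed_struct and check_sub_byte_usage are hypothetical names, and the diagnostic wording is illustrative, not the real explainer text.

```python
import re
from dataclasses import dataclass

SUB_BYTE = re.compile(r"^[ui][1-7]$")  # u1..u7 and i1..i7


@dataclass
class Field:
    name: str
    width: int        # width in bits
    multi_byte: bool  # u16le, u32be, f64le, ...: must start on a byte boundary


def check_binary_block(fields):
    """Alignment (QZ0952) and width-sum (QZ0950) checks for a binary {} block."""
    errors = []
    offset = 0
    for f in fields:
        if f.multi_byte and offset % 8 != 0:
            pad = 8 - offset % 8  # distance to the next byte boundary
            errors.append(f"QZ0952: '{f.name}' starts at bit {offset}; add `_: u{pad}` before it")
        # Later fields keep their unpadded offsets, so one missing pad can cascade.
        offset += f.width
    if offset == 0 or offset % 8 != 0:
        errors.append(f"QZ0950: total width is {offset} bits, not a positive multiple of 8")
    return errors


def check_packed_struct(fields, backing_bits):
    """packed struct(uN): field widths must sum to exactly N (QZ0951)."""
    total = sum(f.width for f in fields)
    if total != backing_bits:
        return [f"QZ0951: fields sum to {total} bits, backing integer is u{backing_bits}"]
    return []


def check_sub_byte_usage(type_name, inside_binary_layout):
    """u1-u7 / i1-i7 are only legal inside binary {} or packed struct(uN) (QZ0953)."""
    if SUB_BYTE.match(type_name) and not inside_binary_layout:
        return [f"QZ0953: {type_name} is only legal inside binary {{}} or packed struct(uN)"]
    return []
```

A u1 followed by a u16le field yields both QZ0952 (suggesting `_: u7`) and QZ0950 (17 bits total), while an empty block fails QZ0950 as the zero-width case.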

Tests (spec/qspec/binary_typecheck_spec.qz):

  • Width-sum errors fire with correct message format (assert_output_contains).
  • Misalignment errors fire and suggest correct pad width.
  • Sub-byte at top level errors.
  • Valid worked examples typecheck cleanly.
  • .encode() and .decode() methods exist with correct signatures.

Files touched:

  • self-hosted/middle/typecheck.qz and friends: register binary types, validate widths
  • self-hosted/middle/typecheck_walk.qz: handle NODE_BINARY_BLOCK / NODE_PACKED_STRUCT
  • self-hosted/middle/typecheck_builtins.qz: register ParseError and synthesize methods
  • self-hosted/error/diagnostic.qz: add QZ0950-QZ0953 error codes with friendly explainers
  • New: std/binary.qz (the ParseError enum + supporting types) — small file

Commit message: Binary DSL Phase 1.2: typecheck and validation

Sub-phase 1.3 — MIR opcodes + stub codegen (target: 1 commit, 300-500 lines)

Goal: The MIR layer represents the layout. Codegen emits a stub that panics with a clear message. The compiler still self-compiles.

Scope:

  • Add three MIR opcodes in mir.qz:
    • MIR_BINARY_PACK (operands: layout_id, field_value_ids… → result: Bytes)
    • MIR_BINARY_UNPACK (operands: layout_id, bytes_id → result: Result<tuple, ParseError>)
    • MIR_PACKED_BITCAST (operands: layout_id, value_id → result: opposite-typed value)
  • Layout registry inside MIR program state: vec of layout descriptors, each with field list (kind, width, endianness, offset).
  • Lower NODE_BINARY_BLOCK and NODE_PACKED_STRUCT to layout registration in MIR.
  • Lower .encode() calls to MIR_BINARY_PACK.
  • Lower .decode() calls to MIR_BINARY_UNPACK.
  • Lower as operator on packed structs to MIR_PACKED_BITCAST.
  • Lower .with { ... } block to a sequence of MIR_PACKED_BITCAST + register-level shift/mask + MIR_PACKED_BITCAST.
  • Stub codegen for the three new opcodes in a new cg_intrinsic_binary.qz file: emits a call to __qz_binary_not_implemented(layout_id) which panics with the layout name. Real codegen is sub-phase 1.4 (next session).
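Between the two MIR_PACKED_BITCASTs, the .with lowering is a clear-then-OR on the backing integer. A Python sketch, assuming fields are laid out LSB-first (first declared field at bit 0); BINARY_DSL.md is authoritative on bit order, and packed_layout / apply_with are hypothetical helpers.

```python
def packed_layout(fields):
    """Map field name -> (bit offset, width). Assumes LSB-first: the first field sits at bit 0."""
    layout, offset = {}, 0
    for name, width in fields:
        layout[name] = (offset, width)
        offset += width
    return layout


def apply_with(raw, layout, updates):
    """Model of `value.with { f = v; ... }` applied to the backing integer `raw`."""
    for name, value in updates.items():
        offset, width = layout[name]
        mask = ((1 << width) - 1) << offset
        raw &= ~mask                      # clear the field's bits
        raw |= (value << offset) & mask   # OR in the new value, truncated to the field width
    return raw
```

Masking the new value is a defensive choice for the sketch; the typechecker may instead reject constants that do not fit the field.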

Tests (spec/qspec/binary_mir_spec.qz):

  • Layout-id assignment is stable across runs (use --dump-mir if available; otherwise verify via internal API).
  • Stub codegen produces well-formed IR (compiles via llc, links).
  • Calling decode or encode at runtime panics with the expected message.

Files touched:

  • self-hosted/backend/mir.qz: 3 new opcodes + layout registry (~200 lines)
  • self-hosted/backend/mir_lower.qz and friends: lowering for the new node kinds (~150 lines)
  • self-hosted/backend/codegen.qz: dispatch new opcodes
  • New: self-hosted/backend/cg_intrinsic_binary.qz (~100 lines, mostly stubs)
  • self-hosted/backend/intrinsic_registry.qz: register the new opcodes

Commit message: Binary DSL Phase 1.3: MIR opcodes + stub codegen

Out of scope for tonight (handoff for next session)

  • Sub-phase 1.4 — Real codegen (~600-800 lines, the biggest piece). Adjacent-field fusion, shift+mask sequences, ParseError construction, Bytes interop. This is a multi-session piece on its own. Leave a clean handoff doc.
  • Sub-phase 1.5 — Roundtrip tests for all 5 worked examples. Depends on 1.4 working.
  • Phases 2-5, per BINARY_DSL.md.
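As a reference for the 1.4 work, here is a behavioural sketch of what MIR_BINARY_UNPACK has to compute, including the ParseError cases for a short buffer (UnexpectedEof) and an oversized length prefix (LengthOverflow). It is Python, not Quartz: exceptions stand in for Result<T, ParseError>, bits are read MSB-first (an assumption that matches network-order formats such as the IPv4 header), and the field-list encoding is hypothetical.

```python
class ParseError(Exception):
    pass


class UnexpectedEof(ParseError):
    """Buffer too short for the next field."""


class LengthOverflow(ParseError):
    """A length prefix points past the end of the buffer."""


def decode(buf, fields):
    """fields: list of (name, bit_width) or (name, ("bytes", length_field)).

    Bit fields are read MSB-first. Byte arrays are assumed to start byte-aligned
    and to take their length from an earlier field.
    """
    whole = int.from_bytes(buf, "big")
    total_bits = len(buf) * 8
    out, bitpos = {}, 0
    for name, spec in fields:
        if isinstance(spec, tuple):
            n = out[spec[1]]
            start = bitpos // 8
            if start + n > len(buf):
                raise LengthOverflow(f"{name}: length {n} > {len(buf) - start} bytes left")
            out[name] = buf[start:start + n]
            bitpos += n * 8
        else:
            if bitpos + spec > total_bits:
                raise UnexpectedEof(f"{name}: needs {spec} bits at bit {bitpos}, buffer has {total_bits}")
            shift = total_bits - bitpos - spec
            out[name] = (whole >> shift) & ((1 << spec) - 1)
            bitpos += spec
    return out
```

For example, the single byte 0x45 decodes as version 4, ihl 5 under the MSB-first assumption.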

Critical safety rules (from CLAUDE.md and tonight’s lessons)

  1. Quake guard before every commit touching self-hosted/*.qz. No exceptions. The Bootstrap Island incident lost 100+ commits because the guard was skipped. The Apr 16 cache-pattern session lost a binary to the same trap (recovered from golden, no commit damage).

  2. Fix-specific golden BEFORE risky work. Done at the top of this prompt. Never overwrite that file until the work is committed AND verified end-to-end.

  3. Smoke tests after every guard. Fixpoint alone is insufficient — a self-consistent broken compiler can pass fixpoint forever. After each guard run:

    ./self-hosted/bin/quartz examples/brainfuck.qz | llc -filetype=obj -o /tmp/bf.o && \
      clang /tmp/bf.o -o /tmp/bf -lm -lpthread && /tmp/bf | tail -1
    ./self-hosted/bin/quartz examples/style_demo.qz | llc -filetype=obj -o /tmp/sd.o && \
      clang /tmp/sd.o -o /tmp/sd -lm -lpthread && /tmp/sd | head -1
    ./self-hosted/bin/quartz examples/expr_eval.qz > /tmp/ee.ll && \
      llc -filetype=obj /tmp/ee.ll -o /tmp/ee.o && clang /tmp/ee.o -o /tmp/ee -lm -lpthread && \
      /tmp/ee | tail -3

    All three must pass. Stop and recover if any regress.

  4. Two-phase bootstrap for bidirectional changes. When a change requires gen1 to compile new source that uses gen1’s own new feature (e.g., the cache-pattern attempt tonight): split into two commits. First commit lands the compiler change without using it; second commit uses the new feature in source. This was the lesson of the Apr 16 cache-pattern session.

  5. Never --no-verify. If pre-commit fails, fix the underlying issue and re-commit. Never bypass.

  6. Check ~/Library/Logs/DiagnosticReports/<binary>-*.ips FIRST on silent SIGSEGVs. macOS writes a full crash report (stack, registers, fault address) for every silent crash. Don’t ASAN/lldb-loop until you’ve read the .ips.

  7. Restore from quartz-pre-binary-dsl-golden if anything breaks that you can’t immediately fix:

    cp self-hosted/bin/backups/quartz-pre-binary-dsl-golden self-hosted/bin/quartz

Discoveries (append as you go)

D1 — ; is lexed as TOK_NEWLINE, not TOK_SEMI

The Quartz lexer emits TOK_NEWLINE for both real newlines and the ; character (there is no TOK_SEMI token). Inside binary-array type position [T; n], the ; arrives as TOK_NEWLINE, so the separator check uses that. Harmless here because field-type position is always one line, but worth remembering: never write if ps_check(ps, TOK_SEMI) — that constant doesn’t exist, and ripgrep won’t warn you.
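A Python sketch of the array-suffix check after [T, with the separator tested against the NEWLINE kind. The toy lexer, token-kind strings and parse_array_suffix are hypothetical; "unsized" is only a label for [T] here, and its length semantics come from BINARY_DSL.md.

```python
NEWLINE, INT, IDENT, RBRACKET = "NEWLINE", "INT", "IDENT", "RBRACKET"


def lex(text):
    """Toy lexer: like Quartz's, it emits NEWLINE for ';' (there is no SEMI kind)."""
    toks = []
    for word in text.replace(";", " ; ").replace("]", " ] ").split():
        if word == ";":
            toks.append((NEWLINE, word))
        elif word == "]":
            toks.append((RBRACKET, word))
        elif word.isdigit():
            toks.append((INT, word))
        else:
            toks.append((IDENT, word))
    return toks


def parse_array_suffix(toks, in_binary_block):
    """Parse what follows `[T`: `]`, `; n]`, or `; field_name]`."""
    if not in_binary_block:
        raise SyntaxError("array length syntax is only legal inside binary {}")
    kinds = [k for k, _ in toks]
    if kinds[:1] == [RBRACKET]:
        return ("unsized", None)
    if kinds[:1] != [NEWLINE]:  # the ';' separator arrives as NEWLINE
        raise SyntaxError("expected ';' or ']'")
    if len(toks) < 3 or kinds[2] != RBRACKET:
        raise SyntaxError("expected ']'")
    kind, text = toks[1]
    if kind == INT:
        return ("fixed", int(text))
    if kind == IDENT:
        return ("field", text)
    raise SyntaxError(f"unexpected {text!r} in array length")
```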

D2 — ps_error does not advance the token stream

Error recovery is the caller’s responsibility. A while ps_check(RBRACE) == 0 loop that calls ps_error(...) without advancing will spin forever on malformed input. Both ps_parse_binary_block_body and ps_parse_packed_struct handle this by breaking out when the field-name token doesn’t match, rather than relying on ps_expect for recovery. This pattern is load-bearing; future parser work should follow it.
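The same pattern, sketched in Python over a plain token list (parse_fields and its token format are hypothetical): because recording an error consumes nothing, every error path breaks out, so the loop cannot spin on malformed input.

```python
def parse_fields(toks):
    """Parse `name : type` pairs (optionally comma-separated) until "}".

    Returns (fields, errors). Recording an error consumes no token, so every
    error path breaks out of the loop instead of retrying the same token.
    """
    fields, errors, i = [], [], 0
    while i < len(toks) and toks[i] != "}":
        name = toks[i]
        if not name.isidentifier():
            errors.append(f"expected field name, got {name!r}")
            break  # without this break, the loop would see the same token forever
        if i + 2 >= len(toks) or toks[i + 1] != ":":
            errors.append(f"expected ':' and a type after {name!r}")
            break
        fields.append((name, toks[i + 2]))
        i += 3
        if i < len(toks) and toks[i] == ",":
            i += 1
    return fields, errors
```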

D3 — prog: Int vs prog: MirProgram in codegen_instr

cg_emit_instr takes prog: Int (existential erasure), but the MIR program accessors expect MirProgram. Getting a MirProgram from the Int requires as_type(prog) or changing the handler signature. The 1.3 stub sidesteps this by using only operand1 (the layout id), and only in the comment, without looking up the layout name. 1.4 real codegen will need layout-name and field-table lookups, so pick a solution at the start of that work (easiest: add a variant cg_emit_instr_typed(prog: MirProgram, ...) and forward to it from the untyped entry).

D4 — u<N> is valid for any width 1 ≤ N ≤ 64

The design doc type-vocabulary table lists only u1..u7 as “sub-byte” and u8/u16/u32/u64 as “byte-aligned.” The IPv4 header example uses u13be for the 13-bit fragment offset. Interpretation: u<N>[le|be] is valid for any 1 ≤ N ≤ 64; only N ∈ {16, 32, 64} is considered “multi-byte” (must byte-align); everything else is bit-packed and needs no alignment. Both _tc_bin_spec_kind (typecheck) and _bin_prim_bit_width (parser) now parse numeric widths generically. Worth updating BINARY_DSL.md to spell this out.
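A Python sketch of this width rule as a type-spec parser that yields the (kind, width, endianness) descriptor used by the typecheck. The regexes and parse_type_spec are hypothetical; applying the 1-64 rule to i<N> as well as u<N> is an assumption, and the sketch does not decide whether a 16/32/64-bit field must carry an le/be suffix.

```python
import re

_INT_SPEC = re.compile(r"^([ui])(\d+)(le|be)?$")
_FLOAT_SPEC = re.compile(r"^f(32|64)(le|be)$")


def parse_type_spec(spec):
    """Turn a binary field type string into a descriptor dict."""
    m = _INT_SPEC.match(spec)
    if m:
        sign, width, endian = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= width <= 64:
            raise ValueError(f"{spec}: width must be in 1..64")
        return {
            "kind": "signed" if sign == "i" else "unsigned",
            "width": width,
            "endian": endian,                     # None when no le/be suffix
            "multi_byte": width in (16, 32, 64),  # only these must byte-align
        }
    m = _FLOAT_SPEC.match(spec)
    if m:
        return {"kind": "float", "width": int(m.group(1)), "endian": m.group(2), "multi_byte": True}
    if spec in ("bytes", "cstring", "pstring"):
        return {"kind": spec, "width": None, "endian": None, "multi_byte": False}
    raise ValueError(f"unknown binary type spec: {spec}")
```

Under this rule u13be parses as a 13-bit big-endian field that is bit-packed, not multi-byte, which is what the IPv4 example needs.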

D5 — NODE_BINARY_BLOCK replaces the type alias, not augments it

type Foo = binary { ... } is parsed by branching inside ps_parse_type_alias and returning a NODE_BINARY_BLOCK node in place of the NODE_TYPE_ALIAS the caller was going to build. The resolver’s kind-dispatcher at resolve_collect_funcs silently skips unknown kinds, so binary blocks don’t appear as type aliases — they go unregistered in the type registry. Sub-phase 1.4 will need to register them properly (for IPv4Header.decode(...) resolution), likely by adding a new resolver tag alongside the existing struct/enum/type_alias tags.
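A Python sketch of the proposed fix: binary blocks get their own resolver tag, and the synthesised methods are registered alongside them so IPv4Header.decode(...) can resolve. The node shape, tag strings and collect function are hypothetical; the method signatures follow sub-phase 1.2.

```python
def collect(nodes):
    """Collect type and function entries from top-level nodes."""
    types, funcs = {}, {}
    for node in nodes:
        kind, name = node["kind"], node["name"]
        if kind in ("struct", "enum", "type_alias"):
            types[name] = kind
        elif kind == "binary_block":  # the new resolver tag
            types[name] = kind
            funcs[name + ".decode"] = f"(bytes: Bytes): Result<{name}, ParseError>"
            funcs[name + ".encode"] = "(self): Bytes"
        elif kind == "packed_struct":  # type entry only; .with / as are not modelled here
            types[name] = kind
        elif kind == "fn":
            funcs[name] = node.get("sig", "")
        # Any other kind falls through silently: the behaviour that hid binary blocks.
    return types, funcs
```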


Handoff to next session — Sub-phase 1.4 (real codegen)

What’s done (this session)

Sub-phase | Commit | Tests | Status
1.1 parser + AST | fc0b594f | binary_parse_spec.qz (14) | ✅ green
1.2 typecheck + validation (QZ0950-0953) + std/binary.qz | 0e50b640 | binary_typecheck_spec.qz (19) | ✅ green
1.3 MIR opcodes + layout registry + stub codegen | fa0b9fd2 | binary_mir_spec.qz (10) | ✅ green

43 tests total, all green. Fixpoint 2051 functions (gen1 == gen2). Smokes (brainfuck, style_demo, expr_eval) green after every commit. Session-wide backup at self-hosted/bin/backups/quartz-pre-binary-dsl-golden untouched.

What’s next — Sub-phase 1.4

Real codegen for the three MIR opcodes + method synthesis so IPv4Header.decode(bytes) / header.encode() / moder as u32 are callable from user code. Target scope ~600-800 lines, multi-session.

Recommended sequence:

  1. Method resolution first. Add an entry to the resolver that registers each binary block under its type name and knows about synthesised decode / encode methods. Resolver entry tag similar to existing struct/enum tags; typecheck_walk synthesises method signatures in Phase 2 (tc_register_function_signature). Without this step, IPv4Header.decode(...) fails with “unknown function” and nothing downstream runs.

  2. MIR lowering for .encode() / .decode() / as. Hook mir_lower_expr_handlers.qz at the call-expression handler. When the callee resolves to a synthesised encode/decode method, emit MIR_BINARY_PACK / MIR_BINARY_UNPACK with the layout id as operand1 and the field values as args. For as: hook the binary-op as lowering.

  3. Real codegen in cg_intrinsic_binary.qz. Per design §MIR opcodes (BINARY_DSL.md L244-253): walk the layout field list, coalesce adjacent byte-aligned fields into single loads (u32be followed by u32be → one i64 load + bswap + two extracts), emit shift/mask/or sequences for sub-byte fields. Use Bytes (see docs/design/BYTES.md) for the buffer; Phase 9 ownership is zero-copy with a borrowed/owned flag.

  4. ParseError plumbing. Result construction wires through the existing Result<T, E> codegen. Errors are UnexpectedEof (buffer too short), InvalidValue (only fires for discriminated unions in Phase 2 — skip for Phase 1.4), LengthOverflow (length prefix > remaining buffer). Zero-length binary {} already errors at typecheck (QZ0950) so won’t reach codegen.

  5. .with { field = val } block for packed structs. The parser currently has no handler, even though the 1.1 plan called for a permissive parse. Simplest: parse .with { as a method-call form in ps_parse_postfix (after TOK_DOT); each inner name = expr becomes a NODE_ASSIGN-like entry in the children vector. Typecheck resolves against the registered packed-struct fields; codegen emits the shift+mask+or sequence inline.

  6. Roundtrip QSpec (binary_roundtrip_spec.qz). All 5 worked examples from BINARY_DSL.md — encode then decode the same value, assert equality. This is sub-phase 1.5 territory but unlocks as soon as 1.4 is solid.
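The u32be + u32be fusion from step 3 can be sanity-checked in Python before writing the emitter. One 8-byte little-endian load (what an i64 load does on a little-endian host), a 64-bit byte swap (presumably llvm.bswap.i64 in the real IR), then a shift and a mask give the same two values as two separate big-endian reads. The function names are hypothetical.

```python
import struct


def decode_two_u32be_naive(buf):
    """Two independent big-endian 4-byte reads."""
    return int.from_bytes(buf[0:4], "big"), int.from_bytes(buf[4:8], "big")


def bswap64(x):
    """Reverse the byte order of a 64-bit value."""
    return int.from_bytes(x.to_bytes(8, "little"), "big")


def decode_two_u32be_fused(buf):
    raw = struct.unpack_from("<Q", buf, 0)[0]   # one 64-bit load on a little-endian host
    swapped = bswap64(raw)                      # now equals the 8 bytes read big-endian
    return swapped >> 32, swapped & 0xFFFFFFFF  # two extracts
```

The equivalence only holds when both fields are byte-aligned and adjacent, which is the coalescing precondition in step 3.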

Key files for 1.4

self-hosted/frontend/parser.qz         # .with { } postfix (~80 lines)
self-hosted/resolver.qz                # binary-block resolver tag (~50 lines)
self-hosted/middle/typecheck.qz        # method synthesis in register_binary_block
self-hosted/middle/typecheck_walk.qz   # call the synthesiser in Phase 2
self-hosted/backend/mir_lower_expr_handlers.qz  # .encode / .decode / as lowering
self-hosted/backend/cg_intrinsic_binary.qz     # real shift/mask emitter
self-hosted/backend/codegen_instr.qz   # remove the stub branch, call real emitter

Quirks to remember

  • All five discoveries above (D1-D5) — read them first.
  • Guard is mandatory before every commit; pre-commit hook enforces it. Never bypass.
  • Smoke tests after every guard: brainfuck / style_demo / expr_eval. The three together cover ~90% of the language surface.
  • Fix-specific binary backup at self-hosted/bin/backups/quartz-pre-binary-dsl-golden — restore from this if 1.4 breaks the compiler past the point where fixpoint can help.
  • prog: Int in cg_emit_instr — see D3. Decide on typed/untyped accessor before writing real codegen.

Test status

File | Tests | Status
spec/qspec/binary_parse_spec.qz | 14 | 🟢 green
spec/qspec/binary_typecheck_spec.qz | 19 | 🟢 green
spec/qspec/binary_mir_spec.qz | 10 | 🟢 green
spec/qspec/binary_roundtrip_spec.qz | none yet | 🚧 to write in 1.5

Full QSpec suite was NOT run from Claude Code (CLAUDE.md protocol — too slow / risky in the harness PTY). Run ./self-hosted/bin/quake qspec in a terminal before calling 1.4 complete to catch any cross-spec regressions.

Memory updates — candidate entries for ~/.claude

  • project_binary_dsl.md — Phase 1 foundation (parser + typecheck + MIR + stub codegen) shipped in 3 commits on trunk, 43 tests. Phase 1.4 is the next piece: real shift/mask codegen + method synthesis + .with postfix. Design doc docs/design/BINARY_DSL.md has 12 locked decisions; don’t re-litigate.
  • feedback_binary_dsl_parser.md (discoveries D1, D2) — Two Quartz parser conventions worth remembering generally: no TOK_SEMI, ps_error doesn’t advance.
  • feedback_binary_dsl_width_vocab.md (discovery D4): u<N>[le|be] is valid for any N ∈ [1, 64], not just {8, 16, 32, 64}. The multi-byte alignment rule applies only to N ∈ {16, 32, 64}.