Quartz v5.25

Handoff — Incremental-Cache %push Bug in socket$pipe_create

STATUS: CLOSED — fixed Apr 19, 2026 (overnight session). Commit d83ad7b9. Fixpoint verified at 2142 functions. 4-compile repro reports 0 bad %push loads across 6 consecutive rounds. Follow-up commit 6cda03e3 removed the 5 safest --no-cache workarounds from Quakefile.qz. A secondary UFCS/extern shadowing bug was uncovered while investigating file_helpers_spec during the same session — filed in ROADMAP as UFCS-EXTERN-SHADOW (commit 108fdc2e), not part of this fix.

What actually landed (vs. the §4 sketch below): the §4 “per-module fragment check” was necessary but not sufficient. Two additional pieces were required:

  1. Short-name normalization of recompile_set: the dep graph uses import-path names (std/ffi/socket), while cache fragments and function-name prefixes use short names (socket). Without normalization, the TC-skip check never matched.
  2. Resolve the owning module via ast_store, not the function-name prefix: for impl methods (Progress$spinner_only), the prefix is the struct name, not the module. Added dep_graph::depgraph_name_for_ast_store and used it in the TC-skip gate.
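
The two namespace fixes can be illustrated with a small Python model. This is a hypothetical sketch, not the compiler's code: short_name, normalize_recompile_set, owner_from_prefix, owner_from_ast_store, skip_tc and the OWNERS mapping are invented for illustration (the real owner lookup is dep_graph::depgraph_name_for_ast_store), and the std/progress path for Progress is an assumption.

```python
# Hypothetical model of the two namespace fixes; none of these helpers exist
# in the Quartz compiler. The real owner lookup is depgraph_name_for_ast_store.


def short_name(import_path):
    """std/ffi/socket -> socket (the name used by fragments and prefixes)."""
    return import_path.rsplit("/", 1)[-1]


def normalize_recompile_set(recompile_set):
    """Convert dep-graph (import-path) names to short names."""
    return {short_name(m) for m in recompile_set}


def owner_from_prefix(func_name):
    """Pre-fix heuristic: the text before '$' is taken as the module."""
    return func_name.split("$", 1)[0]


def owner_from_ast_store(func_name, owners):
    """Post-fix approach: ask which module actually owns the function.
    `owners` stands in for the ast_store -> dep-graph name lookup."""
    return short_name(owners[func_name])


def skip_tc(func_name, recompile_set, owners):
    """TC-skip gate: skip only if the owning module is not being recompiled."""
    owner = owner_from_ast_store(func_name, owners)
    return owner not in normalize_recompile_set(recompile_set)


OWNERS = {
    "socket$pipe_create": "std/ffi/socket",
    "Progress$spinner_only": "std/progress",  # assumed path, for illustration
}
```

Without normalization, a membership test of "socket" against {"std/ffi/socket"} is false, so a module that is in recompile_set still looks skippable. With the prefix heuristic, Progress$spinner_only is attributed to a "Progress" module that the dep graph does not know about.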

See the commit message of d83ad7b9 for the full three-part explanation. The narrative below is preserved for future debugging context; it correctly identified the TC-skip path as the problem site but underestimated the namespace mismatch.


Original handoff (pre-fix)

One-liner: Tier 2 incremental compilation skips TC+MIR for unchanged modules on the assumption that a cached .ll fragment covers them. When a module is unchanged but has no cached fragment (because prior builds didn’t emit one for that interface hash), codegen lowers fresh MIR from an AST that was never type-checked this round. The Vec<T> type annotations are lost, and the call falls back to closure-call dispatch that references an undefined %push local.

Status: Pre-existing, reproduces on quartz-golden (pre-2026-04-19 binary). Documented in docs/handoff/sys1-complete.md:92 as a “low-priority compiler-internals bug for a debugging session.” This handoff is that session’s starter artifact.

Symptoms narrowed in session 2026-04-19: root cause confidently identified; no code fix yet.

Scope: single-session, maybe 2 if the fix reveals secondary issues.


1. The symptom

Direct native compile of certain spec files produces IR that llc rejects:

llc: error: /tmp/foo.ll:19637:24: error: use of undefined value '%push'
  %v66 = load i64, ptr %push, align 8
                       ^

The failing load is inside define i64 @socket$pipe_create(). The surrounding IR pattern is the Quartz “tagged function-pointer dispatch” (a closure-vs-plain call trampoline), invoked on a value loaded from the non-existent local %push.

The correct IR for the same function compiles result.push(read_fd) into the inlined vec_push intrinsic (grow + store + memcpy), with no %push load at all.


2. Deterministic reproducer (4 compiles, ~15 seconds)

cd /Users/mathisto/projects/quartz
rm -rf .quartz-cache
./self-hosted/bin/quartz examples/brainfuck.qz > /tmp/c1.ll 2>/dev/null
./self-hosted/bin/quartz examples/expr_eval.qz > /tmp/c2.ll 2>/dev/null
./self-hosted/bin/quartz spec/qspec/try_keyword_spec.qz > /tmp/c3.ll 2>/dev/null
# Any 4th spec (or any qspec_main program) triggers the bug:
./self-hosted/bin/quartz spec/qspec/postfix_try_spec.qz > /tmp/c4.ll 2>/dev/null
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
llc -filetype=obj /tmp/c4.ll -o /tmp/c4.o 2>&1 | head -3
# expected: "error: use of undefined value '%push'"
grep -c "load i64, ptr %push" /tmp/c4.ll
# expected: 14

Minimal 4th file (confirmed; bug fires even without any effect syntax):

import * from qspec

def piezo_seven(): Int = 7

def main(): Int
  describe("trigger") do ->
    it("fires the bug") do -> assert_eq(piezo_seven(), 7) end
  end
  return qspec_main()
end

The bug goes away with --no-cache on the 4th compile (confirmed). This is why Quakefile.qz uses --no-cache in 10+ places (qspec_file, fuzz, fixpoint gen1/gen2, site build, doc gen, etc.): each one works around this bug.

Single-spec compile is fine: rm -rf .quartz-cache && quartz spec/qspec/postfix_try_spec.qz produces clean IR. The bug needs cache warmup from prior “large” builds (brainfuck and expr_eval together suffice; either alone does not).


3. Why it happens — full causal chain

3.1 The bad IR

In socket$pipe_create, result.push(read_fd) gets lowered to MIR_CALL_INDIRECT with fn_ptr = load %push. The MIR emitter assumed push was a first-class function value in a local variable. Codegen at self-hosted/backend/codegen_instr.qz:555-680 emits the tagged-pointer dispatch trampoline, which always starts with:

%v66 = load i64, ptr %push, align 8   ; <-- references undeclared local
%v67.tag = and i64 %v66, 1
%v67.is_closure = icmp eq i64 %v67.tag, 1
br i1 %v67.is_closure, label %call67.closure, label %call67.plain
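
As a reading aid, the trampoline's branch logic can be modelled in a few lines of Python. The model only mirrors the IR above (bit 0 of the loaded word selects the closure path); the untagging step, the address tables and the call convention are assumptions, not taken from codegen_instr.qz. What it illustrates is that the trampoline must load a word before it can branch, and in the bad IR that word comes from %push, which was never allocated.

```python
# Illustrative model of the closure-vs-plain trampoline. The address tables
# and the "clear bit 0 to untag" step are assumptions made for the sketch.


def dispatch(word, closures, functions, *args):
    tag = word & 1                 # %v67.tag = and i64 %v66, 1
    if tag == 1:                   # %v67.is_closure = icmp eq i64 %v67.tag, 1
        return closures[word & ~1](*args)   # label %call67.closure
    return functions[word](*args)           # label %call67.plain


FUNCTIONS = {0x1000: lambda x: x + 1}   # plain function "addresses"
CLOSURES = {0x2000: lambda x: x * 2}    # closure "addresses" (tagged with bit 0)
```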

3.2 Why MIR emitted MIR_CALL_INDIRECT instead of vec_push intrinsic

The UFCS method-rewrite lives at self-hosted/middle/typecheck_expr_handlers.qz:1811-1857:

elsif first_arg_type == type_constants::TYPE_VEC
  # Vec UFCS method rewrites
  type_name = "Vec"
  canonical_intrinsic = match func_name
    "push" => "vec_push"
    "pop"  => "vec_pop"
    ...
  end
  if str_byte_len(canonical_intrinsic) > 0
    func_name = canonical_intrinsic
    ast::ast_set_str1(ast_storage, callee_node, canonical_intrinsic)   # <-- rewrites AST node
    ...
  end

This rewrite only fires when first_arg_type == TYPE_VEC. If the receiver’s type is not known at TC time, the rewrite is skipped — the method name stays push — and MIR later interprets push as an identifier reference, emitting MIR_LOAD + MIR_CALL_INDIRECT.
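
The dependency on type checking can be shown with a toy lowering function. This is a hypothetical Python model: VEC_REWRITES mirrors only the two arms visible in the excerpt, and the CALL_INTRINSIC label is invented for the sketch; only MIR_LOAD and MIR_CALL_INDIRECT are names taken from the compiler.

```python
# Toy model: whether MIR sees an intrinsic or an indirect call depends
# entirely on whether the TC rewrite ran on this call site.
VEC_REWRITES = {"push": "vec_push", "pop": "vec_pop"}  # subset of the excerpt
INTRINSICS = set(VEC_REWRITES.values())


def lower_method_call(method, receiver_type, tc_ran):
    callee = method
    if tc_ran and receiver_type == "Vec":
        # Stands in for the in-place AST rewrite in typecheck_expr_handlers.qz.
        callee = VEC_REWRITES.get(method, method)
    if callee in INTRINSICS:
        return ["CALL_INTRINSIC " + callee]          # hypothetical label
    # Unrewritten name: MIR treats it as an identifier holding a function value.
    return ["MIR_LOAD %" + callee, "MIR_CALL_INDIRECT"]
```

With tc_ran=False (a module whose TC was skipped), lower_method_call("push", "Vec", False) takes the MIR_LOAD %push path even though the receiver really is a Vec.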

3.3 Why first_arg_type becomes “not TYPE_VEC” in the cached path

Two-tier incremental cache logic in self-hosted/quartz.qz:625-683:

  • Tier 1 (depgraph_invalidate_with_cutoff) walks the dep graph and builds changed_modules from content-hash diffs.
  • Tier 2 (recompile_set) is derived from changed_modules + interface-hash cutoffs. Modules NOT in recompile_set get their TC+MIR skipped — their function bodies are not typechecked this round; the compiler expects to reuse their cached IR fragment.

The safety check at lines 641-653:

var has_fragments = 0
for hfi in 0..iface_names.size
  var hf_path = ".quartz-cache/modules/#{iface_names[hfi]}.ll"
  if file_exists(hf_path)
    has_fragments = 1
    break
  end
end
if has_fragments == 0
  recompile_set = vec_new<String>()   # disable TC skip for this build
end

This check fires only if NO fragments exist at all. The assumption is: “if any fragment exists, all skipped modules must also have fragments.” That assumption is false. After brainfuck.qz + expr_eval.qz + try_keyword_spec.qz, the cache has fragment files for Int.ll, OpKind.ll, Option.ll, Result.ll, show_desc.ll, __main__.ll and four content-hash .ll files — but no socket.ll fragment. socket wasn’t extracted as a named interface-hash fragment by any of the prior compiles (its symbols landed inside the whole-program .ll files instead).
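
The gap between the existing gate and the per-module reality can be modelled in Python. This sketch treats the cache as a set of module names that have a .quartz-cache/modules/<name>.ll file. The module list is an abbreviated version of the scenario above, and it assumes (as §7 step 3 is meant to confirm) that socket appears in iface_names. Both function names are hypothetical.

```python
# Hypothetical model of the Tier 2 safety gate; names are illustrative.


def any_fragment_gate(iface_names, cached):
    """Pre-fix gate: TC skip stays enabled if ANY fragment exists."""
    return any(name in cached for name in iface_names)


def skipped_without_fragment(iface_names, recompile_set, cached):
    """Modules whose TC+MIR is skipped but that have no fragment to reuse."""
    return [m for m in iface_names if m not in recompile_set and m not in cached]


IFACE_NAMES = ["__main__", "Int", "Option", "Result", "socket"]
CACHED = {"__main__", "Int", "OpKind", "Option", "Result", "show_desc"}
```

With recompile_set = {"__main__"}, any_fragment_gate returns True, so TC skip stays on, while skipped_without_fragment reports ["socket"]: the module that later produces the %push load.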

When the 4th compile runs:

  1. Tier 1 finds only __main__ changed.
  2. Tier 2 builds recompile_set = ["__main__"]. 15 of 16 modules “skipped.” --explain-cache confirms: [cache] Tier 2: skipping TC+MIR for 15/16 modules.
  3. has_fragments == 1 (because Int/Option/Result fragments exist) → safety check passes.
  4. But socket has NO fragment. It’s neither in recompile_set nor does it have a cached .ll to reuse.
  5. At codegen time, socket$pipe_create needs to be emitted. It has been parsed but never typechecked this round. ast_set_str1 for push → vec_push never happened. MIR lowering sees raw push and emits MIR_CALL_INDIRECT.
  6. Codegen emits the %push-loading trampoline; LLVM rejects.

3.4 Why try_keyword_spec doesn’t trigger but effect_decl_parse_spec does

Red herring. Both can trigger it as the “4th file”; both are clean as the “1st-3rd file.” What matters is whether the prior compiles emitted a cached fragment for socket. None of brainfuck/expr_eval/try_keyword extracts socket as its own interface-hash fragment, so on compile #4 socket falls into the gap.

What pushes socket INTO a named fragment is the Tier 2 logic crossing the interface-hash boundary, which depends on which modules changed since the last build. Not fully traced; it doesn’t matter for the fix.


4. Proposed fix (per-module fragment check)

Replace the “any fragment exists” guard in self-hosted/quartz.qz:641-653 with a per-module check: any unchanged module whose fragment is missing must be added to recompile_set.

Sketch:

# Before computing recompile_set, collect modules that are unchanged AND
# lack a cached fragment — they must be recompiled this round.
var missing_fragments = vec_new<String>()
for mi in 0..iface_names.size
  var mod_name = iface_names[mi]
  var frag_path = ".quartz-cache/modules/#{mod_name}.ll"
  if file_exists(frag_path) == 0
    # Skip modules already in changed_modules (they're being recompiled anyway)
    var already_changed = 0
    for ci in 0..changed_modules.size
      if changed_modules[ci].eq(mod_name)
        already_changed = 1
        break
      end
    end
    if already_changed == 0
      missing_fragments.push(mod_name)
    end
  end
end

# Union missing_fragments into changed_modules before dep_graph invalidation,
# so anything downstream of a missing-fragment module is also flagged and
# ends up in recompile_set.
for mi in 0..missing_fragments.size
  changed_modules.push(missing_fragments[mi])
end
# ... existing depgraph_invalidate_with_cutoff call ...
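
The proposed fix, including the downstream propagation the comments ask for, can be sketched in Python as a worklist closure over reverse dependencies. Assumptions: reverse_deps (module -> modules that import it) is a hypothetical stand-in for the dep graph; all names are already normalized to short names (the mismatch described in the status note at the top); and the interface-hash cutoff is ignored, so the result over-approximates what depgraph_invalidate_with_cutoff would recompile.

```python
from collections import deque


def modules_to_recompile(changed, iface_names, cached, reverse_deps):
    """Seed with changed modules plus unchanged modules lacking a fragment,
    then flag everything downstream of a seed (no interface-hash cutoff)."""
    seeds = set(changed)
    seeds.update(m for m in iface_names if m not in cached)
    result = set(seeds)
    queue = deque(seeds)
    while queue:
        mod = queue.popleft()
        for dependent in reverse_deps.get(mod, ()):
            if dependent not in result:
                result.add(dependent)
                queue.append(dependent)
    return result
```

For example, with changed = ["__main__"], a fragment missing only for socket, and socket imported by __main__, the result is {"__main__", "socket"}; modules that have fragments and sit upstream of socket (such as Int) stay skipped.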

Caveats / things to verify while writing the fix:

  • iface_names is populated before this point — confirm by reading context around line 625.
  • depgraph_invalidate_with_cutoff signature — it may already do transitive closure, in which case just add the missing-fragment modules to changed_modules and let the existing logic propagate.
  • The existing has_fragments == 0 fallback can stay as a belt-and-suspenders guard, or be removed once the per-module check is in place.
  • Whatever fix lands, verify it against the 4-compile reproducer and the self-compile fixpoint. Because the fixpoint builds use --no-cache, the fixpoint won’t change; it also can’t catch a regression that makes ALL cached builds recompile everything, which would still be a perf regression worth catching.

5. What’s been ruled out in the prior session

Do not re-investigate:

  • Not caused by the piezo Milestone A parser changes (committed as 86e0f906). Orphan ast_function nodes from ps_parse_effect_decl were suspected; an experiment replacing that parse with a pure token-skip also eliminated the bug in a narrow test, but the minimal reproducer confirms the bug fires with zero effect syntax and zero parser changes to the spec file. The parser work is clean.
  • Not caused by the $try → try migration. The bug reproduces with any spec (including non-migrated ones like builtin_macros_spec) and with non-spec code alike.
  • Not caused by the new with handler suppress_trailing_block flag. It was added to fix a different problem (a handler call eating the body’s do) and is orthogonal.
  • Not a nondeterminism / race. Two identical fresh compiles produce byte-identical IR. The bug is fully deterministic on the 4th-compile path.
  • Not a fixpoint failure. quake guard passes at 2141 functions. self-hosted/bin/quartz compiles itself gen1 == gen2 byte-identical.
  • Not a symptom of the stale sb_append_int theory from sys1-complete.md:92. That doc speculated it might be a SB buffer interaction; it isn’t. It’s the cache skip logic.

6. Useful grep / file-pointer reference card

File pointers (need: path):

  • Tier 2 safety gate (the buggy one): self-hosted/quartz.qz:637-683
  • UFCS rewrite that depends on TC: self-hosted/middle/typecheck_expr_handlers.qz:1811-1857
  • MIR_CALL_INDIRECT codegen (emits the bad %push load): self-hosted/backend/codegen_instr.qz:555-680
  • Vec intrinsic categories: self-hosted/backend/intrinsic_registry.qz:595 (vec_push)
  • Depgraph invalidation: self-hosted/backend/dep_graph.qz (depgraph_invalidate_with_cutoff)
  • Module-fragment cache write: self-hosted/backend/codegen_separate.qz:210
  • Hash/manifest layout: self-hosted/shared/build_cache.qz
  • --explain-cache CLI handler: self-hosted/quartz.qz:655-683

Key CLI flags:

  • --no-cache: disable cache entirely (sidesteps bug; used throughout Quakefile).
  • --explain-cache: print per-module Tier 1/Tier 2 decisions to stderr.
  • --separate: emit per-module .ll files (different path; may help understand fragment emission).
  • --dump-effects: unrelated to this bug, but a useful general introduction to compiler internals.

There is no --dump-mir or --dump-ast flag, despite what CLAUDE.md claims.


7. Investigation strategy for the fresh session

  1. Read this doc, then the three file excerpts above (quartz.qz:625-683, typecheck_expr_handlers.qz:1811-1857, codegen_instr.qz:555-680). ~30 minutes.
  2. Reproduce with --explain-cache to confirm the “15/16 modules skipped, socket among the skipped-without-fragment” picture:
    rm -rf .quartz-cache
    for f in brainfuck expr_eval; do quartz examples/$f.qz > /dev/null; done
    quartz spec/qspec/try_keyword_spec.qz > /dev/null
    quartz --explain-cache spec/qspec/postfix_try_spec.qz 2>/tmp/cache.log >/dev/null
    cat /tmp/cache.log
    ls .quartz-cache/modules/ | grep -E "^(socket|Int|Option|Result)\.ll$"
  3. Print the iface_names list in quartz.qz right before the safety gate to confirm socket is in it. Add a temporary eputs for debugging, then remove it.
  4. Implement the per-module fragment check (sketch in §4).
  5. Verify the reproducer is fixed (green llc, 0 bad %push refs in IR).
  6. Verify no regression in quake guard (fixpoint must still hold).
  7. Verify no regression in quake qspec (full suite, run from a separate terminal per CLAUDE.md — DO NOT run from Claude Code).
  8. Audit the 10 Quakefile --no-cache sites — now that the cache is sound, some may become unnecessary. Remove cautiously, commit separately.
  9. Write the bug’s postmortem back into the docs/handoff/sys1-complete.md:92 area to close the loop (“fix landed in commit X, see Y”).

Expected fix size: < 50 lines of code change, probably 20. The hard part was the diagnosis, which is done.


8. Dev-time gotchas

  • The .quartz-cache/ dir persists across compiler rebuilds. When you run quake guard, a guard-internal step clears it (“Cleared .quartz-cache (compiler binary changed)”). But if you’re iterating without quake guard, stale cache from a prior run can mask or shift the repro. Always rm -rf .quartz-cache before each repro test.
  • quake guard uses --no-cache internally (see Quakefile lines 492, 514 — gen1 and gen2 compiles). So guard + fixpoint do NOT exercise the cache path. A passing guard is necessary but not sufficient — you MUST also test the cache path manually.
  • Don’t run the full quake qspec from Claude Code (it hangs PTY per CLAUDE.md). Run single specs via FILE=… quake qspec_file, and quake qspec_file uses --no-cache — meaning it doesn’t exercise the cache either. To validate your fix, run the 4-compile native-IR reproducer manually.
  • self-hosted/.fixpoint timestamps update on every quake guard; don’t let timestamp churn scare you into thinking the binary changed. Check binary_hash line, not verified_at.
  • The compiler is at 2141 functions on trunk at commit 86e0f906. If your fix adds functions, guard will update that count — fine.

9. Acceptance criteria

A fix for this bug is done when:

  1. The 4-compile reproducer produces llc-clean IR for the 4th file (grep -c "load i64, ptr %push" returns 0).
  2. quake guard passes; fixpoint holds.
  3. At least 3 existing specs compile cleanly via the direct quartz | llc | clang pipeline on a warm cache — pick postfix_try_spec, defer_spec, never_type_spec.
  4. No regression in total compile time for the smoke-test suite (brainfuck + style_demo + expr_eval). This is a correctness fix, not a perf fix — but doubling compile time would warrant a second look.
  5. The commit message reads [cache] Fix Tier 2 skip for unchanged modules missing cached fragments (or similar) and cross-references this handoff doc and docs/handoff/sys1-complete.md:92.
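
Criterion 1 greps for the one known symptom. A slightly broader check, sketched below in Python, flags any load whose pointer operand is a local that its function never defines. This is a hypothetical helper, not part of the repo: it uses regexes over textual IR (no LLVM bindings), treats only SSA assignments and define-line parameters as definitions, and can miss or over-report in IR shapes it does not model.

```python
import re

DEFINE = re.compile(r"^define\b")
ASSIGN = re.compile(r"^\s*(%[-\w.$]+)\s*=")
LOAD_PTR = re.compile(r"\bload\s+[^,]+,\s*ptr\s+(%[-\w.$]+)")
LOCAL = re.compile(r"%[-\w.$]+")


def undefined_loads(ir_text):
    """Return (define-line, pointer) pairs for loads from locals that the
    enclosing function never defines (not a parameter, not an SSA assignment)."""
    problems = []
    in_fn = False
    header = ""
    defined = set()
    loads = []
    for line in ir_text.splitlines():
        if DEFINE.match(line):
            in_fn, header = True, line
            defined = set(LOCAL.findall(line))  # parameters on the define line
            loads = []
        elif in_fn and line.startswith("}"):
            # Check at function end, since SSA values may be defined after a use.
            problems.extend((header, p) for p in loads if p not in defined)
            in_fn = False
        elif in_fn:
            m = ASSIGN.match(line)
            if m:
                defined.add(m.group(1))
            loads.extend(LOAD_PTR.findall(line))
    return problems


def report(path):
    """Print each offending load in an .ll file and return how many were found."""
    with open(path) as fh:
        found = undefined_loads(fh.read())
    for header, ptr in found:
        print(f"{ptr} is loaded but never defined in: {header.strip()}")
    return len(found)
```

On a warm-cache run, report("/tmp/c4.ll") should return 0 once the fix is in.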

That’s it. Everything the fresh session needs is in this doc plus the three file excerpts. The hard diagnostic work is done; the fix is a targeted 20-line patch with clear acceptance criteria.

Good hunting.