Handoff — Incremental-Cache %push Bug in socket$pipe_create
STATUS: CLOSED, fixed Apr 19, 2026 (overnight session). Commit d83ad7b9. Fixpoint verified at 2142 functions. The 4-compile repro reports 0 bad %push loads across 6 consecutive rounds. Follow-up commit 6cda03e3 removed the 5 safest --no-cache workarounds from Quakefile.qz. A secondary UFCS/extern shadowing bug was uncovered while investigating file_helpers_spec during the same session; it is filed in ROADMAP as UFCS-EXTERN-SHADOW (commit 108fdc2e) and is not part of this fix.
What actually landed (vs. the §4 sketch below): the §4 “per-module fragment check” was necessary but not sufficient. Two additional pieces were required:
- Short-name normalization of recompile_set: the dep graph uses import-path names (std/ffi/socket), while cache fragments and function-name prefixes use short names (socket). Without normalization the TC-skip check never matched.
- Resolve owning module via ast_store, not function-name prefix: for impl methods (Progress$spinner_only), the prefix is the struct name, not the module. Added dep_graph::depgraph_name_for_ast_store and used it in the TC-skip gate.
See the commit message of d83ad7b9 for the full three-part explanation. The narrative below is preserved for future debugging context; it correctly identified the TC-skip path as the problem site but underestimated the namespace mismatch.
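The two namespace fixes above can be illustrated with a small Python model. This is a sketch only, not the compiler's code: short_name, normalize_recompile_set, owning_module, and the owner table are hypothetical stand-ins (the real lookup goes through ast_store and dep_graph::depgraph_name_for_ast_store), and the module name "progress_mod" is invented for illustration.

```python
def short_name(import_path: str) -> str:
    """Reduce an import-path name such as 'std/ffi/socket' to 'socket'."""
    return import_path.rsplit("/", 1)[-1]


def normalize_recompile_set(recompile_set):
    """Map dep-graph (import-path) names to the short names used by fragments."""
    return {short_name(name) for name in recompile_set}


def owning_module(fn_name: str, owner_by_function: dict) -> str:
    """Resolve the module that owns a function.

    owner_by_function is a hypothetical table standing in for ast_store.
    Falling back to the '$' prefix is only correct for free functions;
    for impl methods the prefix is the struct name, not the module.
    """
    if fn_name in owner_by_function:
        return owner_by_function[fn_name]
    return fn_name.split("$", 1)[0]


if __name__ == "__main__":
    # Without normalization, 'std/ffi/socket' never matches the prefix 'socket'.
    print(normalize_recompile_set(["std/ffi/socket", "__main__"]))
    # 'progress_mod' is an invented module name for the impl-method case.
    owners = {"Progress$spinner_only": "progress_mod"}
    print(owning_module("Progress$spinner_only", owners))
    print(owning_module("socket$pipe_create", {}))
```

The point of the second helper is that a prefix split alone would report Progress as the owner, which is a struct and never appears in the dep graph.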
Original handoff (pre-fix)
One-liner: Tier 2 incremental compilation skips TC+MIR for unchanged modules under the assumption that a cached .ll fragment covers them. When a module is unchanged but has no cached fragment (because prior builds didn’t emit one for that interface hash), codegen fresh-lowers MIR from an AST that was never type-checked this round, losing Vec<T> type annotations and falling back to closure-call dispatch that references an undefined %push local.
Status: Pre-existing, reproduces on quartz-golden (pre-2026-04-19 binary). Documented in docs/handoff/sys1-complete.md:92 as a “low-priority compiler-internals bug for a debugging session.” This handoff is that session’s starter artifact.
Symptoms narrowed in session 2026-04-19: root cause confidently identified; no code fix yet.
Scope: single-session, maybe 2 if the fix reveals secondary issues.
1. The symptom
Direct native compile of certain spec files produces IR that llc rejects:
llc: error: /tmp/foo.ll:19637:24: error: use of undefined value '%push'
%v66 = load i64, ptr %push, align 8
^
The failing load is inside define i64 @socket$pipe_create(). The surrounding IR pattern is the Quartz “tagged function-pointer dispatch” — closure-vs-plain call trampoline — invoked on a value loaded from the non-existent local %push.
The correct IR for the same function compiles result.push(read_fd) into an inlined vec_push intrinsic (grow+store+memcpy), with no %push load at all.
2. Deterministic reproducer (4 compiles, ~15 seconds)
cd /Users/mathisto/projects/quartz
rm -rf .quartz-cache
./self-hosted/bin/quartz examples/brainfuck.qz > /tmp/c1.ll 2>/dev/null
./self-hosted/bin/quartz examples/expr_eval.qz > /tmp/c2.ll 2>/dev/null
./self-hosted/bin/quartz spec/qspec/try_keyword_spec.qz > /tmp/c3.ll 2>/dev/null
# Any 4th spec (or any qspec_main program) triggers the bug:
./self-hosted/bin/quartz spec/qspec/postfix_try_spec.qz > /tmp/c4.ll 2>/dev/null
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
llc -filetype=obj /tmp/c4.ll -o /tmp/c4.o 2>&1 | head -3
# expected: "error: use of undefined value '%push'"
grep -c "load i64, ptr %push" /tmp/c4.ll
# expected: 14
Minimal 4th file (confirmed; bug fires even without any effect syntax):
import * from qspec
def piezo_seven(): Int = 7
def main(): Int
  describe("trigger") do ->
    it("fires the bug") do -> assert_eq(piezo_seven(), 7) end
  end
  return qspec_main()
end
Goes away with --no-cache on the 4th compile — confirmed. This is why Quakefile.qz uses --no-cache in 10+ places (qspec_file, fuzz, fixpoint gen1/gen2, site build, doc gen, etc.) — working around this bug.
Single-spec compile is fine — rm -rf .quartz-cache && quartz spec/qspec/postfix_try_spec.qz produces clean IR. The bug needs cache warmup from at least one prior “large” build (brainfuck and expr_eval together suffice; either alone does not).
3. Why it happens — full causal chain
3.1 The bad IR
In socket$pipe_create, result.push(read_fd) gets lowered to MIR_CALL_INDIRECT with fn_ptr = load %push. The MIR emitter assumed push was a first-class function value in a local variable. Codegen at self-hosted/backend/codegen_instr.qz:555-680 emits the tagged-pointer dispatch trampoline, which always starts with:
%v66 = load i64, ptr %push, align 8 ; <-- references undeclared local
%v67.tag = and i64 %v66, 1
%v67.is_closure = icmp eq i64 %v67.tag, 1
br i1 %v67.is_closure, label %call67.closure, label %call67.plain
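The tag test at the top of that trampoline can be modelled in a few lines of Python. This is a sketch of only the low-bit check visible in the IR above (and with 1, compare with 1, branch); dispatch_kind is a hypothetical name, and what each branch does afterwards is not modelled.

```python
def dispatch_kind(fn_value: int) -> str:
    """Mirror the IR: tag = value & 1; a tag of 1 selects the closure path."""
    tag = fn_value & 1
    return "closure" if tag == 1 else "plain"


if __name__ == "__main__":
    print(dispatch_kind(0x1001))  # low bit set
    print(dispatch_kind(0x1000))  # low bit clear
```

In the buggy IR the value fed to this test comes from a load of %push, a local that was never declared, so llc rejects the function before the tag test matters.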
3.2 Why MIR emitted MIR_CALL_INDIRECT instead of vec_push intrinsic
The UFCS method-rewrite lives at self-hosted/middle/typecheck_expr_handlers.qz:1811-1857:
elsif first_arg_type == type_constants::TYPE_VEC
  # Vec UFCS method rewrites
  type_name = "Vec"
  canonical_intrinsic = match func_name
    "push" => "vec_push"
    "pop" => "vec_pop"
    ...
  end
  if str_byte_len(canonical_intrinsic) > 0
    func_name = canonical_intrinsic
    ast::ast_set_str1(ast_storage, callee_node, canonical_intrinsic) # <-- rewrites AST node
    ...
  end
This rewrite only fires when first_arg_type == TYPE_VEC. If the receiver’s type is not known at TC time, the rewrite is skipped — the method name stays push — and MIR later interprets push as an identifier reference, emitting MIR_LOAD + MIR_CALL_INDIRECT.
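A toy Python model makes the dependency on type information concrete. It is an illustration under stated assumptions, not the typechecker: rewrite_ufcs and lower_call are hypothetical names, only the push and pop mappings shown in the excerpt are included, and the real rewrite mutates the AST callee node rather than returning a string.

```python
# Only the two mappings visible in the excerpt; the real table is longer.
VEC_INTRINSICS = {"push": "vec_push", "pop": "vec_pop"}


def rewrite_ufcs(method: str, receiver_type):
    """Return the callee name after the TC-time rewrite.

    receiver_type is None when the function body was never typechecked.
    """
    if receiver_type == "Vec":
        return VEC_INTRINSICS.get(method, method)
    return method  # rewrite skipped: the name stays as written


def lower_call(callee: str) -> str:
    """Model MIR lowering: intrinsics inline, bare names become indirect calls."""
    if callee in VEC_INTRINSICS.values():
        return "inline " + callee
    return "MIR_CALL_INDIRECT via load %" + callee


if __name__ == "__main__":
    print(lower_call(rewrite_ufcs("push", "Vec")))  # typechecked path
    print(lower_call(rewrite_ufcs("push", None)))   # TC skipped
```

The second call shows the failure mode: with no receiver type, push survives as a bare name and lowering treats it as a local holding a function value.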
3.3 Why first_arg_type becomes “not TYPE_VEC” in the cached path
Two-tier incremental cache logic in self-hosted/quartz.qz:625-683:
- Tier 1 (depgraph_invalidate_with_cutoff) walks the dep graph and builds changed_modules from content-hash diffs.
- Tier 2 (recompile_set) is derived from changed_modules + interface-hash cutoffs. Modules NOT in recompile_set get their TC+MIR skipped: their function bodies are not typechecked this round; the compiler expects to reuse their cached IR fragment.
The safety check at lines 641-653:
var has_fragments = 0
for hfi in 0..iface_names.size
  var hf_path = ".quartz-cache/modules/#{iface_names[hfi]}.ll"
  if file_exists(hf_path)
    has_fragments = 1
    break
  end
end
if has_fragments == 0
  recompile_set = vec_new<String>() # disable TC skip for this build
end
This check fires only if NO fragments exist at all. The assumption is: “if any fragment exists, all skipped modules must also have fragments.” That assumption is false. After brainfuck.qz + expr_eval.qz + try_keyword_spec.qz, the cache has fragment files for Int.ll, OpKind.ll, Option.ll, Result.ll, show_desc.ll, __main__.ll and four content-hash .ll files — but no socket.ll fragment. socket wasn’t extracted as a named interface-hash fragment by any of the prior compiles (its symbols landed inside the whole-program .ll files instead).
When the 4th compile runs:
- Tier 1 finds only __main__ changed.
- Tier 2 builds recompile_set = ["__main__"]. 15 of 16 modules “skipped.” --explain-cache confirms: [cache] Tier 2: skipping TC+MIR for 15/16 modules.
- has_fragments == 1 (because Int/Option/Result fragments exist) → safety check passes.
- But socket has NO fragment. It’s neither in recompile_set nor does it have a cached .ll to reuse.
- At codegen time, socket$pipe_create needs to be emitted. It has been parsed but never typechecked this round. ast_set_str1 for push → vec_push never happened. MIR lowering sees raw push and emits MIR_CALL_INDIRECT.
- Codegen emits the %push-loading trampoline; LLVM rejects.
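The sequence above can be replayed with a small Python model that contrasts the "any fragment exists" guard with a per-module check. This is a sketch under assumptions: the function names are hypothetical, the module list is a five-module subset of the sixteen, and module names are assumed to be already normalized to short names.

```python
def typechecked_any_fragment(modules, fragments, recompile_set):
    """Original guard: disable the TC skip only if no fragment exists at all."""
    if not any(m in fragments for m in modules):
        return set(modules)
    return set(recompile_set)


def typechecked_per_module(modules, fragments, recompile_set):
    """Per-module guard: also typecheck every module lacking a fragment."""
    missing = {m for m in modules if m not in fragments}
    return set(recompile_set) | missing


def gap(modules, fragments, typechecked):
    """Modules that are neither typechecked this round nor covered by a fragment."""
    return {m for m in modules if m not in typechecked and m not in fragments}


if __name__ == "__main__":
    modules = ["__main__", "socket", "Int", "Option", "Result"]
    fragments = {"__main__", "Int", "Option", "Result"}  # no socket.ll
    recompile_set = {"__main__"}

    old = typechecked_any_fragment(modules, fragments, recompile_set)
    new = typechecked_per_module(modules, fragments, recompile_set)
    print(sorted(gap(modules, fragments, old)))  # ['socket']
    print(sorted(gap(modules, fragments, new)))  # []
```

Any module left in the gap reaches codegen with an untypechecked AST, which is exactly where socket$pipe_create ends up under the original guard.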
3.4 Why try_keyword_spec doesn’t trigger but effect_decl_parse_spec does
Red herring. Both can trigger it as the “4th file”; both are clean as the “1st-3rd file.” What matters is whether the prior compiles emitted a cached fragment for socket. None of brainfuck/expr_eval/try_keyword extracts socket as its own interface-hash fragment, so on compile #4 socket is in the gap.
What pushes socket INTO a named fragment is the Tier 2 logic ticking over the interface-hash boundary — which depends on which modules changed since last build. Not fully traced; doesn’t matter for the fix.
4. Proposed fix (per-module fragment check)
Replace the “any fragment exists” guard in self-hosted/quartz.qz:641-653 with a per-module check: any unchanged module whose fragment is missing must be added to recompile_set.
Sketch:
# Before computing recompile_set, collect modules that are unchanged AND
# lack a cached fragment; they must be recompiled this round.
var missing_fragments = vec_new<String>()
for mi in 0..iface_names.size
  var mod_name = iface_names[mi]
  var frag_path = ".quartz-cache/modules/#{mod_name}.ll"
  if file_exists(frag_path) == 0
    # Skip modules already in changed_modules (they're being recompiled anyway)
    var already_changed = 0
    for ci in 0..changed_modules.size
      if changed_modules[ci].eq(mod_name)
        already_changed = 1
        break
      end
    end
    if already_changed == 0
      missing_fragments.push(mod_name)
    end
  end
end
# Union missing_fragments into changed_modules before dep_graph invalidation
# (so anything downstream of a recompiled module is also flagged and the
# derived recompile_set picks them up).
for mi in 0..missing_fragments.size
  changed_modules.push(missing_fragments[mi])
end
# ... existing depgraph_invalidate_with_cutoff call ...
Caveats / things to verify while writing the fix:
- iface_names is populated before this point: confirm by reading context around line 625.
- depgraph_invalidate_with_cutoff signature: it may already do transitive closure, in which case just add the missing-fragment modules to changed_modules and let the existing logic propagate.
- The existing has_fragments == 0 fallback can stay as a belt-and-suspenders guard, or be removed once the per-module check is in place.
- Whatever fix lands, verify against the 4-compile reproducer, plus self-compile fixpoint. Since the compiler uses --no-cache internally, fixpoint won’t change, but a regression that makes ALL builds recompile everything would still be a perf regression worth catching.
5. What’s been ruled out in the prior session
Do not re-investigate:
- Not caused by the piezo Milestone A parser changes (committed as 86e0f906). Orphan ast_function nodes from ps_parse_effect_decl were suspected; an experiment replacing that with a pure token-skip also eliminated the bug in a narrow test, but the minimal reproducer confirms the bug fires with zero effect syntax and zero parser changes to the spec file. The parser work is clean.
- Not caused by $try → try migration. The reproducer uses any spec (including non-migrated ones like builtin_macros_spec) and non-spec code alike.
- Not caused by the new with handler suppress_trailing_block flag. Added to fix a different problem (handler call eating the body’s do), orthogonal.
- Not a nondeterminism / race. Two identical fresh compiles produce byte-identical IR. The bug is fully deterministic on the 4th-compile path.
- Not a fixpoint failure. quake guard passes at 2141 functions. self-hosted/bin/quartz compiles itself gen1 == gen2 byte-identical.
- Not a symptom of the stale sb_append_int theory from sys1-complete.md:92. That doc speculated it might be a SB buffer interaction; it isn’t. It’s the cache skip logic.
6. Useful grep / file-pointer reference card
| Need | Path |
|---|---|
| Tier 2 safety gate (the buggy one) | self-hosted/quartz.qz:637-683 |
| UFCS rewrite that depends on TC | self-hosted/middle/typecheck_expr_handlers.qz:1811-1857 |
| MIR_CALL_INDIRECT codegen (emits the bad %push load) | self-hosted/backend/codegen_instr.qz:555-680 |
| Vec intrinsic categories | self-hosted/backend/intrinsic_registry.qz:595 (vec_push) |
| Depgraph invalidation | self-hosted/backend/dep_graph.qz (depgraph_invalidate_with_cutoff) |
| Module-fragment cache write | self-hosted/backend/codegen_separate.qz:210 |
| Hash/manifest layout | self-hosted/shared/build_cache.qz |
| --explain-cache CLI handler | self-hosted/quartz.qz:655-683 |
Key CLI flags:
- --no-cache: disable cache entirely (sidesteps bug; used throughout Quakefile).
- --explain-cache: print per-module Tier 1/Tier 2 decisions to stderr.
- --separate: emit per-module .ll files (different path; may help understand fragment emission).
- --dump-effects: unrelated to this bug but useful general intro to compiler internals.
No --dump-mir or --dump-ast exists despite CLAUDE.md claims.
7. Investigation strategy for the fresh session
- Read this doc, then the three file excerpts above (quartz.qz:625-683, typecheck_expr_handlers.qz:1811-1857, codegen_instr.qz:555-680). ~30 minutes.
- Reproduce with --explain-cache to confirm the “15/16 modules skipped, socket among the skipped-without-fragment” picture:
    rm -rf .quartz-cache
    for f in brainfuck expr_eval; do quartz examples/$f.qz > /dev/null; done
    quartz spec/qspec/try_keyword_spec.qz > /dev/null
    quartz --explain-cache spec/qspec/postfix_try_spec.qz 2>/tmp/cache.log >/dev/null
    cat /tmp/cache.log
    ls .quartz-cache/modules/ | grep -E "^(socket|Int|Option|Result)\.ll$"
- Print the iface_names list in quartz.qz right before the safety gate, to confirm socket is in it. Wire a quick eputs for debugging, then remove.
- Implement the per-module fragment check (sketch in § 4).
- Verify the reproducer is fixed (green llc, 0 bad %push refs in IR).
- Verify no regression in quake guard (fixpoint must still hold).
- Verify no regression in quake qspec (full suite, run from a separate terminal per CLAUDE.md; DO NOT run from Claude Code).
- Audit the 10 Quakefile --no-cache sites: now that the cache is sound, some may become unnecessary. Remove cautiously, commit separately.
- Write the bug’s postmortem back into docs/handoff/sys1-complete.md:92-ish area as a closed loop (“fix landed in commit X, see Y”).
Expected fix size: < 50 lines of code change, probably 20. The hard part was the diagnosis, which is done.
8. Dev-time gotchas
- The .quartz-cache/ dir persists across compiler rebuilds. When you run quake guard, a guard-internal step clears it (“Cleared .quartz-cache (compiler binary changed)”). But if you’re iterating without quake guard, stale cache from a prior run can mask or shift the repro. Always rm -rf .quartz-cache before each repro test.
- quake guard uses --no-cache internally (see Quakefile lines 492, 514: gen1 and gen2 compiles). So guard + fixpoint do NOT exercise the cache path. A passing guard is necessary but not sufficient; you MUST also test the cache path manually.
- Don’t run the full quake qspec from Claude Code (it hangs PTY per CLAUDE.md). Run single specs via FILE=… quake qspec_file, but note that quake qspec_file uses --no-cache, meaning it doesn’t exercise the cache either. To validate your fix, run the 4-compile native-IR reproducer manually.
- self-hosted/.fixpoint timestamps update on every quake guard; don’t let timestamp churn scare you into thinking the binary changed. Check the binary_hash line, not verified_at.
- The compiler is at 2141 functions on trunk at commit 86e0f906. If your fix adds functions, guard will update that count, which is fine.
9. Acceptance criteria
A fix for this bug is done when:
- The 4-compile reproducer produces llc-clean IR for the 4th file (grep -c "load i64, ptr %push" returns 0).
- quake guard passes; fixpoint holds.
- At least 3 existing specs compile cleanly via the direct quartz | llc | clang pipeline on a warm cache: pick postfix_try_spec, defer_spec, never_type_spec.
- No regression in total compile time for the smoke-test suite (brainfuck + style_demo + expr_eval). This is a correctness fix, not a perf fix, but doubling compile time would warrant a second look.
- Commit message [cache] Fix Tier 2 skip for unchanged modules missing cached fragments or similar. Cross-reference this handoff doc and docs/handoff/sys1-complete.md:92.
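The first criterion can be checked from a script as well as with grep. The following Python helper is a minimal sketch that mirrors grep -c by counting lines containing the bad load; count_bad_push_loads is a hypothetical name, and the /tmp/c4.ll path in the usage note is the one from the reproducer in § 2.

```python
BAD_LOAD = "load i64, ptr %push"


def count_bad_push_loads(ir_text: str) -> int:
    """Count IR lines containing the bad load, like grep -c on the same string."""
    return sum(1 for line in ir_text.splitlines() if BAD_LOAD in line)


if __name__ == "__main__":
    sample = "\n".join([
        "define i64 @socket$pipe_create() {",
        "  %v66 = load i64, ptr %push, align 8",
        "  %v67.tag = and i64 %v66, 1",
        "}",
    ])
    print(count_bad_push_loads(sample))  # 1
```

To apply it to the reproducer output, read the file first, for example count_bad_push_loads(open("/tmp/c4.ll").read()); a fixed compiler should yield 0.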
That’s it. Everything the fresh session needs is in this doc plus the three file excerpts. The hard diagnostic work is done; the fix is a targeted 20-line patch with clear acceptance criteria.
Good hunting.