Next Session — Phase 3b.next + bonus targets
Baseline: dfc29c01 (ROADMAP Phase 3 update, post c1eb4fd4 Phase 3c)
Headline target: Self-compile peak RSS 12.47 GB → ~8 GB by closing the tc_function body-walk leak. If the win is as large as I expect, that’s another 30–35% reduction on top of the 50.6% drop Phase 3a+3c already delivered.
Scope: One focused session. Primary is Phase 3b.next; everything below it is a stretch.
Prime directive: D1 (highest impact). The 3 MB-per-function leak inside tc_stmt is the single biggest known lever in the compiler. Every other open item (#10–#13 of the roadmap stack rank) waits for either compute budget or context this session won’t have time to spend.
Pre-flight (≤ 5 min)
cd /Users/mathisto/projects/quartz
# 1. Verify baseline
git log --oneline -6
# Expected top 3 commits:
# dfc29c01 ROADMAP: Phase 3 progress
# c1eb4fd4 Phase 3c: gate egraph + lint
# c4086eea Phase 3b: investigation
git status # clean
./self-hosted/bin/quake guard:check # "Fixpoint stamp valid"
./self-hosted/bin/quake smoke 2>&1 | tail -6 # 4/4 + 22/22
# 2. Capture baseline memory measurement
./self-hosted/bin/quartz --no-cache --memory-stats \
-I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
-I self-hosted/shared -I std -I tools \
self-hosted/quartz.qz > /dev/null 2>/tmp/mem_baseline.txt
grep '\[mem\]' /tmp/mem_baseline.txt
# Expected (post Phase 3c, 1985 functions):
# [mem] resolve: ~375 MB
# [mem] typecheck: ~6900 MB (+6500) ← THE TARGET DELTA
# [mem] mir: ~14400 MB (+7500)
# [mem] codegen: ~12500 MB peak (current ≈ peak)
# Wall time: ~21s
# 3. Fix-specific backup
cp self-hosted/bin/quartz self-hosted/bin/backups/quartz-pre-mem3b-next-golden
ls -la self-hosted/bin/backups/quartz-pre-mem3b-next-golden
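To compare runs mechanically rather than by eye, the [mem] lines can be parsed into per-phase numbers. A minimal Python sketch, assuming each line looks like `[mem] typecheck: 6900 MB (+6500)`; the exact format emitted by --memory-stats is inferred from the expected output above, and `parse_mem_stats` is a hypothetical helper, not something in the repo.

```python
import re

# Assumed line shape: "[mem] <phase>: <total> MB (+<delta>)"; the "(+delta)" part is optional.
MEM_LINE = re.compile(r"\[mem\]\s+(\w+):\s+~?(\d+)\s*MB(?:[^(\n]*\(\+(\d+)\))?")

def parse_mem_stats(text):
    """Return {phase: (total_mb, delta_mb or None)} for every [mem] line in text."""
    stats = {}
    for match in MEM_LINE.finditer(text):
        phase, total, delta = match.groups()
        stats[phase] = (int(total), int(delta) if delta is not None else None)
    return stats

def typecheck_delta(text):
    """Convenience accessor for the number this session is trying to shrink."""
    return parse_mem_stats(text)["typecheck"][1]
```

Feeding it the contents of /tmp/mem_baseline.txt and, later, /tmp/mem_after.txt gives two dictionaries that can be diffed phase by phase.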
PRIMARY — Phase 3b.next: localize and fix the tc_function leak
What we already know (don’t re-investigate)
The work in c4086eea (Phase 3b investigation) established these facts. Trust them:
- tc_free’s tracked Vecs only hold ~1.4 MB total (90k entries summed across all tc.<field> Vecs and nested Vec<Vec<Int>> registry tables). The historical “tc_free is a no-op” framing was correct, but for the wrong reason: there’s nothing meaningful to free in tc.<field>.
- The 6.5 GB “typecheck phase delta” lives inside tc_function’s body walk. Bisected by commenting out typecheck_walk::tc_function(tc, mod_ast_storage, func_handle) at self-hosted/quartz.qz:742. With the call disabled, the typecheck delta drops from +6552 MB to +208 MB, a 6.3 GB difference.
- Inside tc_function, the leak is in tc_stmt(tc, ast_storage, body) at self-hosted/middle/typecheck_walk.qz:3308. Disabling that line alone reproduces the same 6.3 GB drop. So the leak isn’t in scope setup, parameter binding, borrow refinement, or impl-Trait inference; it’s in the recursive statement walker.
- At ~3 MB per function × 2140 functions, the leak is per-call and proportional to function-body complexity. Whatever it is, every call to tc_stmt (recursive) is contributing.
- macOS libc malloc is a confounder, but not the spender. Even when vec_free is called, libc keeps freed pages in its arena and mem_release (which calls malloc_zone_pressure_relief) returns 0, meaning there’s nothing in the pool to release. The 6.5 GB is actively held, not “freed but pooled.” This is a real leak, not a libc artifact.
- mmap-backed Vec helpers don’t help. The __qz_vec_alloc_data / _realloc_data / _free_data infrastructure was attempted in this session and reverted. The malloc → mmap transition leaks the old buffer in libc’s pool, net-regressing peak by 500 MB. Don’t repeat that experiment unless you’re committing to a full Vec-architecture rewrite (page-aligned from day one). Out of scope this session.
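The “~3 MB per function” figure follows directly from the bisection numbers above. As a quick arithmetic check (Python used only as a calculator; `per_function_mb` is an illustrative name):

```python
def per_function_mb(delta_with_mb, delta_without_mb, function_count):
    """Average memory attributable to one tc_function call, in MB."""
    return (delta_with_mb - delta_without_mb) / function_count

# Numbers quoted above: +6552 MB with the call, +208 MB without, 2140 functions.
leak = per_function_mb(6552, 208, 2140)
print(f"{leak:.2f} MB per function")  # 2.96 MB per function
```

With the 1985-function count from the pre-flight section the same delta works out to about 3.2 MB, so “about 3 MB” holds either way.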
Bisection plan
The body of tc_stmt is at self-hosted/middle/typecheck_walk.qz:1363 and runs ~1500 lines of if node_kind == NODE_X branches. The leak is in one or more of these branches. Bisect by disabling branches and re-measuring.
The bisection harness (the same shape as the Phase 3b experiment):
# After each edit, full self-compile measurement loop:
./self-hosted/bin/quake build 2>&1 | tail -3
# --memory-stats writes to stderr: route stderr into the pipe first, then discard stdout.
./self-hosted/bin/quartz --no-cache --memory-stats \
-I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
-I self-hosted/shared -I std -I tools \
self-hosted/quartz.qz 2>&1 > /dev/null | grep '\[mem\] typecheck'
One full cycle is ~50 seconds. Budget 8–10 cycles.
Round 1 — node-kind bisection. Disable the body of each major handler in turn (replace its statement with a comment, leave the dispatch intact). Measure typecheck delta after each. The kinds to test, in priority order (most common first):
| Order | Node kind | File:line | Comment |
|---|---|---|---|
| 1 | NODE_LET | typecheck_walk.qz:1374 | Let bindings — most numerous in any function. Calls tc_expr for init, plus tc_parse_type for type annotations. Strong suspect. |
| 2 | NODE_BLOCK | typecheck_walk.qz:2832 | Each block runs each(stmts, stmt: Int -> tc_stmt(...)). The closure literal allocates per call. Recursive descent compounds. |
| 3 | NODE_EXPR_STMT | typecheck_walk.qz:2867 | Wraps an expression as a statement. Calls tc_expr once. |
| 4 | NODE_RETURN | (search) | Calls tc_expr on return value, then unifies with function return type. |
| 5 | NODE_IF | (search) | Calls tc_expr on condition + recurse into both branches. |
| 6 | NODE_FOR | (search) | Iterator inference, scope push/pop, body recursion. |
| 7 | NODE_WHILE | (search) | Condition + body recursion. |
| 8 | NODE_MATCH | (search) | Subject expression + pattern bind + arm recursion. |
Important: when you disable a handler body, the children that flow into it stop being walked. That can cascade — disabling NODE_BLOCK means NO inner statements get walked at all, which masks downstream leaks. So bisect CAREFULLY:
- Don’t disable handlers whose disable would cascade to skipping ALL recursion. NODE_BLOCK is the cascading one — leave it last.
- Start with leaf-y handlers: NODE_LET, NODE_EXPR_STMT, NODE_RETURN.
- A handler that drops the typecheck delta from 6.5 GB → 1 GB is “the spender.” It might be one big leak or several smaller ones.
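The masking effect described above is easy to see on a toy walker. The sketch below is Python, not Quartz, and the node shapes and the `walk` function are invented for illustration; it only records which node kinds a walker reaches when one handler is disabled.

```python
def walk(node, disabled, visited):
    """Visit node unless its kind is disabled; a disabled handler skips its whole subtree."""
    kind, children = node
    if kind in disabled:
        return  # handler body commented out: its children are never walked
    visited.append(kind)
    for child in children:
        walk(child, disabled, visited)

# A toy function body: a BLOCK holding a LET, an IF with a nested BLOCK and RETURN, and an EXPR_STMT.
body = ("BLOCK", [
    ("LET", []),
    ("IF", [("BLOCK", [("RETURN", [])])]),
    ("EXPR_STMT", []),
])

def reached(disabled):
    visited = []
    walk(body, disabled, visited)
    return visited

print(reached(set()))       # every node is walked
print(reached({"LET"}))     # only the LET handler is silenced
print(reached({"BLOCK"}))   # nothing is walked, so every downstream leak is hidden
```

Disabling a leaf-like kind removes one contribution; disabling the block handler removes all of them at once, which is why a large drop from NODE_BLOCK says nothing about where the allocation actually happens.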
Round 2 — within-handler bisection. Once you’ve narrowed to one or two leaking handlers, disable individual tc_expr / tc_parse_type / interpolated-string allocations / vec_new() calls inside that handler one at a time.
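One way to keep the bisection trail honest across Rounds 1 and 2 is a small ledger that records the typecheck delta measured with each handler disabled and ranks the savings. A Python sketch; the measurements in the usage lines are placeholders rather than real results, and the 1 GB “spender” threshold is taken from the rule of thumb above.

```python
BASELINE_MB = 6500   # typecheck delta with every handler enabled (baseline run)
SPENDER_MB = 1000    # a handler whose removal drops the delta to about here is "the spender"

def rank_handlers(measured):
    """measured: {handler: typecheck delta in MB with that handler disabled}.
    Returns [(handler, saved_mb, is_spender)] sorted by savings, largest first."""
    rows = []
    for handler, delta_mb in measured.items():
        rows.append((handler, BASELINE_MB - delta_mb, delta_mb <= SPENDER_MB))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Placeholder numbers, purely to show the shape of the output.
for handler, saved, spender in rank_handlers({"NODE_LET": 900, "NODE_RETURN": 6400}):
    print(f"{handler:16} saved {saved:5d} MB  spender={spender}")
```

The sorted rows can be pasted straight into the commit body as the bisection trail.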
Round 3 — fix. Apply the fix in place. Likely shapes:
- Cache tc_parse_type results in a Map<String, Int> keyed by annotation text. The function is called many times with duplicate strings ("Int", "Vec<Int>", "Map<String, Int>"); each call allocates a fresh substring Vec and a tower of recursive tc_parse_type calls. The cache turns every repeat call into an O(1) lookup and skips all the substring allocation. Top candidate.
- Eliminate per-NODE_BLOCK closure allocations. Replace each(stmts, stmt: Int -> tc_stmt(tc, ast_storage, stmt)) with a top-level helper function that takes tc/ast_storage as parameters rather than captures. Or use a plain for stmt in stmts; tc_stmt(...) loop.
- Free liveness::g_func_infos after tc_function consumes the per-function info. It’s a write-once read-once side table; nothing reads it after tc_function returns. Currently held forever.
- Stop allocating fresh strings for the tc_tv_fresh_with_origin desc. The interpolated "let #{var_name}" allocates a new String per fresh type variable. Either intern via handle, or pass an origin enum tag instead of a string.
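The first shape, caching tc_parse_type by annotation text, is plain memoization. The sketch below is Python rather than Quartz, and `TypeCache`, `parse_type_uncached`, and the integer type ids are stand-ins invented for illustration; the real function’s signature is not shown in this document. It also assumes a parsed type depends only on the annotation text. If resolution depends on scope (generic parameters, imports), the key would need to include that context.

```python
import itertools

class TypeCache:
    """Memoize an expensive annotation parser, keyed by the annotation string."""

    def __init__(self, parse_uncached):
        self._parse = parse_uncached
        self._ids = {}      # annotation text -> type id
        self.misses = 0     # how many times the slow path actually ran

    def parse(self, annotation):
        type_id = self._ids.get(annotation)
        if type_id is None:
            self.misses += 1
            type_id = self._parse(annotation)   # allocating slow path: once per distinct string
            self._ids[annotation] = type_id
        return type_id

_fresh_ids = itertools.count(1)

def parse_type_uncached(annotation):
    # Stand-in for the real parser: hands out a new id on every call.
    return next(_fresh_ids)

cache = TypeCache(parse_type_uncached)
for text in ["Int", "Vec<Int>", "Int", "Map<String, Int>", "Vec<Int>", "Int"]:
    cache.parse(text)
print(cache.misses)  # 3 distinct annotations, so the slow path ran 3 times
```

The `misses` counter doubles as a cheap way to confirm the hit rate is as high as the duplicate-string hypothesis predicts before measuring RSS.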
Verification (at each fix attempt)
# 1. Full self-compile measurement
./self-hosted/bin/quartz --no-cache --memory-stats \
-I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
-I self-hosted/shared -I std -I tools \
self-hosted/quartz.qz > /dev/null 2>/tmp/mem_after.txt
grep '\[mem\]' /tmp/mem_after.txt
# 2. Quake guard (mandatory before commit)
./self-hosted/bin/quake guard 2>&1 | tail -8
# Expected: "Guard PASSED — fixpoint verified (1985 ± 5 functions)"
# 3. Smoke tests
./self-hosted/bin/quake smoke 2>&1 | tail -6
# 4. Regression specs (run in a single batch)
for spec in vec_element_type_spec builtin_arity_spec expand_node_audit_spec \
comparisons_spec const_generics_spec generic_inference_spec \
generic_field_access_spec generic_struct_init_spec; do
echo "=== $spec ==="
FILE=spec/qspec/${spec}.qz ./self-hosted/bin/quake qspec_file 2>&1 | tail -3
done
Success criteria
Minimum viable: Self-compile peak RSS ≤ 10 GB (was 12.47 GB; -20%). At least one regression spec for the fix.
Target: Self-compile peak RSS ≤ 8 GB (was 12.47 GB; -36%). Wall time ≤ 15s (was 21s; -29%). Regression spec.
Stretch: Self-compile peak RSS ≤ 6 GB (was 12.47 GB; -52%) IF the fix turns out to be a single-cause tc_parse_type cache and the savings cascade through tc_function’s call graph.
Failure mode: Bisection takes > 6 cycles without narrowing. If that happens, abandon the round and try a completely different angle — e.g. instrument mem_current_rss snapshots inside tc_function itself rather than disabling code. Or sample function names at the moment of biggest leak (largest functions like parser$ps_parse_postfix may dominate).
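The snapshot fallback amounts to recording RSS before and after each tc_function call and ranking functions by growth. A Python sketch of the analysis side only, assuming the compiler has been instrumented to emit one `name before_mb after_mb` line per function; that log format is hypothetical, and only mem_current_rss itself is named above.

```python
def top_growers(log_text, limit=5):
    """Parse 'name before_mb after_mb' lines; return the functions with the largest RSS growth."""
    growth = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, before_mb, after_mb = parts[0], float(parts[1]), float(parts[2])
        growth.append((after_mb - before_mb, name))
    growth.sort(reverse=True)
    return [(name, delta) for delta, name in growth[:limit]]

# Made-up log lines, only to show the shape of the input.
sample = """\
parser$ps_parse_postfix 400.0 431.5
main 431.5 431.9
tc_stmt 431.9 440.0
"""
for name, delta in top_growers(sample, limit=2):
    print(f"{name}: +{delta:.1f} MB")
```

If a handful of large functions account for most of the growth, bisecting on just those functions is cheaper than another round of handler disabling.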
Commit shape
Single commit. Title: Phase 3b.next: <one-line root-cause summary>. Body: bisection trail (which handler/which line), root cause, fix, before/after --memory-stats. End with a Verified block listing guard + smoke + regression results.
STRETCH 1 — Resolver full scope tracking (#11)
Trigger: Only if Phase 3b.next lands cleanly with budget remaining (~30–45 quartz minutes).
What it is
Eliminates the UFCS module-name collision for local variables. The bug:
import value
def main(): Int
  var value = 42 # local variable shadows the module name
  return value.to_s() # currently breaks: resolver rewrites to value::to_s()
end
The fix from Apr 7 patched the parameter case (resolver checks param names before module rewrite). The general case — any local binding in scope — is still broken. The fix is straightforward: extend the resolver’s scope tracking to include all bindings, not just parameters.
Files
- self-hosted/resolver.qz: UFCS rewrite logic, parameter-name check
- spec/qspec/ufcs_module_collision_spec.qz: existing 3 tests for the param case; add 3 for the local case
Approach
- Find the resolver’s value::to_s() rewrite site (probably in the AST walker that handles NODE_FIELD_ACCESS or NODE_CALL with module-prefix).
- Locate the parameter-shadowing check added in the Apr 7 fix.
- Extend it to track all NODE_LET bindings in the current function scope (not just params).
- Add 3 regression tests: local var with same name as module, local var assigned multiple times, local var passed to function.
Verification
Run the existing ufcs_module_collision_spec.qz plus the 3 new tests. quake guard + quake smoke as always.
Commit shape
Resolver: track local-var shadowing across full scope (closes #11). Cite the Apr 7 fix as precedent.
STRETCH 2 — Roadmap cleanup
Trigger: ~5 minutes at the end of session, regardless of whether stretches landed.
Item 1: Mark qz-http as DONE in ROADMAP.md
The roadmap entry at docs/ROADMAP.md:186 still says “✅ DESIGN LOCKED IN” for qz-http. The implementation has been done for months:
- std/net/http_server.qz: 3821 LOC, full HTTP/1.1 router + middleware
- std/net/http2.qz: 765 LOC, HTTP/2 with HPACK
- examples/qz-http/main.qz: uses the full router/middleware/route_param API
- HTTP/2 server deployed on VPS (mattkelly.io)
Update the entry header to ✅ SHIPPED — deployed on mattkelly.io. Add a one-line note pointing at std/net/http_server.qz and examples/qz-http/main.qz. Move the “demo platform” check off the blocking list.
Item 2: Update item #19 (Compiler memory optimization) with whatever Phase 3b.next achieved
If you shipped a fix this session, update item #19 in docs/ROADMAP.md:84 with the new peak RSS number and a one-line cite of the commit.
Item 3: If Phase 3b.next did NOT ship a fix, expand the handoff
Update docs/handoff/next-session-compiler-memory-phase3b.md with whatever new bisection data you collected. Don’t lose the work.
Commit shape
ROADMAP: mark qz-http shipped + Phase 3b.next progress (or just qz-http shipped if no Phase 3b.next progress).
What is NOT in this session
These were on the broader stack rank but explicitly out of scope here. Don’t drift into them.
- Scheduler park/wake refactor (#10): has its own full handoff at docs/HANDOFF_PRIORITY_SPRINT.md. Substantial scheduler hot-path work. Own session.
- Async Mutex/RwLock (#15): blocked by scheduler refactor.
- Move semantics S2.5 holes: borrow checker work, separate domain.
- PSQ-2, PSQ-6, send/recv shadowing: small dogfooding fixes, can wait until after the memory work pays its dividend.
- Package manager (#21): explicitly deferred per user direction.
- Stdlib narrative guide / launch docs: explicitly deferred per user direction.
If Phase 3b.next blows up (more than 6 bisection cycles without a result), don’t drift into the above. Document the new findings in docs/handoff/next-session-compiler-memory-phase3b.md, commit the findings doc, and end the session. The honest report is more valuable than a forced shallow fix.
Prime directives reminder (v2)
- D1 (highest impact): This session’s only justification is closing the per-function leak. Don’t bikeshed on the bisection technique; pick a method, run it, narrow the search space. If the first cycle is inconclusive, switch methods.
- D2 (research first): rustc’s typeck has a per-fn LocalDefIdMap that’s dropped after each function. Go’s types2 releases per-package state. Both languages explicitly avoid the per-function accumulation pattern this session is fixing. Worth 5 minutes of reading rustc’s typeck/src/check/wfcheck.rs if you hit a wall.
- D3 (pragmatism ≠ cowardice): Caching tc_parse_type is pragmatic. Lowering the entire Vec allocator to mmap is cowardice (the wrong fix at the wrong layer, already proven to net-regress).
- D4 (work spans sessions): If 6 bisection cycles aren’t enough, don’t force it. Hand off cleanly.
- D5 (report reality): The minimum viable target is 10 GB. The aspirational is 8 GB. Don’t ship a fix and claim 8 GB if you measured 10. Don’t ship a partial fix that “should work” without verification.
- D6 (holes get filled or filed): Any side discoveries (other leaks, parser bugs, codegen oddities) get a one-line entry in the next handoff or in the docs/ROADMAP.md open-issues table.
- D8 (binary discipline): quake guard before every commit. Fix-specific backup before touching self-hosted/*.qz. Don’t skip smoke.
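The pattern D2 points at, per-function scratch state that is dropped as soon as the function has been checked, looks roughly like this. The sketch is Python and every name in it is invented; it makes no claim about how rustc or Go’s types2 implement the idea.

```python
class FunctionState:
    """Scratch data that is only meaningful while one function is being checked."""

    def __init__(self):
        self.type_vars = []
        self.locals = {}

def check_function(name, body, results):
    state = FunctionState()               # fresh per function
    for stmt in body:
        state.type_vars.append(stmt)      # stand-in for per-statement allocations
    results[name] = len(state.type_vars)  # keep only the small summary
    # `state` goes out of scope here, so nothing per-function accumulates across calls

def check_all(functions):
    results = {}
    for name, body in functions.items():
        check_function(name, body, results)
    return results

print(check_all({"main": ["let", "return"], "helper": ["return"]}))
```

The contrast with the current behavior is that whatever tc_stmt allocates today appears to outlive the call, which is the accumulation this session is trying to eliminate.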
Session-end checklist
# What must be true before you sign off, regardless of what landed:
./self-hosted/bin/quake guard:check # stamp valid
./self-hosted/bin/quake smoke # 4/4 + 22/22
git log --oneline -5 # commits land at the top
git status # working tree clean
# If you shipped a Phase 3b.next fix:
./self-hosted/bin/quartz --no-cache --memory-stats \
-I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
-I self-hosted/shared -I std -I tools \
self-hosted/quartz.qz > /dev/null 2>/tmp/mem_final.txt
grep '\[mem\]' /tmp/mem_final.txt
# Confirm the peak RSS number you put in the commit message matches reality.
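That last confirmation can be made mechanical. A Python sketch using the thresholds from the success criteria; `tier`, `reduction_pct`, and `claim_matches` are illustrative helpers, the 12.47 GB baseline is the figure quoted at the top of this document, and the 0.1 GB tolerance is an assumption.

```python
BASELINE_GB = 12.47

def reduction_pct(peak_gb, baseline_gb=BASELINE_GB):
    """Percentage drop in peak RSS relative to the baseline."""
    return (baseline_gb - peak_gb) / baseline_gb * 100

def tier(peak_gb):
    """Map a measured peak RSS to the success tier it actually earns."""
    if peak_gb <= 6:
        return "stretch"
    if peak_gb <= 8:
        return "target"
    if peak_gb <= 10:
        return "minimum viable"
    return "not met"

def claim_matches(claimed_gb, measured_gb, tolerance_gb=0.1):
    """True when the number in the commit message is within tolerance of the measurement."""
    return abs(claimed_gb - measured_gb) <= tolerance_gb

print(tier(10.0), round(reduction_pct(10.0)))   # minimum viable 20
print(claim_matches(8.0, 10.0))                 # False: do not claim 8 GB after measuring 10
```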
Pointers (for the bisection)
- tc_stmt body: self-hosted/middle/typecheck_walk.qz:1363
- tc_function: self-hosted/middle/typecheck_walk.qz:3198
- tc_expr (top-level dispatcher): search ^def tc_expr\b in typecheck_walk.qz
- tc_parse_type: self-hosted/middle/typecheck.qz:929
- tc_tv_fresh_with_origin: self-hosted/middle/typecheck_util.qz:1581
- tc_tv_reset: self-hosted/middle/typecheck_util.qz:1851
- liveness::g_func_handles / g_func_infos: self-hosted/middle/liveness.qz:94-96
- liveness::analyze_all: called from self-hosted/quartz.qz:580
- Per-function loop in compile(): self-hosted/quartz.qz:682 (the for i in 0..func_count loop)
- Phase 3b investigation handoff (PRIOR, READ THIS FIRST): docs/handoff/next-session-compiler-memory-phase3b.md
- Phase 3b commit: c4086eea
- Phase 3c commit: c1eb4fd4
- Phase 3a commit: 889a758d