Phase 3b Investigation: tc_free is a no-op for a different reason than expected
Status: Investigation complete. No code-path fix shipped this session — the original Phase 3b plan (“free TC state after typecheck, free AST after MIR”) turns out to be aimed at the wrong target. The actual leak is elsewhere, and fixing it requires a different approach (next session).
What the original handoff assumed:
- tc_free is a no-op because libc holds freed pages in its arena.
- Adding `ast_free` after MIR lowering will drop ~7 GB of AST data.
- Together these cut peak RSS by 5–10 GB.
What we actually found:
tc_free’s coverage is only a few MB
Instrumented tc_dump_sizes(tc) to sum the entry counts of every Vec field
on TypecheckState (including nested Vec<Vec<Int>> registry tables).
Result on a full self-compile:
[mem] tc Vec entries (sum of sizes): 18276
[mem] tc nested Vec<Vec> entries: 72001
[mem] tc registry: structs=47 funcs=2140 traits=6 ptypes=106
[mem] tc interner.strings: 3042
[mem] global interner.strings: 32964
Total tracked entries: ~90k. At ~16 bytes per entry that is ~1.4 MB. The “typecheck adds 6.5 GB” delta we have been chasing is NOT IN tc_free’s covered Vecs. Even fixing every page-allocation issue under tc_free would recover at most a few MB.
The 6.3 GB lives in tc_function’s body walk
Bisected with a definitive test: commenting out the call to
typecheck_walk::tc_function in the per-function loop in
self-hosted/quartz.qz (line 742) drops the typecheck phase delta from
+6552 MB to +208 MB, a 6.3 GB difference.
Then commenting out only the tc_stmt(tc, ast_storage, body) call inside
tc_function reproduces the same 6.3 GB drop. So the leak lives inside
the tc_stmt body walk (or its callee tc_expr), not in scope/registry
bookkeeping or borrow-summary refinement.
Across 2140 functions, that is ~3 MB per function. None of it ends up in tc-tracked Vecs by the time the loop exits, which means it is in:
- Global state mutated during the walk. `liveness::g_func_handles`/`g_func_infos` (per-function liveness info structs, never freed) — partial candidate but probably <1 GB total.
- AST mutations. `ast_set_str1`/`ast_set_str2` intern strings into the global interner, but the interner only holds 33k strings (~1 MB).
- Per-walk closure allocations. `each(stmts, stmt: Int -> tc_stmt(...))` in the NODE_BLOCK handler allocates a fresh closure environment on every block walked. ~32 bytes each, but blocks are dense — bounded above ~10 MB total though, not 6 GB.
- Substring/type-name allocations in `tc_parse_type`. Each call to `tc_parse_type` allocates `Vec`s and substring slices that are never freed. The per-call leak is small (~tens of bytes), but tc_parse_type is called O(node-count) times during the walk. Plausible candidate.
- `tc_tv_fresh_with_origin` desc strings. Each fresh type variable created during inference allocates a new String for its origin description (e.g. `"let x"`). `tc_tv_reset` pops the entries between functions but never frees the underlying String objects.
The real culprit is almost certainly multiple of the above accumulating
silently. The investigation needs to keep bisecting inside tc_stmt /
tc_expr to identify the dominant allocator.
macOS libc hides the problem
Even when we DO free things via vec_free, libc’s malloc keeps the pages in
its arena rather than returning them to the OS. Verified in C:
void *p = malloc(400 * 1024 * 1024); // 400 MB
memset(p, 0xab, 400 * 1024 * 1024); // RSS = 401 MB
free(p); // RSS = 401 MB (still!)
malloc_zone_pressure_relief(NULL, 0) returns 0 — there is nothing pooled
to release. The pages are tracked in libc’s free lists and not handed back
until the process exits.
madvise(MADV_FREE) and madvise(MADV_FREE_REUSABLE) on malloc-allocated
memory also do not drop RSS — verified empirically. The only way to drop
RSS on macOS for a freed block is to allocate via mmap directly and free
via munmap. mimalloc linked via DYLD_INSERT_LIBRARIES has the same
problem.
Why mmap-backed Vec helpers were attempted and reverted
We built __qz_vec_alloc_data / __qz_vec_realloc_data / __qz_vec_free_data
runtime helpers that route allocations >= 64 KB through mmap (with a hidden
16-byte size prefix) and free them via munmap. Verified to work for large
single-vec allocate-and-free: malloc 400 MB then free leaves RSS at 401;
the mmap-backed equivalent leaves RSS at 1.
But the helpers had a transition leak: when a Vec grows past the 64 KB
threshold via vec_grow, the OLD malloc-backed buffer is freed via libc
(stays in libc’s pool, never released to OS), and a NEW mmap-backed buffer
is allocated. Net peak excursion: +500 MB on the self-compile, with no
corresponding drop because the malloc-pool half is unreclaimable. Reverted.
The right fix would be to make Vec’s data buffer mmap-backed FROM THE START for any vec that might grow large: lower the threshold to ~one page, which eliminates transition leaks entirely at the cost of page-padding overhead per small vec. This is a Vec architecture change and was out of scope for this investigation.
What shipped this session
- `mem_release` intrinsic (`mem_release(): Int`)
  - macOS: calls `malloc_zone_pressure_relief(NULL, 0)`.
  - Linux: calls `malloc_trim(0)`.
  - WASI: returns 0.
  - Returns the number of bytes the runtime claims to have released. (On macOS this routinely returns 0 because the pool is empty; on Linux `malloc_trim` actually does work for the small-block heap.)
  - Useful to call after batch-free phases. Currently a no-op on macOS, useful on Linux, designed for future `__qz_vec_*_data` integration.
- This handoff document.
No changes to vec/sb/string allocators. No changes to the AST. No changes to tc_free’s coverage. The Phase 3b mmap-helper experiment was reverted to keep the self-compile baseline clean (no regression).
Next session — Phase 3b.next
Goal: localize the 3 MB-per-function leak inside tc_stmt / tc_expr
and fix it.
Bisection plan
Inside tc_stmt (typecheck_walk.qz line 1363 onwards):
- First narrow by node kind. tc_stmt has dozens of `elsif node_kind == NODE_X` branches. Pick the most common kinds — NODE_LET, NODE_EXPR_STMT, NODE_BLOCK, NODE_RETURN, NODE_IF, NODE_FOR, NODE_WHILE — and disable them one at a time. Whichever removal drops the typecheck phase delta the most is the leak source.
- Inside the leaking handler, look for:
  - `vec_new()` calls that don’t have a paired `vec_free`.
  - String concatenation via `#{...}` interpolation in hot paths.
  - `ast_set_*` calls that intern into the global interner.
  - Calls to `tc_parse_type` (which allocates substring Vecs/slices).
  - Closures created via lambda literals.
- Verify with `tc_dump_sizes`. After the fix, run with `--memory-stats` and check that the typecheck phase delta drops by the expected amount.
Likely actual fixes
- Cache type parses. `tc_parse_type` is called O(node-count) times with many duplicate annotation strings. A `Map<String, Int>` cache from annotation → resolved type ID would make the lookup O(1) and eliminate allocation.
- Free liveness info per function after typecheck consumes it. The `g_func_infos` global vec accumulates per-function tables; each one is read once during `tc_function` and never again.
- Reuse closures. The block-walking pattern `each(stmts, stmt -> tc_stmt(tc, ast_storage, stmt))` should not allocate a fresh closure per call. Either use a top-level helper function (no captures) or pre-create one closure and reuse it.
What success looks like
- Self-compile peak RSS: 15 GB → ~10 GB (~33% reduction from current).
- typecheck phase delta: 6.5 GB → <1 GB (most of the 6.3 GB recovered).
- mir phase delta: likely also drops because mir works on top of the typecheck baseline.
What is NOT in scope for the next session
- Vec mmap-backing rewrite. Out of scope until we hit a leak that is purely in libc’s pool (current bottleneck is upstream of that).
- ast_free. The handoff originally targeted ast_free but the AST is at most ~370 MB based on the resolve-phase total — even fully freeing it only saves ~370 MB.
- Phase 3c (@cfg gating). Tracked separately. Worth doing if Phase 3b.next ships.
Pointers
- `self-hosted/quartz.qz` — main pipeline. Line 682 `for i in 0..func_count` is the per-function tc_function loop. `mem_report("typecheck")` at line 759.
- `self-hosted/middle/typecheck_walk.qz` — `tc_function` (line 3198), `tc_stmt` (line 1363), `tc_expr` (search file).
- `self-hosted/middle/typecheck.qz` — `tc_parse_type` (line 929).
- `self-hosted/middle/typecheck_util.qz` — `tc_tv_reset`, `tc_tv_fresh_with_origin`, `tc_free`.
- `self-hosted/middle/liveness.qz` — `g_func_handles`/`g_func_infos`.
- `self-hosted/backend/codegen_runtime.qz` — `__qz_mem_release` at the appropriate spot.
Prime directives this session
- D1 (highest impact): the original Phase 3b plan would not have moved the needle — the data is not in tc_free’s covered Vecs. Discovering this is load-bearing; pursuing the wrong fix would have wasted the next session too. The bisection cost was justified.
- D2 (research first): validated the libc pool / madvise / mimalloc / mmap behavior empirically before committing to any design.
- D5 (report reality): shipping the investigation honestly without a fix, rather than shipping a “looks like it does something” mmap helper that net-regressed self-compile peak by 500 MB.
- D6 (holes get filled or filed): the leak is filed here with bisection data and a continuation plan.
Phase 3b.next Session — CLOSED (Apr 15, 2026)
Status: ✅ FIX SHIPPED. Root cause found via multi-level bisection and fixed. Typecheck phase 5401 MB → 737 MB (-86%). Peak RSS 12.47 GB → 7.81 GB (-37%). Wall time 21.0s → 18.3s (-13%). Session target of 8 GB beaten.
Root cause (for the next handoff reader): 9 sites in
typecheck_registry.qz (tc_lookup_function_return, ..._return_annotation,
..._params, ..._param_fn_ann, tc_find_matching_overload, tc_lookup_struct,
tc_lookup_enum, tc_lookup_trait, tc_lookup_function_index) implemented the
suffix-fallback scan using str_byte_slice(name, len-suf, len).eq(suffix).
That pattern allocates a fresh substring for every candidate in the scan —
roughly 2100 substrings (one per registered function) per fallback call.
The fallback fires whenever a function is referenced without its module prefix, which is the normal case for module-internal calls. In the self-hosted compiler, about 10% of NODE_CALL typechecks hit the fallback (the other 90% resolve through the primary intern-id match). That’s 67k calls × 10% × 2100 substring allocs = 14 million allocations × ~30 bytes each = ~420 MB per typecheck run, amplified by libc’s inability to return pages across repeated calls.
Fix: replace all 9 sites with name.ends_with(suffix). The runtime’s
qz_str_ends_with uses memcmp — zero allocations. The name_len >= suffix_len
guard is left in place (harmless dead code; removing it is a separate style
pass). Confirmed by measurement: typecheck
delta dropped from +5401 MB to +737 MB in a single build-measure cycle.
Bisection trail (for future similar problems)
This took 4 levels of bisection to find. Worth documenting because the techniques generalize.
Level 1 — per-function RSS delta. Instrumented the tc_function loop
in quartz.qz to print RSS delta per function > 8 MB. Found 153 functions
leaking, distributed roughly proportional to body size. Ruled out “one hot
function is leaking” — the leak was diffuse.
Level 2 — per-node-kind delta inside tc_stmt and tc_expr. Wrapped
tc_stmt and tc_expr with entry/exit RSS snapshots, bucketing totals by
NODE_X kind. Found NODE_CALL dominant at 5.76 GB / 69k calls (~87 KB/call).
NODE_IDENT was 15 bytes/call — ruling out borrow/scope/move checks as the
culprit. Localized the leak to tc_expr_call.
Level 3 — internal checkpoints in tc_expr_call. Added 7 RSS checkpoints
through tc_expr_call’s ~1800-line body. Found:
- UFCS rewrite block: 3 KB/call (2.9%) — small
- Function lookup / open UFCS / arity resolution: 26 KB/call (32%) — big
- Main arg walk loop: 11 KB/call (13%)
- Borrow/lambda/trait/return/container/linear/move section: 42 KB/call (51%) — big
Gotcha: checkpoints declared with var X = 0 inside conditional blocks get
reassigned with X = mem_current_rss() only if the branch runs. For branches
that don’t run, the uninitialized delta yields astronomic bogus numbers.
Declare all checkpoint vars at function top, reassign (not re-declare) inside
branches. Takes one false-positive cycle to learn.
Level 4 — drill into the arity-check section. Added two more checkpoints
splitting Open UFCS (~1 MB, negligible) vs arity resolution. Then disabled
tc_function_param_count and tc_function_required_count entirely — typecheck
dropped from 5334 MB to 4883 MB (-451 MB). That gave a concrete target: the
arity lookups were leaking 6.6 KB/call. Examined their source →
tc_lookup_function_params → suffix fallback → str_byte_slice(...).eq(suffix)
→ found the allocation.
What shipped this session
- 9-site `ends_with` fix in `typecheck_registry.qz` (the main fix, -4.6 GB).
- `tc.parse_type_cache` field — negligible immediate win (~12 MB) but the cache is in place for when tc_parse_type becomes the next hot path.
- First-arg cache in `tc_expr_call` — on-demand `_fa_cache_type`/`_fa_cache_computed` locals that thread the first-arg type through the UFCS resolution cascade. Saves ~90 MB by eliminating redundant `tc_expr` walks of `args[0]`. Invalidated after `tc_try_resolve_map_key_value_types`, which mutates the binding type and thus the first-arg type.
- Updated ROADMAP #19 with the new numbers and root-cause cite.
- This handoff closing out the investigation.
What’s left on the table (for a future session if needed to hit 6 GB)
From the Level 3 bisection data:
- `tc_expr_call` post-arg-walk section: still leaks 42 KB/call (~2.8 GB of the remaining leak). Candidates: per-call `tc_lookup_borrow_summary`, lambda arity validation loops, trait bound validation, return-type lookup, container type propagation. Each may have smaller but cumulative allocations.
- MIR phase still leaks 6.45 GB. Not touched this session — Phase 3b.next was scoped to typecheck. The MIR lowering pass has its own allocation patterns; a similar bisection there would be the next lever.
- Linux path not re-measured. This session was macOS-only. Linux is presumably also much better given the same code path.
Stretch: drilling tc_expr_call further could push peak RSS to ~6 GB. MIR phase optimization is the next big lever after that.
Phase 3b.next Session — PRIOR findings (Apr 15, 2026)
Status: Primary fix target missed. tc_parse_type cache landed but only saved ~12 MB (0.2%) — the tc_parse_type hypothesis was wrong. Deeper bisection narrowed the leak to NODE_CALL (5.7 GB of the 5.4 GB net typecheck delta, ~87 KB average allocation per NODE_CALL through tc_expr_call). Next session should drill into tc_expr_call’s sub-steps rather than guess from the outside.
What this session measured — trustworthy data
1. Per-function distribution (size-proportional)
Instrumented the per-function loop in quartz.qz:685 to print RSS delta for
every tc_function call exceeding 8 MB. Top leakers on a full self-compile:
[mem-fn] +136 MB: mir_lower$mir_lower_expr
[mem-fn] +134 MB: lexer$lexer_tokenize
[mem-fn] +114 MB: mir_lower_expr_handlers$mir_lower_call
[mem-fn] +93 MB: typecheck_expr_handlers$tc_expr_call
[mem-fn] +87 MB: mir_lower$mir_lower_actor_poll
[mem-fn] +73 MB: typecheck_walk$tc_stmt
[mem-fn] +68 MB: compile
[mem-fn] +67 MB: mir_lower$mir_lower_all
[mem-fn] +63 MB: typecheck$tc_parse_type
[mem-fn] +59 MB: typecheck_util$tc_free
[mem-fn] +56 MB: mir_lower_expr_handlers$mir_lower_concurrency
...153 total functions leaked >8 MB each...
The leak is size-proportional to the function body being checked. Big functions (1000+ LOC) leak 100+ MB each during their tc_function call. Small functions leak <8 MB. The cumulative distribution matches the 5.4 GB net typecheck delta.
2. Per-node-kind distribution inside tc_stmt
Instrumented tc_stmt with a per-kind RSS delta bucket:
| Node kind | Count | RSS delta | Per-stmt avg |
|---|---|---|---|
| NODE_BLOCK (32) | 13,132 | +14,041 MB | ~1 MB/stmt (composite — includes descent) |
| NODE_IF (23) | 8,986 | +9,749 MB | ~1 MB/stmt (composite) |
| NODE_EXPR_STMT (29) | 50,091 | +2,372 MB | ~47 KB/stmt (leaf) |
| NODE_LET (25) | 15,117 | +1,411 MB | ~93 KB/stmt (leaf) |
| NODE_FOR (44) | 1,008 | +1,457 MB | ~1.4 MB/stmt |
| NODE_WHILE (24) | 418 | +795 MB | ~2 MB/stmt |
| NODE_RETURN (22) | 5,065 | +211 MB | ~40 KB/stmt |
The composite statements’ delta includes their recursive descent cost. The true leaf contributors are NODE_EXPR_STMT, NODE_LET, and NODE_RETURN, each leaking ~40–95 KB per statement.
3. Per-node-kind distribution inside tc_expr
Instrumented tc_expr the same way. The signal is overwhelming:
| Expr kind | Count | RSS delta | Per-call avg |
|---|---|---|---|
| NODE_CALL (9) | 69,186 | +5,764 MB | ~87 KB/call |
| NODE_INTERP_STRING (43) | 1,088 | +450 MB | ~413 KB/call (descent cost) |
| NODE_BINARY (8) | 14,117 | +731 MB | ~52 KB/call |
| NODE_IDENT (6) | 209,691 | +3 MB | 15 bytes/call (trivial!) |
| NODE_UNARY (7) | 349 | +32 MB | ~93 KB/call |
| NODE_ARRAY (10) | 184 | +11 MB | ~60 KB/call |
| NODE_INDEX (13) | 4,369 | +2 MB | ~480 bytes/call |
| NODE_STRUCT_INIT (14) | 67 | +10 MB | ~150 KB/call |
| NODE_FIELD_ACCESS (15) | 9,920 | +9 MB | ~900 bytes/call |
NODE_CALL is the dominant allocator at 5.76 GB (more than the 5.4 GB net delta — the over-count is libc oscillation: RSS grows and shrinks across the tc_expr calls, but the NET held memory is 5.4 GB). NODE_IDENT is surprisingly cheap at 15 bytes per call, which rules out the borrow-check and scope-lookup hot path. NODE_FIELD_ACCESS is also cheap.
Conclusion: the leak is concentrated in tc_expr_call
(typecheck_expr_handlers.qz:1319) — NOT distributed across tc_expr’s
dispatcher branches. Next session’s bisection should start there.
4. Hypotheses that were tested and failed
- tc_parse_type cache (wrapper around `tc_parse_type_impl` keyed on annotation string, skipping forall/impl). Saved ~12 MB of 5,401 MB (0.2%). Confirms tc_parse_type is not called often enough to matter. The cache is KEPT in the commit as a small but measurable win and future groundwork — when tc_parse_type does become a hot path (e.g. after tc_expr_call is fixed and its mass gets out of the way), the cache will already be there.
- Replacing `each(stmts, stmt -> tc_stmt(...))` with plain for loops in NODE_BLOCK and NODE_INTERP_STRING. Zero change in typecheck delta. Closure allocation is not a contributor at this scale.
- NODE_IDENT safety-check disable (would have tested per-ident borrow/linear/move/partial-move checks). NOT tested this session because the 15 bytes/call signal already ruled out NODE_IDENT before I got to it.
What SHIPPED this session
- `tc_parse_type` cache — `tc.parse_type_cache: Map<String, Int>`, populated for all scope-independent annotations (skipping `forall` and `impl`). ~12 MB savings measured. Kept because it’s a cheap, correct, reusable improvement that will pay off more once the NODE_CALL leak is fixed and tc_parse_type becomes relatively more important.
- This updated handoff. Full bisection data for next session.
Not shipped: a fix for the actual leak. Per the handoff’s failure-mode guidance (>6 bisection cycles without conclusive narrowing), continuing to guess at the fix this session would have been cowardice masquerading as pragmatism.
Next session — Phase 3b.next.2 — drill into tc_expr_call
Primary target
typecheck_expr_handlers.qz:1319 — the tc_expr_call function. ~600 lines,
handles all function call typing including UFCS rewrite, trait method
resolution, arity validation, argument type checking, named args, default
values, type parameter inference.
Suggested bisection order
- First pass — disable whole sub-blocks and measure. Add `return type_constants::TYPE_INT` at strategic points and measure typecheck delta. In order:
  - (a) Return TYPE_INT right after `var func_name = ast::ast_get_str1(...)`. Skips everything. This establishes the baseline “how much do the children of a NODE_CALL contribute vs. tc_expr_call itself” — i.e. separates descent cost from tc_expr_call’s OWN cost.
  - (b) Keep arg walking but disable UFCS rewrite (lines 1715–1960+). If delta drops significantly, UFCS rewrite is the leak. Strong candidate: UFCS calls `tc_try_resolve_vec_element_type` and `tc_try_resolve_map_key_value_types`, which do recursive re-walks of the first arg.
  - (c) Keep UFCS rewrite but disable `tc_try_resolve_*`. These re-walk args redundantly.
  - (d) Disable the arity-check loop and the type parameter inference blocks.
- Second pass — within the leaking sub-block, look for:
  - Allocations proportional to argument count (most function calls have 1–5 args, so a per-arg allocation × 69k calls × 3 args avg = 207k allocations).
  - `str_byte_slice` calls that materialize substrings.
  - `vec_new()` calls that don’t have paired `vec_free`.
  - Interpolated error strings `"#{...}"` in the hot path (unlikely — errors are rare).
  - `ast_set_str1`/`str2` mutations that intern new strings.
- Third pass — fix in place. Likely shapes:
  - Deduplicate arg walks. `tc_try_resolve_vec_element_type` etc. walk `first_arg`, which was ALREADY walked at line 1721. Caching the per-node type via a `Map<AstNodeId, Int>` or annotating the AST with the resolved type would eliminate the second walk.
  - Build the mangled name lazily: tc_mangle always allocates `"#{struct_name}$#{func_name}"` even when the lookup will fail. Cache with a `Map<(String, String), String>` or intern it.
  - Stop re-checking arg types across the UFCS rewrite. The rewrite changes `func_name` but the arg types don’t change.
Pre-flight checklist
cd /Users/mathisto/projects/quartz
git log --oneline -6 # verify Phase 3b.next progress commit is on top
./self-hosted/bin/quake guard:check
./self-hosted/bin/quake smoke 2>&1 | tail -6
# Capture baseline (with tc_parse_type cache):
./self-hosted/bin/quartz --no-cache --memory-stats \
-I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
-I self-hosted/shared -I std -I tools \
self-hosted/quartz.qz > /dev/null 2>/tmp/mem_baseline.txt
grep '\[mem\]' /tmp/mem_baseline.txt
# Expected (post-cache): typecheck ~5389 MB, peak ~12438 MB
# Fix-specific backup:
cp self-hosted/bin/quartz self-hosted/bin/backups/quartz-pre-mem3b-next2-golden
Instrumentation you can reuse
These are deleted as part of the Phase 3b.next commit, but here is the shape for quick re-addition:
# Add near top of typecheck_walk.qz:
var _mem_bisect_totals = vec_new<Int>()
var _mem_bisect_counts = vec_new<Int>()
def _mem_bisect_record(kind: Int, delta: Int): Void
while _mem_bisect_totals.size <= kind
_mem_bisect_totals.push(0)
_mem_bisect_counts.push(0)
end
_mem_bisect_totals[kind] = _mem_bisect_totals[kind] + delta
_mem_bisect_counts[kind] = _mem_bisect_counts[kind] + 1
end
def _mem_bisect_report(): Void
var i = 0
while i < _mem_bisect_totals.size
var total = _mem_bisect_totals[i]
if total > 1048576
eputs("[kind #{i}] +#{total / 1048576} MB / #{_mem_bisect_counts[i]} stmts")
end
i += 1
end
end
# Wrap tc_expr via an _tc_expr_dispatch helper:
def tc_expr(tc: TypecheckState, ast_storage: ast::AstStorage, node: AstNodeId): Int
if node < 0
return type_constants::TYPE_ERROR
end
var _k = ast::ast_get_kind(ast_storage, node)
var _r = mem_current_rss()
var _result = _tc_expr_dispatch(tc, ast_storage, node)
_mem_bisect_record(_k + 1000, mem_current_rss() - _r)
return _result
end
# ...and call _mem_bisect_report() right before mem_report("typecheck").
Context for why tc_expr_call is likely the leak
tc_expr_call uniquely among tc_expr handlers:
- Does SEVERAL recursive tc_expr calls on args (primary + UFCS rewrite’s first-arg re-walk + `tc_try_resolve_*`’s walks).
- Mutates the AST (`ast_set_str1`, `ast_set_str2`, `ast_set_kind`) — these intern into the global string interner.
- Calls tc_parse_type on type_arg (cached now, but used to be hot).
- Does fuzzy name lookups into the function registry (string compares).
- Does UFCS container-type inference via `tc_try_resolve_vec_element_type`/`tc_try_resolve_map_key_value_types`, which are non-trivial helpers.
All of these compound. A single NODE_CALL can trigger 5+ tc_expr calls (one per arg, one receiver re-walk, plus resolve helpers) and multiple string allocations.
Don’t drift into
- Scheduler park/wake refactor
- Async Mutex/RwLock
- Any unrelated roadmap items
Stay focused on tc_expr_call. One session. Fix it or document progress.