.size Hazard Audit
Scope: systematic audit of .size (field access, no parens) across self-hosted/**/*.qz, std/**/*.qz, tools/**/*.qz, and high-signal spec files. Objective: enumerate every site where .size resolves against an Int-typed receiver that actually holds a Vec/Map/Set/String handle at runtime, which produces a silent read of either offset 0 (the Vec capacity slot) or the slot at the size-field index of the first registered, unrelated struct that declares a literal size field.
Date: 2026-04-12
Summary
| Category | Count |
|---|---|
| Total .size sites examined (self-hosted + std + tools) | ~260 |
| Confirmed BUG sites (Int-typed receiver holding a Vec/Map/etc handle) | 3 in production code, 9 in spec files |
| SAFE sites (receiver is Vec<T> / Map<K,V> / String / user struct with literal size field) | ~248 |
| UNKNOWN / needs deeper trace | 0 |
| Sites already fixed today (context) | lint.qz:113/1807/2156, codegen.qz:892 (cg_emit_extern_declarations) |
The confined blast radius is explained by two partial mitigations already in place:
- TC ptype override in self-hosted/middle/typecheck.qz lines 273–277 upgrades str_split, str_chars, String$split, String$chars from raw TYPE_INT to the Vec<String> ptype, so the .size rewrite in tc_expr_field_access triggers. (Note: str_chars should be Vec<Int> not Vec<String>; see follow-up bugs below.)
- MIR rescue in self-hosted/backend/mir.qz line 2396: mir_intrinsic_return_type() maps known intrinsic names to container annotations, and mir_lower_stmt_handlers::mir_lower_stmt_let calls it to tag var x = intrinsic(...) bindings with the right mir_ctx_mark_struct_var type so the MIR-level field-access handler can resolve downstream. This rescue only fires for intrinsics present in the list, and even then it only rescues the direct NODE_LET binding, not chained results or user functions.
Everything not covered by (1) or (2) relies on the typechecker’s tc_infer_expr_type_annotation walker, which requires either an explicit type annotation on the binding or a call whose callee is a user function with a registered return annotation. Builtins are absent from tc.registry.func_return_annotations, so they get no string annotation, only a kind.
Root-Cause Anatomy
The bug is a gap between three concurrent type-resolution systems:
| Layer | What it knows | What it doesn’t |
|---|---|---|
| TC (typecheck_builtins.qz) | str_split → TYPE_INT (or PTYPE_VEC<String> if overridden) | No string-level annotation for builtins |
| TC annotation inference (typecheck_generics.qz::tc_infer_expr_type_annotation) | User function return annotations, explicit bindings | Builtin return annotations (returns "") |
| MIR (mir.qz::mir_intrinsic_return_type) | A hand-maintained map of ~30 intrinsic → annotation | Any intrinsic not in the list; anything transitive |
When TC sees x.size where x: TYPE_INT:
- tc_expr_field_access (lines 979–1006) checks tc_base_kind(object_type) == TYPE_VEC/STRING/MAP/SET. For TYPE_INT it does NOT match. ptype overrides rescue the 4 string-container builtins; nothing else.
- Annotation fallback at lines 945–956 tries tc_scope_lookup_annotation then tc_infer_expr_type_annotation. For var x = readdir(p) with no explicit : Vec<String>, both return "" because readdir’s callee path in tc_infer_expr_type_annotation calls tc_lookup_function_return_annotation, which only searches tc.registry.func_names (user functions, not builtins).
- Falls through to tc_resolve_expr_struct_name, which also returns "". The result = TYPE_INT branch at line 1023 is taken and the AST node remains a NODE_FIELD_ACCESS.
- MIR mir_lower_expr_handlers::mir_lower_expr hits the NODE_FIELD_ACCESS path in mir_lower.qz:1615. It calls mir_infer_expr_type(base), which for an IDENT consults mir_ctx_get_var_type. That was set at NODE_LET time by mir_intrinsic_return_type, but only for intrinsics present in the hand-rolled map.
- When all inference fails, MIR goes to mir_find_field_globally("size"). This iterates the struct registry and returns the first registered struct that contains a literal field named size, using that struct’s field index. If no match, the final fallback is mir_emit_load_offset(base, 0), which reads the Vec header’s capacity slot at offset 0 (the Vec layout is [capacity, size, data_ptr]).
So the observed effect is either “capacity instead of size” or “some unrelated struct field at a random offset” depending on what structs exist in the current link.
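The fallback chain above can be modeled in a few lines. This is a toy Python model, not Quartz compiler code: the function names and the sample struct registry are hypothetical, and only the Vec layout [capacity, size, data_ptr] and the order of the fallbacks are taken from the description above.

```python
# Toy model of the MIR-level fallback for `.size`. All names are hypothetical.

VEC_LAYOUT = ["capacity", "size", "data_ptr"]  # layout stated in this audit


def find_field_globally(field, struct_registry):
    """Return (struct_name, field_index) for the first struct declaring `field`."""
    for name, fields in struct_registry:
        if field in fields:
            return name, fields.index(field)
    return None


def resolve_size_slot(receiver_type, struct_registry):
    """Pick the slot that `.size` ends up reading for the given receiver type."""
    if receiver_type == "Vec":
        return VEC_LAYOUT.index("size")  # correct path: slot 1
    hit = find_field_globally("size", struct_registry)
    if hit is not None:
        return hit[1]  # index of `size` inside an unrelated struct
    return 0  # final fallback: load offset 0


def read_size(header, receiver_type, struct_registry):
    """Read the slot chosen by resolve_size_slot from a Vec header."""
    return header[resolve_size_slot(receiver_type, struct_registry)]


# A Vec header with capacity=8, size=3 and an arbitrary data pointer.
header = [8, 3, 0xBEEF]
```

With an empty struct registry, an Int-typed receiver reads the capacity (8) instead of the size (3). If the registry happens to contain a struct such as ("Image", ["pixels", "w", "size"]), the same access reads slot 2, which is why the symptom changes with whatever structs are linked in.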
BUG Sites (Production Code)
| File:line | Current code | Proposed fix | Risk / Path |
|---|---|---|---|
| /Users/mathisto/projects/quartz/tools/doc.qz:76 | var entries = readdir(root) / var count = entries.size | var count = vec_size(entries) | HIGH: quartz doc tool. readdir registered as TYPE_INT in typecheck_builtins.qz:903; not in mir_intrinsic_return_type. Symptom: doc generator walks the wrong number of entries or segfaults. |
| /Users/mathisto/projects/quartz/tools/doc.qz:87 | var sub_entries = readdir(full_path) / if sub_entries.size > 0 | if vec_size(sub_entries) > 0 | HIGH: same root cause. Causes directory recursion to take the wrong branch. |
| /Users/mathisto/projects/quartz/tools/doc.qz:76 | (already counted above) | | |
(Only three sites in production code — the rest of production tools/std/self-hosted code is SAFE because it either uses vec_size() explicitly or declares receivers as Vec<T>, which routes .size through the TC rewrite.)
BUG Sites (Spec Files — latent, depend on future stdlib impls)
spec/qspec/collection_stubs_spec.qz tests stdlib functions that do not exist yet (per docs/ROADMAP.md:496). These tests do not run green today. Once the functions are implemented, the tests will trip the .size bug unless the ptype override list is extended first.
| File:line | Current code | Proposed fix | Risk |
|---|---|---|---|
| spec/qspec/collection_stubs_spec.qz:171 | var pairs = enumerate(v) / assert_eq(pairs.size, 3) | assert_eq(vec_size(pairs), 3), or fix enumerate ptype in TC | LOW (stub) |
| spec/qspec/collection_stubs_spec.qz:183 | var pairs = enumerate(v) / assert_eq(pairs.size, 0) | same | LOW |
| spec/qspec/collection_stubs_spec.qz:198 | var zipped = zip(a, b) / assert_eq(zipped.size, 3) | assert_eq(vec_size(zipped), 3) | LOW |
| spec/qspec/collection_stubs_spec.qz:214 | var zipped = zip(a, b) / assert_eq(zipped.size, 2) | same | LOW |
| spec/qspec/collection_stubs_spec.qz:228 | var parts = partition(...) / var matching = parts[0] / assert_eq(matching.size, 2) | assert_eq(vec_size(matching), 2) | LOW |
| spec/qspec/collection_stubs_spec.qz:229 | var non_matching = parts[1] / assert_eq(non_matching.size, 2) | assert_eq(vec_size(non_matching), 2) | LOW |
| spec/qspec/set_ufcs_spec.qz:108 | var members = set_members(s) / if members.size == 3 | if vec_size(members) == 3, or fix set_members TC registration | MEDIUM: set_members is registered as TYPE_INT at typecheck_builtins.qz:487 while Set$members at line 570 is TYPE_VEC. Non-UFCS callers hit the bug; UFCS callers don’t. |
Line 120 of set_ufcs_spec.qz (s.members().size) is SAFE because the TC dispatches to Set$members which returns TYPE_VEC, triggering the rewrite.
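The UFCS versus non-UFCS asymmetry comes down to two registrations for the same operation that disagree. A small Python model of that, assuming UFCS dispatch builds a Type$method name (as the Set$members entry suggests); the helper names are hypothetical and only the two registered kinds come from this audit:

```python
# Toy model: one operation, two registrations, two different outcomes.
BUILTIN_KINDS = {
    "set_members": "TYPE_INT",  # free-function form (typecheck_builtins.qz:487)
    "Set$members": "TYPE_VEC",  # UFCS form (typecheck_builtins.qz:570)
}

CONTAINER_KINDS = {"TYPE_VEC", "TYPE_STRING", "TYPE_MAP", "TYPE_SET"}


def call_result_kind(callee, receiver_type=None):
    """UFCS calls look up Type$method; plain calls look up the bare name."""
    name = f"{receiver_type}${callee}" if receiver_type else callee
    return BUILTIN_KINDS[name]


def size_is_rewritten(kind):
    """The .size rewrite only fires when the receiver kind is a container."""
    return kind in CONTAINER_KINDS
```

Here size_is_rewritten(call_result_kind("members", "Set")) holds, while size_is_rewritten(call_result_kind("set_members")) does not, which matches the SAFE line 120 and the buggy line 108.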
UNKNOWN Sites
None. Every .size site I examined in self-hosted/std/tools production code resolves to one of: (a) Vec<T>-typed binding, (b) user struct with literal size field, (c) String, or (d) one of the already-patched ptype intrinsic returns.
Dangerous Function Inventory
These functions either return : Int at the signature level, or are registered as TYPE_INT builtins, but at runtime their return value is a Vec/Map/Set/string container handle. Any .size access on a variable bound from one of these is a latent bug unless explicit type-annotation rescue is applied.
Builtins registered as TYPE_INT that should be collection types
self-hosted/middle/typecheck_builtins.qz:
| Line | Name | Actual return | Status |
|---|---|---|---|
| 130 | str_split | Vec<String> | FIXED via ptype override (typecheck.qz:274) |
| 157 | str_chars | Vec<Int> | PARTIALLY FIXED — ptype override at typecheck.qz:275 declares it as Vec<String>, which is wrong: str_chars returns codepoints (Vec<Int> per std/string.qz:207 and docs/INTRINSICS.md:138). Cosmetic today (sizes match), but an indexer of str_chars(s)[i] gets the wrong element type. |
| 164 | str_bytes | Vec<Int> | UNFIXED — no ptype override. Currently only used in unicode_byte_spec.qz via vec_size(), so no live bug, but parts.size on a str_bytes result would break. |
| 175 | String$split | Vec<String> | FIXED |
| 188 | String$size | Int (but String$chars at 196 returns handle) | — |
| 196 | String$chars | Vec<Int> | PARTIAL — typecheck.qz:277 declares Vec<String>. Same issue as str_chars. |
| 204 | String$bytes | Vec<Int> | UNFIXED |
| 487 | set_members | Vec<T> | UNFIXED — TC registers TYPE_INT, Set$members (line 570) correctly registers TYPE_VEC. Non-UFCS callers break. |
| 528 | Vec$get | element T | registered as TYPE_INT; not a .size bug but type for downstream ops |
| 530 | Vec$slice | Vec<T> | UNFIXED — UFCS v.slice(a, b).size would break |
| 871 | file_bytes | Vec<Int> | UNFIXED |
| 872 | file_chars | Vec<Int> | UNFIXED |
| 873 | file_lines | Vec<String> | UNFIXED |
| 887 | enumerate | Vec<Pair> | UNFIXED |
| 889 | group_by | Map | UNFIXED |
| 890 | partition | Pair<Vec, Vec> | UNFIXED — result index returns Vec but typed as Int |
| 903 | readdir | Vec<String> | UNFIXED — causes tools/doc.qz bugs |
| 909 | zip | Vec<Pair> | UNFIXED |
| 348 | vec_save / strvec_save | status Int (not a bug) | — |
User-level functions returning : Int that are actually handles
std/toml/value.qz:
| Line | Name | Actual return |
|---|---|---|
| 88 | toml_table_keys | Vec<String> |
| 129 | toml_as_array | Vec<TomlValue> |
| 137 | toml_as_table | Map |
All callers in std/toml/*.qz and tools/lint.qz use vec_size() explicitly — so no live bugs, but these are landmines for any future caller who writes arr.size.
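A crude way to guard against these landmines is a text-level check that flags .size on variables bound from an inventory function. The sketch below is Python, assumes the var name = callee(...) binding shape seen in the snippets in this audit, ignores scoping and reassignment, and skips explicitly annotated bindings; find_size_hazards and the DANGEROUS set are hypothetical helpers, not part of tools/lint.qz.

```python
import re

# Names taken from the inventory tables above (UNFIXED builtins and toml helpers).
DANGEROUS = {
    "readdir", "str_bytes", "set_members", "file_bytes", "file_chars",
    "file_lines", "enumerate", "zip", "partition", "group_by",
    "toml_table_keys", "toml_as_array", "toml_as_table",
}

# Matches `var x = callee(`; an annotated `var x: Vec<String> = ...` does not match.
BINDING = re.compile(r"\bvar\s+(\w+)\s*=\s*(\w+)\s*\(")


def find_size_hazards(source):
    """Return (line_number, variable, origin) for each suspicious `.size` read."""
    tainted = {}
    hazards = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = BINDING.search(line)
        if match and match.group(2) in DANGEROUS:
            tainted[match.group(1)] = match.group(2)
        for name, origin in tainted.items():
            # Field access only: `.size` not followed by an opening paren.
            if re.search(rf"\b{re.escape(name)}\.size\b(?!\s*\()", line):
                hazards.append((lineno, name, origin))
    return hazards


sample = (
    "var entries = readdir(root)\n"
    "var count = entries.size\n"
    "var n = vec_size(entries)\n"
    "var v: Vec<String> = readdir(root)\n"
    "var k = v.size\n"
)
```

On the sample, only line 2 is reported: the vec_size() call is fine, and the annotated binding on line 4 is treated as rescued by the explicit type.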
Functions that return Int for node-handle reasons (NOT collection handles)
These are SAFE because callers treat them as opaque AstNodeIds:
- ast_get_left, ast_get_right, ast_get_extra, ast_get_int_val: return AstNodeId (child handles or flags), never .size-accessed.
Recommended Fix Order
Top-priority (actual live bugs):
- tools/doc.qz:76, tools/doc.qz:87: two readdir(...).size usages. Swap for vec_size().
Near-term (patch the TC ptype table to close the entire class):
- Extend typecheck.qz:273–277 ptype overrides to cover every intrinsic that returns a container but is registered as TYPE_INT. Specifically:
  - str_bytes, String$bytes → Vec<Int>
  - Correct str_chars, String$chars from Vec<String> to Vec<Int>
  - set_members → Vec<Int> (or whatever Set’s T is; need ptype with type var)
  - readdir → Vec<String>
  - file_bytes → Vec<Int>, file_chars → Vec<Int>, file_lines → Vec<String>
  - enumerate, zip → Vec<Vec<Int>> (approximation for pair-of-ints)
  - partition → Vec<Vec> (pair of vecs)
  - group_by → Map
  - Vec$slice → Vec<T> (needs generic propagation; may require another mechanism)
- Update mir_intrinsic_return_type in mir.qz:2396 to mirror the TC ptype table so that any MIR-level rescue path (NODE_LET binding annotation) stays in sync. Today the two lists diverge: TC has str_split and str_chars, MIR has a superset (dir_list, etc.). This divergence is a bug magnet; either derive both from one registry, or delete MIR’s redundant list once the TC path is authoritative.
- Fix spec files that use .size on collection-stub results (9 sites in collection_stubs_spec.qz and set_ufcs_spec.qz). These are best fixed by the TC ptype additions above; the specs then pass without rewrites.
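Until the two tables are merged, a drift check could at least make the divergence visible. A minimal Python sketch, assuming both tables can be dumped as name → annotation maps; table_drift is a hypothetical helper, the TC entries are the four overrides named in this audit, and the MIR entries are illustrative placeholders rather than the real contents of mir_intrinsic_return_type:

```python
def table_drift(tc_table, mir_table):
    """Compare two name -> annotation maps.

    Returns (only_in_tc, only_in_mir, disagreeing), each a sorted list of names.
    """
    tc_names, mir_names = set(tc_table), set(mir_table)
    only_tc = sorted(tc_names - mir_names)
    only_mir = sorted(mir_names - tc_names)
    disagreeing = sorted(
        name for name in tc_names & mir_names
        if tc_table[name] != mir_table[name]
    )
    return only_tc, only_mir, disagreeing


# TC ptype overrides as described for typecheck.qz:273-277.
tc_overrides = {
    "str_split": "Vec<String>",
    "str_chars": "Vec<String>",
    "String$split": "Vec<String>",
    "String$chars": "Vec<String>",
}

# Illustrative placeholder only; the annotations here are assumptions.
mir_rescue = {
    "str_split": "Vec<String>",
    "dir_list": "Vec<String>",
}
```

Run as a smoke test, any non-empty result would fail the build, so a new intrinsic cannot be added to one table without the other.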
Long-Term / World-Class Fix (Directive #2)
The whole “two-track” type tracking (TC ptypes + MIR intrinsic rescue) is a symptom of builtins being second-class citizens in the type system. There is no world-class language where the typechecker knows less about its builtins than about user functions. The fix is to promote builtins to the same status as user functions: every intrinsic gets a full annotation string (not just a kind), stored in tc.registry.func_return_annotations with the builtin’s name.
Concrete plan:
Phase 1: Unify builtin and user-function return metadata (~1 day quartz-time)
- Extend tc_register_builtin to accept an optional annotation string. Default "" for scalar builtins.
- At init, populate the annotation alongside the type kind for every collection-returning builtin. Use the comments already next to each registration (“# Fn(String) -> Vec<String>”) as the source of truth.
- Make tc_lookup_function_return_annotation check the builtin table as a fallback. With this one change, tc_infer_expr_type_annotation’s NODE_CALL path automatically starts working for var x = readdir(p) without requiring the caller to add an explicit annotation.
- Delete mir_intrinsic_return_type: it exists only because the TC couldn’t answer the question; once TC can answer, MIR reads the annotation via the normal call-callee lookup path.
This alone closes 90% of the .size bug class: any binding from a builtin now gets an annotation, and the .size rewrite at tc_expr_field_access:945–956 fires correctly.
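The shape of the Phase 1 change can be sketched as a registry where builtins carry an optional annotation and the lookup falls back to them. This is a Python model under stated assumptions: the class and method names are hypothetical stand-ins for tc_register_builtin and tc_lookup_function_return_annotation, and the real registry layout is not shown in this audit.

```python
class Registry:
    """Toy registry: user functions and builtins share one annotation lookup."""

    def __init__(self):
        self.user_annotations = {}     # user function name -> annotation string
        self.builtin_kinds = {}        # builtin name -> type kind
        self.builtin_annotations = {}  # builtin name -> annotation string

    def register_user_function(self, name, annotation):
        self.user_annotations[name] = annotation

    def register_builtin(self, name, kind, annotation=""):
        # The annotation defaults to "" so scalar builtins need no changes.
        self.builtin_kinds[name] = kind
        self.builtin_annotations[name] = annotation

    def lookup_return_annotation(self, name):
        # User functions first, then the builtin table as a fallback.
        if name in self.user_annotations:
            return self.user_annotations[name]
        return self.builtin_annotations.get(name, "")


registry = Registry()
registry.register_builtin("readdir", "TYPE_INT", "Vec<String>")
registry.register_builtin("vec_size", "TYPE_INT")  # scalar result, no annotation
registry.register_user_function("load_config", "Map")  # hypothetical user function
```

With this in place, a lookup for readdir yields "Vec<String>" even though its kind is still TYPE_INT, which is the piece of information the annotation fallback is currently missing.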
Phase 2: Make .size on TYPE_INT a hard error, not a silent fall-through (~0.5 day)
The real design flaw exposed by this audit is that tc_expr_field_access returns TYPE_INT as a “maybe a size field, maybe not, who knows” fallback when it can’t resolve a struct. That is a silent-compromise path. The world-class behavior:
- After the builtin-type rewrite block fails, after the annotation fallback fails, after the struct-name resolution fails, emit a typecheck error (QZ0603 equivalent) saying: "cannot determine the type of '<receiver>' for field access '.size'. If this is meant to be a Vec/Map/Set/String, add an explicit type annotation." Then return TYPE_ERROR.
- Delete the MIR fallback paths at mir_lower.qz:1655–1669: with TC erroring, MIR will never see an unresolved .size on a non-struct base.
- The QZ0603 warning currently emitted by MIR becomes an unreachable branch; delete it.
This enforces “no silent compromises” (Directive #4). The failure mode moves from “program runs, produces wrong answer” to “compiler rejects program, tells user exactly what to do.” That is the only acceptable state for a systems language.
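The intended control flow for Phase 2 is a resolution cascade whose last step is an error rather than a guess. A minimal Python sketch, assuming the three resolution steps have already produced a kind, an annotation, and a struct name; check_size_access and TypecheckError are hypothetical names, and the message text is the one proposed above.

```python
class TypecheckError(Exception):
    """Raised where the typechecker currently falls through to TYPE_INT."""


CONTAINER_KINDS = {"TYPE_VEC", "TYPE_STRING", "TYPE_MAP", "TYPE_SET"}


def check_size_access(receiver, kind, annotation="", struct_name=""):
    """Resolve `receiver.size` or fail loudly; never guess."""
    if kind in CONTAINER_KINDS:
        return "rewrite"  # builtin-type rewrite to the size intrinsic
    if annotation:
        return "rewrite"  # annotation fallback resolved a container type
    if struct_name:
        return "field:" + struct_name  # literal size field on a known struct
    raise TypecheckError(
        f"cannot determine the type of '{receiver}' for field access '.size'. "
        "If this is meant to be a Vec/Map/Set/String, add an explicit type annotation."
    )
```

Under this model, check_size_access("entries", "TYPE_INT") raises instead of returning, so the offset-0 load in MIR has no caller left.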
Phase 3: Generic builtin annotations with type-param substitution (~1–2 days)
Cases like Vec$slice<T> or partition<T> need their annotations to propagate the caller’s T. This is the same machinery tc_infer_expr_type_annotation already uses for user generic functions (tc_infer_type_param_mapping at line 156). Extend that machinery to builtins by giving them the same type-param metadata slot. Once done:
- v: Vec<Point>; v.slice(0, 5).size: slice’s return annotation Vec<T> substitutes T := Point from v’s annotation, rewrites to vec_size(...).
- Same machinery handles enumerate(v).size, zip(a, b).size, etc.
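The substitution step can be illustrated on annotation strings alone. The following Python sketch is a simplified stand-in, not the logic of tc_infer_type_param_mapping: it assumes annotations are well-formed strings such as Vec<T> or Map<K, Vec<V>>, and all three helper names are hypothetical.

```python
import re


def split_generic(annotation):
    """Split 'Map<K, Vec<V>>' into ('Map', ['K', 'Vec<V>']); bare names give []."""
    head, sep, rest = annotation.partition("<")
    if not sep:
        return annotation.strip(), []
    body = rest.rstrip()[:-1]  # drop the closing '>'
    args, depth, current = [], 0, ""
    for ch in body:
        if ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
            continue
        depth += (ch == "<") - (ch == ">")
        current += ch
    args.append(current.strip())
    return head.strip(), args


def bind_type_params(param_annotation, arg_annotation, params):
    """Match a declared parameter type against the caller's concrete type."""
    if param_annotation in params:
        return {param_annotation: arg_annotation}
    p_head, p_args = split_generic(param_annotation)
    a_head, a_args = split_generic(arg_annotation)
    mapping = {}
    if p_head == a_head and len(p_args) == len(a_args):
        for p_arg, a_arg in zip(p_args, a_args):
            mapping.update(bind_type_params(p_arg, a_arg, params))
    return mapping


def substitute(annotation, mapping):
    """Replace each type parameter in `annotation` with its bound type."""
    return re.sub(r"\b\w+\b", lambda m: mapping.get(m.group(0), m.group(0)), annotation)


# slice: receiver declared as Vec<T>, returns Vec<T>; the caller's v is Vec<Point>.
slice_mapping = bind_type_params("Vec<T>", "Vec<Point>", {"T"})
slice_result = substitute("Vec<T>", slice_mapping)
```

For the slice example, the mapping binds T to Point and the result annotation becomes Vec<Point>, which is enough for the .size rewrite to recognize a Vec receiver.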
Phase 4: Parity audit across the entire intrinsic surface (~0.5 day)
Walk every tc_register_builtin(...TYPE_INT) site and verify: is the actual return value an Int or a handle masquerading as an Int? Where it’s a handle, either (a) use the new generic annotation mechanism from Phase 3, or (b) promote to a ptype. No exceptions, no “this one is fine, skip it” — parity is cheap to maintain, expensive to recover after it slips.
What this buys
- .size bugs become impossible by construction. You cannot reach the offset-0 fallback because you cannot reach the MIR without the TC having either rewritten the call or emitted an error.
- Every tool/editor/LSP gets better type inference for free: tc_infer_expr_type_annotation now works on builtin-call sites, enabling hover-type display, autocomplete, and cross-module UFCS resolution for collection-returning intrinsics.
- The “two lists you must keep in sync” anti-pattern disappears. One table, one source of truth.
- Fixes a class of adjacent bugs: .push, .get, index-type inference, UFCS dispatch on builtin-call results. All routed through the same annotation channel.
This is the harder path. Per Directive #1, we take it.
Commit Sequencing Recommendation
Per Directive #8 (binary discipline):
- Apply the tactical 2-line fix to tools/doc.qz (lines 76, 87) first and commit. This is fast, low-risk, and clears the only confirmed live production bug. No compiler rebuild needed.
- Apply the TC ptype additions (readdir, file_*, enumerate, zip, partition, group_by, set_members, str_bytes, String$bytes, str_chars→Vec<Int> correction) as a single commit with quake guard + smoke tests. This is the interim patch covering current known intrinsics.
- Plan Phases 1–4 as a multi-session roadmap item (project_builtin_annotation_unification.md) and execute them in sequence.