`.size` Hazard Audit

Scope: systematic audit of .size (field access, no parens) across self-hosted/**/*.qz, std/**/*.qz, tools/**/*.qz, and high-signal spec files. Objective: enumerate every site where .size resolves against an Int-typed receiver that actually holds a Vec/Map/Set/String handle at runtime. Such a site silently reads either offset 0 (the Vec capacity slot) or the offset of the size field in an unrelated struct that happens to declare a literal size field.

Date: 2026-04-12

Summary

| Category | Count |
| --- | --- |
| Total `.size` sites examined (self-hosted + std + tools) | ~260 |
| Confirmed BUG sites (Int-typed receiver holding a Vec/Map/etc handle) | 3 in production code, 9 in spec files |
| SAFE sites (receiver is `Vec<T>` / `Map<K,V>` / `String` / user struct with literal `size` field) | ~248 |
| UNKNOWN / needs deeper trace | 0 |
| Sites already fixed today (context) | lint.qz:113/1807/2156, codegen.qz:892 (cg_emit_extern_declarations) |

The confined blast radius is explained by two partial mitigations already in place:

1. TC ptype override in self-hosted/middle/typecheck.qz lines 273–277 upgrades str_split, str_chars, String$split, String$chars from raw TYPE_INT to the Vec<String> ptype, so the .size rewrite in tc_expr_field_access triggers. (Note: str_chars should be Vec<Int>, not Vec<String>; see follow-up bugs below.)
2. MIR rescue in self-hosted/backend/mir.qz line 2396: mir_intrinsic_return_type() maps known intrinsic names to container annotations, and mir_lower_stmt_handlers::mir_lower_stmt_let calls it to tag var x = intrinsic(...) bindings with the right mir_ctx_mark_struct_var type so the MIR-level field-access handler can resolve downstream. This rescue only fires for intrinsics present in the list, and even then it only rescues the direct NODE_LET binding, not chained results or user functions.

Everything not covered by (1) or (2) relies on the typechecker’s tc_infer_expr_type_annotation walker, which requires either an explicit type annotation on the binding or a call whose callee is a user function with a registered return annotation. Builtins are absent from tc.registry.func_return_annotations, so they get no string annotation, only a kind.
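The gap described above can be pictured with a small Python model. This is only a sketch of the described behavior, not Quartz compiler code: the dictionaries, `load_names`, and `str_len` are hypothetical stand-ins, and the function names merely echo the compiler routines they imitate.

```python
# Illustrative Python model, not Quartz compiler code.
# Registry contents below are hypothetical examples.
TYPE_INT = "TYPE_INT"

# Builtins carry only a kind; user functions also carry an annotation string.
builtin_kinds = {"readdir": TYPE_INT, "str_len": TYPE_INT}
user_return_annotations = {"load_names": "Vec<String>"}  # hypothetical user fn


def lookup_function_return_annotation(name: str) -> str:
    """Mirrors the described behavior: only user functions are searched."""
    return user_return_annotations.get(name, "")


def infer_binding_annotation(callee: str, explicit: str = "") -> str:
    """An explicit annotation wins; otherwise fall back to the callee lookup."""
    if explicit:
        return explicit
    return lookup_function_return_annotation(callee)


print(repr(infer_binding_annotation("load_names")))              # 'Vec<String>'
print(repr(infer_binding_annotation("readdir")))                 # ''
print(repr(infer_binding_annotation("readdir", "Vec<String>")))  # 'Vec<String>'
```

The middle call is the problematic case: the builtin has a kind but no annotation string, so the binding stays untyped unless the caller annotates it explicitly.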

Root-Cause Anatomy

The bug is a gap between three concurrent type-resolution systems:

| Layer | What it knows | What it doesn’t |
| --- | --- | --- |
| TC (`typecheck_builtins.qz`) | `str_split → TYPE_INT` (or `PTYPE_VEC<String>` if overridden) | No string-level annotation for builtins |
| TC annotation inference (`typecheck_generics.qz::tc_infer_expr_type_annotation`) | User function return annotations, explicit bindings | Builtin return annotations (returns `""`) |
| MIR (`mir.qz::mir_intrinsic_return_type`) | A hand-maintained map of ~30 intrinsic → annotation | Any intrinsic not in the list; anything transitive |

When TC sees x.size where x: TYPE_INT:

1. tc_expr_field_access (lines 979–1006) checks tc_base_kind(object_type) == TYPE_VEC/STRING/MAP/SET. For TYPE_INT it does NOT match. ptype overrides rescue the 4 string-container builtins; nothing else.
2. Annotation fallback at lines 945–956 tries tc_scope_lookup_annotation then tc_infer_expr_type_annotation. For var x = readdir(p) with no explicit : Vec<String>, both return "" because readdir’s callee path in tc_infer_expr_type_annotation calls tc_lookup_function_return_annotation, which only searches tc.registry.func_names (user functions, not builtins).
3. Falls through to tc_resolve_expr_struct_name, which also returns "". The result = TYPE_INT branch at line 1023 is taken and the AST node remains a NODE_FIELD_ACCESS.
4. MIR mir_lower_expr_handlers::mir_lower_expr hits the NODE_FIELD_ACCESS path in mir_lower.qz:1615. It calls mir_infer_expr_type(base), which for an IDENT consults mir_ctx_get_var_type. That was set at NODE_LET time by mir_intrinsic_return_type, but only for intrinsics present in the hand-rolled map.
5. When all inference fails, MIR goes to mir_find_field_globally("size"). This iterates the struct registry and returns the first registered struct that contains a literal field named size, using that struct’s field index. If there is no match, the final fallback is mir_emit_load_offset(base, 0), which reads the Vec header’s capacity slot at offset 0 (the Vec layout is [capacity, size, data_ptr]).

So the observed effect is either “capacity instead of size” or “some unrelated struct field at a random offset” depending on what structs exist in the current link.
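To make the two outcomes concrete, here is a minimal Python model of the final fallback. It is a sketch under stated assumptions, not the MIR implementation: the struct names `Buffer` and `Span` are invented, field indices are treated as word-sized slot offsets, and only the Vec layout [capacity, size, data_ptr] comes from the audit itself.

```python
# Illustrative Python model of the fallback described above; not compiler code.
from typing import Dict, List, Optional, Tuple


def find_field_globally(
    structs: Dict[str, List[str]], field: str
) -> Optional[Tuple[str, int]]:
    """Return (struct name, field index) for the first struct declaring `field`."""
    for name, fields in structs.items():  # dicts preserve registration order
        if field in fields:
            return name, fields.index(field)
    return None


def lower_unresolved_size(header: List[int], structs: Dict[str, List[str]]) -> int:
    """What an unresolved `.size` ends up loading when all inference fails."""
    hit = find_field_globally(structs, "size")
    offset = hit[1] if hit is not None else 0  # final fallback: offset 0
    return header[offset]


# Vec header as stated in the audit: [capacity, size, data_ptr].
vec_header = [8, 3, 0xBEEF]  # capacity 8, size 3, hypothetical data pointer

# No struct declares `size`: offset 0 is read, i.e. the capacity.
print(lower_unresolved_size(vec_header, {}))  # 8

# Hypothetical struct with `size` at index 2: the data pointer is read.
print(lower_unresolved_size(vec_header, {"Buffer": ["ptr", "cap", "size"]}))  # 48879

# Hypothetical struct with `size` at index 1: the answer is right by accident.
print(lower_unresolved_size(vec_header, {"Span": ["start", "size"]}))  # 3
```

The third case illustrates why the bug can stay hidden: whether the result is correct depends on which structs happen to be registered first in the current link.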

BUG Sites (Production Code)

| File:line | Current code | Proposed fix | Risk / Path |
| --- | --- | --- | --- |
| `/Users/mathisto/projects/quartz/tools/doc.qz:76` | `var entries = readdir(root) / var count = entries.size` | `var count = vec_size(entries)` | HIGH: `quartz doc` tool. `readdir` registered as `TYPE_INT` in `typecheck_builtins.qz:903`; not in `mir_intrinsic_return_type`. Symptom: doc generator walks the wrong number of entries or segfaults. |
| `/Users/mathisto/projects/quartz/tools/doc.qz:87` | `var sub_entries = readdir(full_path) / if sub_entries.size > 0` | `if vec_size(sub_entries) > 0` | HIGH: same root cause. Causes directory recursion to take the wrong branch. |

A third hit, also at `/Users/mathisto/projects/quartz/tools/doc.qz:76`, is already counted in the first row.

(Only three hits in production code, at two distinct lines; the rest of the production tools/std/self-hosted code is SAFE because it either uses vec_size() explicitly or declares receivers as Vec<T>, which routes .size through the TC rewrite.)

BUG Sites (Spec Files — latent, depend on future stdlib impls)

spec/qspec/collection_stubs_spec.qz tests stdlib functions that do not exist yet (per docs/ROADMAP.md:496). These tests do not run green today. Once the functions are implemented, the tests will trip the .size bug unless the ptype override list is extended first.

| File:line | Current code | Proposed fix | Risk |
| --- | --- | --- | --- |
| `spec/qspec/collection_stubs_spec.qz:171` | `var pairs = enumerate(v) / assert_eq(pairs.size, 3)` | `assert_eq(vec_size(pairs), 3)`, or fix `enumerate` ptype in TC | LOW (stub) |
| `spec/qspec/collection_stubs_spec.qz:183` | `var pairs = enumerate(v) / assert_eq(pairs.size, 0)` | same | LOW |
| `spec/qspec/collection_stubs_spec.qz:198` | `var zipped = zip(a, b) / assert_eq(zipped.size, 3)` | `assert_eq(vec_size(zipped), 3)` | LOW |
| `spec/qspec/collection_stubs_spec.qz:214` | `var zipped = zip(a, b) / assert_eq(zipped.size, 2)` | same | LOW |
| `spec/qspec/collection_stubs_spec.qz:228` | `var parts = partition(...) / var matching = parts[0] / assert_eq(matching.size, 2)` | `assert_eq(vec_size(matching), 2)` | LOW |
| `spec/qspec/collection_stubs_spec.qz:229` | `var non_matching = parts[1] / assert_eq(non_matching.size, 2)` | `assert_eq(vec_size(non_matching), 2)` | LOW |
| `spec/qspec/set_ufcs_spec.qz:108` | `var members = set_members(s) / if members.size == 3` | `if vec_size(members) == 3`, or fix `set_members` TC registration | MEDIUM: `set_members` is registered as `TYPE_INT` at `typecheck_builtins.qz:487` while `Set$members` at line 570 is `TYPE_VEC`. Non-UFCS callers hit the bug; UFCS callers don’t. |

Line 120 of set_ufcs_spec.qz (s.members().size) is SAFE because the TC dispatches to Set$members which returns TYPE_VEC, triggering the rewrite.

UNKNOWN Sites

None. Every .size site I examined in self-hosted/std/tools production code resolves to one of: (a) Vec<T>-typed binding, (b) user struct with literal size field, (c) String, or (d) one of the already-patched ptype intrinsic returns.

Dangerous Function Inventory

These functions either return : Int at the signature level, or are registered as TYPE_INT builtins, but at runtime their return value is a Vec/Map/Set/string container handle. Any .size access on a variable bound from one of these is a latent bug unless explicit type-annotation rescue is applied.

Builtins registered as `TYPE_INT` that should be collection types

self-hosted/middle/typecheck_builtins.qz:

| Line | Name | Actual return | Status |
| --- | --- | --- | --- |
| 130 | `str_split` | `Vec<String>` | FIXED via ptype override (`typecheck.qz:274`) |
| 157 | `str_chars` | `Vec<Int>` | PARTIALLY FIXED: ptype override at `typecheck.qz:275` declares it as `Vec<String>`, which is wrong: `str_chars` returns codepoints (`Vec<Int>` per `std/string.qz:207` and `docs/INTRINSICS.md:138`). Cosmetic today (sizes match), but an indexer of `str_chars(s)[i]` gets the wrong element type. |
| 164 | `str_bytes` | `Vec<Int>` | UNFIXED: no ptype override. Currently only used in `unicode_byte_spec.qz` via `vec_size()`, so no live bug, but `parts.size` on a `str_bytes` result would break. |
| 175 | `String$split` | `Vec<String>` | FIXED |
| 188 | `String$size` | Int (but `String$chars` at 196 returns handle) | - |
| 196 | `String$chars` | `Vec<Int>` | PARTIAL: `typecheck.qz:277` declares `Vec<String>`. Same issue as `str_chars`. |
| 204 | `String$bytes` | `Vec<Int>` | UNFIXED |
| 487 | `set_members` | `Vec<T>` | UNFIXED: TC registers `TYPE_INT`, `Set$members` (line 570) correctly registers `TYPE_VEC`. Non-UFCS callers break. |
| 528 | `Vec$get` | element `T` | registered as `TYPE_INT`; not a `.size` bug but type for downstream ops |
| 530 | `Vec$slice` | `Vec<T>` | UNFIXED: UFCS `v.slice(a, b).size` would break |
| 871 | `file_bytes` | `Vec<Int>` | UNFIXED |
| 872 | `file_chars` | `Vec<Int>` | UNFIXED |
| 873 | `file_lines` | `Vec<String>` | UNFIXED |
| 887 | `enumerate` | `Vec<Pair>` | UNFIXED |
| 889 | `group_by` | `Map` | UNFIXED |
| 890 | `partition` | `Pair<Vec, Vec>` | UNFIXED: result index returns `Vec` but typed as `Int` |
| 903 | `readdir` | `Vec<String>` | UNFIXED: causes `tools/doc.qz` bugs |
| 909 | `zip` | `Vec<Pair>` | UNFIXED |
| 348 | `vec_save` / `strvec_save` | status `Int` (not a bug) | - |

User-level functions returning `: Int` that are actually handles

std/toml/value.qz:

| Line | Name | Actual return |
| --- | --- | --- |
| 88 | `toml_table_keys` | `Vec<String>` |
| 129 | `toml_as_array` | `Vec<TomlValue>` |
| 137 | `toml_as_table` | `Map` |

All callers in std/toml/*.qz and tools/lint.qz use vec_size() explicitly — so no live bugs, but these are landmines for any future caller who writes arr.size.

Functions that return `Int` for node-handle reasons (NOT collection handles)

These are SAFE because callers treat them as opaque AstNodeIds:

ast_get_left, ast_get_right, ast_get_extra, ast_get_int_val — return AstNodeId (child handles or flags), never .size-accessed.

Recommended Fix Order

Top-priority (actual live bugs):

tools/doc.qz:76, tools/doc.qz:87 — two readdir(...).size usages. Swap for vec_size().

Near-term (patch the TC ptype table to close the entire class):

1. Extend typecheck.qz:273–277 ptype overrides to cover every intrinsic that returns a container but is registered as TYPE_INT. Specifically:
   - str_bytes, String$bytes → Vec<Int>
   - Correct str_chars, String$chars from Vec<String> to Vec<Int>
   - set_members → Vec<Int> (or whatever Set’s T is; need ptype with type var)
   - readdir → Vec<String>
   - file_bytes → Vec<Int>, file_chars → Vec<Int>, file_lines → Vec<String>
   - enumerate, zip → Vec<Vec<Int>> (approximation for pair-of-ints)
   - partition → Vec<Vec> (pair of vecs)
   - group_by → Map
   - Vec$slice → Vec<T> (needs generic propagation; may require another mechanism)
2. Update mir_intrinsic_return_type in mir.qz:2396 to mirror the TC ptype table so that any MIR-level rescue path (NODE_LET binding annotation) stays in sync. Today the two lists diverge: TC has str_split and str_chars, MIR has a superset (dir_list, etc.). This divergence is a bug magnet; either derive both from one registry, or delete MIR’s redundant list once the TC path is authoritative.
3. Fix spec files that use .size on collection-stub results (9 sites in collection_stubs_spec.qz and set_ufcs_spec.qz). These are BEST fixed by the TC ptype additions above; the specs then pass without rewrites.
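One way to picture the "derive both from one registry" option is the Python sketch below. It is a hypothetical model rather than a proposal for the actual Quartz data structures: the table name `CONTAINER_BUILTINS` and the helper `diverging_names` are invented for illustration, and the entries are a subset of the intrinsics named in this audit.

```python
# Hypothetical sketch: one table, two derived views. Not Quartz compiler code.
from typing import Iterable, List, Optional

# Single source of truth: intrinsic name -> container annotation.
CONTAINER_BUILTINS = {
    "str_split": "Vec<String>",
    "str_chars": "Vec<Int>",
    "str_bytes": "Vec<Int>",
    "readdir": "Vec<String>",
    "file_lines": "Vec<String>",
}


def tc_ptype_override(name: str) -> Optional[str]:
    """TC view: the ptype to use instead of raw TYPE_INT, if any."""
    return CONTAINER_BUILTINS.get(name)


def mir_intrinsic_return_type(name: str) -> str:
    """MIR view: same table, with "" meaning 'no annotation'."""
    return CONTAINER_BUILTINS.get(name, "")


def diverging_names(tc_names: Iterable[str], mir_names: Iterable[str]) -> List[str]:
    """Names present in exactly one of two hand-maintained lists."""
    return sorted(set(tc_names) ^ set(mir_names))


# With one table, both views always agree.
print(tc_ptype_override("readdir"), mir_intrinsic_return_type("readdir"))

# With two hand-maintained lists (as today), drift can at least be detected.
print(diverging_names({"str_split", "str_chars"},
                      {"str_split", "str_chars", "dir_list"}))  # ['dir_list']
```

Until the lists are unified, a check like `diverging_names` could run as a guard test so that divergence is reported instead of discovered through a miscompiled `.size`.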

Long-Term / World-Class Fix (Directive #2)

The whole “two-track” type tracking (TC ptypes + MIR intrinsic rescue) is a symptom of builtins being second-class citizens in the type system. There is no world-class language where the typechecker knows less about its builtins than about user functions. The fix is to promote builtins to the same status as user functions: every intrinsic gets a full annotation string (not just a kind), stored in tc.registry.func_return_annotations with the builtin’s name.

Concrete plan:

Phase 1: Unify builtin and user-function return metadata (~1 day quartz-time)

1. Extend tc_register_builtin to accept an optional annotation string. Default "" for scalar builtins.
2. At init, populate the annotation alongside the type kind for every collection-returning builtin. Use the comments already next to each registration (“# Fn(String) -> Vec<String>”) as the source of truth.
3. Make tc_lookup_function_return_annotation check the builtin table as a fallback. With this one change, tc_infer_expr_type_annotation’s NODE_CALL path automatically starts working for var x = readdir(p) without requiring the caller to add an explicit annotation.
4. Delete mir_intrinsic_return_type: it exists only because the TC couldn’t answer the question; once TC can answer, MIR reads the annotation via the normal call-callee lookup path.

This alone closes 90% of the .size bug class: any binding from a builtin now gets an annotation, and the .size rewrite at tc_expr_field_access:945–956 fires correctly.
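The Phase 1 steps might look roughly like the following Python model. This is a sketch under assumptions, not the Quartz implementation: the `Registry` class, the `rewrite_size_access` helper, and the scalar builtin `str_len` are hypothetical, and only the Vec rewrite to `vec_size(...)` is modeled because that is the rewrite this audit documents.

```python
# Illustrative Python model of the Phase 1 plan; not Quartz compiler code.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Registry:
    func_return_annotations: Dict[str, str] = field(default_factory=dict)  # user fns
    builtin_kinds: Dict[str, str] = field(default_factory=dict)
    builtin_annotations: Dict[str, str] = field(default_factory=dict)

    def register_builtin(self, name: str, kind: str, annotation: str = "") -> None:
        """Annotation is optional and defaults to "" for scalar builtins."""
        self.builtin_kinds[name] = kind
        if annotation:
            self.builtin_annotations[name] = annotation

    def lookup_function_return_annotation(self, name: str) -> str:
        """User functions first, then the builtin table as a fallback."""
        user = self.func_return_annotations.get(name, "")
        return user if user else self.builtin_annotations.get(name, "")


def rewrite_size_access(reg: Registry, var: str, callee: str) -> str:
    """Rewrite `var.size` to `vec_size(var)` when the callee returns a Vec."""
    if reg.lookup_function_return_annotation(callee).startswith("Vec<"):
        return f"vec_size({var})"
    return f"{var}.size"  # unresolved: left as a plain field access


reg = Registry()
reg.register_builtin("readdir", "TYPE_INT", "Vec<String>")
reg.register_builtin("str_len", "TYPE_INT")  # hypothetical scalar builtin

print(rewrite_size_access(reg, "entries", "readdir"))  # vec_size(entries)
print(rewrite_size_access(reg, "n", "str_len"))        # n.size
```

Keeping the kind and the annotation in the same registration call is the point of the design: the two can no longer be updated independently.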

Phase 2: Make `.size` on `TYPE_INT` a hard error, not a silent fall-through (~0.5 day)

The real design flaw exposed by this audit is that tc_expr_field_access returns TYPE_INT as a “maybe a size field, maybe not, who knows” fallback when it can’t resolve a struct. That is a silent-compromise path. The world-class behavior:

1. After the builtin-type rewrite block fails, after the annotation fallback fails, after the struct-name resolution fails, emit a typecheck error (QZ0603 equivalent) saying: cannot determine the type of '<receiver>' for field access '.size'. If this is meant to be a Vec/Map/Set/String, add an explicit type annotation. Then return TYPE_ERROR.
2. Delete the MIR fallback paths at mir_lower.qz:1655–1669; with TC erroring, MIR will never see an unresolved .size on a non-struct base.
3. The QZ0603 warning currently emitted by MIR becomes an unreachable branch; delete it.

This enforces “no silent compromises” (Directive #4). The failure mode moves from “program runs, produces wrong answer” to “compiler rejects program, tells user exactly what to do.” That is the only acceptable state for a systems language.
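A minimal Python sketch of that control flow follows. It models the order of the checks and the proposed error text only; `TypecheckError`, `check_field_access`, and the string return values are hypothetical names chosen for illustration, not part of the Quartz typechecker.

```python
# Illustrative Python model of the Phase 2 behavior; not Quartz compiler code.


class TypecheckError(Exception):
    """Stands in for a QZ0603-equivalent diagnostic."""


CONTAINER_KINDS = {"TYPE_VEC", "TYPE_STRING", "TYPE_MAP", "TYPE_SET"}


def check_field_access(receiver: str, field: str, kind: str,
                       annotation: str = "", struct_name: str = "") -> str:
    """Try each resolution path in order; reject instead of guessing."""
    if kind in CONTAINER_KINDS:
        return "builtin-rewrite"      # builtin-type rewrite block
    if annotation:
        return "annotation-rewrite"   # annotation fallback
    if struct_name:
        return "struct-field"         # struct-name resolution
    raise TypecheckError(
        f"cannot determine the type of '{receiver}' for field access "
        f"'.{field}'. If this is meant to be a Vec/Map/Set/String, "
        "add an explicit type annotation."
    )


print(check_field_access("v", "size", "TYPE_VEC"))
print(check_field_access("entries", "size", "TYPE_INT", annotation="Vec<String>"))
try:
    check_field_access("entries", "size", "TYPE_INT")
except TypecheckError as err:
    print("error:", err)
```

Because the last branch raises rather than returning a default, there is no value left for a later stage to misinterpret.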

Phase 3: Generic builtin annotations with type-param substitution (~1–2 days)

Cases like Vec$slice<T> or partition<T> need their annotations to propagate the caller’s T. This is the same machinery tc_infer_expr_type_annotation already uses for user generic functions (tc_infer_type_param_mapping at line 156). Extend that machinery to builtins by giving them the same type-param metadata slot. Once done:

- v: Vec<Point>; v.slice(0, 5).size: slice’s return annotation Vec<T> substitutes T := Point from v’s annotation, rewrites to vec_size(...).
- Same machinery handles enumerate(v).size, zip(a, b).size, etc.
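The substitution step can be sketched in Python on annotation strings. This is an illustrative model only: `split_generic`, `unify`, and `substitute` are hypothetical helpers, not the behavior of tc_infer_type_param_mapping, and the parser assumes well-formed annotations of the form `Name<Arg, ...>`.

```python
# Illustrative Python model of type-param substitution; not compiler code.
from typing import Dict, List, Set, Tuple


def split_generic(ann: str) -> Tuple[str, List[str]]:
    """'Map<K, Vec<V>>' -> ('Map', ['K', 'Vec<V>']); 'Int' -> ('Int', [])."""
    ann = ann.strip()
    if "<" not in ann:
        return ann, []
    head, rest = ann.split("<", 1)
    inner = rest[:-1]  # drop the matching trailing '>'
    args, depth, cur = [], 0, ""
    for ch in inner:
        if ch == "," and depth == 0:
            args.append(cur.strip())
            cur = ""
            continue
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        cur += ch
    args.append(cur.strip())
    return head.strip(), args


def unify(pattern: str, concrete: str, params: Set[str], out: Dict[str, str]) -> None:
    """Bind type params in `pattern` by matching it against `concrete`."""
    if pattern in params:
        out[pattern] = concrete
        return
    p_head, p_args = split_generic(pattern)
    c_head, c_args = split_generic(concrete)
    if p_head != c_head or len(p_args) != len(c_args):
        raise ValueError(f"cannot match {pattern!r} against {concrete!r}")
    for p, c in zip(p_args, c_args):
        unify(p, c, params, out)


def substitute(ann: str, mapping: Dict[str, str]) -> str:
    """Replace bound type params inside an annotation string."""
    head, args = split_generic(ann)
    if not args:
        return mapping.get(head, head)
    return f"{head}<{', '.join(substitute(a, mapping) for a in args)}>"


# Receiver `v: Vec<Point>` matched against slice's receiver pattern `Vec<T>`.
mapping: Dict[str, str] = {}
unify("Vec<T>", "Vec<Point>", {"T"}, mapping)
print(mapping)                        # {'T': 'Point'}
print(substitute("Vec<T>", mapping))  # Vec<Point>
```

Once the return annotation resolves to `Vec<Point>`, the existing `.size` rewrite can treat the call result like any other annotated Vec binding.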

Phase 4: Parity audit across the entire intrinsic surface (~0.5 day)

Walk every tc_register_builtin(...TYPE_INT) site and verify: is the actual return value an Int or a handle masquerading as an Int? Where it’s a handle, either (a) use the new generic annotation mechanism from Phase 3, or (b) promote to a ptype. No exceptions, no “this one is fine, skip it” — parity is cheap to maintain, expensive to recover after it slips.
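Part of this walk could be mechanized with a small script. The Python sketch below is a heuristic under a loud assumption: the exact shape of a registration line is not shown in this audit, so the `tc_register_builtin(tc, "name", TYPE_X)  # Fn(...) -> Ret` form used here is hypothetical, as are the sample lines and the `str_len` builtin. Only the trailing signature comment style comes from the source.

```python
# Heuristic audit sketch; the registration line format is an assumption.
import re
from typing import Iterable, List, Tuple

REGISTRATION = re.compile(
    r'tc_register_builtin\([^"]*"(?P<name>[^"]+)"[^)]*\b(?P<kind>TYPE_[A-Z]+)\)'
    r'\s*#\s*Fn\(.*\)\s*->\s*(?P<ret>.+)$'
)
HANDLE_PREFIXES = ("Vec", "Map", "Set")  # prefix match only; a rough filter


def find_masquerading_handles(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (name, documented return) for TYPE_INT builtins whose comment
    says they return a container."""
    suspects = []
    for line in lines:
        m = REGISTRATION.search(line.strip())
        if m is None:
            continue
        ret = m.group("ret").strip()
        if m.group("kind") == "TYPE_INT" and ret.startswith(HANDLE_PREFIXES):
            suspects.append((m.group("name"), ret))
    return suspects


# Hypothetical sample lines in the assumed format.
sample = [
    'tc_register_builtin(tc, "readdir", TYPE_INT)  # Fn(String) -> Vec<String>',
    'tc_register_builtin(tc, "str_len", TYPE_INT)  # Fn(String) -> Int',
    'tc_register_builtin(tc, "Set$members", TYPE_VEC)  # Fn(Set) -> Vec<Int>',
]
print(find_masquerading_handles(sample))  # [('readdir', 'Vec<String>')]
```

A script like this only narrows the search; registrations without a signature comment still need the manual check described above.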

What this buys

- .size bugs become impossible by construction. You cannot reach the offset-0 fallback because you cannot reach the MIR without the TC having either rewritten the call or emitted an error.
- Every tool/editor/LSP gets better type inference for free: tc_infer_expr_type_annotation now works on builtin-call sites, enabling hover-type display, autocomplete, and cross-module UFCS resolution for collection-returning intrinsics.
- The “two lists you must keep in sync” anti-pattern disappears. One table, one source of truth.
- Fixes a class of adjacent bugs: .push, .get, index-type inference, UFCS dispatch on builtin-call results. All routed through the same annotation channel.

This is the harder path. Per Directive #1, we take it.

Commit Sequencing Recommendation

Per Directive #8 (binary discipline):

1. Apply the tactical 2-line fix to tools/doc.qz (lines 76, 87) first and commit. This is fast, low-risk, and clears the only confirmed live production bug. No compiler rebuild needed.
2. Apply the TC ptype additions (readdir, file_*, enumerate, zip, partition, group_by, set_members, str_bytes, String$bytes, and the str_chars / String$chars → Vec<Int> correction) as a single commit with quake guard + smoke tests. This is the interim patch covering current known intrinsics.
3. Plan Phases 1–4 as a multi-session roadmap item (project_builtin_annotation_unification.md) and execute them in sequence.
