Quartz v5.25

Result.unwrap() Returns 0 — Root Cause and Fix Plan

Status: Investigation complete. Root cause identified. Fix plan ready.
Affects: 2 tests in spec/qspec/option_narrowing_spec.qz (narrow_result_ok, narrow_result_err_else).
Scope: Not a layout bug. It is a codegen/tail-call correctness bug with a latent escape-analysis gap. See “Root cause” below.


1. Problem

def narrow_result_ok(): Int
  r = Result::Ok(55)
  if r is Ok
    return r.unwrap()
  end
  return -1
end

Expected return: 55. Actual return: 0.

The sibling test using Option works:

def narrow_some_unwrap(): Int
  opt = Option::Some(42)
  if opt is Some
    return opt.unwrap()   # OR opt! — both work, returns 42
  end
  return -1
end

Reproduced locally: 5/7 tests pass, 2 Result tests fail with “Expected 55, got 0” / “Expected 77, got 0”.


2. Investigation findings

2.1 Layouts are identical, not the bug

Both Option<T> and Result<T, E> use the same two-word layout:

[tag: i64 @ offset 0, payload: i64 @ offset 1]
  • Option::Some(v) → [0, v], Option::None → [1, 0]
  • Result::Ok(v) → [0, v], Result::Err(e) → [1, e]
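As a rough illustration of the shared layout, the following Python sketch models each value as a two-element list of words. The helper names (option_some, result_ok, unwrap_first_variant) are hypothetical and exist only for this example; they are not Quartz internals.

```python
# Toy model of the two-word layout: word 0 is the tag, word 1 is the payload.
# All names here are illustrative assumptions, not compiler identifiers.
TAG_OFFSET = 0
PAYLOAD_OFFSET = 1

def option_some(v):
    return [0, v]   # Some = tag 0

def option_none():
    return [1, 0]   # None = tag 1, payload unused

def result_ok(v):
    return [0, v]   # Ok = tag 0

def result_err(e):
    return [1, e]   # Err = tag 1

def unwrap_first_variant(words):
    """Return the payload when the tag is 0 (Some / Ok), otherwise raise."""
    if words[TAG_OFFSET] != 0:
        raise RuntimeError("unwrap on None/Err")
    return words[PAYLOAD_OFFSET]

print(unwrap_first_variant(result_ok(55)))    # 55
print(option_some(42) == result_ok(42))       # True: same shape, same words
```

Under this model a single unwrap routine serves both types, which is consistent with the finding that the layout cannot be what distinguishes the failing Result tests from the passing Option tests.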

Confirmed by:

  • self-hosted/backend/mir.qz:3297-3301 — built-in Option/Result both always have payload (mir_enum_has_payload returns 1).
  • self-hosted/backend/mir.qz:3335-3346 — variant tag maps: Option Some=0/None=1, Result Ok=0/Err=1. The Quartz hint “tag check has off-by-one” is wrong — they’re the same.
  • self-hosted/backend/mir_lower_expr_handlers.qz:318 — explicit comment: # Result layout: [tag:i64@0, payload:i64@1] Ok=tag 0, Err=tag 1.

The narrowing LLVM IR (generated from narrow_result_ok) correctly constructs the Result on the stack:

%alloc_1.p = alloca [2 x i64], align 8
...
store i64 0,   ptr %sptr_2               ; tag = 0 (Ok) at offset 0
store i64 55,  ptr %sgep_4               ; payload = 55 at offset 1
store i64 %v1, ptr %r                    ; %r = pointer to the alloca

The narrowing tag check (if r is Ok) then loads offset 0, compares to 0, and branches to then1. All good so far.

2.2 Option.unwrap() and Result.unwrap() take different codegen paths

self-hosted/middle/typecheck_expr_handlers.qz rewrites UFCS method calls per-type:

  • opt.unwrap() where opt: Option → func renamed to "unwrap" (line 1893).
  • r.unwrap() where r: Result → func renamed to "unwrap_ok" (line 1908).

"unwrap" is a handled intrinsic, "unwrap_ok" is not.

  • self-hosted/backend/cg_intrinsic_system.qz:441-484 — the unwrap intrinsic handler emits inline LLVM IR: inttoptr, load tag, compare to 0, branch to panic or ok, then load payload at offset 1, produce result register. No function call. This is what opt.unwrap() compiles to — and that’s why it works.

  • "unwrap_ok" is registered as a UFCS name in self-hosted/backend/intrinsic_registry.qz:628 (as Result$unwrap) but the UFCS category has no real handler; the name unwrap_ok falls through to the user-land library definition in std/prelude.qz:97-102:

    def unwrap_ok<T, E>(res: Result<T, E>): T
      match res
        Result::Ok(v) => v
        Result::Err(e) => panic("called unwrap_ok on Err")
      end
    end

    This is a real Quartz function, monomorphized and emitted as define i64 @unwrap_ok(i64 noundef %p0) in the output IR. The match arm for Ok loads offset 0 for the tag, compares to 0, then loads offset 1 as the payload. Verified in the generated IR (lines ~2986-3055 of the spec output).

So: Option’s unwrap is inlined intrinsic LLVM; Result’s unwrap is a function call to @unwrap_ok.
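The split between the two paths can be sketched as a small lookup. This is a simplified Python model, assuming the rename table and the set of handled intrinsics look roughly as described above; the table and function names are invented for illustration.

```python
# Hypothetical model of the per-type UFCS rename plus the intrinsic lookup.
UFCS_RENAME = {
    ("Option", "unwrap"): "unwrap",
    ("Result", "unwrap"): "unwrap_ok",
}

# Per the findings above, only "unwrap" has an inline handler.
INLINE_INTRINSICS = {"unwrap"}

def lower_method_call(receiver_type, method):
    """Return how the call would be lowered: inline IR or a real call."""
    func = UFCS_RENAME[(receiver_type, method)]
    if func in INLINE_INTRINSICS:
        return ("inline", func)
    return ("call", "@" + func)

print(lower_method_call("Option", "unwrap"))   # ('inline', 'unwrap')
print(lower_method_call("Result", "unwrap"))   # ('call', '@unwrap_ok')
```

Only the second result produces a call instruction, so only it can ever be marked as a tail call.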

2.3 The offending tail call

Inspection of the generated IR for narrow_result_ok:

then1:
  %v9 = load i64, ptr %r, align 8                ; load pointer to the stack alloca
  ...
  %v11 = tail call i64 @unwrap_ok(i64 %v9)       ; <-- THE BUG
  ret i64 %v11

Contrast the passing Option test (using opt.unwrap() in a minimal case) — no call at all; the unwrap is fully inlined.

Quartz’s backend emits a tail call whenever the call’s result is the block’s return value. From self-hosted/backend/codegen_instr.qz:407-418:

# Tail call detection: call result is block's return value
var is_tail = 0
if var_idx < 0 and callee_has_narrow_ret == 0
  var blk = state.current_block
  var term_kind = blk.mir_block_get_term_kind()
  if term_kind == mir::TERM_RETURN
    var ret_val = blk.mir_block_get_term_data()
    if ret_val == dest
      is_tail = 1
    end
  end
end

Then codegen_instr.qz:433-436 emits tail call i64 @<name>(...) when is_tail == 1. There is no check for whether any argument is (or could be) a pointer into the caller’s stack.
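For readers who want to step through the rule, here is a minimal Python restatement of the quoted detection logic. The Block class and the integer register numbers are assumptions made for the example; the conditions on var_idx and callee_has_narrow_ret are copied as-is from the quoted code without interpreting them further.

```python
from dataclasses import dataclass

TERM_RETURN = "return"

@dataclass
class Block:
    term_kind: str   # kind of the block terminator
    term_data: int   # register returned by the terminator

def is_tail_call(dest, var_idx, callee_has_narrow_ret, block):
    """Mirror of the quoted rule: the call result is the block's return value."""
    if var_idx < 0 and callee_has_narrow_ret == 0:
        if block.term_kind == TERM_RETURN and block.term_data == dest:
            return True
    return False

# then1 in narrow_result_ok: %v11 = call @unwrap_ok(%v9); ret %v11
then1 = Block(term_kind=TERM_RETURN, term_data=11)
print(is_tail_call(dest=11, var_idx=-1, callee_has_narrow_ret=0, block=then1))  # True
```

Note that the function never looks at the call's arguments, which is exactly the omission described above.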

2.4 Why tail call breaks it

In LLVM, the tail marker is a semantic assertion by the frontend that the call is compatible with “pop caller frame before calling callee” tail-call optimization. Critically, one of the preconditions (LLVM LangRef, ‘call’ instruction, tail marker) is:

The callee does not access allocas or byval arguments from the caller.

Equivalently: if the callee dereferences a pointer into the caller’s stack frame that was passed through a tail-marked call, the behavior is undefined. The optimizer is free to treat the caller’s stack as dead at the call site, fold the call into a jump, and reuse or overwrite the caller’s stack slots. In our case, with the frame pointer set and a 2-word alloca living at a fixed caller-frame offset, the observed outcome is: unwrap_ok receives %p0 (the integer value of a pointer into the former caller frame), inttoptrs it, and loads offset 0. By then the backing memory has been deallocated or overwritten. The tag it reads happens to be 0, but it is no longer the tag we stored, and the payload at offset 1 is whatever now occupies that stack address. The observed result is 0.
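The mechanism can be illustrated with a toy stack simulation in Python. This is only a model under loud assumptions: the stack grows upward, frames are plain index ranges, and the zeros the callee writes into its own frame stand in for whatever a real prologue or spill would store. It does not reproduce LLVM's actual code generation.

```python
class ToyStack:
    """A flat word array with a bump-pointer 'stack pointer'."""
    def __init__(self, size=16):
        self.mem = [0] * size
        self.sp = 0

    def alloca(self, words):
        addr = self.sp
        self.sp += words
        return addr

def unwrap_ok(stack, p):
    frame_base = stack.sp
    scratch = stack.alloca(2)      # callee's own frame (assumed 2 words)
    stack.mem[scratch] = 0         # stand-in for callee spills
    stack.mem[scratch + 1] = 0
    tag, payload = stack.mem[p], stack.mem[p + 1]
    stack.sp = frame_base          # callee epilogue
    if tag != 0:
        raise RuntimeError("called unwrap_ok on Err")
    return payload

def narrow_result_ok(tail):
    stack = ToyStack()
    frame_base = stack.sp
    r = stack.alloca(2)            # Result::Ok(55) in the caller frame
    stack.mem[r], stack.mem[r + 1] = 0, 55
    if tail:
        stack.sp = frame_base      # caller frame released before the jump
    return unwrap_ok(stack, r)

print(narrow_result_ok(tail=False))  # 55
print(narrow_result_ok(tail=True))   # 0
```

In the tail variant the callee's frame lands on the same slots as the caller's alloca, so the pointer still "works" but reads the callee's own data.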

Empirical confirmation. Replacing tail call with call in the generated .ll and re-running the test fixes it:

$ sed 's/tail call/call/g' /tmp/rtest.ll > /tmp/rtest_notail.ll
$ llc /tmp/rtest_notail.ll -o /tmp/rtest_notail.s
$ clang /tmp/rtest_notail.s -o /tmp/rtest_notail_bin -lm -lpthread
$ /tmp/rtest_notail_bin
55                               # correct

vs. the same IR with tail call present prints 0.

2.5 Why Option.unwrap() doesn’t hit this

Because it’s an inline intrinsic — there’s no call at all. The bug is latent there; it would surface the moment any Option method were to go through a library definition like Result’s does (e.g., if you ever inline-desugared opt.unwrap() into a call to a Quartz-level unwrap_option helper, it would break identically).

2.6 Second-order: escape analysis doesn’t catch call-argument escapes

mir_compute_escaped_regs in self-hosted/backend/mir.qz:816-958 is the escape analyzer that decides whether a MIR_ALLOC_STACK needs to be promoted to heap. It tracks:

  • TERM_RETURN of a stack-origin register → escape (line 917, 949).
  • MIR_STORE of a stack-origin value into a non-stack pointer → escape (line 900-912).
  • MIR_STORE_VAR to a global → escape (line 857-860).
  • Transitive propagation through nested field stores (line 960+).

It does NOT check MIR_CALL arguments. A pointer to a stack alloca passed as a call argument does not cause the alloca to escape. This is consistent with the current design — the assumption is “callees don’t outlive arguments, so we don’t need to heap-promote” — but combined with the unconditional tail call emission in codegen_instr.qz, it becomes unsound. The tail-call semantics require that stack-alloca pointers cannot reach the callee at all.
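The gap can be seen in a stripped-down Python model of the analysis. The tuple-based instruction encoding is invented for this sketch, and transitive propagation through nested field stores is left out; only the three direct escape conditions listed above are modeled.

```python
def compute_escaped(instrs):
    """Return the set of stack-origin registers that escape (simplified)."""
    stack_origin = set()
    escaped = set()
    for ins in instrs:
        op = ins[0]
        if op == "alloc_stack":            # ("alloc_stack", dest)
            stack_origin.add(ins[1])
        elif op == "store":                # ("store", ptr, val)
            _, ptr, val = ins
            if val in stack_origin and ptr not in stack_origin:
                escaped.add(val)
        elif op == "store_global":         # ("store_global", val)
            if ins[1] in stack_origin:
                escaped.add(ins[1])
        elif op == "return":               # ("return", reg)
            if ins[1] in stack_origin:
                escaped.add(ins[1])
        # ("call", dest, name, args): arguments are not inspected.
    return escaped

narrow_result_ok = [
    ("alloc_stack", "r"),
    ("call", "v11", "unwrap_ok", ["r"]),
    ("return", "v11"),
]
print(compute_escaped(narrow_result_ok))                       # set()
print(compute_escaped([("alloc_stack", "r"), ("return", "r")]))  # {'r'}
```

The first program passes the stack pointer to a callee and reports no escape, which is fine for ordinary calls but leaves the tail-marked call unprotected.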


3. External research

3.1 LLVM tail call semantics

From the LLVM Language Reference, 'call' Instruction section (all modern LLVM versions, including 17, 20, 23-git): the tail marker permits sibling call optimization (jump, not call), and the frontend attests that the call satisfies rules including:

  • Caller and callee have compatible prototypes (or at least compatible return types).
  • No variables in the call reference allocas or byvals in the caller.
  • With tail, the optimization is permitted but not required; with musttail, it is mandatory.

The relevant wording on alloca/byval safety is quoted in numerous LLVM docs and LLVM-dev threads — passing a caller-alloca pointer through a tail call is undefined behavior. The inalloca attribute exists precisely because passing stack arguments to calls needs a different, explicit mechanism with defined stack lifetime (LLVM InAlloca docs — “When the call site is reached, the argument allocation must have been the most recent stack allocation that is still live, or the behavior is undefined.”).

Difference between tail and musttail:

  • tail — advisory. Optimizer may perform sibling-call optimization if safe. Frontend is still responsible for asserting it’s safe (i.e., no caller-alloca/byval pointers among args, caller ABI compatible, etc.). If the frontend lies, UB.
  • musttail — mandatory. Optimizer must perform sibcall, and if it can’t (ABI mismatch, needed cleanup, etc.) it’s a hard error. Strict rules on calling convention and argument compatibility.


3.2 Result<T,E> runtime layout in production languages

The hint in the task mentions “Is it just a tagged union with the same shape as Option, or does it have different layout constraints?” — the answer across all major languages is: Result is just a tagged union with the same shape as any other binary sum; Option and Result do NOT have fundamentally different layout constraints. There’s nothing special about Result that would explain a layout-specific bug.

  • Rust. Result<T, E> is a regular enum with two variants, stored as discriminant + largest variant payload. There’s no special-case. Rust does apply niche optimization — using invalid bit patterns as implicit discriminants — which can make Result<&T, ()> the size of &T, etc. But niche optimization is a size/performance opt, not a layout change that would put payload at a different offset than for Option<T>. Rust Reference – type layout, Niche optimizations in Rust – 0xAtticus, Niche optimization in Rust (Medium).

  • Haskell (GHC). Either a b (Haskell’s Result) is a boxed algebraic data type. Each constructor (Left, Right) is allocated on the heap as a closure with an info-table header (containing the constructor tag) followed by pointer fields for the payload. Same shape as Maybe’s Just. No layout distinction between Either and any other 2-constructor ADT. GHC heap objects wiki, GHC.Runtime.Heap.Inspect.

  • OCaml. Variants with zero-argument constructors are unboxed integers starting at 0; variants with arguments are heap blocks with a 1-byte tag in the header (per-constructor) and the payload as block fields. result (defined in stdlib as type ('a, 'b) result = Ok of 'a | Error of 'b) follows the standard rules — both Ok and Error are boxed, tagged blocks with one payload word. Identical shape to option’s Some. OCaml docs – memory representation of values, Real World OCaml – runtime memory layout.

  • Swift. Result<Success, Failure> is just an enum with two associated-value cases. Laid out with the same discriminant + associated-value rules as any other 2-case enum. Nothing special.

Takeaway for the fix. Don’t touch the layout. The layout is correct and matches what every major language does. The bug is in the tail call emission path, which passes a stack pointer across a call whose ABI assumes no caller-alloca pointers. Any fix that “reshuffles” Result’s layout would be wrong.


4. Root cause hypothesis (with confidence)

Primary cause (very high confidence, empirically confirmed):

narrow_result_ok builds Result::Ok(55) on the caller’s stack as alloca [2 x i64]. The method call r.unwrap() is rewritten by typecheck to unwrap_ok(r). unwrap_ok is not an inline intrinsic — it’s a Quartz library function (std/prelude.qz:97) — so the backend emits a normal LLVM call. Because the call result is the block’s return value, the backend marks the call tail. Passing the pointer %r (which is a caller-alloca address) to a tail-marked LLVM call is undefined behavior per LLVM LangRef; LLVM’s sibling-call optimizer pops the caller frame (or otherwise invalidates the alloca) before unwrap_ok executes, so unwrap_ok reads garbage / overwritten memory and returns 0.

Proof: sed-replacing tail call with call in the generated LLVM IR immediately produces the correct result (55 / 77). No other change required.

Not the cause (things I checked and ruled out):

  • Result layout is not different from Option’s. Both are [tag@0, payload@1]. Confirmed in mir.qz and mir_lower_expr_handlers.qz.
  • Tag values are not swapped. Ok=0, Err=1, Some=0, None=1. Confirmed.
  • Result$unwrap does not read the wrong offset. The @unwrap_ok body correctly loads offset 0 for tag, offset 1 for payload.
  • There is no extra field for Err’s type. The layout is 2 words regardless of variant.

Secondary issue (medium confidence, latent, should be filed):

The backend’s tail-call detector in codegen_instr.qz:407-418 does not verify that the call’s arguments are free of stack-alloca pointers before emitting tail. This is a general correctness bug in the backend; it is currently masked for almost every other call site because:

  1. Most call args are not local allocas but heap-promoted things (Vec, Map, String, etc., all malloc’d).
  2. Most calls to library helpers that take Option or Result go through intrinsics that are inlined, not real calls.
  3. The ones that do take stack-alloca pointers typically do not occur in tail position.

Result.unwrap() is the unlucky intersection: Result is stack-allocated (small @value-sized type), the method rewrites to a real library call, and the call is in tail position (return r.unwrap()).


5. Fix plan

There are three candidate fixes, ordered from “narrowest correct” to “broadest correct.” Choose Option C (A and B together; harder-but-right per Prime Directives 1, 2, 3): B is the directly visible fix, and A is the cheap belt-and-suspenders that closes the underlying gap.

Option A — Stop emitting unsafe tail call (minimal correctness fix)

Change: In codegen_instr.qz tail-call detection, refuse to emit tail if any of the call’s arguments is a register whose origin is (or is derived from) a MIR_ALLOC_STACK in the current function.

Where:

  • File: self-hosted/backend/codegen_instr.qz
  • Function: the call-emission routine containing lines ~407-442 (the block # Tail call detection through # Emit call).
  • Logic: compute the escaped-origin bitmap via mir_compute_escaped_regs (already available on MirFunc) OR walk the call’s args and check reg_origin for any stack alloca origin. If any arg is stack-origin, force is_tail = 0.

Concretely:

# After computing is_tail per the existing rule, veto it if any arg
# has a current-function stack-alloca origin.
if is_tail == 1
  for i in 0..arg_count
    if state.reg_has_stack_alloca_origin(args[i]) == 1
      is_tail = 0
      break
    end
  end
end

where reg_has_stack_alloca_origin consults the same origin-tracking data already used by mir_compute_escaped_regs (extract it into a reusable mir_reg_origin_table(func): Vec<Int> helper and pass it down to codegen, or precompute once per function).
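A minimal Python sketch of the veto, assuming an origin table built by a single forward pass. The instruction encoding and the "copy" opcode (standing in for the load of %r into %v9) are simplifications invented for this example; the real table would come from the data mir_compute_escaped_regs already tracks.

```python
def stack_origin_regs(instrs):
    """Collect registers that hold, or are derived from, a stack alloca."""
    origin = set()
    for ins in instrs:
        if ins[0] == "alloc_stack":                  # ("alloc_stack", dest)
            origin.add(ins[1])
        elif ins[0] == "copy" and ins[2] in origin:  # ("copy", dest, src)
            origin.add(ins[1])
    return origin

def veto_tail(is_tail, args, origin):
    """Clear the tail flag when any argument has a stack-alloca origin."""
    if is_tail and any(a in origin for a in args):
        return False
    return is_tail

body = [("alloc_stack", "alloc_1"), ("copy", "v9", "alloc_1")]
origin = stack_origin_regs(body)
print(veto_tail(True, ["v9"], origin))   # False: the argument aliases the alloca
print(veto_tail(True, ["v3"], origin))   # True: unrelated argument, tail is kept
```

The check is conservative: it gives up the tail marker whenever an argument might point into the current frame, trading a possible missed optimization for soundness.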

Pros: Fixes the immediate bug. Fixes the whole class of future bugs where stack-alloca pointers leak through tail calls, not just Result.

Cons: Does nothing about Result.unwrap() being slower than Option.unwrap(): it still goes through a function call instead of being inlined. Leaves an asymmetry that will bite us again (Result is a first-class type and should feel as fast as Option).

Estimated effort: 0.5 quartz-days (2 hours).

Option B — Inline unwrap_ok / unwrap_err / unwrap_or_ok as intrinsics

Change: Give Result’s unwrap the same treatment Option’s unwrap gets: register unwrap_ok, unwrap_err, unwrap_or_ok as inline intrinsics in cg_intrinsic_system.qz, emitting inline LLVM IR that panics on the wrong tag and loads the payload from offset 1. unwrap_err uses the same layout and the same offset; only the tag check and the panic message differ.

Where:

  • File: self-hosted/backend/cg_intrinsic_system.qz
    • Current name == "unwrap" handler at line 441. Add parallel handlers:
      • name == "unwrap_ok" — tag check == 0 (Ok), panic on Err, load offset 1.
      • name == "unwrap_err" — tag check == 1 (Err), panic on Ok, load offset 1.
      • name == "unwrap_or_ok" — tag check == 0, branchy load: payload or fallback.
  • File: self-hosted/backend/intrinsic_registry.qz — add _r("unwrap_ok", INTRINSIC_CAT_SYSTEM), _r("unwrap_err", INTRINSIC_CAT_SYSTEM), _r("unwrap_or_ok", INTRINSIC_CAT_SYSTEM) alongside the existing _r("unwrap", INTRINSIC_CAT_SYSTEM) at line 536.
  • File: self-hosted/middle/typecheck_builtins.qz — register tc_register_builtin(tc, "unwrap_ok", ...), "unwrap_err", "unwrap_or_ok", "unwrap_or" (also missing) alongside the existing unwrap at line 660. Without this, the intrinsic won’t resolve at typecheck.
  • File: std/prelude.qz. Delete def unwrap_ok, def unwrap_err, def unwrap_or_ok since they’re now covered by intrinsics. Per Prime Directive 7 (no compat layers), delete outright in the same commit.
  • File: self-hosted/backend/cg_intrinsic_system.qz — adjust panic message string pool: the existing unwrap handler emits “panic with backtrace + abort”; match that exactly for unwrap_ok/unwrap_err, reusing the same string constants (@.str.1, etc.) OR emit new ones with specific messages (“called unwrap_ok on Err”).
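To make the intended shape of the inline sequence concrete, here is a Python sketch that prints one possible IR fragment for unwrap_ok and unwrap_err. The register and label names (%uwN.*) are invented, and the panic block is reduced to a bare abort; the real handler would reuse whatever backtrace and message emission the existing unwrap handler performs.

```python
def emit_unwrap_inline(name, arg_reg, n):
    """Emit an inline tag check and payload load as LLVM IR text (sketch)."""
    expected_tag = {"unwrap_ok": 0, "unwrap_err": 1}[name]
    p = f"uw{n}"  # hypothetical prefix for fresh names
    return "\n".join([
        f"  %{p}.ptr = inttoptr i64 {arg_reg} to ptr",
        f"  %{p}.tag = load i64, ptr %{p}.ptr, align 8",
        f"  %{p}.ok = icmp eq i64 %{p}.tag, {expected_tag}",
        f"  br i1 %{p}.ok, label %{p}.good, label %{p}.panic",
        f"{p}.panic:",
        "  ; panic message and backtrace emission omitted in this sketch",
        "  call void @abort()",
        "  unreachable",
        f"{p}.good:",
        f"  %{p}.pp = getelementptr i64, ptr %{p}.ptr, i64 1",
        f"  %{p}.val = load i64, ptr %{p}.pp, align 8",
    ])

print(emit_unwrap_inline("unwrap_ok", "%v9", 0))
```

The two handlers differ only in the constant compared against the tag, which is why they can share a single emitter parameterized by name.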

Pros: Matches Option’s performance, with no call overhead. Symmetric. Dead-code-eliminates the std/prelude wrappers. Eliminates the tail-call bug for the case that triggered it without depending on Option A. Consistent with the “everything is an intrinsic” pattern the backend already uses for unwrap, expect, unwrap_or (see intrinsic_registry.qz:581-583).

Cons: Doesn’t fix the underlying tail-call-with-alloca-pointer bug, which stays latent elsewhere. Must also do Option A (belt and suspenders).

Estimated effort: 1 quartz-day (4 hours) — including symmetric handlers for all three methods, updating typecheck, deleting the std/prelude wrappers, and verifying fixpoint.

Option C (recommended): do A + B in a single commit. A closes the underlying correctness gap, so no latent bug of this class reappears in a year when some unrelated feature adds a new stack-allocated type and a new library call. B gets Result to parity with Option, deletes dead library code, and is the directly visible fix for the failing tests.

Order of operations in the commit:

  1. Land the intrinsic handlers + registry + typecheck registrations + delete prelude wrappers (B).
  2. Land the tail-call escape check (A).
  3. Rebuild, then run quake guard to verify fixpoint.
  4. Run option_narrowing_spec standalone — confirm 7/7.
  5. Run smoke tests (brainfuck, style_demo, expr_eval) to catch regressions.
  6. Run a targeted QSpec subset touching Result (force_unwrap_spec, any *_result* specs) to check nothing else breaks.
  7. Commit.

Estimated effort: 1.5 quartz-days (6 hours) including testing and fixpoint.

Files that will change (final list)

File | Change | Lines (approx)
--- | --- | ---
self-hosted/backend/cg_intrinsic_system.qz | Add unwrap_ok, unwrap_err, unwrap_or_ok handlers, modeled on existing unwrap at line 441. | +100
self-hosted/backend/intrinsic_registry.qz | Register the three new names at SYSTEM category (~line 536). | +3
self-hosted/middle/typecheck_builtins.qz | tc_register_builtin for the three new names (~line 660 area). | +3
std/prelude.qz | Delete unwrap_ok, unwrap_err, unwrap_or_ok (lines 97-119). | −23
self-hosted/backend/codegen_instr.qz | Add stack-alloca escape check to tail-call detector (~lines 407-418). | +10
self-hosted/backend/mir.qz | (Optional refactor, if (A) needs it) Export a reusable mir_reg_origin_table helper that codegen_instr can consume. | +15

What correct behavior should look like (test expectations)

After the fix:

  1. spec/qspec/option_narrowing_spec.qz — 7/7 tests pass. narrow_result_ok returns 55. narrow_result_err_else returns 77.
  2. The generated LLVM IR for narrow_result_ok contains no call @unwrap_ok at all: it’s inlined to a load-tag/branch/load-payload sequence just like Option. Verify with: quartz /tmp/rtest.qz | grep -c unwrap_ok → should be 0 (or nonzero only if unwrap_ok is called indirectly elsewhere).
  3. Fixpoint (gen1 == gen2 byte-identical) holds.
  4. Smoke tests pass (brainfuck, style_demo, expr_eval).
  5. As a regression guard, the existing force_unwrap_spec.qz still passes.
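The IR-level expectation in item 2 could also be checked by a small script rather than by eye. The following Python sketch is one hypothetical way to do that; check_ir and the sample strings are assumptions for illustration and are not part of the existing spec tooling.

```python
import re

def check_ir(ir_text):
    """Count calls to @unwrap_ok and tail-marked calls in LLVM IR text."""
    return {
        "unwrap_ok_calls": len(re.findall(r"\bcall\b[^\n]*@unwrap_ok\(", ir_text)),
        "tail_calls": len(re.findall(r"\btail call\b", ir_text)),
    }

before = "then1:\n  %v11 = tail call i64 @unwrap_ok(i64 %v9)\n  ret i64 %v11\n"
after = "then1:\n  %v10 = load i64, ptr %v9.pp, align 8\n  ret i64 %v10\n"

print(check_ir(before))  # {'unwrap_ok_calls': 1, 'tail_calls': 1}
print(check_ir(after))   # {'unwrap_ok_calls': 0, 'tail_calls': 0}
```

A zero count for unwrap_ok_calls on the narrow_result_ok function is the signal that Option B took effect; the tail_calls count is informational, since safe tail calls elsewhere remain legitimate.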

6. Quartz-time estimate

  • Option C (full fix): 1.5 quartz-days ≈ 6 hours.
    • Intrinsic handlers + registration + typecheck registration: 2 hours.
    • Delete prelude wrappers, verify nothing else references them: 30 minutes.
    • Tail-call escape check with origin-table plumbing: 1 hour.
    • quake guard + fixpoint + smoke tests + targeted QSpec subset: 1.5 hours.
    • Buffer for unanticipated issues (likely: typecheck arity mismatch, monomorphization interaction): 1 hour.

If only Option A is done: ~2 hours.


7. Risk assessment

Low risk. This is a targeted intrinsic addition plus a small backend correctness fix.

  • Binary discipline risk: Must run quake guard before committing. Compiler source changes, so fixpoint must be re-verified. Take a fix-specific backup at self-hosted/bin/backups/quartz-pre-result-unwrap-golden before starting, per Rule 1.
  • Fixpoint risk: Inlining unwrap_ok changes the emitted IR for every Result.unwrap() call site in the compiler itself. The compiler does use Result.unwrap() internally (Result is used for resolver, typecheck, etc.), so the gen1 and gen2 IR will differ from the pre-fix baseline but must be internally consistent. A clean fixpoint should still hold because the new inline code is deterministic and the same inputs produce the same IR. If fixpoint fails, the most likely cause is interaction with @value escape analysis on the poll-callee registry, which is unlikely for this change but worth monitoring.
  • Test regression risk: Low. Result.unwrap() is a hot path, but the change preserves semantics: both versions return the same value on valid input, and the only difference is that the library version was returning garbage on stack-allocated Results. All call sites that work today continue to work.
  • Typecheck / monomorphization interaction: The existing Result$unwrap builtin (line 677 of typecheck_builtins.qz) is registered as TYPE_INT return. The new unwrap_ok intrinsic should match that signature. If typecheck currently infers a polymorphic return from the generic unwrap_ok<T,E> in prelude, deleting that definition might change inference. Must check that the intrinsic-return path correctly propagates the T type to the caller (it should — the unwrap intrinsic already does this for Option).
  • Error messaging risk: Today, unwrapping an Err calls panic("called unwrap_ok on Err") via the library function, which routes through Quartz’s panic path. The intrinsic version should produce an equivalent panic message (new string constant @.str.xxx = "called unwrap_ok on Err" with same qz_print_backtrace + abort sequence as the existing unwrap intrinsic). Trivial to get right; just don’t forget it.
  • unwrap_or for Result (unwrap_or_ok) is also broken — same root cause. Since we’re fixing unwrap_ok and unwrap_err, do unwrap_or_ok in the same commit. Easy.

Single-failure mode to watch: If quake guard fails fixpoint after the change, the most likely cause is that the compiler’s own Result-heavy modules (resolver, typecheck, middle/*) compile differently in gen1 vs gen2 because of some stale cache or escape-analysis interaction. Recovery: cp self-hosted/bin/backups/quartz-pre-result-unwrap-golden self-hosted/bin/quartz and diagnose from the working binary. Take this backup before touching code.


8. Summary (TL;DR)

The “Result$unwrap layout bug” isn’t a layout bug. Layouts are identical: [tag@0, payload@1] with Ok=0, Err=1, same as Option with Some=0, None=1.

The real bug: Option.unwrap() is an inline intrinsic (cg_intrinsic_system.qz:441), but Result.unwrap() is typecheck-rewritten to unwrap_ok() which is a normal Quartz library function in std/prelude.qz:97. The backend emits it as tail call @unwrap_ok(...). The tail marker asserts to LLVM that no argument aliases a caller alloca — Quartz violates this, because r on the caller side is a pointer to alloca [2 x i64]. LLVM is then free to pop the caller frame before the callee runs; the callee reads garbage; the test returns 0.

Empirical proof: sed 's/tail call/call/g' on the generated IR makes the test pass.

Fix: promote unwrap_ok, unwrap_err, unwrap_or_ok to inline intrinsics (matching Option’s treatment), delete the std/prelude wrappers, and independently harden the backend tail-call detector to never emit tail when any argument has a caller-alloca origin. One commit.

Effort: ~1.5 quartz-days. Risk: low. Unblocks: 2 tests immediately; plus prevents a whole class of future bugs where small stack-allocated types (Result, maybe future Tuples, small records, @value structs) get their pointers passed to tail calls and silently corrupt.