Bug 1: “Never type in let binding context” — exit 176

Spec file: spec/qspec/never_type_spec.qz
Failing test: "never type in let binding context" (line 98)
Observed: exit 176 instead of expected 42.

Verdict: the test name is misleading. The root cause is not Never-type handling. The root cause is a tail-call optimization miscompilation when a stack-allocated @value Option is passed to a non-inlined function that’s called in tail position. The Never-type arms just happen to prevent the inliner from eliding the call, which is what exposes the TCO bug.

1. Problem (as observed)

The test compiles and runs this program:

def get_or_die(opt: Int): Int
  return match opt
    Some(x) => x
    None => panic("was None")
    _ => panic("impossible")
  end
end
def main(): Int
  return get_or_die(Option::Some(42))
end

The program is expected to exit with 42, the payload of Option::Some(42), but it exits with 176. The 176 is not random: it is 0xb0, the low byte of a libc pointer (0x1f9cb80b0 = lsl::sAllocatorBuffer) that happens to occupy the freed stack slot where the payload used to live.

0x1f9cb80b0 & 0xff = 0xb0 = 176.

2. Reproduced minimal case (no Never at all)

def my_unwrap(opt: Int): Int
  var total = 0
  for i in 0..100
    total += i
  end
  for j in 0..100
    total += j * 2
  end
  return match opt
    Some(x) => x + total - total
    None => 0
    _ => 0
  end
end
def main(): Int
  return my_unwrap(Option::Some(42))
end

This exits with 0 (garbage) instead of 42. There is no panic and no Never type involved. The sequence common to both programs is:

Caller constructs a stack-allocated @value Option (alloca of [2 x i64]).
Caller passes ptrtoint %alloc to i64 as the argument to a non-inlined callee.
Caller’s final statement is return callee(...).
Quartz codegen emits tail call i64 @callee(i64 %v1).
LLVM performs tail call elimination: caller deallocates its frame, branches to callee.
Callee’s own frame overlaps the freed caller frame; the payload is overwritten before it’s read.
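
The sequence above can be imitated with a toy model. The following is a minimal Python sketch, not Quartz or LLVM behavior: it assumes a downward-growing stack of integer slots, and the frame sizes, the spill value 0xB0, and all function names are illustrative choices rather than anything taken from the real binary.

```python
# Toy model of a downward-growing stack. Every name and size is illustrative.
STACK_SLOTS = 32
stack = [0] * STACK_SLOTS
sp = STACK_SLOTS  # slots at index >= sp belong to live frames


def push_frame(n_slots):
    """Reserve n_slots below the current stack pointer and return the new sp."""
    global sp
    sp -= n_slots
    return sp


def pop_frame(n_slots):
    """Release n_slots. The memory is not cleared, only marked as free."""
    global sp
    sp += n_slots


def callee(opt_addr, frame_slots=10):
    base = push_frame(frame_slots)
    # Prologue: spill "callee-saved registers" over the whole new frame.
    for i in range(frame_slots):
        stack[base + i] = 0xB0  # arbitrary stand-in for a stale register value
    payload = stack[opt_addr + 1]  # read the Option payload through the raw address
    pop_frame(frame_slots)
    return payload


def caller(tail_call):
    base = push_frame(4)
    opt_addr = base            # plays the role of alloca [2 x i64]
    stack[opt_addr] = 0        # tag = Some
    stack[opt_addr + 1] = 42   # payload
    if tail_call:
        pop_frame(4)           # TCO: the caller's frame is released before the jump
        return callee(opt_addr)
    result = callee(opt_addr)  # ordinary call: the caller's frame stays live
    pop_frame(4)
    return result


print(caller(tail_call=False))  # 42
print(caller(tail_call=True))   # 176
```

In the ordinary-call case the callee's frame sits entirely below the caller's, so the payload survives. In the tail-call case the callee's frame is carved out of the slots the caller just released, and the prologue overwrites the payload before it is read.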

A simpler version where unwrap is small enough to inline does NOT reproduce — the inliner eliminates the call, so TCO never happens.

3. Current Quartz state

3.1 Where the `tail call` marker is emitted

self-hosted/backend/codegen_instr.qz:407-442

# Tail call detection: call result is block's return value
var is_tail = 0
if var_idx < 0 and callee_has_narrow_ret == 0
  var blk = state.current_block
  var term_kind = blk.mir_block_get_term_kind()
  if term_kind == mir::TERM_RETURN
    var ret_val = blk.mir_block_get_term_data()
    if ret_val == dest
      is_tail = 1
    end
  end
end

# Disable tail call for decomposed calls (arg count mismatch after expansion)
if callee_decompose == 1
  is_tail = 0
end

# Emit call ...
if is_tail == 1
  codegen_util::cg_emit(out, " = tail call i64 @")
else
  codegen_util::cg_emit(out, " = call i64 @")
end

The current rule: emit tail call whenever the call result is the return-value register. There is no check on whether any argument is a pointer into the caller’s stack frame. That’s the bug.

3.2 Where stack-allocated Options are constructed

@value enum constructors (Option::Some, etc.) allocate [N x i64] via alloca in the caller, then ptrtoint the pointer to i64 and pass it as a plain i64 argument. The generated IR looks like:

%alloc_1.p = alloca [2 x i64], align 8
%alloc_1 = bitcast [2 x i64]* %alloc_1.p to ptr
%v1 = ptrtoint ptr %alloc_1 to i64
store i64 %v0, ptr %alloc_1     ; tag = 0 (Some)
%sgep_4 = getelementptr ... %alloc_1, i64 1
store i64 %v3, ptr %sgep_4      ; payload = 42
%v6 = tail call i64 @get_or_die(i64 %v1)  ; <-- BUG: TCO-eligible, arg is stack addr
ret i64 %v6

After LLVM’s tail-call pass: caller does add sp, sp, #0x20 then branches to callee; callee does sub sp, sp, #0x50 and stores callee-saved regs into the space that used to hold the Option payload.
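
The overlap follows from the two stack adjustments alone. A minimal Python sketch of the address arithmetic, assuming the caller's frame is exactly the 0x20 bytes released by its epilogue and using a made-up starting stack pointer (the real value does not matter):

```python
CALLER_FRAME = 0x20  # caller epilogue: add sp, sp, #0x20
CALLEE_FRAME = 0x50  # callee prologue: sub sp, sp, #0x50


def frame_ranges(sp_in_caller):
    """Return (caller_frame, callee_frame) as half-open [low, high) byte ranges."""
    caller = (sp_in_caller, sp_in_caller + CALLER_FRAME)
    sp_after_epilogue = sp_in_caller + CALLER_FRAME    # frame released by TCO
    sp_in_callee = sp_after_epilogue - CALLEE_FRAME    # callee reserves its frame
    callee = (sp_in_callee, sp_in_callee + CALLEE_FRAME)
    return caller, callee


def overlap(a, b):
    """Number of bytes shared by two half-open ranges."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return max(0, high - low)


caller_frame, callee_frame = frame_ranges(0x16FDFF000)  # hypothetical sp value
print(hex(overlap(caller_frame, callee_frame)))  # 0x20
```

Because the callee's frame is larger than the caller's, it covers the caller's former frame completely, so every byte of the alloca is inside memory the callee is free to overwrite.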

3.3 ARM64 disassembly evidence

From the repro, get_or_die has epilog stp x0, x0, [sp, #0x8]; add sp, sp, #0x50; ret. At the ret, x0 (the return value) is 0x1f9cb80b0, not 42. Tracing backwards: the ldr x0, [x0, #0x8] that should load the payload at offset 8 actually reads from a stack slot that the callee clobbered with a callee-saved register during its prolog. That register’s old value is a libc pointer because the caller (qz_main) had a function pointer sitting there from a previous TLS access path.

4. External research

4.1 LLVM rules for `tail call`

From LLVM Language Reference — call instruction:

The optional tail and musttail markers indicate that the optimizers should perform tail call optimization (TCO). … tail is a hint that the TCO is eligible … The marker has the following semantics:

The call will not cause unbounded stack growth if it is part of a recursive cycle in the call graph.

Arguments with the inalloca or preallocated attribute are forbidden.

The callee must not access any memory that is local to the caller, such as allocas or spill slots.

Quartz is violating the third bullet: the caller passes ptrtoint %alloca to i64, and the callee absolutely accesses that memory (it loads the tag and payload). The tail marker is an assertion by the frontend that the rule holds. LLVM does not check it, and an i64 produced by ptrtoint looks like any other integer to its analyses, so it assumes the callee doesn’t touch caller-local memory. LLVM therefore performs TCO and the program miscompiles.

From LLVM issue #72555 — Subtle issue with [[clang::musttail]]:

If the function has any alloca instructions, safely keeping allocas in the entry block requires analysis to prove that the tail-called function does not read or write the stack object.

This is exactly the missing analysis in Quartz.

4.2 How other languages handle this

Rust: does not emit LLVM tail call markers by default. Rust’s codegen emits plain call for most user-level calls. The become keyword for guaranteed tail calls (first proposed in RFC 1888, “Guaranteed TCO”) is not yet stable, and the alloca-escape issue is part of what the design has to solve: the compiler needs to establish that no caller-local memory is reachable from the callee before it can emit musttail. See Rust tracking issue #112788.

Swift: emits tail only when SIL’s escape analysis proves no alloc_stack is reachable from the call arguments. @inout and UnsafePointer arguments to a tail-position call disable the marker.

Zig: emits call tail only for @call(.always_tail, ...) and enforces at comptime that no argument is a pointer to a caller-local. Ordinary tail-position calls get plain call.

GHC (Haskell): has its own STG tail-call discipline that never uses LLVM’s tail marker. Closures live on a separate heap, not the C stack.

Common thread: nobody trusts LLVM to figure this out from opaque i64 arguments. They either skip the marker entirely, or they do their own escape analysis in the frontend.

4.3 Rust’s actual handling of `let x = if ... else { panic!() }` (the surface pattern the test thought it was testing)

From Rust Reference — Never type and Never Type initiative:

panic!() has type ! (never).
When ! appears at a coercion site, the compiler inserts an implicit coercion absurd: ! -> T.
The typechecker unifies the other branch’s type T with the overall expression type, and the ! branch is elided from value computation (it’s terminated by unreachable).
Codegen lowers panics as normal calls that terminate with unreachable. There’s no “value of panic” flowing through phi nodes.

Quartz already does this correctly: mir_lower_expr_handlers.qz:2261-2266 marks panic/exit/unreachable calls as terminating their block with TERM_UNREACHABLE. The Never-type aspect of the failing test is not broken.

5. Root cause hypothesis

codegen_instr.qz:408-417 emits tail call purely based on “is the call result the block’s return value?” without checking whether any argument is a pointer into the caller’s stack frame.

When an argument is such a pointer, LLVM’s TCO pass (correctly, per its own rules) deallocates the caller’s frame before branching to the callee. The @value Option sitting in the caller’s alloca is overwritten by the callee’s prolog (callee-saved register spills). The callee loads garbage and returns it. The low byte of that garbage is 176 in the test and 0 in the simpler repro; both are wrong.

Why the Never test surfaced it: the panic arms prevent LLVM’s inliner from inlining get_or_die into main. Without inlining, there’s a real call site to optimize, which triggers the TCO. Non-panicking versions of the same pattern inline cleanly and don’t hit the bug.

6. Fix plan

Phase 1: Stop emitting `tail call` when any argument may point to caller stack (CORRECT FIX)

File: self-hosted/backend/codegen_instr.qz

Function: the call-emission path starting at line 407.

Change: add an argument-escape check before setting is_tail = 1. The check needs to answer: does any arg[i] transitively originate from an alloca in the current function?

Options for the check, in order of increasing precision:

(a) Conservative: disable tail call whenever ANY argument is an MirReg whose def-site is a stack-alloca intrinsic (stack_alloc, value_struct_new, etc.) or a ptrtoint of one. This is easy to implement at the MIR level: when MIR lowers stack-allocated Option/struct constructors, tag the resulting reg with a “stack-pointer” bit. In codegen, if any call arg has that bit, force is_tail = 0.
(b) Precise: compute a reaching-defs / taint set per register during MIR→LLVM lowering. Any reg whose taint set includes an alloca is a “stack pointer.” Tail-calls are rejected if any arg is a stack pointer.
(c) Future-proof: stop using ptrtoint for stack-allocated struct values entirely. Pass them as typed ptr arguments. Then LLVM’s own escape analysis sees the alloca pointer and refuses TCO automatically. This is the right long-term shape (see Phase 3).

Recommended: implement (a) now — it’s 50 lines, fully correct, and closes the bug. Add (c) as a follow-up because it pays off in other ways (better LLVM optimization, clearer IR).

Specific changes:

self-hosted/backend/mir.qz: add a bit MirReg::escapes_via_stack (or reuse an existing flag byte). When MIR_STACK_ALLOC / MIR_VALUE_CTOR / any intrinsic that allocas produces a register, set the bit. Propagate through MIR_PTRTOINT and through any copy/move.
self-hosted/backend/codegen_instr.qz:415: before is_tail = 1, loop over args[] and if any arg’s MirReg has the stack-pointer bit, keep is_tail = 0.
Add a regression test covering the minimal repro (my_unwrap with a non-inlined body) so the bug cannot silently return.
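
The shape of option (a) can be sketched outside the compiler. The following Python model is only an illustration under stated assumptions: the Reg class, the opcode names, and the define and may_tail_call helpers are hypothetical and do not correspond to actual Quartz MIR structures. It propagates the bit through every operand, which is more conservative than propagating only through MIR_PTRTOINT and copies.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    """Hypothetical stand-in for a MIR register."""
    name: str
    stack_derived: bool = False  # the proposed "stack-pointer" bit


# Hypothetical opcode names for instructions that produce a caller-stack address.
ALLOCATING_OPS = {"stack_alloc", "value_ctor"}


def define(op, name, operands=()):
    """Create the result register of an instruction and compute its bit.

    The bit is set if the instruction allocates on the stack, or if any operand
    already carries the bit (covers ptrtoint, copies, and pointer arithmetic).
    """
    tainted = op in ALLOCATING_OPS or any(r.stack_derived for r in operands)
    return Reg(name, tainted)


def may_tail_call(result_is_return_value, args):
    """Decide whether a call may be emitted as `tail call`."""
    if not result_is_return_value:
        return False
    # New rule: reject if any argument may point into the caller's frame.
    return not any(a.stack_derived for a in args)


# Mirrors the IR in section 3.2: alloca, then ptrtoint, then the call.
opt = define("value_ctor", "%alloc_1")
as_int = define("ptrtoint", "%v1", [opt])
plain = define("const", "%v0")

print(may_tail_call(True, [as_int]))  # False
print(may_tail_call(True, [plain]))   # True
```

This sketch does not track stack addresses that are stored to memory and loaded back later; whether that case can arise in Quartz MIR is something the real implementation would need to check.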

Quartz-time estimate: 0.5 day. Risk: low. Propagating one bit through MIR is mechanical. Worst case: some legitimately-tail-callable sites lose TCO; none of them are in hot loops in current Quartz code.

Phase 2: Activate the failing QSpec test

After Phase 1, the existing never_type_spec.qz test at line 98 should pass unchanged. Run the full never_type_spec.qz and stress_type_system_spec.qz suites to confirm.

Phase 3 (follow-up, not blocking): switch `@value` struct/enum args to typed `ptr`

Stop emitting ptrtoint %alloca to i64 followed by i64 arguments. Emit ptr arguments directly. Rewrite mir_sizeof_type-aware calls to use ptr types where the arg type is a stack-allocated struct. This requires threading struct-type information through the MIR→LLVM call lowering path. Benefits:

LLVM’s own TCO escape analysis handles it (no manual tracking needed).
Opt passes (DSE, GVN, memcpyopt) get better alias information.
Debug info becomes more accurate.
Matches what Rust/Swift/Zig do.

Quartz-time estimate: 2 days. Touches codegen_instr.qz, codegen_util.qz, and the MIR→LLVM type-name layer. Risk: medium. Needs careful handling of mixed i64/ptr arg lists and of existing callees that expect i64.

Phase 4 (follow-up): do not emit the dead `mir_emit_store_var` after a terminated block

Unrelated but adjacent: in mir_lower_expr_handlers.qz:2989-2997 (match arm bodies) and similar sites, ctx.mir_emit_store_var(result_name, arm_val) is called even when the current block is already terminated by TERM_UNREACHABLE. The resulting store is dead code but adds IR noise. Guard these with a term_kind < 0 check before emitting. Not a correctness bug, just cleanup.

Quartz-time estimate: 0.25 day.

7. Out of scope

Actual Never-type handling: already correct. Panic calls correctly terminate blocks with TERM_UNREACHABLE. Match arms correctly propagate the terminator. The never_type_spec.qz tests that pass today pass for the right reasons.
The exit 176 magic number: incidental. It is whatever happens to occupy the clobbered stack slot, so it can be 0 or any other garbage value depending on the program and build. Once the TCO bug is fixed the test exits 42, and the specific garbage value no longer matters.