Quartz v5.25

Next Session — Pthread Intrinsics Refactor + High-Value Polish

Baseline: a4cff8ad (post-async-mutex fix, trunk)

Primary target: Add raw pthread_mtx_lock / pthread_mtx_unlock intrinsics and stop abusing the Quartz Mutex intrinsic on bare pthread_mutex_t blocks inside async_mutex_new / async_rwlock_new.

Secondary: Pick 1-2 items from the roadmap menu below based on remaining budget.

Scope: One focused session. Concurrency-correctness theme continues from Apr 15 sprint — this is the polish tail.

Prior sprint (Apr 15): scheduler park/wake sprint fully closed in 6 commits. See next-session-scheduler-park-wake.md for the original sprint handoff and ROADMAP Tier 2 #6 / Tier 3 #13+#15 for the complete writeup.


TL;DR

  1. Primary (2-3h): The Apr 15 fix for ASYNC-MUTEX-MISSED-WAKEUP defended a semantic smell instead of removing it. mir_emit_async_mutex_lock / mir_emit_async_rwlock_{read,write} all call the mutex_lock(imtx) intrinsic, which treats imtx (a bare pthread_mutex_t) as a Quartz Mutex struct with a protected-value slot at byte 64. I fixed the symptom by growing imtx from 64 → 72 bytes to keep the OOB read/write in-bounds. The fix is correct, but the pattern is wrong. Ship raw pthread_mtx_lock(imtx) / pthread_mtx_unlock(imtx) intrinsics, swap all call sites, shrink imtx back to 64 bytes.
  2. Menu: After the primary lands, pick from the prioritized “Secondary targets” below. Top pick: Tier 3 #11 (Resolver full scope tracking) — 1-2 days, eliminates UFCS module collisions for local vars, small but high-value compiler polish.
  3. Failure mode: if the primary hits unexpected issues, the 72-byte fix is already load-bearing and committed. Reverting the refactor attempt is safe; the functional bug is already fixed.

Pre-flight (≤ 5 min)

cd /Users/mathisto/projects/quartz
git log --oneline -6
# Expected top 6:
#   a4cff8ad Fix ASYNC-MUTEX-MISSED-WAKEUP — TWO interacting bugs, both load-bearing
#   584b1434 Fix multi_wake_worker TOCTOU race — resolves SCHED-WAKE-SIGSEGV residue
#   d1c58715 Delete dead spin-park codegen + widen QZ0210 to any non-async-target function
#   810f6af7 ROADMAP: Phase 5 async mutex investigation — false start, filed ASYNC-MUTEX-MISSED-WAKEUP
#   737b0a4e QZ0210: reject sched_park() from main() — codify the async-only invariant
#   ab9a7829 Fix __qz_sched_wake CAS retry loop — eliminates sched_park lost-wake race
git status
./self-hosted/bin/quake guard:check
./self-hosted/bin/quake smoke 2>&1 | tail -6

# Baseline stress numbers to beat (post-sprint):
# sched_park_spec: 0/8000 (all 4 tests)
# async_mutex_spec: 0/2000 (8 tests)
# async_rwlock_spec: 0/1500 (6 tests)
# async_channel_spec: 0/300 (6 tests, needs QUARTZ_COMPILER env var — see below)

# Fix-specific backup (standard Rule 1 — do this BEFORE touching cg_intrinsic_conc_*.qz)
cp self-hosted/bin/quartz self-hosted/bin/backups/quartz-pre-pthread-intrinsics-golden

Primary: Pthread intrinsics refactor

Why this matters

The current mutex_lock intrinsic emits (per cg_intrinsic_conc_sched.qz:53-93):

; mutex_lock: acquire lock, return protected value
%ml.mx = inttoptr i64 %<mx> to ptr
%ml.mtx = getelementptr i8, ptr %ml.mx, i64 0       ; pthread_mutex_t at offset 0
call i32 @pthread_mutex_lock(ptr %ml.mtx)
%ml.val.gep = getelementptr i8, ptr %ml.mx, i64 64  ; value slot at offset 64
%ml.val.p = bitcast ptr %ml.val.gep to ptr
%v<d> = load i64, ptr %ml.val.p                     ; load protected value

And mutex_unlock(mtx, new_val) writes new_val to byte 64 and then calls pthread_mutex_unlock on byte 0.

This is the correct Quartz Mutex<T> protocol, where Mutex<T> = { pthread_mutex_t (64b), value (8b) } = 72 bytes total, emitted by mutex_new at lines 11-51 of the same file.

The smell: mir_emit_async_mutex_lock / mir_emit_async_rwlock_read / mir_emit_async_rwlock_write (in mir_lower_expr_handlers.qz) store a bare pthread_mutex_t pointer at mtx[6] / rw[5] and then pass it to mutex_lock(imtx) / mutex_unlock(imtx, 0). They’re using mutex_lock purely for its pthread_mutex_lock effect and discarding the “protected value” result. The load at byte 64 is wasted; the write at byte 64 is a dead write.

The hazard: Before the Apr 15 fix, imtx was only 64 bytes, so the byte-64 load/store was OOB. Adjacent heap allocations got corrupted (bug A of the two-bug ASYNC-MUTEX-MISSED-WAKEUP saga). The fix grew imtx to 72 bytes to keep the OOB in-bounds, but the wrong pattern is still there. A future developer writing a new concurrency primitive might reach for mutex_lock(imtx) and get it wrong again. Raw pthread intrinsics make the intent explicit and eliminate the footgun.

Step-by-step plan

Phase 1: Add the intrinsics (1h)

  1. Register names in self-hosted/backend/intrinsic_registry.qz near line 100:

    _r("pthread_mtx_lock", INTRINSIC_CAT_CONCURRENCY)
    _r("pthread_mtx_unlock", INTRINSIC_CAT_CONCURRENCY)

    Grep for _r("mutex_lock" to find the exact spot.

  2. Add codegen handlers in self-hosted/backend/cg_intrinsic_conc_sched.qz near line 53 (after mutex_lock / mutex_unlock handlers). Two new blocks:

    # pthread_mtx_lock(imtx) — raw pthread_mutex_lock, no protected-value slot.
    # Used by async_mutex_lock / async_rwlock_{read,write} for their internal
    # wait-list mutex. DOES NOT read byte 64 — imtx is a bare pthread_mutex_t.
    if name == "pthread_mtx_lock"
      var mx = args[0].to_s()
      codegen_util::cg_emit_line(out, "  ; pthread_mtx_lock: raw pthread_mutex_lock (no value slot)")
      codegen_util::cg_emit(out, "  %pml")
      codegen_util::cg_emit(out, d)
      codegen_util::cg_emit(out, ".p = inttoptr i64 %v")
      codegen_util::cg_emit(out, mx)
      codegen_util::cg_emit_line(out, " to ptr")
      codegen_util::cg_emit(out, "  call i32 @pthread_mutex_lock(ptr %pml")
      codegen_util::cg_emit(out, d)
      codegen_util::cg_emit_line(out, ".p)")
      # Race detector: acquire after pthread lock (matches mutex_lock handler).
      if state.race_mode == 1
        codegen_util::cg_emit(out, "  call void @__qz_race_acquire(ptr %pml")
        codegen_util::cg_emit(out, d)
        codegen_util::cg_emit_line(out, ".p)")
      end
      codegen_util::cg_emit(out, "  %v")
      codegen_util::cg_emit(out, d)
      codegen_util::cg_emit_line(out, " = add i64 0, 0")
      return 1
    end
    
    # pthread_mtx_unlock(imtx) — raw pthread_mutex_unlock, no value store.
    if name == "pthread_mtx_unlock"
      var mx = args[0].to_s()
      codegen_util::cg_emit_line(out, "  ; pthread_mtx_unlock: raw pthread_mutex_unlock (no value slot)")
      codegen_util::cg_emit(out, "  %pmu")
      codegen_util::cg_emit(out, d)
      codegen_util::cg_emit(out, ".p = inttoptr i64 %v")
      codegen_util::cg_emit(out, mx)
      codegen_util::cg_emit_line(out, " to ptr")
      # Race detector: release BEFORE pthread unlock (matches mutex_unlock handler).
      if state.race_mode == 1
        codegen_util::cg_emit(out, "  call void @__qz_race_release(ptr %pmu")
        codegen_util::cg_emit(out, d)
        codegen_util::cg_emit_line(out, ".p)")
      end
      codegen_util::cg_emit(out, "  call i32 @pthread_mutex_unlock(ptr %pmu")
      codegen_util::cg_emit(out, d)
      codegen_util::cg_emit_line(out, ".p)")
      codegen_util::cg_emit(out, "  %v")
      codegen_util::cg_emit(out, d)
      codegen_util::cg_emit_line(out, " = add i64 0, 0")
      return 1
    end

    Note the race-detector hooks match the mutex_lock/unlock handler ordering (acquire AFTER lock, release BEFORE unlock). Grep state.race_mode in cg_intrinsic_conc_sched.qz to copy the exact pattern.

  3. Typecheck registration: pthread_mtx_lock and pthread_mtx_unlock are internal-only (not user-facing). They should NOT be registered in typecheck_builtins.qz. User code calling them by name would hit QZ0200 (“unknown function”). That’s correct — these exist only for the MIR emitter’s internal use.

Phase 2: Swap the call sites (30 min)

In self-hosted/backend/mir_lower_expr_handlers.qz, find every occurrence where imtx (loaded from slot 5 or 6 of an async lock struct) is passed to mutex_lock / mutex_unlock. Current count after the Apr 15 fix: 6 lock sites, 6 unlock sites across three functions:

  • mir_emit_async_rwlock_read (~line 939-1005): 2 lock sites + 2 unlock sites (the initial acquire/release plus the re-check pair: lock, unlock, lock(rc), unlock(rc)). Grep mutex_lock and mutex_unlock in the function body.
  • mir_emit_async_rwlock_write (~line 1045-1130): same shape as read.
  • mir_emit_async_mutex_lock (~line 1140-1250): same shape — lock, re-check lock, unlock, re-check unlock, final unlock.

For each:

  • Replace ctx.mir_emit_intrinsic("mutex_lock", as_int(lk_a)) where lk_a contains imtx with ctx.mir_emit_intrinsic("pthread_mtx_lock", as_int(lk_a)).
  • Replace ctx.mir_emit_intrinsic("mutex_unlock", as_int(ulk_a)) with ctx.mir_emit_intrinsic("pthread_mtx_unlock", as_int(ulk_a)), and drop the second arg (the dummy 0; pthread_mtx_unlock takes only imtx).

Phase 3: Shrink the allocations (15 min)

In self-hosted/backend/cg_intrinsic_conc_async.qz:

  • async_mutex_new (~line 37): revert malloc(i64 72) back to malloc(i64 64) and memset(..., 64). Update the comment to explain that the 64-byte allocation is correct because the raw pthread_mtx_lock intrinsic doesn’t touch byte 64.
  • async_rwlock_new (~line 437): revert malloc(i64 72) back to malloc(i64 64) and memset(..., 64). Same comment update.

Phase 4: Rebuild + verify (30 min)

./self-hosted/bin/quake build 2>&1 | tail -8
./self-hosted/bin/quake smoke 2>&1 | tail -6

Single-run the three affected specs to verify the intrinsics resolve correctly:

FILE=spec/qspec/async_mutex_spec.qz ./self-hosted/bin/quake qspec_file 2>&1 | tail -10
FILE=spec/qspec/async_rwlock_spec.qz ./self-hosted/bin/quake qspec_file 2>&1 | tail -10
FILE=spec/qspec/async_channel_spec.qz ./self-hosted/bin/quake qspec_file 2>&1 | tail -10

Then the load-bearing stress gate — must match or beat Apr 15 sprint results:

export PATH="/opt/homebrew/opt/llvm/bin:$PATH"

# Rebuild spec binaries against the new compiler
for spec in async_mutex_spec async_rwlock_spec; do
  ./self-hosted/bin/quartz --no-cache -I std -I . -I self-hosted/shared -I spec/qspec/fixtures \
    spec/qspec/${spec}.qz 2>/dev/null > /tmp/${spec}.ll
  llc -filetype=obj /tmp/${spec}.ll -o /tmp/${spec}.o
  clang /tmp/${spec}.o -o /tmp/${spec}_bin -lpthread -lm
done

# Mutex: must be 0/2000 (matches Apr 15)
pass=0; hang=0; other=0
for i in $(seq 1 2000); do
  timeout 10 /tmp/async_mutex_spec_bin > /dev/null 2>&1
  code=$?
  if [ $code -eq 124 ]; then hang=$((hang+1))
  elif [ $code -eq 0 ]; then pass=$((pass+1))
  else other=$((other+1)); fi
done
echo "async_mutex_spec post-refactor: pass=$pass hang=$hang other=$other / 2000"

# RwLock: must be 0/1500 (matches Apr 15)
pass=0; hang=0; other=0
for i in $(seq 1 1500); do
  timeout 10 /tmp/async_rwlock_spec_bin > /dev/null 2>&1
  code=$?
  if [ $code -eq 124 ]; then hang=$((hang+1))
  elif [ $code -eq 0 ]; then pass=$((pass+1))
  else other=$((other+1)); fi
done
echo "async_rwlock_spec post-refactor: pass=$pass hang=$hang other=$other / 1500"

If both report hang=0 other=0 (out of 2000 and 1500 runs respectively), ship. If not, the refactor introduced a regression — revert, investigate; most likely you mixed up an unlock site or forgot to drop the dummy 0 arg.

Then:

./self-hosted/bin/quake guard
git add self-hosted/backend/intrinsic_registry.qz \
        self-hosted/backend/cg_intrinsic_conc_sched.qz \
        self-hosted/backend/cg_intrinsic_conc_async.qz \
        self-hosted/backend/mir_lower_expr_handlers.qz
git commit -m "Add raw pthread_mtx_{lock,unlock} intrinsics — remove mutex_lock(imtx) smell"

Commit message draft:

Add raw pthread_mtx_{lock,unlock} intrinsics — remove mutex_lock(imtx) smell

The Apr 15 ASYNC-MUTEX-MISSED-WAKEUP fix (commit a4cff8ad) grew async_mutex_new
and async_rwlock_new's internal imtx allocation from 64 → 72 bytes to keep the
mutex_lock(imtx) intrinsic's byte-64 load (the "protected value" slot in the
Quartz Mutex<T> protocol) in-bounds. That fix was correct but defended a
semantic smell instead of removing it: imtx is a bare pthread_mutex_t, not a
Quartz Mutex with a value slot, and the byte-64 load/store is waste.

Adds two new intrinsics, both internal (not in typecheck_builtins):
- pthread_mtx_lock(imtx): raw pthread_mutex_lock, no value-slot load
- pthread_mtx_unlock(imtx): raw pthread_mutex_unlock, no value-slot store

Swaps all six mutex_lock(imtx) and six mutex_unlock(imtx, 0) call sites in
mir_emit_async_mutex_lock / mir_emit_async_rwlock_{read,write} over to the
new intrinsics, and shrinks both imtx allocations back to 64 bytes
(exactly pthread_mutex_t-sized on macOS arm64).

Stress results (must match Apr 15 sprint):
- async_mutex_spec:  0 hangs + 0 crashes / 2000 runs
- async_rwlock_spec: 0 hangs + 0 crashes / 1500 runs
- Scheduler sweep: all green
- quake guard: fixpoint verified

Exit criteria

  • async_mutex_spec stress 2000/2000 clean
  • async_rwlock_spec stress 1500/1500 clean
  • Full scheduler sweep green (same list as Apr 15)
  • quake guard + quake smoke green
  • imtx allocations back to 64 bytes
  • Zero mutex_lock(imtx) or mutex_unlock(imtx, 0) call sites remain in mir_lower_expr_handlers.qz (grep-verify)

Failure mode

If Phase 4 stress reveals a regression:

  1. Check race-mode ordering: mutex_lock calls __qz_race_acquire AFTER pthread_mutex_lock; mutex_unlock calls __qz_race_release BEFORE pthread_mutex_unlock. If you flipped either, the race detector’s vector clocks are wrong and downstream tests may fail.
  2. Check that you dropped the second arg from every mutex_unlock call, not just one or two. A missing arg count is a silent codegen bug.
  3. If unclear: revert Phase 2+3 (source changes), keep Phase 1 (new intrinsics declared, unused). The primary is optional polish — the 72-byte fix from Apr 15 is still load-bearing and correct.

Secondary targets — ranked menu

After the primary lands (2-3h), pick one or two of these based on energy and remaining context. Ordered by impact/effort ratio.

A. Resolver full scope tracking (Tier 3 #11, 1-2d) — TOP PICK

What: Eliminate the UFCS module-local-collision hole. Currently the resolver only tracks parameter names in scope; local var/const bindings can collide with imported module names during UFCS dispatch. Fix: the resolver tracks every binding in scope, not just params.

Why: World-class bar. A language that confuses user.name() (local user field access) with mod_user::name() (module call) because the resolver doesn’t know about let user = ... is a footgun. The “full scope tracking” approach matches Rust / Swift / TypeScript.

Where: self-hosted/middle/resolver.qz. Grep for how params are currently pushed into the scope stack — the fix is to push every NODE_LET binding through the same pathway.

Risk: Medium. The resolver is core infrastructure; breakage cascades to typecheck and MIR. Run the full QSpec suite before committing (user in a separate terminal — never from Claude Code directly per CLAUDE.md).

Spec coverage needed: Add 3-5 tests in a new resolver_local_shadow_spec.qz that exercise let user = ...; user.name() and let send = ...; send(ch, x) (the second overlaps with the known std/ffi/socket.qz send shadowing issue — may accidentally fix it as a side effect).

Estimate: 1-2d traditional, 6-12 quartz-hours.

B. Pattern matrix exhaustiveness (Tier 3 #12, 3-5d)

What: Rust-style exhaustiveness checking for match expressions. Currently Quartz only checks top-level variants; nested patterns, struct destructuring, and guard conditions are not analyzed. Implement a pattern matrix based on Rust’s usefulness algorithm (or the Luc Maranget paper).

Why: Fills a compiler hole. World-class pattern matching requires exhaustiveness for nested patterns.

Where: self-hosted/middle/typecheck_match.qz. The current code does variant-set exhaustiveness; the refactor extends to matrices.

Risk: High complexity. Worth a dedicated session — do NOT bundle with the primary unless the primary finishes very fast.

Estimate: 3-5d traditional, 1-2 quartz-days.

C. Re-exports / pub import (Tier 3 #14, medium)

What: Let modules re-export imported names so import json gives users everything without them needing import json::parse, json::serialize, ....

Why: Clean public API surfaces. Required for package-manager-era library ergonomics.

Where: self-hosted/frontend/parser.qz (new pub import syntax), self-hosted/middle/resolver.qz (propagate re-exports into the module’s exported symbol set).

Risk: Medium. Touches parser + resolver. Needs design thought — how do re-exports interact with priv? Aliasing? Glob re-exports?

Estimate: 1-2 quartz-days.

D. @x sigil for implicit self (Tier 3 #16, small-med)

What: @x as shorthand for self.x in impl / extend blocks. Already works inside some contexts; needs design alignment and uniform support.

Why: Ergonomic win. Rubyists will love it.

Where: self-hosted/frontend/parser.qz (tokenize @ prefix), self-hosted/middle/typecheck_walk.qz (rewrite @x to self.x in member context).

Risk: Low. Purely syntactic sugar.

Estimate: 4-6 hours.

E. Concurrency stress expansion

What: The Apr 15 sprint stress-tested sched_park_spec, async_mutex_spec, async_rwlock_spec. There are 40 concurrency-related specs (*sched*, *async*, *channel*, *spawn*, *actor*, *select*, *concurren*). Run each one at 1000+ iterations, look for latent bugs similar to ASYNC-MUTEX-MISSED-WAKEUP. Specifically worth checking:

  • async_channel_spec — baselines 300/300 (I checked Apr 15 post-session) but only at 300 runs; extend to 2000+. Requires QUARTZ_COMPILER env var since the spec uses subprocess_compile.
  • select_spec — uncharted.
  • actor_*_spec — uncharted.
  • structured_concurrency_spec — uncharted.
  • spawn_await_spec — part of the Apr 15 sweep but not deep-stressed.

Why: The Apr 15 sprint proved these specs harbored 8-14% hang rates under stress. There may be more. Every latent concurrency bug that ships is one more report-from-users-post-launch.

Where: Write a reusable stress harness in tools/ that takes a spec file, runs N iterations (default 1500), reports pass/hang/other, and saves failing binaries + crash reports.

Risk: Low (read-only, stress testing). High expected value if it finds anything; potentially zero value if all specs are already clean.

Estimate: 3-6 hours for the harness + baseline. Any found bug could add 2-8 hours to fix.

Why this isn’t the top pick: Speculative. We don’t know if there are more bugs. The Resolver fix (A) is a known-good win; this is a gamble.

F. ASYNC-MUTEX wait-list grows unboundedly (latent, not yet investigated)

What: Look at mir_emit_async_mutex_lock’s wait-list append code. Every contended acquire allocates a wait node via ctx.mir_emit_alloc(2) (line ~1124 post-refactor). There’s a free(head) in async_mutex_unlock that pops and frees. Quick audit: does the free happen on every wake path? What about cancellation? If not, there’s a memory leak per contention.

Why: Possible latent leak. The Apr 15 fix was correctness, not memory.

Where: self-hosted/backend/cg_intrinsic_conc_async.qz:114-270 (async_mutex_unlock emission).

Risk: Low (audit only). If a leak exists, the fix is one more free call.

Estimate: 1-2 hours. Cheap to check.

G. VS Code extension publish (Tier 1 #1, 3-4h)

What: Tier 1 launch-blocker. Build .vsix, register in Marketplace, add problem matchers for QSpec output.

Why: User-visible launch item. Not compiler work, but a concrete ship-blocker.

Where: tools/vscode-quartz/ (verify it exists; if not, needs creating from scratch).

Risk: Low on the compiler side; all the risk is in Marketplace setup + extension manifest details.

Estimate: 3-4 hours for build + submit.

Why this isn’t the top pick: Not compiler correctness. The theme of the next session should continue from Apr 15 unless explicitly redirected.


Based on theme continuity (concurrency correctness → polish → compiler quality) and session budget (one focused context window):

Time        Work
0:00-0:15   Pre-flight (git log, guard check, smoke, backup golden)
0:15-2:30   Primary: Pthread intrinsics refactor (Phase 1-4)
2:30-3:00   Commit primary + roadmap update
3:00-5:30   Secondary A: Resolver full scope tracking
5:30-6:00   Commit secondary + guard + handoff-if-needed

If resolver proves larger than expected (likely — it’s 1-2d traditional), file as “in progress” and hand off. The primary must be shipped before ending the session; the secondary is discretionary.

Alternative sprint if primary proves hairier than expected:

  • Skip secondary A entirely.
  • Take target F (async_mutex leak audit) — 1-2 hours, quick win.
  • Or take target D (@x sigil) — 4-6 hours, small ergonomic win.

Prime directives check

  • D1 (highest impact): Primary is medium impact (polish, not a bug). Secondary A is the highest-impact available short-budget item. Accept this.
  • D2 (research first): For the primary, Rust’s parking_lot uses a very similar “raw pthread” pattern internally. No research needed — just good taste and removing the smell.
  • D3 (pragmatism): The 72-byte fix from Apr 15 is already correct and shipped. This refactor is pure cleanup. Justifiable only because it removes a footgun class.
  • D4 (multi-session): Fine. If the secondary (resolver) doesn’t fit, hand off cleanly. The primary must ship in one sitting because it touches atomic code paths — don’t leave the refactor half-done.
  • D5 (report reality): Stress numbers must match Apr 15. If they don’t, don’t paper over it — revert or root-cause.
  • D6 (fill or file): If the secondary (resolver) gets scoped down, file the remainder as an explicit follow-up. Don’t silent-drop.
  • D7 (delete freely): The refactor IS a deletion: 12+ call sites swap from a wrong-signature intrinsic to a right-signature one, and the imtx allocations shrink. No “legacy” comments, no compat shims.
  • D8 (binary discipline): quake guard before commit. Fix-specific backup (quartz-pre-pthread-intrinsics-golden) before touching cg_intrinsic_conc_*. Scheduler stress sweep is the load-bearing smoke test for this work.
  • D9 (quartz-time): Primary 2-3h. Secondary 6-12h. Don’t pad, don’t shrink.
  • D10 (corrections are calibration): If the race detector integration breaks in the new intrinsics, the user may correct on ordering — update and move, don’t defend.

Key files quick reference

Line numbers at baseline a4cff8ad.

  • Intrinsic name registry: self-hosted/backend/intrinsic_registry.qz:100-103
  • mutex_lock codegen handler (reference for pattern): self-hosted/backend/cg_intrinsic_conc_sched.qz:53-93
  • mutex_unlock codegen handler (reference for pattern): self-hosted/backend/cg_intrinsic_conc_sched.qz:95-145
  • async_mutex_new (Phase 3 revert here): self-hosted/backend/cg_intrinsic_conc_async.qz:15-80
  • async_rwlock_new (Phase 3 revert here): self-hosted/backend/cg_intrinsic_conc_async.qz:414-470
  • async_mutex_unlock runtime (reference — uses @pthread_mutex_lock directly): self-hosted/backend/cg_intrinsic_conc_async.qz:114-280
  • mir_emit_async_rwlock_read (Phase 2 call-site swaps): self-hosted/backend/mir_lower_expr_handlers.qz:~893-1005
  • mir_emit_async_rwlock_write (Phase 2 call-site swaps): self-hosted/backend/mir_lower_expr_handlers.qz:~1045-1130
  • mir_emit_async_mutex_lock (Phase 2 call-site swaps): self-hosted/backend/mir_lower_expr_handlers.qz:~1140-1250
  • Stress test fixtures (for regression gate): spec/qspec/async_mutex_spec.qz, spec/qspec/async_rwlock_spec.qz

  • feedback_crash_reports_first.md — for silent SIGSEGVs on macOS, check ~/Library/Logs/DiagnosticReports/<binary>-*.ips FIRST. Saved my bacon twice on Apr 15.
  • project_sched_wake_fix.md — full writeup of the Apr 15 scheduler sprint. Contains the ASYNC-MUTEX-MISSED-WAKEUP root cause in detail.
  • ROADMAP Tier 2 #6, Tier 3 #13, Tier 3 #15 — all closed, all documented. If you want the full post-mortem, read them in the roadmap.
  • Apr 15 commits (git log a4cff8ad --oneline -7) — the full sprint, from handoff to close. ab9a7829 is the primary fix to read first; a4cff8ad is the ASYNC-MUTEX fix (this handoff’s direct predecessor).

Open questions to ask if stuck

  1. Should pthread_mtx_lock return anything? The current design returns %v<d> = add i64 0, 0 (dummy zero) because MIR expression slots need a value. This is fine for the existing call sites (they discard the result). But if someone wants to call it in an assignment context later, the zero is a misleading result. Consider: should these intrinsics be emitted as statements (no result) instead of expressions? The answer probably depends on how mir_emit_intrinsic handles void-like calls — grep mir_emit_intrinsic usages where the result is discarded for precedent.

  2. Should the intrinsic names have a namespace? pthread_mtx_lock is a bit naked. Alternatives: __pthread_lock, __raw_mutex_lock, mutex_lock_raw. Pick what matches Quartz’s existing internal-intrinsic naming conventions (grep "pthread_" in intrinsic_registry.qz — there may already be precedent).

  3. Is there a reason the existing design uses mutex_lock(imtx) instead of raw pthread? Check the commit history for async_mutex_new / mir_emit_async_mutex_lock — maybe the original author had a reason I’m not seeing (e.g., wanted the race-detector hooks for free, or planned to later store the current holder in the value slot). If so, the refactor needs to preserve that intent.