# Unikernel: page fault after ~1h24m uptime (PF at 0xFF…FF88)

Status: open — NOT root-caused. Production mitigation in place (see §“Mitigation” below). Next-session task: collect RIP from the enhanced PF ISR and disassemble against the ELF to pinpoint the faulting instruction.

Filed: 2026-04-20, immediately after the virtio-ring-wrap fix (commit 2a066cb2) surfaced this second failure mode.
## Symptom
Kernel runs normally for 1h24m (14 960 HTTP requests served, all via
`/api/stats.json` + `/health` + baked pages), then:

```
[rx: u=10715 c=76250 len=680]
HTTP: request received (616 bytes)
PF at 0xffffffffffffff88 err=0x0000000000000000
```
No further log output. systemd reports `active (running)` because QEMU
keeps running (the `@c("cli; 1: hlt; jmp 1b")` loop in the ISR never
exits, and QEMU doesn’t know the guest halted). CPU pegs at 100% of
one core (HLT spinning).
Error code 0x00 decodes to: page not present, read access, supervisor
mode, data (not instruction fetch). CR2 = 0xFFFFFFFFFFFFFF88 = -120 as
a signed i64 (2^64 - 120 unsigned). This is deep in the kernel
(canonical-high) half of the x86_64 address space — unmapped in our
identity-paging microvm. So the kernel dereferenced a pointer that is
effectively 0 - 120 (or any equivalent small-negative computation).
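The decode is easy to verify mechanically. A minimal Python sketch — the constants come straight from the log line above; the bit positions follow the standard x86_64 page-fault error-code layout:

```python
CR2 = 0xFFFFFFFFFFFFFF88   # faulting address from the log
ERR = 0x0                  # page-fault error code from the log

# Reinterpret CR2 as a signed 64-bit integer (two's complement).
signed = CR2 - (1 << 64) if CR2 >= (1 << 63) else CR2
print(signed)  # -120

# Error-code bits: bit 0 = present, bit 1 = write, bit 2 = user, bit 4 = ifetch.
print("present" if ERR & 0x1 else "non-present")   # non-present
print("write" if ERR & 0x2 else "read")            # read
print("user" if ERR & 0x4 else "supervisor")       # supervisor
print("ifetch" if ERR & 0x10 else "data")          # data
```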
## What IS and ISN’T the story
Ruled out:

- Stack overflow. Our 64 KiB stack lives in the low physical half
  (~0x600000); an overflow wouldn’t land at -120.
- PMM exhaustion. `pmm_pages_used` stable at 2 482 / 16 384 across the
  crash. Log didn’t print “OOM” (which `pmm_alloc_page` would).
- The virtio-net 16-bit ring wrap bug — that’s fixed (commit 2a066cb2,
  docs/bugs/UNIKERNEL_TX_IDX_WRAP_HANG.md). With `tx_stalls=0` and
  `c=76250` (well past the 65 536 wrap point), the wrap isn’t involved.
- Quartz `volatile_load<U32>` sign-extension. Codegen emits `zext`, not
  `sext` (verified in `cg_intrinsic_memory.qz`), so U32 → i64 loads
  can’t produce kernel-half addresses on their own.
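The zext/sext distinction is worth spelling out, because a sign-extended 32-bit load is exactly the kind of thing that *would* manufacture this CR2. A quick Python sketch — `0xFFFFFF88` is a hypothetical 32-bit value, not one observed in the crash:

```python
val = 0xFFFFFF88  # 32-bit value with the sign bit set (equals -120 as i32)

zext = val                            # zero-extend to 64 bits
sext = (val - (1 << 32)) % (1 << 64)  # sign-extend, then view as unsigned u64

print(hex(zext))  # 0xffffff88 — a harmless low address
print(hex(sext))  # 0xffffffffffffff88 — would match the faulting CR2
```

Since codegen verifiably emits `zext`, the U32 load path can only produce the first form.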
Not ruled out (candidate root causes for a future session):

- TCP-send chunker arithmetic. `src_ptr = body_ptr + (sent - hdr_len)`
  at `hello_x86.qz:2232`. No direct path produces -120, but if
  `body_ptr` is 0 (dynamic-route response, no asset body) AND a logic
  bug makes the else branch fire (shouldn’t — `sent < hdr_len` guards
  it), we’d compute `0 + (-hdr_len) = -hdr_len`. For `hdr_len = 120`,
  that’s -120 exactly. Leading hypothesis: `/api/stats.json` has a
  header of ~180 bytes and body also ~180 bytes; `/health` has
  `hdr_len` ≈ 120, body 0. A specific race between the `hdr_len` and
  `total_len` computations that produces `sent > hdr_len` when
  `body_ptr = 0` would do it.
- If-None-Match parsing in `asset_if_none_match()`. If `req_len` is 0
  or very small and `nlen + ETAG_HEX_LEN + 1` exceeds it,
  `limit = req_len - nlen - ETAG_HEX_LEN - 1` goes negative.
  `while i < limit` with a negative (huge unsigned) limit would loop
  very long, reading past buffer bounds. Could PF anywhere.
- The recent-records ring (`recent_record`). Bounded by
  `% RECENT_SLOTS` (= 64) and entries are 96 bytes, well within the
  two pages allocated. Probably safe, but worth a closer look once RIP
  evidence is in hand.
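Both leading candidates reduce to the same mechanism: unsigned 64-bit underflow. A quick sketch — `hdr_len = 120` matches the `/health` estimate above, but `nlen` and `ETAG_HEX_LEN` here are illustrative values, not the real constants:

```python
M = 1 << 64  # pointers and lengths are unsigned 64-bit in the kernel

# Candidate 1: chunker fires the else branch with body_ptr = 0, sent = 0.
body_ptr, sent, hdr_len = 0, 0, 120
src_ptr = (body_ptr + sent - hdr_len) % M
print(hex(src_ptr))  # 0xffffffffffffff88 — exactly the observed CR2

# Candidate 2: If-None-Match limit underflows on a tiny request.
# (nlen = 14 and ETAG_HEX_LEN = 16 are assumed values for illustration.)
req_len, nlen, ETAG_HEX_LEN = 4, 14, 16
limit = (req_len - nlen - ETAG_HEX_LEN - 1) % M
print(limit > (1 << 60))  # True — `while i < limit` would read far past the buffer
```

Candidate 1 reproduces the fault address bit-for-bit, which is why it stays the leading hypothesis until RIP evidence says otherwise.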
## Mitigation (shipped)
Rather than chase a production hang we can’t reproduce locally, we added belt-and-suspenders uptime plumbing that makes the bug invisible to users:
- `RuntimeMaxSec=45min` on `quartz-unikernel.service`. Forces systemd
  to cycle the guest every 45 min — well under the 1h24m observed hang
  threshold. Combined with `Restart=always`, the unikernel is never
  allowed to run long enough to hit the PF in production. Browsers see
  a sub-second reconnect at each cycle.
- `quartz-unikernel-healthcheck.timer` (every 2 min). Curls `/health`
  from the host; on two consecutive failures (covering the mid-restart
  window) it runs `systemctl restart`. Defense-in-depth in case a hang
  happens within a 45-min cycle window; catches it faster than
  `RuntimeMaxSec` would.
- Enhanced `page_fault_isr` (commit TBD) now prints RIP, CS, RFLAGS,
  RSP, and SS in addition to CR2 + error code. Next time the PF fires
  (e.g., during a future long test run with the `RuntimeMaxSec` cycle
  disabled), the log will tell us exactly which instruction address
  did the bad dereference. `addr2line` against
  `tmp/baremetal/quartz-unikernel.elf` will map RIP to a source line,
  and we can finish the diagnosis in ten minutes.
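For reference, the two service-side settings amount to a drop-in along these lines — a sketch only; the actual override file isn’t recorded in this doc:

```ini
# /etc/systemd/system/quartz-unikernel.service.d/override.conf (sketch)
[Service]
RuntimeMaxSec=45min
Restart=always
```

systemd treats the `RuntimeMaxSec` expiry as a failure, so `Restart=always` picks the unit straight back up.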
## How to reproduce for investigation
The cycle timer prevents natural repro. To trigger the bug deliberately:
```sh
# On the VPS — disable the cycle timer for a focused diagnostic run:
ssh mattkelly.io 'systemctl edit quartz-unikernel --runtime'
# In the editor: drop in a [Service] override with an empty
# RuntimeMaxSec= (clears the inherited value), save and exit, then:
ssh mattkelly.io 'systemctl restart quartz-unikernel'

# From the host — hammer /api/stats.json for ~90 min:
seq 1 1000000 | xargs -P 50 -I{} curl -sk --max-time 8 -o /dev/null \
  https://mattkelly.io/api/stats.json

# When the PF prints, read the full diagnostic:
ssh mattkelly.io 'grep -A3 "PF at" /var/log/quartz-unikernel.log'
# Expect output like:
#   PF at 0xffffffffffffff88 err=0x0000000000000000
#   RIP=0x0000000000123456 CS=0x0000000000000008
#   RFL=0x0000000000010206 RSP=0x... SS=0x...

# Then on the host (needs llvm-addr2line or objdump):
addr2line -e tmp/baremetal/quartz-unikernel.elf 0x123456
# → should name the faulting function + line
```
## Regression lock
When the PF is root-caused and fixed, add a test that runs 2 000 000+ HTTP requests through QEMU in CI (or a scripted local loop) without a reset, and asserts:

- `tx_stalls = 0`
- process RSS stable
- `/health` responsive throughout
- serial log contains no `PF at` message
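The serial-log half of those assertions is trivial to script. A minimal Python sketch — the log format is assumed from the excerpts in this doc:

```python
import re

def check_serial_log(text: str) -> None:
    """Fail the soak run if the serial log shows a PF or a TX stall."""
    assert "PF at" not in text, "page fault during soak run"
    m = re.search(r"tx_stalls=(\d+)", text)
    if m:  # counter line may be absent if stats weren't printed
        assert int(m.group(1)) == 0, "TX stall during soak run"

# Healthy log fragment passes:
check_serial_log("[rx: u=10715 c=76250 len=680]\ntx_stalls=0\n")
print("soak log clean")
```

RSS stability and `/health` responsiveness need host-side polling and are left to the CI harness.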
## References
- `docs/bugs/UNIKERNEL_TX_IDX_WRAP_HANG.md` — the sibling bug fixed in
  commit 2a066cb2 that this PF exposed (previously masked by the
  10-min virtio-wrap hang).
- `docs/bugs/UNIKERNEL_TX_STALL_209KB.md` — third open issue (TX stall
  at 209 KB asset boundary; DEF-D in KERNEL_EPIC).