Quartz Benchmark Methodology
Version: 2.2 | Date: Feb 28, 2026
Suite: 11 benchmark pairs (Quartz vs C) + infrastructure
Runner: benchmarks/run.sh with optional hyperfine; use quake bench_compile for compile-only
Overview
This document describes the methodology, design principles, and known limitations of
the Quartz benchmark suite. The goal is honest, reproducible measurement that
fairly compares Quartz to C, accounting for Quartz’s existential type model (everything
is i64 at runtime).
Benchmark Inventory
| # | Benchmark | Stress Target | QZ Source | C Source | Fair? |
|---|---|---|---|---|---|
| 1 | fibonacci | Function call overhead | fibonacci/fib.qz | fibonacci/fib.c | ✅ |
| 2 | sum | Basic loop, accumulator | sum/sum.qz | sum/sum.c | ✅ |
| 3 | sieve | Array access, loops | sieve/sieve.qz | sieve/sieve.c | ✅ |
| 4 | matrix | Nested loops, indexing | matrix/matrix.qz | matrix/matrix.c | ✅ |
| 5 | string_concat | String allocation | string_concat/concat.qz | string_concat/concat.c | ✅ |
| 6 | binary_trees | Allocation, recursion (malloc) | binary_trees/trees.qz | binary_trees/trees.c | ✅ |
| 7 | binary_trees_bump | Allocation, recursion (bump alloc) | binary_trees/trees_bump.qz | binary_trees/trees_bump.c | ✅ |
| 8 | nbody | Integer arithmetic, arrays | nbody/nbody.qz | nbody/nbody.c | ✅ |
| 9 | hash_map_bench | Hash table, string ops | hash_map_bench.qz | c/hash_map_bench.c | ✅ |
| 10 | linked_list | Pointer manipulation | linked_list.qz | c/linked_list.c | ✅ |
| 11 | json_parse | String parsing, tokenization | json_parse/json_parse.qz | c/json_parse.c | ✅ |
All benchmarks use equivalent algorithms in both languages. The C versions use
long (64-bit on LP64 platforms such as macOS and Linux) to match Quartz’s universal i64.
Fairness Principles
- Same algorithm: Both implementations use identical algorithmic approaches
- Same data types: C uses long (not int) to match Quartz’s i64
- Same allocator: malloc/free in both (unless explicitly comparing allocators)
- Same optimization: Both compiled with -O2 (Quartz via llc -O2, C via clang -O2)
- Same I/O: Both print results to verify correctness (prevents dead code elimination)
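The data-type and I/O principles can be illustrated with a minimal C sketch. The kernel `sum_below` and the helper `report` are hypothetical names chosen for this example, not code from the benchmark sources. The static assertion encodes the assumption that `long` is 64 bits wide, which holds on LP64 platforms but not on 64-bit Windows.

```c
#include <limits.h>
#include <stdio.h>

/* Assumption: long is 64 bits wide, matching Quartz's i64. */
_Static_assert(sizeof(long) * CHAR_BIT == 64,
               "long must be 64-bit to match Quartz's i64");

/* Hypothetical kernel in the style of a loop/accumulator benchmark. */
long sum_below(long n) {
    long acc = 0;
    for (long i = 0; i < n; i++) {
        acc += i;
    }
    return acc;
}

/* Printing the result makes it observable, so the compiler cannot
 * discard the computation as dead code. Returns printf's character count. */
int report(const char *name, long result) {
    return printf("%s: %ld\n", name, result);
}
```

Printing keeps the result live, but it does not stop an optimizer from simplifying a loop this regular, so the kernel itself still has to do non-trivial work.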
Known Fairness Issues (Resolved)
- Hash map: C version now uses dynamic resizing at 75% load + backward-shift deletion (not a trivial open-addressing stub)
- JSON parse: C equivalent now exists (c/json_parse.c) for fair comparison
- Linked list: Quartz version fixed to free deleted nodes (no memory leak)
- Binary trees: Both malloc and bump-allocator variants provided
Compilation
Quartz Compilation Pipeline
# Step 1: Quartz → LLVM IR
./self-hosted/bin/quartz program.qz > program.ll
# Step 2: LLVM IR → object file (with optimization)
llc -O2 -filetype=obj -o program.o program.ll
# Step 3: Object file → native binary
clang -O2 program.o -o program -lm
Note: clang -O2 -x ir is NOT used because it miscompiles the self-hosted compiler
IR (SIGSEGV on self-compile). llc -O2 provides codegen-level optimization only.
C Compilation
clang -O2 -o program program.c -lm
Optimization Equivalence
| Stage | Quartz | C |
|---|---|---|
| Frontend optimization | None (Quartz emits unoptimized IR) | clang frontend optimizations |
| IR optimization | None (llc doesn’t run opt passes) | clang -O2 runs full LLVM opt pipeline |
| Codegen optimization | llc -O2 (instruction selection, scheduling, register allocation) | Same backend (clang runs LLVM’s code generator in-process rather than invoking llc) |
[!IMPORTANT] Quartz’s LLVM IR does not go through opt. C code compiled with clang -O2 benefits from the full LLVM optimization pipeline (inlining, loop unrolling, vectorization, etc.). This means C benchmarks may have an optimization advantage beyond algorithmic differences. This is intentional: it represents real-world compilation.
Measurement
Runtime Measurement
Primary method: hyperfine (if installed)
- Warmup: 1 run
- Measured runs: ≥5
- Reports: mean ± stddev, min, max, median
Fallback method: time (built-in)
- Reports: real (wall clock), user (CPU), sys (kernel)
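To make the reported statistics concrete, here is a small C sketch that summarizes a set of run times. It is not part of benchmarks/run.sh or hyperfine; `Summary` and `summarize` are hypothetical names. It reports the sample variance (the standard deviation is its square root), which is an assumption about the estimator rather than a statement of how hyperfine computes it.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct { double mean, median, min, max, variance; } Summary;

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts runs in place. Assumes n >= 2 so the sample variance is defined. */
Summary summarize(double *runs, size_t n) {
    qsort(runs, n, sizeof runs[0], cmp_double);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += runs[i];
    double mean = sum / (double)n;

    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = runs[i] - mean;
        sq += d * d;
    }

    Summary s;
    s.mean = mean;
    s.median = (n % 2) ? runs[n / 2]
                       : (runs[n / 2 - 1] + runs[n / 2]) / 2.0;
    s.min = runs[0];
    s.max = runs[n - 1];
    s.variance = sq / (double)(n - 1); /* sample variance */
    return s;
}
```

With only five measured runs, the median and min are usually more robust than the mean against a single slow outlier, which is one reason to report all of them.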
Memory Measurement
Peak RSS (Resident Set Size) measured via /usr/bin/time -l on macOS:
- Reports maximum resident set size in bytes
- Converted to KB in output
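The same peak-RSS counter can be read from inside a process with `getrusage`. The sketch below is an illustration, not part of the benchmark runner, and `peak_rss_kb` is a hypothetical helper. It shows why the unit conversion matters: `ru_maxrss` is in bytes on macOS but in kilobytes on Linux.

```c
#include <sys/resource.h>

/* Returns this process's peak resident set size in KB, or -1 on failure. */
long peak_rss_kb(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return -1;
    }
#ifdef __APPLE__
    return ru.ru_maxrss / 1024; /* macOS reports bytes */
#else
    return ru.ru_maxrss;        /* Linux reports kilobytes */
#endif
}
```

A harness that launches each benchmark as a child process would query `RUSAGE_CHILDREN` after waiting for the child instead of `RUSAGE_SELF`.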
Running
# Full suite (compile + run)
bash benchmarks/run.sh
# With hyperfine
bash benchmarks/run.sh --hyperfine
# Compile only (no execution)
bash benchmarks/run.sh --compile-only
# Via Quake
./self-hosted/bin/quake bench_compile
Hardware Specs Template
When reporting benchmark results, include:
Machine: [model]
CPU: [model, cores, frequency]
RAM: [size, type]
OS: [version]
Quartz: [version, git hash]
LLVM: [version]
C compiler: [clang/gcc version]
Known Limitations
The i64-Everywhere Model
Quartz uses i64 universally at runtime. This means:
- No narrow integer optimizations (can’t pack arrays of u8 tighter than i64 unless using Vec<U8>)
- No register-width flexibility (everything is 64-bit)
- Struct fields are all i64 (no struct packing optimization)
The sieve benchmark uses Vec<U8> (1 byte per element, introduced in Memory Model V2)
to demonstrate that Quartz CAN use narrow storage when the programmer opts in.
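The storage difference is easy to see from the C side. The sketch below is not the code in sieve/sieve.c; `count_primes` and `flag_bytes` are hypothetical names. It uses one byte per flag, the layout that corresponds to opting in to narrow storage, and `flag_bytes` shows what the same array would cost at 8 bytes per element.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Counts primes <= n using one byte per flag.
 * Returns -1 if the flag array cannot be allocated. */
long count_primes(long n) {
    if (n < 2) return 0;
    uint8_t *composite = calloc((size_t)n + 1, sizeof *composite);
    if (!composite) return -1;

    long count = 0;
    for (long i = 2; i <= n; i++) {
        if (composite[i]) continue;
        count++;
        if (i > n / i) continue; /* i * i would exceed n (and could overflow) */
        for (long j = i * i; j <= n; j += i) {
            composite[j] = 1;
        }
    }
    free(composite);
    return count;
}

/* Bytes needed for n + 1 flags at a given element size. */
size_t flag_bytes(long n, size_t elem_size) {
    return ((size_t)n + 1) * elem_size;
}
```

For a sieve up to one million, `flag_bytes` gives roughly 1 MB with `uint8_t` elements and roughly 8 MB with `int64_t` elements, which changes how much of the working set fits in cache.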
No Floating-Point Benchmarks
The nbody benchmark uses integer arithmetic (scaled by 1000) instead of doubles,
because Quartz’s existential model makes float benchmarks tricky — floats are bitcast
to/from i64 at every operation boundary, adding overhead that’s inherent to the model
but not representative of algorithmic performance.
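Both halves of that trade-off can be sketched in C. The helpers below are hypothetical and are not taken from nbody/nbody.c or from Quartz's generated IR. `fx_mul` and `fx_div` show fixed-point arithmetic at a scale of 1000, and `f64_add_boxed` models, as an assumption about the general shape, a float addition whose operands and result each live in a 64-bit integer slot.

```c
#include <stdint.h>
#include <string.h>

#define SCALE 1000 /* 1.0 is stored as 1000 */

/* Fixed-point multiply and divide: rescale after (or before) the
 * integer operation so the result stays at the same scale. */
long fx_mul(long a, long b) { return a * b / SCALE; }
long fx_div(long a, long b) { return a * SCALE / b; }

/* Reinterpret the bits of a double as a 64-bit integer and back.
 * memcpy is the portable way to express a bitcast in C. */
int64_t f64_to_bits(double d) {
    int64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

double bits_to_f64(int64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* One float add when values are carried as i64: unpack both operands,
 * add as doubles, repack the result. */
int64_t f64_add_boxed(int64_t a, int64_t b) {
    return f64_to_bits(bits_to_f64(a) + bits_to_f64(b));
}
```

For example, `fx_mul(1500, 2000)` represents 1.5 × 2.0 and yields 3000, the fixed-point encoding of 3.0. The integer version needs no conversions at all, which is why it isolates algorithmic cost better than the boxed float path.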
Single-Threaded Only
All benchmarks are single-threaded. Quartz has concurrency primitives (spawn, await,
channels) but no parallel benchmark has been created yet.
Missing Benchmark Categories
| Category | Description | Status |
|---|---|---|
| Struct-heavy | Create/access/destroy thousands of structs | Planned |
| Bare-metal | No stdlib, direct syscalls | Blocked on --target freestanding |
| Sorting | QuickSort, MergeSort on large arrays | Easy to add |
| Graph algorithms | BFS/DFS on adjacency lists | Easy to add |
| Regex | Pattern matching on large text | Requires PCRE2 runtime |
Adding a New Benchmark
- Create benchmarks/<name>/<name>.qz with def main(): Int returning 0
- Create benchmarks/<name>/<name>.c (or benchmarks/c/<name>.c) with equivalent C
- Add entry to benchmarks/run.sh BENCHMARKS array
- Verify both compile and produce identical output
- Run bash benchmarks/run.sh --hyperfine to validate measurement
Test That Output Matches
diff <(./build/<name>_qz) <(./build/<name>_c)
Interpretation Guidelines
- Equal times = Quartz generates comparable code to C for this workload
- Quartz slower = likely due to:
  - Missing LLVM opt passes (inlining, vectorization)
  - Runtime overhead (tagged pointers for closures, bounds checks on Vec)
  - The i64-everywhere model (no narrow types, bitcast overhead for floats)
- C slower = unlikely, but could happen if Quartz’s algorithm is simpler (this would indicate the benchmark isn’t fair; fix the C version)
The benchmarks represent what Quartz can achieve today. As the compiler adds optimization passes (P.5, inlining, loop unrolling), these numbers should improve.