Quartz Benchmark Methodology

Version: 2.2 | Date: Feb 28, 2026
Suite: 11 benchmark pairs (Quartz vs C) + infrastructure
Runner: benchmarks/run.sh with optional hyperfine; use quake bench_compile for compile-only

Overview

This document describes the methodology, design principles, and known limitations of the Quartz benchmark suite. The goal is honest, reproducible measurement that fairly compares Quartz to C, accounting for Quartz’s existential type model (everything is i64 at runtime).

Benchmark Inventory

#	Benchmark	Stress Target	QZ Source	C Source	Fair?
1	`fibonacci`	Function call overhead	`fibonacci/fib.qz`	`fibonacci/fib.c`	✅
2	`sum`	Basic loop, accumulator	`sum/sum.qz`	`sum/sum.c`	✅
3	`sieve`	Array access, loops	`sieve/sieve.qz`	`sieve/sieve.c`	✅
4	`matrix`	Nested loops, indexing	`matrix/matrix.qz`	`matrix/matrix.c`	✅
5	`string_concat`	String allocation	`string_concat/concat.qz`	`string_concat/concat.c`	✅
6	`binary_trees`	Allocation, recursion (malloc)	`binary_trees/trees.qz`	`binary_trees/trees.c`	✅
7	`binary_trees_bump`	Allocation, recursion (bump alloc)	`binary_trees/trees_bump.qz`	`binary_trees/trees_bump.c`	✅
8	`nbody`	Integer arithmetic, arrays	`nbody/nbody.qz`	`nbody/nbody.c`	✅
9	`hash_map_bench`	Hash table, string ops	`hash_map_bench.qz`	`c/hash_map_bench.c`	✅
10	`linked_list`	Pointer manipulation	`linked_list.qz`	`c/linked_list.c`	✅
11	`json_parse`	String parsing, tokenization	`json_parse/json_parse.qz`	`c/json_parse.c`	✅

All benchmarks use equivalent algorithms in both languages. The C versions use long, which is 64-bit on LP64 platforms such as macOS and Linux, to match Quartz’s universal i64.
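
The width of long can be confirmed on the machine running the suite before trusting the comparison. The following is a minimal sketch, not part of benchmarks/run.sh; it assumes getconf is available, as it is on macOS and most Linux distributions.

```shell
#!/usr/bin/env bash
# Sanity check (not part of run.sh): report the width of C `long` in bits.
long_bits=$(getconf LONG_BIT)
if [ "$long_bits" -eq 64 ]; then
    echo "long is 64-bit: matches Quartz i64"
else
    # On such a platform the C sources would need int64_t to match i64.
    echo "long is ${long_bits}-bit: does not match Quartz i64" >&2
fi
```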

Fairness Principles

Same algorithm: Both implementations use identical algorithmic approaches
Same data types: C uses long (not int) to match Quartz’s i64
Same allocator: malloc/free in both (unless explicitly comparing allocators)
Same optimization level: Both compiled at -O2 (Quartz via llc -O2, C via clang -O2); see Optimization Equivalence for what each flag actually covers
Same I/O: Both print results to verify correctness (prevents dead code elimination)

Known Fairness Issues (Resolved)

Hash map: C version now uses dynamic resizing at 75% load + backward-shift deletion (not a trivial open-addressing stub)
JSON parse: C equivalent now exists (c/json_parse.c) for fair comparison
Linked list: Quartz version fixed to free deleted nodes (no memory leak)
Binary trees: Both malloc and bump-allocator variants provided

Compilation

Quartz Compilation Pipeline

# Step 1: Quartz → LLVM IR
./self-hosted/bin/quartz program.qz > program.ll

# Step 2: LLVM IR → object file (with optimization)
llc -O2 -filetype=obj -o program.o program.ll

# Step 3: Object file → native binary
clang -O2 program.o -o program -lm

Note: clang -O2 -x ir is NOT used because it miscompiles the self-hosted compiler IR (SIGSEGV on self-compile). llc -O2 provides codegen-level optimization only.
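
The three steps above can be wrapped into one helper so that every benchmark is built the same way. This is only a sketch under stated assumptions: build_qz, run_step, and the QUARTZ and DRY_RUN variables are hypothetical names chosen for illustration, not options of benchmarks/run.sh, and the helper does not handle paths containing spaces.

```shell
#!/usr/bin/env bash
# Sketch: run the Quartz -> LLVM IR -> object -> binary pipeline for one file.
# QUARTZ and DRY_RUN are hypothetical knobs, not options of run.sh.
QUARTZ="${QUARTZ:-./self-hosted/bin/quartz}"

# Echo a step, then execute it unless DRY_RUN=1.
run_step() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || eval "$*"
}

build_qz() {
    local src="$1"
    local out="${src%.qz}"   # strip the .qz suffix to get the output stem
    run_step "$QUARTZ $src > $out.ll" &&
    run_step "llc -O2 -filetype=obj -o $out.o $out.ll" &&
    run_step "clang -O2 $out.o -o $out -lm"
}
```

Calling `DRY_RUN=1 build_qz benchmarks/fibonacci/fib.qz` prints the three commands without running them, which is a cheap way to check the flags before a long build. The path is illustrative.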

C Compilation

clang -O2 -o program program.c -lm

Optimization Equivalence

Stage	Quartz	C
Frontend optimization	None (Quartz emits unoptimized IR)	clang frontend optimizations
IR optimization	None (`llc` does not run the `opt` mid-level pass pipeline)	clang `-O2` runs full LLVM opt pipeline
Codegen optimization	`llc -O2` (instruction selection, scheduling, register allocation)	Same backend passes (clang drives the same LLVM code generator in-process; it does not invoke `llc`)

[!IMPORTANT] Quartz’s LLVM IR does not go through opt. C code compiled with clang -O2 benefits from the full LLVM optimization pipeline (inlining, loop unrolling, vectorization, etc.), so the C benchmarks can have an optimization advantage even though the algorithms are identical. This is intentional: it reflects how each language is actually compiled today.

Measurement

Runtime Measurement

Primary method: hyperfine (if installed)

Warmup: 1 run
Measured runs: ≥5
Reports: mean ± stddev, min, max, median

Fallback method: time (built-in)

Reports: real (wall clock), user (CPU), sys (kernel)
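
The primary and fallback methods can be combined in a small wrapper. This is a sketch of the idea, not the code in benchmarks/run.sh; measure is a hypothetical helper name. The hyperfine flags --warmup and --min-runs correspond to the warmup and run counts listed above.

```shell
#!/usr/bin/env bash
# Sketch: time one command with hyperfine when installed, else bash's `time`.
# `measure` is a hypothetical helper name, not a function in run.sh.
measure() {
    if command -v hyperfine >/dev/null 2>&1; then
        # 1 warmup run, at least 5 measured runs.
        hyperfine --warmup 1 --min-runs 5 "$*"
    else
        # bash keyword; prints real/user/sys to stderr.
        time "$@"
    fi
}
```

For example, `measure ./build/<name>_qz` followed by `measure ./build/<name>_c` gives one timing per implementation.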

Memory Measurement

Peak RSS (Resident Set Size) measured via /usr/bin/time -l on macOS:

Reports maximum resident set size in bytes
Converted to KB in output
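
The extraction and conversion can be sketched as a one-line filter. rss_kb is a hypothetical helper name, and dividing by 1024 (rather than 1000) is an assumption, since the methodology only states that bytes are converted to KB. The pattern matches the macOS report format; the output of time on other systems differs.

```shell
#!/usr/bin/env bash
# Sketch: read `/usr/bin/time -l` output (macOS format) on stdin and print
# peak RSS in KB. The 1024 divisor is an assumption.
rss_kb() {
    awk '/maximum resident set size/ { printf "%d\n", $1 / 1024 }'
}

# Usage on macOS (time -l writes its report to stderr, so stderr is piped
# and the benchmark's own stdout is discarded):
#   /usr/bin/time -l ./build/<name>_qz 2>&1 >/dev/null | rss_kb
```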

Running

# Full suite (compile + run)
bash benchmarks/run.sh

# With hyperfine
bash benchmarks/run.sh --hyperfine

# Compile only (no execution)
bash benchmarks/run.sh --compile-only

# Via Quake
./self-hosted/bin/quake bench_compile

Hardware Specs Template

When reporting benchmark results, include:

Machine:      [model]
CPU:          [model, cores, frequency]
RAM:          [size, type]
OS:           [version]
Quartz:       [version, git hash]
LLVM:         [version]
C compiler:   [clang/gcc version]
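
Several of these fields can be collected automatically. The sketch below is one possible approach, not an existing script in the repository: first_match and report_specs are hypothetical names, the sysctl key is macOS-specific, the OS line reports the kernel name and release rather than the marketing version, and the Quartz line assumes the script runs inside the Quartz git checkout. Machine and RAM are left to fill in by hand.

```shell
#!/usr/bin/env bash
# Sketch: print the first output line matching a pattern, or "unknown"
# when the tool is missing or prints nothing that matches.
first_match() {
    local pattern="$1"; shift
    local line
    line=$("$@" 2>/dev/null | grep -m 1 -- "$pattern" | sed 's/^ *//')
    echo "${line:-unknown}"
}

report_specs() {
    printf 'CPU:          %s\n' "$(first_match . sysctl -n machdep.cpu.brand_string)"
    printf 'OS:           %s\n' "$(first_match . uname -sr)"
    printf 'Quartz:       %s\n' "$(first_match . git rev-parse --short HEAD)"
    printf 'LLVM:         %s\n' "$(first_match 'LLVM version' llc --version)"
    printf 'C compiler:   %s\n' "$(first_match . clang --version)"
}

report_specs
```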

Known Limitations

The i64-Everywhere Model

Quartz uses i64 universally at runtime. This means:

No narrow integer optimizations (can’t pack arrays of u8 tighter than i64 unless using Vec<U8>)
No register-width flexibility (everything is 64-bit)
Struct fields are all i64 (no struct packing optimization)

The sieve benchmark uses Vec<U8> (1 byte per element, introduced in Memory Model V2) to demonstrate that Quartz CAN use narrow storage when the programmer opts in.

No Floating-Point Benchmarks

The nbody benchmark uses integer arithmetic (scaled by 1000) instead of doubles. Under Quartz’s existential model, floats are bitcast to and from i64 at every operation boundary. That overhead is inherent to the model but not representative of algorithmic performance, so a float benchmark would conflate the two.

Single-Threaded Only

All benchmarks are single-threaded. Quartz has concurrency primitives (spawn, await, channels) but no parallel benchmark has been created yet.

Missing Benchmark Categories

Category	Description	Status
Struct-heavy	Create/access/destroy thousands of structs	Planned
Bare-metal	No stdlib, direct syscalls	Blocked on `--target freestanding`
Sorting	QuickSort, MergeSort on large arrays	Easy to add
Graph algorithms	BFS/DFS on adjacency lists	Easy to add
Regex	Pattern matching on large text	Requires PCRE2 runtime

Adding a New Benchmark

Create benchmarks/<name>/<name>.qz with def main(): Int returning 0
Create benchmarks/<name>/<name>.c (or benchmarks/c/<name>.c) with equivalent C
Add entry to benchmarks/run.sh BENCHMARKS array
Verify both compile and produce identical output
Run bash benchmarks/run.sh --hyperfine to validate measurement

Test That Output Matches

diff <(./build/<name>_qz) <(./build/<name>_c)
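
The same comparison can be wrapped so that it reports a clear pass or fail and returns a usable exit status. This is a sketch; outputs_match is a hypothetical helper, not part of benchmarks/run.sh, and it needs bash because it relies on process substitution.

```shell
#!/usr/bin/env bash
# Sketch: compare the stdout of two programs and report the result.
# `outputs_match` is a hypothetical helper, not part of run.sh.
outputs_match() {
    if diff <("$1") <("$2") >/dev/null; then
        echo "OK: outputs match"
    else
        echo "MISMATCH: $1 vs $2" >&2
        return 1
    fi
}
```

Typical use is `outputs_match ./build/<name>_qz ./build/<name>_c`; a non-zero status means the pair should not be timed until the outputs agree.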

Interpretation Guidelines

Equal times = Quartz generates comparable code to C for this workload
Quartz slower = likely due to:
- Missing LLVM opt passes (inlining, vectorization)
- Runtime overhead (tagged pointers for closures, bounds checks on Vec)
- The i64-everywhere model (no narrow types, bitcast overhead for floats)
C slower = unlikely, but could happen if Quartz’s algorithm is simpler (would indicate the benchmark isn’t fair — fix the C version)

The benchmarks represent what Quartz can achieve today. As the compiler adds optimization passes (P.5, inlining, loop unrolling), these numbers should improve.