Quartz v5.25

Quartz Benchmark Methodology

Version: 2.2 | Date: Feb 28, 2026
Suite: 11 benchmark pairs (Quartz vs C) + infrastructure
Runner: benchmarks/run.sh with optional hyperfine; use quake bench_compile for compile-only runs


Overview

This document describes the methodology, design principles, and known limitations of the Quartz benchmark suite. The goal is honest, reproducible measurement that fairly compares Quartz to C, accounting for Quartz’s existential type model (everything is i64 at runtime).


Benchmark Inventory

| # | Benchmark | Stress Target | QZ Source | C Source |
|---|-----------|---------------|-----------|----------|
| 1 | fibonacci | Function call overhead | fibonacci/fib.qz | fibonacci/fib.c |
| 2 | sum | Basic loop, accumulator | sum/sum.qz | sum/sum.c |
| 3 | sieve | Array access, loops | sieve/sieve.qz | sieve/sieve.c |
| 4 | matrix | Nested loops, indexing | matrix/matrix.qz | matrix/matrix.c |
| 5 | string_concat | String allocation | string_concat/concat.qz | string_concat/concat.c |
| 6 | binary_trees | Allocation, recursion (malloc) | binary_trees/trees.qz | binary_trees/trees.c |
| 7 | binary_trees_bump | Allocation, recursion (bump alloc) | binary_trees/trees_bump.qz | binary_trees/trees_bump.c |
| 8 | nbody | Integer arithmetic, arrays | nbody/nbody.qz | nbody/nbody.c |
| 9 | hash_map_bench | Hash table, string ops | hash_map_bench.qz | c/hash_map_bench.c |
| 10 | linked_list | Pointer manipulation | linked_list.qz | c/linked_list.c |
| 11 | json_parse | String parsing, tokenization | json_parse/json_parse.qz | c/json_parse.c |

All benchmarks use equivalent algorithms in both languages. The C versions use long to match Quartz’s universal i64; long is 64-bit on LP64 platforms such as macOS and Linux (but 32-bit on 64-bit Windows).

Fairness Principles

  1. Same algorithm: Both implementations use identical algorithmic approaches
  2. Same data types: C uses long (not int) to match Quartz’s i64
  3. Same allocator: malloc/free in both (unless explicitly comparing allocators)
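  4. Same optimization level: Both compiled at -O2 (Quartz via llc -O2, C via clang -O2). Only the C side also runs the LLVM IR optimization pipeline; see Optimization Equivalence below.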
  5. Same I/O: Both print results to verify correctness (prevents dead code elimination)

Known Fairness Issues (Resolved)

  • Hash map: C version now uses dynamic resizing at 75% load + backward-shift deletion (not a trivial open-addressing stub)
  • JSON parse: C equivalent now exists (c/json_parse.c) for fair comparison
  • Linked list: Quartz version fixed to free deleted nodes (no memory leak)
  • Binary trees: Both malloc and bump-allocator variants provided

Compilation

Quartz Compilation Pipeline

```bash
# Step 1: Quartz → LLVM IR
./self-hosted/bin/quartz program.qz > program.ll

# Step 2: LLVM IR → object file (with optimization)
llc -O2 -filetype=obj -o program.o program.ll

# Step 3: Object file → native binary
clang -O2 program.o -o program -lm
```

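Note: the IR is NOT compiled with clang -O2 -x ir (which would run the full LLVM optimization pipeline on it), because that miscompiles the self-hosted compiler’s IR: the resulting compiler crashes with SIGSEGV when compiling itself. llc -O2 applies codegen-level optimization only.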

C Compilation

```bash
clang -O2 -o program program.c -lm
```

Optimization Equivalence

| Stage | Quartz | C |
|-------|--------|---|
| Frontend optimization | None (Quartz emits unoptimized IR) | clang frontend optimizations |
| IR optimization | None (llc does not run opt passes) | clang -O2 runs the full LLVM opt pipeline |
| Codegen optimization | llc -O2 (instruction selection, scheduling, register allocation) | Same LLVM backend at -O2 (clang runs it in-process rather than invoking llc) |

> [!IMPORTANT]
> Quartz’s LLVM IR is not run through opt. C code compiled with clang -O2 gets the full LLVM optimization pipeline (inlining, loop unrolling, vectorization, etc.), so C benchmarks may carry an optimization advantage on top of any algorithmic differences. This is intentional: it reflects how each language is actually compiled today.


Measurement

Runtime Measurement

Primary method: hyperfine (if installed)

  • Warmup: 1 run
  • Measured runs: ≥5
  • Reports: mean ± stddev, min, max, median

Fallback method: time (built-in)

  • Reports: real (wall clock), user (CPU), sys (kernel)

Memory Measurement

Peak RSS (Resident Set Size) measured via /usr/bin/time -l on macOS:

  • Reports maximum resident set size in bytes
  • Converted to KB in output

Running

```bash
# Full suite (compile + run)
bash benchmarks/run.sh

# With hyperfine
bash benchmarks/run.sh --hyperfine

# Compile only (no execution)
bash benchmarks/run.sh --compile-only

# Via Quake
./self-hosted/bin/quake bench_compile
```

Hardware Specs Template

When reporting benchmark results, include:

```
Machine:      [model]
CPU:          [model, cores, frequency]
RAM:          [size, type]
OS:           [version]
Quartz:       [version, git hash]
LLVM:         [version]
C compiler:   [clang/gcc version]
```

Known Limitations

The i64-Everywhere Model

Quartz uses i64 universally at runtime. This means:

  • No narrow integer optimizations (can’t pack arrays of u8 tighter than i64 unless using Vec<U8>)
  • No register-width flexibility (everything is 64-bit)
  • Struct fields are all i64 (no struct packing optimization)

The sieve benchmark uses Vec<U8> (1 byte per element, introduced in Memory Model V2) to demonstrate that Quartz CAN use narrow storage when the programmer opts in.

No Floating-Point Benchmarks

The nbody benchmark uses integer arithmetic (fixed-point, scaled by 1000) instead of doubles. Under Quartz’s existential model, floats are bitcast to and from i64 at every operation boundary; that overhead is inherent to the model but not representative of algorithmic performance, so float benchmarks would be hard to interpret.

Single-Threaded Only

All benchmarks are single-threaded. Quartz has concurrency primitives (spawn, await, channels) but no parallel benchmark has been created yet.

Missing Benchmark Categories

| Category | Description | Status |
|----------|-------------|--------|
| Struct-heavy | Create/access/destroy thousands of structs | Planned |
| Bare-metal | No stdlib, direct syscalls | Blocked on --target freestanding |
| Sorting | QuickSort, MergeSort on large arrays | Easy to add |
| Graph algorithms | BFS/DFS on adjacency lists | Easy to add |
| Regex | Pattern matching on large text | Requires PCRE2 runtime |

Adding a New Benchmark

  1. Create benchmarks/<name>/<name>.qz with a def main(): Int that returns 0
  2. Create benchmarks/<name>/<name>.c (or benchmarks/c/<name>.c) with equivalent C
  3. Add entry to benchmarks/run.sh BENCHMARKS array
  4. Verify both compile and produce identical output
  5. Run bash benchmarks/run.sh --hyperfine to validate measurement

Test That Output Matches

```bash
diff <(./build/<name>_qz) <(./build/<name>_c)
```

Interpretation Guidelines

  1. Equal times = Quartz generates comparable code to C for this workload
  2. Quartz slower = likely due to:
    • Missing LLVM opt passes (inlining, vectorization)
    • Runtime overhead (tagged pointers for closures, bounds checks on Vec)
    • The i64-everywhere model (no narrow types, bitcast overhead for floats)
  3. C slower = unlikely, but could happen if Quartz’s algorithm is simpler (this would indicate the benchmark isn’t fair; fix the C version)

The benchmarks represent what Quartz can achieve today. As the compiler adds optimization passes (P.5, inlining, loop unrolling), these numbers should improve.