Bytes — Binary Data Type
Quartz’s Bytes type provides first-class binary data handling with ergonomic construction, structured parsing, and (planned) pattern matching.
Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Naming | Bytes | Concrete and unambiguous. Data is too abstract; Buffer implies mutability |
| Endianness | Big-endian default | Network byte order. Most binary data is protocols/file formats |
| String interop | Always copy | Different memory layouts (String = char*, Bytes = Vec header). Different invariants (String = valid text, Bytes = arbitrary octets) |
| Mutability | Immutable output | Bytes.build {} is mutable during construction; the result is immutable. Aligns with the future let/var split |
| Representation | Vec<Int>-backed | Each byte stored as i64 in Vec data array. Reuses all existing Vec infrastructure including LS.5 slicing and LS.6 custom indexing |
Layer 1: Bytes Type
Construction
# Empty
var data = Bytes.new()
# With capacity hint
var data = Bytes.new(1024)
# From string (copies UTF-8 bytes)
var data = Bytes.from("hello") # [104, 101, 108, 108, 111]
# From array (each value clamped to 0-255)
var data = Bytes.from([0x48, 0x65, 0x6C])
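The construction rules above can be modeled outside Quartz. The following is a minimal Python sketch, not the Quartz implementation: `bytes_from` is a hypothetical helper, and the clamping behavior is taken from the comment on the array constructor.

```python
def bytes_from(source):
    """Hypothetical model of Bytes.from: accepts a str or a list of ints."""
    if isinstance(source, str):
        # Strings contribute their UTF-8 encoding, one entry per byte.
        return list(source.encode("utf-8"))
    # Out-of-range integers are clamped to the nearest bound (0 or 255).
    return [min(255, max(0, value)) for value in source]


hello = bytes_from("hello")
clamped = bytes_from([0x48, 300, -5])
print(hello)    # [104, 101, 108, 108, 111]
print(clamped)  # [72, 255, 0]
```

Note that clamping differs from masking: 300 becomes 255 under clamping, whereas `300 & 0xFF` would give 44.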
Access
data.size # number of bytes
data[0] # single byte (0-255) via custom index
data[1..4] # slice → new Bytes, via sliceable trait
Positional Readers
Read typed values at a specific byte offset without advancing a cursor:
data.read_u8(at: 0) # unsigned 8-bit
data.read_u16be(at: 0) # unsigned 16-bit, big-endian
data.read_u16le(at: 0) # unsigned 16-bit, little-endian
data.read_u32be(at: 0) # unsigned 32-bit, big-endian
data.read_u32le(at: 0) # unsigned 32-bit, little-endian
data.read_i8(at: 0) # signed 8-bit (sign-extended)
data.read_i16be(at: 0) # signed 16-bit, big-endian
data.read_i32be(at: 0) # signed 32-bit, big-endian
Endianness convention: Methods ending in be are big-endian (network order); methods ending in le are little-endian. Unsuffixed u16/u32 default to big-endian.
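To make the endianness and sign-extension rules concrete, here is a small Python model of the positional readers. It is a sketch under assumptions: `read_uint` and `read_int` are hypothetical names, and raising on an out-of-range offset is an assumed behavior that the text above does not specify.

```python
def read_uint(data: bytes, at: int, width: int, byteorder: str) -> int:
    """Read an unsigned integer of `width` bytes starting at offset `at`."""
    chunk = data[at:at + width]
    if len(chunk) != width:
        # Assumption: reading past the end is an error.
        raise IndexError("read past end of data")
    return int.from_bytes(chunk, byteorder, signed=False)


def read_int(data: bytes, at: int, width: int, byteorder: str) -> int:
    """Read a signed (two's complement) integer, sign-extending the result."""
    chunk = data[at:at + width]
    if len(chunk) != width:
        raise IndexError("read past end of data")
    return int.from_bytes(chunk, byteorder, signed=True)


data = bytes([0x04, 0x00, 0xFF, 0xFE])
u16be = read_uint(data, 0, 2, "big")     # 0x0400 = 1024
u16le = read_uint(data, 0, 2, "little")  # 0x0004 = 4
i8 = read_int(data, 2, 1, "big")         # 0xFF = -1
i16be = read_int(data, 2, 2, "big")      # 0xFFFE = -2
```

The same two bytes yield 1024 or 4 depending only on byte order, which is why every multi-byte reader carries an explicit suffix.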
String Interop
# Bytes → String (copies, interprets as UTF-8)
var s = data.to_string()
# Hex representation
var hex = data.to_hex() # "48656c6c6f" when data is Bytes.from("Hello")
# Round-trip
Bytes.from("abc").to_string() == "abc" # true
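The copy-based interop can be illustrated with Python's built-in `str` and `bytes`, which also copy on conversion. This is an analogy rather than Quartz code; how `to_string()` treats invalid UTF-8 is not stated above and is not modeled here.

```python
text = "Hello"
raw = text.encode("utf-8")        # str -> bytes: a separate object holding the octets
hex_text = raw.hex()              # lowercase hex, two digits per byte
round_trip = raw.decode("utf-8")  # bytes -> str: interprets the octets as UTF-8

print(hex_text)            # 48656c6c6f
print(round_trip == text)  # True
```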
Comparison
var a = Bytes.from([1, 2, 3])
var b = Bytes.from([1, 2, 3])
a.eq(b) # true
Layer 2: b[...] Literals & Builder
Simple Byte Literal
# Plain byte values
var magic = b[0x89, 0x50, 0x4E, 0x47] # PNG magic bytes
var empty = b[] # empty Bytes
Typed Segment Builder
Segments specify how values are encoded into bytes:
var packet = b[
u8: 1, # 1 byte
u16be: 1024, # 2 bytes, big-endian → [0x04, 0x00]
u32le: sequence, # 4 bytes, little-endian
bytes: payload, # raw Bytes concatenation
string: "OK" # UTF-8 encoded string bytes
]
| Segment | Size | Encoding |
|---|---|---|
| u8: | 1 byte | Single byte |
| u16be: / u16: | 2 bytes | Big-endian (default) |
| u16le: | 2 bytes | Little-endian |
| u32be: / u32: | 4 bytes | Big-endian (default) |
| u32le: | 4 bytes | Little-endian |
| bytes: | variable | Raw byte concatenation |
| string: | variable | UTF-8 encoded |
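The segment encodings in the table map directly onto fixed-width packing. Below is a hedged Python sketch using the standard `struct` module to produce the byte layout the typed builder example would be expected to yield; the values of `sequence` and `payload` are made up for illustration.

```python
import struct

sequence = 7                   # illustrative value
payload = bytes([0xDE, 0xAD])  # illustrative value

packet = (
    struct.pack(">B", 1)           # u8:     01
    + struct.pack(">H", 1024)      # u16be:  04 00
    + struct.pack("<I", sequence)  # u32le:  07 00 00 00
    + payload                      # bytes:  de ad
    + "OK".encode("utf-8")         # string: 4f 4b
)

print(packet.hex())  # 01040007000000dead4f4b
print(len(packet))   # 11
```

The total is 1 + 2 + 4 + 2 + 2 = 11 bytes: the fixed-width segments contribute their declared sizes and the variable segments contribute their own lengths.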
Block Builder
For more complex construction with control flow:
var packet = Bytes.build {
u8(0x01)
u16be(content.size)
if has_checksum
u32be(compute_crc(content))
end
bytes(content)
}
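The "mutable during construction, immutable result" idea behind the block builder can be sketched in Python with a `bytearray` that is frozen into `bytes` at the end. This is a model, not the Quartz implementation: `build_packet` is a hypothetical name, and `zlib.crc32` stands in for the unspecified `compute_crc`.

```python
import struct
import zlib


def build_packet(content: bytes, has_checksum: bool) -> bytes:
    buf = bytearray()                       # mutable only while building
    buf += struct.pack(">B", 0x01)          # u8(0x01)
    buf += struct.pack(">H", len(content))  # u16be(content.size)
    if has_checksum:
        # Assumption: CRC-32 via zlib stands in for compute_crc.
        buf += struct.pack(">I", zlib.crc32(content))
    buf += content                          # bytes(content)
    return bytes(buf)                       # frozen, immutable result


plain = build_packet(b"hi", has_checksum=False)
checked = build_packet(b"hi", has_checksum=True)
print(plain.hex())                # 0100026869
print(len(checked) - len(plain))  # 4
```

Ordinary control flow decides which segments are emitted, which is the case the literal `b[...]` form cannot express.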
Layer 3: ByteReader Cursor
Sequential reading with automatic position tracking:
var reader = data.reader()
var version = reader.u8() # read 1 byte, advance
var length = reader.u16be() # read 2 bytes, advance
var body = reader.bytes(length) # read N bytes, advance
reader.remaining() # bytes left
reader.eof() # true if no bytes remain
Typical Protocol Parsing
def parse_packet(data: Bytes): Packet
var r = data.reader()
var version = r.u8()
var msg_type = r.u8()
var payload_len = r.u16be()
var payload = r.bytes(payload_len)
return Packet {
version: version,
msg_type: msg_type,
payload: payload
}
end
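For readers who want to see the cursor mechanics, here is a minimal Python model of a sequential reader and the packet parser above. It is a sketch under assumptions: `read_bytes` models `reader.bytes(n)`, the packet is returned as a plain dict, and raising `EOFError` on a short read is an assumed behavior that the text does not specify.

```python
import struct


class ByteReader:
    """Hypothetical model of a cursor over immutable bytes."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def _take(self, count: int) -> bytes:
        end = self._pos + count
        if end > len(self._data):
            # Assumption: short reads are an error.
            raise EOFError("read past end of data")
        chunk = self._data[self._pos:end]
        self._pos = end  # advance the cursor
        return chunk

    def u8(self) -> int:
        return self._take(1)[0]

    def u16be(self) -> int:
        return struct.unpack(">H", self._take(2))[0]

    def read_bytes(self, count: int) -> bytes:
        return self._take(count)

    def remaining(self) -> int:
        return len(self._data) - self._pos

    def eof(self) -> bool:
        return self.remaining() == 0


def parse_packet(data: bytes) -> dict:
    r = ByteReader(data)
    version = r.u8()
    msg_type = r.u8()
    payload_len = r.u16be()
    payload = r.read_bytes(payload_len)
    return {"version": version, "msg_type": msg_type, "payload": payload}


packet = parse_packet(bytes([1, 2, 0, 3, 0x61, 0x62, 0x63]))
print(packet)  # {'version': 1, 'msg_type': 2, 'payload': b'abc'}
```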
Layer 4: Binary Pattern Matching (Planned)
Status: Stretch goal. Will be implemented after Layers 1–3 are stable.
Destructure binary data directly in match expressions:
match data
b[0x89, "PNG", rest: bytes] => handle_png(rest)
b[0xFF, 0xD8, rest: bytes] => handle_jpeg(rest)
b[v: u8, len: u16be, body: bytes] => process(v, body)
end
Sub-byte Extraction
Extract nibbles and bit fields:
match header_byte
b[version: u4, ihl: u4] => # IPv4: version=4, ihl=5
end
# version = (byte >> 4) & 0x0F
# ihl = byte & 0x0F
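The shift-and-mask arithmetic in the comments above can be checked with a short Python sketch; `split_nibbles` is a hypothetical helper name. The sample byte 0x45 is the usual first byte of an IPv4 header without options.

```python
def split_nibbles(byte):
    """Split one byte into its high and low 4-bit fields."""
    high = (byte >> 4) & 0x0F  # upper nibble
    low = byte & 0x0F          # lower nibble
    return high, low


version, ihl = split_nibbles(0x45)
header_len_bytes = ihl * 4  # IHL counts 32-bit words
print(version, ihl, header_len_bytes)  # 4 5 20
```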
Pattern Semantics
- Literal values (0x89, "PNG") are matched exactly
- Named segments (v: u8, len: u16be) bind extracted values to variables
- rest: bytes captures all remaining bytes (must be last segment)
- Sub-byte (v: u4) uses bit-shift and mask extraction
- Non-matching patterns fall through to the next match arm
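Since Layer 4 is only planned, the following Python sketch shows one way the three-arm match example might be lowered by hand. It illustrates the semantics listed above and is not a description of the eventual implementation: `match_prefix` and `classify` are hypothetical names, and treating input shorter than a pattern's fixed-width segments as a non-match is an assumption.

```python
def match_prefix(data: bytes, literal: bytes):
    """Return the remaining bytes if data starts with literal, else None."""
    if data[:len(literal)] == literal:
        return data[len(literal):]
    return None


def classify(data: bytes):
    # Arm 1: b[0x89, "PNG", rest: bytes] (literals must match exactly)
    rest = match_prefix(data, b"\x89PNG")
    if rest is not None:
        return ("png", rest)
    # Arm 2: b[0xFF, 0xD8, rest: bytes]
    rest = match_prefix(data, b"\xff\xd8")
    if rest is not None:
        return ("jpeg", rest)
    # Arm 3: b[v: u8, len: u16be, body: bytes] (named segments bind values)
    if len(data) >= 3:  # assumption: too-short input does not match
        v = data[0]
        length = int.from_bytes(data[1:3], "big")
        body = data[3:]  # trailing bytes segment captures the rest
        return ("generic", v, length, body)
    return None  # no arm matched


print(classify(b"\x89PNG\r\n"))           # ('png', b'\r\n')
print(classify(bytes([1, 0, 2, 9, 9])))   # ('generic', 1, 2, b'\t\t')
```

Arms are tried in order, so a more specific literal pattern has to appear before the catch-all structural pattern.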