charles d21ff797b0 Add read-update-write benchmark

Update the README with a new benchmark group and performance results.
Include new data files and update the Rust benchmark implementation.
Regenerate UPB bindings and fix a data path in the C benchmark runner.

2026-05-04 23:37:36 -07:00

roto

Zero-allocation Rust protobuf reader and writer.

Overview

Instead of deserializing binary protobuf data into Rust structs, roto scans a message once on construction — recording the byte offset of each field — then reads fields on demand directly from the original bytes. No heap allocation, no data copying, no full deserialization upfront.

Writing works the same way: you provide a fixed buffer and a builder writes fields directly into it, returning a slice of the bytes written.

Design

protoc serializes a CodeGeneratorRequest message; protoc-gen-roto (in src/bin/protoc-gen-roto.rs) reads it from stdin, generates Rust source files, and writes a CodeGeneratorResponse to stdout. protoc then writes those .rs files to disk. The generated files are included directly in the crate that uses the protobuf messages.

Sample usage:

protoc -Iproto/ proto/hackers.proto --plugin=./target/debug/protoc-gen-roto --roto_out=src/

This generates a single file, src/hackers.rs.

Generated code

For each protobuf message roto generates two types:

Reader struct MessageName<'a> — borrows the original byte slice, zero-copy.
Builder struct MessageNameBuilder<'b> — writes into a caller-provided &mut [u8].

Nested message types are placed in a pub mod message_name { ... } module (snake_case of the parent message name) within the same generated file.

Sample usage

Given this proto definition:

message Hello {
    string hello_world = 1;
    message InnerWorld {
        string thought = 1;
    }
    InnerWorld inner_world = 2;
}

Reading

fn parse_proto(data: &[u8]) -> roto::Result<String> {
    // Scan the data once, recording field offsets
    let hello = Hello::new(data)?;

    // String fields return &str borrowed from the original bytes (zero-copy)
    let hello_world: &str = hello.hello_world()?;

    // Nested message fields return &[u8]; construct the nested reader from those bytes
    let inner_bytes: &[u8] = hello.inner_world()?;
    let inner_world = hello::InnerWorld::new(inner_bytes)?;
    let thought: &str = inner_world.thought()?;

    Ok(format!("{} is about {}", hello_world, thought))
}

Fields absent from the binary data return Err(roto::RotoError::FieldNotFound).
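Because proto3 gives absent fields their default value, callers often want to turn FieldNotFound into a default while still propagating real errors. The following is a minimal sketch of that pattern, not part of roto: SketchError is a local stand-in for roto::RotoError, its Malformed variant is hypothetical, and or_default is a helper name invented here.

```rust
/// Local stand-in for roto::RotoError, reduced to what this sketch needs.
/// `Malformed` is a hypothetical variant used only for illustration.
#[derive(Debug, PartialEq)]
enum SketchError {
    FieldNotFound,
    Malformed,
}

/// Treat an absent field as its default value, but keep every other error.
fn or_default<T: Default>(field: Result<T, SketchError>) -> Result<T, SketchError> {
    match field {
        Err(SketchError::FieldNotFound) => Ok(T::default()),
        other => other,
    }
}
```

With a generated reader, the same helper would wrap an accessor call such as hello.hello_world(), assuming the accessor's error type is adapted accordingly.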

Writing

Nested messages must be serialized into a scratch buffer first, then embedded as raw bytes in the outer builder.

fn build_proto(buf: &mut [u8]) -> roto::Result<&[u8]> {
    // Serialize the inner message first
    let mut inner_buf = [0u8; 256];
    let inner_bytes = hello::InnerWorldBuilder::builder(&mut inner_buf)
        .thought("some thought")?
        .finish()?;

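    // Build the outer message, embedding the serialized inner bytes
    let written = HelloBuilder::builder(buf)
        .hello_world("some world")?
        .inner_world(inner_bytes)?
        .finish()?; // finish() yields Result<&'b mut [u8]>: the written portion of buf

    // Hand the written bytes back as a shared slice to match the return type
    Ok(written)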
}

Builder methods consume self and return Result<Self>, enabling ?-based chaining. finish() returns Result<&'b mut [u8]> — a slice of the portion of the buffer that was written.
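The consuming-builder pattern over a fixed buffer can be sketched in isolation. This is an illustrative stand-in, not roto's generated code: SketchBuilder, SketchError, put and string_field are names made up here, and the sketch only supports one-byte tags and lengths to stay short.

```rust
#[derive(Debug, PartialEq)]
enum SketchError {
    BufferFull,
    Unsupported, // this sketch only handles one-byte tags and lengths
}

/// Hypothetical stand-in for a generated builder: writes into a borrowed buffer.
struct SketchBuilder<'b> {
    buf: &'b mut [u8],
    len: usize,
}

impl<'b> SketchBuilder<'b> {
    fn builder(buf: &'b mut [u8]) -> Self {
        SketchBuilder { buf, len: 0 }
    }

    /// Append raw bytes; fail instead of growing when the buffer is full.
    fn put(&mut self, bytes: &[u8]) -> Result<(), SketchError> {
        let end = self.len.checked_add(bytes.len()).ok_or(SketchError::BufferFull)?;
        let dst = self.buf.get_mut(self.len..end).ok_or(SketchError::BufferFull)?;
        dst.copy_from_slice(bytes);
        self.len = end;
        Ok(())
    }

    /// Write a length-delimited (wire type 2) string field.
    /// Consumes self and returns it, so calls chain with `?`.
    fn string_field(mut self, field_number: u8, value: &str) -> Result<Self, SketchError> {
        if field_number == 0 || field_number > 15 || value.len() > 127 {
            return Err(SketchError::Unsupported);
        }
        let tag = (field_number << 3) | 2;
        self.put(&[tag, value.len() as u8])?;
        self.put(value.as_bytes())?;
        Ok(self)
    }

    /// Return only the written prefix of the caller's buffer.
    fn finish(self) -> Result<&'b mut [u8], SketchError> {
        let SketchBuilder { buf, len } = self;
        Ok(&mut buf[..len])
    }
}
```

Because every step returns a Result, a buffer that is too small surfaces as an error at the first field that does not fit rather than as a panic or a reallocation.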

Updating messages

You can read a message, modify specific fields, and use .with() to copy the remaining fields from the original binary.

fn update_proto(data: &[u8], buf: &mut [u8]) -> roto::Result<&[u8]> {
    let msg = Message::new(data)?;

    let mut builder = MessageBuilder::builder(buf);
    if msg.foo()? == "bar" {
        builder = builder.foo("foosbar")?;
    }

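    // Copy the remaining fields from the original, then return the written
    // bytes as a shared slice (finish() yields Result<&'b mut [u8]>)
    let written = builder.with(&msg)?.finish()?;
    Ok(written)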
}
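One way such a copy step could work is to walk the original bytes and append every field that was not overridden as a raw byte range, without decoding its value. The sketch below illustrates that idea only; it is not roto's with() implementation, and varint, field_span and copy_remaining are hypothetical names. Overridden fields are assumed to have been written to the output first, which is valid because protobuf parsers accept fields in any order.

```rust
use std::convert::TryFrom;

/// Decode a base-128 varint; returns (value, bytes consumed).
fn varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// For the field starting at buf[0], return (field number, total bytes of tag + value).
fn field_span(buf: &[u8]) -> Option<(u32, usize)> {
    let (tag, tag_len) = varint(buf)?;
    let rest = &buf[tag_len..];
    let value_len = match tag & 0x7 {
        0 => varint(rest)?.1, // varint
        1 => 8,               // fixed64
        2 => {
            // length-delimited: prefix plus payload
            let (len, prefix_len) = varint(rest)?;
            prefix_len.checked_add(usize::try_from(len).ok()?)?
        }
        5 => 4,           // fixed32
        _ => return None, // groups are not handled in this sketch
    };
    let total = tag_len.checked_add(value_len)?;
    if total > buf.len() {
        return None;
    }
    Some(((tag >> 3) as u32, total))
}

/// Append every field of `original` whose number is not in `overridden`
/// to `out`, starting at `end`. Values are copied as raw bytes.
/// Returns the new end of the written region.
fn copy_remaining(
    original: &[u8],
    overridden: &[u32],
    out: &mut [u8],
    mut end: usize,
) -> Option<usize> {
    let mut pos = 0;
    while pos < original.len() {
        let (number, len) = field_span(&original[pos..])?;
        if !overridden.contains(&number) {
            let dst = out.get_mut(end..end.checked_add(len)?)?;
            dst.copy_from_slice(&original[pos..pos + len]);
            end += len;
        }
        pos += len;
    }
    Some(end)
}
```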

Repeated fields

Repeated fields return a RepeatedFieldIterator<'a>. Each item yields Result<(&[u8], WireType)>.

let hello = Hello::new(data)?;
for item in hello.tags() {
    let (value_bytes, _wire_type) = item?;
    // decode value_bytes according to the expected wire type
}
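How value_bytes is decoded depends on the field's declared type. The sketch below shows one possible decoding step under two assumptions that the text above does not confirm: that value_bytes holds only the value payload (no tag, and no length prefix for length-delimited fields), and that the wire types map to the local WireType enum defined here, which is a stand-in for roto's own type. Value and decode are hypothetical names.

```rust
use std::convert::TryInto;

/// Local stand-in for roto's WireType; variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WireType {
    Varint,
    Fixed64,
    LengthDelimited,
    Fixed32,
}

/// A few decoded shapes, enough for `repeated uint64`, `repeated float`
/// and `repeated string` style fields.
#[derive(Debug, PartialEq)]
enum Value<'a> {
    U64(u64),
    F32(f32),
    Str(&'a str),
}

/// Decode one item's payload according to its wire type.
fn decode<'a>(bytes: &'a [u8], wire_type: WireType) -> Option<Value<'a>> {
    match wire_type {
        WireType::Varint => {
            let mut value = 0u64;
            for (i, &byte) in bytes.iter().enumerate().take(10) {
                value |= u64::from(byte & 0x7f) << (7 * i);
                if byte & 0x80 == 0 {
                    return Some(Value::U64(value));
                }
            }
            None
        }
        // Fixed-width values are little-endian on the wire.
        WireType::Fixed32 => Some(Value::F32(f32::from_le_bytes(bytes.try_into().ok()?))),
        WireType::Fixed64 => Some(Value::U64(u64::from_le_bytes(bytes.try_into().ok()?))),
        // Strings borrow from the original bytes, so no copy is made.
        WireType::LengthDelimited => std::str::from_utf8(bytes).ok().map(Value::Str),
    }
}
```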

Runtime API

The core runtime in src/lib.rs provides:

ProtoAccessor<'a> — scans a message's fields and reads values at recorded offsets.
ProtoBuilder<'a> — writes fields into a provided &mut [u8] buffer.
FieldIterator<'a> / RepeatedFieldIterator<'a> — iterators over fields and repeated fields.
Tag, WireType — protobuf encoding primitives.
read_varint, write_varint, skip_value — low-level wire-format helpers.
RotoError, Result<T> — error type and alias.
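For readers unfamiliar with the wire format, the two encoding primitives behind Tag and write_varint can be sketched as follows. The signatures are assumptions made for this example and may differ from the ones in src/lib.rs; make_tag and split_tag are hypothetical helper names.

```rust
/// Pack a field number and wire type into a tag value:
/// (field_number << 3) | wire_type.
fn make_tag(field_number: u32, wire_type: u8) -> u64 {
    (u64::from(field_number) << 3) | u64::from(wire_type & 0x7)
}

/// Split a tag value back into (field_number, wire_type).
fn split_tag(tag: u64) -> (u32, u8) {
    ((tag >> 3) as u32, (tag & 0x7) as u8)
}

/// Encode `value` as a base-128 varint into `out`.
/// Returns the number of bytes written, or None if `out` is too small.
fn write_varint(mut value: u64, out: &mut [u8]) -> Option<usize> {
    let mut i = 0;
    loop {
        let slot = out.get_mut(i)?;
        let low = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            *slot = low; // last byte: continuation bit clear
            return Some(i + 1);
        }
        *slot = low | 0x80; // more bytes follow
        i += 1;
    }
}
```

For example, field 1 with wire type 2 (length-delimited) packs to the single tag byte 0x0a, and the value 300 encodes to the two bytes 0xac 0x02.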

High-level design

On construction (MessageName::new(data)), the generated reader struct iterates the binary once using FieldIterator and records the byte offset of each field's tag. Subsequent field accesses call ProtoAccessor::get_value_at(offset) — no re-scanning. For repeated fields, the start and end offsets of the field range are recorded to bound iteration efficiently.
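The single scan can be illustrated without the roto runtime. The following is a minimal sketch of the technique, not roto's implementation: read_varint here has an assumed signature, scan_offsets is a hypothetical name, and the result is collected into a Vec for brevity, whereas the design described above avoids heap allocation. Nested messages are skipped in one step using their length prefix, so the cost depends on the number of top-level fields rather than on the payload size.

```rust
use std::convert::TryFrom;

/// Decode a base-128 varint at the start of `buf`; returns (value, bytes consumed).
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated, or longer than 10 bytes
}

/// Scan the top-level fields once and return (field_number, tag_offset) pairs.
fn scan_offsets(data: &[u8]) -> Option<Vec<(u32, usize)>> {
    let mut offsets = Vec::new();
    let mut pos = 0usize;
    while pos < data.len() {
        let tag_offset = pos;
        let (tag, tag_len) = read_varint(&data[pos..])?;
        pos += tag_len;
        // Work out how many bytes the value occupies, without decoding it.
        let value_len = match tag & 0x7 {
            0 => read_varint(&data[pos..])?.1, // varint
            1 => 8,                            // fixed64
            2 => {
                // length-delimited: jump over the payload using its prefix
                let (len, prefix_len) = read_varint(&data[pos..])?;
                prefix_len.checked_add(usize::try_from(len).ok()?)?
            }
            5 => 4,           // fixed32
            _ => return None, // groups are not handled in this sketch
        };
        pos = pos.checked_add(value_len)?;
        if pos > data.len() {
            return None; // value runs past the end of the input
        }
        offsets.push(((tag >> 3) as u32, tag_offset));
    }
    Some(offsets)
}
```

For the Hello message from the sample usage, encoded with hello_world = "hi" and an inner thought = "x", this yields field 1 at offset 0 and field 2 at offset 4.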

Benchmarks

Two benchmark suites (Rust/criterion and C/upb) share the same binary data files and the same five measurement groups:

Group	What is timed
`shallow_parse`	Become ready to read any field (one scan / full decode)
`deep_parse`	Walk the full tree: Campaign → Operations → Hackers
`field_access`	Read individual fields on an already-parsed message
`iterate`	Count top-level and nested repeated fields
`read_update_write`	Parse, update a field, and serialize back to a buffer

1 — Generate the shared data files (do this once)

Data files are written to data/bench/.

cargo run --release --bin gen_bench_data -- --preset tiny
cargo run --release --bin gen_bench_data -- --preset small
cargo run --release --bin gen_bench_data -- --preset medium
cargo run --release --bin gen_bench_data -- --preset large

For even larger inputs use --preset huge (~500 MB) or set the knobs directly:

# ~50 MB: 500 operations × 100 KB stolen_data each
cargo run --release --bin gen_bench_data -- --ops 500 --stolen-kb 100 --output data/bench/50mb.pb

2 — Rust benchmark (criterion)

cargo bench --bench hackers_bench

HTML reports are written to target/criterion/. Run a single group:

cargo bench --bench hackers_bench -- shallow_parse

3 — C / upb benchmark

Requires protobuf ≥ 21 with protoc-gen-upb (ships with modern protoc).

cd upb_test
make          # compiles hackers_bench from the pre-generated upb files
./hackers_bench

To regenerate the upb C files from proto/hackers.proto:

cd upb_test && make regen

4 — Results

Measured on Linux x86-64 with the four standard presets; the large preset is reported only for shallow_parse. Rust times are criterion medians; C/upb times are the custom runner's mean over ≥ 0.5 s.

`shallow_parse` — cost to become ready to read any field

Size	Bytes	roto (ns)	upb (ns)	roto speedup
tiny	588	32.7	606.2	18.5×
small	20,265	182.9	22,619.2	123.7×
medium	2,071,053	16,632.0	5,346,977.2	321×
large	102,608,384	1,618.6	41,132,079.7	25,411×

roto's cost is O(number of top-level fields): it records field offsets by jumping past nested blobs using their length prefixes. upb fully decodes the entire tree — including all nested messages and raw byte payloads — into arena-allocated structs.

`deep_parse` — parse + walk Campaign → Operations → every Hacker handle

Size	Bytes	roto (ns)	upb (ns)	roto speedup
tiny	588	385.3	596.8	1.55×
small	20,265	13,374.0	22,321.6	1.67×
medium	2,071,053	1,454,400.0	4,227,384.3	2.91×

roto pays one extra ::new() scan per nesting level; upb's walk is pure pointer-chasing because everything was decoded upfront. roto is still faster overall because its per-level scans cost less than upb's full decode.

`field_access` — individual field reads on a pre-parsed message (`small` preset)

Field	roto (ns)	upb (ns)	upb speedup
`campaign::name`	14.3	1.11	12.9×
`campaign::total_bytes_stolen`	7.1	1.74	4.1×
`operation::codename`	13.8	1.76	7.8×
`operation::timestamp`	9.7	1.40	6.9×
`operation::successful`	7.0	1.13	6.1×
`hacker::handle`	14.4	1.56	9.2×
`hacker::skill_level` (f32)	7.7	1.76	4.4×
`hacker::is_elite` (bool)	7.5	1.14	6.6×
`worm::polymorphic` (bool)	7.5	1.76	4.2×
`worm::payload` (bytes)	16.6	1.75	9.5×

After parsing, upb field reads are direct struct-member lookups (~1–2 ns). roto re-decodes the value at its pre-recorded byte offset on every call (~7–17 ns). This is the one area where upb holds a clear advantage.

`iterate` — count repeated fields (parse included in every iteration)

Benchmark	Size	roto (ns)	upb (ns)	roto speedup
`count_operations`	tiny	50.0	600.2	12.0×
`count_operations`	small	393.7	22,702.9	57.7×
`count_operations`	medium	36,628.0	4,193,874.0	114.5×
`count_all_crew`	tiny	235.3	610.2	2.6×
`count_all_crew`	small	4,369.5	23,109.0	5.3×
`count_all_crew`	medium	444,930.0	4,151,181.5	9.3×

count_operations includes parsing; upb's O(1) array-length read is dominated by its full-decode cost, so roto wins by a wide margin (12× to 114×), though a smaller one than in shallow_parse. count_all_crew also parses each Operation sub-message; roto's per-level scans remain cheaper than upb's full decode.

`read_update_write` — parse, update a field, and serialize back to a buffer

Size	Bytes	roto (ns)	upb (ns)	roto speedup
tiny	588	153.8	1,120.3	7.3×
small	20,265	1,301.8	42,089.6	32.3×
medium	2,071,053	302,090.0	9,233,397.9	30.5×

roto's with() method copies untouched fields directly from the original binary without decoding them, so an update costs far less than a full decode and re-encode: 7.3× to 32.3× faster than upb in the table above. upb must fully parse the message into structs and then re-serialize the entire tree.

Interpreting the comparison

The two libraries have fundamentally different models:

roto shallow_parse does one linear scan recording byte offsets — no allocation, no field decoding. Subsequent field reads decode on demand at the stored offset.
upb Campaign_parse fully decodes the entire message tree into arena-allocated structs upfront. Subsequent field reads are direct struct member lookups (~1–2 ns).

The result: roto's parse is faster and allocation-free; upb's field access after parsing is faster. For workloads that read fields many times after a single parse, upb's cheaper reads can eventually offset its parse cost; for workloads that read a handful of fields from large messages, roto wins.
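A rough break-even point can be estimated from the tables above with a linear cost model: total time = parse time + reads × per-read time. This is a simplification introduced here, not a measured result. It assumes every read hits a top-level field of one already-parsed message and ignores the extra ::new() scans roto needs for nested messages. break_even_reads is a hypothetical helper name.

```rust
/// Number of field reads after which upb's total time (parse + reads)
/// drops below roto's, under a linear cost model. All times in ns.
/// Returns None if upb's reads are not cheaper, so it never catches up.
fn break_even_reads(
    roto_parse: f64,
    upb_parse: f64,
    roto_read: f64,
    upb_read: f64,
) -> Option<f64> {
    let parse_gap = upb_parse - roto_parse; // head start roto gets from parsing
    let read_gap = roto_read - upb_read; // what upb recovers on each read
    if read_gap <= 0.0 {
        return None;
    }
    Some((parse_gap / read_gap).max(0.0))
}
```

Plugging in the small preset (shallow_parse: 182.9 ns vs 22,619.2 ns; campaign::name: 14.3 ns vs 1.11 ns) gives roughly 1,700 reads of that field before upb's faster access pays for its slower parse.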

Literature

https://protobuf.dev/programming-guides/encoding/
