roto
Zero-allocation Rust protobuf reader and writer.
Overview
Instead of deserializing binary protobuf data into Rust structs, roto scans a message once on construction — recording the byte offset of each field — then reads fields on demand directly from the original bytes. No heap allocation, no data copying, no full deserialization upfront.
Writing works the same way: you provide a fixed buffer and a builder writes fields directly into it, returning a slice of the bytes written.
Design
protoc generates a CodeGeneratorRequest message; protoc-gen-roto (in
src/bin/protoc-gen-roto.rs) reads this from stdin, generates Rust source files, and writes a
CodeGeneratorResponse to stdout. protoc then writes those .rs files to disk. The generated
files are included directly in the crate that uses the protobuffers.
Sample usage:
protoc -Iproto/ proto/hackers.proto --plugin=./target/debug/protoc-gen-roto --roto_out=src/
This will generate a file, src/hackers.rs.
Generated code
For each protobuf message roto generates two types:
- Reader struct MessageName<'a>: borrows the original byte slice, zero-copy.
- Builder struct MessageNameBuilder<'b>: writes into a caller-provided &mut [u8].
Nested message types are placed in a pub mod message_name { ... } module (snake_case of the
parent message name) within the same generated file.
Sample usage
Given this proto definition:
message Hello {
string hello_world = 1;
message InnerWorld {
string thought = 1;
}
InnerWorld inner_world = 2;
}
Reading
fn parse_proto(data: &[u8]) -> roto::Result<String> {
// Scan the data once, recording field offsets
let hello = Hello::new(data)?;
// String fields return &str borrowed from the original bytes (zero-copy)
let hello_world: &str = hello.hello_world()?;
// Nested message fields return &[u8]; construct the nested reader from those bytes
let inner_bytes: &[u8] = hello.inner_world()?;
let inner_world = hello::InnerWorld::new(inner_bytes)?;
let thought: &str = inner_world.thought()?;
Ok(format!("{} is about {}", hello_world, thought))
}
Fields absent from the binary data return Err(roto::RotoError::FieldNotFound).
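Because proto3 treats an absent field as "default value", callers often want to distinguish a missing field from a real decoding failure. The sketch below shows one way to do that. The `RotoError` enum here is a local stand-in so the example compiles on its own: only `FieldNotFound` comes from the text above, while the `Malformed` variant, the `RotoResult` alias, and the `optional` helper are hypothetical names and not part of roto.

```rust
// Local stand-ins, NOT roto's real definitions. Only `FieldNotFound`
// is named in this README; `Malformed` is a hypothetical placeholder
// for "any other error".
#[derive(Debug, PartialEq)]
enum RotoError {
    FieldNotFound,
    Malformed,
}

type RotoResult<T> = std::result::Result<T, RotoError>;

/// Map "field absent" to `None` while still propagating every other error.
fn optional<T>(field: RotoResult<T>) -> RotoResult<Option<T>> {
    match field {
        Ok(value) => Ok(Some(value)),
        Err(RotoError::FieldNotFound) => Ok(None),
        Err(other) => Err(other),
    }
}
```

With a helper like this, a call such as `optional(hello.hello_world())?.unwrap_or("")` would fall back to the proto3 default for a missing string while still surfacing genuine decoding errors.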
Writing
Nested messages must be serialized into a scratch buffer first, then embedded as raw bytes in the outer builder.
fn build_proto(buf: &mut [u8]) -> roto::Result<&mut [u8]> {
// Serialize the inner message first
let mut inner_buf = [0u8; 256];
let inner_bytes = hello::InnerWorldBuilder::builder(&mut inner_buf)
.thought("some thought")?
.finish()?;
// Build the outer message, embedding the serialized inner bytes
HelloBuilder::builder(buf)
.hello_world("some world")?
.inner_world(inner_bytes)?
.finish() // returns Result<&'b mut [u8]> — the written portion of buf
}
Builder methods consume self and return Result<Self>, enabling ?-based chaining.
finish() returns Result<&'b mut [u8]> — a slice of the portion of the buffer that was written.
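To make the consuming-builder pattern concrete, here is a minimal standalone sketch of a writer that appends length-delimited fields to a caller-provided buffer. It is not roto's implementation: `SketchBuilder`, `BufferFull`, and `bytes_field` are hypothetical names chosen for illustration, and only the overall shape (methods take `self`, return `Result<Self>`, and `finish()` returns the written slice) follows the description above.

```rust
// Hypothetical sketch; none of these names are roto's API.
#[derive(Debug, PartialEq)]
struct BufferFull;

struct SketchBuilder<'b> {
    buf: &'b mut [u8],
    len: usize, // number of bytes written so far
}

impl<'b> SketchBuilder<'b> {
    fn new(buf: &'b mut [u8]) -> Self {
        SketchBuilder { buf, len: 0 }
    }

    /// Append raw bytes, failing if the fixed buffer is too small.
    fn push(&mut self, bytes: &[u8]) -> Result<(), BufferFull> {
        let end = self.len.checked_add(bytes.len()).ok_or(BufferFull)?;
        self.buf
            .get_mut(self.len..end)
            .ok_or(BufferFull)?
            .copy_from_slice(bytes);
        self.len = end;
        Ok(())
    }

    /// Append a base-128 varint, least-significant group first.
    fn push_varint(&mut self, mut value: u64) -> Result<(), BufferFull> {
        while value >= 0x80 {
            self.push(&[(value as u8 & 0x7f) | 0x80])?;
            value >>= 7;
        }
        self.push(&[value as u8])
    }

    /// Write a length-delimited field (wire type 2): tag, length, payload.
    /// Consumes `self` and hands it back so calls can be chained with `?`.
    fn bytes_field(mut self, field_number: u32, payload: &[u8]) -> Result<Self, BufferFull> {
        self.push_varint((u64::from(field_number) << 3) | 2)?;
        self.push_varint(payload.len() as u64)?;
        self.push(payload)?;
        Ok(self)
    }

    /// Return only the portion of the buffer that was written.
    fn finish(self) -> Result<&'b mut [u8], BufferFull> {
        let SketchBuilder { buf, len } = self;
        Ok(&mut buf[..len])
    }
}
```

Since strings and nested messages share wire type 2, embedding a nested message in this sketch is just another `bytes_field` call with the already-serialized inner bytes as the payload, which mirrors the scratch-buffer approach shown above.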
Repeated fields
Repeated fields return a RepeatedFieldIterator<'a>. Each item yields Result<(&[u8], WireType)>.
let hello = Hello::new(data)?;
for item in hello.tags() {
let (value_bytes, _wire_type) = item?;
// decode value_bytes according to the expected wire type
}
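The "decode according to the expected wire type" step might look like the sketch below. It assumes that `value_bytes` holds just the encoded value (the varint bytes, the 4 or 8 fixed-width bytes, or the length-delimited payload without its length prefix); this README does not confirm that layout. The `WireType` enum is a local stand-in with assumed variant names, and `Decoded` and `decode_item` are hypothetical helpers.

```rust
use std::convert::TryInto;

// Stand-in for roto's WireType; the variant names are an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WireType {
    Varint,
    Fixed64,
    LengthDelimited,
    Fixed32,
}

// Hypothetical result type for the sketch.
#[derive(Debug, PartialEq)]
enum Decoded<'a> {
    Varint(u64),
    Fixed64(u64),
    Bytes(&'a [u8]),
    Fixed32(u32),
}

/// Decode one repeated-field item. Returns None if the bytes do not
/// match the wire type (wrong width, or an unterminated varint).
fn decode_item<'a>(value_bytes: &'a [u8], wire_type: WireType) -> Option<Decoded<'a>> {
    match wire_type {
        WireType::Varint => {
            let mut value = 0u64;
            // A u64 varint is at most 10 bytes, 7 payload bits per byte.
            for (i, &byte) in value_bytes.iter().enumerate().take(10) {
                value |= u64::from(byte & 0x7f) << (7 * i);
                if byte & 0x80 == 0 {
                    return Some(Decoded::Varint(value));
                }
            }
            None
        }
        // Fixed-width values are little-endian on the wire.
        WireType::Fixed64 => {
            Some(Decoded::Fixed64(u64::from_le_bytes(value_bytes.try_into().ok()?)))
        }
        WireType::Fixed32 => {
            Some(Decoded::Fixed32(u32::from_le_bytes(value_bytes.try_into().ok()?)))
        }
        // Strings, bytes, and nested messages are handed back as-is.
        WireType::LengthDelimited => Some(Decoded::Bytes(value_bytes)),
    }
}
```

A `float` field would reinterpret the `Fixed32` bits with `f32::from_bits`, and a repeated string would pass the `Bytes` payload through `std::str::from_utf8`.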
Runtime API
The core runtime in src/lib.rs provides:
- ProtoAccessor<'a>: scans a message's fields and reads values at recorded offsets.
- ProtoBuilder<'a>: writes fields into a provided &mut [u8] buffer.
- FieldIterator<'a> / RepeatedFieldIterator<'a>: iterators over fields and repeated fields.
- Tag, WireType: protobuf encoding primitives.
- read_varint, write_varint, skip_value: low-level wire-format helpers.
- RotoError, Result<T>: error type and alias.
High-level design
On construction (MessageName::new(data)), the generated reader struct iterates the binary once
using FieldIterator and records the byte offset of each field's tag. Subsequent field accesses
call ProtoAccessor::get_value_at(offset) — no re-scanning. For repeated fields, the start and
end offsets of the field range are recorded to bound iteration efficiently.
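The scan-once, read-on-demand idea can be sketched in plain Rust without the crate. This is an illustration under stated assumptions, not the generated code: the function names reuse `read_varint` and `skip_value` from the runtime list, but the signatures here are invented, `scan_offsets` and `read_str_at` are hypothetical, and keeping the last occurrence of a field follows the usual protobuf rule rather than anything roto documents.

```rust
// Illustrative sketch; signatures and behavior are assumptions, not roto's API.

/// Decode a base-128 varint at `pos`; returns (value, position after it).
fn read_varint(data: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *data.get(pos)?;
        pos += 1;
        if shift >= 64 {
            return None; // more than 10 bytes: malformed
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
        shift += 7;
    }
}

/// Skip the value that follows a tag without decoding it.
fn skip_value(data: &[u8], pos: usize, wire_type: u64) -> Option<usize> {
    let end = match wire_type {
        0 => read_varint(data, pos)?.1, // varint
        1 => pos + 8,                   // 64-bit fixed
        2 => {
            // Length-delimited: jump over the payload using its length prefix.
            let (len, body) = read_varint(data, pos)?;
            if len > data.len() as u64 {
                return None;
            }
            body + len as usize
        }
        5 => pos + 4, // 32-bit fixed
        _ => return None,
    };
    if end <= data.len() { Some(end) } else { None }
}

/// One pass over the top-level fields: record the tag offset of each
/// field number in 1..=N (last occurrence wins).
fn scan_offsets<const N: usize>(data: &[u8]) -> Option<[Option<usize>; N]> {
    let mut offsets = [None; N];
    let mut pos = 0;
    while pos < data.len() {
        let tag_offset = pos;
        let (tag, after_tag) = read_varint(data, pos)?;
        let field_number = (tag >> 3) as usize;
        if (1..=N).contains(&field_number) {
            offsets[field_number - 1] = Some(tag_offset);
        }
        pos = skip_value(data, after_tag, tag & 7)?;
    }
    Some(offsets)
}

/// On-demand read: decode a string field at a previously recorded tag offset.
fn read_str_at(data: &[u8], tag_offset: usize) -> Option<&str> {
    let (tag, pos) = read_varint(data, tag_offset)?;
    if tag & 7 != 2 {
        return None;
    }
    let (len, start) = read_varint(data, pos)?;
    if len > data.len() as u64 {
        return None;
    }
    std::str::from_utf8(data.get(start..start + len as usize)?).ok()
}
```

Note that the scan never looks inside a length-delimited payload, so its cost depends on the number of top-level fields rather than on the size of nested messages.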
Benchmarks
Two benchmark suites share the same binary data files and the same four measurement groups:
| Group | What is timed |
|---|---|
| shallow_parse | Become ready to read any field (one scan / full decode) |
| deep_parse | Walk the full tree: Campaign → Operations → Hackers |
| field_access | Read individual fields on an already-parsed message |
| iterate | Count top-level and nested repeated fields |
1 — Generate the shared data files (do this once)
Data files are written to data/bench/.
cargo run --release --bin gen_bench_data -- --preset tiny
cargo run --release --bin gen_bench_data -- --preset small
cargo run --release --bin gen_bench_data -- --preset medium
cargo run --release --bin gen_bench_data -- --preset large
For even larger inputs use --preset huge (~500 MB) or set the knobs
directly:
# ~50 MB: 500 operations × 100 KB stolen_data each
cargo run --release --bin gen_bench_data -- --ops 500 --stolen-kb 100 --output data/bench/50mb.pb
2 — Rust benchmark (criterion)
cargo bench --bench hackers_bench
HTML reports are written to target/criterion/. Run a single group:
cargo bench --bench hackers_bench -- shallow_parse
3 — C / upb benchmark
Requires protobuf ≥ 21 with protoc-gen-upb (ships with modern protoc).
cd upb_test
make # compiles hackers_bench from the pre-generated upb files
./hackers_bench
To regenerate the upb C files from proto/hackers.proto:
cd upb_test && make regen
4 — Results
Measured on Linux x86-64 with the four standard presets. Rust times are criterion medians; C/upb times are the custom runner's mean over ≥ 0.5 s.
shallow_parse — cost to become ready to read any field
| Size | Bytes | roto (ns) | upb (ns) | roto speedup |
|---|---|---|---|---|
| tiny | 588 | 32.7 | 606.2 | 18.5× |
| small | 20,265 | 182.9 | 22,619.2 | 123.7× |
| medium | 2,071,053 | 16,632.0 | 5,346,977.2 | 321× |
| large | 102,608,384 | 1,618.6 | 41,132,079.7 | 25,411× |
roto's cost is O(number of top-level fields): it records field offsets by jumping past nested blobs using their length prefixes. upb fully decodes the entire tree — including all nested messages and raw byte payloads — into arena-allocated structs.
deep_parse — parse + walk Campaign → Operations → every Hacker handle
| Size | Bytes | roto (ns) | upb (ns) | roto speedup |
|---|---|---|---|---|
| tiny | 588 | 385.3 | 596.8 | 1.55× |
| small | 20,265 | 13,374.0 | 22,321.6 | 1.67× |
| medium | 2,071,053 | 1,454,400.0 | 4,227,384.3 | 2.91× |
roto pays one extra ::new() scan per nesting level; upb's walk is pure pointer-chasing because everything was decoded upfront. roto is still faster overall because its per-level scans cost less than upb's full decode.
field_access — individual field reads on a pre-parsed message (small preset)
| Field | roto (ns) | upb (ns) | upb speedup |
|---|---|---|---|
| campaign::name | 14.3 | 1.11 | 12.9× |
| campaign::total_bytes_stolen | 7.1 | 1.74 | 4.1× |
| operation::codename | 13.8 | 1.76 | 7.8× |
| operation::timestamp | 9.7 | 1.40 | 6.9× |
| operation::successful | 7.0 | 1.13 | 6.1× |
| hacker::handle | 14.4 | 1.56 | 9.2× |
| hacker::skill_level (f32) | 7.7 | 1.76 | 4.4× |
| hacker::is_elite (bool) | 7.5 | 1.14 | 6.6× |
| worm::polymorphic (bool) | 7.5 | 1.76 | 4.2× |
| worm::payload (bytes) | 16.6 | 1.75 | 9.5× |
After parsing, upb field reads are direct struct-member lookups (~1–2 ns). roto re-decodes the value at its pre-recorded byte offset on every call (~7–17 ns). This is the one area where upb holds a clear advantage.
iterate — count repeated fields (parse included in every iteration)
| Benchmark | Size | roto (ns) | upb (ns) | roto speedup |
|---|---|---|---|---|
| count_operations | tiny | 50.0 | 600.2 | 12.0× |
| count_operations | small | 393.7 | 22,702.9 | 57.7× |
| count_operations | medium | 36,628.0 | 4,193,874.0 | 114.5× |
| count_all_crew | tiny | 235.3 | 610.2 | 2.6× |
| count_all_crew | small | 4,369.5 | 23,109.0 | 5.3× |
| count_all_crew | medium | 444,930.0 | 4,151,181.5 | 9.3× |
count_operations includes parsing; upb's O(1) array-length read is dominated by its full-decode cost, so roto wins by a wide margin, as in shallow_parse. count_all_crew also parses each Operation sub-message; roto's per-level scans remain cheaper than upb's full decode.
Interpreting the comparison
The two libraries have fundamentally different models:
- roto shallow_parse does one linear scan recording byte offsets: no allocation, no field decoding. Subsequent field reads decode on demand at the stored offset.
- upb Campaign_parse fully decodes the entire message tree into arena-allocated structs upfront. Subsequent field reads are direct struct member lookups (~1–2 ns).
The result: roto's parse is faster and allocation-free; upb's field access after parsing is faster. For workloads that read every field the costs invert; for workloads that read a handful of fields from large messages roto wins.