# roto

Zero-allocation Rust protobuf reader and writer.

## Overview

Instead of deserializing binary protobuf data into Rust structs, roto scans a message _once_ on construction, recording the byte offset of each field, and then reads fields on demand directly from the original bytes. No heap allocation, no data copying, no full deserialization upfront.

roto also provides first-class integration with the `tonic` gRPC framework via the `roto-tonic` crate, enabling zero-allocation request/response processing.

Writing works the same way: you provide a fixed buffer (or a `bytes::BufMut`), and a builder writes fields directly into it, returning a slice of the bytes written.

## Design

`protoc` builds a `CodeGeneratorRequest` message and passes it to the plugin: `protoc-gen-roto` (in `src/bin/protoc-gen-roto.rs`) reads this request from stdin, generates Rust source files, and writes a `CodeGeneratorResponse` to stdout. `protoc` then writes those `.rs` files to disk. The generated files are included directly in the crate that uses the protobuf definitions.

Sample usage:

```sh
protoc -Iproto/ proto/hackers.proto --plugin=./target/debug/protoc-gen-roto --roto_out=src/
```

This generates the file `src/hackers.rs`.

## Generated code

For each protobuf message, roto generates three types:

- **Reader struct** `MessageName<'a>`: borrows the original byte slice (zero-copy).
- **Builder struct** `MessageNameBuilder<'b>`: writes into a caller-provided `&mut [u8]` or `BufMut`.
- **Owned struct** `OwnedMessageName`: owns the byte buffer and implements `RotoOwned`, providing a bridge to the reader struct.

For each protobuf service, roto generates:

- **Service trait** `ServiceName`: a `tonic`-compatible async trait for gRPC service implementations.

Nested message types are placed in a `pub mod message_name { ... }` module (the snake_case form of the parent message name) within the same generated file.
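The builder struct writes into caller-provided memory rather than a growable `Vec`. As a rough illustration of that contract (not roto's generated code), the sketch below writes a varint field and a length-delimited field into a fixed `&mut [u8]` and returns the written slice. `SliceWriter`, `WriteError`, `varint_field`, and `bytes_field` are hypothetical names; only the wire format (tag = field number << 3 | wire type, base-128 varints, length prefixes) comes from the protobuf encoding spec.

```rust
/// Hypothetical error: the caller-provided buffer is too small.
#[derive(Debug, PartialEq)]
pub enum WriteError {
    BufferTooSmall,
}

/// Hypothetical fixed-buffer writer; not roto's `ProtoBuilder`.
pub struct SliceWriter<'b> {
    buf: &'b mut [u8],
    pos: usize,
}

impl<'b> SliceWriter<'b> {
    pub fn new(buf: &'b mut [u8]) -> Self {
        SliceWriter { buf, pos: 0 }
    }

    /// Write one byte, failing (never growing) when the buffer is full.
    fn put(&mut self, byte: u8) -> Result<(), WriteError> {
        let slot = self.buf.get_mut(self.pos).ok_or(WriteError::BufferTooSmall)?;
        *slot = byte;
        self.pos += 1;
        Ok(())
    }

    /// Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
    fn put_varint(&mut self, mut v: u64) -> Result<(), WriteError> {
        while v >= 0x80 {
            self.put(((v & 0x7f) as u8) | 0x80)?;
            v >>= 7;
        }
        self.put(v as u8)
    }

    /// Field with wire type 0 (varint), e.g. uint64 or bool.
    pub fn varint_field(mut self, field: u32, v: u64) -> Result<Self, WriteError> {
        self.put_varint(u64::from(field) << 3)?; // tag: wire type 0 in the low 3 bits
        self.put_varint(v)?;
        Ok(self)
    }

    /// Field with wire type 2: string, bytes, or an already-serialized nested message.
    pub fn bytes_field(mut self, field: u32, data: &[u8]) -> Result<Self, WriteError> {
        self.put_varint((u64::from(field) << 3) | 2)?;
        self.put_varint(data.len() as u64)?;
        let end = self.pos + data.len();
        let dst = self.buf.get_mut(self.pos..end).ok_or(WriteError::BufferTooSmall)?;
        dst.copy_from_slice(data);
        self.pos = end;
        Ok(self)
    }

    /// Return the portion of the buffer that was written.
    pub fn finish(self) -> &'b mut [u8] {
        let SliceWriter { buf, pos } = self;
        &mut buf[..pos]
    }
}
```

Because the buffer never grows, an undersized buffer surfaces as an error instead of a reallocation. Embedding a nested message in this sketch is just `bytes_field` with bytes produced by another writer, the same two-step pattern used for nested messages when writing with roto.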
## Sample usage

Given this proto definition:

```proto
syntax = "proto3";

message Hello {
  string hello_world = 1;

  message InnerWorld {
    string thought = 1;
  }

  InnerWorld inner_world = 2;
}
```

### Reading

```rust
// Assumes the generated `Hello` reader and `hello::InnerWorld` are in scope.
fn parse_proto(data: &[u8]) -> roto::Result<String> {
    // Scan the data once, recording field offsets
    let hello = Hello::new(data)?;

    // String fields return &str borrowed from the original bytes (zero-copy)
    let hello_world: &str = hello.hello_world()?;

    // Nested message fields return &[u8]; construct the nested reader from those bytes
    let inner_bytes: &[u8] = hello.inner_world()?;
    let inner_world = hello::InnerWorld::new(inner_bytes)?;
    let thought: &str = inner_world.thought()?;

    Ok(format!("{} is about {}", hello_world, thought))
}
```

Fields absent from the binary data return `Err(roto::RotoError::FieldNotFound)`.

### Writing

Nested messages must be serialized into a scratch buffer first, then embedded as raw bytes in the outer builder.

```rust
fn build_proto(buf: &mut [u8]) -> roto::Result<&[u8]> {
    // Serialize the inner message into a scratch buffer first
    let mut inner_buf = [0u8; 256];
    let inner_bytes = hello::InnerWorldBuilder::builder(&mut inner_buf)
        .thought("some thought")?
        .finish()?;

    // Build the outer message, embedding the serialized inner bytes
    let written = HelloBuilder::builder(buf)
        .hello_world("some world")?
        .inner_world(inner_bytes)?
        .finish()?; // &mut [u8]: the written portion of buf

    Ok(written) // coerces to &[u8]
}
```

Builder methods consume `self` and return `Result`, enabling `?`-based chaining. `finish()` returns `Result<&'b mut [u8]>`: a slice covering the portion of the buffer that was written.

### Updating messages

You can read a message, modify specific fields, and then use `.with()` to copy the remaining fields from the original binary. For a message `Message` with a string field `foo`:

```rust
fn update_proto<'b>(data: &[u8], buf: &'b mut [u8]) -> roto::Result<&'b [u8]> {
    let msg = Message::new(data)?;
    let mut builder = MessageBuilder::builder(buf);

    // Overwrite `foo` only when it currently equals "bar"
    if msg.foo()? == "bar" {
        builder = builder.foo("foosbar")?;
    }

    // Copy the remaining fields from the original bytes, then finish
    let written = builder.with(&msg)?.finish()?;
    Ok(written)
}
```

### Repeated fields

Repeated fields return a `RepeatedFieldIterator<'a>`. Each item yields `Result<(&[u8], WireType)>`.

```rust
// Assumes `Hello` also declares a repeated field, e.g. `repeated string tags = 3;`
let hello = Hello::new(data)?;
for item in hello.tags() {
    let (value_bytes, _wire_type) = item?;
    // Decode value_bytes according to the expected wire type
}
```

## Runtime API

The core runtime in `src/lib.rs` provides:

- `ProtoAccessor<'a>`: scans a message's fields and reads values at recorded offsets.
- `ProtoBuilder<'a>`: writes fields into a provided `&mut [u8]` buffer.
- `FieldIterator<'a>` / `RepeatedFieldIterator<'a>`: iterators over fields and repeated fields.
- `Tag`, `WireType`: protobuf encoding primitives.
- `read_varint`, `write_varint`, `skip_value`: low-level wire-format helpers.
- `RotoError`, `Result`: error type and alias.

## High-level design

On construction (`MessageName::new(data)`), the generated reader struct iterates over the binary once using `FieldIterator` and records the byte offset of each field's tag. Subsequent field accesses call `ProtoAccessor::get_value_at(offset)` with no re-scanning. For repeated fields, the start and end offsets of the field's range are recorded so that iteration only covers that range.

## Benchmarks

Two benchmark suites (Rust/criterion and C/upb) share the same binary data files and the same five measurement groups:

| Group               | What is timed                                                      |
| ------------------- | ------------------------------------------------------------------ |
| `shallow_parse`     | Become ready to read any field (one scan for roto, full decode for upb) |
| `deep_parse`        | Walk the full tree: Campaign → Operations → Hackers                |
| `field_access`      | Read individual fields on an already-parsed message                |
| `iterate`           | Count top-level and nested repeated fields                         |
| `read_update_write` | Parse, update a field, and serialize back to a buffer              |

### 1. Generate the shared data files (do this once)

Data files are written to `data/bench/`.
```sh
cargo run --release --bin gen_bench_data -- --preset tiny
cargo run --release --bin gen_bench_data -- --preset small
cargo run --release --bin gen_bench_data -- --preset medium
cargo run --release --bin gen_bench_data -- --preset large
```

For even larger inputs, use `--preset huge` (~500 MB) or set the knobs directly:

```sh
# ~50 MB: 500 operations × 100 KB stolen_data each
cargo run --release --bin gen_bench_data -- --ops 500 --stolen-kb 100 --output data/bench/50mb.pb
```

### 2. Rust benchmark (criterion)

```sh
cargo bench --bench hackers_bench
```

HTML reports are written to `target/criterion/`. Run a single group:

```sh
cargo bench --bench hackers_bench -- shallow_parse
```

### 3. C / upb benchmark

Requires protobuf ≥ 21 with `protoc-gen-upb` (ships with modern `protoc`).

```sh
cd upb_test
make              # compiles hackers_bench from the pre-generated upb files
./hackers_bench
```

To regenerate the upb C files from `proto/hackers.proto`:

```sh
cd upb_test && make regen
```

### 4. Results

Measured on Linux x86-64 with the four standard presets. Rust times are criterion medians; C/upb times are the custom runner's mean over ≥ 0.5 s.

#### `shallow_parse`: cost to become ready to read any field

| Size   |       Bytes | roto (ns) |     upb (ns) | roto speedup |
| ------ | ----------: | --------: | -----------: | -----------: |
| tiny   |         588 |      32.7 |        606.2 |    **18.5×** |
| small  |      20,265 |     182.9 |     22,619.2 |   **123.7×** |
| medium |   2,071,053 |  16,632.0 |  5,346,977.2 |     **321×** |
| large  | 102,608,384 |   1,618.6 | 41,132,079.7 |  **25,411×** |

> roto's cost is O(number of top-level fields): it records field offsets by
> jumping past nested blobs using their length prefixes. upb fully decodes the
> entire tree, including all nested messages and raw byte payloads, into
> arena-allocated structs.
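The note above attributes roto's small `shallow_parse` cost to skipping nested blobs via their length prefixes. A minimal sketch of such a single pass follows, assuming well-formed input without deprecated groups. `read_varint` and `scan_offsets` are hypothetical helpers (roto's own `read_varint` may have a different signature), and the offsets are collected into a `Vec` purely for brevity, whereas roto itself is described as allocation-free.

```rust
/// Decode a base-128 varint at `pos`; returns (value, position after it).
fn read_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *buf.get(pos)?;
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
    }
    None // longer than 10 bytes: malformed
}

/// Scan top-level fields once, recording (field number, tag offset) pairs.
/// Length-delimited payloads (strings, bytes, nested messages) are skipped
/// using their length prefix, so their contents are never read.
fn scan_offsets(buf: &[u8]) -> Option<Vec<(u32, usize)>> {
    let mut offsets = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let tag_offset = pos;
        let (tag, after_tag) = read_varint(buf, pos)?;
        pos = match tag & 0x7 {
            0 => read_varint(buf, after_tag)?.1, // varint
            1 => after_tag + 8,                  // fixed64 / double
            2 => {
                // length-delimited: jump over the payload without reading it
                let (len, start) = read_varint(buf, after_tag)?;
                start.checked_add(len as usize)?
            }
            5 => after_tag + 4, // fixed32 / float
            _ => return None,   // groups (3, 4) and invalid wire types: not handled here
        };
        if pos > buf.len() {
            return None; // truncated field
        }
        offsets.push(((tag >> 3) as u32, tag_offset));
    }
    Some(offsets)
}
```

In a message whose field 2 carries a 300-byte payload, the scan reads only the two-byte length prefix and jumps ahead; a 100 KB payload would be skipped the same way, which is why scan cost follows the number of top-level fields rather than the byte count.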
#### `deep_parse`: parse + walk Campaign → Operations → every Hacker handle

| Size   |     Bytes |   roto (ns) |    upb (ns) | roto speedup |
| ------ | --------: | ----------: | ----------: | -----------: |
| tiny   |       588 |       385.3 |       596.8 |    **1.55×** |
| small  |    20,265 |    13,374.0 |    22,321.6 |    **1.67×** |
| medium | 2,071,053 | 1,454,400.0 | 4,227,384.3 |    **2.91×** |

> roto pays one extra `::new()` scan per nesting level; upb's walk is pure
> pointer-chasing because everything was decoded upfront. roto is still
> faster overall because its per-level scans cost less than upb's full decode.

#### `field_access`: individual field reads on a pre-parsed message (`small` preset)

| Field                          | roto (ns) | upb (ns) | upb speedup |
| ------------------------------ | --------: | -------: | ----------: |
| `campaign::name`               |      14.3 |     1.11 |   **12.9×** |
| `campaign::total_bytes_stolen` |       7.1 |     1.74 |    **4.1×** |
| `operation::codename`          |      13.8 |     1.76 |    **7.8×** |
| `operation::timestamp`         |       9.7 |     1.40 |    **6.9×** |
| `operation::successful`        |       7.0 |     1.13 |    **6.1×** |
| `hacker::handle`               |      14.4 |     1.56 |    **9.2×** |
| `hacker::skill_level` (f32)    |       7.7 |     1.76 |    **4.4×** |
| `hacker::is_elite` (bool)      |       7.5 |     1.14 |    **6.6×** |
| `worm::polymorphic` (bool)     |       7.5 |     1.76 |    **4.2×** |
| `worm::payload` (bytes)        |      16.6 |     1.75 |    **9.5×** |

> After parsing, upb field reads are direct struct-member lookups (~1–2 ns).
> roto re-decodes the value at its pre-recorded byte offset on every call
> (~7–17 ns). This is the one area where upb holds a clear advantage.
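The note above says roto re-decodes each value at its recorded offset on every call. The hypothetical accessors below (`str_at`, `f32_at`, `bool_at`; these are not roto's `ProtoAccessor::get_value_at`) sketch what that per-call work can involve: decode the tag, check the wire type, then decode the value. A struct-member load skips all of these steps, which is consistent with the gap in the table.

```rust
/// Decode a base-128 varint at `pos`; returns (value, position after it).
fn read_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *buf.get(pos)?;
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
    }
    None
}

/// Decode the tag at `offset` and check its wire type; returns the value's start.
fn expect_wire_type(buf: &[u8], offset: usize, wire_type: u64) -> Option<usize> {
    let (tag, pos) = read_varint(buf, offset)?;
    if tag & 0x7 == wire_type { Some(pos) } else { None }
}

/// `string` (wire type 2): validate UTF-8 and borrow from `buf` without copying.
fn str_at(buf: &[u8], offset: usize) -> Option<&str> {
    let pos = expect_wire_type(buf, offset, 2)?;
    let (len, start) = read_varint(buf, pos)?;
    let bytes = buf.get(start..start.checked_add(len as usize)?)?;
    std::str::from_utf8(bytes).ok()
}

/// `float` (wire type 5): four little-endian bytes.
fn f32_at(buf: &[u8], offset: usize) -> Option<f32> {
    let pos = expect_wire_type(buf, offset, 5)?;
    let b = buf.get(pos..pos + 4)?;
    Some(f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// `bool` (wire type 0): any non-zero varint is true.
fn bool_at(buf: &[u8], offset: usize) -> Option<bool> {
    let pos = expect_wire_type(buf, offset, 0)?;
    Some(read_varint(buf, pos)?.0 != 0)
}
```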
#### `iterate`: count repeated fields (parse included in every iteration)

| Benchmark          | Size   | roto (ns) |    upb (ns) | roto speedup |
| ------------------ | ------ | --------: | ----------: | -----------: |
| `count_operations` | tiny   |      50.0 |       600.2 |    **12.0×** |
| `count_operations` | small  |     393.7 |    22,702.9 |    **57.7×** |
| `count_operations` | medium |  36,628.0 | 4,193,874.0 |   **114.5×** |
| `count_all_crew`   | tiny   |     235.3 |       610.2 |     **2.6×** |
| `count_all_crew`   | small  |   4,369.5 |    23,109.0 |     **5.3×** |
| `count_all_crew`   | medium | 444,930.0 | 4,151,181.5 |     **9.3×** |

> `count_operations` includes parsing; upb's O(1) array-length read is
> dominated by its full-decode cost, so roto wins by a wide margin, as in
> `shallow_parse`. `count_all_crew` also parses each `Operation` sub-message;
> roto's per-level scans remain cheaper than upb's full decode.

#### `read_update_write`: parse, update a field, and serialize back to a buffer

| Size   |     Bytes | roto (ns) |    upb (ns) | roto speedup |
| ------ | --------: | --------: | ----------: | -----------: |
| tiny   |       588 |     153.8 |     1,120.3 |     **7.3×** |
| small  |    20,265 |   1,301.8 |    42,089.6 |    **32.3×** |
| medium | 2,071,053 | 302,090.0 | 9,233,397.9 |    **30.5×** |

> roto's `with()` method copies fields directly from the original binary
> without decoding them, which keeps the update cheap. upb must fully parse
> the message into structs and then re-serialize the entire tree.

### Interpreting the comparison

The two libraries have fundamentally different models:

- **roto `shallow_parse`** does one linear scan recording byte offsets: no
  allocation, no field decoding. Subsequent field reads decode on demand at
  the stored offset.
- **upb `Campaign_parse`** fully decodes the entire message tree into
  arena-allocated structs upfront. Subsequent field reads are direct
  struct-member lookups (~1–2 ns).

The result: roto's parse is faster and allocation-free; upb's field access
after parsing is faster.
For workloads that read every field, the costs invert and upb's upfront decode
pays off; for workloads that read a handful of fields from large messages,
roto wins.

## Protobuf Spec Validation

The goal is to validate roto's implementation against the Proto3 specification.

### Supported Features

- **Scalar Types**: `double`, `float`, `int32`, `int64`, `uint32`, `uint64`, `sint32`, `sint64`, `fixed32`, `fixed64`, `sfixed32`, `sfixed64`, `bool`, `string`, `bytes`.
- **Messages**: Top-level and nested message definitions.
- **Enums**: Enum definitions with `from_i32` conversion.
- **Field Labels**: Singular and `repeated` fields.
- **Field Presence**: No `has_field()` methods are generated for distinguishing an absent field from a field set to its default value.
- **`oneof` Fields**: Generated enums indicate which field of the `oneof` is set.
- **`map` Fields**: Exposed as an iterator over the underlying key/value pairs.
- **Default Values**: An option is available to select the default value for each field.

### Unsupported Features

- **Reserved Fields**: `reserved` statements are ignored.
- **Options**: Field and message options are ignored.
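Among the supported scalar types, `sint32`/`sint64` use ZigZag encoding on the wire, while `int32` sign-extends negative values to ten-byte varints, as defined in the protobuf encoding documentation. The sketch below shows how a reader might turn an already-decoded varint into these types; the function names are hypothetical and do not describe roto's internals.

```rust
/// `sint32`: ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
fn decode_sint32(raw: u64) -> i32 {
    let n = raw as u32; // a valid sint32 payload fits in 32 bits
    ((n >> 1) as i32) ^ -((n & 1) as i32)
}

/// `sint64`: the same mapping over 64 bits.
fn decode_sint64(raw: u64) -> i64 {
    ((raw >> 1) as i64) ^ -((raw & 1) as i64)
}

/// Inverse of `decode_sint64`, used here to round-trip test values.
fn encode_sint64(v: i64) -> u64 {
    ((v << 1) ^ (v >> 63)) as u64
}

/// `int32`: negative values are sign-extended to 64 bits before varint
/// encoding, so truncating to the low 32 bits recovers them.
fn decode_int32(raw: u64) -> i32 {
    raw as i32
}
```

ZigZag keeps small negative numbers small on the wire: a `sint32` of -1 encodes as the one-byte varint 1, whereas an `int32` of -1 occupies ten bytes.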