Add performance results section to README
@@ -184,6 +184,72 @@ To regenerate the upb C files from `proto/hackers.proto`:
cd upb_test && make regen
```

### 4 — Results

Measured on Linux x86-64 with the four standard presets. Rust times are
criterion medians; C/upb times are the custom runner's mean over ≥ 0.5 s.

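
To make the "mean over ≥ 0.5 s" figure concrete, here is a minimal Rust sketch of that style of measurement. It is not the repository's runner (which is written in C); `mean_ns`, `speedup`, and the batch size are hypothetical names and values chosen for illustration. The speedup columns in the tables below are consistent with dividing the slower time by the faster one.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Call `f` in batches until at least `min_total` has elapsed, then return
/// (mean nanoseconds per call, number of calls). Batching keeps the clock
/// reads from dominating very short operations.
fn mean_ns<F: FnMut()>(mut f: F, min_total: Duration, batch: u64) -> (f64, u64) {
    assert!(batch > 0, "batch must be at least 1");
    let start = Instant::now();
    let mut calls = 0u64;
    loop {
        for _ in 0..batch {
            f();
        }
        calls += batch;
        let elapsed = start.elapsed();
        if elapsed >= min_total {
            return (elapsed.as_nanos() as f64 / calls as f64, calls);
        }
    }
}

/// Speedup as a ratio: slower time divided by faster time.
fn speedup(slower_ns: f64, faster_ns: f64) -> f64 {
    slower_ns / faster_ns
}

fn main() {
    // Stand-in workload; a real run would call the parser under test here.
    let data: Vec<u64> = (0..1024).collect();
    let (ns, calls) = mean_ns(
        || {
            black_box(black_box(&data).iter().sum::<u64>());
        },
        Duration::from_millis(500),
        1_000,
    );
    println!("{:.1} ns/call over {} calls", ns, calls);

    // Ratio for the `tiny` row of the shallow_parse table.
    println!("{:.1}x", speedup(606.2, 32.7));
}
```

A mean is more sensitive to outliers than criterion's median, so small differences between the two columns should be read with that in mind.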
#### `shallow_parse` — cost to become ready to read any field

| Size   |       Bytes | roto (ns) |     upb (ns) | roto speedup |
| ------ | ----------: | --------: | -----------: | -----------: |
| tiny   |         588 |      32.7 |        606.2 |    **18.5×** |
| small  |      20,265 |     182.9 |     22,619.2 |   **123.7×** |
| medium |   2,071,053 |  16,632.0 |  5,346,977.2 |     **321×** |
| large  | 102,608,384 |   1,618.6 | 41,132,079.7 |  **25,411×** |

> roto's cost is O(number of top-level fields): it records field offsets by
> jumping past nested blobs using their length prefixes. upb fully decodes the
> entire tree, including all nested messages and raw byte payloads, into
> arena-allocated structs.

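
The offset-recording scan described in the note can be sketched as follows. This is a simplified illustration over the standard protobuf wire format, not roto's actual code; `FieldRef` and `shallow_scan` are hypothetical names. The key point is the wire type 2 branch, which reads the length prefix and jumps past the payload without looking inside it.

```rust
/// Decode a base-128 varint at `pos`; returns (value, position after it).
fn read_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *buf.get(pos)?;
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
    }
    None // more than 10 bytes: malformed
}

/// One top-level field: number, wire type, and byte range of its value.
#[derive(Debug, PartialEq)]
struct FieldRef {
    number: u32,
    wire_type: u8,
    start: usize,
    end: usize,
}

/// Record where each top-level field lives without decoding any payload.
fn shallow_scan(buf: &[u8]) -> Option<Vec<FieldRef>> {
    let mut fields = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let (tag, after_tag) = read_varint(buf, pos)?;
        let wire_type = (tag & 7) as u8;
        let (start, end) = match wire_type {
            0 => (after_tag, read_varint(buf, after_tag)?.1), // varint
            1 => (after_tag, after_tag + 8),                  // 64-bit
            5 => (after_tag, after_tag + 4),                  // 32-bit
            2 => {
                // Length-delimited: skip the whole nested blob in one jump.
                let (len, start) = read_varint(buf, after_tag)?;
                if len > (buf.len() - start) as u64 {
                    return None;
                }
                (start, start + len as usize)
            }
            _ => return None, // groups and unknown wire types not handled
        };
        if end > buf.len() {
            return None;
        }
        fields.push(FieldRef { number: (tag >> 3) as u32, wire_type, start, end });
        pos = end;
    }
    Some(fields)
}

fn main() {
    // Field 1 = string "hi", field 2 = varint 150.
    let msg = [0x0a, 0x02, b'h', b'i', 0x10, 0x96, 0x01];
    for field in shallow_scan(&msg).expect("well-formed message") {
        println!("{:?}", field);
    }
}
```

The loop body runs once per top-level field regardless of payload size, which matches the O(number of top-level fields) behaviour the note describes.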
#### `deep_parse` — parse + walk Campaign → Operations → every Hacker handle

| Size   |     Bytes |   roto (ns) |    upb (ns) | roto speedup |
| ------ | --------: | ----------: | ----------: | -----------: |
| tiny   |       588 |       385.3 |       596.8 |    **1.55×** |
| small  |    20,265 |    13,374.0 |    22,321.6 |    **1.67×** |
| medium | 2,071,053 | 1,454,400.0 | 4,227,384.3 |    **2.91×** |

> roto pays one extra `::new()` scan per nesting level; upb's walk is pure
> pointer-chasing because everything was decoded upfront. roto is still
> faster overall because its per-level scans cost less than upb's full decode.

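
A rough sketch of what "one scan per nesting level" looks like when walking down to every handle. The field numbers below are invented for illustration and are not taken from `proto/hackers.proto`; `submessages`, `all_handles`, and the `field` test encoder are likewise hypothetical helpers, not roto's API.

```rust
/// Decode a base-128 varint at `pos`; returns (value, position after it).
fn read_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *buf.get(pos)?;
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
    }
    None
}

/// Scan one message level and return the payload slices of every
/// length-delimited field numbered `wanted`; everything else is skipped.
fn submessages<'a>(buf: &'a [u8], wanted: u64) -> Option<Vec<&'a [u8]>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let (tag, after_tag) = read_varint(buf, pos)?;
        pos = match tag & 7 {
            0 => read_varint(buf, after_tag)?.1,
            1 => after_tag + 8,
            5 => after_tag + 4,
            2 => {
                let (len, start) = read_varint(buf, after_tag)?;
                if len > (buf.len() - start) as u64 {
                    return None;
                }
                let end = start + len as usize;
                if tag >> 3 == wanted {
                    out.push(&buf[start..end]);
                }
                end
            }
            _ => return None,
        };
        if pos > buf.len() {
            return None;
        }
    }
    Some(out)
}

// Assumed field numbers, for illustration only.
const CAMPAIGN_OPERATIONS: u64 = 2;
const OPERATION_CREW: u64 = 3;
const HACKER_HANDLE: u64 = 1;

/// Walk Campaign -> Operations -> Hackers, scanning each level on demand.
fn all_handles(campaign: &[u8]) -> Option<Vec<String>> {
    let mut handles = Vec::new();
    for op in submessages(campaign, CAMPAIGN_OPERATIONS)? {
        for hacker in submessages(op, OPERATION_CREW)? {
            for handle in submessages(hacker, HACKER_HANDLE)? {
                handles.push(String::from_utf8_lossy(handle).into_owned());
            }
        }
    }
    Some(handles)
}

/// Tiny encoder for test data: one length-delimited field, payload < 128 bytes.
fn field(number: u8, payload: &[u8]) -> Vec<u8> {
    assert!(number < 16 && payload.len() < 128);
    let mut out = vec![(number << 3) | 2, payload.len() as u8];
    out.extend_from_slice(payload);
    out
}

fn main() {
    let mut op = field(3, &field(1, b"neo"));
    op.extend(field(3, &field(1, b"trinity")));
    let campaign = field(2, &op);
    println!("{:?}", all_handles(&campaign));
}
```

Each nested loop performs its own scan of just the bytes it was handed, so the total work grows with the number of fields actually visited rather than with a full upfront decode.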
#### `field_access` — individual field reads on a pre-parsed message (`small` preset)

| Field                          | roto (ns) | upb (ns) | upb speedup |
| ------------------------------ | --------: | -------: | ----------: |
| `campaign::name`               |      14.3 |     1.11 |   **12.9×** |
| `campaign::total_bytes_stolen` |       7.1 |     1.74 |    **4.1×** |
| `operation::codename`          |      13.8 |     1.76 |    **7.8×** |
| `operation::timestamp`         |       9.7 |     1.40 |    **6.9×** |
| `operation::successful`        |       7.0 |     1.13 |    **6.1×** |
| `hacker::handle`               |      14.4 |     1.56 |    **9.2×** |
| `hacker::skill_level` (f32)    |       7.7 |     1.76 |    **4.4×** |
| `hacker::is_elite` (bool)      |       7.5 |     1.14 |    **6.6×** |
| `worm::polymorphic` (bool)     |       7.5 |     1.76 |    **4.2×** |
| `worm::payload` (bytes)        |      16.6 |     1.75 |    **9.5×** |

> After parsing, upb field reads are direct struct-member lookups (~1–2 ns).
> roto re-decodes the value at its pre-recorded byte offset on every call
> (~7–17 ns). This is the one area where upb holds a clear advantage.

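
The difference between the two read paths can be illustrated with a small sketch. `LazyCampaign` and `DecodedCampaign` are hypothetical types standing in for the two models; neither is the real roto or upb representation, and the field number used in the sample bytes is an assumption.

```rust
/// Decode a base-128 varint at `pos`; returns (value, position after it).
fn read_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *buf.get(pos)?;
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
    }
    None
}

/// Lazy model: keep the raw bytes and the offset recorded during the scan.
struct LazyCampaign<'a> {
    buf: &'a [u8],
    total_bytes_stolen_at: usize,
}

impl<'a> LazyCampaign<'a> {
    /// Every call decodes the varint again from the recorded offset.
    fn total_bytes_stolen(&self) -> Option<u64> {
        read_varint(self.buf, self.total_bytes_stolen_at).map(|(value, _)| value)
    }
}

/// Eager model: the value was decoded once at parse time, so a read is a
/// plain struct-member load.
struct DecodedCampaign {
    total_bytes_stolen: u64,
}

fn main() {
    // Tag 0x10 (assumed field 2, varint) followed by the value 150.
    let msg = [0x10, 0x96, 0x01];

    let lazy = LazyCampaign { buf: &msg, total_bytes_stolen_at: 1 };
    let eager = DecodedCampaign {
        total_bytes_stolen: lazy.total_bytes_stolen().expect("valid varint"),
    };

    println!("lazy read:  {:?}", lazy.total_bytes_stolen());
    println!("eager read: {}", eager.total_bytes_stolen);
}
```

Code that reads the same field many times in a hot loop can narrow the gap by copying the lazily decoded value into a local variable once.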
#### `iterate` — count repeated fields (parse included in every iteration)

| Benchmark          | Size   | roto (ns) |    upb (ns) | roto speedup |
| ------------------ | ------ | --------: | ----------: | -----------: |
| `count_operations` | tiny   |      50.0 |       600.2 |    **12.0×** |
| `count_operations` | small  |     393.7 |    22,702.9 |    **57.7×** |
| `count_operations` | medium |  36,628.0 | 4,193,874.0 |   **114.5×** |
| `count_all_crew`   | tiny   |     235.3 |       610.2 |     **2.6×** |
| `count_all_crew`   | small  |   4,369.5 |    23,109.0 |     **5.3×** |
| `count_all_crew`   | medium | 444,930.0 | 4,151,181.5 |     **9.3×** |

> `count_operations` includes parsing; upb's O(1) array-length read is
> dominated by its full-decode cost, so roto wins by a wide margin for the
> same reason as in `shallow_parse` (12.0× to 114.5× here, somewhat less than
> the `shallow_parse` ratios). `count_all_crew` also parses each `Operation`
> sub-message; roto's per-level scans remain cheaper than upb's full decode.

### Interpreting the comparison

The two libraries have fundamentally different models:
