Add performance results section to README

2026-05-04 14:53:49 -07:00
parent 4a6a09cff1
commit 6821bd1cca
1 changed files with 66 additions and 0 deletions
@@ -184,6 +184,72 @@ To regenerate the upb C files from `proto/hackers.proto`:
 cd upb_test && make regen
 ```

+### 4 — Results
+
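+Measured on Linux x86-64 with the four standard presets; the `large` preset
+is reported only for `shallow_parse`. All times are in nanoseconds. Rust times
+are criterion medians; C/upb times are the custom runner's mean over ≥ 0.5 s.
+Each speedup column is the slower library's time divided by the faster one's.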
+
+#### `shallow_parse` — cost to become ready to read any field
+
+| Size   |       Bytes | roto (ns) |     upb (ns) | roto speedup |
+| ------ | ----------: | --------: | -----------: | -----------: |
+| tiny   |         588 |      32.7 |        606.2 |    **18.5×** |
+| small  |      20,265 |     182.9 |     22,619.2 |   **123.7×** |
+| medium |   2,071,053 |  16,632.0 |  5,346,977.2 |     **321×** |
+| large  | 102,608,384 |   1,618.6 | 41,132,079.7 |  **25,411×** |
+
+> roto's cost is O(number of top-level fields): it records field offsets by
+> jumping past nested blobs using their length prefixes, so its time does not
+> track input size (the 102 MB `large` preset parses faster than the 2 MB
+> `medium` one). upb fully decodes the entire tree, including all nested
+> messages and raw byte payloads, into arena-allocated structs.
+
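The length-prefix skipping described above can be sketched roughly as follows. This is not roto's actual code: `read_varint` and `scan_top_level` are hypothetical names, and the sketch only assumes the standard protobuf wire format (varint tags, wire types 0, 1, 2 and 5).

```rust
/// Read a base-128 varint starting at `pos`; returns (value, next position).
fn read_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(pos)?;
        pos += 1;
        if shift >= 64 {
            return None; // more than 10 bytes: malformed varint
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
        shift += 7;
    }
}

/// Record (field number, wire type, offset just after the tag) for every
/// top-level field. Length-delimited payloads (strings, bytes, nested
/// messages) are jumped over via their length prefix and never inspected.
fn scan_top_level(buf: &[u8]) -> Option<Vec<(u32, u8, usize)>> {
    let mut fields = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let (tag, after_tag) = read_varint(buf, pos)?;
        let field_number = (tag >> 3) as u32;
        let wire_type = (tag & 7) as u8;
        let next = match wire_type {
            0 => read_varint(buf, after_tag)?.1, // varint
            1 => after_tag.checked_add(8)?,      // fixed 64-bit
            2 => {
                // length-delimited: skip the whole blob in one step
                let (len, body) = read_varint(buf, after_tag)?;
                body.checked_add(usize::try_from(len).ok()?)?
            }
            5 => after_tag.checked_add(4)?,      // fixed 32-bit
            _ => return None,                    // groups etc. not handled here
        };
        if next > buf.len() {
            return None; // truncated input
        }
        fields.push((field_number, wire_type, after_tag));
        pos = next;
    }
    Some(fields)
}
```

The work done is proportional to the number of top-level tags, regardless of how many bytes sit inside each skipped blob.
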
+#### `deep_parse` — parse + walk Campaign → Operations → every Hacker handle
+
+| Size   |     Bytes |   roto (ns) |    upb (ns) | roto speedup |
+| ------ | --------: | ----------: | ----------: | -----------: |
+| tiny   |       588 |       385.3 |       596.8 |    **1.55×** |
+| small  |    20,265 |    13,374.0 |    22,321.6 |    **1.67×** |
+| medium | 2,071,053 | 1,454,400.0 | 4,227,384.3 |    **2.91×** |
+
+> roto pays one extra `::new()` scan per nesting level; upb's walk is pure
+> pointer-chasing because everything was decoded upfront. roto is still
+> faster overall because its per-level scans cost less than upb's full decode.
+
+#### `field_access` — individual field reads on a pre-parsed message (`small` preset)
+
+| Field                          | roto (ns) | upb (ns) | upb speedup |
+| ------------------------------ | --------: | -------: | ----------: |
+| `campaign::name`               |      14.3 |     1.11 |   **12.9×** |
+| `campaign::total_bytes_stolen` |       7.1 |     1.74 |    **4.1×** |
+| `operation::codename`          |      13.8 |     1.76 |    **7.8×** |
+| `operation::timestamp`         |       9.7 |     1.40 |    **6.9×** |
+| `operation::successful`        |       7.0 |     1.13 |    **6.1×** |
+| `hacker::handle`               |      14.4 |     1.56 |    **9.2×** |
+| `hacker::skill_level` (f32)    |       7.7 |     1.76 |    **4.4×** |
+| `hacker::is_elite` (bool)      |       7.5 |     1.14 |    **6.6×** |
+| `worm::polymorphic` (bool)     |       7.5 |     1.76 |    **4.2×** |
+| `worm::payload` (bytes)        |      16.6 |     1.75 |    **9.5×** |
+
+> After parsing, upb field reads are direct struct-member lookups (~1–2 ns).
+> roto re-decodes the value at its pre-recorded byte offset on every call
+> (~7–17 ns), which makes upb 4.1–12.9× faster per read. This is the one area
+> where upb holds a clear advantage.
+
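To illustrate the difference between the two access models, here is a minimal sketch. `LazyCampaign` and `EagerCampaign` are hypothetical types, not roto's or upb's real ones, and the sketch assumes `total_bytes_stolen` is a varint-encoded integer.

```rust
/// Decode one varint from the start of `bytes` (at most 10 bytes).
fn decode_varint(bytes: &[u8]) -> Option<u64> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().take(10).enumerate() {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None // ran out of bytes, or the varint was too long
}

/// Lazy model: keep the raw buffer plus a recorded offset, and decode the
/// value again on every read.
struct LazyCampaign<'a> {
    buf: &'a [u8],
    total_bytes_stolen_at: usize, // hypothetical pre-recorded payload offset
}

impl<'a> LazyCampaign<'a> {
    fn total_bytes_stolen(&self) -> Option<u64> {
        decode_varint(self.buf.get(self.total_bytes_stolen_at..)?)
    }
}

/// Eager model: the value was decoded once at parse time, so a read is a
/// plain struct-member load.
struct EagerCampaign {
    total_bytes_stolen: u64,
}
```

The lazy getter pays a bounds check and a decode loop on each call, while the eager struct pays that cost once during parsing.
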
+#### `iterate` — count repeated fields (parse included in every iteration)
+
+| Benchmark          | Size   | roto (ns) |    upb (ns) | roto speedup |
+| ------------------ | ------ | --------: | ----------: | -----------: |
+| `count_operations` | tiny   |      50.0 |       600.2 |    **12.0×** |
+| `count_operations` | small  |     393.7 |    22,702.9 |    **57.7×** |
+| `count_operations` | medium |  36,628.0 | 4,193,874.0 |   **114.5×** |
+| `count_all_crew`   | tiny   |     235.3 |       610.2 |     **2.6×** |
+| `count_all_crew`   | small  |   4,369.5 |    23,109.0 |     **5.3×** |
+| `count_all_crew`   | medium | 444,930.0 | 4,151,181.5 |     **9.3×** |
+
+> `count_operations` includes parsing; upb's O(1) array-length read is
+> dominated by its full-decode cost, so roto wins by a wide margin (12.0–114.5×),
+> though a smaller one than in `shallow_parse` (18.5–321× on the same presets).
+> `count_all_crew` also parses each `Operation` sub-message; roto's per-level
+> scans remain cheaper than upb's full decode.
+
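As a rough illustration of why counting a repeated message field can be cheap without a full decode, the sketch below counts tag occurrences and skips every payload. `count_field` is a hypothetical helper, not part of roto, and it assumes each repeated element carries its own tag (true for repeated messages, strings and bytes, but not for packed scalars).

```rust
/// Read a varint at `*pos`, advancing `*pos` past it.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None // longer than 10 bytes: malformed
}

/// Count how many top-level fields carry field number `target`,
/// skipping each payload without looking inside it.
fn count_field(buf: &[u8], target: u64) -> Option<usize> {
    let mut pos = 0usize;
    let mut count = 0usize;
    while pos < buf.len() {
        let tag = read_varint(buf, &mut pos)?;
        if tag >> 3 == target {
            count += 1;
        }
        let skip = match tag & 7 {
            0 => {
                read_varint(buf, &mut pos)?; // varint payload, already consumed
                0
            }
            1 => 8,                                                  // fixed 64-bit
            2 => usize::try_from(read_varint(buf, &mut pos)?).ok()?, // length prefix
            5 => 4,                                                  // fixed 32-bit
            _ => return None,                                        // unsupported wire type
        };
        pos = pos.checked_add(skip)?;
        if pos > buf.len() {
            return None; // truncated input
        }
    }
    Some(count)
}
```

Counting nested items, as `count_all_crew` does, would repeat the same scan inside each matching sub-message, which is the per-level cost mentioned above.
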
 ### Interpreting the comparison

 The two libraries have fundamentally different models: