# Benchmarks
- Machine: Apple Silicon (M-series)
- Tool: `wrk -t2 -c10 -d10s`
- Workload: Hello world, JSON response, SIMD (Go calling Zig sum/dot/scale/minmax)
- Runtime: `wrangler dev` for GoMode (local miniflare runtime)
## Results

| | Native Go | GoMode (Worker) | GoMode (DO) | Std Go WASM |
|---|---|---|---|---|
| GET / req/sec | 80,715 | 3,764 | 1,586 | 614 |
| GET /json req/sec | 83,182 | 3,715 | — | 537 |
| GET /simd req/sec | — | 3,692 | 1,574 | — |
| Latency (avg) | 0.6ms | 3.2ms | 7.2ms | 78ms |
| Binary size | native | 79KB | 79KB | 3.0MB |
Note: GoMode numbers are bottlenecked by wrangler dev (~3.7K req/sec ceiling). Production CF edge numbers will be significantly higher.
## GoMode vs Standard Go WASM

6.1x faster throughput with a 38x smaller binary.
Standard Go produces a 3MB WASM binary with a heavy runtime (goroutine scheduler, full GC, large stdlib). GoMode produces a 79KB binary — TinyGo with `gc=leaking`, `scheduler=none`, and Zig linked directly in.
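A sketch of the kind of TinyGo invocation the flags above imply — the output name, target, and package path are assumptions (the project's actual build runs behind `npm run build`):

```shell
# Illustrative only — not the project's real build script.
# -gc=leaking     : allocator never frees (fine for short-lived request handlers)
# -scheduler=none : no goroutine scheduler, synchronous execution only
tinygo build -o main.wasm -target=wasm-unknown \
  -gc=leaking -scheduler=none -no-debug ./
```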
## Zero-overhead Zig SIMD

The `/simd` route calls Zig SIMD functions (sum, dot product, scale, minmax) from Go. Throughput is virtually identical to the plain hello world — 3,692 vs 3,764 req/sec. This confirms that the CGo linking adds effectively zero overhead: the Zig functions are direct call instructions in the same WASM binary.
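The Go side of such a call can be sketched as below. Here a pure-Go stub (`dot`) stands in for the Zig symbol so the snippet runs anywhere; in the real binary the symbol resolves to a Zig function compiled into the same WASM module, so the call site is a plain direct `call` instruction. The function name and signature are illustrative assumptions, not the project's actual API.

```go
package main

import "fmt"

// dot stands in for the linked Zig SIMD implementation. Because Zig is
// compiled into the same WASM binary, calling it costs no more than
// calling this Go function — there is no FFI shim to cross.
func dot(a, b []float32) float32 {
	var acc float32
	for i := range a {
		acc += a[i] * b[i]
	}
	return acc
}

func main() {
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	fmt.Println(dot(a, b)) // 1*5 + 2*6 + 3*7 + 4*8 = 70
}
```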
## How zero-copy works

GoMode eliminates all serialization between JS and Go:
- JS writes request fields as zerobuf tagged values directly into WASM memory (32 bytes for 2 fields)
- Go reads at fixed offsets — no parsing
- Go writes response at fixed offsets — no serialization
- JS reads at fixed offsets — no parsing
Compare this to standard Go WASM, which requires JSON.stringify → encode → copy → json.Unmarshal → process → json.Marshal → copy → JSON.parse on every request.
## Worker vs Durable Object

Worker mode is 2.4x faster than DO mode because:
- No DO queue — requests don’t serialize through a single instance
- Horizontal scaling — CF spins up multiple isolates, each with its own cached WASM instance
- Same warm latency — both have ~1ms per-request cost once WASM is initialized
Use DO mode when you need state (WebSocket sessions, counters, etc.). Use Worker mode for stateless handlers.
## What’s measured

These benchmarks run on wrangler dev (local miniflare), not production CF edge. The wrangler dev server itself caps around ~3.7K req/sec, so these numbers reflect the wrangler bottleneck — not GoMode’s actual throughput limit.
## What’s NOT measured yet

- Production CF edge — deploy benchmarks (coming soon)
- Columnar SIMD workloads — bulk data transforms with Zig SIMD
- Async CF bindings — KV, R2, D1 via Asyncify (not integrated yet)
## Reproduce

```sh
# Install wrk
brew install wrk

# Build single WASM binary
npm run build

# Start dev server
npm run dev

# Benchmark
wrk -t2 -c10 -d10s http://localhost:8787/        # Worker mode
wrk -t2 -c10 -d10s http://localhost:8787/json    # Worker JSON
wrk -t2 -c10 -d10s http://localhost:8787/simd    # Worker SIMD
wrk -t2 -c10 -d10s http://localhost:8787/do/     # DO mode
wrk -t2 -c10 -d10s http://localhost:8787/do/simd # DO SIMD
```