Architecture

PyMode compiles upstream CPython 3.13 to wasm32-wasi using zig cc, then runs it inside Cloudflare Workers via Durable Objects. Each request gets its own sandboxed Python runtime.

┌───────────────────────────────────────────────────────┐
│                    Cloudflare Edge                    │
│                                                       │
│  ┌───────────┐     ┌──────────────────────────────┐   │
│  │  Worker   │────▶│  PythonDO (Durable Object)   │   │
│  │(stateless)│     │                              │   │
│  └───────────┘     │  ┌────────────────────────┐  │   │
│                    │  │ python.wasm (CPython)  │  │   │
│                    │  │                        │  │   │
│                    │  │  on_fetch(req, env)    │  │   │
│                    │  │         │              │  │   │
│                    │  │         ▼              │  │   │
│                    │  │   WASM Host Imports    │  │   │
│                    │  └─────────┬──────────────┘  │   │
│                    │            │                 │   │
│                    │      ┌─────▼─────┐           │   │
│                    │      │ Asyncify  │           │   │
│                    │      │ (suspend/ │           │   │
│                    │      │  resume)  │           │   │
│                    │      └─────┬─────┘           │   │
│                    └────────────┼─────────────────┘   │
│                                 │                     │
│                    ┌────────────▼──────────────┐      │
│                    │ KV │ R2 │ D1 │ TCP │ HTTP │      │
│                    └───────────────────────────┘      │
└───────────────────────────────────────────────────────┘

A native CPython 3.13 is built first — this serves as the “build Python” for cross-compilation.

CPython is cross-compiled to wasm32-wasi using zig cc wrappers:

| Component | Purpose |
|---|---|
| zig-cc | C compiler wrapper targeting wasm32-wasi with -Os |
| zig-ar | Static archiver |
| zig-cpp | C preprocessor |
| config.site-wasi | Pre-answers configure checks for WASI |

Key build flags:

  • Target: wasm32-wasi
  • Optimization: -Os (ReleaseSmall)
  • Disabled: threads, shared libs, IPv6, pymalloc
  • WASI emulation: -lwasi-emulated-signal, -lwasi-emulated-getpid, -lwasi-emulated-process-clocks
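Taken together, a wrapper like zig-cc mostly amounts to prepending a fixed flag set to each compiler invocation. A minimal sketch in Python (the real wrappers are likely shell scripts, and the exact flag ordering here is an assumption):

```python
import sys

# Flags taken from the build description above; ordering is illustrative.
WASI_FLAGS = [
    "-target", "wasm32-wasi",
    "-Os",  # ReleaseSmall
    "-lwasi-emulated-signal",
    "-lwasi-emulated-getpid",
    "-lwasi-emulated-process-clocks",
]

def zig_cc_argv(args: list[str]) -> list[str]:
    """Build the command line a zig-cc wrapper would exec."""
    return ["zig", "cc", *WASI_FLAGS, *args]

if __name__ == "__main__":
    print(" ".join(zig_cc_argv(sys.argv[1:])))
```

Pointing CPython's configure at such a wrapper (CC=zig-cc) is what lets the stock autotools build target wasm32-wasi.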

wasm-opt --asyncify instruments the binary so host imports can suspend and resume the WASM stack:

Python calls fetch()
→ WASM host import pymode.http_fetch
→ Asyncify unwinds the stack
→ JS awaits the actual HTTP fetch
→ Asyncify rewinds the stack
→ Python receives the response

This means Python code looks synchronous while the host performs async I/O.
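The unwind/rewind cycle can be modeled in plain Python with a generator, purely as an illustration (none of these names exist in PyMode): yielding plays the role of Asyncify unwinding at a host import, and `send()` plays the role of rewinding with the host's result.

```python
def python_guest():
    # Guest code looks synchronous: it "calls" fetch and gets a response.
    response = yield ("http_fetch", "https://example.com")
    return f"got {response}"

def host_run(guest):
    gen = guest()
    op, arg = next(gen)                 # guest unwinds at the host import
    assert op == "http_fetch"
    result = f"response for {arg}"      # host performs the async I/O here
    try:
        gen.send(result)                # rewind: resume guest with the result
    except StopIteration as done:
        return done.value
```

The real mechanism operates on the WASM call stack rather than a generator, but the control flow is the same: the guest never sees the suspension.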

generate-stdlib-fs.py packages the CPython stdlib and PyMode runtime into a TypeScript map that’s embedded in the worker:

export const stdlibFS: Record<string, string> = {
  "encodings/__init__.py": "...",
  "encodings/utf_8.py": "...",
  "json/__init__.py": "...",
  // ~90 stdlib modules
};
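A plausible sketch of the packaging step, assuming the script simply walks a stdlib tree and emits the TypeScript map (the real generate-stdlib-fs.py may filter, trim, or minify differently):

```python
import json
from pathlib import Path

def generate_stdlib_fs(stdlib_dir: str) -> str:
    """Walk a directory of .py files and emit a TypeScript module
    mapping relative paths to file contents."""
    entries = []
    for path in sorted(Path(stdlib_dir).rglob("*.py")):
        rel = path.relative_to(stdlib_dir).as_posix()
        # json.dumps produces valid TS string literals, escaping included
        entries.append(f"  {json.dumps(rel)}: {json.dumps(path.read_text())},")
    body = "\n".join(entries)
    return f"export const stdlibFS: Record<string, string> = {{\n{body}\n}};\n"
```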

Instead of JS interop or virtual filesystem hacks, PyMode uses direct WASM imports:

pymode_imports.h:

__attribute__((import_module("pymode"), import_name("kv_get")))
int32_t pymode_kv_get(const char* key, int32_t key_len,
                      uint8_t* buf, int32_t buf_len);

The JavaScript host (PythonDO) provides these functions at WASM instantiation time. Python calls them through a C extension module (_pymode).
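The signature above implies a common buffer-sizing convention on the caller's side: the host writes up to buf_len bytes and returns the value's full length, so a caller retries with a larger buffer when the value was truncated. A hedged Python sketch of that retry loop, with `host_kv_get` standing in for the real WASM import (the actual _pymode convention may differ):

```python
def kv_get(key: bytes, host_kv_get, initial: int = 256):
    """Fetch a KV value via a host import using the write-into-buffer
    convention: negative return means not found, a return larger than
    the buffer means the value was truncated."""
    buf = bytearray(initial)
    n = host_kv_get(key, len(key), buf, len(buf))
    if n < 0:
        return None                      # key not found
    if n > len(buf):                     # truncated: retry with exact size
        buf = bytearray(n)
        n = host_kv_get(key, len(key), buf, len(buf))
    return bytes(buf[:n])
```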

| Namespace | Functions |
|---|---|
| wasi_snapshot_preview1 | WASI standard (fd_read, fd_write, path_open, etc.) |
| pymode | KV, R2, D1, TCP, HTTP, threading, dynamic loading |
| asyncify | Stack unwind/rewind control |

Each request is handled by a Durable Object instance that:

  1. Instantiates python.wasm with WASI + pymode imports
  2. Writes the request as JSON to WASM stdin
  3. Calls _start (which runs _handler.py's on_fetch())
  4. Reads the response from WASM stdout
  5. Manages async I/O via Asyncify during execution
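The steps above reduce to a simple stdin/stdout contract between the DO and the WASM guest. A toy model of that contract (the field names are illustrative, not PyMode's actual wire format):

```python
import json

def handle(stdin_text: str) -> str:
    """Model of the guest side: parse the JSON request from stdin,
    run the handler, and write a JSON response to stdout."""
    req = json.loads(stdin_text)           # step 2: request arrives on stdin
    body = f"hello from {req['url']}"      # step 3: on_fetch logic runs here
    resp = {"status": 200, "body": body}
    return json.dumps(resp)                # step 4: response on stdout
```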

The DO holds:

  • Active TCP connections (for connection pooling)
  • HTTP response handles
  • Thread/child DO results
  • Dynamic loading state (.wasm side modules)

C extension packages (markupsafe, simplejson, etc.) are compiled to .wasm side modules. PyMode intercepts dlopen/dlsym calls:

import markupsafe
→ CPython calls dlopen("markupsafe/_speedups.so")
→ dynload_pymode.c intercepts
→ WASM host import pymode.dl_open("markupsafe/_speedups.wasm")
→ JS loads + instantiates the side module
→ dlsym("PyInit__speedups") returns the init function
→ CPython initializes the extension module
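A minimal sketch of the filename rewrite this interception implies (the real suffix handling in dynload_pymode.c is likely more involved, e.g. for versioned CPython extension suffixes):

```python
def so_to_wasm(path: str) -> str:
    """Map a CPython extension's .so path to its .wasm side module."""
    base, dot, _ext = path.rpartition(".")
    return f"{base}.wasm" if dot else path
```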

The 10MB compressed bundle limit applies per worker (total of all JS, WASM, and static assets). For applications needing large packages (numpy, pandas, image processing), you can split across multiple workers connected via Service Bindings:

┌─────────────────────────────────────────────────────────────┐
│         Cloudflare Edge (same colo, sub-ms latency)         │
│                                                             │
│  ┌──────────────────┐   Service    ┌────────────────────┐   │
│  │  PyMode Worker   │───Binding───▶│  Compute Worker    │   │
│  │   (~3MB gz)      │              │    (~8MB gz)       │   │
│  │                  │              │                    │   │
│  │   python.wasm    │     RPC      │  python.wasm +     │   │
│  │   on_fetch()     │◀─────────────│  numpy.wasm (zig)  │   │
│  │  routing, auth   │              │  heavy compute     │   │
│  └──────────────────┘              └────────────────────┘   │
│           │                                                 │
│           │  Service Binding                                │
│  ┌────────▼───────────┐                                     │
│  │  Extension Worker  │                                     │
│  │     (~5MB gz)      │                                     │
│  │  C extensions as   │                                     │
│  │  WASM side modules │                                     │
│  └────────────────────┘                                     │
└─────────────────────────────────────────────────────────────┘

Each worker is a separate deployment with its own 10MB budget, 128MB memory, and 30s CPU time. Workers communicate via Service Bindings — direct worker-to-worker calls within the same Cloudflare colo with no network overhead.

# wrangler.toml for the PyMode handler worker
[[services]]
binding = "COMPUTE"
service = "compute-worker"

# PyMode handler — fast routing, auth, template rendering
def on_fetch(request, env):
    # Delegate heavy compute to a separate worker
    resp = env.COMPUTE.fetch("http://internal/process",
                             method="POST",
                             body=json.dumps({"data": payload}))
    result = resp.json()
    return Response.json(result)

| Scenario | Approach |
|---|---|
| Pure Python packages (jinja2, pyyaml, langchain-core) | Bundle in site-packages.zip |
| Small C extensions (markupsafe) | Pure Python fallback or WASM side module |
| Rust extensions (pydantic_core) | Compiled to WASM variant (python-pydantic-core.wasm) |
| C extensions (numpy) | Compiled to WASM variant (python-numpy.wasm) |
| Heavy compute (ML inference) | Use Workers AI (env.AI.run()) |

| | PyMode | Pyodide (CF Python Workers) |
|---|---|---|
| CPython | Upstream 3.13 | Patched 3.12 fork |
| Compiler | zig cc | Emscripten |
| Target | wasm32-wasi | wasm32-emscripten |
| Binary size | 5.7MB (1.8MB gz) | 20MB (6.4MB gz) |
| Cold start | ~28ms (5ms with Wizer) | ~50ms (with snapshot) |
| Async I/O | Asyncify (transparent) | VFS trampoline |
| CF Bindings | WASM host imports | JS interop bridge |
| Status | Active | Limited beta |