Anup Shinde

godom's wire protocol: binary patches over WebSocket

May 2, 2026

godom uses binary Protocol Buffers, not JSON, with a deliberate Browser→Go split between input value sync and explicit method calls.

Figure: a wire-protocol diagram. A browser pane on the left with an input field and Save button, a WebSocket channel in the center carrying cyan input frames and a pink method frame to a Go process pane on the right, with a green patch frame returning.
Two layers up. One patch back. All binary protobuf.

TL;DR

  • The wire is binary Protocol Buffers, not JSON. godom started on JSON; the switch was for size and schema-as-contract.
  • Browser→Go is split into two layers: BROWSER_INPUT for input value sync, BROWSER_METHOD for explicit method calls. Both go through the render cycle on the server; the split is about scope, not render-vs-no-render.
  • Server→Browser is three kinds: SERVER_INIT (the full tree, on first connect), SERVER_PATCH (the minimal diff, every render after), and SERVER_JSCALL (for ExecJS).
  • Transport is plain WebSocket. SSE+POST and WebTransport were on the table. Nothing was compelling enough to swap a primitive that already worked.

In how godom's virtual DOM works, I described how a state change in Go becomes a list of patches: re-resolve the template, diff against the previous tree, get a minimal set of operations the bridge needs to apply. This post is about how those patches actually travel.

The wire is the part of godom most users will never touch directly. They write Go structs and HTML directives; the framework takes care of the rest. But the choices on the wire shape almost everything else in the system: how cheap a keystroke is, what the bridge has to do, whether two languages can talk to the same godom server.

Why binary Protocol Buffers

godom started with JSON. JSON works, every browser parses it natively, every text editor reads it, and Go has decent encoders. Switching away from it wasn’t free.

Three things won out:

  1. Smaller wire size. Patches are tiny by design (one node ID, one new value, sometimes a small diff blob), but they’re frequent. JSON’s keys and quotes add a constant overhead per message that protobuf collapses away.
  2. Schema as contract. internal/proto/protocol.proto is the formal definition of the protocol. No guessing message shapes from source code. Adding fields is backward-compatible by construction.
  3. Multi-language clients become trivial. Anyone can run protoc against the .proto and generate a typed client in Python, Rust, Java, whatever. The Go server doesn’t care who is on the other end as long as the messages parse.

This is invisible to godom users. The browser uses protobuf.js (a small reflection-based library, embedded into the binary). The Go side uses the standard generated types. No user touches a .proto.

For completeness on what was ruled out: FlatBuffers’ zero-copy benefit only matters when you read a few fields out of large messages, which isn’t godom’s pattern. Custom binary protocols would be re-inventing protobuf without the codegen.

Server → Browser: three kinds of message

Everything the server sends is a ServerMessage. The kind field decides what’s in it.

SERVER_INIT is sent the first time a tab connects, and on reconnect. It carries the full resolved VDOM tree as a JSON-encoded blob, plus the target island name. The bridge walks the tree, builds the real DOM, populates nodeMap. After this, the bridge has everything it needs to address nodes by ID.

SERVER_PATCH is sent every render after the first. It carries a list of DomPatch entries and a target. Each patch references a node by its stable ID and carries a typed payload (text update, facts diff, append, reorder, plugin data, lazy subtree). The bridge looks up nodeMap[node_id], applies the operation, moves on.

SERVER_JSCALL is sent when Go calls ExecJS(). It carries a call_id (so the response can be matched up) and a JavaScript expression. The bridge runs the expression, JSON-stringifies the result, and sends it back.

That is the entire server-to-browser surface today. Streaming and broadcast kinds exist as future placeholders in the proto file, but the live system has these three.
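As a rough picture of "one message type, a kind field, and a rule for which fields go with which kind", here is a sketch in plain Go. The struct layout, field names, and the Validate method are all assumptions made for illustration; the real definitions are generated from internal/proto/protocol.proto and may look quite different.

```go
package main

import (
	"errors"
	"fmt"
)

// Kind mirrors the three live server-to-browser kinds described above.
type Kind int

const (
	ServerInit Kind = iota
	ServerPatch
	ServerJSCall
)

// DomPatch and ServerMessage are hypothetical Go shapes, not generated types.
type DomPatch struct {
	NodeID  uint32
	Op      string // e.g. "text", "facts", "append"
	Payload []byte
}

type ServerMessage struct {
	Kind    Kind
	Target  string     // island name (INIT and PATCH)
	Tree    []byte     // JSON-encoded VDOM tree (INIT only)
	Patches []DomPatch // PATCH only; may be empty
	CallID  uint64     // JSCALL only
	Expr    string     // JSCALL only
}

// Validate checks that only the fields meaningful for m.Kind are set.
func (m ServerMessage) Validate() error {
	switch m.Kind {
	case ServerInit:
		if len(m.Tree) == 0 || len(m.Patches) > 0 || m.Expr != "" {
			return errors.New("init: needs a tree and nothing else")
		}
	case ServerPatch:
		if len(m.Tree) > 0 || m.Expr != "" {
			return errors.New("patch: must not carry a tree or an expression")
		}
	case ServerJSCall:
		if m.Expr == "" || len(m.Tree) > 0 || len(m.Patches) > 0 {
			return errors.New("jscall: needs an expression and nothing else")
		}
	default:
		return fmt.Errorf("unknown kind %d", m.Kind)
	}
	return nil
}

func main() {
	patch := ServerMessage{
		Kind:    ServerPatch,
		Target:  "app",
		Patches: []DomPatch{{NodeID: 7, Op: "text", Payload: []byte("Hello")}},
	}
	bad := ServerMessage{Kind: ServerJSCall, Tree: []byte(`{}`)}

	fmt.Println(patch.Validate()) // <nil>
	fmt.Println(bad.Validate())   // a non-nil error
}
```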

Browser → Go: the two-layer split

The browser-to-server side has more variety because it has more reasons to talk to Go. Four kinds today; the two that matter are split deliberately.

Layer 1: BROWSER_INPUT. Sent automatically on every input event. It carries the source node’s ID and the current value.

For elements with g-bind, g-value, or g-checked, the server sets the bound struct field, then runs a render cycle. Because only that one field changed, the diff is tightly scoped: the input itself doesn’t need a patch (the user already sees their own keystroke), and only directives elsewhere that read the same field (a g-text somewhere, a derived g-class, a sibling island sharing a pointer) produce wire output. With no other reader, the wire payload back is empty.

For inputs without a binding (a plain <input> you just want mirrored across tabs), the server takes a cheaper path: it updates the VDOM node’s Props directly and broadcasts a single targeted facts patch. No tree resolve, no diff.

Layer 2: BROWSER_METHOD. Sent on click, keydown, drop, or any other g-* event handler. It carries the source node’s ID, the method name (Save, Toggle, etc.), and JSON-encoded arguments. The server calls the Go method via reflection, and after the method returns, runs a render cycle: re-resolve the template, diff against the previous tree, send a SERVER_PATCH.

Figure: typing "Hello" into <input g-bind="Name"> sends five input frames ("H", "He", "Hel", "Hell", "Hello") with no return traffic, and the Name field in the Go process tracks each one. Clicking <button g-click="Save"> sends one method call; the Go process runs Save() → ResolveTree → Diff, Saved flips from false to true, and one patch (1 op) comes back. Each frame is a single protobuf-encoded BrowserMessage or ServerMessage.
Five keystrokes ship five small BROWSER_INPUT frames; each runs a tightly scoped render cycle that, with no other reader of the field, sends nothing back. One click ships one BROWSER_METHOD; the server diffs and ships back one SERVER_PATCH.

The split is the architectural call this post is named after. The difference between the two layers is scope, not render-vs-no-render. Without the split, every keystroke into a g-bind input would be a method dispatch: arbitrary user code on every character, with the framework unable to know what state changed. With the split, the framework itself sets exactly one field, so the render cycle that follows has a tightly bounded diff scope.

g-bind doesn’t debounce. Type “hello world” and you get eleven BROWSER_INPUT messages in quick succession. Each one is a few bytes (a node ID and a value). The server runs eleven render cycles, but the diffs are scoped to a one-field change: the wire payload back is usually empty (nothing else depends on the field) or one tiny patch per message (a g-text somewhere reflecting the same value). The Go state stays in lockstep with what the user sees in the box, with the differ doing tiny amounts of work.

Method calls are the “I did something, please consider the world changed” signal. They go through the same render cycle, but the diff can produce more patches because arbitrary code can have mutated arbitrary fields.
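The difference in scope between the two layers can be sketched with the reflect package. This is not godom's dispatch code: applyInput and callMethod are hypothetical names, the App struct borrows the Name and Saved fields from the diagram, Rename is an invented method that exists only to show JSON arguments, and the render cycle that follows each call is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
)

type App struct {
	Name  string
	Saved bool
}

func (a *App) Save()              { a.Saved = true }
func (a *App) Rename(name string) { a.Name = name; a.Saved = false }

// applyInput is layer 1: the framework itself sets exactly one string field.
func applyInput(state any, field, value string) error {
	v := reflect.ValueOf(state)
	if v.Kind() != reflect.Pointer || v.Elem().Kind() != reflect.Struct {
		return errors.New("state must be a pointer to a struct")
	}
	f := v.Elem().FieldByName(field)
	if !f.IsValid() || !f.CanSet() || f.Kind() != reflect.String {
		return fmt.Errorf("no settable string field %q", field)
	}
	f.SetString(value)
	return nil
}

// callMethod is layer 2: user code invoked by name with a JSON array of
// arguments. Variadic methods and return values are ignored in this sketch.
func callMethod(state any, name string, rawArgs []byte) error {
	m := reflect.ValueOf(state).MethodByName(name)
	if !m.IsValid() {
		return fmt.Errorf("no method %q", name)
	}
	var args []json.RawMessage
	if len(rawArgs) > 0 {
		if err := json.Unmarshal(rawArgs, &args); err != nil {
			return err
		}
	}
	if len(args) != m.Type().NumIn() {
		return fmt.Errorf("%s: want %d args, got %d", name, m.Type().NumIn(), len(args))
	}
	in := make([]reflect.Value, len(args))
	for i, raw := range args {
		p := reflect.New(m.Type().In(i)) // pointer to a zero value of the parameter type
		if err := json.Unmarshal(raw, p.Interface()); err != nil {
			return err
		}
		in[i] = p.Elem()
	}
	m.Call(in)
	return nil
}

func main() {
	app := &App{}
	for _, v := range []string{"H", "He", "Hel", "Hell", "Hello"} {
		if err := applyInput(app, "Name", v); err != nil {
			panic(err)
		}
	}
	if err := callMethod(app, "Save", nil); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *app) // prints {Name:Hello Saved:true}
}
```

After applyInput, the framework knows that one named field changed. After callMethod, it knows only that some user code ran, which is why the diff that follows a method call can legitimately touch more of the tree.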

The other two browser-side kinds are plumbing:

  • BROWSER_JSRESULT: the response to a SERVER_JSCALL, matched by call_id.
  • BROWSER_INIT_REQUEST: pull-based init for child islands. When a g-island element appears in the DOM, the bridge asks the server for its tree. The server replies with a targeted SERVER_INIT.

Neither has the same load characteristics as BROWSER_INPUT or BROWSER_METHOD, so they don’t change the layer story.
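Matching a BROWSER_JSRESULT to its SERVER_JSCALL by call_id is a standard request/response correlation pattern. The sketch below shows one common way to do it in Go, with a mutex-guarded map of channels; the pendingCalls type and its methods are hypothetical, and I am not claiming this is how godom stores pending calls.

```go
package main

import (
	"fmt"
	"sync"
)

// pendingCalls pairs each outgoing call_id with the channel its result
// will arrive on. Hypothetical type, for illustration only.
type pendingCalls struct {
	mu     sync.Mutex
	nextID uint64
	wait   map[uint64]chan string
}

func newPendingCalls() *pendingCalls {
	return &pendingCalls{wait: make(map[uint64]chan string)}
}

// register allocates a fresh call_id and the channel to wait on.
func (p *pendingCalls) register() (uint64, <-chan string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.nextID++
	ch := make(chan string, 1) // buffered so resolve never blocks
	p.wait[p.nextID] = ch
	return p.nextID, ch
}

// resolve delivers a result to the waiting caller. It reports false for
// an unknown or already-resolved call_id.
func (p *pendingCalls) resolve(id uint64, result string) bool {
	p.mu.Lock()
	ch, ok := p.wait[id]
	delete(p.wait, id)
	p.mu.Unlock()
	if ok {
		ch <- result
	}
	return ok
}

func main() {
	calls := newPendingCalls()

	// A real server would send the id and a JS expression as SERVER_JSCALL here.
	id, ch := calls.register()

	// Simulates the BROWSER_JSRESULT arriving on the socket's read loop.
	go calls.resolve(id, `"My page title"`)

	fmt.Println(<-ch) // prints "My page title" (with the JSON quotes)
}
```

A production version would also want a timeout on the wait and cleanup when the tab disconnects, both omitted here.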

What a render actually costs on the wire

It’s tempting to read “method call triggers a render” as “method call ships the entire tree.” It doesn’t.

After the method runs, the framework re-resolves the template against the now-mutated state. That produces a fresh VDOM tree. The differ walks the old and new trees, gets a list of patches, and ships those patches as a single SERVER_PATCH. If only one node changed, the wire payload is one DomPatch. If only one field of one element changed (a class name flip, say), the payload is one DomPatch carrying a short JSON FactsDiff.

So Layer 2 is not “expensive on the wire.” It’s “render-cycle work on the server, minimal payload on the wire.” The wire stays cheap because the diff already filtered the work down.
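A toy differ shows why the payload stays small. This is a deliberately reduced sketch, not godom's differ: Node, Patch, and diff are hypothetical, the trees are assumed to have the same shape, and structural operations (append, reorder, lazy subtrees) are left out entirely.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node is a tiny stand-in for a VDOM node with a stable ID.
type Node struct {
	ID       int
	Text     string
	Class    string
	Children []*Node
}

// Patch addresses a node by ID and carries one small change.
type Patch struct {
	NodeID int
	Op     string // "text" or "facts"
	Value  string
}

// diff walks two same-shaped trees and emits a patch only where they differ.
func diff(prev, next *Node, out []Patch) []Patch {
	if prev.Text != next.Text {
		out = append(out, Patch{NodeID: next.ID, Op: "text", Value: next.Text})
	}
	if prev.Class != next.Class {
		// Marshaling a map[string]string cannot fail, so the error is ignored.
		facts, _ := json.Marshal(map[string]string{"class": next.Class})
		out = append(out, Patch{NodeID: next.ID, Op: "facts", Value: string(facts)})
	}
	for i := 0; i < len(prev.Children) && i < len(next.Children); i++ {
		out = diff(prev.Children[i], next.Children[i], out)
	}
	return out
}

// page resolves a small tree from state; only node 4 depends on saved.
func page(saved bool) *Node {
	class := "status"
	if saved {
		class = "status ok"
	}
	return &Node{ID: 1, Children: []*Node{
		{ID: 2, Text: "Name"},
		{ID: 3, Text: "Save"},
		{ID: 4, Text: "Status", Class: class},
	}}
}

func main() {
	before := page(false)
	after := page(true) // what the tree looks like once Save() has run

	patches := diff(before, after, nil)
	fmt.Printf("%d patch(es): %+v\n", len(patches), patches)
}
```

The whole tree is re-resolved and walked on the server, but only the single class flip on node 4 reaches the wire.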

For the cases where even running the diff against a large tree is too much (a dashboard with thousands of nodes, a tight ticker on one field, a 60fps mouse handler), the island layer above the VDOM offers MarkRefresh(fields...) to scope the rebuild before the diff runs. Same wire format, just less server-side work upstream.

Why WebSocket

The transport is plain WebSocket. One persistent connection per tab, frames in both directions, a token in the upgrade URL for auth.

Two alternatives I looked at and didn’t ship:

SSE plus POST. Server-Sent Events for the server-to-browser direction, plain POST for browser-to-server. Zero-dependency and works through stricter proxies. The deal-breaker is the no-debounce design above. Eleven keystrokes becoming eleven HTTP POSTs amplifies per-request overhead in a way that frames on a persistent socket don’t. The bridge would have to debounce or batch, which means the bridge would be making timing decisions, which is the opposite of how the bridge is supposed to work.

Figure: the same typing of "Hello" on two transports. WebSocket: one persistent connection, five frames ("H", "He", "Hel", "Hell", "Hello"), only payload bytes on the wire. SSE + POST: five separate HTTP POSTs, a fresh one per keystroke, each carrying request headers in addition to the payload.
Same 5 keystrokes. WebSocket: 5 small frames on one socket. SSE+POST: 5 HTTP requests, each with header overhead. Per-keystroke amplification is the cost the no-debounce design can't absorb.
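The per-keystroke overhead can be estimated without opening a socket. In the sketch below, wsClientFrameSize follows the frame layout in RFC 6455 (2-byte base header, optional extended length, 4-byte masking key for client-to-server frames), and httpPostSize serializes a bare POST the way Go's HTTP client would write it. The function names and the URL are made up for the example, and a browser would typically send more headers than Go's minimal set, so treat the HTTP number as a conservative figure rather than a measurement of godom.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"strings"
)

// wsClientFrameSize returns the on-wire size of one client-to-server
// WebSocket frame: base header, extended length if needed, masking key, payload.
func wsClientFrameSize(payload int) int {
	header := 2
	switch {
	case payload > 65535:
		header += 8 // 64-bit extended payload length
	case payload > 125:
		header += 2 // 16-bit extended payload length
	}
	return header + 4 + payload // 4-byte masking key on client frames
}

// httpPostSize returns the serialized size of a minimal POST request
// (request line, Host, User-Agent, Content-Length, body).
func httpPostSize(url, payload string) (int, error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(payload))
	if err != nil {
		return 0, err
	}
	var buf bytes.Buffer
	if err := req.Write(&buf); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	wsTotal, httpTotal := 0, 0
	for _, v := range []string{"H", "He", "Hel", "Hell", "Hello"} {
		wsTotal += wsClientFrameSize(len(v))

		// The URL is a placeholder; nothing is actually sent.
		n, err := httpPostSize("http://localhost:8080/input", v)
		if err != nil {
			panic(err)
		}
		httpTotal += n
	}
	fmt.Printf("websocket: %d bytes, http post: %d bytes\n", wsTotal, httpTotal)
}
```

The sketch counts only request bytes. It ignores the HTTP response each POST would also require, which widens the gap further.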

WebTransport. Newer, multiplexed, designed for the kind of high-frequency, low-latency traffic godom generates. The browser API isn’t stable yet, so adopting it would mean tracking spec churn for marginal gain. The independent streams and unreliable datagrams it would unlock aren’t bottlenecks I’ve hit.

So godom is on plain WebSocket. Bidirectional, low overhead, well understood. I didn’t benchmark. The heuristic was simple: WebSockets are fine for real-time games at framerate, so they’re fine for godom’s UI patches. So far they have served the purpose well.

Closing thought

The wire is doing very little, on purpose. The protocol has two short message types, eight or so kinds across them, and a rule for which fields are valid in which kind. There is no streaming protocol on top, no per-app message shapes, no negotiation. The bridge speaks it without making decisions, and the server speaks it without thinking about transport.

The next post in this set is about how islands compose: the rename from g-component to g-island, what changed in the design after it, and how slot-based composition works once you stop trying to be a component framework.