Three event-sourcing bugs, three pillars of one contract
Three bugs over six weeks broke the same contract in three different ways: an applier that called Math.random(), a sequence allocator that did read-then-write, and a relay that trusted senderId from the client body. The unified platform from “One API, N Games: Client-Authoritative Event Sourcing for a Unified Game Client” is what lets a new game stay client-only. Clients run the engine, the server stores opaque events, and state is the replay from seq 0. That promise has a price, and these three bugs are that price turning from theoretical into real.
- March 21, Augmented Chess: getRandomAugmentationChoices ran inside applyMove, so each client rolled different choices and validation desynced. Lesson: appliers must be deterministic.
- April 12, submit-event backend: getLatestSequenceNumber then create raced under load, two writers got seq 8, and Mongo E11000 surfaced as a 500. Lesson: allocation and write are one operation, or they’re a race.
- April 24, game-validator: the relay believed whatever senderId the client sent, so a guest could trigger host-only resolveRound. Lesson: identity is the one thing the server cannot delegate.
Bug 1: randomness in an applier desyncs replay
Round 1 of a chess augmentation worked because choices were generated inside createInitialState() on the host and shipped as the init event — both clients replayed identical bytes. Round 2 broke because choices were generated inside applyMove itself:
```ts
// Inside applyMove, before the fix: every client evaluates this independently.
if (turnNumber === 6) {
  state.triggeredAugmentationChoices = getRandomAugmentationChoices(3);
}
```
Player A saw [X1, X2, X3], Player B saw [Y1, Y2, Y3]. The moment A picked X1, B’s validateSelectAugmentation rejected it (augmentationId not in choices) and the clients silently desynced.
The fix hoists the non-determinism into the action. The submitting client rolls once, the action carries the result, every client applies the same bytes:
```ts
interface MoveAction {
  type: "move";
  from: Square;
  to: Square;
  // Rolled once by the submitting client; appliers copy it, never re-roll it.
  triggeredAugmentationChoices?: AugmentationChoice[];
}
```
The fix touched three files: packages/augmented-chess-engine/src/types/actions.ts, move.applier.ts, and apps/board-game-client/src/games/chess/game-state-store.ts. The general rule: appliers are pure functions of (state, action) -> state. Anything that isn’t a function of those inputs (Math.random(), Date.now(), network reads) has to be hoisted into the action and serialized.
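A minimal sketch of that rule, assuming simplified stand-in types. Square, AugmentationChoice, ChessState, ALL_CHOICES, AUGMENTATION_TURN, rollAugmentationChoices, and buildMoveAction are illustrative names, not the real engine code. Randomness lives only in the action builder that runs on the submitting client; applyMove reads nothing but its two arguments, so every client replaying the serialized action reaches the same state.

```typescript
// Hypothetical, simplified types; the real ones live in the engine package.
type Square = string;
type AugmentationChoice = { id: string };

interface MoveAction {
  type: "move";
  from: Square;
  to: Square;
  triggeredAugmentationChoices?: AugmentationChoice[];
}

interface ChessState {
  turnNumber: number;
  lastMove?: [Square, Square];
  triggeredAugmentationChoices?: AugmentationChoice[];
}

const AUGMENTATION_TURN = 6; // hypothetical: mirrors the turnNumber === 6 check above
const ALL_CHOICES: AugmentationChoice[] = [
  { id: "a" }, { id: "b" }, { id: "c" }, { id: "d" }, { id: "e" },
];

// Runs ONLY on the submitting client, before the action is serialized.
function rollAugmentationChoices(count: number, rng: () => number = Math.random): AugmentationChoice[] {
  const pool = [...ALL_CHOICES];
  const picked: AugmentationChoice[] = [];
  while (picked.length < count && pool.length > 0) {
    const i = Math.floor(rng() * pool.length); // rng() is in [0, 1)
    picked.push(pool.splice(i, 1)[0]);
  }
  return picked;
}

// The trigger is evaluated once, by the submitter, and the result rides in the action.
function buildMoveAction(state: ChessState, from: Square, to: Square): MoveAction {
  const action: MoveAction = { type: "move", from, to };
  if (state.turnNumber === AUGMENTATION_TURN) {
    action.triggeredAugmentationChoices = rollAugmentationChoices(3);
  }
  return action;
}

// Pure applier: reads only (state, action); no Math.random(), no Date.now().
function applyMove(state: ChessState, action: MoveAction): ChessState {
  const next: ChessState = {
    ...state,
    turnNumber: state.turnNumber + 1,
    lastMove: [action.from, action.to],
  };
  if (action.triggeredAugmentationChoices) {
    next.triggeredAugmentationChoices = [...action.triggeredAugmentationChoices];
  }
  return next;
}
```

Because the choices travel inside the action, a client replaying from seq 0 gets the same three choices the submitter saw without ever touching a random source.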
Bug 2: non-atomic sequence allocation races under load
functions/game/room/submit-event/logic.ts allocated sequence numbers with a read, then a write:
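```ts
// Before: two round trips, with a window between the read and the write.
const seqNum = await eventRepository.getLatestSequenceNumber(roomId);
const nextSeq = seqNum + 1; // a concurrent submit can read the same seqNum here
await eventRepository.create(roomId, eventType, payload, nextSeq);
```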
Two concurrent submits both read seqNum = 7 and both wrote 8. The unique compound index on (gameId, sequenceNumber) in game_room_events caught it (the second writer got Mongo error code 11000), but nothing handled that error, so it surfaced to the client as a 500. Without the index, replay would have seen two events at seq 8 and engine state would have diverged.
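To see the window concretely, here is a small in-memory reproduction. InMemoryEventRepository, DuplicateKeyError, and the 1 ms setTimeout standing in for a network round trip are all hypothetical; the unique index is emulated with a Set so the losing writer fails the way Mongo’s 11000 did.

```typescript
// Stand-in for Mongo's duplicate key error (code 11000). Hypothetical.
class DuplicateKeyError extends Error {
  readonly code = 11000;
}

const roundTrip = () => new Promise<void>((resolve) => setTimeout(resolve, 1));

// In-memory stand-in for game_room_events with a unique (roomId, sequenceNumber) index.
class InMemoryEventRepository {
  private events: { roomId: string; sequenceNumber: number }[] = [];
  private uniqueIndex = new Set<string>();

  async getLatestSequenceNumber(roomId: string): Promise<number> {
    await roundTrip();
    const seqs = this.events.filter((e) => e.roomId === roomId).map((e) => e.sequenceNumber);
    return seqs.length > 0 ? Math.max(...seqs) : 0;
  }

  async create(roomId: string, sequenceNumber: number): Promise<void> {
    await roundTrip();
    const key = `${roomId}:${sequenceNumber}`;
    if (this.uniqueIndex.has(key)) throw new DuplicateKeyError(`E11000 duplicate key ${key}`);
    this.uniqueIndex.add(key);
    this.events.push({ roomId, sequenceNumber });
  }
}

// The buggy read-then-write pattern: another submit can run between the two awaits.
async function submitReadThenWrite(repo: InMemoryEventRepository, roomId: string): Promise<number> {
  const seqNum = await repo.getLatestSequenceNumber(roomId);
  const nextSeq = seqNum + 1;
  await repo.create(roomId, nextSeq);
  return nextSeq;
}
```

Running two submitReadThenWrite calls concurrently against a room that already holds seq 1 to 7 leaves one call with seq 8 and the other rejected with code 11000, the same pair of outcomes as the production bug.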
createWithAtomicSequence() in game-event.repository.ts collapses the read-then-increment into a single atomic round trip on a per-room counter document:
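```ts
// Increment the per-room counter (creating it on first use) and read back the new value atomically.
const counter = await db.collection("game-sequence-counters").findOneAndUpdate(
  { _id: `room-${roomId}` },
  { $inc: { seq: 1 } },
  { upsert: true, returnDocument: "after" },
);
// Should never be null with upsert + returnDocument "after", but the driver types allow it.
if (!counter) throw new Error(`sequence counter for room ${roomId} was not returned`);
await db.collection("game_room_events").insertOne({ ...event, sequenceNumber: counter.seq });
```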
findOneAndUpdate with $inc is atomic per document. Concurrent callers serialize on the counter. The unique index stays as the safety net, and a defensive catch in handleRequest returns 409 Conflict on any residual 11000. If replay correctness depends on a monotonic sequence, allocation and write must be one operation.
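The same idea can be sketched without a database. In this hypothetical InMemoryAtomicRepository, the counter bump is one synchronous step, which plays the role of the atomic $inc: no caller can read a value another caller is about to claim. handleSubmit is an assumed handler shape (the real handleRequest signature isn’t shown here) that maps a residual 11000 to 409 and rethrows anything else.

```typescript
// Stand-in for Mongo's duplicate key error (code 11000). Hypothetical.
class DuplicateKeyError extends Error {
  readonly code = 11000;
}

// In-memory analogue of findOneAndUpdate + $inc + upsert + returnDocument "after".
class InMemoryAtomicRepository {
  private counters = new Map<string, number>();
  private uniqueIndex = new Set<string>();

  // Read, increment, and return happen with no await in between.
  private incrementCounter(id: string): number {
    const next = (this.counters.get(id) ?? 0) + 1; // upsert: a missing counter starts at 0
    this.counters.set(id, next);
    return next;
  }

  async createWithAtomicSequence(roomId: string): Promise<number> {
    const seq = this.incrementCounter(`room-${roomId}`);
    await new Promise<void>((resolve) => setTimeout(resolve, 1)); // simulated insertOne round trip
    const key = `${roomId}:${seq}`;
    if (this.uniqueIndex.has(key)) throw new DuplicateKeyError(`E11000 duplicate key ${key}`);
    this.uniqueIndex.add(key); // the unique index stays as the backstop
    return seq;
  }
}

type HttpResult = { status: number; body: unknown };

// Hypothetical handler: a residual duplicate key becomes 409 Conflict instead of a 500.
async function handleSubmit(repo: InMemoryAtomicRepository, roomId: string): Promise<HttpResult> {
  try {
    const seq = await repo.createWithAtomicSequence(roomId);
    return { status: 200, body: { sequenceNumber: seq } };
  } catch (err) {
    const code = typeof err === "object" && err !== null ? (err as { code?: unknown }).code : undefined;
    if (code === 11000) {
      return { status: 409, body: { error: "sequence conflict, retry" } };
    }
    throw err;
  }
}
```

In this sketch, if the insert fails after the counter bump, that number is simply skipped: sequence numbers stay unique and increasing but are not guaranteed to be gap-free.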
Bug 3: identity from a client field is not identity
The relay used senderId from the client’s submit-event body to mark who an event was from:
```json
{ "type": "resolve_round", "senderId": "<host-user-id>", "payload": { ... } }
```
Host-only validators (resolveRound on fruit-shop, revealCard on horror-race) checked action.resolverId === state.hostPlayerId — but resolverId came from senderId, which came from the client. A guest could trigger host-only actions by lying about who they were.
The fix wasn’t a one-line patch — the relay had to own identity end-to-end (PRs #398, #400, #402):
- Engine identity metadata. hostPlayerId added to InitGameOptions and persisted state for fruit-shop-engine, horror-race-engine, stone-flicking-core-engine, and augmented-stone-flicking-engine. Host-only validators check the field on state, not on the action.
- Validation service. New apps/game-validator Bun service with POST /validate. Identity is injected from the verified session, not read from the body: action.resolverId = senderId (verified from JWT), action.revealerId = senderId.
- Relay integration. go/cmd/game-api/handlers/room.go calls the validator over HTTP, persists currentState on the room, and only writes the event if validation passes.
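A minimal sketch of the validator’s entry point, assuming hypothetical types (VerifiedSession, ClientAction, GameState) and helper names (injectIdentity, validateResolveRound); the real apps/game-validator code isn’t reproduced in this post. Whatever identity fields the body carries are overwritten from the verified session before any validator runs, and the host check compares against hostPlayerId on state.

```typescript
// Hypothetical shapes; the real validator request and state types aren't shown in the post.
interface VerifiedSession {
  userId: string; // taken from the verified JWT, never from the request body
}

type ClientAction =
  | { type: "resolve_round"; senderId?: string; resolverId?: string; payload?: unknown }
  | { type: "reveal_card"; senderId?: string; revealerId?: string; payload?: unknown };

interface GameState {
  hostPlayerId: string; // persisted at init time, not supplied per action
}

// Overwrite every identity field from the session, discarding whatever the client sent.
function injectIdentity(action: ClientAction, session: VerifiedSession): ClientAction {
  const senderId = session.userId;
  if (action.type === "resolve_round") {
    return { ...action, senderId, resolverId: senderId };
  }
  return { ...action, senderId, revealerId: senderId };
}

// Host-only rule: compares against persisted state, so the action cannot vouch for itself.
function validateResolveRound(state: GameState, action: ClientAction): boolean {
  return action.type === "resolve_round" && action.resolverId === state.hostPlayerId;
}
```

With this ordering, a guest who sends senderId "<host-user-id>" still reaches validateResolveRound with resolverId set to the guest’s own id, and the check fails.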
In a client-authoritative system the server can let go of almost everything — engine, rules, most validation. The one thing it cannot let go of is who you are. Anything the client can write, the client can lie about.
The three pillars in one checklist
| Pillar | Concrete rule | Caught by |
|---|---|---|
| Deterministic appliers | No random, no time, no I/O — hoist into the action | Replay divergence in tests |
| Atomic sequence allocation | One operation; unique index as backstop | code 11000 → 409 |
| Server-owned identity | Rewrite identity from authenticated session on entry | Host-only validators on persisted state |
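The “Replay divergence in tests” row can be turned into a reusable check. This is a hypothetical helper, assertDeterministicReplay, not something from the codebase: it replays the same event log twice with Math.random and Date.now patched to throw, then compares the serialized results. It only traps those two globals (new Date(), crypto, or I/O would slip past it), so treat it as a tripwire rather than a proof of purity.

```typescript
// Hypothetical test helper: an applier is (state, action) -> state.
type Applier<S, A> = (state: S, action: A) => S;

function replay<S, A>(initial: S, events: A[], apply: Applier<S, A>): S {
  return events.reduce((state, action) => apply(state, action), initial);
}

// Replays twice with Math.random/Date.now trapped; throws on either trap or on divergence.
function assertDeterministicReplay<S, A>(initial: S, events: A[], apply: Applier<S, A>): S {
  const originalRandom = Math.random;
  const originalNow = Date.now;
  Math.random = () => {
    throw new Error("Math.random() called inside an applier");
  };
  Date.now = () => {
    throw new Error("Date.now() called inside an applier");
  };
  try {
    // A second pass also exposes appliers that mutate their input state in a way that changes the result.
    const first = JSON.stringify(replay(initial, events, apply));
    const second = JSON.stringify(replay(initial, events, apply));
    if (first !== second) throw new Error("replay diverged between runs");
    return JSON.parse(first) as S;
  } finally {
    // Always restore the globals, even when the check fails.
    Math.random = originalRandom;
    Date.now = originalNow;
  }
}
```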
Future-me checklist for a new game: could randomness or time leak into an applier? Is sequence allocation atomic on the server? Who decides who I am? If any answer is shaky, the bug already exists; it just hasn’t been triggered yet.
AI workflow note
Claude was the pattern-matcher across the three task-logs. I asked it to read 20260321-chess-augmentation-event-sync-fix.md, 20260412-backend-atomic-event-sequence.md, and 20260424-game-event-validation/progress.md side by side and surface what they shared — the “three pillars” framing came from that pass, not from any individual log. After bug 1, I ran the code-reviewer agent across the other engine appliers with one question: “where else might Math.random() or Date.now() be hiding inside an applier?” It surfaced two more sites before they shipped. The discipline that paid off most was writing the fix as a regression test first — the duplicate-key race in bug 2 was easy to assert against once I had the test, hard to reason about without it.
