Networking & Retry

vuer-rtc tracks message acknowledgement status to enable retry on network failures.

How It Works

Every message in the journal has an ack field:

  • ack: false - Message hasn't been acknowledged by the server
  • ack: true - Server has confirmed receipt

When a message is committed locally, it starts with ack: false. When the server acknowledges it (via onServerAck), it becomes ack: true.
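Sketched in isolation, the lifecycle looks like this. The JournalEntry shape mirrors the journal examples in this document, but markAcked is a hypothetical stand-in for what onServerAck does internally, not the library's API:

```typescript
// Illustrative ack lifecycle; `markAcked` is a hypothetical helper,
// in vuer-rtc the transition happens inside onServerAck.
interface JournalEntry {
  msg: { id: string };
  ack: boolean;
}

// A freshly committed message starts unacknowledged
let journal: JournalEntry[] = [{ msg: { id: 'm1' }, ack: false }];

// When the server confirms receipt, flip ack to true for that id
function markAcked(entries: JournalEntry[], msgId: string): JournalEntry[] {
  return entries.map(e => (e.msg.id === msgId ? { ...e, ack: true } : e));
}

journal = markAcked(journal, 'm1');
// journal[0].ack is now true
```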

Retry Helpers

import {
  getUnackedMessages,
  hasPendingMessages,
  getPendingCount,
  onServerAck,
} from '@vuer-ai/vuer-rtc/client';

// Get all messages that need to be (re)sent
const messages = getUnackedMessages(state);

// Check if there's anything pending
if (hasPendingMessages(state)) {
  console.log(`${getPendingCount(state)} messages pending`);
}

Undo/Redo and Ack Reset

When you undo or redo an operation, two things happen:

  1. A new undo/redo message is created with ack: false
  2. The target entry's ack is reset to false

This ensures that:

  • The undo/redo message itself needs to be sent
  • The target entry's deletedAt state change is tracked for sync

// Example: After undo
journal: [
  { msg: originalMsg, ack: false, deletedAt: 123456 },  // ack reset!
  { msg: undoMsg, ack: false },                          // new message
]

Implementing Retry

class RTCClient {
  private retryTimer: ReturnType<typeof setTimeout> | null = null;
  private retryDelay = 1000; // Start with 1 second
  private maxRetryDelay = 30000; // Max 30 seconds

  // Call when connection is restored
  async retryPendingMessages() {
    const messages = getUnackedMessages(this.store.getState());

    for (const msg of messages) {
      try {
        await this.sendToServer(msg);
      } catch (err) {
        // Schedule exponential backoff retry
        this.scheduleRetry();
        return;
      }
    }

    // All sent successfully, reset delay
    this.retryDelay = 1000;
  }

  private scheduleRetry() {
    if (this.retryTimer) return;

    this.retryTimer = setTimeout(() => {
      this.retryTimer = null;
      this.retryPendingMessages();
    }, this.retryDelay);

    // Exponential backoff
    this.retryDelay = Math.min(this.retryDelay * 2, this.maxRetryDelay);
  }

  // Call when receiving ack from server
  onAck(msgId: string) {
    this.store.dispatch(state => onServerAck(state, msgId));
  }
}

Server Implementation

The server should:

  1. Acknowledge each message after processing:

    ws.on('message', (data) => {
      const msg = JSON.parse(data);
    
      // Process the message...
    
      // Send ack back to client
      ws.send(JSON.stringify({ mtype: 'ack', msgId: msg.id }));
    });
  2. Handle duplicate messages gracefully (idempotent):

    const processedIds = new Set<string>();
    
    function processMessage(msg: CRDTMessage) {
      if (processedIds.has(msg.id)) {
        // Already processed; skip reprocessing, but still send an
        // ack so the client stops retrying
        return;
      }
    
      processedIds.add(msg.id);
      // Process...
    }

Connection Status UI

You can track connection state and show sync status:

function useConnectionStatus(store: GraphStore) {
  const [status, setStatus] = useState<'connected' | 'reconnecting' | 'offline'>('connected');
  const pendingCount = getPendingCount(store.getState());

  // Drive `status` from your transport's connection events, e.g.
  // ws.onopen = () => setStatus('connected');
  // ws.onclose = () => setStatus('reconnecting');

  return {
    status,
    pendingCount,
    isSynced: pendingCount === 0,
  };
}

Sync Reconciliation (Bloom Filter)

Retry alone handles client → server drops (the client knows which messages the server hasn't acknowledged). But what about server → client drops? When the server broadcasts another client's edit and the message is lost in transit, the receiving client has no idea it's missing anything.

Bloom filter-based sync reconciliation solves this. Periodically, each client sends a compact digest of all message IDs it knows about. The server checks this against its history and retransmits anything the client is missing.

Two complementary recovery mechanisms

Retry unacked recovers client → server drops (resend messages the server hasn't acknowledged). Bloom filter sync recovers server → client drops (server fills in messages the client never received). Together they provide full bidirectional loss recovery.

Protocol Flow

Client                          Server
  │                               │
  │──── CRDTMessage ────────────▶ │  normal edit
  │ ◀──────────── ack ────────────│
  │                               │
  │          ❌ ◀── broadcast ────│  dropped! (some broadcasts get lost)
  │                               │
  │──── SyncDigest ─────────────▶ │  "here's what I have"
  │   { mtype: 'sync',            │
  │     filter: '<bloom b64>',    │
  │     count: N }                │
  │                               │
  │ ◀──── broadcast (fill) ───────│  server retransmits
  │ ◀──── broadcast (fill) ───────│  each missing message
  │                               │

Building a Sync Digest

A SyncDigest is a bloom filter containing every message ID from the client's journal. The server uses it to check which messages the client has and hasn't seen.

import { buildSyncDigest } from '@vuer-ai/vuer-rtc';

// Build a digest from the current client state
const digest = buildSyncDigest(store.getState());
// => { mtype: 'sync', vectorClock: { alice: 5, bob: 3 }, filter: '<base64>', count: 12 }

// Send it to the server over your WebSocket
ws.send(JSON.stringify(digest));

The digest has three parts:

  • vectorClock — covers all messages compacted into the snapshot (fast O(1) lookup per message)
  • filter — bloom filter covering only the uncompacted journal entries (stays small after compaction)
  • count — number of journal entries in the bloom filter (diagnostics)

Server-Side Handling

When the server receives a SyncDigest, it:

  1. Checks the vector clock first — messages covered by the clock are already in the client's snapshot (O(1) per message)
  2. For remaining messages, checks the bloom filter — these are in the client's uncompacted journal
  3. Retransmits any message that isn't covered by either

// Simplified server-side handler, written as a method on the server
// class, which holds `messageHistory` (a Map of roomId to messages)
handleSyncRequest(ws, roomId, sessionId, digest) {
  const clientClock = digest.vectorClock ?? {};
  const filter = BloomFilter.deserialize(digest.filter);
  const history = this.messageHistory.get(roomId) ?? [];

  for (const msg of history) {
    // Skip the client's own messages (it already has them)
    if (msg.sessionId === sessionId) continue;
    // Skip messages covered by the client's snapshot (compacted)
    if ((clientClock[msg.sessionId] ?? 0) >= msg.clock[msg.sessionId]) continue;
    // Check the bloom filter for uncompacted journal entries
    if (!filter.has(msg.id)) {
      ws.send(JSON.stringify({ mtype: 'broadcast', msg }));
    }
  }
}

Bloom filters have a small false positive rate (~1%), meaning the server occasionally thinks a client has a message when it doesn't. This is harmless: the next sync round will catch it. There are no false negatives, so the server never retransmits a message the client already has (and even if it did, the client's normal dedup would discard the duplicate).
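That ~1% rate and the sizes quoted in this document follow from standard bloom filter sizing: a filter tuned for false positive rate p needs about -ln(p)/(ln 2)² bits per item. A quick check of the numbers:

```typescript
// Optimal bloom filter size: m/n = -ln(p) / (ln 2)^2 bits per item
function bitsPerItem(p: number): number {
  return -Math.log(p) / (Math.log(2) ** 2);
}

// At p = 1% this gives ~9.6 bits (~1.2 bytes) per item, matching
// the figures used elsewhere in this document
console.log(bitsPerItem(0.01).toFixed(1)); // "9.6"
```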

Periodic Sync Timer

For the best recovery experience, run both retry and sync on a periodic timer:

import { getUnackedMessages, buildSyncDigest } from '@vuer-ai/vuer-rtc';

let syncTimer: ReturnType<typeof setInterval> | null = null;

function startSyncTimer(intervalMs: number) {
  if (syncTimer) clearInterval(syncTimer);
  if (intervalMs <= 0) return;

  syncTimer = setInterval(() => {
    // 1. Retry unacked — recovers client → server drops
    const unacked = getUnackedMessages(store.getState());
    for (const msg of unacked) {
      ws.send(JSON.stringify(msg));
    }

    // 2. Bloom filter sync — recovers server → client drops
    const digest = buildSyncDigest(store.getState());
    ws.send(JSON.stringify(digest));
  }, intervalMs);
}

Complexity

Time complexity per sync round:

Operation             Complexity   Notes
Build bloom filter    O(j)         j = journal length (shrinks after compaction)
Server check history  O(H)         Most entries skipped via vector clock (one comparison each)
Retransmit misses     O(m)         m = number of missing messages

Space complexity:

Component              Size                                          Notes
Sync digest (wire)     ~1.2 bytes per journal entry + vector clock   Bloom filter ≈ 9.6 bits/item at 1% FP rate
Server history buffer  O(H × msg_size)                               Unbounded per room (needed for full recovery)
Client journal         O(n)                                          Shrinks when compact() is called

For a typical session with 1,000 messages, the sync digest is about 1.2 KB on the wire — far smaller than retransmitting the full journal.

Why bloom filters?

A naive approach would send the full list of message IDs (36 bytes each for UUIDs). For 1,000 messages, that's 36 KB. A bloom filter encodes the same information in ~1.2 KB with only a 1% chance of missing a message per round. Over multiple rounds, the probability of never recovering a specific message drops exponentially.
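The arithmetic behind those figures, as a quick sanity check:

```typescript
// Naive ID list vs. bloom filter for n messages
const n = 1000;
const naiveBytes = n * 36;   // 36-char UUID strings: 36,000 bytes
const bloomBytes = n * 1.2;  // ~9.6 bits/item at 1% FP: ~1,200 bytes

// Chance a specific missing message is STILL unrecovered after k
// sync rounds, where each round misses it with probability ~1%
const p = 0.01;
const missAfterThreeRounds = p ** 3; // ~1e-6
```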

Causal Ordering

When sync retransmits missed messages, they arrive out of their original order. For scene graph operations (set, delete), this is fine — they're commutative. But text CRDT operations have causal dependencies (each insert references a parent character ID).

To handle this, rebuildGraph sorts journal entries by (lamportTime, sessionId) before replaying, which restores causal order regardless of arrival sequence:

// Inside rebuildGraph — causal sort before replay
const sorted = [...journal].sort((a, b) => {
  const dt = a.msg.lamportTime - b.msg.lamportTime;
  if (dt !== 0) return dt;
  return a.msg.sessionId < b.msg.sessionId ? -1
       : a.msg.sessionId > b.msg.sessionId ? 1 : 0;
});

Without causal sorting, text operations that arrive via sync retransmission can reference parent IDs that haven't been inserted yet, causing the text CRDT to produce incorrect results.

Compaction

For long-lasting sessions, the journal grows with every operation. Call compact() periodically to fold acknowledged entries into the snapshot, keeping the journal (and bloom filter) small:

// Compact acknowledged entries into the snapshot
store.compact();

After compaction:

  • Journal shrinks to only unacknowledged entries
  • Bloom filter only covers uncompacted entries (small, bounded)
  • Vector clock in the snapshot covers everything that was compacted (server skips these efficiently)

Compacted entries can no longer be undone — undo() searches the journal for the target entry. Call compact() only when you're OK losing undo history for older operations.

Why not auto-compact?

Auto-compacting on every ack would give the smallest possible journal, but it would also destroy the undo stack immediately. Keeping compaction as an explicit operation lets you balance journal size against undo history. For most applications, compacting every 30–60 seconds provides a good tradeoff.
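A minimal sketch of such a schedule, assuming only the compact() call shown above; CompactableStore and compactTick are illustrative names, not part of the library:

```typescript
// Hypothetical minimal interface; vuer-rtc's real store exposes
// compact() as shown earlier in this document
interface CompactableStore {
  compact(): void;
}

// One compaction "tick": folds acknowledged journal entries into
// the snapshot, trading older undo history for a smaller journal
function compactTick(store: CompactableStore): void {
  store.compact();
}

// Schedule it in the suggested 30–60 s range, e.g.:
// const timer = setInterval(() => compactTick(store), 45_000);
// ...and clearInterval(timer) on teardown
```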