Why We Migrated Our Signaling Layer from Node.js to Go
Analyzing the performance bounds of the V8 event loop under high-concurrency WebSocket traffic and our exact migration strategy.

The Original Architecture
Our real-time signaling layer was originally built in Node.js. It handled WebSocket connections, presence updates, and message routing for a social platform growing toward 50,000 concurrent users. At ~10,000 connections, it ran fine. At 30,000+, we hit a wall.
Why Node.js Struggled
Node.js is single-threaded: every event on the event loop — a new connection, an inbound message, a timer — competes for the same thread. At scale, event loop lag (the delay between scheduling a callback and executing it) grew from under 1ms to 40–80ms, and that lag fed directly into end-to-end message delivery latency.
We profiled with clinic.js and confirmed the bottleneck was pure CPU contention on the event loop during message broadcast, not I/O.
The Go Rewrite Strategy
We did not rewrite everything at once. We used the strangler fig pattern:
1. Phase 1: Deploy the Go signaling service alongside Node.js, routing 5% of traffic to it.
2. Phase 2: Gradually shift traffic while monitoring error rates and latency.
3. Phase 3: Decommission Node.js once Go handled 100% of traffic stably for 7 days.
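The post doesn't show how the traffic split was implemented; one common approach is sticky percentage-based routing, where hashing a stable identifier keeps each user pinned to the same backend across reconnects. A minimal sketch (the `routeToGo` function and the FNV hash choice are illustrative assumptions, not our production router):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// routeToGo decides whether a user's connection goes to the new Go
// service. Hashing the user ID makes the assignment sticky: a
// reconnecting client always lands on the same backend for a given
// rollout percentage, which keeps A/B latency comparisons clean.
func routeToGo(userID string, percent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32()%100 < percent
}

func main() {
	// Phase 1: send roughly 5% of users to the Go service.
	for _, id := range []string{"alice", "bob", "carol"} {
		fmt.Printf("%s -> go=%v\n", id, routeToGo(id, 5))
	}
}
```

Ramping from 5% to 100% is then a config change to the percentage, with no connection churn for users already assigned to a backend.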
Go's Goroutine Model
Each WebSocket connection in Go runs in its own goroutine — a lightweight green thread managed by the Go runtime. Goroutines start with a ~2KB stack that grows on demand, versus a default stack reservation measured in megabytes for an OS thread (8MB on a typical Linux pthread), meaning we could handle 50,000 concurrent connections on a single 8-core instance.
```go
func handleConnections(hub *Hub, conn *websocket.Conn) {
	client := &Client{hub: hub, conn: conn, send: make(chan []byte, 256)}
	hub.register <- client

	// Read and write in separate goroutines, so a slow reader
	// never blocks outbound writes on this connection.
	go client.readPump()
	go client.writePump()
}
```

Results After 30 Days in Production
| Metric | Node.js | Go | Change |
|---|---|---|---|
| P99 Message Latency | 78ms | 11ms | -86% |
| CPU (50K concurrent) | 91% | 34% | -63% |
| Memory per connection | ~200KB | ~8KB | -96% |
When You Should NOT Do This
If your concurrency requirements are under 10,000 connections and your team is deeply productive in Node.js, the migration cost is unlikely to pay off. We made this switch because we had hard latency SLAs and a roadmap to 200,000 concurrent users.

