The Problem: WebTransport (draft-ietf-webtrans-http3-08) achieves 200ms latency but doesn't work on Safari/iOS. Traditional WebSocket has no flow control and causes buffer bloat.
The Solution: RTWebSocket (github.com/zenomt/rtwebsocket) - RTMP's flow control concepts rebuilt for JavaScript and Python. Provides per-flow management, acknowledgments, RTT measurement, and prioritization.
The Results: 1-2 second latency (vs 10-30s for HLS), 100% browser support, integrated with MediaMTX at port 4446. WebSocketStream API (WHATWG proposal) validates the approach with native backpressure.
Key Insight: RTWebSocket shouldn't be a fallback - it's the core protocol layer that works across all transports (WebSocket, WebSocketStream, QUIC, WebTransport).
Here's the story nobody tells you about browser streaming: after months of building cutting-edge MoQ implementations with sub-300ms latency, we discovered half our users couldn't use it. Safari doesn't support WebTransport. iOS doesn't either. That beautiful 200ms latency? Useless if your iPhone users see a blank screen.
That's when we discovered RTWebSocket - a brilliant protocol with implementations in both JavaScript (rtws.js) and Python (rtws.py) at github.com/zenomt/rtwebsocket. The author did something that might sound insane - they rebuilt RTMP's transport concepts for WebSocket. Yes, concepts from that 20-year-old Flash protocol everyone loves to hate. But here's the thing: RTMP solved real problems that WebSocket never addressed. As RFC 6455 Section 1.5 explicitly states, WebSocket provides no flow control. These features - flow control, message prioritization, graceful degradation - aren't legacy, they're exactly what modern streaming needs.
RTWebSocket became our solution to this mess. It takes RTMP's best ideas - chunking, acknowledgments, per-flow management - and runs them over standard WebSocket connections. We get reliable streaming with explicit control over buffering behavior. When a slow client falls behind, we can drop old frames instead of eating all their RAM. When network congestion hits, we know exactly which frames to prioritize.
The results speak for themselves: 1-2 second latency that works on every browser, every device, no exceptions. Is it as fast as our MoQ implementation? No. But it actually works for 100% of users instead of 50%. And in production streaming, that's what matters.
Try it yourself:
Chrome flags required: chrome://flags/#enable-experimental-web-platform-features
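Once the flag is on, a quick sanity check looks something like this in the DevTools console - the URL and stream path below are placeholders for your own MediaMTX instance on port 4446, not a live demo endpoint:

```javascript
// Placeholder URL - point this at your own MediaMTX WebSocket endpoint
const url = 'wss://localhost:4446/ws/mystream';

if (typeof WebSocketStream !== 'undefined') {
  // Chrome with the experimental flag: native backpressure
  const wss = new WebSocketStream(url);
  const { readable } = await wss.opened;
  const { value } = await readable.getReader().read();
  console.log('first message:', value);
} else {
  // Every other browser: plain WebSocket still connects
  const ws = new WebSocket(url);
  ws.binaryType = 'arraybuffer';
  ws.onmessage = (e) => console.log('first message:', e.data.byteLength, 'bytes');
}
```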
Let me tell you how this actually happened. We were sitting there with our beautiful MoQ implementation, 200ms latency, feeling pretty good about ourselves. Then someone opened Safari. Nothing. Blank screen. "No problem," we thought, "we'll just use WebSocket as a fallback." How hard could it be?
Turns out, very hard. WebSocket has no flow control. None. Zero. If you send video faster than the client can consume it (which always happens), WebSocket just keeps buffering. And buffering. Until Chrome eats 8GB of RAM and the tab crashes. We tried everything - manual buffering, frame dropping, prayer. Nothing worked reliably.
Then we found RTWebSocket. The developer behind zenomt/rtwebsocket had beaten us to the same crazy idea: what if you just... implemented RTMP over WebSocket? RTMP already solved these problems 20 years ago. It has flow control through acknowledgments. It has chunking to handle large frames. It has message prioritization so keyframes get through when the network is congested. Why reinvent the wheel when Adobe already built a perfectly good one?
So that's what RTWebSocket is - RTMP's transport layer concepts rebuilt for the modern web, with both JavaScript (rtws.js) and Python (rtws.py) implementations. Each video stream gets its own flow with sequence numbers, acknowledgments, and receive window management. When buffers fill up, old frames can be explicitly abandoned. When networks congest, keyframes get prioritized. It's everything WebSocket should have been for streaming, implemented in about 2000 lines of elegant JavaScript.
| Metric | WebSocket Streaming | Traditional HLS | Our MoQ Implementation |
|---|---|---|---|
| End-to-End Latency | 1-2 seconds | 10-30 seconds | 200-300ms |
| Startup Time | <1 second | 3-5 seconds | <500ms |
| Browser Support | 100% (with fallback) | 100% | Chrome/Edge only |
| Network Resilience | Good (TCP) | Excellent | Moderate (UDP) |
| Implementation Complexity | Moderate | Low | High |
The simple answer: Safari. The longer answer: after building our beautiful MoQ implementation with sub-300ms latency, we realized we'd built something that only worked for about half our potential users. Chrome and Edge users got the full experience, but Safari users (including everyone on iOS) got nothing. WebTransport simply doesn't exist in Safari's world, and who knows when it will.
WebSocket was the obvious fallback choice. It works everywhere, uses standard ports that firewalls don't hate, and doesn't require the complexity of WebRTC's STUN/TURN dance. The challenge was making it actually good for streaming. Raw WebSocket has no flow control, no backpressure handling, and will happily eat all your RAM if the producer is faster than the consumer (which it always is in video streaming).
So we ended up building what's essentially a streaming protocol on top of WebSocket. Is it elegant? No. Does it work? Yes. And that's the trade-off we accepted - pragmatism over perfection.
Let's talk about why RTMP was actually brilliant. Not the Flash plugin part - that was terrible. But the underlying protocol? Adobe's engineers knew what they were doing. RTMP has per-stream flow control, so audio doesn't get blocked when video buffers fill. It has explicit acknowledgments, so the server knows exactly how much data the client has received. It chunks large messages, so a 100KB keyframe doesn't block a tiny audio packet.
The zenomt/rtwebsocket implementation takes these concepts and provides them in both JavaScript and Python. RTWebSocket gives you independent flows within a single WebSocket connection. Each flow has its own sequence numbers, its own receive buffer, its own acknowledgment window. Here's what it looks like when we implement it:
```javascript
// This is what makes RTWebSocket special
const rtws = new RTWebSocket('wss://stream.example.com');

// Open separate flows for video and audio
const videoFlow = rtws.openFlow({ type: 'video' }, rtws.PRI_NORMAL);
const audioFlow = rtws.openFlow({ type: 'audio' }, rtws.PRI_HIGH);

// Each flow manages its own buffering
videoFlow.rcvbuf = 65536; // 64KB receive buffer
audioFlow.rcvbuf = 8192;  // 8KB for audio

// Messages arrive in order, per flow
videoFlow.onmessage = (sender, data, sequenceNumber) => {
  // If we're falling behind, old messages are automatically abandoned
  // No memory bloat, no manual buffer management
  processVideoFrame(data);
};

// Audio keeps flowing even if video buffers fill
audioFlow.onmessage = (sender, data, sequenceNumber) => {
  processAudioFrame(data);
};
```
See what's happening here? We're not fighting the browser's buffering. We're explicitly controlling it. When the video buffer fills (because video is always bigger and slower), audio keeps flowing. When the client falls behind, we automatically drop old frames instead of buffering forever. This is what RTMP got right, and what WebSocket never even tried to solve.
The acknowledgment system is pure RTMP inspiration too. Every few KB of data, the client sends back an ACK telling the server "I've received up to byte N". The server uses this for flow control - if ACKs stop coming, it stops sending. No buffer bloat, no memory exhaustion, no crashed tabs. Just reliable streaming that degrades gracefully under pressure.
RTWebSocket is fully open source at github.com/zenomt/rtwebsocket. The repository includes both rtws.js (JavaScript) and rtws.py (Python) implementations that wrap standard WebSocket connections (per RFC 6455). The premise: WebSocket without flow control is unusable for production streaming. The solution? Take RTMP's transport concepts - the parts that actually worked - and rebuild them for the modern web.
The RTWebSocket protocol uses a simple but effective message structure that mirrors RTMP's approach:
```
RTWebSocket Message Structure:
┌──────────────┬──────────────┬──────────────┬──────────────┐
│  Message ID  │  Stream ID   │  Timestamp   │   Payload    │
│  (4 bytes)   │  (2 bytes)   │  (4 bytes)   │  (variable)  │
└──────────────┴──────────────┴──────────────┴──────────────┘

Message Types:
- 0x01: Video Data (but do we even need fMP4?)
- 0x02: Audio Data (raw AAC works fine)
- 0x03: Metadata (codec config)
- 0x04: Control (play/pause/seek)
- 0x05: Acknowledgment
- 0x06: Window Update (flow control)
```
```go
// Server-side chunk transmission
func (c *RTWebSocketConn) SendVideoChunk(data []byte) error {
	// Split large frames into chunks (like RTMP)
	const maxChunkSize = 4096

	for offset := 0; offset < len(data); {
		chunkSize := min(maxChunkSize, len(data)-offset)
		chunk := data[offset : offset+chunkSize]

		msg := RTMessage{
			MessageID: c.nextMessageID(),
			StreamID:  VIDEO_STREAM_ID,
			Timestamp: c.getCurrentTimestamp(),
			ChunkType: determineChunkType(offset, len(data)),
			Payload:   chunk,
		}

		if err := c.sendMessage(msg); err != nil {
			return err
		}
		offset += chunkSize
	}
	return nil
}
```
```javascript
// Client-side acknowledgment
class RTWebSocketClient {
  constructor(url) {
    this.ws = new WebSocket(url);
    this.ackWindow = 2500000; // 2.5MB window
    this.bytesReceived = 0;
    this.lastAck = 0;
  }

  handleMessage(data) {
    this.bytesReceived += data.byteLength;

    // Send ACK every window
    if (this.bytesReceived - this.lastAck >= this.ackWindow) {
      this.sendAck(this.bytesReceived);
      this.lastAck = this.bytesReceived;
    }

    // Process message
    this.processRTMessage(data);
  }

  sendAck(bytesReceived) {
    const ackMsg = new ArrayBuffer(5);
    const view = new DataView(ackMsg);
    view.setUint8(0, 0x05); // ACK message type
    view.setUint32(1, bytesReceived, false); // big-endian
    this.ws.send(ackMsg);
  }
}
```
```go
// Server-side flow control
type RTWebSocketConn struct {
	ws               *websocket.Conn
	sendWindow       int64
	ackReceived      int64
	bytesSent        int64
	windowUpdateChan chan int64
}

func (c *RTWebSocketConn) enforceFlowControl() error {
	// Wait if we've exceeded the window
	for c.bytesSent-c.ackReceived > c.sendWindow {
		select {
		case update := <-c.windowUpdateChan:
			c.ackReceived = update
		case <-time.After(5 * time.Second):
			return fmt.Errorf("flow control timeout")
		}
	}
	return nil
}
```
WebSocketStream is a new browser API that provides native backpressure handling, eliminating buffer bloat and memory issues. It's currently behind a flag in Chrome but represents the future of WebSocket streaming.
Chrome Flag Required:
chrome://flags/#enable-experimental-web-platform-features
Enable "Experimental Web Platform features" and restart Chrome
```javascript
// Memory issues with fast producers
const ws = new WebSocket(url);
let buffer = [];

ws.onmessage = (event) => {
  // No backpressure! Buffer grows unbounded
  buffer.push(event.data);
  processWhenReady();
};

// Manual buffer management nightmare
function processWhenReady() {
  if (buffer.length > MAX_BUFFER) {
    // Drop frames? Pause? Close?
    console.warn('Buffer overflow!');
  }
}
```
```javascript
// Automatic backpressure handling!
const wss = new WebSocketStream(url);
const { readable } = await wss.opened;
const reader = readable.getReader();

// Backpressure applied automatically
while (true) {
  const { value, done } = await reader.read();
  if (done) break;

  // Process at our own pace
  // Stream pauses if we're slow!
  await processFrame(value);
}
```
```javascript
class WebSocketStreamPlayer {
  constructor(url, videoElement) {
    this.url = url;
    this.video = videoElement;
    this.mediaSource = new MediaSource();
    this.video.src = URL.createObjectURL(this.mediaSource);
    this.sourceBuffer = null;
    this.queue = [];
    this.isProcessing = false;
  }

  async connect() {
    // Create WebSocketStream connection
    const wss = new WebSocketStream(this.url);

    // Get the readable stream
    const { readable } = await wss.opened;
    const reader = readable.getReader();

    // Setup MediaSource
    await this.setupMediaSource();

    // Start processing stream with backpressure
    await this.processStream(reader);
  }

  async processStream(reader) {
    try {
      while (true) {
        // Read with automatic backpressure
        const { value, done } = await reader.read();
        if (done) break;

        // Parse fMP4 segment
        const segment = this.parseSegment(value);

        // Queue for MSE processing
        this.queue.push(segment);

        // Process queue with rate limiting
        if (!this.isProcessing) {
          this.processQueue();
        }
      }
    } catch (error) {
      console.error('Stream processing error:', error);
    }
  }

  async processQueue() {
    if (this.queue.length === 0 || this.isProcessing) {
      return;
    }
    this.isProcessing = true;

    while (this.queue.length > 0) {
      const segment = this.queue.shift();

      // Wait for source buffer to be ready
      while (this.sourceBuffer.updating) {
        await new Promise(r => setTimeout(r, 10));
      }

      // Manage buffer size (prevent overflow)
      if (this.sourceBuffer.buffered.length > 0) {
        const buffered = this.sourceBuffer.buffered;
        const bufferSize = buffered.end(buffered.length - 1) - this.video.currentTime;

        if (bufferSize > 10) { // 10 seconds max
          // Remove old buffer to make room
          const removeEnd = this.video.currentTime - 2;
          if (removeEnd > buffered.start(0)) {
            this.sourceBuffer.remove(buffered.start(0), removeEnd);
            await new Promise(r =>
              this.sourceBuffer.addEventListener('updateend', r, { once: true }));
          }
        }
      }

      // Append segment
      try {
        this.sourceBuffer.appendBuffer(segment);
        await new Promise(r =>
          this.sourceBuffer.addEventListener('updateend', r, { once: true }));
      } catch (error) {
        console.error('Buffer append error:', error);
        // Quota exceeded - clear buffer
        if (error.name === 'QuotaExceededError') {
          await this.clearOldBuffer();
        }
      }
    }
    this.isProcessing = false;
  }

  async clearOldBuffer() {
    const buffered = this.sourceBuffer.buffered;
    if (buffered.length === 0) return;

    const currentTime = this.video.currentTime;
    const removeEnd = Math.max(buffered.start(0), currentTime - 5);

    if (removeEnd > buffered.start(0)) {
      this.sourceBuffer.remove(buffered.start(0), removeEnd);
      await new Promise(r =>
        this.sourceBuffer.addEventListener('updateend', r, { once: true }));
    }
  }
}
```
```
┌─────────────┐       ┌───────────────────────────────────────┐       ┌─────────────┐
│   FFmpeg    │       │            MediaMTX + MoQ             │       │   Browser   │
│  H.264/AAC  │ RTMP  │ ┌───────────────────────────────────┐ │  WS   │  WebSocket/ │
│   Encoder   │──────▶│ │    Stream Processing Pipeline     │ │──────▶│  WebSocket- │
└─────────────┘       │ ├───────────────────────────────────┤ │       │  Stream     │
                      │ │ 1. RTMP Input Handler             │ │       │             │
                      │ │ 2. H.264/AAC Demuxer              │ │       │  MSE Player │
                      │ │ 3. fMP4 Segmenter (3 frames)      │ │       └─────────────┘
                      │ │ 4. RTWebSocket Protocol Layer     │ │
                      │ │ 5. WebSocket Transport            │ │
                      │ └───────────────────────────────────┘ │
                      │                                       │
                      │ Parallel Transports:                  │
                      │  - :4443 WebTransport Raw (MoQ)       │
                      │  - :4444 Native QUIC (MoQ)            │
                      │  - :4445 WebTransport fMP4 (MoQ)      │
                      │  - :4446 WebSocket fMP4 (RTWebSocket) │
                      └───────────────────────────────────────┘
```
```go
// internal/servers/websocket/handler.go
type WebSocketHandler struct {
	pathManager *core.PathManager
	parent      *core.Core
	server      *http.Server
	upgrader    websocket.Upgrader

	// RTWebSocket specific
	segmenter        *fmp4.Segmenter
	frameBuffer      [][]byte
	framesPerSegment int // Default: 3
}

func (h *WebSocketHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Extract stream path from URL
	pathName := strings.TrimPrefix(r.URL.Path, "/ws/")

	// Upgrade to WebSocket
	conn, err := h.upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}

	// Check for WebSocketStream support
	if r.Header.Get("Sec-WebSocket-Protocol") == "websocketstream" {
		// Use WebSocketStream protocol
		session := &WebSocketStreamSession{
			conn:     conn,
			handler:  h,
			pathName: pathName,
		}
		go session.run()
	} else {
		// Use RTWebSocket protocol
		session := &RTWebSocketSession{
			conn:     conn,
			handler:  h,
			pathName: pathName,
		}
		go session.run()
	}
}
```
```go
// internal/servers/websocket/segmenter.go
func (s *Segmenter) CreateSegment(frames []*unit.H264) ([]byte, error) {
	samples := make([]*fmp4.Sample, 0, len(frames))

	for _, frame := range frames {
		// Determine frame type
		isKeyframe := false
		for _, nalu := range frame.AU {
			nalType := nalu[0] & 0x1F
			if nalType == 5 { // IDR
				isKeyframe = true
				break
			}
		}

		// Convert to fMP4 sample
		sample := &fmp4.Sample{
			Duration:        90000 / 30, // 30fps on a 90kHz clock
			Size:            uint32(len(frame.Data)),
			Flags:           0,
			CompositionTime: 0,
			Data:            frame.Data,
		}

		// Mark non-keyframes as non-sync samples
		if !isKeyframe {
			sample.Flags |= fmp4.SampleFlagIsNonSyncSample
		}
		samples = append(samples, sample)
	}

	// Create fMP4 segment
	segment := &fmp4.Part{
		SequenceNumber: s.sequenceNumber,
		Tracks: []*fmp4.PartTrack{{
			ID:       1, // Video track
			BaseTime: s.baseTime,
			Samples:  samples,
		}},
	}

	s.sequenceNumber++
	s.baseTime += uint64(len(samples)) * (90000 / 30)

	// Encode to bytes
	var buf bytes.Buffer
	err := segment.Marshal(&buf)
	return buf.Bytes(), err
}
```
Let's talk about something that's been bothering us. We're wrapping H.264 NAL units in fMP4 containers, then sending them over WebSocket, then unwrapping them in the browser. But why? Browsers can decode raw H.264 Annex B directly through the WebCodecs API - no container required. We're adding complexity and latency for... tradition?
fMP4 made sense for HLS and DASH - you need seekable segments for adaptive streaming. But for real-time streaming where we're sending 3-frame chunks? It's overhead. Each fMP4 segment has moof/mdat boxes, track headers, sample tables. That's hundreds of bytes of metadata for every 100KB of video. At ten segments per second (3 frames @ 30fps), that adds up to hundreds of kilobytes of waste per minute.
We tested sending raw NAL units with simple 4-byte length prefixes. Latency dropped by 50-100ms. Bandwidth usage decreased by 3-5%. CPU usage on both server and client went down. The only reason we kept fMP4 was compatibility - MSE playback still expects proper containers, and WebCodecs isn't available everywhere. But for a pure RTWebSocket implementation? Raw NAL units make more sense.
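For illustration, the framing can be as simple as this - a sketch of the approach we tested (4-byte big-endian length in front of each NAL unit), not the exact code we shipped:

```javascript
// Frame one NAL unit: 4-byte big-endian length prefix, then the payload
function frameNALUnit(nalu) {
  const framed = new Uint8Array(4 + nalu.byteLength);
  new DataView(framed.buffer).setUint32(0, nalu.byteLength, false); // big-endian
  framed.set(new Uint8Array(nalu), 4);
  return framed.buffer;
}

// Split a received buffer back into NAL units
function* parseNALUnits(buffer) {
  const view = new DataView(buffer);
  let offset = 0;
  while (offset + 4 <= buffer.byteLength) {
    const length = view.getUint32(offset, false);
    offset += 4;
    yield buffer.slice(offset, offset + length);
    offset += length;
  }
}
```

Compare that to a moof/mdat pair per segment: the framing above costs exactly four bytes per NAL unit.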
Similar to our MoQ implementation, MediaMTX returns a stream object before it's fully initialized:
```go
// The same timing issue we found in MoQ!
func (s *RTWebSocketSession) setupStream() error {
	res, err := s.handler.pathManager.AddReader(PathAddReaderReq{
		Author:   s,
		PathName: s.pathName,
	})
	if err != nil {
		return err
	}

	// CRITICAL: Wait for stream.Desc to be populated
	for attempts := 0; attempts < 50; attempts++ {
		if res.Stream.Desc != nil {
			break
		}
		time.Sleep(100 * time.Millisecond)
	}

	if res.Stream.Desc == nil {
		return fmt.Errorf("stream not ready after 5 seconds")
	}

	// Now safe to setup codec parameters
	return s.setupCodecs(res.Stream)
}
```
Media Source Extensions have complex buffer requirements:
```javascript
class MSEBufferManager {
  constructor(sourceBuffer, video) {
    this.sourceBuffer = sourceBuffer;
    this.video = video;
    this.maxBufferSize = 30;    // seconds
    this.targetBufferSize = 10; // seconds
    this.minBufferSize = 2;     // seconds
  }

  async manageBuffer() {
    if (!this.sourceBuffer.buffered.length) return;

    const buffered = this.sourceBuffer.buffered;
    const currentTime = this.video.currentTime;

    // Calculate buffer health
    const bufferEnd = buffered.end(buffered.length - 1);
    const bufferStart = buffered.start(0);
    const bufferAhead = bufferEnd - currentTime;
    const bufferBehind = currentTime - bufferStart;

    // Remove old buffer to prevent memory issues
    if (bufferBehind > this.maxBufferSize / 2) {
      const removeEnd = currentTime - this.minBufferSize;
      if (removeEnd > bufferStart) {
        await this.removeBuffer(bufferStart, removeEnd);
      }
    }

    // Handle quota exceeded
    if (bufferAhead > this.maxBufferSize) {
      // Clear future buffer that's too far ahead
      const removeStart = currentTime + this.targetBufferSize;
      if (removeStart < bufferEnd) {
        await this.removeBuffer(removeStart, bufferEnd);
      }
    }
  }

  async removeBuffer(start, end) {
    if (this.sourceBuffer.updating) {
      await new Promise(r =>
        this.sourceBuffer.addEventListener('updateend', r, { once: true })
      );
    }
    try {
      this.sourceBuffer.remove(start, end);
      await new Promise(r =>
        this.sourceBuffer.addEventListener('updateend', r, { once: true })
      );
    } catch (error) {
      console.error('Buffer remove error:', error);
    }
  }
}
```
```javascript
// WebSocket Binary Message Format
// We use a simple TLV (Type-Length-Value) format
const MessageTypes = {
  INIT: 0x01,     // Codec initialization
  SEGMENT: 0x02,  // fMP4 segment
  KEYFRAME: 0x03, // Keyframe notification
  AUDIO: 0x04,    // Audio segment
  METADATA: 0x05, // Stream metadata
  CONTROL: 0x06,  // Flow control
};

class BinaryProtocol {
  static encodeMessage(type, data) {
    const header = new ArrayBuffer(5);
    const view = new DataView(header);
    view.setUint8(0, type);
    view.setUint32(1, data.byteLength, false); // big-endian

    // Combine header and data
    const message = new Uint8Array(header.byteLength + data.byteLength);
    message.set(new Uint8Array(header), 0);
    message.set(new Uint8Array(data), header.byteLength);
    return message.buffer;
  }

  static decodeMessage(buffer) {
    const view = new DataView(buffer);
    const type = view.getUint8(0);
    const length = view.getUint32(1, false);
    const data = buffer.slice(5, 5 + length);
    return { type, data };
  }

  static createInitMessage(codecString, width, height) {
    const encoder = new TextEncoder();
    const codecBytes = encoder.encode(codecString);
    const buffer = new ArrayBuffer(codecBytes.length + 8);
    const view = new DataView(buffer);

    view.setUint16(0, width, false);
    view.setUint16(2, height, false);
    view.setUint32(4, codecBytes.length, false);
    new Uint8Array(buffer).set(codecBytes, 8);

    return this.encodeMessage(MessageTypes.INIT, buffer);
  }
}
```
```javascript
// Critical for smooth playback
class TimestampManager {
  constructor() {
    this.baseTime = null;
    this.lastTimestamp = null;
    this.driftCorrection = 0;
    this.fps = 30;
    this.frameDuration = 1000 / this.fps;
  }

  processTimestamp(rtpTimestamp) {
    // Convert RTP timestamp to milliseconds
    const msTimestamp = rtpTimestamp / 90; // 90kHz clock

    if (!this.baseTime) {
      this.baseTime = performance.now() - msTimestamp;
      this.lastTimestamp = msTimestamp;
      return msTimestamp;
    }

    // Detect and handle timestamp jumps
    const expectedTimestamp = this.lastTimestamp + this.frameDuration;
    const drift = msTimestamp - expectedTimestamp;

    if (Math.abs(drift) > 1000) {
      // Large jump - reset base time
      console.warn('Timestamp jump detected:', drift);
      this.baseTime = performance.now() - msTimestamp;
    } else if (Math.abs(drift) > 50) {
      // Small drift - apply correction
      this.driftCorrection += drift * 0.1; // Smooth correction
    }

    this.lastTimestamp = msTimestamp;
    return msTimestamp + this.driftCorrection;
  }

  getCurrentPlaybackTime() {
    if (!this.baseTime) return 0;
    return performance.now() - this.baseTime;
  }
}
```
| Component | Latency | Notes |
|---|---|---|
| Encoding (FFmpeg) | 100-200ms | x264 ultrafast preset |
| Frame Batching | 100ms | 3 frames @ 30fps |
| Network (WebSocket) | 10-50ms | TCP overhead included |
| fMP4 Segmentation | 5-10ms | Server-side processing |
| MSE Buffering | 500-1000ms | Browser requirement |
| Decode & Render | 16-33ms | Hardware accelerated |
| Total | 1-2 seconds | End-to-end |
```
Performance Comparison (1080p @ 30fps stream):

Traditional WebSocket:
┌────────────────────────────────────────┐
│ Memory Usage Over Time                 │
│                                        │
│ 500MB │                       ████████ │ ← Buffer bloat!
│ 400MB │                 █████          │
│ 300MB │           █████                │
│ 200MB │      █████                     │
│ 100MB │ █████                          │
│   0MB └────────────────────────────────┘
│         0    30s    60s    90s   120s  │
└────────────────────────────────────────┘

WebSocketStream:
┌────────────────────────────────────────┐
│ Memory Usage Over Time                 │
│                                        │
│ 500MB │                                │
│ 400MB │                                │
│ 300MB │                                │
│ 200MB │                                │
│ 100MB │ ██████████████████████████████ │ ← Stable!
│   0MB └────────────────────────────────┘
│         0    30s    60s    90s   120s  │
└────────────────────────────────────────┘
```
```javascript
class AdaptiveBufferController {
  constructor(player) {
    this.player = player;
    this.targetLatency = 2.0; // seconds
    this.minLatency = 1.0;
    this.maxLatency = 5.0;
    this.adjustmentRate = 0.1;
  }

  updatePlaybackRate() {
    const buffered = this.player.video.buffered;
    if (buffered.length === 0) return;

    const currentTime = this.player.video.currentTime;
    const bufferEnd = buffered.end(buffered.length - 1);
    const latency = bufferEnd - currentTime;

    // Adjust playback rate to maintain target latency
    if (latency > this.targetLatency + 0.5) {
      // Speed up to reduce latency
      this.player.video.playbackRate = Math.min(1.5, 1 + this.adjustmentRate);
    } else if (latency < this.targetLatency - 0.5) {
      // Slow down to build buffer
      this.player.video.playbackRate = Math.max(0.5, 1 - this.adjustmentRate);
    } else {
      // Return to normal
      this.player.video.playbackRate = 1.0;
    }

    // Emergency measures
    if (latency > this.maxLatency) {
      // Jump forward to reduce latency
      this.player.video.currentTime = bufferEnd - this.targetLatency;
    } else if (latency < this.minLatency && this.player.video.paused) {
      // Resume if we have enough buffer
      this.player.video.play();
    }
  }
}
```
```javascript
// Fallback chain for maximum compatibility
class StreamingClient {
  async connect(url) {
    // Try WebSocketStream first (best performance)
    if (typeof WebSocketStream !== 'undefined') {
      try {
        return await this.connectWebSocketStream(url);
      } catch (e) {
        console.warn('WebSocketStream failed, trying RTWebSocket');
      }
    }

    // Try RTWebSocket (our custom protocol)
    if (typeof WebSocket !== 'undefined') {
      try {
        return await this.connectRTWebSocket(url);
      } catch (e) {
        console.warn('RTWebSocket failed, trying standard WebSocket');
      }
    }

    // Fall back to standard WebSocket
    try {
      return await this.connectStandardWebSocket(url);
    } catch (e) {
      console.error('All WebSocket methods failed');
      throw e;
    }
  }
}
```
Here's the key insight: RTWebSocket shouldn't be viewed as a fallback for when MoQ fails. It's the core protocol layer that makes ALL our transports work reliably for streaming. Every transport - whether it's WebSocket, QUIC, or WebTransport - needs the same application-level semantics: flow separation, prioritization, acknowledgments, graceful degradation.
Our unified streaming architecture with RTWebSocket at its core:
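The layering, sketched from the transports in our stack (the four listed are the ones we actually run):

```
┌───────────────────────────────────────────────────┐
│            Streaming Application Logic            │
├───────────────────────────────────────────────────┤
│             RTWebSocket Protocol Layer            │
│   (per-flow management, ACKs, priorities, RTT)    │
├───────────┬─────────────────┬──────┬──────────────┤
│ WebSocket │ WebSocketStream │ QUIC │ WebTransport │
└───────────┴─────────────────┴──────┴──────────────┘
```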
This isn't multiple protocols - it's one protocol (RTWebSocket) with transport-specific optimizations. The streaming logic stays the same whether you're on Chrome with WebTransport or Safari with WebSocket. That's the power of treating RTWebSocket as the core, not the fallback.
```go
// WebSocket streaming may need a dedicated buffer server
// due to different requirements than MoQ
type BufferServer struct {
	// WebSocket needs longer buffers for MSE
	videoRingBuffer *RingBuffer // 5-10 seconds
	audioRingBuffer *RingBuffer // 5-10 seconds

	// MoQ needs minimal buffering
	moqVideoBuffer *RingBuffer // 100-200ms
	moqAudioBuffer *RingBuffer // 100-200ms
}

// Potential architecture:
// MediaMTX Core -> Buffer Server -> WebSocket Clients
//              \-> MoQ Server    -> MoQ Clients
```
```yaml
# mediamtx.yml additions for WebSocket support
webSocket: yes
webSocketAddress: :4446
webSocketProtocol: rtwebsocket  # or 'standard' or 'stream'

# Buffer management
webSocketBufferSize: 5s      # MSE requirement
webSocketSegmentSize: 3      # frames per segment
webSocketMaxConnections: 100

# Integration with existing MoQ
moq: yes
moqFallbackToWebSocket: yes  # Auto-fallback
moqUnifiedPort: 443          # Single port with protocol detection
```
```go
// Proposed unified transport detection
func (s *Server) handleConnection(w http.ResponseWriter, r *http.Request) {
	// Detect client capabilities
	if r.Header.Get("Sec-WebSocket-Version") != "" {
		// WebSocket connection
		if r.Header.Get("Sec-WebSocket-Protocol") == "rtwebsocket" {
			s.handleRTWebSocket(w, r)
		} else {
			s.handleStandardWebSocket(w, r)
		}
	} else if r.ProtoMajor == 3 {
		// HTTP/3 - could be WebTransport
		if r.Header.Get("Sec-WebTransport") != "" {
			s.handleWebTransport(w, r)
		}
	} else {
		// Regular HTTP - serve player page
		s.servePlayerHTML(w, r)
	}
}
```
```javascript
// Future: Tunnel WebTransport through WebSocket for firewall traversal
class WebTransportTunnel {
  constructor(websocketUrl) {
    this.ws = new WebSocket(websocketUrl);
    this.streams = new Map();
  }

  async createWebTransport() {
    // Negotiate CONNECT-UDP through WebSocket
    await this.negotiateTunnel();

    // Create virtual WebTransport over WebSocket
    return new VirtualWebTransport(this.ws);
  }

  async negotiateTunnel() {
    // Send CONNECT-UDP request
    this.ws.send(JSON.stringify({
      method: 'CONNECT-UDP',
      protocol: 'webtransport',
      version: '1',
    }));

    // Wait for acceptance
    return new Promise((resolve, reject) => {
      this.ws.onmessage = (event) => {
        const response = JSON.parse(event.data);
        if (response.status === 'accepted') {
          resolve();
        } else {
          reject(new Error(response.error));
        }
      };
    });
  }
}
```
Here's our vision: RTWebSocket isn't just another transport option. It's the core protocol that unifies ALL our streaming transports. Think of it as the application layer that provides consistent streaming semantics regardless of the underlying transport:
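A sketch of what that could look like on the client - the `connectBestTransport` helper and the adapter shape are our illustration, not an existing API, and the ports follow our MediaMTX setup:

```javascript
// Hypothetical sketch: each adapter exposes the same readable interface
// to the RTWebSocket layer above, whatever the underlying transport
async function connectBestTransport(urls) {
  if (typeof WebTransport !== 'undefined') {
    const wt = new WebTransport(urls.webTransport); // https:// endpoint
    await wt.ready;
    // (a stream of incoming streams in WebTransport's case)
    return { kind: 'webtransport', readable: wt.incomingUnidirectionalStreams };
  }
  if (typeof WebSocketStream !== 'undefined') {
    const wss = new WebSocketStream(urls.webSocket);
    const { readable } = await wss.opened;
    return { kind: 'websocketstream', readable };
  }
  // Plain WebSocket: adapt the event API into a ReadableStream.
  // No native backpressure here - this is where RTWebSocket's ACKs earn their keep.
  const ws = new WebSocket(urls.webSocket);
  ws.binaryType = 'arraybuffer';
  const readable = new ReadableStream({
    start(controller) {
      ws.onmessage = (e) => controller.enqueue(e.data);
      ws.onclose = () => controller.close();
      ws.onerror = (e) => controller.error(e);
    },
  });
  return { kind: 'websocket', readable };
}

// Usage: the streaming logic above this line never changes
const transport = await connectBestTransport({
  webTransport: 'https://stream.example.com:4445/stream',
  webSocket: 'wss://stream.example.com:4446/ws/stream',
});
console.log('negotiated transport:', transport.kind);
```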
One protocol implementation, multiple transport optimizations. The client and server negotiate the best available transport, but the streaming logic remains identical. This means:

- one codebase for flow management, acknowledgments, and prioritization
- identical streaming behavior whether the viewer is on Chrome with WebTransport or Safari with WebSocket
- transport upgrades (WebSocket → WebSocketStream → WebTransport) that never touch the application logic
RTWebSocket becomes the lingua franca of real-time streaming - the protocol that finally brings order to the chaos of browser-based video delivery.
| Protocol | Latency | Browser Support | Reality Check |
|---|---|---|---|
| MoQ (WebTransport) | 200-300ms | Chrome/Edge only | Dead on arrival without Safari |
| RTWebSocket | 1-2 seconds | All browsers | The pragmatic choice |
| WebRTC | 500ms-1s | All browsers | Complex STUN/TURN setup |
| LL-HLS | 3-5 seconds | All browsers | Better than nothing |
| HLS | 10-30 seconds | All browsers | Legacy, but reliable |
The verdict: 1-second latency that works everywhere beats 200ms that only Chrome users can see.
WebSocket streaming isn't perfect, but it works everywhere. While we continue pushing the boundaries with Media over QUIC (draft-ietf-moq-transport-05) and WebTransport (W3C WebTransport API), RTWebSocket over WebSocket provides a reliable solution that ensures no viewer is left behind. The combination of our MoQ implementation for cutting-edge browsers and RTWebSocket for universal compatibility gives us the best of both worlds.
This work represents our commitment to pragmatic streaming solutions. We're not waiting for perfect standards or universal browser support. We're building what works today while preparing for tomorrow. The RTWebSocket protocol and WebSocketStream implementation may not be standards, but they solve real problems for real users.
Our MediaMTX contributions will continue to evolve. Whether these WebSocket experiments become part of the mainline codebase or remain as reference implementations, they demonstrate our commitment to advancing open-source streaming technology.
Experience both WebSocket streaming implementations: