WebSocket Streaming vs WebTransport Analysis 2025

Why WINK Streaming Adopted RTMP-in-JavaScript for Safari & iOS

TL;DR

The Problem: WebTransport (draft-ietf-webtrans-http3-08) achieves 200ms latency but doesn't work on Safari/iOS. Traditional WebSocket has no flow control and causes buffer bloat.

The Solution: RTWebSocket (github.com/zenomt/rtwebsocket) - RTMP's flow control concepts rebuilt for JavaScript and Python. Provides per-flow management, acknowledgments, RTT measurement, and prioritization.

The Results: 1-2 second latency (vs 10-30s for HLS), 100% browser support, integrated with MediaMTX at port 4446. WebSocketStream API (WHATWG proposal) validates the approach with native backpressure.

Key Insight: RTWebSocket shouldn't be a fallback - it's the core protocol layer that works across all transports (WebSocket, WebSocketStream, QUIC, WebTransport).

Here's the story nobody tells you about browser streaming: after months of building cutting-edge MoQ implementations with sub-300ms latency, we discovered half our users couldn't use it. Safari doesn't support WebTransport. iOS doesn't either. That beautiful 200ms latency? Useless if your iPhone users see a blank screen.

That's when we discovered RTWebSocket - a brilliant protocol with implementations in both JavaScript (rtws.js) and Python (rtws.py) at github.com/zenomt/rtwebsocket. The author did something that might sound insane - they rebuilt RTMP's transport concepts for WebSocket. Yes, concepts from that 20-year-old Flash protocol everyone loves to hate. But here's the thing: RTMP solved real problems that WebSocket never addressed. RFC 6455 (see the design philosophy in Section 1.5) defines WebSocket as a thin framing layer on top of TCP, with no application-level flow control. These features - flow control, message prioritization, graceful degradation - aren't legacy, they're exactly what modern streaming needs.

RTWebSocket became our solution to this mess. It takes RTMP's best ideas - chunking, acknowledgments, per-flow management - and runs them over standard WebSocket connections. We get reliable streaming with explicit control over buffering behavior. When a slow client falls behind, we can drop old frames instead of eating all their RAM. When network congestion hits, we know exactly which frames to prioritize.

The results speak for themselves: 1-2 second latency that works on every browser, every device, no exceptions. Is it as fast as our MoQ implementation? No. But it actually works for 100% of users instead of 50%. And in production streaming, that's what matters.

To try the WebSocketStream path yourself, Chrome may need this flag enabled: chrome://flags/#enable-experimental-web-platform-features

1. The Real Story Behind RTWebSocket

Let me tell you how this actually happened. We were sitting there with our beautiful MoQ implementation, 200ms latency, feeling pretty good about ourselves. Then someone opened Safari. Nothing. Blank screen. "No problem," we thought, "we'll just use WebSocket as a fallback." How hard could it be?

Turns out, very hard. WebSocket has no flow control. None. Zero. If you send video faster than the client can consume it (which always happens), WebSocket just keeps buffering. And buffering. Until Chrome eats 8GB of RAM and the tab crashes. We tried everything - manual buffering, frame dropping, prayer. Nothing worked reliably.

Then we found RTWebSocket. The developer at zenomt had already had the same crazy idea: what if they just... implemented RTMP over WebSocket? RTMP already solved these problems 20 years ago. It has flow control through acknowledgments. It has chunking to handle large frames. It has message prioritization so keyframes get through when the network is congested. Why reinvent the wheel when Adobe already built a perfectly good one?

So that's what RTWebSocket is - RTMP's transport layer concepts rebuilt for the modern web, with both JavaScript (rtws.js) and Python (rtws.py) implementations. Each video stream gets its own flow with sequence numbers, acknowledgments, and receive window management. When buffers fill up, old frames can be explicitly abandoned. When networks congest, keyframes get prioritized. It's everything WebSocket should have been for streaming, implemented in about 2000 lines of elegant JavaScript.

Key Achievements

Metric                    | WebSocket Streaming  | Traditional HLS | Our MoQ Implementation
End-to-End Latency        | 1-2 seconds          | 10-30 seconds   | 200-300ms
Startup Time              | <1 second            | 3-5 seconds     | <500ms
Browser Support           | 100% (with fallback) | 100%            | Chrome/Edge only
Network Resilience        | Good (TCP)           | Excellent       | Moderate (UDP)
Implementation Complexity | Moderate             | Low             | High

2. Why We Went Down This Rabbit Hole

The simple answer: Safari. The longer answer: after building our beautiful MoQ implementation with sub-300ms latency, we realized we'd built something that only worked for about half our potential users. Chrome and Edge users got the full experience, but Safari users (including everyone on iOS) got nothing. WebTransport simply doesn't exist in Safari's world, and who knows when it will.

WebSocket was the obvious fallback choice. It works everywhere, uses standard ports that firewalls don't hate, and doesn't require the complexity of WebRTC's STUN/TURN dance. The challenge was making it actually good for streaming. Raw WebSocket has no flow control, no backpressure handling, and will happily eat all your RAM if the producer is faster than the consumer (which it always is in video streaming).

So we ended up building what's essentially a streaming protocol on top of WebSocket. Is it elegant? No. Does it work? Yes. And that's the trade-off we accepted - pragmatism over perfection.

3. RTWebSocket: RTMP Reborn in JavaScript

Let's talk about why RTMP was actually brilliant. Not the Flash plugin part - that was terrible. But the underlying protocol? Adobe's engineers knew what they were doing. RTMP has per-stream flow control, so audio doesn't get blocked when video buffers fill. It has explicit acknowledgments, so the server knows exactly how much data the client has received. It chunks large messages, so a 100KB keyframe doesn't block a tiny audio packet.
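To make the chunking point concrete, here is a minimal sketch of a priority-aware chunk scheduler. It is not taken from rtws.js: the `ChunkScheduler` class, the 4096-byte chunk size and the numeric priorities are all assumptions chosen for illustration. The point is only that once a large keyframe is split into chunks, a small audio packet queued later can go out before the remaining video chunks.

```javascript
// Hypothetical sketch, not the rtws.js API: names and sizes are illustrative.
const CHUNK_SIZE = 4096; // assumed chunk size in bytes

// Split one message into fixed-size chunks tagged with their flow.
function splitIntoChunks(flowId, payload, chunkSize = CHUNK_SIZE) {
    const chunks = [];
    for (let offset = 0; offset < payload.byteLength; offset += chunkSize) {
        const end = Math.min(offset + chunkSize, payload.byteLength);
        chunks.push({
            flowId,
            last: end === payload.byteLength, // marks the final chunk of a message
            data: payload.subarray(offset, end),
        });
    }
    return chunks;
}

class ChunkScheduler {
    constructor() {
        this.queues = new Map(); // priority (number) -> FIFO of pending chunks
    }

    enqueue(flowId, priority, payload) {
        if (!this.queues.has(priority)) this.queues.set(priority, []);
        this.queues.get(priority).push(...splitIntoChunks(flowId, payload));
    }

    // Return the next chunk to put on the wire: highest priority first,
    // FIFO within the same priority. Returns null when nothing is pending.
    next() {
        const priorities = [...this.queues.keys()].sort((a, b) => b - a);
        for (const p of priorities) {
            const queue = this.queues.get(p);
            if (queue.length > 0) return queue.shift();
        }
        return null;
    }
}

// Usage: a 100KB keyframe (25 chunks) is already being sent when audio arrives.
const scheduler = new ChunkScheduler();
scheduler.enqueue('video', 1, new Uint8Array(100 * 1024));
const first = scheduler.next();                      // first video chunk
scheduler.enqueue('audio', 2, new Uint8Array(300));  // audio shows up mid-keyframe
const second = scheduler.next();                     // audio goes ahead of the other 24 video chunks
```

Without chunking, the audio packet would sit behind the entire keyframe on the same connection.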

The zenomt/rtwebsocket implementation takes these concepts and provides them in both JavaScript and Python. RTWebSocket gives you independent flows within a single WebSocket connection. Each flow has its own sequence numbers, its own receive buffer, its own acknowledgment window. Here's what it looks like when we implement it:

// This is what makes RTWebSocket special
const rtws = new RTWebSocket('wss://stream.example.com');

// Open separate flows for video and audio
const videoFlow = rtws.openFlow({ type: 'video' }, rtws.PRI_NORMAL);
const audioFlow = rtws.openFlow({ type: 'audio' }, rtws.PRI_HIGH);

// Each flow manages its own buffering
videoFlow.rcvbuf = 65536;  // 64KB receive buffer
audioFlow.rcvbuf = 8192;   // 8KB for audio

// Messages arrive in order, per flow
videoFlow.onmessage = (sender, data, sequenceNumber) => {
    // If we're falling behind, old messages are automatically abandoned
    // No memory bloat, no manual buffer management
    processVideoFrame(data);
};

// Audio keeps flowing even if video buffers fill
audioFlow.onmessage = (sender, data, sequenceNumber) => {
    processAudioFrame(data);
};

See what's happening here? We're not fighting the browser's buffering. We're explicitly controlling it. When the video buffer fills (because video is always bigger and slower), audio keeps flowing. When the client falls behind, we automatically drop old frames instead of buffering forever. This is what RTMP got right, and what WebSocket never even tried to solve.
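If you want to see what "drop old frames instead of buffering forever" can look like in isolation, here is a rough sketch of a bounded frame queue. This is our illustration of the idea, not how rtws.js implements abandonment: the `BoundedFrameQueue` name, the frame shape (`seq`, `keyframe`) and the policy of resuming at the newest keyframe are all assumptions. Resuming at a keyframe matters because dropping a frame in the middle of a group of pictures would leave later frames undecodable.

```javascript
// Hypothetical sketch: a bounded queue that abandons stale video instead of growing.
class BoundedFrameQueue {
    constructor(maxFrames) {
        this.maxFrames = maxFrames;
        this.frames = [];               // pending frames: { seq, keyframe }
        this.dropped = 0;               // how many frames we abandoned
        this.waitingForKeyframe = false;
    }

    push(frame) {
        // After a flush, delta frames are useless until the next keyframe.
        if (this.waitingForKeyframe) {
            if (!frame.keyframe) {
                this.dropped++;
                return;
            }
            this.waitingForKeyframe = false;
        }

        this.frames.push(frame);
        if (this.frames.length <= this.maxFrames) return;

        // Overflow: find the newest keyframe still in the queue.
        let lastKey = -1;
        for (let i = this.frames.length - 1; i >= 0; i--) {
            if (this.frames[i].keyframe) { lastKey = i; break; }
        }

        if (lastKey > 0) {
            // Skip ahead: everything before the newest keyframe is abandoned.
            this.dropped += lastKey;
            this.frames.splice(0, lastKey);
        } else {
            // No newer keyframe to resume from: flush and wait for one.
            this.dropped += this.frames.length;
            this.frames.length = 0;
            this.waitingForKeyframe = true;
        }
    }

    shift() {
        return this.frames.length > 0 ? this.frames.shift() : null;
    }
}

// Usage: a consumer that never reads, with room for only 3 frames.
const queue = new BoundedFrameQueue(3);
const incoming = [
    { seq: 0, keyframe: true }, { seq: 1, keyframe: false }, { seq: 2, keyframe: false },
    { seq: 3, keyframe: false }, { seq: 4, keyframe: false }, { seq: 5, keyframe: true },
    { seq: 6, keyframe: false }, { seq: 7, keyframe: false }, { seq: 8, keyframe: true },
];
for (const frame of incoming) queue.push(frame);
```

Memory stays bounded by `maxFrames` no matter how far the consumer falls behind, and playback always restarts from a decodable point.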

The acknowledgment system is pure RTMP inspiration too. Every few KB of data, the client sends back an ACK telling the server "I've received up to byte N". The server uses this for flow control - if ACKs stop coming, it stops sending. No buffer bloat, no memory exhaustion, no crashed tabs. Just reliable streaming that degrades gracefully under pressure.
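The sender side of that loop can be sketched in a few lines. Treat this as an illustration under our own assumptions rather than the rtws.js internals: the `SendWindow` class, the cumulative byte-offset ACKs and the 7/8 smoothing factor for RTT are choices we made for the example. Timestamps are passed in explicitly so the logic is easy to test.

```javascript
// Hypothetical sketch of ACK-driven flow control with RTT sampling.
class SendWindow {
    constructor(windowBytes) {
        this.windowBytes = windowBytes; // max unacknowledged bytes allowed
        this.bytesSent = 0;
        this.bytesAcked = 0;
        this.inFlight = [];             // { endOffset, sentAt } per message sent
        this.smoothedRtt = null;        // milliseconds, null until first sample
    }

    // True if `size` more bytes fit in the window right now.
    canSend(size) {
        return this.bytesSent - this.bytesAcked + size <= this.windowBytes;
    }

    recordSend(size, nowMs) {
        this.bytesSent += size;
        this.inFlight.push({ endOffset: this.bytesSent, sentAt: nowMs });
    }

    // `ackedUpTo` is cumulative: "I've received up to byte N".
    onAck(ackedUpTo, nowMs) {
        this.bytesAcked = Math.max(this.bytesAcked, ackedUpTo);
        while (this.inFlight.length > 0 && this.inFlight[0].endOffset <= this.bytesAcked) {
            const { sentAt } = this.inFlight.shift();
            const sample = nowMs - sentAt;
            // Exponentially weighted average, so one slow ACK doesn't dominate.
            this.smoothedRtt = this.smoothedRtt === null
                ? sample
                : 0.875 * this.smoothedRtt + 0.125 * sample;
        }
    }
}

// Usage with a 10,000-byte window and 6,000-byte messages.
const sendWindow = new SendWindow(10000);
sendWindow.recordSend(6000, 0);
const blocked = !sendWindow.canSend(6000); // true: 12,000 bytes would exceed the window
sendWindow.onAck(6000, 40);                // ACK arrives 40ms later, window reopens
const unblocked = sendWindow.canSend(6000);
```

Because the receiver batches ACKs, the RTT samples include the batching delay, so they are best read as an upper bound on the network round trip.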

The RTWebSocket Open Source Implementation

RTWebSocket is fully open source at github.com/zenomt/rtwebsocket. The repository includes both rtws.js (JavaScript) and rtws.py (Python) implementations that wrap standard WebSocket connections (per RFC 6455). The premise is simple: WebSocket without flow control is unusable for production streaming. The solution? Take RTMP's transport concepts - the parts that actually worked - and rebuild them for the modern web.

Protocol Design

The RTWebSocket protocol uses a simple but effective message structure that mirrors RTMP's approach:

RTWebSocket Message Structure:
┌──────────────┬──────────────┬──────────────┬──────────────┐
│  Message ID  │  Stream ID   │  Timestamp   │   Payload    │
│   (4 bytes)  │  (2 bytes)   │  (4 bytes)   │  (variable)  │
└──────────────┴──────────────┴──────────────┴──────────────┘

Message Types:
- 0x01: Video Data (but do we even need fMP4?)
- 0x02: Audio Data (raw AAC works fine)
- 0x03: Metadata (codec config)
- 0x04: Control (play/pause/seek)
- 0x05: Acknowledgment
- 0x06: Window Update (flow control)
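As a sketch of how the 10-byte header in the diagram could be packed and parsed, assuming big-endian fields in the order shown. The function names are ours, and because the diagram doesn't say where the message type code travels, this example leaves it out rather than guess.

```javascript
// Hypothetical encoder/decoder for the header layout in the diagram:
// Message ID (4 bytes) | Stream ID (2 bytes) | Timestamp (4 bytes) | Payload.
// Byte order is an assumption (big-endian, as is conventional on the wire).
const RT_HEADER_SIZE = 10;

function encodeRTMessage(messageId, streamId, timestamp, payload) {
    const out = new Uint8Array(RT_HEADER_SIZE + payload.byteLength);
    const view = new DataView(out.buffer);
    view.setUint32(0, messageId, false);
    view.setUint16(4, streamId, false);
    view.setUint32(6, timestamp, false);
    out.set(payload, RT_HEADER_SIZE); // payload is a Uint8Array
    return out;
}

function decodeRTMessage(bytes) {
    if (bytes.byteLength < RT_HEADER_SIZE) {
        throw new RangeError('message shorter than the 10-byte header');
    }
    // Respect byteOffset so this also works on views into a larger buffer.
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    return {
        messageId: view.getUint32(0, false),
        streamId: view.getUint16(4, false),
        timestamp: view.getUint32(6, false),
        payload: bytes.subarray(RT_HEADER_SIZE),
    };
}

// Usage: round-trip a tiny payload.
const wireBytes = encodeRTMessage(7, 1, 90000, Uint8Array.of(1, 2, 3));
const parsed = decodeRTMessage(wireBytes);
```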
            

Key Features

1. Chunk Streaming

// Server-side chunk transmission
func (c *RTWebSocketConn) SendVideoChunk(data []byte) error {
    // Split large frames into chunks (like RTMP)
    const maxChunkSize = 4096
    
    for offset := 0; offset < len(data); {
        chunkSize := min(maxChunkSize, len(data)-offset)
        chunk := data[offset : offset+chunkSize]
        
        msg := RTMessage{
            MessageID:  c.nextMessageID(),
            StreamID:   VIDEO_STREAM_ID,
            Timestamp:  c.getCurrentTimestamp(),
            ChunkType:  determineChunkType(offset, len(data)),
            Payload:    chunk,
        }
        
        if err := c.sendMessage(msg); err != nil {
            return err
        }
        
        offset += chunkSize
    }
    
    return nil
}
            

2. Acknowledgment System

// Client-side acknowledgment
class RTWebSocketClient {
    constructor(url) {
        this.ws = new WebSocket(url);
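        // NOTE: at ~3 Mbps a 2.5MB window means an ACK only every ~7 seconds;
        // keep this well below the sender's window or the sender will stall
        this.ackWindow = 2500000; // 2.5MB window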
        this.bytesReceived = 0;
        this.lastAck = 0;
    }
    
    handleMessage(data) {
        this.bytesReceived += data.byteLength;
        
        // Send ACK every window
        if (this.bytesReceived - this.lastAck >= this.ackWindow) {
            this.sendAck(this.bytesReceived);
            this.lastAck = this.bytesReceived;
        }
        
        // Process message
        this.processRTMessage(data);
    }
    
    sendAck(bytesReceived) {
        const ackMsg = new ArrayBuffer(5);
        const view = new DataView(ackMsg);
        view.setUint8(0, 0x05); // ACK message type
        view.setUint32(1, bytesReceived, false);
        this.ws.send(ackMsg);
    }
}
            

3. Window-Based Flow Control

// Server-side flow control
type RTWebSocketConn struct {
    ws               *websocket.Conn
    sendWindow       int64
    ackReceived      int64
    bytesSent        int64
    windowUpdateChan chan int64
}

func (c *RTWebSocketConn) enforceFlowControl() error {
    // Wait if we've exceeded the window
    for c.bytesSent-c.ackReceived > c.sendWindow {
        select {
        case update := <-c.windowUpdateChan:
            c.ackReceived = update
        case <-time.After(5 * time.Second):
            return fmt.Errorf("flow control timeout")
        }
    }
    return nil
}
            

4. WebSocketStream: Google's Answer to the Same Problem

WebSocketStream: Native Backpressure at Last

WebSocketStream is a new browser API that provides native backpressure handling through the Streams API, eliminating buffer bloat and memory issues. In Chrome it started out behind a flag (newer releases may expose it by default), and it represents the future of WebSocket streaming.

Enabling WebSocketStream

Chrome Flag Required:

chrome://flags/#enable-experimental-web-platform-features

Enable "Experimental Web Platform features" and restart Chrome

Implementation Comparison

Traditional WebSocket

// Memory issues with fast producers
const ws = new WebSocket(url);
let buffer = [];

ws.onmessage = (event) => {
    // No backpressure! Buffer grows unbounded
    buffer.push(event.data);
    processWhenReady();
};

// Manual buffer management nightmare
function processWhenReady() {
    if (buffer.length > MAX_BUFFER) {
        // Drop frames? Pause? Close?
        console.warn('Buffer overflow!');
    }
}
                    

WebSocketStream (New API)

// Automatic backpressure handling!
const wss = new WebSocketStream(url);
const { readable } = await wss.opened;
const reader = readable.getReader();

// Backpressure applied automatically
while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    
    // Process at our own pace
    // Stream pauses if we're slow!
    await processFrame(value);
}
                    

Our WebSocketStream Implementation

class WebSocketStreamPlayer {
    constructor(url, videoElement) {
        this.url = url;
        this.video = videoElement;
        this.mediaSource = new MediaSource();
        this.video.src = URL.createObjectURL(this.mediaSource);
        this.sourceBuffer = null;
        this.queue = [];
        this.isProcessing = false;
    }
    
    async connect() {
        // Create WebSocketStream connection
        const wss = new WebSocketStream(this.url);
        
        // Get the readable stream
        const { readable } = await wss.opened;
        const reader = readable.getReader();
        
        // Setup MediaSource
        await this.setupMediaSource();
        
        // Start processing stream with backpressure
        await this.processStream(reader);
    }
    
    async processStream(reader) {
        try {
            while (true) {
                // Read with automatic backpressure
                const { value, done } = await reader.read();
                if (done) break;
                
                // Parse fMP4 segment
                const segment = this.parseSegment(value);
                
                // Queue for MSE processing
                this.queue.push(segment);
                
                // Await the append so the next read() waits for MSE.
                // Without the await the queue grows unbounded and the
                // stream's backpressure is bypassed
                await this.processQueue();
            }
        } catch (error) {
            console.error('Stream processing error:', error);
        }
    }
    
    async processQueue() {
        if (this.queue.length === 0 || this.isProcessing) {
            return;
        }
        
        this.isProcessing = true;
        
        while (this.queue.length > 0) {
            const segment = this.queue.shift();
            
            // Wait for source buffer to be ready
            while (this.sourceBuffer.updating) {
                await new Promise(r => setTimeout(r, 10));
            }
            
            // Manage buffer size (prevent overflow)
            if (this.sourceBuffer.buffered.length > 0) {
                const buffered = this.sourceBuffer.buffered;
                const bufferSize = buffered.end(buffered.length - 1) - this.video.currentTime;
                
                if (bufferSize > 10) { // 10 seconds max
                    // Remove old buffer to make room
                    const removeEnd = this.video.currentTime - 2;
                    if (removeEnd > buffered.start(0)) {
                        this.sourceBuffer.remove(buffered.start(0), removeEnd);
                        await new Promise(r => this.sourceBuffer.addEventListener('updateend', r, { once: true }));
                    }
                }
            }
            
            // Append segment
            try {
                this.sourceBuffer.appendBuffer(segment);
                await new Promise(r => this.sourceBuffer.addEventListener('updateend', r, { once: true }));
            } catch (error) {
                console.error('Buffer append error:', error);
                // Quota exceeded - clear buffer
                if (error.name === 'QuotaExceededError') {
                    await this.clearOldBuffer();
                }
            }
        }
        
        this.isProcessing = false;
    }
    
    async clearOldBuffer() {
        const buffered = this.sourceBuffer.buffered;
        if (buffered.length === 0) return;
        
        const currentTime = this.video.currentTime;
        const removeEnd = Math.max(buffered.start(0), currentTime - 5);
        
        if (removeEnd > buffered.start(0)) {
            this.sourceBuffer.remove(buffered.start(0), removeEnd);
            await new Promise(r => this.sourceBuffer.addEventListener('updateend', r, { once: true }));
        }
    }
}
            

5. System Architecture

Complete Pipeline

┌──────────────┐      ┌──────────────────────────────────────────┐      ┌─────────────┐
│   FFmpeg     │      │              MediaMTX + MoQ               │      │   Browser   │
│              │      │                                          │      │             │
│  H.264/AAC   │─────▶│  ┌────────────────────────────────────┐  │      │ WebSocket/  │
│   Encoder    │ RTMP │  │     Stream Processing Pipeline     │  │      │ WebSocket-  │
│              │      │  ├────────────────────────────────────┤  │      │   Stream    │
└──────────────┘      │  │ 1. RTMP Input Handler            │  │      │             │
                      │  │ 2. H.264/AAC Demuxer              │  │─────▶│     MSE     │
                      │  │ 3. fMP4 Segmenter (3 frames)      │  │  WS  │   Player    │
                      │  │ 4. RTWebSocket Protocol Layer     │  │      │             │
                      │  │ 5. WebSocket Transport            │  │      └─────────────┘
                      │  └────────────────────────────────────┘  │
                      │                                          │
                      │  Parallel Transports:                    │
                      │  - :4443 WebTransport Raw (MoQ)          │
                      │  - :4444 Native QUIC (MoQ)               │
                      │  - :4445 WebTransport fMP4 (MoQ)         │
                      │  - :4446 WebSocket fMP4 (RTWebSocket)    │
                      └──────────────────────────────────────────┘
            

Server-Side Components

1. Stream Handler

// internal/servers/websocket/handler.go
type WebSocketHandler struct {
    pathManager *core.PathManager
    parent      *core.Core
    server      *http.Server
    upgrader    websocket.Upgrader
    
    // RTWebSocket specific
    segmenter   *fmp4.Segmenter
    frameBuffer [][]byte
    framesPerSegment int // Default: 3
}

func (h *WebSocketHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // Extract stream path from URL
    pathName := strings.TrimPrefix(r.URL.Path, "/ws/")
    
    // Upgrade to WebSocket
    conn, err := h.upgrader.Upgrade(w, r, nil)
    if err != nil {
        return
    }
    
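    // Check whether the client asked for the WebSocketStream path.
    // NOTE: WebSocketStream is a browser API over the same wire protocol,
    // so this assumes our client opts in via a "websocketstream" subprotocol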
    if r.Header.Get("Sec-WebSocket-Protocol") == "websocketstream" {
        // Use WebSocketStream protocol
        session := &WebSocketStreamSession{
            conn: conn,
            handler: h,
            pathName: pathName,
        }
        go session.run()
    } else {
        // Use RTWebSocket protocol
        session := &RTWebSocketSession{
            conn: conn,
            handler: h,
            pathName: pathName,
        }
        go session.run()
    }
}
            

2. fMP4 Segmentation

// internal/servers/websocket/segmenter.go
func (s *Segmenter) CreateSegment(frames []*unit.H264) ([]byte, error) {
    samples := make([]*fmp4.Sample, 0, len(frames))
    
    for _, frame := range frames {
        // Determine frame type
        isKeyframe := false
        for _, nalu := range frame.AU {
            if len(nalu) == 0 {
                continue // skip empty NAL units instead of panicking on nalu[0]
            }
            nalType := nalu[0] & 0x1F
            if nalType == 5 { // IDR
                isKeyframe = true
                break
            }
        }
        
        // Convert to fMP4 sample
        sample := &fmp4.Sample{
            Duration:        90000 / 30, // 30fps
            Size:            uint32(len(frame.Data)),
            Flags:           0,
            CompositionTime: 0,
            Data:            frame.Data,
        }
        
        // Only non-keyframes carry the non-sync flag; IDR frames are sync samples
        if !isKeyframe {
            sample.Flags |= fmp4.SampleFlagIsNonSyncSample
        }
        
        samples = append(samples, sample)
    }
    
    // Create fMP4 segment
    segment := &fmp4.Part{
        SequenceNumber: s.sequenceNumber,
        Tracks: []*fmp4.PartTrack{{
            ID:       1, // Video track
            BaseTime: s.baseTime,
            Samples:  samples,
        }},
    }
    
    s.sequenceNumber++
    s.baseTime += uint64(len(samples)) * (90000 / 30)
    
    // Encode to bytes
    var buf bytes.Buffer
    err := segment.Marshal(&buf)
    return buf.Bytes(), err
}
            

6. The fMP4 Question: Do We Even Need It?

Let's talk about something that's been bothering us. We're wrapping H.264 NAL units in fMP4 containers, then sending them over WebSocket, then unwrapping them in the browser. But why? The browser can decode H.264 without the container: WebCodecs' VideoDecoder accepts Annex B access units directly, even though MSE itself still expects a containerized byte stream such as fMP4. We're adding complexity and latency for... tradition?

fMP4 made sense for HLS and DASH - you need seekable segments for adaptive streaming. But for real-time streaming where we're sending 3-frame chunks? It's overhead. Each fMP4 segment has moof/mdat boxes, track headers, sample tables. That's hundreds of bytes of metadata for every 3-frame segment. At 10 segments per second (3 frames each at 30fps), that adds up to several kilobytes per second, or a few hundred kilobytes of pure overhead per minute per viewer.

We tested sending raw NAL units with simple 4-byte length prefixes. Latency dropped by 50-100ms. Bandwidth usage decreased by 3-5%. CPU usage on both server and client went down. The only reason we kept fMP4 was compatibility - some MSE implementations are picky about raw H.264. But for a pure RTWebSocket implementation? Raw NAL units make more sense.
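For reference, the 4-byte length-prefix framing is simple enough to sketch in full. The function names here are ours and the big-endian prefix is an assumption (it matches the AVCC convention); the IDR check reuses the same `nalu[0] & 0x1F == 5` test as the server-side segmenter.

```javascript
// Hypothetical framing helpers: each NAL unit is preceded by a 4-byte
// big-endian length, and several units can share one WebSocket message.
function packNalUnits(nalUnits) {
    const total = nalUnits.reduce((sum, nalu) => sum + 4 + nalu.byteLength, 0);
    const out = new Uint8Array(total);
    const view = new DataView(out.buffer);
    let offset = 0;
    for (const nalu of nalUnits) {
        view.setUint32(offset, nalu.byteLength, false);
        out.set(nalu, offset + 4);
        offset += 4 + nalu.byteLength;
    }
    return out;
}

function unpackNalUnits(bytes) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const units = [];
    let offset = 0;
    while (offset < bytes.byteLength) {
        if (offset + 4 > bytes.byteLength) {
            throw new RangeError('truncated length prefix');
        }
        const length = view.getUint32(offset, false);
        offset += 4;
        if (offset + length > bytes.byteLength) {
            throw new RangeError('truncated NAL unit');
        }
        units.push(bytes.subarray(offset, offset + length)); // zero-copy view
        offset += length;
    }
    return units;
}

// NAL unit type lives in the low 5 bits of the first byte; 5 means IDR slice.
function isIdr(nalu) {
    return nalu.byteLength > 0 && (nalu[0] & 0x1f) === 5;
}

// Usage: an SPS-like unit (type 7) followed by an IDR-like unit (type 5).
const packed = packNalUnits([Uint8Array.of(0x67, 1, 2), Uint8Array.of(0x65, 9)]);
const unpacked = unpackNalUnits(packed);
```

The framing overhead is a fixed 4 bytes per NAL unit, which is where the bandwidth saving relative to per-segment moof/mdat boxes comes from.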

7. Technical Implementation Details

Critical Implementation Challenges

Challenge 1: Stream Initialization Race Condition

Similar to our MoQ implementation, MediaMTX returns a stream object before it's fully initialized:

// The same timing issue we found in MoQ!
func (s *RTWebSocketSession) setupStream() error {
    res, err := s.handler.pathManager.AddReader(PathAddReaderReq{
        Author:   s,
        PathName: s.pathName,
    })
    if err != nil {
        return err
    }
    
    // CRITICAL: Wait for stream.Desc to be populated
    for attempts := 0; attempts < 50; attempts++ {
        if res.Stream.Desc != nil {
            break
        }
        time.Sleep(100 * time.Millisecond)
    }
    
    if res.Stream.Desc == nil {
        return fmt.Errorf("stream not ready after 5 seconds")
    }
    
    // Now safe to setup codec parameters
    return s.setupCodecs(res.Stream)
}
                

Challenge 2: MSE Buffer Management

MediaSource Extensions have complex buffer requirements:

class MSEBufferManager {
    constructor(sourceBuffer, video) {
        this.sourceBuffer = sourceBuffer;
        this.video = video;
        this.maxBufferSize = 30; // seconds
        this.targetBufferSize = 10; // seconds
        this.minBufferSize = 2; // seconds
    }
    
    async manageBuffer() {
        if (!this.sourceBuffer.buffered.length) return;
        
        const buffered = this.sourceBuffer.buffered;
        const currentTime = this.video.currentTime;
        
        // Calculate buffer health
        const bufferEnd = buffered.end(buffered.length - 1);
        const bufferStart = buffered.start(0);
        const bufferAhead = bufferEnd - currentTime;
        const bufferBehind = currentTime - bufferStart;
        
        // Remove old buffer to prevent memory issues
        if (bufferBehind > this.maxBufferSize / 2) {
            const removeEnd = currentTime - this.minBufferSize;
            if (removeEnd > bufferStart) {
                await this.removeBuffer(bufferStart, removeEnd);
            }
        }
        
        // Handle quota exceeded
        if (bufferAhead > this.maxBufferSize) {
            // Clear future buffer that's too far ahead
            const removeStart = currentTime + this.targetBufferSize;
            if (removeStart < bufferEnd) {
                await this.removeBuffer(removeStart, bufferEnd);
            }
        }
    }
    
    async removeBuffer(start, end) {
        if (this.sourceBuffer.updating) {
            await new Promise(r => 
                this.sourceBuffer.addEventListener('updateend', r, { once: true })
            );
        }
        
        try {
            this.sourceBuffer.remove(start, end);
            await new Promise(r => 
                this.sourceBuffer.addEventListener('updateend', r, { once: true })
            );
        } catch (error) {
            console.error('Buffer remove error:', error);
        }
    }
}
                

Binary Protocol Implementation

// WebSocket Binary Message Format
// We use a simple TLV (Type-Length-Value) format

const MessageTypes = {
    INIT: 0x01,      // Codec initialization
    SEGMENT: 0x02,   // fMP4 segment
    KEYFRAME: 0x03,  // Keyframe notification
    AUDIO: 0x04,     // Audio segment
    METADATA: 0x05,  // Stream metadata
    CONTROL: 0x06,   // Flow control
};

class BinaryProtocol {
    static encodeMessage(type, data) {
        const header = new ArrayBuffer(5);
        const view = new DataView(header);
        
        view.setUint8(0, type);
        view.setUint32(1, data.byteLength, false); // big-endian
        
        // Combine header and data
        const message = new Uint8Array(header.byteLength + data.byteLength);
        message.set(new Uint8Array(header), 0);
        message.set(new Uint8Array(data), header.byteLength);
        
        return message.buffer;
    }
    
    static decodeMessage(buffer) {
        const view = new DataView(buffer);
        const type = view.getUint8(0);
        const length = view.getUint32(1, false);
        const data = buffer.slice(5, 5 + length);
        
        return { type, data };
    }
    
    static createInitMessage(codecString, width, height) {
        const encoder = new TextEncoder();
        const codecBytes = encoder.encode(codecString);
        
        const buffer = new ArrayBuffer(codecBytes.length + 8);
        const view = new DataView(buffer);
        
        view.setUint16(0, width, false);
        view.setUint16(2, height, false);
        view.setUint32(4, codecBytes.length, false);
        new Uint8Array(buffer).set(codecBytes, 8);
        
        return this.encodeMessage(MessageTypes.INIT, buffer);
    }
}
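The decoder above trusts the length field, and there is no counterpart to createInitMessage on the receiving side. Here is a hedged sketch of both pieces: a bounds-checked decode and a parser for the INIT payload layout used above (width, height, codec length, codec string). The function names and the result-object shape are our own choices for illustration.

```javascript
// Hypothetical receiving-side helpers for the TLV format described above.
const TLV_HEADER_SIZE = 5; // 1 byte type + 4 bytes big-endian length

// Like decodeMessage, but refuses short or truncated input instead of
// silently returning a short slice.
function decodeMessageChecked(buffer) {
    if (buffer.byteLength < TLV_HEADER_SIZE) {
        return { ok: false, reason: 'short header' };
    }
    const view = new DataView(buffer);
    const type = view.getUint8(0);
    const length = view.getUint32(1, false);
    if (TLV_HEADER_SIZE + length > buffer.byteLength) {
        return { ok: false, reason: 'truncated payload' };
    }
    return { ok: true, type, data: buffer.slice(TLV_HEADER_SIZE, TLV_HEADER_SIZE + length) };
}

// Inverse of createInitMessage's payload: u16 width, u16 height,
// u32 codec string length, then the UTF-8 codec string.
function parseInitPayload(data) {
    if (data.byteLength < 8) throw new RangeError('INIT payload too short');
    const view = new DataView(data);
    const width = view.getUint16(0, false);
    const height = view.getUint16(2, false);
    const codecLength = view.getUint32(4, false);
    if (8 + codecLength > data.byteLength) throw new RangeError('codec string truncated');
    const codec = new TextDecoder().decode(new Uint8Array(data, 8, codecLength));
    return { width, height, codec };
}

// Usage: build an INIT message by hand, then decode and parse it.
const codecBytes = new TextEncoder().encode('avc1.64001f');
const initPayload = new Uint8Array(8 + codecBytes.length);
const initView = new DataView(initPayload.buffer);
initView.setUint16(0, 1920, false);
initView.setUint16(2, 1080, false);
initView.setUint32(4, codecBytes.length, false);
initPayload.set(codecBytes, 8);

const initMessage = new Uint8Array(TLV_HEADER_SIZE + initPayload.length);
initMessage[0] = 0x01; // MessageTypes.INIT
new DataView(initMessage.buffer).setUint32(1, initPayload.length, false);
initMessage.set(initPayload, TLV_HEADER_SIZE);

const decodedInit = decodeMessageChecked(initMessage.buffer);
const initInfo = parseInitPayload(decodedInit.data);
```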
            

Timestamp Synchronization

// Critical for smooth playback
class TimestampManager {
    constructor() {
        this.baseTime = null;
        this.lastTimestamp = null;
        this.driftCorrection = 0;
        this.fps = 30;
        this.frameDuration = 1000 / this.fps;
    }
    
    processTimestamp(rtpTimestamp) {
        // Convert RTP timestamp to milliseconds
        const msTimestamp = rtpTimestamp / 90; // 90kHz clock
        
        if (this.baseTime === null) {
            this.baseTime = performance.now() - msTimestamp;
            this.lastTimestamp = msTimestamp;
            return msTimestamp;
        }
        
        // Detect and handle timestamp jumps
        const expectedTimestamp = this.lastTimestamp + this.frameDuration;
        const drift = msTimestamp - expectedTimestamp;
        
        if (Math.abs(drift) > 1000) {
            // Large jump - reset base time
            console.warn('Timestamp jump detected:', drift);
            this.baseTime = performance.now() - msTimestamp;
        } else if (Math.abs(drift) > 50) {
            // Small drift - apply correction
            this.driftCorrection += drift * 0.1; // Smooth correction
        }
        
        this.lastTimestamp = msTimestamp;
        return msTimestamp + this.driftCorrection;
    }
    
    getCurrentPlaybackTime() {
        if (this.baseTime === null) return 0;
        return performance.now() - this.baseTime;
    }
}
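One thing the manager above does not cover is wraparound. Assuming the timestamps arrive as 32-bit values on a 90kHz clock (as RTP timestamps do), they wrap roughly every 13 hours, which would look like a huge backwards jump. A minimal sketch of an unwrapper that could sit in front of processTimestamp follows; the class name and half-range heuristic are our assumptions.

```javascript
// Hypothetical helper: extend wrapping 32-bit timestamps into a
// monotonically growing value before converting to milliseconds.
const WRAP = 0x100000000;      // 2^32
const HALF_WRAP = 0x80000000;  // 2^31

class TimestampUnwrapper {
    constructor() {
        this.last = null; // last raw 32-bit timestamp seen
        this.offset = 0;  // accumulated multiples of 2^32
    }

    unwrap(ts32) {
        if (this.last !== null) {
            const delta = ts32 - this.last;
            if (delta < -HALF_WRAP) {
                // Jumped "backwards" by more than half the range: forward wrap.
                this.offset += WRAP;
            } else if (delta > HALF_WRAP) {
                // Jumped "forwards" by more than half: late packet from before the wrap.
                this.offset -= WRAP;
            }
        }
        this.last = ts32;
        return this.offset + ts32;
    }
}

// 90kHz clock: 90 ticks per millisecond.
function ticksToMs(ticks) {
    return ticks / 90;
}

// Usage: two timestamps straddling the wrap are 512 ticks apart.
const unwrapper = new TimestampUnwrapper();
const beforeWrap = unwrapper.unwrap(0xffffff00);
const afterWrap = unwrapper.unwrap(0x00000100);
const gapTicks = afterWrap - beforeWrap;
```

JavaScript numbers represent integers exactly up to 2^53, so the extended value stays precise for far longer than any realistic stream.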
            

8. Performance Analysis

Latency Breakdown

Component           | Latency     | Notes
Encoding (FFmpeg)   | 100-200ms   | x264 ultrafast preset
Frame Batching      | 100ms       | 3 frames @ 30fps
Network (WebSocket) | 10-50ms     | TCP overhead included
fMP4 Segmentation   | 5-10ms      | Server-side processing
MSE Buffering       | 500-1000ms  | Browser requirement
Decode & Render     | 16-33ms     | Hardware accelerated
Total               | 1-2 seconds | End-to-end
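As a quick sanity check on the breakdown, the itemized ranges can be summed directly. The numbers below are copied from the table; the variable names are ours. The components add up to roughly 0.7-1.4 seconds, so the rounded 1-2 second total presumably also covers jitter and other overhead that isn't itemized here.

```javascript
// Sum the per-component latency ranges from the table (milliseconds).
const latencyComponents = [
    { name: 'Encoding (FFmpeg)',   minMs: 100, maxMs: 200 },
    { name: 'Frame Batching',      minMs: 100, maxMs: 100 },
    { name: 'Network (WebSocket)', minMs: 10,  maxMs: 50 },
    { name: 'fMP4 Segmentation',   minMs: 5,   maxMs: 10 },
    { name: 'MSE Buffering',       minMs: 500, maxMs: 1000 },
    { name: 'Decode & Render',     minMs: 16,  maxMs: 33 },
];

const totalMinMs = latencyComponents.reduce((sum, c) => sum + c.minMs, 0);
const totalMaxMs = latencyComponents.reduce((sum, c) => sum + c.maxMs, 0);

// Frame batching follows from the segment size: 3 frames at 30fps.
const batchingMs = (3 * 1000) / 30;

// Which component dominates the best case?
const largest = latencyComponents.reduce((a, b) => (b.minMs > a.minMs ? b : a));

console.log(`itemized total: ${totalMinMs}-${totalMaxMs} ms, largest: ${largest.name}`);
```

MSE buffering accounts for well over half of the total in both the best and worst case, which is why it is the first place to look when chasing latency.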

Throughput Metrics

Test Configuration

  • Video: 1920x1080 @ 30fps, 2.5 Mbps H.264
  • Audio: AAC 128kbps stereo
  • Server: 8-core CPU, 16GB RAM
  • Network: Gigabit LAN

Results

  • Concurrent Streams: 100+ per server
  • CPU Usage: ~2% per stream
  • Memory: ~10MB per connection
  • Network: ~3 Mbps per stream

WebSocketStream vs Traditional WebSocket

Performance Comparison (1080p @ 30fps stream):

Traditional WebSocket:
┌────────────────────────────────────────┐
│ Memory Usage Over Time                 │
│                                        │
│ 500MB │                      ████████  │ ← Buffer bloat!
│ 400MB │                 █████          │
│ 300MB │            █████               │
│ 200MB │       █████                    │
│ 100MB │  █████                         │
│   0MB └────────────────────────────────┘
│       0    30s   60s   90s   120s      │
└────────────────────────────────────────┘

WebSocketStream:
┌────────────────────────────────────────┐
│ Memory Usage Over Time                 │
│                                        │
│ 500MB │                                │
│ 400MB │                                │
│ 300MB │                                │
│ 200MB │                                │
│ 100MB │ ██████████████████████████████ │ ← Stable!
│   0MB └────────────────────────────────┘
│       0    30s   60s   90s   120s      │
└────────────────────────────────────────┘
            

9. Challenges & Solutions

Major Challenges Encountered

  1. MSE Complexity: Browser MSE APIs are notoriously difficult
  2. Buffer Management: Preventing overflow while maintaining smooth playback
  3. Timestamp Drift: RTP to MSE timestamp conversion issues
  4. Browser Differences: Chrome vs Firefox vs Safari behaviors
  5. Frame Drops: Handling network congestion gracefully

Solution: Adaptive Buffer Management

class AdaptiveBufferController {
    constructor(player) {
        this.player = player;
        this.targetLatency = 2.0; // seconds
        this.minLatency = 1.0;
        this.maxLatency = 5.0;
        this.adjustmentRate = 0.1;
    }
    
    updatePlaybackRate() {
        const buffered = this.player.video.buffered;
        if (buffered.length === 0) return;
        
        const currentTime = this.player.video.currentTime;
        const bufferEnd = buffered.end(buffered.length - 1);
        const latency = bufferEnd - currentTime;
        
        // Adjust playback rate to maintain target latency
        if (latency > this.targetLatency + 0.5) {
            // Speed up to reduce latency
            this.player.video.playbackRate = Math.min(1.5, 1 + this.adjustmentRate);
        } else if (latency < this.targetLatency - 0.5) {
            // Slow down to build buffer
            this.player.video.playbackRate = Math.max(0.5, 1 - this.adjustmentRate);
        } else {
            // Return to normal
            this.player.video.playbackRate = 1.0;
        }
        
        // Emergency measures
        if (latency > this.maxLatency) {
            // Jump forward to reduce latency
            this.player.video.currentTime = bufferEnd - this.targetLatency;
        } else if (latency >= this.minLatency && this.player.video.paused) {
            // Resume once we have enough buffer
            // play() returns a promise that can reject (e.g. autoplay policy)
            this.player.video.play().catch(() => {});
        }
    }
}
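The decision logic is easier to reason about (and test) when it is separated from the video element. Here is a sketch of the same thresholds as a pure function. The `decidePlayback` name and the returned object shape are our own, and unlike the class above this variant checks the emergency jump first so it never speeds up and seeks in the same tick.

```javascript
// Hypothetical pure-function variant of the controller's decision logic.
const DEFAULT_LATENCY_CONFIG = {
    targetLatency: 2.0,  // seconds
    maxLatency: 5.0,     // beyond this, jump instead of speeding up
    tolerance: 0.5,      // dead band around the target
    adjustmentRate: 0.1, // playback rate delta
};

// latency: seconds between the playhead and the end of the buffer.
// Returns the playback rate to apply and, if a jump is needed, the
// latency to land on (null when no jump is needed).
function decidePlayback(latency, config = DEFAULT_LATENCY_CONFIG) {
    if (latency > config.maxLatency) {
        return { playbackRate: 1.0, jumpToLatency: config.targetLatency };
    }
    if (latency > config.targetLatency + config.tolerance) {
        return { playbackRate: 1 + config.adjustmentRate, jumpToLatency: null };
    }
    if (latency < config.targetLatency - config.tolerance) {
        return { playbackRate: 1 - config.adjustmentRate, jumpToLatency: null };
    }
    return { playbackRate: 1.0, jumpToLatency: null };
}

// Applying a decision to anything shaped like a video element.
function applyDecision(video, bufferEnd, decision) {
    video.playbackRate = decision.playbackRate;
    if (decision.jumpToLatency !== null) {
        video.currentTime = bufferEnd - decision.jumpToLatency;
    }
}

// Usage with a plain object standing in for the <video> element.
const fakeVideo = { playbackRate: 1.0, currentTime: 10.0 };
applyDecision(fakeVideo, 16.0, decidePlayback(16.0 - fakeVideo.currentTime));
```

In the usage example the playhead is 6 seconds behind the buffer end, so the function asks for a jump and the stand-in element lands 2 seconds behind live.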
            

Solution: Graceful Degradation

// Fallback chain for maximum compatibility
class StreamingClient {
    async connect(url) {
        // Try WebSocketStream first (native backpressure)
        if (typeof WebSocketStream !== 'undefined') {
            try {
                return await this.connectWebSocketStream(url);
            } catch (e) {
                console.warn('WebSocketStream failed, trying RTWebSocket:', e);
            }
        }

        // Both remaining options need the classic WebSocket API
        if (typeof WebSocket === 'undefined') {
            throw new Error('No WebSocket support in this environment');
        }

        // Try RTWebSocket (flow control layered over a plain WebSocket)
        try {
            return await this.connectRTWebSocket(url);
        } catch (e) {
            console.warn('RTWebSocket failed, trying standard WebSocket:', e);
        }

        // Fall back to standard WebSocket (no application-level flow control)
        try {
            return await this.connectStandardWebSocket(url);
        } catch (e) {
            console.error('All WebSocket methods failed:', e);
            throw e;
        }
    }
}
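
The fallback chain only works if each connect method rejects when its transport cannot be established, because the browser WebSocket constructor returns before the handshake completes. A minimal sketch of such a wrapper follows. openWebSocket, its timeoutMs option, and the injectable WebSocketImpl parameter are hypothetical names introduced here for illustration and testing.

```javascript
// Hypothetical helper: resolves with an open socket, rejects on a failed
// handshake or after timeoutMs so the caller can try the next transport.
function openWebSocket(url, {
    protocols = [],
    timeoutMs = 5000,
    WebSocketImpl = globalThis.WebSocket // injectable for tests
} = {}) {
    return new Promise((resolve, reject) => {
        const ws = new WebSocketImpl(url, protocols);
        ws.binaryType = 'arraybuffer'; // media payloads are binary

        const timer = setTimeout(() => {
            ws.close();
            reject(new Error(`Timed out connecting to ${url}`));
        }, timeoutMs);

        ws.onopen = () => {
            clearTimeout(timer);
            resolve(ws);
        };
        ws.onerror = () => {
            clearTimeout(timer);
            reject(new Error(`WebSocket connection to ${url} failed`));
        };
    });
}
```

A connectStandardWebSocket method could then be as small as `return openWebSocket(url)`, and a failed handshake would surface as the rejection the fallback chain expects.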
            

10. MediaMTX Integration Path

RTWebSocket as the Universal Transport Layer

Here's the key insight: RTWebSocket shouldn't be viewed as a fallback for when MoQ fails. It's the core protocol layer that makes ALL our transports work reliably for streaming. Every transport - whether it's WebSocket, QUIC, or WebTransport - needs the same application-level semantics: flow separation, prioritization, acknowledgments, graceful degradation.

Our unified streaming architecture with RTWebSocket at its core:

  • RTWebSocket over WebSocket (Port 4446): Universal browser support today
  • RTWebSocket over WebSocketStream: Native backpressure plus protocol control
  • RTWebSocket concepts in MoQ (Ports 4443-4445): Same flow control for QUIC transports
  • Future: RTWebSocket over WebTransport: Best of all worlds when Safari catches up

This isn't multiple protocols - it's one protocol (RTWebSocket) with transport-specific optimizations. The streaming logic stays the same whether you're on Chrome with WebTransport or Safari with WebSocket. That's the power of treating RTWebSocket as the core, not the fallback.
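
To make the flow separation and prioritization idea concrete, here is a transport-agnostic sketch of a send queue with per-flow priorities and stale-message dropping. It does not reproduce RTWebSocket's wire format or the rtws.js API. PrioritySendQueue, its method names, and the maxAgeMs default are assumptions for illustration only.

```javascript
// Hypothetical sketch: per-flow queues drained strictly by priority, with
// messages older than maxAgeMs dropped instead of being sent late.
class PrioritySendQueue {
    constructor({ maxAgeMs = 1000 } = {}) {
        this.maxAgeMs = maxAgeMs;
        this.flows = new Map(); // flowId -> { priority, messages: [] }
    }

    openFlow(flowId, priority) {
        this.flows.set(flowId, { priority, messages: [] });
    }

    enqueue(flowId, payload, nowMs) {
        const flow = this.flows.get(flowId);
        if (!flow) throw new Error(`Unknown flow: ${flowId}`);
        flow.messages.push({ payload, enqueuedAt: nowMs });
    }

    // Sends what fits in budgetBytes via send(payload), highest priority
    // first. Stops at the first message that does not fit, so a lower
    // priority flow never overtakes a higher one.
    drain(send, budgetBytes, nowMs) {
        const ordered = [...this.flows.values()]
            .sort((a, b) => b.priority - a.priority);
        let remaining = budgetBytes;
        let dropped = 0;
        for (const flow of ordered) {
            while (flow.messages.length > 0) {
                const next = flow.messages[0];
                if (nowMs - next.enqueuedAt > this.maxAgeMs) {
                    flow.messages.shift(); // too old to be useful
                    dropped++;
                    continue;
                }
                if (next.payload.byteLength > remaining) {
                    return { dropped, remaining };
                }
                flow.messages.shift();
                send(next.payload);
                remaining -= next.payload.byteLength;
            }
        }
        return { dropped, remaining };
    }
}
```

The same queue could sit in front of any transport because it only needs a send callback and a byte budget. Over a plain WebSocket the budget could be derived from `bufferedAmount`. A real implementation would also need to drop video up to the next keyframe rather than individual frames, which this sketch does not attempt.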

Integration Considerations for Mainline MediaMTX

1. Separate Buffering Server Requirement

// WebSocket streaming may need a dedicated buffer server
// due to different requirements than MoQ

type BufferServer struct {
    // WebSocket needs longer buffers for MSE
    videoRingBuffer *RingBuffer // 5-10 seconds
    audioRingBuffer *RingBuffer // 5-10 seconds
    
    // MoQ needs minimal buffering
    moqVideoBuffer *RingBuffer // 100-200ms
    moqAudioBuffer *RingBuffer // 100-200ms
}

// Potential architecture:
// MediaMTX Core -> Buffer Server -> WebSocket Clients
//              \-> MoQ Server -> MoQ Clients
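
The two buffer depths differ only in how much media time they retain. A minimal sketch of a time-bounded buffer follows, written in JavaScript to match the client examples even though the server side would be Go. TimedBuffer and its frame shape ({ ptsMs, data }) are hypothetical, and it uses a plain array rather than a true ring for brevity.

```javascript
// Hypothetical sketch: keeps only frames within windowMs of the newest one.
class TimedBuffer {
    constructor(windowMs) {
        this.windowMs = windowMs;
        this.frames = []; // ordered by presentation time
    }

    // frame: { ptsMs: number, data: Uint8Array }
    push(frame) {
        this.frames.push(frame);
        const cutoff = frame.ptsMs - this.windowMs;
        let stale = 0;
        while (stale < this.frames.length && this.frames[stale].ptsMs < cutoff) {
            stale++;
        }
        if (stale > 0) this.frames.splice(0, stale);
    }

    durationMs() {
        if (this.frames.length === 0) return 0;
        return this.frames[this.frames.length - 1].ptsMs - this.frames[0].ptsMs;
    }
}

// Usage: a 5 s window for the MSE path and a 200 ms window for MoQ.
const mseBuffer = new TimedBuffer(5000);
const moqBuffer = new TimedBuffer(200);
for (let pts = 0; pts <= 7000; pts += 100) {
    const frame = { ptsMs: pts, data: new Uint8Array(0) };
    mseBuffer.push(frame);
    moqBuffer.push(frame);
}
console.log(mseBuffer.durationMs(), moqBuffer.durationMs());
```

A production version would likely evict on keyframe boundaries so that a late-joining client can always start decoding from the oldest retained frame.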
            

2. Configuration Proposal

# mediamtx.yml additions for WebSocket support
webSocket: yes
webSocketAddress: :4446
webSocketProtocol: rtwebsocket  # or 'standard' or 'stream'

# Buffer management
webSocketBufferSize: 5s         # MSE requirement
webSocketSegmentSize: 3         # frames per segment
webSocketMaxConnections: 100

# Integration with existing MoQ
moq: yes
moqFallbackToWebSocket: yes     # Auto-fallback
moqUnifiedPort: 443             # Single port with protocol detection
            

3. Unified Transport Layer

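// Proposed unified transport detection (requires "net/http" and "strings")
func (s *Server) handleConnection(w http.ResponseWriter, r *http.Request) {
    switch {
    case r.Header.Get("Sec-WebSocket-Version") != "":
        // WebSocket handshake. Sec-WebSocket-Protocol is a comma-separated
        // list, so check each offered subprotocol rather than the raw value.
        for _, p := range strings.Split(r.Header.Get("Sec-WebSocket-Protocol"), ",") {
            if strings.TrimSpace(p) == "rtwebsocket" {
                s.handleRTWebSocket(w, r)
                return
            }
        }
        s.handleStandardWebSocket(w, r)
    case r.ProtoMajor == 3 && r.Method == http.MethodConnect:
        // HTTP/3 extended CONNECT - could be WebTransport. Sessions are
        // requested with the ":protocol" pseudo-header set to "webtransport",
        // not with a Sec-WebTransport header. How that pseudo-header is
        // exposed depends on the HTTP/3 library, so the handler is assumed
        // to verify it before accepting the session.
        s.handleWebTransport(w, r)
    default:
        // Regular HTTP (including plain HTTP/3 requests) - serve player page
        s.servePlayerHTML(w, r)
    }
}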
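
On the client side, a subprotocol is requested through the second argument of the WebSocket constructor, and the server's choice is readable from `ws.protocol` after the connection opens. Because a client may offer several subprotocols in one comma-separated header, matching should be done token by token. A small sketch of that check, where offersSubprotocol is a hypothetical helper name:

```javascript
// Hypothetical helper: true if the Sec-WebSocket-Protocol header value
// contains the given subprotocol as one of its comma-separated tokens.
function offersSubprotocol(headerValue, name) {
    if (!headerValue) return false;
    return headerValue
        .split(',')
        .map((token) => token.trim())
        .includes(name);
}

// Usage
console.log(offersSubprotocol('rtwebsocket', 'rtwebsocket'));
console.log(offersSubprotocol('chat, rtwebsocket', 'rtwebsocket'));
console.log(offersSubprotocol('rtwebsocket2', 'rtwebsocket'));
```

In a browser the matching request would look like `new WebSocket(url, ['rtwebsocket'])`. The subprotocol name "rtwebsocket" is taken from the proposal above and is not confirmed against the rtwebsocket library.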
            

11. The Unified Streaming Protocol Vision

Planned Improvements

  • Adaptive Bitrate Streaming (ABR): Multiple quality levels
  • Enhanced Audio Support: Opus codec, spatial audio
  • DVR Functionality: Seek and replay support
  • Encrypted Media Extensions (EME): DRM support
  • WebCodecs Integration: Bypass MSE for lower latency

WebTransport over WebSocket Tunneling

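// Future: Tunnel WebTransport through WebSocket for firewall traversal
class WebTransportTunnel {
    constructor(websocketUrl) {
        this.ws = new WebSocket(websocketUrl);
        this.streams = new Map();
        // Resolves once the socket is open; send() throws before that point
        this.opened = new Promise((resolve, reject) => {
            this.ws.onopen = () => resolve();
            this.ws.onerror = () => reject(new Error('WebSocket connection failed'));
        });
    }

    async createWebTransport() {
        // Negotiate CONNECT-UDP through WebSocket
        await this.negotiateTunnel();

        // Create virtual WebTransport over WebSocket
        // (VirtualWebTransport is a placeholder; it is not implemented here)
        return new VirtualWebTransport(this.ws);
    }

    async negotiateTunnel() {
        await this.opened;

        return new Promise((resolve, reject) => {
            // Register handlers before sending so the reply cannot be missed
            this.ws.onmessage = (event) => {
                try {
                    const response = JSON.parse(event.data);
                    if (response.status === 'accepted') {
                        resolve();
                    } else {
                        reject(new Error(response.error || 'Tunnel request rejected'));
                    }
                } catch (e) {
                    reject(e);
                }
            };
            this.ws.onclose = () => reject(new Error('WebSocket closed during negotiation'));

            // Send CONNECT-UDP request
            this.ws.send(JSON.stringify({
                method: 'CONNECT-UDP',
                protocol: 'webtransport',
                version: '1'
            }));
        });
    }
}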
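
The `streams` map above implies that several logical streams share one WebSocket. One way to do that is to prefix every binary message with a stream identifier. The framing below is purely hypothetical (a 4-byte big-endian stream ID followed by the payload) and is not taken from any WebTransport or RTWebSocket specification.

```javascript
// Hypothetical framing: [4-byte big-endian stream ID][payload bytes].
// WebSocket preserves message boundaries, so no length field is needed.
const HEADER_BYTES = 4;

function encodeFrame(streamId, payload) {
    const frame = new Uint8Array(HEADER_BYTES + payload.byteLength);
    new DataView(frame.buffer).setUint32(0, streamId); // big-endian by default
    frame.set(payload, HEADER_BYTES);
    return frame;
}

function decodeFrame(frame) {
    if (frame.byteLength < HEADER_BYTES) {
        throw new Error('Frame too short');
    }
    const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
    return {
        streamId: view.getUint32(0),
        payload: frame.subarray(HEADER_BYTES) // a view, not a copy
    };
}

// Usage: round-trip a small payload on stream 7.
const encoded = encodeFrame(7, new Uint8Array([1, 2, 3]));
const decoded = decodeFrame(encoded);
console.log(decoded.streamId, Array.from(decoded.payload));
```

On receipt, the tunnel would look up `decoded.streamId` in its `streams` map and hand the payload to the matching virtual stream.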
            

Unified Streaming Protocol (USP)

RTWebSocket at the Core

Here's our vision: RTWebSocket isn't just another transport option. It's the core protocol that unifies ALL our streaming transports. Think of it as the application layer that provides consistent streaming semantics regardless of the underlying transport:

  1. RTWebSocket Core Protocol - Flow control, prioritization, acknowledgments
  2. Over QUIC/WebTransport - Leverage UDP for lowest latency when available
  3. Over WebSocketStream - Native backpressure with protocol control
  4. Over WebSocket - Universal compatibility with manual flow control

One protocol implementation, multiple transport optimizations. The client and server negotiate the best available transport, but the streaming logic remains identical. This means:

  • Single codebase for all transports
  • Consistent behavior across different browsers
  • Automatic fallback without protocol changes
  • Future transport technologies just plug in

RTWebSocket becomes the lingua franca of real-time streaming - the protocol that finally brings order to the chaos of browser-based video delivery.

12. Conclusions

What We Achieved

  • 1-2 second latency with WebSocket (5-10x better than HLS)
  • Universal browser compatibility including Safari
  • Automatic backpressure with WebSocketStream API
  • Reliable streaming with RTWebSocket protocol
  • Integration path for MediaMTX mainline

Honest Assessment

  • WebSocket can't match MoQ's sub-300ms latency
  • MSE adds unavoidable buffering delay
  • Safari's WebTransport stance makes MoQ adoption uncertain
  • Apple's 30% market share can't be ignored

Streaming Protocol Latency Comparison (2025)

Protocol           | Latency       | Browser Support  | Reality Check
MoQ (WebTransport) | 200-300ms     | Chrome/Edge only | Dead on arrival without Safari
RTWebSocket        | 1-2 seconds   | All browsers     | The pragmatic choice
WebRTC             | 500ms-1s      | All browsers     | Complex STUN/TURN setup
LL-HLS             | 3-5 seconds   | All browsers     | Better than nothing
HLS                | 10-30 seconds | All browsers     | Legacy, but reliable

The verdict: 1-2 second latency that works everywhere beats 200ms that only Chrome and Edge users can see.

The Pragmatic Reality

WebSocket streaming isn't perfect, but it works everywhere. While we continue pushing the boundaries with Media over QUIC (draft-ietf-moq-transport-05) and WebTransport (W3C WebTransport API), RTWebSocket over WebSocket provides a reliable solution that ensures no viewer is left behind. The combination of our MoQ implementation for cutting-edge browsers and RTWebSocket for universal compatibility gives us the best of both worlds.

Final Thoughts

This work represents our commitment to pragmatic streaming solutions. We're not waiting for perfect standards or universal browser support. We're building what works today while preparing for tomorrow. The RTWebSocket protocol and WebSocketStream implementation may not be standards, but they solve real problems for real users.

Our MediaMTX contributions will continue to evolve. Whether these WebSocket experiments become part of the mainline codebase or remain as reference implementations, they demonstrate our commitment to advancing open-source streaming technology.

Try It Yourself

Experience both WebSocket streaming implementations: