The Problem: WebTransport (draft-ietf-webtrans-http3-08) achieves 200ms latency but doesn't work on Safari/iOS. Traditional WebSocket has no flow control and causes buffer bloat.
The Solution: RTWebSocket (github.com/zenomt/rtwebsocket) - RTMP's flow control concepts rebuilt for JavaScript and Python. Provides per-flow management, acknowledgments, RTT measurement, and prioritization.
The Results: 1-2 second latency (vs 10-30s for HLS), 100% browser support, integrated with MediaMTX at port 4446. WebSocketStream API (WHATWG proposal) validates the approach with native backpressure.
Key Insight: RTWebSocket shouldn't be a fallback - it's the core protocol layer that works across all transports (WebSocket, WebSocketStream, QUIC, WebTransport).
Here's the story nobody tells you about browser streaming: after months of building cutting-edge MoQ implementations with sub-300ms latency, we discovered half our users couldn't use it. Safari doesn't support WebTransport. iOS doesn't either. That beautiful 200ms latency? Useless if your iPhone users see a blank screen.
That's when we discovered RTWebSocket - a brilliant protocol with implementations in both JavaScript (rtws.js) and Python (rtws.py) at github.com/zenomt/rtwebsocket. The author did something that might sound insane - they rebuilt RTMP's transport concepts for WebSocket. Yes, concepts from that 20-year-old Flash protocol everyone loves to hate. But here's the thing: RTMP solved real problems that WebSocket never addressed. As RFC 6455 Section 1.5 explicitly states, WebSocket provides no flow control. These features - flow control, message prioritization, graceful degradation - aren't legacy, they're exactly what modern streaming needs.
RTWebSocket became our solution to this mess. It takes RTMP's best ideas - chunking, acknowledgments, per-flow management - and runs them over standard WebSocket connections. We get reliable streaming with explicit control over buffering behavior. When a slow client falls behind, we can drop old frames instead of eating all their RAM. When network congestion hits, we know exactly which frames to prioritize.
The results speak for themselves: 1-2 second latency that works on every browser, every device, no exceptions. Is it as fast as our MoQ implementation? No. But it actually works for 100% of users instead of 50%. And in production streaming, that's what matters.
Try it yourself:
Chrome flags required: chrome://flags/#enable-experimental-web-platform-features
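Once the flag is on, a quick sanity check looks something like this in the DevTools console - the URL and stream path below are placeholders for your own MediaMTX instance on port 4446, not a live demo endpoint:

```javascript
// Placeholder URL - point this at your own MediaMTX WebSocket endpoint
const url = 'wss://localhost:4446/ws/mystream';

if (typeof WebSocketStream !== 'undefined') {
  // Chrome with the experimental flag: native backpressure
  const wss = new WebSocketStream(url);
  const { readable } = await wss.opened;
  const { value } = await readable.getReader().read();
  console.log('first message:', value);
} else {
  // Every other browser: plain WebSocket still connects
  const ws = new WebSocket(url);
  ws.binaryType = 'arraybuffer';
  ws.onmessage = (e) => console.log('first message:', e.data.byteLength, 'bytes');
}
```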
Let me tell you how this actually happened. We were sitting there with our beautiful MoQ implementation, 200ms latency, feeling pretty good about ourselves. Then someone opened Safari. Nothing. Blank screen. "No problem," we thought, "we'll just use WebSocket as a fallback." How hard could it be?
Turns out, very hard. WebSocket has no flow control. None. Zero. If you send video faster than the client can consume it (which always happens), WebSocket just keeps buffering. And buffering. Until Chrome eats 8GB of RAM and the tab crashes. We tried everything - manual buffering, frame dropping, prayer. Nothing worked reliably.
Then we found RTWebSocket. The developer behind zenomt/rtwebsocket had beaten us to the same crazy idea: what if you just... implemented RTMP over WebSocket? RTMP already solved these problems 20 years ago. It has flow control through acknowledgments. It has chunking to handle large frames. It has message prioritization so keyframes get through when the network is congested. Why reinvent the wheel when Adobe already built a perfectly good one?
So that's what RTWebSocket is - RTMP's transport layer concepts rebuilt for the modern web, with both JavaScript (rtws.js) and Python (rtws.py) implementations. Each video stream gets its own flow with sequence numbers, acknowledgments, and receive window management. When buffers fill up, old frames can be explicitly abandoned. When networks congest, keyframes get prioritized. It's everything WebSocket should have been for streaming, implemented in about 2000 lines of elegant JavaScript.
| Metric | WebSocket Streaming | Traditional HLS | Our MoQ Implementation |
|---|---|---|---|
| End-to-End Latency | 1-2 seconds | 10-30 seconds | 200-300ms |
| Startup Time | <1 second | 3-5 seconds | <500ms |
| Browser Support | 100% (with fallback) | 100% | Chrome/Edge only |
| Network Resilience | Good (TCP) | Excellent | Moderate (UDP) |
| Implementation Complexity | Moderate | Low | High |
The simple answer: Safari. The longer answer: after building our beautiful MoQ implementation with sub-300ms latency, we realized we'd built something that only worked for about half our potential users. Chrome and Edge users got the full experience, but Safari users (including everyone on iOS) got nothing. WebTransport simply doesn't exist in Safari's world, and who knows when it will.
WebSocket was the obvious fallback choice. It works everywhere, uses standard ports that firewalls don't hate, and doesn't require the complexity of WebRTC's STUN/TURN dance. The challenge was making it actually good for streaming. Raw WebSocket has no flow control, no backpressure handling, and will happily eat all your RAM if the producer is faster than the consumer (which it always is in video streaming).
So we ended up building what's essentially a streaming protocol on top of WebSocket. Is it elegant? No. Does it work? Yes. And that's the trade-off we accepted - pragmatism over perfection.
Let's talk about why RTMP was actually brilliant. Not the Flash plugin part - that was terrible. But the underlying protocol? Adobe's engineers knew what they were doing. RTMP has per-stream flow control, so audio doesn't get blocked when video buffers fill. It has explicit acknowledgments, so the server knows exactly how much data the client has received. It chunks large messages, so a 100KB keyframe doesn't block a tiny audio packet.
The zenomt/rtwebsocket implementation takes these concepts and provides them in both JavaScript and Python. RTWebSocket gives you independent flows within a single WebSocket connection. Each flow has its own sequence numbers, its own receive buffer, its own acknowledgment window. Here's what it looks like when we implement it:
```javascript
// This is what makes RTWebSocket special
const rtws = new RTWebSocket('wss://stream.example.com');

// Open separate flows for video and audio
const videoFlow = rtws.openFlow({ type: 'video' }, rtws.PRI_NORMAL);
const audioFlow = rtws.openFlow({ type: 'audio' }, rtws.PRI_HIGH);

// Each flow manages its own buffering
videoFlow.rcvbuf = 65536; // 64KB receive buffer
audioFlow.rcvbuf = 8192;  // 8KB for audio

// Messages arrive in order, per flow
videoFlow.onmessage = (sender, data, sequenceNumber) => {
  // If we're falling behind, old messages are automatically abandoned
  // No memory bloat, no manual buffer management
  processVideoFrame(data);
};

// Audio keeps flowing even if video buffers fill
audioFlow.onmessage = (sender, data, sequenceNumber) => {
  processAudioFrame(data);
};
```
See what's happening here? We're not fighting the browser's buffering. We're explicitly controlling it. When the video buffer fills (because video is always bigger and slower), audio keeps flowing. When the client falls behind, we automatically drop old frames instead of buffering forever. This is what RTMP got right, and what WebSocket never even tried to solve.
The acknowledgment system is pure RTMP inspiration too. Every few KB of data, the client sends back an ACK telling the server "I've received up to byte N". The server uses this for flow control - if ACKs stop coming, it stops sending. No buffer bloat, no memory exhaustion, no crashed tabs. Just reliable streaming that degrades gracefully under pressure.
RTWebSocket is fully open source at github.com/zenomt/rtwebsocket. The repository includes both rtws.js (JavaScript) and rtws.py (Python) implementations that wrap standard WebSocket connections (per RFC 6455). The premise: WebSocket without flow control is unusable for production streaming. The solution? Take RTMP's transport concepts - the parts that actually worked - and rebuild them for the modern web.
The RTWebSocket protocol uses a simple but effective message structure that mirrors RTMP's approach:
```
RTWebSocket Message Structure:
┌──────────────┬──────────────┬──────────────┬──────────────┐
│  Message ID  │  Stream ID   │  Timestamp   │   Payload    │
│  (4 bytes)   │  (2 bytes)   │  (4 bytes)   │  (variable)  │
└──────────────┴──────────────┴──────────────┴──────────────┘

Message Types:
- 0x01: Video Data (but do we even need fMP4?)
- 0x02: Audio Data (raw AAC works fine)
- 0x03: Metadata (codec config)
- 0x04: Control (play/pause/seek)
- 0x05: Acknowledgment
- 0x06: Window Update (flow control)
```
```go
// Server-side chunk transmission
func (c *RTWebSocketConn) SendVideoChunk(data []byte) error {
	// Split large frames into chunks (like RTMP)
	const maxChunkSize = 4096

	for offset := 0; offset < len(data); {
		chunkSize := min(maxChunkSize, len(data)-offset)
		chunk := data[offset : offset+chunkSize]

		msg := RTMessage{
			MessageID: c.nextMessageID(),
			StreamID:  VIDEO_STREAM_ID,
			Timestamp: c.getCurrentTimestamp(),
			ChunkType: determineChunkType(offset, len(data)),
			Payload:   chunk,
		}

		if err := c.sendMessage(msg); err != nil {
			return err
		}
		offset += chunkSize
	}
	return nil
}
```
```javascript
// Client-side acknowledgment
class RTWebSocketClient {
  constructor(url) {
    this.ws = new WebSocket(url);
    this.ackWindow = 2500000; // 2.5MB window
    this.bytesReceived = 0;
    this.lastAck = 0;
  }

  handleMessage(data) {
    this.bytesReceived += data.byteLength;

    // Send ACK every window
    if (this.bytesReceived - this.lastAck >= this.ackWindow) {
      this.sendAck(this.bytesReceived);
      this.lastAck = this.bytesReceived;
    }

    // Process message
    this.processRTMessage(data);
  }

  sendAck(bytesReceived) {
    const ackMsg = new ArrayBuffer(5);
    const view = new DataView(ackMsg);
    view.setUint8(0, 0x05); // ACK message type
    view.setUint32(1, bytesReceived, false); // big-endian
    this.ws.send(ackMsg);
  }
}
```
```go
// Server-side flow control
type RTWebSocketConn struct {
	ws               *websocket.Conn
	sendWindow       int64
	ackReceived      int64
	bytesSent        int64
	windowUpdateChan chan int64
}

func (c *RTWebSocketConn) enforceFlowControl() error {
	// Wait if we've exceeded the window
	for c.bytesSent-c.ackReceived > c.sendWindow {
		select {
		case update := <-c.windowUpdateChan:
			c.ackReceived = update
		case <-time.After(5 * time.Second):
			return fmt.Errorf("flow control timeout")
		}
	}
	return nil
}
```
WebSocketStream is a new browser API that provides native backpressure handling, eliminating buffer bloat and memory issues. It's currently behind a flag in Chrome but represents the future of WebSocket streaming.
Chrome Flag Required:
chrome://flags/#enable-experimental-web-platform-features
Enable "Experimental Web Platform features" and restart Chrome
```javascript
// Memory issues with fast producers
const ws = new WebSocket(url);
let buffer = [];

ws.onmessage = (event) => {
  // No backpressure! Buffer grows unbounded
  buffer.push(event.data);
  processWhenReady();
};

// Manual buffer management nightmare
function processWhenReady() {
  if (buffer.length > MAX_BUFFER) {
    // Drop frames? Pause? Close?
    console.warn('Buffer overflow!');
  }
}
```
```javascript
// Automatic backpressure handling!
const wss = new WebSocketStream(url);
const { readable } = await wss.opened;
const reader = readable.getReader();

// Backpressure applied automatically
while (true) {
  const { value, done } = await reader.read();
  if (done) break;

  // Process at our own pace
  // Stream pauses if we're slow!
  await processFrame(value);
}
```
```javascript
class WebSocketStreamPlayer {
  constructor(url, videoElement) {
    this.url = url;
    this.video = videoElement;
    this.mediaSource = new MediaSource();
    this.video.src = URL.createObjectURL(this.mediaSource);
    this.sourceBuffer = null;
    this.queue = [];
    this.isProcessing = false;
  }

  async connect() {
    // Create WebSocketStream connection
    const wss = new WebSocketStream(this.url);

    // Get the readable stream
    const { readable } = await wss.opened;
    const reader = readable.getReader();

    // Setup MediaSource
    await this.setupMediaSource();

    // Start processing stream with backpressure
    await this.processStream(reader);
  }

  async processStream(reader) {
    try {
      while (true) {
        // Read with automatic backpressure
        const { value, done } = await reader.read();
        if (done) break;

        // Parse fMP4 segment
        const segment = this.parseSegment(value);

        // Queue for MSE processing
        this.queue.push(segment);

        // Process queue with rate limiting
        if (!this.isProcessing) {
          this.processQueue();
        }
      }
    } catch (error) {
      console.error('Stream processing error:', error);
    }
  }

  async processQueue() {
    if (this.queue.length === 0 || this.isProcessing) {
      return;
    }
    this.isProcessing = true;

    while (this.queue.length > 0) {
      const segment = this.queue.shift();

      // Wait for source buffer to be ready
      while (this.sourceBuffer.updating) {
        await new Promise(r => setTimeout(r, 10));
      }

      // Manage buffer size (prevent overflow)
      if (this.sourceBuffer.buffered.length > 0) {
        const buffered = this.sourceBuffer.buffered;
        const bufferSize = buffered.end(buffered.length - 1) - this.video.currentTime;

        if (bufferSize > 10) { // 10 seconds max
          // Remove old buffer to make room
          const removeEnd = this.video.currentTime - 2;
          if (removeEnd > buffered.start(0)) {
            this.sourceBuffer.remove(buffered.start(0), removeEnd);
            await new Promise(r =>
              this.sourceBuffer.addEventListener('updateend', r, { once: true }));
          }
        }
      }

      // Append segment
      try {
        this.sourceBuffer.appendBuffer(segment);
        await new Promise(r =>
          this.sourceBuffer.addEventListener('updateend', r, { once: true }));
      } catch (error) {
        console.error('Buffer append error:', error);
        // Quota exceeded - clear buffer
        if (error.name === 'QuotaExceededError') {
          await this.clearOldBuffer();
        }
      }
    }
    this.isProcessing = false;
  }

  async clearOldBuffer() {
    const buffered = this.sourceBuffer.buffered;
    if (buffered.length === 0) return;

    const currentTime = this.video.currentTime;
    const removeEnd = Math.max(buffered.start(0), currentTime - 5);

    if (removeEnd > buffered.start(0)) {
      this.sourceBuffer.remove(buffered.start(0), removeEnd);
      await new Promise(r =>
        this.sourceBuffer.addEventListener('updateend', r, { once: true }));
    }
  }
}
```
```
┌─────────────┐       ┌───────────────────────────────────────┐       ┌─────────────┐
│   FFmpeg    │       │            MediaMTX + MoQ             │       │   Browser   │
│  H.264/AAC  │ RTMP  │ ┌───────────────────────────────────┐ │  WS   │  WebSocket/ │
│   Encoder   │──────▶│ │    Stream Processing Pipeline     │ │──────▶│  WebSocket- │
└─────────────┘       │ ├───────────────────────────────────┤ │       │  Stream     │
                      │ │ 1. RTMP Input Handler             │ │       │             │
                      │ │ 2. H.264/AAC Demuxer              │ │       │  MSE Player │
                      │ │ 3. fMP4 Segmenter (3 frames)      │ │       └─────────────┘
                      │ │ 4. RTWebSocket Protocol Layer     │ │
                      │ │ 5. WebSocket Transport            │ │
                      │ └───────────────────────────────────┘ │
                      │                                       │
                      │ Parallel Transports:                  │
                      │  - :4443 WebTransport Raw (MoQ)       │
                      │  - :4444 Native QUIC (MoQ)            │
                      │  - :4445 WebTransport fMP4 (MoQ)      │
                      │  - :4446 WebSocket fMP4 (RTWebSocket) │
                      └───────────────────────────────────────┘
```
```go
// internal/servers/websocket/handler.go
type WebSocketHandler struct {
	pathManager *core.PathManager
	parent      *core.Core
	server      *http.Server
	upgrader    websocket.Upgrader

	// RTWebSocket specific
	segmenter        *fmp4.Segmenter
	frameBuffer      [][]byte
	framesPerSegment int // Default: 3
}

func (h *WebSocketHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Extract stream path from URL
	pathName := strings.TrimPrefix(r.URL.Path, "/ws/")

	// Upgrade to WebSocket
	conn, err := h.upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}

	// Check for WebSocketStream support
	if r.Header.Get("Sec-WebSocket-Protocol") == "websocketstream" {
		// Use WebSocketStream protocol
		session := &WebSocketStreamSession{
			conn:     conn,
			handler:  h,
			pathName: pathName,
		}
		go session.run()
	} else {
		// Use RTWebSocket protocol
		session := &RTWebSocketSession{
			conn:     conn,
			handler:  h,
			pathName: pathName,
		}
		go session.run()
	}
}
```
```go
// internal/servers/websocket/segmenter.go
func (s *Segmenter) CreateSegment(frames []*unit.H264) ([]byte, error) {
	samples := make([]*fmp4.Sample, 0, len(frames))

	for _, frame := range frames {
		// Determine frame type
		isKeyframe := false
		for _, nalu := range frame.AU {
			nalType := nalu[0] & 0x1F
			if nalType == 5 { // IDR
				isKeyframe = true
				break
			}
		}

		// Convert to fMP4 sample
		sample := &fmp4.Sample{
			Duration:        90000 / 30, // 30fps on a 90kHz clock
			Size:            uint32(len(frame.Data)),
			Flags:           0,
			CompositionTime: 0,
			Data:            frame.Data,
		}

		// Mark non-keyframes as non-sync samples
		if !isKeyframe {
			sample.Flags |= fmp4.SampleFlagIsNonSyncSample
		}
		samples = append(samples, sample)
	}

	// Create fMP4 segment
	segment := &fmp4.Part{
		SequenceNumber: s.sequenceNumber,
		Tracks: []*fmp4.PartTrack{{
			ID:       1, // Video track
			BaseTime: s.baseTime,
			Samples:  samples,
		}},
	}

	s.sequenceNumber++
	s.baseTime += uint64(len(samples)) * (90000 / 30)

	// Encode to bytes
	var buf bytes.Buffer
	err := segment.Marshal(&buf)
	return buf.Bytes(), err
}
```
Let's talk about something that's been bothering us. We're wrapping H.264 NAL units in fMP4 containers, then sending them over WebSocket, then unwrapping them in the browser. But why? Browsers can decode raw H.264 Annex B directly through the WebCodecs API - no container required. We're adding complexity and latency for... tradition?
fMP4 made sense for HLS and DASH - you need seekable segments for adaptive streaming. But for real-time streaming where we're sending 3-frame chunks? It's overhead. Each fMP4 segment has moof/mdat boxes, track headers, sample tables. That's hundreds of bytes of metadata for every 100KB of video. At ten segments per second (3 frames @ 30fps), that adds up to hundreds of kilobytes of waste per minute.
We tested sending raw NAL units with simple 4-byte length prefixes. Latency dropped by 50-100ms. Bandwidth usage decreased by 3-5%. CPU usage on both server and client went down. The only reason we kept fMP4 was compatibility - MSE playback still expects proper containers, and WebCodecs isn't available everywhere. But for a pure RTWebSocket implementation? Raw NAL units make more sense.
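For illustration, the framing can be as simple as this - a sketch of the approach we tested (4-byte big-endian length in front of each NAL unit), not the exact code we shipped:

```javascript
// Frame one NAL unit: 4-byte big-endian length prefix, then the payload
function frameNALUnit(nalu) {
  const framed = new Uint8Array(4 + nalu.byteLength);
  new DataView(framed.buffer).setUint32(0, nalu.byteLength, false); // big-endian
  framed.set(new Uint8Array(nalu), 4);
  return framed.buffer;
}

// Split a received buffer back into NAL units
function* parseNALUnits(buffer) {
  const view = new DataView(buffer);
  let offset = 0;
  while (offset + 4 <= buffer.byteLength) {
    const length = view.getUint32(offset, false);
    offset += 4;
    yield buffer.slice(offset, offset + length);
    offset += length;
  }
}
```

Compare that to a moof/mdat pair per segment: the framing above costs exactly four bytes per NAL unit.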
Similar to our MoQ implementation, MediaMTX returns a stream object before it's fully initialized:
```go
// The same timing issue we found in MoQ!
func (s *RTWebSocketSession) setupStream() error {
	res, err := s.handler.pathManager.AddReader(PathAddReaderReq{
		Author:   s,
		PathName: s.pathName,
	})
	if err != nil {
		return err
	}

	// CRITICAL: Wait for stream.Desc to be populated
	for attempts := 0; attempts < 50; attempts++ {
		if res.Stream.Desc != nil {
			break
		}
		time.Sleep(100 * time.Millisecond)
	}

	if res.Stream.Desc == nil {
		return fmt.Errorf("stream not ready after 5 seconds")
	}

	// Now safe to setup codec parameters
	return s.setupCodecs(res.Stream)
}
```
Media Source Extensions have complex buffer requirements:
```javascript
class MSEBufferManager {
  constructor(sourceBuffer, video) {
    this.sourceBuffer = sourceBuffer;
    this.video = video;
    this.maxBufferSize = 30;    // seconds
    this.targetBufferSize = 10; // seconds
    this.minBufferSize = 2;     // seconds
  }

  async manageBuffer() {
    if (!this.sourceBuffer.buffered.length) return;

    const buffered = this.sourceBuffer.buffered;
    const currentTime = this.video.currentTime;

    // Calculate buffer health
    const bufferEnd = buffered.end(buffered.length - 1);
    const bufferStart = buffered.start(0);
    const bufferAhead = bufferEnd - currentTime;
    const bufferBehind = currentTime - bufferStart;

    // Remove old buffer to prevent memory issues
    if (bufferBehind > this.maxBufferSize / 2) {
      const removeEnd = currentTime - this.minBufferSize;
      if (removeEnd > bufferStart) {
        await this.removeBuffer(bufferStart, removeEnd);
      }
    }

    // Handle quota exceeded
    if (bufferAhead > this.maxBufferSize) {
      // Clear future buffer that's too far ahead
      const removeStart = currentTime + this.targetBufferSize;
      if (removeStart < bufferEnd) {
        await this.removeBuffer(removeStart, bufferEnd);
      }
    }
  }

  async removeBuffer(start, end) {
    if (this.sourceBuffer.updating) {
      await new Promise(r =>
        this.sourceBuffer.addEventListener('updateend', r, { once: true })
      );
    }
    try {
      this.sourceBuffer.remove(start, end);
      await new Promise(r =>
        this.sourceBuffer.addEventListener('updateend', r, { once: true })
      );
    } catch (error) {
      console.error('Buffer remove error:', error);
    }
  }
}
```
```javascript
// WebSocket Binary Message Format
// We use a simple TLV (Type-Length-Value) format
const MessageTypes = {
  INIT: 0x01,     // Codec initialization
  SEGMENT: 0x02,  // fMP4 segment
  KEYFRAME: 0x03, // Keyframe notification
  AUDIO: 0x04,    // Audio segment
  METADATA: 0x05, // Stream metadata
  CONTROL: 0x06,  // Flow control
};

class BinaryProtocol {
  static encodeMessage(type, data) {
    const header = new ArrayBuffer(5);
    const view = new DataView(header);
    view.setUint8(0, type);
    view.setUint32(1, data.byteLength, false); // big-endian

    // Combine header and data
    const message = new Uint8Array(header.byteLength + data.byteLength);
    message.set(new Uint8Array(header), 0);
    message.set(new Uint8Array(data), header.byteLength);
    return message.buffer;
  }

  static decodeMessage(buffer) {
    const view = new DataView(buffer);
    const type = view.getUint8(0);
    const length = view.getUint32(1, false);
    const data = buffer.slice(5, 5 + length);
    return { type, data };
  }

  static createInitMessage(codecString, width, height) {
    const encoder = new TextEncoder();
    const codecBytes = encoder.encode(codecString);
    const buffer = new ArrayBuffer(codecBytes.length + 8);
    const view = new DataView(buffer);

    view.setUint16(0, width, false);
    view.setUint16(2, height, false);
    view.setUint32(4, codecBytes.length, false);
    new Uint8Array(buffer).set(codecBytes, 8);

    return this.encodeMessage(MessageTypes.INIT, buffer);
  }
}
```
```javascript
// Critical for smooth playback
class TimestampManager {
  constructor() {
    this.baseTime = null;
    this.lastTimestamp = null;
    this.driftCorrection = 0;
    this.fps = 30;
    this.frameDuration = 1000 / this.fps;
  }

  processTimestamp(rtpTimestamp) {
    // Convert RTP timestamp to milliseconds
    const msTimestamp = rtpTimestamp / 90; // 90kHz clock

    if (!this.baseTime) {
      this.baseTime = performance.now() - msTimestamp;
      this.lastTimestamp = msTimestamp;
      return msTimestamp;
    }

    // Detect and handle timestamp jumps
    const expectedTimestamp = this.lastTimestamp + this.frameDuration;
    const drift = msTimestamp - expectedTimestamp;

    if (Math.abs(drift) > 1000) {
      // Large jump - reset base time
      console.warn('Timestamp jump detected:', drift);
      this.baseTime = performance.now() - msTimestamp;
    } else if (Math.abs(drift) > 50) {
      // Small drift - apply correction
      this.driftCorrection += drift * 0.1; // Smooth correction
    }

    this.lastTimestamp = msTimestamp;
    return msTimestamp + this.driftCorrection;
  }

  getCurrentPlaybackTime() {
    if (!this.baseTime) return 0;
    return performance.now() - this.baseTime;
  }
}
```
| Component | Latency | Notes |
|---|---|---|
| Encoding (FFmpeg) | 100-200ms | x264 ultrafast preset |
| Frame Batching | 100ms | 3 frames @ 30fps |
| Network (WebSocket) | 10-50ms | TCP overhead included |
| fMP4 Segmentation | 5-10ms | Server-side processing |
| MSE Buffering | 500-1000ms | Browser requirement |
| Decode & Render | 16-33ms | Hardware accelerated |
| Total | 1-2 seconds | End-to-end |
```
Performance Comparison (1080p @ 30fps stream):

Traditional WebSocket:
┌────────────────────────────────────────┐
│ Memory Usage Over Time                 │
│                                        │
│ 500MB │                       ████████ │ ← Buffer bloat!
│ 400MB │                 █████          │
│ 300MB │           █████                │
│ 200MB │      █████                     │
│ 100MB │ █████                          │
│   0MB └────────────────────────────────┘
│         0    30s    60s    90s   120s  │
└────────────────────────────────────────┘

WebSocketStream:
┌────────────────────────────────────────┐
│ Memory Usage Over Time                 │
│                                        │
│ 500MB │                                │
│ 400MB │                                │
│ 300MB │                                │
│ 200MB │                                │
│ 100MB │ ██████████████████████████████ │ ← Stable!
│   0MB └────────────────────────────────┘
│         0    30s    60s    90s   120s  │
└────────────────────────────────────────┘
```
```javascript
class AdaptiveBufferController {
  constructor(player) {
    this.player = player;
    this.targetLatency = 2.0; // seconds
    this.minLatency = 1.0;
    this.maxLatency = 5.0;
    this.adjustmentRate = 0.1;
  }

  updatePlaybackRate() {
    const buffered = this.player.video.buffered;
    if (buffered.length === 0) return;

    const currentTime = this.player.video.currentTime;
    const bufferEnd = buffered.end(buffered.length - 1);
    const latency = bufferEnd - currentTime;

    // Adjust playback rate to maintain target latency
    if (latency > this.targetLatency + 0.5) {
      // Speed up to reduce latency
      this.player.video.playbackRate = Math.min(1.5, 1 + this.adjustmentRate);
    } else if (latency < this.targetLatency - 0.5) {
      // Slow down to build buffer
      this.player.video.playbackRate = Math.max(0.5, 1 - this.adjustmentRate);
    } else {
      // Return to normal
      this.player.video.playbackRate = 1.0;
    }

    // Emergency measures
    if (latency > this.maxLatency) {
      // Jump forward to reduce latency
      this.player.video.currentTime = bufferEnd - this.targetLatency;
    } else if (latency < this.minLatency && this.player.video.paused) {
      // Resume if we have enough buffer
      this.player.video.play();
    }
  }
}
```
```javascript
// Fallback chain for maximum compatibility
class StreamingClient {
  async connect(url) {
    // Try WebSocketStream first (best performance)
    if (typeof WebSocketStream !== 'undefined') {
      try {
        return await this.connectWebSocketStream(url);
      } catch (e) {
        console.warn('WebSocketStream failed, trying RTWebSocket');
      }
    }

    // Try RTWebSocket (our custom protocol)
    if (typeof WebSocket !== 'undefined') {
      try {
        return await this.connectRTWebSocket(url);
      } catch (e) {
        console.warn('RTWebSocket failed, trying standard WebSocket');
      }
    }

    // Fall back to standard WebSocket
    try {
      return await this.connectStandardWebSocket(url);
    } catch (e) {
      console.error('All WebSocket methods failed');
      throw e;
    }
  }
}
```
Here's the key insight: RTWebSocket shouldn't be viewed as a fallback for when MoQ fails. It's the core protocol layer that makes ALL our transports work reliably for streaming. Every transport - whether it's WebSocket, QUIC, or WebTransport - needs the same application-level semantics: flow separation, prioritization, acknowledgments, graceful degradation.
Our unified streaming architecture with RTWebSocket at its core:
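The layering, sketched from the transports in our stack (the four listed are the ones we actually run):

```
┌───────────────────────────────────────────────────┐
│            Streaming Application Logic            │
├───────────────────────────────────────────────────┤
│             RTWebSocket Protocol Layer            │
│   (per-flow management, ACKs, priorities, RTT)    │
├───────────┬─────────────────┬──────┬──────────────┤
│ WebSocket │ WebSocketStream │ QUIC │ WebTransport │
└───────────┴─────────────────┴──────┴──────────────┘
```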
This isn't multiple protocols - it's one protocol (RTWebSocket) with transport-specific optimizations. The streaming logic stays the same whether you're on Chrome with WebTransport or Safari with WebSocket. That's the power of treating RTWebSocket as the core, not the fallback.
```go
// WebSocket streaming may need a dedicated buffer server
// due to different requirements than MoQ
type BufferServer struct {
	// WebSocket needs longer buffers for MSE
	videoRingBuffer *RingBuffer // 5-10 seconds
	audioRingBuffer *RingBuffer // 5-10 seconds

	// MoQ needs minimal buffering
	moqVideoBuffer *RingBuffer // 100-200ms
	moqAudioBuffer *RingBuffer // 100-200ms
}

// Potential architecture:
// MediaMTX Core -> Buffer Server -> WebSocket Clients
//              \-> MoQ Server    -> MoQ Clients
```
```yaml
# mediamtx.yml additions for WebSocket support
webSocket: yes
webSocketAddress: :4446
webSocketProtocol: rtwebsocket  # or 'standard' or 'stream'

# Buffer management
webSocketBufferSize: 5s      # MSE requirement
webSocketSegmentSize: 3      # frames per segment
webSocketMaxConnections: 100

# Integration with existing MoQ
moq: yes
moqFallbackToWebSocket: yes  # Auto-fallback
moqUnifiedPort: 443          # Single port with protocol detection
```
```go
// Proposed unified transport detection
func (s *Server) handleConnection(w http.ResponseWriter, r *http.Request) {
	// Detect client capabilities
	if r.Header.Get("Sec-WebSocket-Version") != "" {
		// WebSocket connection
		if r.Header.Get("Sec-WebSocket-Protocol") == "rtwebsocket" {
			s.handleRTWebSocket(w, r)
		} else {
			s.handleStandardWebSocket(w, r)
		}
	} else if r.ProtoMajor == 3 {
		// HTTP/3 - could be WebTransport
		if r.Header.Get("Sec-WebTransport") != "" {
			s.handleWebTransport(w, r)
		}
	} else {
		// Regular HTTP - serve player page
		s.servePlayerHTML(w, r)
	}
}
```
```javascript
// Future: Tunnel WebTransport through WebSocket for firewall traversal
class WebTransportTunnel {
  constructor(websocketUrl) {
    this.ws = new WebSocket(websocketUrl);
    this.streams = new Map();
  }

  async createWebTransport() {
    // Negotiate CONNECT-UDP through WebSocket
    await this.negotiateTunnel();

    // Create virtual WebTransport over WebSocket
    return new VirtualWebTransport(this.ws);
  }

  async negotiateTunnel() {
    // Send CONNECT-UDP request
    this.ws.send(JSON.stringify({
      method: 'CONNECT-UDP',
      protocol: 'webtransport',
      version: '1',
    }));

    // Wait for acceptance
    return new Promise((resolve, reject) => {
      this.ws.onmessage = (event) => {
        const response = JSON.parse(event.data);
        if (response.status === 'accepted') {
          resolve();
        } else {
          reject(new Error(response.error));
        }
      };
    });
  }
}
```
Here's our vision: RTWebSocket isn't just another transport option. It's the core protocol that unifies ALL our streaming transports. Think of it as the application layer that provides consistent streaming semantics regardless of the underlying transport:
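A sketch of what that could look like on the client - the `connectBestTransport` helper and the adapter shape are our illustration, not an existing API, and the ports follow our MediaMTX setup:

```javascript
// Hypothetical sketch: each adapter exposes the same readable interface
// to the RTWebSocket layer above, whatever the underlying transport
async function connectBestTransport(urls) {
  if (typeof WebTransport !== 'undefined') {
    const wt = new WebTransport(urls.webTransport); // https:// endpoint
    await wt.ready;
    // (a stream of incoming streams in WebTransport's case)
    return { kind: 'webtransport', readable: wt.incomingUnidirectionalStreams };
  }
  if (typeof WebSocketStream !== 'undefined') {
    const wss = new WebSocketStream(urls.webSocket);
    const { readable } = await wss.opened;
    return { kind: 'websocketstream', readable };
  }
  // Plain WebSocket: adapt the event API into a ReadableStream.
  // No native backpressure here - this is where RTWebSocket's ACKs earn their keep.
  const ws = new WebSocket(urls.webSocket);
  ws.binaryType = 'arraybuffer';
  const readable = new ReadableStream({
    start(controller) {
      ws.onmessage = (e) => controller.enqueue(e.data);
      ws.onclose = () => controller.close();
      ws.onerror = (e) => controller.error(e);
    },
  });
  return { kind: 'websocket', readable };
}

// Usage: the streaming logic above this line never changes
const transport = await connectBestTransport({
  webTransport: 'https://stream.example.com:4445/stream',
  webSocket: 'wss://stream.example.com:4446/ws/stream',
});
console.log('negotiated transport:', transport.kind);
```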
One protocol implementation, multiple transport optimizations. The client and server negotiate the best available transport, but the streaming logic remains identical. This means:

- one codebase for flow management, acknowledgments, and prioritization
- identical streaming behavior whether the viewer is on Chrome with WebTransport or Safari with WebSocket
- transport upgrades (WebSocket → WebSocketStream → WebTransport) that never touch the application logic
RTWebSocket becomes the lingua franca of real-time streaming - the protocol that finally brings order to the chaos of browser-based video delivery.
| Protocol | Latency | Browser Support | Reality Check |
|---|---|---|---|
| MoQ (WebTransport) | 200-300ms | Chrome/Edge only | Dead on arrival without Safari |
| RTWebSocket | 1-2 seconds | All browsers | The pragmatic choice |
| WebRTC | 500ms-1s | All browsers | Complex STUN/TURN setup |
| LL-HLS | 3-5 seconds | All browsers | Better than nothing |
| HLS | 10-30 seconds | All browsers | Legacy, but reliable |
The verdict: 1-second latency that works everywhere beats 200ms that only Chrome users can see.
WebSocket streaming isn't perfect, but it works everywhere. While we continue pushing the boundaries with Media over QUIC (draft-ietf-moq-transport-05) and WebTransport (W3C WebTransport API), RTWebSocket over WebSocket provides a reliable solution that ensures no viewer is left behind. The combination of our MoQ implementation for cutting-edge browsers and RTWebSocket for universal compatibility gives us the best of both worlds.
This work represents our commitment to pragmatic streaming solutions. We're not waiting for perfect standards or universal browser support. We're building what works today while preparing for tomorrow. The RTWebSocket protocol and WebSocketStream implementation may not be standards, but they solve real problems for real users.
Our MediaMTX contributions will continue to evolve. Whether these WebSocket experiments become part of the mainline codebase or remain as reference implementations, they demonstrate our commitment to advancing open-source streaming technology.
Experience both WebSocket streaming implementations: