The Problem: WebTransport (draft-ietf-webtrans-http3-08) achieves 200ms latency but doesn't work on Safari/iOS. Traditional WebSocket has no flow control and causes buffer bloat.
The Solution: RTWebSocket (github.com/zenomt/rtwebsocket) - RTMP's flow control concepts rebuilt for JavaScript and Python. Provides per-flow management, acknowledgments, RTT measurement, and prioritization.
The Results: 1-2 second latency (vs 10-30s for HLS), 100% browser support, integrated with MediaMTX at port 4446. WebSocketStream API (WHATWG proposal) validates the approach with native backpressure.
Key Insight: RTWebSocket shouldn't be a fallback - it's the core protocol layer that works across all transports (WebSocket, WebSocketStream, QUIC, WebTransport).
Here's the story nobody tells you about browser streaming: after months of building cutting-edge MoQ implementations with sub-300ms latency, we discovered half our users couldn't use it. Safari doesn't support WebTransport. iOS doesn't either. That beautiful 200ms latency? Useless if your iPhone users see a blank screen.
That's when we discovered RTWebSocket - a brilliant protocol with implementations in both JavaScript (rtws.js) and Python (rtws.py) at github.com/zenomt/rtwebsocket. The author did something that might sound insane: they rebuilt RTMP's transport concepts for WebSocket. Yes, concepts from that 20-year-old Flash protocol everyone loves to hate. But here's the thing: RTMP solved real problems that WebSocket never addressed. As RFC 6455 Section 1.5 explicitly states, WebSocket provides no flow control. These features - flow control, message prioritization, graceful degradation - aren't legacy, they're exactly what modern streaming needs.
RTWebSocket became our solution to this mess. It takes RTMP's best ideas - chunking, acknowledgments, per-flow management - and runs them over standard WebSocket connections. We get reliable streaming with explicit control over buffering behavior. When a slow client falls behind, we can drop old frames instead of eating all their RAM. When network congestion hits, we know exactly which frames to prioritize.
The results speak for themselves: 1-2 second latency that works on every browser, every device, no exceptions. Is it as fast as our MoQ implementation? No. But it actually works for 100% of users instead of 50%. And in production streaming, that's what matters.
Try it yourself:
Chrome flags required: chrome://flags/#enable-experimental-web-platform-features
Let me tell you how this actually happened. We were sitting there with our beautiful MoQ implementation, 200ms latency, feeling pretty good about ourselves. Then someone opened Safari. Nothing. Blank screen. "No problem," we thought, "we'll just use WebSocket as a fallback." How hard could it be?
Turns out, very hard. WebSocket has no flow control. None. Zero. If you send video faster than the client can consume it (which always happens), WebSocket just keeps buffering. And buffering. Until Chrome eats 8GB of RAM and the tab crashes. We tried everything - manual buffering, frame dropping, prayer. Nothing worked reliably.
Then we found RTWebSocket. The developer at zenomt had already had the same crazy idea: what if they just... implemented RTMP over WebSocket? RTMP already solved these problems 20 years ago. It has flow control through acknowledgments. It has chunking to handle large frames. It has message prioritization so keyframes get through when the network is congested. Why reinvent the wheel when Adobe already built a perfectly good one?
So that's what RTWebSocket is - RTMP's transport layer concepts rebuilt for the modern web, with both JavaScript (rtws.js) and Python (rtws.py) implementations. Each video stream gets its own flow with sequence numbers, acknowledgments, and receive window management. When buffers fill up, old frames can be explicitly abandoned. When networks congest, keyframes get prioritized. It's everything WebSocket should have been for streaming, implemented in about 2000 lines of elegant JavaScript.
| Metric | WebSocket Streaming | Traditional HLS | Our MoQ Implementation |
|---|---|---|---|
| End-to-End Latency | 1-2 seconds | 10-30 seconds | 200-300ms |
| Startup Time | <1 second | 3-5 seconds | <500ms |
| Browser Support | 100% (with fallback) | 100% | Chrome/Edge only |
| Network Resilience | Good (TCP) | Excellent | Moderate (UDP) |
| Implementation Complexity | Moderate | Low | High |
The simple answer: Safari. The longer answer: after building our beautiful MoQ implementation with sub-300ms latency, we realized we'd built something that only worked for about half our potential users. Chrome and Edge users got the full experience, but Safari users (including everyone on iOS) got nothing. WebTransport simply doesn't exist in Safari's world, and who knows when it will.
WebSocket was the obvious fallback choice. It works everywhere, uses standard ports that firewalls don't hate, and doesn't require the complexity of WebRTC's STUN/TURN dance. The challenge was making it actually good for streaming. Raw WebSocket has no flow control, no backpressure handling, and will happily eat all your RAM if the producer is faster than the consumer (which it always is in video streaming).
So we ended up building what's essentially a streaming protocol on top of WebSocket. Is it elegant? No. Does it work? Yes. And that's the trade-off we accepted - pragmatism over perfection.
Let's talk about why RTMP was actually brilliant. Not the Flash plugin part - that was terrible. But the underlying protocol? Adobe's engineers knew what they were doing. RTMP has per-stream flow control, so audio doesn't get blocked when video buffers fill. It has explicit acknowledgments, so the server knows exactly how much data the client has received. It chunks large messages, so a 100KB keyframe doesn't block a tiny audio packet.
The zenomt/rtwebsocket implementation takes these concepts and provides them in both JavaScript and Python. RTWebSocket gives you independent flows within a single WebSocket connection. Each flow has its own sequence numbers, its own receive buffer, its own acknowledgment window. Here's what it looks like when we implement it:
// This is what makes RTWebSocket special
const rtws = new RTWebSocket('wss://stream.example.com');
// Open separate flows for video and audio
const videoFlow = rtws.openFlow({ type: 'video' }, rtws.PRI_NORMAL);
const audioFlow = rtws.openFlow({ type: 'audio' }, rtws.PRI_HIGH);
// Each flow manages its own buffering
videoFlow.rcvbuf = 65536; // 64KB receive buffer
audioFlow.rcvbuf = 8192; // 8KB for audio
// Messages arrive in order, per flow
videoFlow.onmessage = (sender, data, sequenceNumber) => {
// If we're falling behind, old messages are automatically abandoned
// No memory bloat, no manual buffer management
processVideoFrame(data);
};
// Audio keeps flowing even if video buffers fill
audioFlow.onmessage = (sender, data, sequenceNumber) => {
processAudioFrame(data);
};
See what's happening here? We're not fighting the browser's buffering. We're explicitly controlling it. When the video buffer fills (because video is always bigger and slower), audio keeps flowing. When the client falls behind, we automatically drop old frames instead of buffering forever. This is what RTMP got right, and what WebSocket never even tried to solve.
The acknowledgment system is pure RTMP inspiration too. Every few KB of data, the client sends back an ACK telling the server "I've received up to byte N". The server uses this for flow control - if ACKs stop coming, it stops sending. No buffer bloat, no memory exhaustion, no crashed tabs. Just reliable streaming that degrades gracefully under pressure.
RTWebSocket is fully open source at github.com/zenomt/rtwebsocket. The repository includes both rtws.js (JavaScript) and rtws.py (Python) implementations that wrap standard WebSocket connections (per RFC 6455). The author's premise matches our experience: WebSocket without flow control is unusable for production streaming. Their solution? Take RTMP's transport concepts - the parts that actually worked - and rebuild them for the modern web.
The RTWebSocket protocol uses a simple but effective message structure that mirrors RTMP's approach:
RTWebSocket Message Structure:
┌──────────────┬──────────────┬──────────────┬──────────────┐
│  Message ID  │  Stream ID   │  Timestamp   │   Payload    │
│  (4 bytes)   │  (2 bytes)   │  (4 bytes)   │  (variable)  │
└──────────────┴──────────────┴──────────────┴──────────────┘
Message Types:
- 0x01: Video Data (but do we even need fMP4?)
- 0x02: Audio Data (raw AAC works fine)
- 0x03: Metadata (codec config)
- 0x04: Control (play/pause/seek)
- 0x05: Acknowledgment
- 0x06: Window Update (flow control)
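Reading the header diagram literally, every message carries 10 bytes of framing before the payload. Here's a quick sketch of packing it in JavaScript - field order as diagrammed, big-endian byte order assumed, since we're working from the diagram above rather than anything verified against rtws.js:

// Sketch: pack the 10-byte header from the diagram (big-endian assumed)
function encodeRTMessage(messageId, streamId, timestamp, payload) {
  const buf = new Uint8Array(10 + payload.byteLength);
  const view = new DataView(buf.buffer);
  view.setUint32(0, messageId, false);  // Message ID (4 bytes)
  view.setUint16(4, streamId, false);   // Stream ID (2 bytes)
  view.setUint32(6, timestamp, false);  // Timestamp (4 bytes)
  buf.set(new Uint8Array(payload), 10); // Payload (variable)
  return buf.buffer;
}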
// Server-side chunk transmission
func (c *RTWebSocketConn) SendVideoChunk(data []byte) error {
// Split large frames into chunks (like RTMP)
const maxChunkSize = 4096
for offset := 0; offset < len(data); {
chunkSize := min(maxChunkSize, len(data)-offset)
chunk := data[offset : offset+chunkSize]
msg := RTMessage{
MessageID: c.nextMessageID(),
StreamID: VIDEO_STREAM_ID,
Timestamp: c.getCurrentTimestamp(),
ChunkType: determineChunkType(offset, len(data)),
Payload: chunk,
}
if err := c.sendMessage(msg); err != nil {
return err
}
offset += chunkSize
}
return nil
}
// Client-side acknowledgment
class RTWebSocketClient {
constructor(url) {
this.ws = new WebSocket(url);
this.ackWindow = 2500000; // 2.5MB window
this.bytesReceived = 0;
this.lastAck = 0;
}
handleMessage(data) {
this.bytesReceived += data.byteLength;
// Send ACK every window
if (this.bytesReceived - this.lastAck >= this.ackWindow) {
this.sendAck(this.bytesReceived);
this.lastAck = this.bytesReceived;
}
// Process message
this.processRTMessage(data);
}
sendAck(bytesReceived) {
const ackMsg = new ArrayBuffer(5);
const view = new DataView(ackMsg);
view.setUint8(0, 0x05); // ACK message type
view.setUint32(1, bytesReceived, false);
this.ws.send(ackMsg);
}
}
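To exercise that class, the underlying socket has to deliver binary frames as ArrayBuffers - the byteLength accounting depends on it. A minimal wiring sketch, with a placeholder URL:

// Hypothetical wiring for the client above
const client = new RTWebSocketClient('wss://stream.example.com/ws/live');
client.ws.binaryType = 'arraybuffer'; // required for the byteLength math
client.ws.onmessage = (event) => client.handleMessage(event.data);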
// Server-side flow control
type RTWebSocketConn struct {
ws *websocket.Conn
sendWindow int64
ackReceived int64
bytesSent int64
windowUpdateChan chan int64
}
func (c *RTWebSocketConn) enforceFlowControl() error {
// Wait if we've exceeded the window
for c.bytesSent-c.ackReceived > c.sendWindow {
select {
case update := <-c.windowUpdateChan:
c.ackReceived = update
case <-time.After(5 * time.Second):
return fmt.Errorf("flow control timeout")
}
}
return nil
}
WebSocketStream is a new browser API that provides native backpressure handling, eliminating buffer bloat and memory issues. It's currently behind a flag in Chrome but represents the future of WebSocket streaming.
Chrome Flag Required:
chrome://flags/#enable-experimental-web-platform-features
Enable "Experimental Web Platform features" and restart Chrome
// Traditional WebSocket: memory issues with fast producers
const ws = new WebSocket(url);
let buffer = [];
ws.onmessage = (event) => {
// No backpressure! Buffer grows unbounded
buffer.push(event.data);
processWhenReady();
};
// Manual buffer management nightmare
function processWhenReady() {
if (buffer.length > MAX_BUFFER) {
// Drop frames? Pause? Close?
console.warn('Buffer overflow!');
}
}
// WebSocketStream: automatic backpressure handling!
const wss = new WebSocketStream(url);
const { readable } = await wss.opened;
const reader = readable.getReader();
// Backpressure applied automatically
while (true) {
const { value, done } = await reader.read();
if (done) break;
// Process at our own pace
// Stream pauses if we're slow!
await processFrame(value);
}
class WebSocketStreamPlayer {
constructor(url, videoElement) {
this.url = url;
this.video = videoElement;
this.mediaSource = new MediaSource();
this.video.src = URL.createObjectURL(this.mediaSource);
this.sourceBuffer = null;
this.queue = [];
this.isProcessing = false;
}
async connect() {
// Create WebSocketStream connection
const wss = new WebSocketStream(this.url);
// Get the readable stream
const { readable } = await wss.opened;
const reader = readable.getReader();
// Setup MediaSource
await this.setupMediaSource();
// Start processing stream with backpressure
await this.processStream(reader);
}
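  // setupMediaSource() is called above but wasn't shown - a minimal
  // sketch. The codec string here is an assumption and must match the
  // stream's actual initialization segment.
  async setupMediaSource() {
    if (this.mediaSource.readyState !== 'open') {
      await new Promise(r => this.mediaSource.addEventListener('sourceopen', r, { once: true }));
    }
    this.sourceBuffer = this.mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f, mp4a.40.2"');
  }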
async processStream(reader) {
try {
while (true) {
// Read with automatic backpressure
const { value, done } = await reader.read();
if (done) break;
// Parse fMP4 segment
const segment = this.parseSegment(value);
// Queue for MSE processing
this.queue.push(segment);
// Process queue with rate limiting
if (!this.isProcessing) {
this.processQueue();
}
}
} catch (error) {
console.error('Stream processing error:', error);
}
}
async processQueue() {
if (this.queue.length === 0 || this.isProcessing) {
return;
}
this.isProcessing = true;
while (this.queue.length > 0) {
const segment = this.queue.shift();
// Wait for source buffer to be ready
while (this.sourceBuffer.updating) {
await new Promise(r => setTimeout(r, 10));
}
// Manage buffer size (prevent overflow)
if (this.sourceBuffer.buffered.length > 0) {
const buffered = this.sourceBuffer.buffered;
const bufferSize = buffered.end(buffered.length - 1) - this.video.currentTime;
if (bufferSize > 10) { // 10 seconds max
// Remove old buffer to make room
const removeEnd = this.video.currentTime - 2;
if (removeEnd > buffered.start(0)) {
this.sourceBuffer.remove(buffered.start(0), removeEnd);
await new Promise(r => this.sourceBuffer.addEventListener('updateend', r, { once: true }));
}
}
}
// Append segment
try {
this.sourceBuffer.appendBuffer(segment);
await new Promise(r => this.sourceBuffer.addEventListener('updateend', r, { once: true }));
} catch (error) {
console.error('Buffer append error:', error);
// Quota exceeded - clear buffer
if (error.name === 'QuotaExceededError') {
await this.clearOldBuffer();
}
}
}
this.isProcessing = false;
}
async clearOldBuffer() {
const buffered = this.sourceBuffer.buffered;
if (buffered.length === 0) return;
const currentTime = this.video.currentTime;
const removeEnd = Math.max(buffered.start(0), currentTime - 5);
if (removeEnd > buffered.start(0)) {
this.sourceBuffer.remove(buffered.start(0), removeEnd);
await new Promise(r => this.sourceBuffer.addEventListener('updateend', r, { once: true }));
}
}
}
┌──────────────┐      ┌──────────────────────────────────────────┐      ┌─────────────┐
│    FFmpeg    │      │              MediaMTX + MoQ              │      │   Browser   │
│              │      │                                          │      │             │
│  H.264/AAC   │─────▶│  ┌────────────────────────────────────┐  │      │ WebSocket/  │
│   Encoder    │ RTMP │  │     Stream Processing Pipeline     │  │      │ WebSocket-  │
│              │      │  ├────────────────────────────────────┤  │      │   Stream    │
└──────────────┘      │  │ 1. RTMP Input Handler              │  │      │             │
                      │  │ 2. H.264/AAC Demuxer               │  │─────▶│    MSE      │
                      │  │ 3. fMP4 Segmenter (3 frames)       │  │  WS  │   Player    │
                      │  │ 4. RTWebSocket Protocol Layer      │  │      │             │
                      │  │ 5. WebSocket Transport             │  │      └─────────────┘
                      │  └────────────────────────────────────┘  │
                      │                                          │
                      │  Parallel Transports:                    │
                      │  - :4443 WebTransport Raw (MoQ)          │
                      │  - :4444 Native QUIC (MoQ)               │
                      │  - :4445 WebTransport fMP4 (MoQ)         │
                      │  - :4446 WebSocket fMP4 (RTWebSocket)    │
                      └──────────────────────────────────────────┘
// internal/servers/websocket/handler.go
type WebSocketHandler struct {
pathManager *core.PathManager
parent *core.Core
server *http.Server
upgrader websocket.Upgrader
// RTWebSocket specific
segmenter *fmp4.Segmenter
frameBuffer [][]byte
framesPerSegment int // Default: 3
}
func (h *WebSocketHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
// Extract stream path from URL
pathName := strings.TrimPrefix(r.URL.Path, "/ws/")
// Upgrade to WebSocket
conn, err := h.upgrader.Upgrade(w, r, nil)
if err != nil {
return
}
// Check for WebSocketStream support
if r.Header.Get("Sec-WebSocket-Protocol") == "websocketstream" {
// Use WebSocketStream protocol
session := &WebSocketStreamSession{
conn: conn,
handler: h,
pathName: pathName,
}
go session.run()
} else {
// Use RTWebSocket protocol
session := &RTWebSocketSession{
conn: conn,
handler: h,
pathName: pathName,
}
go session.run()
}
}
// internal/servers/websocket/segmenter.go
func (s *Segmenter) CreateSegment(frames []*unit.H264) ([]byte, error) {
samples := make([]*fmp4.Sample, 0, len(frames))
for _, frame := range frames {
// Determine frame type
isKeyframe := false
for _, nalu := range frame.AU {
nalType := nalu[0] & 0x1F
if nalType == 5 { // IDR
isKeyframe = true
break
}
}
// Convert the access unit to a single AVCC payload:
// each NALU prefixed with its 4-byte big-endian length
var payload []byte
for _, nalu := range frame.AU {
	lenPrefix := make([]byte, 4)
	binary.BigEndian.PutUint32(lenPrefix, uint32(len(nalu)))
	payload = append(payload, lenPrefix...)
	payload = append(payload, nalu...)
}
sample := &fmp4.Sample{
	Duration:        90000 / 30, // 30fps at a 90kHz clock
	Size:            uint32(len(payload)),
	Flags:           0,
	CompositionTime: 0,
	Data:            payload,
}
if !isKeyframe {
	// non-keyframes are marked non-sync; keyframes remain sync samples
	sample.Flags |= fmp4.SampleFlagIsNonSyncSample
}
samples = append(samples, sample)
}
// Create fMP4 segment
segment := &fmp4.Part{
SequenceNumber: s.sequenceNumber,
Tracks: []*fmp4.PartTrack{{
ID: 1, // Video track
BaseTime: s.baseTime,
Samples: samples,
}},
}
s.sequenceNumber++
s.baseTime += uint64(len(samples)) * (90000 / 30)
// Encode to bytes
var buf bytes.Buffer
err := segment.Marshal(&buf)
return buf.Bytes(), err
}
Let's talk about something that's been bothering us. We're wrapping H.264 NAL units in fMP4 containers, then sending them over WebSocket, then unwrapping them in the browser. But why? The browser's Media Source Extensions can handle raw H.264 Annex B format. We're adding complexity and latency for... tradition?
fMP4 made sense for HLS and DASH - you need seekable segments for adaptive streaming. But for real-time streaming where we're sending 3-frame chunks? It's overhead. Each fMP4 segment has moof/mdat boxes, track headers, sample tables. That's hundreds of bytes of metadata for every 100KB of video. At 30fps, that's megabytes of waste per minute.
We tested sending raw NAL units with simple 4-byte length prefixes. Latency dropped by 50-100ms. Bandwidth usage decreased by 3-5%. CPU usage on both server and client went down. The only reason we kept fMP4 was compatibility - some MSE implementations are picky about raw H.264. But for a pure RTWebSocket implementation? Raw NAL units make more sense.
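For the curious, the framing we tested is trivial - a sketch of the 4-byte length prefixing (function names are illustrative, not from our codebase):

// Sketch: length-prefixed NAL framing (4-byte big-endian prefix)
function encodeNALUnit(nalu) {
  const framed = new Uint8Array(4 + nalu.byteLength);
  new DataView(framed.buffer).setUint32(0, nalu.byteLength, false);
  framed.set(new Uint8Array(nalu), 4);
  return framed.buffer;
}

function* decodeNALUnits(buffer) {
  const view = new DataView(buffer);
  let offset = 0;
  while (offset + 4 <= buffer.byteLength) {
    const length = view.getUint32(offset, false);
    yield buffer.slice(offset + 4, offset + 4 + length);
    offset += 4 + length;
  }
}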
Similar to our MoQ implementation, MediaMTX returns a stream object before it's fully initialized:
// The same timing issue we found in MoQ!
func (s *RTWebSocketSession) setupStream() error {
res, err := s.handler.pathManager.AddReader(PathAddReaderReq{
Author: s,
PathName: s.pathName,
})
if err != nil {
return err
}
// CRITICAL: Wait for stream.Desc to be populated
for attempts := 0; attempts < 50; attempts++ {
if res.Stream.Desc != nil {
break
}
time.Sleep(100 * time.Millisecond)
}
if res.Stream.Desc == nil {
return fmt.Errorf("stream not ready after 5 seconds")
}
// Now safe to setup codec parameters
return s.setupCodecs(res.Stream)
}
Media Source Extensions (MSE) have complex buffer requirements:
class MSEBufferManager {
constructor(sourceBuffer, video) {
this.sourceBuffer = sourceBuffer;
this.video = video;
this.maxBufferSize = 30; // seconds
this.targetBufferSize = 10; // seconds
this.minBufferSize = 2; // seconds
}
async manageBuffer() {
if (!this.sourceBuffer.buffered.length) return;
const buffered = this.sourceBuffer.buffered;
const currentTime = this.video.currentTime;
// Calculate buffer health
const bufferEnd = buffered.end(buffered.length - 1);
const bufferStart = buffered.start(0);
const bufferAhead = bufferEnd - currentTime;
const bufferBehind = currentTime - bufferStart;
// Remove old buffer to prevent memory issues
if (bufferBehind > this.maxBufferSize / 2) {
const removeEnd = currentTime - this.minBufferSize;
if (removeEnd > bufferStart) {
await this.removeBuffer(bufferStart, removeEnd);
}
}
// Handle quota exceeded
if (bufferAhead > this.maxBufferSize) {
// Clear future buffer that's too far ahead
const removeStart = currentTime + this.targetBufferSize;
if (removeStart < bufferEnd) {
await this.removeBuffer(removeStart, bufferEnd);
}
}
}
async removeBuffer(start, end) {
if (this.sourceBuffer.updating) {
await new Promise(r =>
this.sourceBuffer.addEventListener('updateend', r, { once: true })
);
}
try {
this.sourceBuffer.remove(start, end);
await new Promise(r =>
this.sourceBuffer.addEventListener('updateend', r, { once: true })
);
} catch (error) {
console.error('Buffer remove error:', error);
}
}
}
// WebSocket Binary Message Format
// We use a simple TLV (Type-Length-Value) format
const MessageTypes = {
INIT: 0x01, // Codec initialization
SEGMENT: 0x02, // fMP4 segment
KEYFRAME: 0x03, // Keyframe notification
AUDIO: 0x04, // Audio segment
METADATA: 0x05, // Stream metadata
CONTROL: 0x06, // Flow control
};
class BinaryProtocol {
static encodeMessage(type, data) {
const header = new ArrayBuffer(5);
const view = new DataView(header);
view.setUint8(0, type);
view.setUint32(1, data.byteLength, false); // big-endian
// Combine header and data
const message = new Uint8Array(header.byteLength + data.byteLength);
message.set(new Uint8Array(header), 0);
message.set(new Uint8Array(data), header.byteLength);
return message.buffer;
}
static decodeMessage(buffer) {
const view = new DataView(buffer);
const type = view.getUint8(0);
const length = view.getUint32(1, false);
const data = buffer.slice(5, 5 + length);
return { type, data };
}
static createInitMessage(codecString, width, height) {
const encoder = new TextEncoder();
const codecBytes = encoder.encode(codecString);
const buffer = new ArrayBuffer(codecBytes.length + 8);
const view = new DataView(buffer);
view.setUint16(0, width, false);
view.setUint16(2, height, false);
view.setUint32(4, codecBytes.length, false);
new Uint8Array(buffer).set(codecBytes, 8);
return this.encodeMessage(MessageTypes.INIT, buffer);
}
}
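A quick round trip through that codec shows the framing overhead is exactly five bytes per message:

// Round trip: encode an INIT message, then decode it again
const initMsg = BinaryProtocol.createInitMessage('avc1.64001f', 1920, 1080);
const { type, data } = BinaryProtocol.decodeMessage(initMsg);
console.assert(type === MessageTypes.INIT);
console.assert(new DataView(data).getUint16(0, false) === 1920); // width survives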
// Critical for smooth playback
class TimestampManager {
constructor() {
this.baseTime = null;
this.lastTimestamp = null;
this.driftCorrection = 0;
this.fps = 30;
this.frameDuration = 1000 / this.fps;
}
processTimestamp(rtpTimestamp) {
// Convert RTP timestamp to milliseconds
const msTimestamp = rtpTimestamp / 90; // 90kHz clock
if (!this.baseTime) {
this.baseTime = performance.now() - msTimestamp;
this.lastTimestamp = msTimestamp;
return msTimestamp;
}
// Detect and handle timestamp jumps
const expectedTimestamp = this.lastTimestamp + this.frameDuration;
const drift = msTimestamp - expectedTimestamp;
if (Math.abs(drift) > 1000) {
// Large jump - reset base time
console.warn('Timestamp jump detected:', drift);
this.baseTime = performance.now() - msTimestamp;
} else if (Math.abs(drift) > 50) {
// Small drift - apply correction
this.driftCorrection += drift * 0.1; // Smooth correction
}
this.lastTimestamp = msTimestamp;
return msTimestamp + this.driftCorrection;
}
getCurrentPlaybackTime() {
if (!this.baseTime) return 0;
return performance.now() - this.baseTime;
}
}
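A sanity check of the 90kHz math, using the class as written:

// First call sets the base time; later calls track drift
const tm = new TimestampManager();
console.log(tm.processTimestamp(90000)); // 1000ms
console.log(tm.processTimestamp(93000)); // ~1033ms - one frame later at 30fps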
| Component | Latency | Notes |
|---|---|---|
| Encoding (FFmpeg) | 100-200ms | x264 ultrafast preset |
| Frame Batching | 100ms | 3 frames @ 30fps |
| Network (WebSocket) | 10-50ms | TCP overhead included |
| fMP4 Segmentation | 5-10ms | Server-side processing |
| MSE Buffering | 500-1000ms | Browser requirement |
| Decode & Render | 16-33ms | Hardware accelerated |
| Total | 1-2 seconds | End-to-end |
Performance Comparison (1080p @ 30fps stream):
Traditional WebSocket:
┌────────────────────────────────────────┐
│ Memory Usage Over Time                 │
│                                        │
│ 500MB │                        ████████│ ← Buffer bloat!
│ 400MB │                  ██████        │
│ 300MB │            ██████              │
│ 200MB │      ██████                    │
│ 100MB │██████                          │
│   0MB └────────────────────────────────┘
│       0    30s    60s    90s   120s    │
└────────────────────────────────────────┘
WebSocketStream:
┌────────────────────────────────────────┐
│ Memory Usage Over Time                 │
│                                        │
│ 500MB │                                │
│ 400MB │                                │
│ 300MB │                                │
│ 200MB │                                │
│ 100MB │████████████████████████████████│ ← Stable!
│   0MB └────────────────────────────────┘
│       0    30s    60s    90s   120s    │
└────────────────────────────────────────┘
class AdaptiveBufferController {
constructor(player) {
this.player = player;
this.targetLatency = 2.0; // seconds
this.minLatency = 1.0;
this.maxLatency = 5.0;
this.adjustmentRate = 0.1;
}
updatePlaybackRate() {
const buffered = this.player.video.buffered;
if (buffered.length === 0) return;
const currentTime = this.player.video.currentTime;
const bufferEnd = buffered.end(buffered.length - 1);
const latency = bufferEnd - currentTime;
// Adjust playback rate to maintain target latency
if (latency > this.targetLatency + 0.5) {
// Speed up to reduce latency
this.player.video.playbackRate = Math.min(1.5, 1 + this.adjustmentRate);
} else if (latency < this.targetLatency - 0.5) {
// Slow down to build buffer
this.player.video.playbackRate = Math.max(0.5, 1 - this.adjustmentRate);
} else {
// Return to normal
this.player.video.playbackRate = 1.0;
}
// Emergency measures
if (latency > this.maxLatency) {
// Jump forward to reduce latency
this.player.video.currentTime = bufferEnd - this.targetLatency;
} else if (latency >= this.minLatency && this.player.video.paused) {
// Resume once we have enough buffer again
this.player.video.play();
}
}
}
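In practice this runs on a short timer; a hypothetical wiring, assuming player is the MSE player object from earlier (the 250ms interval is arbitrary):

// Re-evaluate latency a few times per second
const controller = new AdaptiveBufferController(player);
setInterval(() => controller.updatePlaybackRate(), 250);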
// Fallback chain for maximum compatibility
class StreamingClient {
async connect(url) {
// Try WebSocketStream first (best performance)
if (typeof WebSocketStream !== 'undefined') {
try {
return await this.connectWebSocketStream(url);
} catch (e) {
console.warn('WebSocketStream failed, trying RTWebSocket');
}
}
// Try RTWebSocket (our custom protocol)
if (typeof WebSocket !== 'undefined') {
try {
return await this.connectRTWebSocket(url);
} catch (e) {
console.warn('RTWebSocket failed, trying standard WebSocket');
}
}
// Fall back to standard WebSocket
try {
return await this.connectStandardWebSocket(url);
} catch (e) {
console.error('All WebSocket methods failed');
throw e;
}
}
}
Here's the key insight: RTWebSocket shouldn't be viewed as a fallback for when MoQ fails. It's the core protocol layer that makes ALL our transports work reliably for streaming. Every transport - whether it's WebSocket, QUIC, or WebTransport - needs the same application-level semantics: flow separation, prioritization, acknowledgments, graceful degradation.
Our unified streaming architecture puts RTWebSocket at its core.
This isn't multiple protocols - it's one protocol (RTWebSocket) with transport-specific optimizations. The streaming logic stays the same whether you're on Chrome with WebTransport or Safari with WebSocket. That's the power of treating RTWebSocket as the core, not the fallback.
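To make that concrete, here's a minimal sketch of the layering - the names are our illustration, not the rtwebsocket API:

// Illustrative sketch: one streaming layer, pluggable transports.
// None of these names come from the rtwebsocket repository.
class StreamLayer {
  constructor(transport) {
    this.transport = transport; // anything with a send(bytes) method
  }
  sendFrame(flowId, bytes) {
    // Flow separation, ACK bookkeeping, and prioritization would live
    // here - identical regardless of the transport underneath
    this.transport.send(bytes);
  }
}

// Wrapping a plain WebSocket to fit the interface
function wrapWebSocket(url) {
  const ws = new WebSocket(url);
  ws.binaryType = 'arraybuffer';
  return { send: (bytes) => ws.send(bytes) };
}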
// WebSocket streaming may need a dedicated buffer server
// due to different requirements than MoQ
type BufferServer struct {
// WebSocket needs longer buffers for MSE
videoRingBuffer *RingBuffer // 5-10 seconds
audioRingBuffer *RingBuffer // 5-10 seconds
// MoQ needs minimal buffering
moqVideoBuffer *RingBuffer // 100-200ms
moqAudioBuffer *RingBuffer // 100-200ms
}
// Potential architecture:
// MediaMTX Core -> Buffer Server -> WebSocket Clients
// \-> MoQ Server -> MoQ Clients
# mediamtx.yml additions for WebSocket support
webSocket: yes
webSocketAddress: :4446
webSocketProtocol: rtwebsocket # or 'standard' or 'stream'
# Buffer management
webSocketBufferSize: 5s # MSE requirement
webSocketSegmentSize: 3 # frames per segment
webSocketMaxConnections: 100
# Integration with existing MoQ
moq: yes
moqFallbackToWebSocket: yes # Auto-fallback
moqUnifiedPort: 443 # Single port with protocol detection
// Proposed unified transport detection
func (s *Server) handleConnection(w http.ResponseWriter, r *http.Request) {
// Detect client capabilities
if r.Header.Get("Sec-WebSocket-Version") != "" {
// WebSocket connection
if r.Header.Get("Sec-WebSocket-Protocol") == "rtwebsocket" {
s.handleRTWebSocket(w, r)
} else {
s.handleStandardWebSocket(w, r)
}
} else if r.ProtoMajor == 3 {
// HTTP/3 - could be WebTransport
if r.Header.Get("Sec-WebTransport") != "" {
s.handleWebTransport(w, r)
}
} else {
// Regular HTTP - serve player page
s.servePlayerHTML(w, r)
}
}
// Future: Tunnel WebTransport through WebSocket for firewall traversal
class WebTransportTunnel {
constructor(websocketUrl) {
this.ws = new WebSocket(websocketUrl);
this.streams = new Map();
}
async createWebTransport() {
// Negotiate CONNECT-UDP through WebSocket
await this.negotiateTunnel();
// Create virtual WebTransport over WebSocket
return new VirtualWebTransport(this.ws);
}
async negotiateTunnel() {
// Send CONNECT-UDP request
this.ws.send(JSON.stringify({
method: 'CONNECT-UDP',
protocol: 'webtransport',
version: '1'
}));
// Wait for acceptance
return new Promise((resolve, reject) => {
this.ws.onmessage = (event) => {
const response = JSON.parse(event.data);
if (response.status === 'accepted') {
resolve();
} else {
reject(new Error(response.error));
}
};
});
}
}
Here's our vision: RTWebSocket isn't just another transport option. It's the core protocol that unifies ALL our streaming transports. Think of it as the application layer that provides consistent streaming semantics regardless of the underlying transport:
One protocol implementation, multiple transport optimizations. The client and server negotiate the best available transport, but the streaming logic remains identical. The payoff: RTWebSocket becomes the lingua franca of real-time streaming - the protocol that finally brings order to the chaos of browser-based video delivery.
| Protocol | Latency | Browser Support | Reality Check |
|---|---|---|---|
| MoQ (WebTransport) | 200-300ms | Chrome/Edge only | Dead on arrival without Safari |
| RTWebSocket | 1-2 seconds | All browsers | The pragmatic choice |
| WebRTC | 500ms-1s | All browsers | Complex STUN/TURN setup |
| LL-HLS | 3-5 seconds | All browsers | Better than nothing |
| HLS | 10-30 seconds | All browsers | Legacy, but reliable |
The verdict: 1-second latency that works everywhere beats 200ms that only Chrome users can see.
WebSocket streaming isn't perfect, but it works everywhere. While we continue pushing the boundaries with Media over QUIC (draft-ietf-moq-transport-05) and WebTransport (W3C WebTransport API), RTWebSocket over WebSocket provides a reliable solution that ensures no viewer is left behind. The combination of our MoQ implementation for cutting-edge browsers and RTWebSocket for universal compatibility gives us the best of both worlds.
This work represents our commitment to pragmatic streaming solutions. We're not waiting for perfect standards or universal browser support. We're building what works today while preparing for tomorrow. The RTWebSocket protocol and WebSocketStream implementation may not be standards, but they solve real problems for real users.
Our MediaMTX contributions will continue to evolve. Whether these WebSocket experiments become part of the mainline codebase or remain as reference implementations, they demonstrate our commitment to advancing open-source streaming technology.
Experience both WebSocket streaming implementations: