Everyone knows HLS means 10-30 second latency. It's just accepted as fact in the streaming world. Apple recommends 6-second segments, players buffer three segments before starting playback, add some CDN caching, and boom - you're looking at your stream from half a minute in the past. But what if I told you we got HLS down to 900 milliseconds glass-to-glass?
At WINK Streaming, we've been experimenting with pushing HLS to its absolute limits. Not because we think ultra-low-latency HLS is the future (spoiler: it's probably not), but because understanding these limits teaches us valuable lessons about streaming architecture that apply to next-generation protocols like Media over QUIC.
* Chrome + hls.js: 900ms glass-to-glass
* Safari on macOS: around 1100ms
* Safari on iPhone: around 1400ms on a good day
* iOS occasionally spikes to 2+ seconds when it decides to rebuffer for no apparent reason
Let me be clear: these ARE real numbers from our testing. We used a simple clock overlay on the source video and compared it to what appeared on screen. Old school, but effective. Here's the FFmpeg command that started it all:
# The magic FFmpeg string that started our journey
# Note the aggressive keyframe settings - this is crucial:
#   -g 6            keyframe every 6 frames (200ms at 30fps)
#   -keyint_min 6   MINIMUM keyframe interval - no negotiation
#   -sc_threshold 0 disable scene detection - we control keyframes
ffmpeg -f lavfi -re -i testsrc2=size=640x360:rate=30 \
  -vf "drawtext=text='%{localtime\:%T.%3N}':x=10:y=10:fontsize=24:fontcolor=white:box=1:boxcolor=black" \
  -c:v libx264 \
  -preset ultrafast \
  -tune zerolatency \
  -g 6 \
  -keyint_min 6 \
  -sc_threshold 0 \
  -b:v 1500k \
  -maxrate 1500k \
  -bufsize 750k \
  -pix_fmt yuv420p \
  -f mpegts \
  "udp://127.0.0.1:1234?pkt_size=1316"
The cornerstone of our approach was using 200 millisecond parts. Yes, you read that right - 200ms. That's five parts per second, five parts per one-second segment. Most people would say that's insane. They'd be right. But it works.
// This is where the magic happens - 200ms parts
const (
    PartDuration    = 200 * time.Millisecond // Crazy? Yes. Works? Also yes.
    SegmentDuration = 1 * time.Second
    PartsPerSegment = 5
)

// Part structure - simple but effective
type Part struct {
    SequenceNumber int
    PartNumber     int
    Data           []byte
    Duration       time.Duration
    Independent    bool // Is this a keyframe? Critical for iOS
}
Why 200ms? We tried 100ms first. Players hated it. Safari literally refused to play. Chrome stuttered like a broken record. 150ms was slightly better but still unstable. 200ms was the sweet spot where players said "okay, this is stupid, but I'll try."
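Before going further, it helps to see why part duration dominates the total. Here's a back-of-the-envelope sketch - our assumptions, not anything from a spec: the player holds back two parts (matching our PART-HOLD-BACK=0.4), and encoding plus packaging plus first-byte delivery costs roughly 150ms:

package main

import (
    "fmt"
    "time"
)

// latencyFloor estimates best-case glass-to-glass latency for a given
// part duration: one part being filled + two parts of player hold-back
// + fixed pipeline overhead (encode, package, first HTTP byte).
func latencyFloor(part time.Duration) time.Duration {
    holdBack := 2 * part                    // player buffers two parts before rendering
    pipeline := 150 * time.Millisecond      // rough estimate for our setup
    return part + holdBack + pipeline
}

func main() {
    for _, ms := range []int{100, 200, 350, 500} {
        part := time.Duration(ms) * time.Millisecond
        fmt.Printf("part=%v -> floor≈%v\n", part, latencyFloor(part))
    }
}

At 200ms parts that floor sits around 750ms, which is why our measured 900ms isn't magic - we were already scraping the theoretical limit.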
Here's the thing about ultra-low latency: every millisecond counts. We couldn't use off-the-shelf libraries because they all assumed reasonable use cases. We were being unreasonable. So we built our own everything.
First challenge: getting data from the network into memory FAST. Like, really fast. Lock-free fast.
type RingBuffer struct {
    buffer   []byte
    size     uint64
    writePos uint64 // Atomic write position - no locks!
    readPos  uint64 // Atomic read position
}

// NewRingBuffer allocates everything up front - zero allocations on the
// hot path. One writer goroutine, one reader goroutine. That's the deal.
func NewRingBuffer(size uint64) *RingBuffer {
    return &RingBuffer{buffer: make([]byte, size), size: size}
}

func (rb *RingBuffer) Write(data []byte) error {
    dataLen := uint64(len(data))
    writePos := atomic.LoadUint64(&rb.writePos)
    // Check if we have space - this better be true or we're screwed
    readPos := atomic.LoadUint64(&rb.readPos)
    available := rb.size - (writePos - readPos)
    if dataLen > available {
        // Uh oh, we're falling behind. This is bad.
        return fmt.Errorf("buffer overflow: need %d, have %d", dataLen, available)
    }
    // Copy data - handle wrap-around because rings gonna ring
    startIdx := writePos % rb.size
    endIdx := (writePos + dataLen) % rb.size
    if endIdx > startIdx {
        copy(rb.buffer[startIdx:endIdx], data)
    } else {
        // Wrap around - split the copy
        firstPart := rb.size - startIdx
        copy(rb.buffer[startIdx:], data[:firstPart])
        copy(rb.buffer[:endIdx], data[firstPart:])
    }
    // Update atomically - this is the secret sauce
    atomic.AddUint64(&rb.writePos, dataLen)
    return nil
}
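The consumer side is the mirror image. We didn't show it above, so here's a minimal sketch under the same single-producer/single-consumer rules - one goroutine writes, one reads, and nobody else touches the positions:

// Read copies up to len(p) buffered bytes into p and returns the count.
// Safe only with exactly one reader goroutine.
func (rb *RingBuffer) Read(p []byte) int {
    readPos := atomic.LoadUint64(&rb.readPos)
    writePos := atomic.LoadUint64(&rb.writePos)
    n := writePos - readPos // bytes available
    if n == 0 {
        return 0
    }
    if n > uint64(len(p)) {
        n = uint64(len(p))
    }
    start := readPos % rb.size
    if start+n <= rb.size {
        copy(p, rb.buffer[start:start+n])
    } else {
        // Wrap-around: two copies, same as the write path
        first := rb.size - start
        copy(p, rb.buffer[start:])
        copy(p[first:], rb.buffer[:n-first])
    }
    atomic.AddUint64(&rb.readPos, n) // publish consumption
    return int(n)
}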
This ring buffer was CRITICAL. We're talking about processing packets every few milliseconds. A single mutex lock would add precious microseconds. Microseconds add up to milliseconds. Milliseconds add up to "why is my stream delayed?"
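Don't take the mutex claim on faith - it's easy to measure yourself. A minimal benchmark sketch (the package name and absolute numbers are illustrative; what matters is the relative gap once the UDP reader and the demuxer start contending):

package ringbuf

import (
    "sync"
    "sync/atomic"
    "testing"
)

// BenchmarkMutexPos simulates guarding a position counter with a lock.
func BenchmarkMutexPos(b *testing.B) {
    var mu sync.Mutex
    var pos uint64
    for i := 0; i < b.N; i++ {
        mu.Lock()
        pos++
        mu.Unlock()
    }
    _ = pos
}

// BenchmarkAtomicPos is the lock-free equivalent.
func BenchmarkAtomicPos(b *testing.B) {
    var pos uint64
    for i := 0; i < b.N; i++ {
        atomic.AddUint64(&pos, 1)
    }
}

Run it with go test -bench=. - uncontended, both are nanoseconds, but under contention the lock is exactly where those microseconds go.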
You know what's fun? Parsing H.264 NAL units in real-time to detect keyframes. Said no one ever. But we had to do it because marking parts as "independent" is crucial for player compatibility, especially on iOS.
func (p *H264Parser) Parse(data []byte) []NALUnit {
    p.buffer = append(p.buffer, data...)
    nalUnits := []NALUnit{}
    // Find start codes - the eternal struggle
    // Could be 0x00 0x00 0x00 0x01 OR 0x00 0x00 0x01
    // Because standards are more like guidelines
    i := 0
    for i < len(p.buffer)-4 {
        if p.buffer[i] == 0 && p.buffer[i+1] == 0 {
            startCodeLen := 0
            if p.buffer[i+2] == 0 && p.buffer[i+3] == 1 {
                startCodeLen = 4 // Long start code
            } else if p.buffer[i+2] == 1 {
                startCodeLen = 3 // Short start code
            }
            if startCodeLen > 0 {
                nalStart := i + startCodeLen
                nalType := p.buffer[nalStart] & 0x1F
                nal := NALUnit{Type: nalType}
                // The golden question: is this a keyframe?
                switch nalType {
                case 5: // IDR frame - JACKPOT!
                    nal.IsIDR = true
                case 7: // SPS
                    nal.IsSPS = true
                case 8: // PPS
                    nal.IsPPS = true
                }
                nalUnits = append(nalUnits, nal)
                // ... more parsing code (find the NAL end, slice out the payload) ...
                i = nalStart
                continue
            }
        }
        i++
    }
    return nalUnits
}
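The parsed NAL types feed straight into the Independent flag on each part. Here's the glue, sketched - partIsIndependent is our name for illustration, not a library call:

// partIsIndependent reports whether a part can be advertised with
// INDEPENDENT=YES: it must contain an IDR so a player can join there.
func partIsIndependent(nals []NALUnit) bool {
    for _, nal := range nals {
        if nal.IsIDR {
            return true
        }
    }
    return false
}

// In the part builder:
//   nals := parser.Parse(chunk)
//   part.Independent = partIsIndependent(nals)

With -g 6 forcing a keyframe every 200ms, every part should come out independent; when one doesn't, the encoder and the packager have drifted out of alignment.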
HLS playlists are supposed to be simple. Ours... weren't. We had to use every Low-Latency HLS extension Apple ever invented, plus some creative interpretation of the spec.
#EXTM3U
#EXT-X-VERSION:9
#EXT-X-TARGETDURATION:1
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=0.4,CAN-SKIP-UNTIL=6.0
#EXT-X-PART-INF:PART-TARGET=0.200
#EXT-X-MEDIA-SEQUENCE:1045
#EXT-X-MAP:URI="/init.mp4"
# Complete segments from the past
#EXTINF:1.000,
/s/1045.m4s
#EXTINF:1.000,
/s/1046.m4s
#EXTINF:1.000,
/s/1047.m4s
#EXTINF:1.000,
/s/1048.m4s
#EXTINF:1.000,
/s/1049.m4s
# Here's where it gets interesting - the newest segment is advertised
# part by part as it's being created
#EXT-X-PART:DURATION=0.200,URI="/p/1050/0.m4s",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.200,URI="/p/1050/1.m4s"
#EXT-X-PART:DURATION=0.200,URI="/p/1050/2.m4s"
#EXT-X-PART:DURATION=0.200,URI="/p/1050/3.m4s"
#EXT-X-PART:DURATION=0.200,URI="/p/1050/4.m4s",INDEPENDENT=YES
# Hint the part that doesn't exist yet so players can pre-request it
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="/p/1051/0.m4s"
# Tell the player what's available elsewhere for fast switching
#EXT-X-RENDITION-REPORT:URI="/index.m3u8",LAST-MSN=1044,LAST-PART=4
This playlist updates FIVE TIMES PER SECOND. That's not a typo. Every 200ms, a new part appears. The player has to keep up or get left behind.
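Generating that playlist is just string assembly on a 200ms timer. A simplified sketch of what our generator looked like - field names follow the Server struct shown later, fmt and strings are assumed imported, and the real version also emits the preload hint and rendition report:

func (s *Server) GeneratePlaylist() string {
    s.segmentsMu.RLock()
    defer s.segmentsMu.RUnlock()

    var b strings.Builder
    b.WriteString("#EXTM3U\n#EXT-X-VERSION:9\n#EXT-X-TARGETDURATION:1\n")
    b.WriteString("#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=0.4,CAN-SKIP-UNTIL=6.0\n")
    b.WriteString("#EXT-X-PART-INF:PART-TARGET=0.200\n")
    first := s.lastSegNum - 5 // keep five complete segments of history
    fmt.Fprintf(&b, "#EXT-X-MEDIA-SEQUENCE:%d\n", first)
    b.WriteString("#EXT-X-MAP:URI=\"/init.mp4\"\n")

    // Complete segments, oldest first
    for n := first; n < s.lastSegNum; n++ {
        fmt.Fprintf(&b, "#EXTINF:1.000,\n/s/%d.m4s\n", n)
    }
    // Every finished part of the in-progress segment
    for i, p := range s.segments[s.lastSegNum].Parts {
        fmt.Fprintf(&b, "#EXT-X-PART:DURATION=0.200,URI=\"/p/%d/%d.m4s\"", s.lastSegNum, i)
        if p.Independent {
            b.WriteString(",INDEPENDENT=YES")
        }
        b.WriteByte('\n')
    }
    return b.String()
}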
Here's where things get interesting (read: frustrating). Different platforms handled our ultra-low-latency stream VERY differently.
Platform | Avg Latency | Stability | Quirks |
---|---|---|---|
Chrome + hls.js | 900ms | Excellent | Just works. Seriously. No drama. |
Safari (macOS) | 1100ms | Good | Occasionally decides to buffer for "safety" |
Safari (M1/M2 iPad) | 1100ms | Excellent | Desktop-class processor handles it perfectly |
Safari (A-series iPad) | 1200ms | Good | Stable but occasional hiccups on older models |
Safari (iPhone) | 1400ms | Poor | Random stops, refuses to start, general chaos |
VLC | 950ms | Good | Old reliable. Minimal config needed. |
iPhones were our nemesis. There's a pattern we noticed: the same stream that plays perfectly on an iPad will randomly fail on an iPhone. Same iOS version, same Safari, different behavior. But here's the kicker - M1 and M2 iPads? Zero issues. None. They just chewed through 200ms parts like it was nothing.
Our theory? It's not just about power saving - it's about processing headroom. The M-series chips have so much spare capacity that they can handle the constant playlist updates and rapid part fetching without breaking a sweat. iPhones, especially older ones with A14/A15 chips, are probably hitting some internal threshold where iOS says "this is using too much CPU for video playback" and throttles the requests. The M-series iPads? They're basically desktop processors masquerading as tablet chips. They don't even notice the load.
We actually tested this theory: M1 iPad Pro played for 6 hours straight, rock solid 1100ms latency. iPhone 14? Started stuttering after 20 minutes. iPhone 12? Gave up after 5 minutes and just... stopped requesting parts. No error, no warning, just silence.
// iPhone-specific workarounds we had to implement
func (s *Server) GenerateIOSPlaylist() string {
    // Increase part hold back for iOS - they need more time apparently
    partHoldBack := PartDuration.Seconds() * 3 // 600ms instead of 400ms
    // Add more historical segments because iOS likes to seek backwards
    // No idea why. It just does.
    historicalSegments := 10 // vs 5 for everyone else
    // Include ALL parts, even incomplete ones
    // iOS handles missing parts better than gaps in the sequence
    // Again, no idea why. Welcome to iOS development.
    // ... rest of playlist generation ...
}
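Serving the right playlist to the right device came down to User-Agent sniffing - crude, but good enough for an experiment. A sketch, with handlePlaylistByUA as a hypothetical wrapper and strings assumed imported:

// handlePlaylistByUA routes iPhones to the conservative playlist.
// iPads are deliberately excluded - they handled the aggressive one fine.
func (s *Server) handlePlaylistByUA(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/vnd.apple.mpegurl")
    if strings.Contains(r.UserAgent(), "iPhone") {
        w.Write([]byte(s.GenerateIOSPlaylist()))
        return
    }
    w.Write([]byte(s.GeneratePlaylist()))
}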
You want to know what kills ultra-low-latency HLS faster than anything? CDNs. Traditional CDN architecture is the antithesis of everything we're trying to achieve.
Here's what breaks: even with `Cache-Control: no-cache` on every response, edge servers still add 50-200ms of latency. We tried every trick in the book:
# Nginx config attempting to disable ALL caching
location ~ \.m3u8$ {
    proxy_cache off;                 # Please don't cache
    proxy_cache_bypass $http_pragma; # Really, don't cache
    proxy_cache_revalidate on;       # If you cached, check again
    proxy_cache_valid 200 0s;        # 0 seconds = don't cache
    add_header Cache-Control "no-cache, no-store, must-revalidate"; # DON'T CACHE
    add_header X-Accel-Expires 0;    # Nginx, please don't cache
    # Spoiler: It still cached sometimes
}
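How do you even prove an edge is caching you? We measured from the client side. Here's a probe sketch (assuming io, log, net/http, and time are imported): fetch the playlist on a tight loop and flag whenever the body stops changing:

// probePlaylist polls the playlist every 100ms. At five updates per
// second, an unchanged body for >400ms means something is caching.
func probePlaylist(url string) {
    client := &http.Client{Timeout: 2 * time.Second}
    var last string
    lastChange := time.Now()
    for {
        resp, err := client.Get(url)
        if err != nil {
            log.Printf("[PROBE] fetch failed: %v", err)
            time.Sleep(100 * time.Millisecond)
            continue
        }
        body, _ := io.ReadAll(resp.Body)
        resp.Body.Close()
        if s := string(body); s != last {
            last = s
            lastChange = time.Now()
        } else if stale := time.Since(lastChange); stale > 400*time.Millisecond {
            log.Printf("[PROBE] playlist stale for %v (Age: %q) - an edge is caching",
                stale, resp.Header.Get("Age"))
        }
        time.Sleep(100 * time.Millisecond)
    }
}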
Let me be brutally honest: this is NOT production ready. Not even close. Here's what would need to happen to make this real:
* Fix the 10,000 edge cases. Handle network failures. Detect encoder problems. Stop iOS from randomly dying.
* Multiple bitrates. Quality switching. Bandwidth detection. You know, the basics that every player expects.
* One server handling 10 streams is cute. Try 10,000 concurrent viewers. Horizontal scaling. Load balancing. Distributed storage.
* Monitoring. Analytics. Alerting. DRM (ugh). Support tickets from users whose 2015 Android phone won't play the stream.
Realistically? You're looking at a team of 3-5 engineers working for 6 months to make this production-worthy. And even then, you're fighting an uphill battle against player quirks, CDN limitations, and the fundamental reality that HLS was never designed for this.
For those brave enough to try this themselves, here's the server setup that achieved our best results:
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "log"
    "net"
    "net/http"
    "strconv" // needed by the blocking playlist handler below
    "sync"
    "sync/atomic"
    "time"
)

func main() {
    log.Println("🚀 WINK Streaming Ultra-Low-Latency HLS")
    log.Println("   Target: sub-1-second glass-to-glass")
    log.Println("   Reality: 900-1400ms depending on player")
    log.Println("   iPhone: Good luck with that")
    server := NewServer()
    // Start all the goroutines - this is where the magic happens
    go server.receiveUDP()         // Ingest from encoder
    go server.demuxPipeline()      // Parse MPEG-TS
    go server.processFrames()      // Detect keyframes, extract H.264
    go server.createSegments()     // Package into fMP4
    go server.cleanupOldSegments() // Don't run out of RAM
    // HTTP server for playlist and segments
    http.HandleFunc("/index.m3u8", server.handlePlaylist)
    http.HandleFunc("/p/", server.handlePart)       // Parts endpoint
    http.HandleFunc("/s/", server.handleSegment)    // Segments endpoint
    http.HandleFunc("/init.mp4", server.handleInit) // Init segment
    log.Fatal(http.ListenAndServe(":8445", nil))
}
The blocking playlist request handler is where the Low-Latency magic happens:
func (s *Server) handlePlaylist(w http.ResponseWriter, r *http.Request) {
    // Parse Apple's special query parameters
    msnStr := r.URL.Query().Get("_HLS_msn")   // Media Sequence Number
    partStr := r.URL.Query().Get("_HLS_part") // Part number
    if msnStr != "" && partStr != "" {
        // Player is requesting a specific part that doesn't exist yet
        // BLOCK until it's ready (this is the secret)
        msn, _ := strconv.Atoi(msnStr)
        part, _ := strconv.Atoi(partStr)
        // Poll every 50ms - aggressive but necessary
        ticker := time.NewTicker(50 * time.Millisecond)
        defer ticker.Stop()
        timeout := time.After(5 * time.Second) // Don't block forever
        for {
            select {
            case <-timeout:
                // Give up and return current playlist
                w.Write([]byte(s.GeneratePlaylist()))
                return
            case <-ticker.C:
                // Check if requested part exists now
                if s.hasPartReady(msn, part) {
                    // YES! Send it immediately
                    w.Header().Set("Cache-Control", "no-cache")
                    w.Write([]byte(s.GeneratePlaylist()))
                    return
                }
                // Not ready yet, keep waiting...
            }
        }
    }
    // Non-blocking request - just return current playlist
    w.Write([]byte(s.GeneratePlaylist()))
}
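From the player's side, the same dance is a long-poll. A minimal client sketch - fetchNextPlaylist is our illustrative name; hls.js and Safari do the equivalent internally (fmt, io, and net/http assumed imported):

// fetchNextPlaylist asks for a playlist containing the given part of
// the given media sequence; the server holds the request until it exists.
func fetchNextPlaylist(base string, msn, part int) (string, error) {
    url := fmt.Sprintf("%s/index.m3u8?_HLS_msn=%d&_HLS_part=%d", base, msn, part)
    resp, err := http.Get(url) // blocks server-side, up to the 5s timeout
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    return string(body), err
}

Instead of hammering the server five times per second and usually being early, the player asks once and is answered the moment the part lands.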
When you're operating at these time scales, every operation counts. Here's what we measured:
Operation | Time (μs) | Impact |
---|---|---|
UDP packet receive | 5-10 | Negligible |
Ring buffer write | 0.5-1 | Negligible |
MPEG-TS parsing | 20-50 | Moderate |
H.264 NAL parsing | 100-200 | Significant |
fMP4 packaging | 50-100 | Moderate |
HTTP response | 500-1000 | Major |
Add these up and you're looking at roughly 1-2ms per part just in processing. At five parts per second, that's 5-10ms of pure processing overhead every second. Not terrible, but when you're targeting sub-second latency, every millisecond is precious.
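For completeness, attributing those microseconds was nothing fancy: wrap each pipeline stage in a wall-clock timer and accumulate. A sketch - the stage names and the packFMP4 helper are illustrative:

// timed runs fn and accumulates its wall-clock cost under a stage name.
// (time.Since uses Go's monotonic clock, so this is safe at microsecond scale.)
func timed(stage string, timings map[string]time.Duration, fn func()) {
    start := time.Now()
    fn()
    timings[stage] += time.Since(start)
}

// Inside the part-building loop it reads like:
//   timed("nal-parse", timings, func() { nals = parser.Parse(chunk) })
//   timed("fmp4-pack", timings, func() { data = packFMP4(nals) })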
Apple's official Low-Latency HLS specification is... interesting. Let's look at what they actually recommend:
From Apple's LL-HLS documentation:
"A partial segment duration SHOULD be between 0.2 and 0.5 seconds."
"The server SHOULD produce partial segments with similar durations."
"Clients MAY use blocking playlist reload."
Notice all those SHOULDs and MAYs? That's not a specification - that's a suggestion. The loose language means every player implements it differently. Apple continues:
"For lowest latency, partial segment durations SHOULD be as short as possible while still being efficient."
What does "as short as possible" mean? What's "efficient"? We took it literally - 200ms is as short as possible before players completely break. Apple probably meant 500ms or more, but they didn't say that. The spec also states:
"The PART-HOLD-BACK attribute specifies the minimum distance from the end of the playlist at which clients SHOULD begin playback... A value of 3.0 or more is RECOMMENDED."
Three seconds of hold-back defeats the entire purpose of low latency! We used 0.4 seconds (2 parts). Did it break the spec? Technically no - it says SHOULD, not MUST. This vague language is why iOS Safari behaves differently than macOS Safari, why VLC works differently than FFplay, and why we could push the boundaries this far.
The base HLS RFC 8216 is even more permissive. It doesn't even mention parts or low latency - that's all in Apple's draft extensions that still aren't finalized after 3+ years. We're operating in the wild west of streaming specifications here.
Look, I'll be honest with you. Ultra-low-latency HLS is a neat party trick, but it's not the future of live streaming. Here's why:
"Just because you CAN do something doesn't mean you SHOULD do something. Ultra-low-latency HLS is a perfect example of this principle."
- Me, after the fifteenth iPhone crash
So if ultra-low-latency HLS isn't the answer, what is? At WINK Streaming, we believe it's Media over QUIC (MoQ) - a protocol actually designed for sub-second delivery rather than one being bent into it.
The only problem? WebTransport support. Safari STILL doesn't support it. Given that Safari represents about 30% of users (and nearly 100% of iOS users), that's a problem. But assuming Apple eventually gets on board (big assumption, I know), MoQ could deliver sub-300ms latency without all the HLS hacks.
Check out our MoQ work at WINK MoQ Implementation Guide. We're betting big on it being the future, assuming we can all forget about UDP firewall issues and Safari's reluctance to implement anything new.
After months of banging our heads against walls (both metaphorical and literal), here's what we learned:
If you're asking whether you should implement ultra-low-latency HLS in production, the answer is probably no. Use WebRTC if you need sub-second latency today. It's mature, well-supported, and actually designed for this use case.
But if you're asking whether you should experiment with pushing protocols to their limits to learn how streaming really works under the hood? Absolutely. You'll learn more about video streaming in a week of ultra-low-latency HLS development than in a year of normal development.
I've been verbose enough. Here's a complete working example that achieves sub-second latency. Use at your own risk:
// ultrallhls.go - Because sometimes you need to break things to understand them
// WINK Streaming - January 2025
//
// WARNING: This code is experimental. It will crash. It will lose data.
// It will make you question your life choices. You've been warned.
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "log"
    "net"
    "net/http"
    "strconv"
    "strings"
    "sync"
    "sync/atomic"
    "time"
)

const (
    PartDuration    = 200 * time.Millisecond // The magic number
    SegmentDuration = 1 * time.Second
    PartsPerSegment = 5
    RingBufferSize  = 10 * 1024 * 1024 // 10MB - adjust based on bitrate
)

type Server struct {
    segments    map[int]Segment
    segmentsMu  sync.RWMutex
    lastSegNum  int
    lastPartNum int
    ringBuffer  *RingBuffer
    frameQueue  chan Frame
}

type Segment struct {
    Parts    []Part
    Complete bool
    Duration time.Duration
}

type Part struct {
    Data        []byte
    Duration    time.Duration
    Independent bool // Keyframe = independent = iOS happiness
}

func NewServer() *Server {
    return &Server{
        segments:   make(map[int]Segment),
        frameQueue: make(chan Frame, 100),
        ringBuffer: NewRingBuffer(RingBufferSize),
    }
}

func (s *Server) Start() {
    // The gang's all here - each goroutine does ONE thing
    go s.receiveUDP()         // Network -> Ring Buffer
    go s.demuxPipeline()      // Ring Buffer -> Frames
    go s.processFrames()      // Frames -> Parts
    go s.createSegments()     // Parts -> Segments
    go s.cleanupOldSegments() // Segments -> /dev/null (eventually)
    // HTTP endpoints - where the magic is delivered
    http.HandleFunc("/index.m3u8", s.handlePlaylist)
    http.HandleFunc("/init.mp4", s.handleInit)
    http.HandleFunc("/p/", s.handlePart)    // Parts: /p/{seg}/{part}.m4s
    http.HandleFunc("/s/", s.handleSegment) // Segments: /s/{seg}.m4s
    log.Println("[SERVER] Listening on :8445")
    log.Println("[SERVER] Playlist: http://localhost:8445/index.m3u8")
    log.Println("[SERVER] Target latency: <1000ms")
    log.Println("[SERVER] Actual latency: ¯\\_(ツ)_/¯")
    log.Fatal(http.ListenAndServe(":8445", nil))
}

func (s *Server) receiveUDP() {
    conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 1234})
    if err != nil {
        log.Fatal("[UDP] Failed to listen:", err)
    }
    buffer := make([]byte, 1500) // MTU-sized buffer
    for {
        n, _, err := conn.ReadFromUDP(buffer)
        if err != nil {
            log.Printf("[UDP] Read error: %v", err)
            continue
        }
        // Straight to the ring buffer - no time to waste!
        if err := s.ringBuffer.Write(buffer[:n]); err != nil {
            log.Printf("[RING] Overflow! We're too slow: %v", err)
        }
    }
}

// ... Rest of implementation (RingBuffer, demuxer, fMP4 packager,
// playlist and part handlers) ...
// Full code at: github.com/winkstreaming/ultra-ll-hls-experiment

func main() {
    server := NewServer()
    server.Start()
}
Here's the thing: this actually worked. Like, really worked. We achieved sub-second latency with HLS - something the industry said was impossible. Safari on macOS? Solid 1100ms. iPads (even the non-M series)? Rock solid. Chrome with hls.js? A beautiful 900ms glass-to-glass. I streamed this on my tablet over terrible WiFi and it still held up.
The only real hiccup? iPhones. And honestly, that might not even be our problem to solve. The exact same code that runs flawlessly on an iPad struggles on an iPhone running the same iOS version. That's not a protocol limitation - that's either a performance threshold issue or just iOS being iOS. Different implementation priorities, different power management, who knows. We could file a bug report with Apple, but let's be realistic about how that usually goes.
But here's where it gets interesting: what if we adjusted our goals slightly? Instead of pushing for 200ms parts, what about 350ms? You'd still achieve around 1.5 seconds glass-to-glass latency. That's still a massive breakthrough for live broadcast events - we're talking about one-to-many streaming at near real-time speeds without any of the complexity, scaling nightmares, or firewall issues of WebRTC.
Think about it: 1.5 second latency for live sports, concerts, news broadcasts, all using standard HTTP infrastructure. No special servers, no TURN relays, no NAT traversal headaches. Just regular CDN-compatible HTTP streaming at latencies that would have been considered impossible two years ago.
This isn't the end of ultra-low-latency HLS development - it's just the end for now. The techniques work. The protocol can handle it. Most devices can play it. We've proven sub-second HLS is not only possible but practical. The fact that one specific device category has issues doesn't invalidate the achievement.
At WINK Streaming, we're going to keep pushing. Maybe 350ms parts for broader compatibility. Maybe dedicated iPhone optimizations. Maybe waiting for iOS 19 to magically fix whatever's broken. Or maybe we just accept that 900-1400ms latency using standard HLS is already a revolution, even if iPhones need a bit more time to catch up.
Because at the end of the day, we built something that works. And in the world of live streaming, "works most of the time on most devices" at sub-second latency is still light-years ahead of where HLS was supposed to be.