
Key takeaways
• Digital video is a six-layer stack: sensor capture, frame structure, color space, codec, container, transport. Each layer has its own engineering decisions and cost implications.
• 2026 codec choices reset the math. H.264 still ships everywhere, HEVC dominates premium, AV1 is finally mainstream on web (Chrome, Firefox, Safari 17+) and saves 30–50% bitrate vs H.264 at the same quality.
• Frame rate, resolution, bit depth, and bitrate trade off against each other. A 4K 60fps HDR stream needs 4–5× more bandwidth than a 1080p 30fps SDR stream. Most product decisions live in the middle.
• Transport protocol is where buyer-facing decisions concentrate. WHIP for sub-second ingest, LL-HLS for sub-2 s playback, HLS/DASH for OTT, WebRTC for conferencing. Pick wrong and your latency floor breaks the use case.
• Use this article as a vendor-fluency check. If your candidate streaming partner can’t walk through these six layers in a scoping call, they’ll learn on your codebase — expensively.
Digital video sits underneath every product we build at Fora Soft — from BrainCert classroom streaming to TradeCaster sub-second trader broadcasts to CirrusMED HIPAA telehealth. Founders who understand the stack pick better vendors and ship better products. This article walks the layers in plain language and ties each one to a buyer-facing decision.
We’re Fora Soft. Since 2005 we’ve shipped 200+ multimedia products. The numbers and trade-offs below come from production builds running at scale.
Why Fora Soft wrote this digital video primer for 2026 buyers
Most digital video explainers stop at "a video is a series of frames." Useful, but it doesn't help a founder decide between H.264 and AV1, or between WebRTC and LL-HLS, or between Mux and a custom WHIP build. This article goes deeper on the layers that actually shape product decisions and ties each one to the trade-off the buyer faces.
Companion reads we maintain on this surface: the WebRTC architecture playbook, the SaaS vs custom streaming cost analysis, the live streaming platform development guide, and the low-latency streaming guide.
Need a partner who actually understands the stack you’re building?
Tell us your use case — OTT, conferencing, telehealth, trader, classroom — and we’ll walk through codec / protocol / transport choices in 30 minutes.
Analog vs digital video: the foundation
Real-world signals are continuous: light intensity at every point varies smoothly. Analog video stores that continuous signal as continuous magnetic patterns on tape. The result is rich but rigid — you can’t skip ahead, copy losslessly, or transmit over the internet without a digital intermediary.
Digital video samples the analog signal — in space (pixels) and time (frames) — and represents each sample as numbers. Once it’s numbers, you can compress it, copy it, transmit it, edit it, and analyse it programmatically. Every product we build assumes digital video by default; the only place analog still shows up is in legacy security cameras and broadcast infrastructure waiting to be replaced.
Frame rate: 24, 30, 60, 120 fps and what each is for
Frame rate is how many still images per second the video plays. 24 fps is the cinema standard. 30 fps is the broadcast TV default in North America (29.97 for legacy NTSC compatibility). 25/50 fps serves European broadcast. 60 fps is the modern web/gaming/streaming standard. 90–120 fps covers slow-motion capture, sports, and VR/AR.
For product decisions: most consumer video ships at 30 or 60 fps. Doubling the frame rate roughly doubles bandwidth. Live conferencing is fine at 30 fps; live sports needs 60. Slow-motion replay needs a 90+ fps source. WebRTC defaults to 30 fps but supports up to 60.
Pixels and resolution: 720p, 1080p, 4K, 8K
A pixel is a single point of colour in the frame. Resolution is the count of pixels: 1280×720 (720p / HD), 1920×1080 (1080p / Full HD), 3840×2160 (4K / UHD), 7680×4320 (8K). Each step up from 1080p quadruples the pixel count; 720p to 1080p is a 2.25× jump.
Pragmatically: 720p is the floor for usable consumer video and the default for low-bandwidth scenarios. 1080p is the dominant consumer standard. 4K is the OTT premium tier (Netflix, Disney+) and the default for new TV apps. 8K exists but adds bandwidth without proportional perceived quality — outside niche use cases (large-screen presentations, certain medical imaging) it’s overkill.
Color spaces and HDR: Rec. 709, Rec. 2020, P3
Each pixel stores a colour. The colour space defines what range of colours can be represented. Rec. 709 is the SDR HD standard (the baseline most apps assume). Rec. 2020 is the wide-gamut HDR standard. DCI-P3 sits between them and is what most modern phones and laptops display.
For product decisions: SDR (Rec. 709) is fine for almost all conferencing, telehealth, classroom, and surveillance use cases. HDR (Rec. 2020 + PQ/HLG transfer functions) matters for premium OTT, sports, gaming. Mismatched colour space pipelines (HDR captured, SDR delivered without proper tone-mapping) produce washed-out playback — one of the most common quality complaints we audit.
Bit depth: 8-bit, 10-bit, 12-bit
Bit depth is how many bits represent each colour channel per pixel. 8-bit gives 256 levels per channel (16.7M colours). 10-bit gives 1,024 levels (1.07B colours). 12-bit gives 4,096 levels (68.7B colours). HDR usually requires 10-bit minimum.
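The colour counts above follow directly from the bit depth: levels per channel is 2 to the power of the bit depth, and total colours is that cubed (one factor each for R, G, B). A quick sanity check in Python:

```python
def colour_count(bits_per_channel: int) -> int:
    """Total representable colours: (2**bits) levels per channel, cubed for R/G/B."""
    return (2 ** bits_per_channel) ** 3

assert colour_count(8) == 16_777_216          # ~16.7M colours
assert colour_count(10) == 1_073_741_824      # ~1.07B colours
assert colour_count(12) == 68_719_476_736     # ~68.7B colours
```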
For product decisions: 8-bit is the consumer default and what most encoders ship by default. 10-bit becomes important for HDR delivery and premium OTT. 12-bit is broadcast cinema territory.
Reach for 10-bit when: you’re delivering HDR, archiving for editorial, or producing premium content. For everything else, 8-bit AV1 is now the cost-efficient default.
Bitrate: how many bits per second your video consumes
Bitrate is how many bits per second the encoded video stream consumes. Common reference points (after H.264 compression): 720p 30fps ~1–2 Mbps; 1080p 30fps ~3–5 Mbps; 1080p 60fps ~5–8 Mbps; 4K 30fps ~12–20 Mbps; 4K 60fps HDR ~25–40 Mbps. AV1 cuts those numbers by 30–50% for the same perceived quality.
Bitrate is the layer where business decisions concentrate: CDN egress is billed per gigabyte; SaaS streaming is billed per minute at a given bitrate; viewer experience depends on whether the bitrate fits the user’s connection. Adaptive bitrate streaming (ABR) addresses this by encoding multiple quality levels and dynamically picking the right one for each viewer.
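The bitrate-to-egress arithmetic behind those billing numbers is simple enough to sketch. A minimal Python version, with the 40% AV1 saving taken as the midpoint of the 30–50% range quoted above:

```python
def egress_gb_per_hour(bitrate_mbps: float) -> float:
    """GB of CDN egress one viewer consumes per hour at a given bitrate."""
    return bitrate_mbps * 3600 / 8 / 1000  # Mbit/s -> Mbit/h -> MB/h -> GB/h

# 1080p30 on H.264 at ~4 Mbps costs each viewer-hour ~1.8 GB of egress
assert round(egress_gb_per_hour(4.0), 2) == 1.8
# The same perceived quality on AV1 at ~40% savings: ~1.08 GB
assert round(egress_gb_per_hour(4.0 * 0.6), 2) == 1.08
```

Multiply by viewer-hours and your per-GB CDN rate and you have the egress line of the bill; this is the calculation that makes codec choice a finance decision, not just an engineering one.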
Want our codec / bitrate / CDN cost math against your scope?
Send us your viewer count and quality target. We’ll quote AV1 vs H.264 vs HEVC, projected CDN bill, and a SaaS-vs-custom verdict in 30 minutes.
Codecs: H.264, HEVC, AV1, VP9, and the 2026 picture
A codec compresses raw video into a smaller stream. The 2026 picture:
| Codec | Best for | Compression vs H.264 | 2026 status |
|---|---|---|---|
| H.264 / AVC | Universal compatibility | Baseline | Default fallback; 100% device coverage |
| HEVC / H.265 | Premium OTT, 4K HDR | ~25–50% smaller | Strong on Apple, mature elsewhere |
| AV1 | Modern web streaming | ~30–50% smaller | Mainstream on Chrome/Firefox/Safari 17+ |
| VP9 | YouTube legacy | ~20–35% smaller | Being replaced by AV1 |
| VVC / H.266 | 8K, future archival | ~50% smaller | Early adoption only |
Real-world pattern in 2026: ship H.264 baseline as the universal fallback, AV1 as the modern primary for web/mobile/TV apps, HEVC for Apple-heavy audiences and premium 4K HDR. The CDN savings from AV1 typically pay back the encoding cost inside three months at any non-trivial scale.
Reach for AV1 as primary when: your viewer base skews modern web/mobile and your CDN egress is over $5k/month. Below that, the encoding overhead and limited TV-app coverage tip the balance toward H.264 + HEVC.
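The multi-codec fallback pattern can be sketched as a simple preference walk. This is an illustration of the selection logic, not a real player API; production players negotiate via manifest codec strings and capability probes:

```python
# Fallback order from the text: AV1 primary, HEVC for Apple-tier devices,
# H.264 as the universal floor.
PREFERENCE = ["av1", "hevc", "h264"]

def pick_codec(device_supported: set[str]) -> str:
    """Return the first preferred codec the device can decode."""
    for codec in PREFERENCE:
        if codec in device_supported:
            return codec
    raise ValueError("no playable codec for this device")

assert pick_codec({"av1", "hevc", "h264"}) == "av1"   # modern browser
assert pick_codec({"hevc", "h264"}) == "hevc"          # older Apple device
assert pick_codec({"h264"}) == "h264"                  # legacy fallback
```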
Containers: MP4, WebM, fMP4 (CMAF), MKV
A container holds the encoded video, audio, subtitles, and metadata. MP4 is the universal container (works on every device). WebM is the open container preferred for AV1 + Opus on the web. fMP4 (Fragmented MP4, formalised as CMAF) is the modern HTTP streaming container — one container that works for both HLS and DASH. MKV is the open desktop/archive container.
For product decisions: CMAF is the modern OTT default because it lets you ship one set of segments to both Apple HLS and DASH players. MP4 for downloadable content. WebM for AV1-native web playback.
Streaming protocols: WebRTC, WHIP, HLS, LL-HLS, DASH, RTMP
Transport is the layer where buyer-facing decisions concentrate. The protocol map:
WebRTC. Sub-500 ms peer-to-peer video. The default for conferencing, classrooms, telehealth, trader streaming. SFU-mediated for groups.
WHIP (RFC 9725, March 2025). WebRTC over HTTP for ingest. The new standard for low-latency live streaming ingest, replacing RTMP for new builds. FFmpeg merged WHIP support June 2025.
HLS. HTTP Live Streaming, Apple’s adaptive-bitrate protocol. 5–15 s latency typical. The dominant OTT and on-demand standard. Scales to millions via CDN.
LL-HLS. Low-Latency HLS. Sub-2 s glass-to-glass with chunked transfer encoding and partial segments. The right choice when you need both scale and reduced latency.
DASH. MPEG-DASH, the open adaptive-bitrate alternative. Functionally similar to HLS, used in some non-Apple ecosystems. Most modern stacks ship CMAF that works for both.
RTMP. Legacy ingest protocol. Still ubiquitous on encoders (OBS, Wirecast) but being replaced by WHIP for new builds. Keep RTMP as a fallback path through 2027.
Reach for WebRTC / WHIP when: latency must be under 1 s. Reach for LL-HLS when latency target is 1–3 s and you need CDN scale. Reach for HLS / DASH when 5–15 s is fine and viewer count is in the millions.
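Those rules of thumb reduce to a latency-target lookup. A minimal sketch, with thresholds taken from the guidance above (real protocol selection also weighs viewer count, DRM, and device coverage):

```python
def pick_protocol(latency_target_s: float) -> str:
    """Map a glass-to-glass latency target to a delivery protocol."""
    if latency_target_s < 1:
        return "webrtc"   # sub-second: conferencing, telehealth, trader streams
    if latency_target_s <= 3:
        return "ll-hls"   # 1-3 s with CDN scale
    return "hls"          # 5-15 s is fine: OTT, VOD, millions of viewers

assert pick_protocol(0.5) == "webrtc"
assert pick_protocol(2.0) == "ll-hls"
assert pick_protocol(10.0) == "hls"
```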
Adaptive bitrate streaming and per-title encoding
Different viewers have different network speeds. ABR encodes the same video at multiple quality levels (e.g. 240p / 480p / 720p / 1080p / 4K) and lets the player pick the best one for each viewer’s connection in real time. The OTT-standard ABR ladder is 5–7 levels.
Per-title encoding is the 2026 refinement: the optimal bitrate ladder differs per piece of content (a high-motion action movie needs more bits than a static documentary). Modern encoders (Mux Optimize, AWS Elemental, Bitmovin) auto-tune per-title and save 15–30% on CDN egress for large catalogues.
Reach for per-title encoding when: your catalogue has more than ~100 hours of content with varied motion characteristics. Below that, a fixed ABR ladder is fine. Above that, per-title compounds savings every viewer-hour.
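The player-side half of ABR is just "highest rung that fits the measured throughput, with headroom." A minimal sketch; the ladder values and the 25% headroom figure are illustrative, not a tuned production ladder:

```python
LADDER = [  # (label, bitrate in Mbps), lowest to highest
    ("240p", 0.4), ("480p", 1.0), ("720p", 2.5), ("1080p", 4.5), ("4k", 15.0),
]

def pick_rung(measured_mbps: float, headroom: float = 0.75) -> str:
    """Highest rung whose bitrate fits the throughput budget with headroom."""
    budget = measured_mbps * headroom
    best = LADDER[0][0]  # never drop below the lowest rung
    for label, mbps in LADDER:
        if mbps <= budget:
            best = label
    return best

assert pick_rung(8.0) == "1080p"   # 6.0 Mbps budget fits 4.5, not 15
assert pick_rung(1.0) == "240p"    # 0.75 Mbps budget only fits the floor
assert pick_rung(25.0) == "4k"
```

Per-title encoding changes the `LADDER` constants per piece of content; the selection loop stays the same.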
Encoding economics: AV1 vs H.264 over a year
A typical OTT product with 100k monthly active users watching 3 hours/week at 1080p produces ~24M GB of egress per year on H.264. CloudFront retail rates put that at ~$2.4M/year egress; volume contracts cut it to ~$1.2M/year. Switch to AV1 and the same viewer experience consumes ~12–14M GB — a $600–1.2M annual saving. Encoding cost (one-time per-title) is ~$5–15k for a typical 5,000-hour catalogue. Payback inside three months.
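The annual numbers above can be reproduced in a few lines. The ~1.54 GB per viewer-hour figure is our assumption for an H.264 ABR mix averaging ~3.4 Mbps across rungs, and the $0.10/GB retail rate is an approximation; both are stated assumptions, not billing facts:

```python
# 100k MAU x 3 h/week x 52 weeks of watch time
viewer_hours = 100_000 * 3 * 52              # 15.6M viewer-hours/year
h264_gb = viewer_hours * 1.54                # ~24M GB/year on an H.264 ABR mix
av1_gb = h264_gb * 0.6                       # ~40% bitrate savings on AV1
saving_retail = (h264_gb - av1_gb) * 0.10    # approx. retail egress at $0.10/GB

assert round(h264_gb / 1e6, 1) == 24.0       # ~24M GB, as above
assert round(saving_retail / 1e6, 2) == 0.96  # ~$1M/year at retail rates
```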
Pattern we use: AV1 as primary, HEVC as Apple-tier fallback, H.264 as universal compatibility. Encoders fan out automatically; the player picks the best supported codec per device. Want a similar codec-mix audit on your existing pipeline?
2026 inflection points: what changed since 2024
WHIP became the live ingest standard. RFC 9725 in March 2025, FFmpeg in June 2025. New builds default to WHIP; legacy keeps RTMP as fallback.
AV1 hit web mainstream. Safari 17+ added AV1 decode in 2024; Chrome and Firefox have shipped it since 2018. By 2026 you can reasonably ship AV1 as primary on the open web.
LL-HLS reached production maturity. Apple’s LL-HLS spec is now widely supported by encoders, packagers, and players. Sub-2 s OTT is realistic.
Twilio Programmable Video EOL December 5, 2026. Migration time. Our Twilio Video alternatives playbook covers the move.
AI in the video pipeline went mainstream. AI noise cancellation, AI captioning, AI summarisation, AI scene detection — all production-grade in 2026.
Vendor fluency check: questions a senior streaming partner can answer
If you’re scoping a video build, the candidate vendor should be able to answer these in a 30-minute scoping call without prep.
1. What codec/protocol mix would you ship for our use case? The right answer is specific (e.g., “AV1 primary, HEVC for Apple, H.264 fallback; LL-HLS for delivery”).
2. What bitrate ladder for adaptive streaming? The right answer mentions 5–7 levels and per-title encoding for catalogues.
3. What’s your latency floor with our scope? The right answer cites WebRTC sub-500 ms or LL-HLS sub-2 s based on your latency requirement.
4. How will you handle WHIP vs RTMP ingest? The right answer ships WHIP-first with RTMP as a fallback.
5. What’s the AV1 / HEVC / H.264 cost trade-off in our case? The right answer cites real CDN egress numbers and encoding cost.
Want our answers to those five questions for your scope?
30 minutes, real engineering opinions, codec and protocol recommendations, fixed-range estimate at the end.
Five pitfalls in 2026 video pipelines
1. Defaulting to H.264 only. CDN egress on H.264 vs AV1 is 30–50% more expensive. Multi-codec encoding is now table stakes; H.264-only saves no money long-term.
2. Picking RTMP for new ingest. WHIP is the standard since March 2025. New builds should be WHIP-first.
3. Ignoring per-title encoding. Generic ABR ladders waste 15–30% bitrate on uneven content. Per-title encoding is mature and pays back fast.
4. HDR pipeline mismatches. HDR captured + SDR displayed without proper tone-mapping = washed-out playback. Either commit to HDR end-to-end or stay SDR.
5. Latency-protocol mismatch. Picking HLS for trader streaming or WebRTC for OTT VOD is the most common architectural mistake. Match the protocol to the latency target.
KPIs to track on a video pipeline
Quality KPIs. Glass-to-glass latency p95 (target by use case), MOS audio (target ≥4.0), buffer ratio (<1.5%), startup time (target <2 s), average bitrate served vs requested.
Business KPIs. CDN egress cost per viewer-hour, encoding cost per hour of content, viewer-quality-index (a composite of buffer + bitrate + startup), retention vs quality cohort.
Reliability KPIs. CDN/origin uptime (≥99.95%), session error rate (<0.5%), MTTD on outages (<5 min), playback failure rate by device class (<0.5%).
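Two of the quality KPIs above are easy to compute from raw session telemetry. A minimal sketch; the field names and sample data are illustrative, not a real analytics schema:

```python
def p95(samples: list[float]) -> float:
    """95th-percentile value of a list of measurements."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(0.95 * len(s)))]

def buffer_ratio(stall_s: float, watch_s: float) -> float:
    """Fraction of watch time spent stalled (the <1.5% target above)."""
    return stall_s / watch_s if watch_s else 0.0

latencies = [1.2] * 95 + [4.0] * 5           # 5% of sessions are slow
assert p95(latencies) == 4.0                 # p95 exposes the slow tail
assert buffer_ratio(9.0, 600.0) == 0.015     # 1.5%: right at the ceiling
```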
When NOT to build a custom video pipeline
For under 100k participant-min/month or under 50k viewer-hours/month, a SaaS video pipeline (Mux, Cloudflare Stream, AWS IVS, Daily, LiveKit Cloud) ships faster and costs less than custom. The crossover happens around 500k participant-min/month or 250k viewer-hours/month, where custom on a vanilla LiveKit OSS / FFmpeg / S3 stack starts paying back.
Where custom video pipelines truly pay off is high-volume products, sub-second latency requirements, regulated workloads, branded OTT, or specific feature requirements SaaS won’t ship. Our video and audio streaming services page maps the scope.
FAQ
Should I ship AV1 in 2026 or stick with H.264?
Both. Modern players (Chrome, Firefox, Safari 17+) decode AV1 natively, so AV1 as primary delivers 30–50% bandwidth savings; H.264 as universal fallback covers everything older. The CDN savings typically pay back AV1 encoding costs within three months at any non-trivial scale.
What latency should I expect from each protocol?
WebRTC: 150–500 ms glass-to-glass (sub-second). WHIP ingest: same. LL-HLS: 1–3 s. HLS: 5–15 s. DASH: similar to HLS. RTMP ingest + HLS playback: 5–10 s combined. Match protocol to latency target.
What’s the difference between HEVC and AV1?
HEVC (H.265) is the older premium codec, mature on Apple devices and most TVs, with patent royalties. AV1 is newer, royalty-free, with similar compression. AV1 is the right primary choice for new web/mobile builds; HEVC remains useful for Apple-heavy audiences and 4K HDR delivery to TVs that don’t yet decode AV1 in hardware.
When does HDR matter?
For premium OTT, sports, gaming, and produced video where colour fidelity is part of the experience. SDR is fine for conferencing, classroom, telehealth, surveillance. If you’re committing to HDR, commit end-to-end: HDR capture, 10-bit pipeline, HDR-aware encoding, HDR-capable players. Half-pipelines produce washed-out output.
What encoding setup should I budget for?
For SaaS, Mux or AWS Elemental MediaConvert handle encoding without ops overhead. For custom, FFmpeg + a GPU fleet (NVIDIA T4 or L4 instances) at $0.50–$2 per hour of content encoded depending on codec. AV1 software encoding is ~5× slower than H.264; budget accordingly or use hardware AV1 encoders.
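On the FFmpeg path, an SVT-AV1 encode is a one-line command; wrapping it in Python makes it easy to fan out across a catalogue. A sketch under assumptions: the CRF and preset values are starting points to tune per title, not production settings, and `av1_encode_cmd` is our illustrative helper name:

```python
import shlex

def av1_encode_cmd(src: str, dst: str, crf: int = 32, preset: int = 8) -> list[str]:
    """Build an FFmpeg command for an SVT-AV1 video + Opus audio encode."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libsvtav1", "-crf", str(crf), "-preset", str(preset),
        "-c:a", "libopus", "-b:a", "128k",
        dst,
    ]

cmd = av1_encode_cmd("master.mov", "out.webm")
assert cmd[0] == "ffmpeg" and "libsvtav1" in cmd
print(shlex.join(cmd))  # ready to hand to subprocess.run or a job queue
```

Lower presets encode slower but compress better; that knob is where the "~5× slower than H.264" budget line gets spent.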
How does a CDN affect my video pipeline?
A CDN caches and serves video segments close to viewers, dramatically improving startup time and reducing origin bandwidth. CloudFront, Akamai, Fastly, and Cloudflare are the main options. Egress pricing typically dominates the CDN bill; volume contracts cut retail rates 30–50%. For large catalogues, multi-CDN strategies (Fastly + CloudFront + Cloudflare with traffic steering) deliver better availability.
Should I use FFmpeg or a managed encoder?
FFmpeg is the open-source default and gives you full control. Managed encoders (Mux, AWS Elemental, Bitmovin) trade some control for ops simplicity, per-title optimisation, and reliability. For most teams: managed during MVP, FFmpeg-on-GPU at scale once cost compounding makes ops sense.
How does Fora Soft price a video pipeline build?
A focused video pipeline (encoder + packager + CDN integration + player) typically lands at $40–120k MVP depending on codec mix and protocol scope. Multi-codec ABR with AV1, HEVC, H.264 + LL-HLS + multi-CDN runs higher. Book a scoping call and we’ll quote a specific range.
What to read next
Architecture
WebRTC Architecture Guide for 2026
P2P, SFU, MCU, hybrid — how they fit your scope.
Cost analysis
SaaS vs Custom Streaming Cost
24/60-month TCO math for the build-vs-buy choice.
Latency
Low-Latency Streaming Solutions
Sub-second WebRTC and WHIP architecture deep-dive.
Migration
Twilio Video Alternatives
Migration guide before December 2026 EOL.
Pricing
LiveKit vs Agora Pricing
Per-minute math and OSS migration paths.
Ready to ship a 2026-current video pipeline?
Digital video in 2026 isn’t hard if you understand the layers. Six well-known decisions — capture, frame structure, color, codec, container, transport — cover most of the engineering surface. The vendor decision is whether your partner can navigate them confidently. AV1 + HEVC + H.264 multi-codec, LL-HLS or WebRTC for low-latency, WHIP for ingest, CMAF for OTT, multi-CDN for scale. Mismatches between use case and protocol or codec are the most common architectural mistakes we audit.
If you’re scoping a video build in 2026 — OTT, conferencing, telehealth, classroom, trader, surveillance — we can show you what we’ve shipped on each layer, recommend codec/protocol/CDN choices for your scope, and quote a fixed range in 30 minutes.
Build with a partner who knows every layer of the stack
30 minutes, real engineering opinions, no slides, a fixed-range estimate at the end.