Learning course · Updated June 2026

Audio for video, end to end: codecs, loudness, sync, WebRTC

How audio actually works inside a video product — sample rate and loudness, the AAC and Opus codecs, LUFS targets per platform, the WebRTC audio pipeline, lip-sync, and Dolby Atmos. A practical course from Fora Soft engineers, from the microphone to the viewer’s ears.

Every chapter starts with a question and ends with a production decision. Specs cited by document number — ITU-R, EBU, RFC. No marketing slides.

6 chapters · 68 articles · 100+ glossary terms · ~23 hrs total reading

Outcomes

What you'll be able to ship.

Six blocks that take you from the sound wave to the viewer’s ears. By the end, you can choose codecs, hit loudness compliance, build and debug the real-time audio pipeline, keep audio locked to the picture, and deliver immersive sound — for live, VOD, and conferencing.

Choose the right audio codec for any use case

AAC for streaming reach, Opus for real-time, AC-4 and MPEG-H for immersive broadcast, LC3 for Bluetooth. Know which codec wins, and why.

Hit loudness compliance on every platform

LUFS targets, true peak, and dialnorm to EBU R128, ITU-R BS.1770, and ATSC A/85 — so audio passes Spotify, YouTube, Netflix, and broadcast checks.

Build the WebRTC audio pipeline end to end

Acoustic echo cancellation, noise suppression, AGC, the NetEQ jitter buffer, PLC, and FEC — a real-time path that stays clear under packet loss.

Keep audio and video in sync

PTS/DTS, PCR, RTP/RTCP, and the lip-sync tolerance window (ITU-R BT.1359). Diagnose drift and correct it across WebRTC, HLS, and DASH.

Deliver immersive audio

Dolby Atmos and MPEG-H from master to stream, plus ambisonics, HRTF, and binaural rendering for VR, AR, and conferencing.

Measure audio quality objectively

PESQ, POLQA, ViSQOL, and subjective MUSHRA/MOS testing — so you can prove a codec or pipeline change improved quality, not just claim it.

Pick a path

Three routes through audio for video

The same 68 articles, ordered for what you actually need to do this quarter.

Path A · 4 hrs

Audio foundations

From "what is digital audio" to reading any audio spec without a glossary. Sample rate, bit depth, channels, lossless formats, and how loudness is measured.

What is digital audio

Sample rate and bit depth

Channels and channel layouts

Loudness, peak, RMS, LUFS

Containers, frames, and packets

12 articles

Path B · 11 hrs

Streaming and real-time audio

Ship audio that holds up. The codec landscape, audio in HLS/DASH/CMAF, loudness compliance, and the full WebRTC audio pipeline.

Audio codecs — AAC, Opus, AC-4, MPEG-H, LC3

Audio in HLS, DASH, and CMAF

Loudness normalization and LUFS targets

The WebRTC audio pipeline — AEC, NS, AGC

Jitter buffer (NetEQ), PLC, and FEC

30 articles

Path C · 8 hrs

Sync, spatial, and quality

The hard parts. Lip-sync and timestamps, drift correction, Dolby Atmos and MPEG-H, spatial audio, objective quality metrics, and where AI is taking audio next.

Lip-sync windows — ITU-R BT.1359

PTS, DTS, PCR, and RTP/RTCP sync

Dolby Atmos and MPEG-H 3D Audio

Ambisonics, HRTF, and binaural rendering

Quality metrics and neural codecs

21 articles

Syllabus

The full course in six chapters

Every chapter is self-contained. Read in order, or jump straight to the block you need — from digital fundamentals to immersive audio.

Foundations of Digital Audio

What audio is in digital terms — sample rate, bit depth, channels, lossless formats, loudness, containers, and frames.

Beginner · 8 articles · ~2.5 hrs

Audio Codecs

Every codec a video product meets — AAC, Opus, AC-3/E-AC-3, AC-4, MPEG-H, LC3, speech and lossless — plus a 2026 comparison table and decision tree.

Beginner · 13 articles · ~4 hrs

Streaming Audio

Audio in HLS/DASH/CMAF, loudness normalization, LUFS targets per platform, true peak, multi-language, and Atmos in streaming.

Intermediate · 12 articles · ~4 hrs

Real-Time Audio (WebRTC)

The WebRTC audio pipeline — AEC, AGC, noise suppression, VAD/DTX, the NetEQ jitter buffer, PLC, FEC, recording, and group calls at scale.

Intermediate · 14 articles · ~5 hrs

AV-Sync and Timestamps

Lip-sync tolerance, PTS/DTS/PCR, RTP/RTCP/NTP, the end-to-end timestamp diagram, drift correction, and lip-sync test methodology.

Advanced · 9 articles · ~3 hrs

Spatial, Quality, and the Future

Dolby Atmos and MPEG-H, ambisonics/HRTF, audio quality metrics (PESQ/POLQA/ViSQOL), AI dubbing, and neural codecs.

Advanced · 12 articles · ~4.5 hrs

Ship production-grade audio in your video product

Talk to the engineers who built it. Fora Soft helps teams choose audio codecs, hit loudness compliance, build and debug the WebRTC audio pipeline, fix lip-sync, and deliver immersive sound for telemedicine, conferencing, e-learning, live, and OTT.

Featured

Where to start.

Hand-picked deep dives across codecs, loudness, the real-time pipeline, and sync — the highest-impact reads first, before you commit to a learning path.

Reference

The vocabulary of audio for video

100+ terms with crisp definitions, aliases, and links to deep dives. From LUFS and Opus to NetEQ and Dolby Atmos — the full A–Z is one click away.

LUFS

Loudness Units relative to Full Scale (ITU-R BS.1770). The standard unit for perceived loudness; every streaming platform normalizes to a LUFS target.

Opus

The open, royalty-free codec (RFC 6716) that dominates WebRTC. It combines SILK (speech) and CELT (music), running either one alone or both together in hybrid mode, and scales from 6 to 510 kbps.

AAC

Advanced Audio Coding — the default codec for MP4, HLS, and DASH playback, and the standard on Apple devices (AAC-LC, HE-AAC, xHE-AAC).

AEC

Acoustic Echo Cancellation — the WebRTC stage that removes far-end echo from a microphone signal (WebRTC AEC3).

Lip-sync

Audio-to-video timing alignment. ITU-R BT.1359 defines the tolerance window before viewers notice the drift.

Dolby Atmos

Object-based immersive audio that places sounds in 3D space, used in film, streaming, and music.

Written and maintained by

The author.

Nikolay Sapunov

CEO at Fora Soft

Leads a software studio specialising in video- and audio-centric products: streaming platforms, WebRTC apps, video conferencing, and AI-driven video and audio tools. Writes this course so product and engineering teams can reason clearly about the codecs, protocols, audio pipelines, and advanced audio features behind modern video and audio software.

FAQ

Frequently asked questions.

What is audio for video?

Audio for video is the engineering discipline of capturing, encoding, delivering, and synchronizing sound alongside a video stream. It spans digital fundamentals (sample rate, bit depth, loudness), the codecs that compress sound (AAC, Opus, AC-4, MPEG-H), loudness normalization for streaming, the real-time WebRTC pipeline, lip-sync, and immersive formats like Dolby Atmos. Unlike music production, it is judged by intelligibility, loudness compliance, and staying in sync with the picture.

AAC vs Opus — which audio codec should I use?

Use AAC for on-demand and broadcast streaming: it is the default in MP4, HLS, and DASH, and it is universally supported on Apple devices and smart TVs. Use Opus for real-time and interactive audio: it is the de-facto WebRTC codec (RFC 6716), royalty-free, and scales from 6 to 510 kbps while switching between speech (SILK) and music (CELT) modes. Many products ship both — AAC for playback, Opus for live.
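The rule of thumb above can be written as a tiny lookup. This is a minimal sketch, not a full decision tree: the `UseCase` names and helper functions are hypothetical, and the only facts encoded are the ones stated in the answer (AAC for on-demand and broadcast playback, Opus for real-time, Opus bitrates from 6 to 510 kbps).

```python
from enum import Enum


class UseCase(Enum):
    # Hypothetical labels for the delivery modes named in the answer above.
    ON_DEMAND = "on-demand streaming"
    BROADCAST = "broadcast streaming"
    REAL_TIME = "real-time / interactive"


# Opus bitrate range quoted in the text, in kbps.
OPUS_MIN_KBPS = 6
OPUS_MAX_KBPS = 510


def pick_codec(use_case: UseCase) -> str:
    """Return the codec the answer above recommends for one delivery mode."""
    if use_case is UseCase.REAL_TIME:
        return "Opus"
    return "AAC"


def codecs_for_product(use_cases) -> set:
    """A product with both playback and live audio ends up shipping both codecs."""
    return {pick_codec(u) for u in use_cases}


def opus_bitrate_ok(kbps: float) -> bool:
    """Check a requested Opus bitrate against the 6 to 510 kbps range."""
    return OPUS_MIN_KBPS <= kbps <= OPUS_MAX_KBPS
```

For example, `codecs_for_product([UseCase.ON_DEMAND, UseCase.REAL_TIME])` yields both AAC and Opus, which matches the "many products ship both" case.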

What LUFS should I target for streaming platforms?

Most platforms normalize to a fixed integrated-loudness target measured in LUFS (ITU-R BS.1770). Common 2026 targets: Spotify and YouTube around −14 LUFS, Apple Music around −16 LUFS, podcasts around −16 to −19 LUFS, and broadcast at −24 LKFS (ATSC A/85) or −23 LUFS (EBU R128); LKFS and LUFS are the same unit under two names. Keep true peak at or below −1 dBTP to avoid clipping after encoding. Master to the platform's target, not to one universal number.
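The arithmetic behind "master to the platform's target" is a subtraction in dB. The sketch below assumes you already have a measured integrated loudness and true peak from a BS.1770 meter; it does not measure anything itself. The dictionary keys and function names are hypothetical, and the targets are the approximate figures quoted above (the podcast range is left out because it is not a single number).

```python
# Approximate integrated-loudness targets from the text, in LUFS.
# ATSC A/85 is specified in LKFS, which is the same scale as LUFS.
TARGETS_LUFS = {
    "spotify": -14.0,
    "youtube": -14.0,
    "apple_music": -16.0,
    "ebu_r128": -23.0,
    "atsc_a85": -24.0,
}

# True-peak ceiling recommended in the text, in dBTP.
TRUE_PEAK_CEILING_DBTP = -1.0


def normalization_gain_db(measured_lufs: float, platform: str) -> float:
    """Static gain (dB) that moves the measured loudness onto the target."""
    return TARGETS_LUFS[platform] - measured_lufs


def plan_master(measured_lufs: float, measured_true_peak_dbtp: float, platform: str) -> dict:
    """Apply the gain on paper and flag whether the peak would break the ceiling.

    A plain gain change shifts loudness and true peak by the same number of dB,
    so the post-gain peak is simply the measured peak plus the gain.
    """
    gain = normalization_gain_db(measured_lufs, platform)
    peak_after = measured_true_peak_dbtp + gain
    return {
        "gain_db": gain,
        "true_peak_after_dbtp": peak_after,
        "needs_limiting": peak_after > TRUE_PEAK_CEILING_DBTP,
    }
```

A mix measured at −20 LUFS with a −6 dBTP peak needs +6 dB to reach the Spotify target, which would push the peak to 0 dBTP, so `needs_limiting` comes back true and a limiter (or a quieter master) is required.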

How does the WebRTC audio pipeline work?

On capture, WebRTC runs the microphone signal through acoustic echo cancellation (AEC), noise suppression, and automatic gain control, then voice activity detection (VAD) with discontinuous transmission (DTX) to save bandwidth in silence. The audio is encoded with Opus, protected by in-band FEC, and packetized over RTP. At the receiver, the NetEQ jitter buffer absorbs network jitter, packet loss concealment (PLC) hides lost frames, and the decoder feeds the renderer, all inside a one-way mouth-to-ear budget of under 150 ms (the ITU-T G.114 guideline for conversational audio).
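To make the receiver side concrete, here is a toy jitter buffer with the crudest possible loss concealment. It is an illustration only and not how NetEQ works: NetEQ adapts its delay and synthesizes concealment audio, while this sketch uses a fixed pre-buffer depth, repeats the last good frame, and ignores RTP sequence-number wraparound. The class name and string "frames" are hypothetical.

```python
from typing import Dict, Optional


class ToyJitterBuffer:
    """Reorders packets by sequence number and conceals gaps on playout."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth                       # packets to collect before playout starts
        self.pending: Dict[int, str] = {}        # sequence number -> frame payload
        self.next_seq: Optional[int] = None      # next sequence number due for playout
        self.last_frame: Optional[str] = None    # last frame actually played

    def push(self, seq: int, frame: str) -> None:
        """Store an arriving packet; drop it if its playout slot has passed."""
        if self.next_seq is not None and seq < self.next_seq:
            return
        self.pending[seq] = frame

    def pop(self) -> Optional[str]:
        """Called once per frame interval by the playout clock."""
        if self.next_seq is None:
            if len(self.pending) < self.depth:
                return None                      # still pre-buffering
            self.next_seq = min(self.pending)
        frame = self.pending.pop(self.next_seq, None)
        self.next_seq += 1
        if frame is None:
            # Naive PLC: repeat the last good frame, tagged so the gap is visible.
            return f"plc({self.last_frame})"
        self.last_frame = frame
        return frame
```

Pushing packets 1, 3, 2 and then popping three times plays them back in order; if packet 4 never arrives before its slot, the next pop returns a concealment frame instead of stalling, and a packet 4 that shows up afterwards is discarded.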

What is acceptable audio-to-video (lip-sync) latency?

Per ITU-R BT.1359, lip-sync error stays undetectable when audio leads the video by no more than about 45 ms or lags by no more than about 125 ms; viewers tolerate sound arriving late better than early. Broadcast specs tighten the target: EBU R37 allows roughly +40/−60 ms (lead/lag), and ATSC IS-191 roughly +15/−45 ms. Past these windows the mismatch becomes objectionable, so live, conferencing, and OTT pipelines budget sync explicitly.
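The asymmetry of the window is easy to get wrong in code, so a small check helps. The sketch below assumes a sign convention in which a positive offset means audio leads the video and a negative offset means audio lags; the class and constant names are hypothetical, and the limits are the approximate figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncWindow:
    name: str
    max_lead_ms: float   # how far audio may run ahead of video
    max_lag_ms: float    # how far audio may run behind video


# Approximate limits from the text: +45/-125 ms and the tighter +40/-60 ms.
BT1359_WINDOW = SyncWindow("ITU-R BT.1359", max_lead_ms=45.0, max_lag_ms=125.0)
BROADCAST_WINDOW = SyncWindow("broadcast target", max_lead_ms=40.0, max_lag_ms=60.0)


def within_window(offset_ms: float, window: SyncWindow) -> bool:
    """offset_ms > 0 means audio leads video; offset_ms < 0 means audio lags."""
    return -window.max_lag_ms <= offset_ms <= window.max_lead_ms


def describe(offset_ms: float, window: SyncWindow) -> str:
    """Human-readable verdict for a measured A/V offset."""
    if within_window(offset_ms, window):
        return "in sync"
    return "audio too early" if offset_ms > 0 else "audio too late"
```

With this convention, audio that is 100 ms late (`offset_ms = -100`) passes the BT.1359 window but fails the tighter broadcast target, while audio that is 60 ms early fails both.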

What's the difference between Dolby Atmos and MPEG-H?

Both are immersive, object-based audio: instead of fixed channels, they carry sound objects plus positional metadata that a renderer maps to any speaker layout or to headphones. Dolby Atmos is proprietary and dominant in cinema and streaming (Netflix, Disney+, Apple Music). MPEG-H 3D Audio is the open ISO standard (ISO/IEC 23008-3, used in ATSC 3.0 broadcast and by some music services) and adds listener interactivity such as dialogue-level control. Atmos has wider device reach; MPEG-H gives the listener more control over the mix.

Need to ship audio in video, not just understand it?

Fora Soft has built real-time video, audio, and AI products since 2005 — WebRTC, LiveKit, generative pipelines, and AI agents at scale. Tell us what you’re building and we’ll send a real engineer your way.



Where to start.

AI in Audio for Video: Voice Cloning, Dubbing, Restoration, Generative Music

RTP Timestamps, RTCP Sender Reports, and NTP Synchronization

Diagnosing Audio Problems in Production: A Runbook

How To Choose an Audio Codec for Your Service in 2026: a Decision Tree

A short history of audio codecs: from MP2 (1991) to LC3 (2020)

What is digital audio: from sound wave to bits
