
Key Takeaways
- P2P delivers the lowest latency for 1-4 participants at 50-100ms with zero server cost, making it ideal for intimate video calls and high-urgency interactions.
- SFU scales to 1,000+ concurrent users by forwarding unchanged media streams, costing $300–500/month for 100 concurrent users and maintaining 100–200ms latency for superior real-time interaction.
- MCU provides broadcast-ready output with server-side stream mixing, supporting 10–100 users at 200–400ms latency and costing $2,000–5,000/month, best suited for recordings, streaming, and large-scale events.
- Mesh topology extends P2P to 6–10 participants with 100–300ms latency by adding peer discovery and topology awareness, requiring only a lightweight signaling server instead of media servers for small group calls.
- Hybrid P2P+SFU architecture automatically routes calls of up to 4 participants through P2P for roughly 50ms latency at zero server cost, then escalates to SFU for larger groups, keeping latency and cost low across all call sizes.
Why This Architecture Guide Matters for Your Video Conferencing App
Choosing the wrong video conferencing architecture can cost your startup hundreds of thousands in wasted server infrastructure, kill user experience with 500ms latency, or lock you into a CPaaS vendor with 40% monthly recurring costs. Fora Soft's video conferencing development team has guided 15+ clients through this decision, from BrainCert's 100K concurrent users on SFU to CirrusMED's HIPAA-compliant telemedicine platform. This guide gives you the technical and financial frameworks to select your architecture, estimate costs with precision, and avoid the 5 most common pitfalls that force expensive mid-project rewrites.
Quick Answer: Which Architecture for Your Call Size?
Use this decision tree to identify your starting point when evaluating P2P vs MCU vs SFU for your scale. The sections below dive deeper into each architecture, but here's the 60-second version:
- 1–4 users: P2P via WebRTC with STUN/TURN. Zero server cost, 50–100ms latency, ideal for 1:1 calls and high-fidelity audio.
- 5–10 users: Mesh topology (P2P with peer discovery) or SFU with media forwarding. 100–300ms latency, minimal server overhead.
- 11–100 users: SFU is your workhorse. $300–500/month for 100 concurrent, 100–200ms latency, no server-side mixing overhead.
- 100+ users or recording/broadcast: SFU for ingestion, MCU for output mixing and broadcast. $2,000–5,000/month, 200–400ms latency, professional-grade output.
The WebRTC Foundation: getUserMedia, RTCPeerConnection, and Signaling
Every video conferencing architecture in this guide—P2P, SFU, MCU—rests on three WebRTC primitives that handle media capture, peer connection management, and real-time data exchange.
getUserMedia: Capture and Constraints
navigator.mediaDevices.getUserMedia() grants access to the user's camera and microphone. Specify constraints for resolution, frame rate, and audio echo cancellation:
const constraints = {
  video: { width: 1280, height: 720, frameRate: 30 },
  audio: { echoCancellation: true, noiseSuppression: true }
};
const stream = await navigator.mediaDevices.getUserMedia(constraints);
On mobile or bandwidth-constrained networks, reduce to 360p@15fps to cut bandwidth by 75% while maintaining acceptable quality (MOS 3.5–4.0).
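One way to apply this is to pick the constraints from a network hint before calling getUserMedia. The sketch below is only an illustration: the function name `pickConstraints` and the `constrained` flag are hypothetical, and how you detect a slow link is left to your application.

```javascript
// Hypothetical helper: returns getUserMedia constraints for the current network.
// `constrained` is an assumption; derive it however your app detects slow links.
function pickConstraints(constrained) {
  const video = constrained
    ? { width: 640, height: 360, frameRate: 15 }   // 360p@15fps fallback
    : { width: 1280, height: 720, frameRate: 30 }; // 720p@30fps default
  return {
    video,
    audio: { echoCancellation: true, noiseSuppression: true }
  };
}

// Usage in a browser:
// const stream = await navigator.mediaDevices.getUserMedia(pickConstraints(true));
```

Audio settings stay the same in both profiles because video dominates the bandwidth budget.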
RTCPeerConnection: The Media Pipeline
RTCPeerConnection manages the actual media transport. Add audio and video tracks from your local stream, then connect to a remote peer:
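// The STUN URL is an example; substitute your own STUN/TURN servers
const peerConnection = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});
const localStream = await navigator.mediaDevices.getUserMedia(constraints);
// Send every local audio and video track to the remote peer
localStream.getTracks().forEach(track => {
  peerConnection.addTrack(track, localStream);
});
// Assumes a <video id="remoteVideo" autoplay> element exists on the page
const remoteVideo = document.querySelector('#remoteVideo');
// Attach the remote stream once its tracks arrive
peerConnection.ontrack = event => {
  remoteVideo.srcObject = event.streams[0];
};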
RTCPeerConnection also handles codec negotiation (VP8 for bandwidth efficiency, H.264 for hardware decode, AV1 for next-gen compression), bandwidth adaptation, and jitter buffer management automatically.
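Negotiation is automatic, but you can steer it when you have a reason to favor one codec, for example H.264 on battery-constrained phones. The sketch below reorders a codec list so a preferred MIME type comes first; `preferCodec` is a hypothetical helper, and the commented usage assumes a browser that supports `RTCRtpTransceiver.setCodecPreferences`.

```javascript
// Hypothetical helper: move codecs whose mimeType matches `preferred` to the front,
// keeping the original relative order of everything else.
function preferCodec(codecs, preferred) {
  const wanted = preferred.toLowerCase();
  const first = codecs.filter(c => c.mimeType.toLowerCase() === wanted);
  const rest = codecs.filter(c => c.mimeType.toLowerCase() !== wanted);
  return [...first, ...rest];
}

// Usage in a browser, before createOffer():
// const { codecs } = RTCRtpReceiver.getCapabilities('video');
// transceiver.setCodecPreferences(preferCodec(codecs, 'video/H264'));
```

The remaining codecs are kept as fallbacks so negotiation still succeeds when the remote side lacks the preferred one.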
Signaling: The Invisible Choreographer
WebRTC itself handles media transport, but signaling—the exchange of SDP offers/answers and ICE candidates—requires your application layer. Use WebSocket, Firebase, or a dedicated signaling service to exchange connection metadata:
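// Initiator creates an offer
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
signaling.send({ type: 'offer', sdp: offer.sdp });
// Both peers forward ICE candidates as they are discovered
peerConnection.onicecandidate = ({ candidate }) => {
  if (candidate) signaling.send({ type: 'candidate', candidate });
};
// Receiver applies the offer and sends an answer
// (signaling.onmessage is an assumed callback of your own transport)
signaling.onmessage = async msg => {
  if (msg.type === 'offer') {
    await peerConnection.setRemoteDescription({ type: 'offer', sdp: msg.sdp });
    const answer = await peerConnection.createAnswer();
    await peerConnection.setLocalDescription(answer);
    signaling.send({ type: 'answer', sdp: answer.sdp });
  } else if (msg.type === 'candidate') {
    await peerConnection.addIceCandidate(msg.candidate);
  }
};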
ICE, STUN, and TURN: Crossing NAT Barriers
Most users sit behind a NAT (network address translator). ICE (Interactive Connectivity Establishment) uses STUN servers to discover each peer's public IP address and port (free public STUN servers exist, such as Google's stun.l.google.com:19302). If a direct P2P path cannot be established, TURN servers relay the traffic (typically $0.10–0.20/GB on AWS or Twilio). Budget 20–40% of your bandwidth costs for TURN relay.
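In code, STUN and TURN servers are passed to RTCPeerConnection as a list of ICE servers. The configuration below is a sketch: the STUN entry is Google's public server, while the TURN host, username, and credential are placeholders you would replace with your own relay.

```javascript
// STUN: Google's public server. TURN: placeholder host and credentials (assumptions).
const rtcConfig = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478',
      username: 'demo-user',
      credential: 'demo-pass'
    }
  ]
};

// Usage in a browser:
// const peerConnection = new RTCPeerConnection(rtcConfig);
```

ICE tries direct and STUN-derived candidates first and falls back to the TURN relay only when those fail, so listing a TURN server does not by itself add relay cost.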
P2P (Peer-to-Peer): The Latency Champion
P2P connects two participants directly without any server in the path. Peer A sends video directly to Peer B; Peer B sends directly back. No mixing, no forwarding, no processing.
Performance Profile
- Max concurrent participants: 2–4 (rarely more; at 5+ the number of connections grows quadratically and each peer's upload multiplies)
- Latency: 50–100ms (end-to-end), dominated by network RTT and codec processing
- Bandwidth per peer: 1 Mbps upload and 1 Mbps download per remote participant (full video quality)
- Server cost: $0/month (only STUN/TURN as backup)
- Best use cases: 1:1 calls, high-touch sales calls, healthcare consults, therapy sessions
Why P2P Fails at Scale
In a 5-person call with full mesh P2P, each peer sends and receives 4 video streams simultaneously. Peer A uploads its own stream four times, once each to B, C, D, and E (4 Mbps egress), and downloads their four streams (4 Mbps ingress). Mobile users on LTE with 2–3 Mbps of uplink cannot sustain that. Decoding 4 simultaneous H.264 streams also strains the CPU even on modern phones. The solution is SFU or MCU.
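The per-peer load in a full mesh can be worked out directly. The helper below is a minimal sketch (the name `meshLoadPerPeer` is hypothetical) that assumes every peer sends one stream of the same bitrate to every other peer.

```javascript
// Per-peer load in a full mesh: one outgoing and one incoming stream per remote peer.
function meshLoadPerPeer(participants, bitrateMbps = 1) {
  const remotePeers = participants - 1;
  return {
    uploadMbps: remotePeers * bitrateMbps,
    downloadMbps: remotePeers * bitrateMbps,
    totalConnections: (participants * remotePeers) / 2 // grows quadratically
  };
}

console.log(meshLoadPerPeer(5)); // { uploadMbps: 4, downloadMbps: 4, totalConnections: 10 }
```

Going from 4 to 5 participants raises each peer's upload from 3 to 4 Mbps, which is why the uplink is usually the first thing to break.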
Mesh Topology: P2P with Peer Discovery and Awareness
Mesh is a hybrid between pure P2P and server-forwarded architectures. Peers still send directly to each other, but a lightweight server maintains peer discovery and topology awareness—who is connected, who left, optimal relay paths.
Performance Profile
- Max concurrent participants: 6–10 (topology awareness limits practical group size)
- Latency: 100–300ms (direct P2P + potential relay hops)
- Bandwidth per peer: (N-1) × 1 Mbps upload + (N-1) × 1 Mbps download when peers send directly to each other
- Server cost: $50–150/month (low-compute signaling and relay)
- Best use cases: Small group calls (4–8 people), remote team standups, distributed training sessions
Open-Source Mesh Frameworks
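Jitsi: Open-source conferencing stack with web, desktop, and mobile clients. Jitsi Meet can connect two participants peer-to-peer and switches to server-side forwarding through Jitsi Videobridge (an SFU) for larger calls. Runs on-premises, with no vendor fees if self-hosted. Popular in education and healthcare.
BigBlueButton: Open-source web conferencing platform aimed at online learning. It is a separate project from Jitsi and routes media through its own servers rather than a peer mesh. Scales to 100+ participants with screen sharing and breakout rooms. $500–2,000/month to self-host.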
SFU (Selective Forwarding Unit): The Industry Standard
An SFU is a media server that receives streams from all participants and forwards each to every other participant, without decoding or mixing. It's selective: Peer A sends one VP8 stream; the SFU forwards that same stream to B, C, D without re-encoding. This separation of concerns—low server CPU, independent bitrate adaptation per peer—makes SFU the de facto standard for video conferencing.
How SFU Scales: Bandwidth Arithmetic
In a 10-person call on SFU:
- Each peer uploads 1 Mbps (its own stream).
- Each peer downloads 9 Mbps (streams from 9 others).
- SFU ingests 10 Mbps total, egresses 90 Mbps (10 streams × 9 recipients).
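This guide assumes cloud egress at $0.15/GB (actual rates vary by provider and volume tier). At 90 Mbps continuous (worst case), a 1-hour session moves 90 × 3,600 / 8 = 40,500 MB = 40.5 GB, which costs 40.5 × $0.15 = $6.08 per call-hour, or about $0.61 per person-hour. For 100 concurrent users spread across ten such rooms, egress is 900 Mbps = 405 GB/hour = $60.75/hour in bandwidth alone. Add server compute: a medium EC2 instance (c5.2xlarge, 8 vCPU) handles 20–40 concurrent users and costs $0.34/hour. For 100 users, you need 3 instances = $1.02/hour. Worst-case total: about $62/hour, or roughly $44,500/month for 24/7 operation at full bitrate. The $300–500/month figure for 100 concurrent users is therefore only reachable at a small fraction of that load (intermittent meetings, lower forwarded bitrates through simulcast, or cheaper egress), not with continuous worst-case traffic. Managed platforms price by usage instead: Daily.co charges $0.006/participant-minute for SFU, so $300/month buys 50,000 participant-minutes (for example, 100 users meeting for about 8 hours each per month).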
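The compute side of this estimate is a simple ceiling division. The sketch below uses the per-instance capacity and hourly price quoted in the text as default values; treat them as assumptions to replace with your own load-test results, and note that `sfuCompute` is a hypothetical helper name.

```javascript
// Instances needed and hourly compute cost for a given concurrency.
// Defaults come from the figures in the text and are assumptions, not benchmarks.
function sfuCompute(users, usersPerInstance = 40, pricePerHour = 0.34) {
  const instances = Math.ceil(users / usersPerInstance);
  return { instances, hourlyCost: instances * pricePerHour };
}

console.log(sfuCompute(100).instances);     // 3 (at 40 users per instance)
console.log(sfuCompute(100, 20).instances); // 5 (at 20 users per instance)
```

Sizing against the lower end of the capacity range leaves headroom for bitrate spikes and instance failures.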
Performance Profile
- Max concurrent participants: 5–1,000+ (bandwidth and server CPU are the only limits)
- Latency: 100–200ms (forward delay through SFU + network RTT)
- Bandwidth cost: $0.10–0.20/person-hour egress when each participant receives about 1.5–3 Mbps (at $0.15/GB)
- Compute cost: $300–500/month for 100 concurrent users
- Best use cases: Team meetings, webinars, online classes, social video platforms
Open-Source SFU Frameworks
Mediasoup (C++ / Node.js) is the heavyweight champion. Used by Dyte, 100ms, and GetStream. Supports full custom bitrate adaptation, screen sharing, and simulcast (multiple quality tiers per peer). Learning curve is steep; expect 3–6 months of engineering to integrate. LiveKit (Go, Apache 2.0) is built on Pion rather than Mediasoup and offers a simpler API and dashboard. Deploy on your own infrastructure or use LiveKit Cloud SaaS ($0.0075/participant-minute). Janus (C) is modular and lightweight but older; good for embedded or legacy systems. ion-sfu (Go) is newer and Kubernetes-native; used by GetStream. Pion (Go library) is the most minimal: use it to build your own SFU from scratch if you want full control and are willing to spend 6–12 months on development.
MCU (Multipoint Control Unit): Broadcast-Ready Mixing
An MCU decodes all incoming streams, composites them into a single high-quality output, and re-encodes for distribution. It's used for recording, broadcasting, or when you need a single "gallery view" feed for 50+ participants.
MCU Bandwidth and Cost Calculation
In a 10-person call with recording via MCU:
- Each peer uploads 1 Mbps (10 Mbps total ingress to MCU).
- MCU creates a 2x2 or 3x3 grid, encodes to 4 Mbps composite.
- Each peer downloads the composite (4 Mbps × 10 = 40 Mbps total egress).
- MCU CPU: decoding 10 streams + compositing + re-encoding = massive CPU load (100–300% higher than SFU).
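For 100 concurrent users on an MCU (ten 10-person sessions, each participant downloading the 4 Mbps composite): 100 × 4 Mbps = 400 Mbps egress = 180 GB/hour. At $0.15/GB = $27/hour in bandwidth. Compute for MCU: a GPU-accelerated instance (g3s.xlarge on AWS with NVIDIA Tesla M60) costs $3.06/hour and handles 8–10 concurrent MCU sessions. For 10 MCU sessions, you need ~1–2 GPU instances = $3–6/hour compute. Worst-case total: roughly $30–33/hour, or about $21,600–23,800/month for 24/7 operation. The $2,000–5,000/month figure used in this guide corresponds to roughly 60–160 fully loaded hours per month rather than continuous operation.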
Performance Profile
- Max concurrent participants: 10–100 (limited by MCU compositing CPU)
- Latency: 200–400ms (decode, composite, re-encode pipeline)
- Bandwidth cost: $0.15–0.30/person-hour egress
- Compute cost: $2,000–5,000/month for 100 concurrent users
- Best use cases: Recording, broadcasting, webinars with single gallery view, TV-like production
Hybrid Architecture: P2P + SFU Automatic Escalation
The smartest deployment uses P2P for small calls (zero cost, 50ms latency) and automatically escalates to SFU once group size exceeds 4 participants. This is what BrainCert implemented and it reduced their infrastructure costs by 60% while improving latency perception.
Implementation Logic
// Pick a transport mode from the current participant count
function selectMode(participantCount) {
  if (participantCount <= 4) {
    // Direct P2P mesh: each peer connects directly to all others
    return 'p2p';
  } else if (participantCount <= 50) {
    // SFU with VP8 simulcast (3 quality tiers), e.g. a mediasoup router
    return 'sfu';
  } else {
    // SFU + optional MCU: spawn a GPU instance for composite recording
    return 'sfu_plus_record';
  }
}
Hybrid Cost Profile
- 1–4 users: $0/month (P2P, STUN/TURN minimal)
- 5–50 users: $300–500/month (SFU only)
- 50–100 users: $500–1,000/month (SFU) + $2,000/month (MCU for recording)
- Latency: 50ms (1:1 P2P) → 150ms (SFU 5–50) → 300ms (MCU composite)
Architecture Comparison Matrix
Here's the complete P2P vs MCU vs SFU comparison at a glance. Use this matrix to shortlist your architecture before diving into implementation details.
| Architecture | Max Concurrent Users | Server Cost/Month (100 Users) | Upload Bandwidth/Peer | Latency | Best For |
|---|---|---|---|---|---|
| P2P | 2–4 | $0 | 1 Mbps per remote peer | 50–100ms | 1:1 calls, high-urgency interaction |
| Mesh | 6–10 | $50–150 | (N-1) × 1 Mbps | 100–300ms | Small groups, team standups |
| SFU | 5–1,000+ | $300–500 | 1 Mbps | 100–200ms | Team meetings, webinars, social platforms |
| MCU | 10–100 | $2,000–5,000 | 1 Mbps | 200–400ms | Recording, broadcast, gallery view |
| Hybrid (P2P + SFU) | Unlimited | $0–500 (dynamic) | 1 Mbps | 50–200ms (dynamic) | All use cases, cost-optimal |
Bandwidth Mathematics: Calculating Real Costs
Use these formulas to estimate bandwidth costs before architecting.
P2P / SFU Egress Formula
Egress = (N participants) × (N-1 streams) × (bitrate Mbps)
For a 50-person call, bitrate 1 Mbps:
Egress = 50 × 49 × 1 = 2,450 Mbps = 1,102.5 GB/hour (1 Mbps sustained for an hour = 0.45 GB)
Cost @ $0.15/GB = $165.38/hour
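The formula can be checked with a few lines of code. This sketch converts Mbps to GB per hour using decimal units (1 Mbps for 3,600 seconds is 450 MB, or 0.45 GB); `forwardedEgress` is a hypothetical helper name and the price per GB is whatever rate you assume.

```javascript
const MBPS_TO_GB_PER_HOUR = 3600 / 8 / 1000; // 1 Mbps sustained for an hour = 0.45 GB

// Total forwarded traffic when every participant receives every other stream.
function forwardedEgress(participants, bitrateMbps, pricePerGB) {
  const mbps = participants * (participants - 1) * bitrateMbps;
  const gbPerHour = mbps * MBPS_TO_GB_PER_HOUR;
  return { mbps, gbPerHour, costPerHour: gbPerHour * pricePerGB };
}

console.log(forwardedEgress(50, 1, 0.15));
// mbps: 2450, gbPerHour ≈ 1102.5, costPerHour ≈ 165.38
```

Because the stream count is N × (N-1), doubling the room size roughly quadruples egress, which is the main argument for simulcast and active-speaker forwarding in large rooms.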
MCU Egress Formula (Single Composite)
Egress = (N participants) × (composite bitrate Mbps)
For 50-person call, 4 Mbps composite output:
Egress = 50 × 4 = 200 Mbps = 90 GB/hour
Cost @ $0.15/GB = $13.50/hour
Add GPU compute on top: 2× GPU instance = $6–12/hour
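The MCU case is linear in the number of participants because each one downloads a single composite. A minimal sketch, with `mcuEgress` as a hypothetical helper name and decimal GB units:

```javascript
// Every participant downloads one composite stream from the MCU.
function mcuEgress(participants, compositeMbps, pricePerGB) {
  const mbps = participants * compositeMbps;
  const gbPerHour = (mbps * 3600) / 8 / 1000; // Mbps -> GB per hour
  return { mbps, gbPerHour, costPerHour: gbPerHour * pricePerGB };
}

console.log(mcuEgress(50, 4, 0.15)); // mbps: 200, gbPerHour: 90, costPerHour ≈ 13.5
```

The bandwidth saving relative to forwarding every stream is what you buy with the extra decode, composite, and re-encode work on the server.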
TURN Relay Cost
For 20% of calls that require TURN relay (users behind restrictive NAT):
TURN Cost = (Egress GB/hour) × 0.20 × (TURN rate $/GB)
TURN rate on AWS: $0.10/GB (vs $0.15 public egress)
For 2,450 Mbps SFU call:
TURN @ 20% = 490 Mbps = 220.5 GB/hour
Cost = 220.5 × $0.10 = $22.05/hour
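The same conversion applies to relayed traffic. In this sketch the relayed fraction and the TURN price are inputs you supply, and `turnCostPerHour` is a hypothetical helper name.

```javascript
// Hourly TURN cost when a fraction of traffic must be relayed.
function turnCostPerHour(egressMbps, relayedFraction, turnPricePerGB) {
  const relayedMbps = egressMbps * relayedFraction;
  const relayedGBPerHour = (relayedMbps * 3600) / 8 / 1000; // Mbps -> GB per hour
  return { relayedMbps, relayedGBPerHour, costPerHour: relayedGBPerHour * turnPricePerGB };
}

console.log(turnCostPerHour(2450, 0.2, 0.10));
// relayedMbps ≈ 490, relayedGBPerHour ≈ 220.5, costPerHour ≈ 22.05
```

Measure the relayed fraction from your own connection statistics once you have traffic, since it varies widely between consumer and corporate networks.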
Latency vs. Quality: The MOS Score Tradeoff
Mean Opinion Score (MOS) measures perceived quality on a 1–5 scale. RTCStats and Mux provide MOS via machine learning on codec metrics. Below 3.0 MOS, users perceive degradation; below 2.5, the call is unusable.
Latency Impact on MOS
- 50–150ms: Imperceptible delay, natural conversation flow. MOS 4.5–5.0.
- 150–300ms: Noticeable but acceptable for most calls. Slight "talk-over" effect. MOS 3.5–4.5.
- 300–500ms: Unnatural. Telephone-like quality. MOS 2.5–3.5. Users adapt but dislike it.
- 500ms+: Conversation breaks down. One speaker at a time. MOS below 2.5.
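The bands above translate into a small lookup that can drive alerts or dashboards. This is a sketch that simply encodes the thresholds listed in this section; `latencyBand` and its return shape are hypothetical.

```javascript
// Maps measured latency (ms) to the MOS bands listed above.
function latencyBand(latencyMs) {
  if (latencyMs <= 150) return { mos: '4.5–5.0', experience: 'natural' };
  if (latencyMs <= 300) return { mos: '3.5–4.5', experience: 'acceptable' };
  if (latencyMs <= 500) return { mos: '2.5–3.5', experience: 'unnatural' };
  return { mos: 'below 2.5', experience: 'broken' };
}

console.log(latencyBand(120).experience); // natural
console.log(latencyBand(350).experience); // unnatural
```

Latency is only one input to MOS; packet loss and jitter can push a low-latency call into a lower band.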
Codec Selection for Quality
VP8: Open-source, royalty-free. Bandwidth-efficient at 500–2,000 kbps. Default for SFU forwarding. 90% of WebRTC deployments.
H.264: Hardware-accelerated decode on mobile/desktop. More efficient at 800–2,500 kbps. Patent-encumbered but widely licensed. Better for battery-constrained mobile.
AV1: Next-generation, 30% smaller files than VP8. Limited hardware support until 2025. Expect AV1-native webinars by 2026.
Open-Source SFU Options: Mediasoup, LiveKit, Janus, and More
Building your own SFU from open-source gives you full control but requires 3–12 months of engineering. Here's the landscape of production-ready frameworks.
Mediasoup: The Gold Standard
Language: C++ (libmediasoup) + Node.js. License: ISC (permissive). Adoption: Used by Dyte, 100ms, and GetStream. Strengths: Full bitrate adaptation, simulcast with quality tiers, VP8/H.264/AV1 support, SVC (scalable video coding), built-in bandwidth estimation. Learning curve: Steep. Requires understanding of RTP, codec internals, network adaptation. Time to production: 4–6 months for a team of 2–3 engineers.
LiveKit: Go SFU with Managed Cloud
Language: Go (server, built on Pion) + client SDKs. License: Apache 2.0. Adoption: Rapid growth, used by GetStream and emerging startups. Strengths: Simpler API, dashboard UI, Kubernetes-native deployment, managed LiveKit Cloud ($0.0075/participant-minute). Trade-off: Less low-level control than Mediasoup for advanced use cases. Time to production: 2–3 months to self-host, 1 week to integrate LiveKit Cloud SaaS.
Janus: Modular and Lightweight
Language: C. License: GPLv3 (requires derivative works to be open-source). Adoption: Mature ecosystem; popular in Europe and education. Strengths: Plugin architecture, low CPU footprint, works on ARM (Raspberry Pi). Trade-off: Older codebase; less active development than Mediasoup. Time to production: 2–4 months.
ion-sfu and Pion: Go-Native Frameworks
ion-sfu (GitHub: pion/ion-sfu) is a pure Go SFU with WebRTC-only support (no RTMP broadcast). Kubernetes-ready, cloud-native. Used by GetStream. Pion is the low-level library—use it if you want maximum flexibility and are willing to spend 6–12 months on development. Time to production: 4–8 months (ion-sfu), 12+ months (Pion from scratch).
CPaaS Platforms: Agora, Daily, 100ms, Dyte, Twilio, LiveKit Cloud
Communications Platform as a Service (CPaaS) vendors offer managed SFU infrastructure with SDKs, so you skip building your own. Trade flexibility for speed-to-market.
Agora: High-Volume, Global CDN
Pricing: $0.0075–0.015/participant-minute (1M+ minutes/month tier). Coverage: 200+ countries, edge nodes. Best for: Live streaming, low-latency broadcast. Drawback: Vendor lock-in; expensive per-participant model for long-duration calls.
Daily: Video-First SaaS
Pricing: $0.006–0.025/participant-minute. Strength: Developer-friendly API, generous free tier (100 hours/month). Best for: Startups, enterprise teams. Niche: Recorded webinars, async video.
100ms: Enterprise-Grade Features
Pricing: Custom, starting at $500/month. Built on: Mediasoup. Strengths: Live streaming, interactive webinars, breakout rooms, custom layouts. Best for: Enterprise events, education platforms.
Dyte: Emerging SFU Option
Pricing: $0.004–0.015/participant-minute. Built on: Mediasoup. Positioning: Cost-competitive with strong LiveKit integration. Best for: Cost-sensitive teams, interactive webinars.
Twilio: Traditional CPaaS
Pricing: $0.04–0.10/participant-minute (expensive). Strength: Unified SMS + video + voice API. Best for: Integrated customer communication (support calls + SMS follow-up).
LiveKit Cloud: Managed Service with a Self-Hosted Option
Pricing: $0.0075/participant-minute or self-host free (open-source). Hybrid: Option to deploy on your own cloud. Best for: Teams wanting control without building from scratch.
AI Agents in Video Conferencing: Transcription, Sentiment, and Recording
Modern video platforms embed AI agents for real-time transcription (speech-to-text), sentiment analysis, meeting summaries, and live translations. These agents require either MCU recording or access to raw media streams.
Architecture Options for AI Agents
Option 1: MCU Recording + Async Processing. Record via MCU, pipe output to cloud transcription (Google Speech-to-Text, Deepgram, Rev.ai). Easiest; adds 50–200ms latency for recording. Cost: $2–5/hour transcription + $2–5/hour MCU compute.
Option 2: SFU + Real-Time Agent Bot. Agent joins as a participant, receives live streams via SFU, transcribes on-the-fly. Harder; real-time processing requires low latency. Cost: agent compute (GPT-4 @ ~$0.03/minute) + SFU bandwidth.
Case Study: BrainCert's Scaling Journey (P2P → Hybrid → SFU)
BrainCert, an online learning platform with 100K concurrent users, faced a classic growth problem: their initial P2P implementation scaled only to 4 users. When classroom sizes exceeded 10, latency exploded to 800ms and 30% of calls failed.
The Fix: Hybrid P2P + SFU Escalation
BrainCert implemented automatic escalation: 1:1 calls routed through P2P (50ms latency, zero cost), 3–4 participants in mesh, 5+ in SFU. Result: latency improved to a consistent 100–150ms, infrastructure costs dropped 60% (from $5K/month SFU-only to $2K/month hybrid), and user satisfaction (NPS) increased by 15 points. They now handle 500M+ conference minutes annually.
5-Question Decision Framework
Ask these five questions to narrow your architecture choice:
1. What's your largest call size?
   - 1–4: P2P is viable
   - 5–50: SFU is standard
   - 51+: SFU + MCU for recording
2. Do you need recording or broadcast?
   - No: Pure SFU suffices
   - Yes, composited: MCU required
   - Yes, raw streams: SFU + storage (S3, Azure Blob)
3. What's your latency target?
   - Under 100ms: P2P only
   - 100–200ms: SFU
   - 200–400ms: MCU acceptable if broadcast is the goal
4. Do you have budget to build, or buy SaaS?
   - Build: Allocate 4–12 months engineering
   - Buy SaaS: LiveKit Cloud, Daily.co, Agora (1–2 weeks to launch)
5. What's your acceptable monthly infrastructure cost?
   - $0–500: P2P + Mesh or low-volume SFU
   - $500–2,000: SFU for 50–500 concurrent users
   - $2,000+: SFU + MCU + AI agents
5 Architecture Pitfalls That Cost Rewrites
1. Choosing P2P for Expected 10+ Participants. You launch with P2P thinking "we'll scale later." User base grows; calls fail at 5 participants. Now you're retrofitting SFU mid-launch. Cost: 3–6 months of engineering, revenue loss during transition.
2. Underestimating TURN Relay Costs. You assume everyone has direct P2P connectivity. In reality, 15–25% of calls hit restrictive corporate/school firewalls and need TURN relay at $0.10–0.20/GB. Budget becomes double your estimate. Solution: Reserve 20% of bandwidth budget for TURN.
3. Building Custom SFU Without Bitrate Adaptation. You hand-roll an SFU on Mediasoup but skip bandwidth estimation and simulcast. Network degrades; calls freeze. Users leave. SFU without adaptation is broken. Solution: Use LiveKit or Janus; don't skip this step.
4. Ignoring Mobile Client Battery Drain. SFU works, but an iOS client decoding 9 streams maxes out the CPU and drains the battery in 30 minutes. Users complain your app is a "battery killer." Solution: Implement receiver-side video constraints; limit decode to 3–4 streams even if SFU sends 10.
5. Vendor Lock-In with CPaaS. You pick Agora early. By year 2, pricing per-participant-minute becomes expensive; you want to self-host. But your client SDKs are deeply integrated with Agora's API. Rip-and-replace takes 6 months. Solution: Abstract video conferencing behind an interface; design for switching costs.
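For pitfall 4, one simple policy is to decode only the most recent speakers and render everyone else as an avatar. The sketch below assumes a hypothetical participant shape (`id`, `lastSpokeAt`) that your application would maintain; it only selects ids, and pausing the remaining video is left to your SFU client SDK.

```javascript
// Hypothetical shape: each remote participant has an id and a lastSpokeAt timestamp (ms).
// Returns the ids whose video should be decoded; the rest can be paused or shown as avatars.
function pickVisibleParticipants(participants, maxDecoded = 4) {
  return [...participants]
    .sort((a, b) => b.lastSpokeAt - a.lastSpokeAt) // most recent speakers first
    .slice(0, maxDecoded)
    .map(p => p.id);
}

const room = [
  { id: 'ana', lastSpokeAt: 1000 },
  { id: 'ben', lastSpokeAt: 5000 },
  { id: 'cho', lastSpokeAt: 3000 }
];
console.log(pickVisibleParticipants(room, 2)); // [ 'ben', 'cho' ]
```

Copying the array before sorting keeps the caller's participant list in its original order.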
Key Performance Indicators: Quality, Reliability, Business
Quality Metrics
- MOS (Mean Opinion Score): Track via Mux or RTCStats. Target: 4.0+. Monitor codec bitrate, jitter, packet loss to predict MOS drops.
- RTT (Round-Trip Time): Latency from you to participant and back. Target: under 100ms globally. Anything over 300ms indicates network issues or MCU composition delays.
- Jitter: Variance in packet arrival times. Target: under 20ms. High jitter (50ms+) causes audio stuttering.
- Packet Loss: Percentage of dropped packets. Target: under 1%. Over 2%, video breaks; over 5%, audio unusable.
Reliability Metrics
- Call Setup Time: Time from "join" to first video frame. Target: under 3 seconds. Over 10 seconds, users hang up.
- Call Failure Rate: Percentage of calls that drop within 5 minutes. Target: under 0.5%. Track by geography and network type (LTE vs WiFi).
- Server Uptime: Target: 99.99% (four nines). 0.01% downtime = 52 minutes/year. Use multi-region failover and monitoring.
Business Metrics
- Cost per Concurrent User: Total monthly infrastructure / peak concurrent users. Track by architecture (P2P $0, SFU $3–5, MCU $20–50).
- Cost per Participant-Hour: (Total bandwidth + compute cost) / (total participant-minutes / 60). Use to benchmark against CPaaS pricing.
- Revenue per Concurrent User: Monthly subscription revenue / peak concurrent users. Target: 5–10× cost-per-user to maintain margin.
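The two cost metrics above can be computed from monthly totals. This is a sketch with a hypothetical function name and input shape; the sample numbers are illustrative only and not taken from any real deployment.

```javascript
// Inputs are monthly totals: costs in dollars, peak users, and participant-minutes.
function businessKpis({ bandwidthCost, computeCost, peakConcurrentUsers, participantMinutes }) {
  const total = bandwidthCost + computeCost;
  return {
    costPerConcurrentUser: total / peakConcurrentUsers,
    costPerParticipantHour: total / (participantMinutes / 60)
  };
}

// Illustrative inputs only
const kpis = businessKpis({
  bandwidthCost: 300,
  computeCost: 100,
  peakConcurrentUsers: 100,
  participantMinutes: 60000
});
console.log(kpis); // { costPerConcurrentUser: 4, costPerParticipantHour: 0.4 }
```

Dividing `costPerParticipantHour` by 60 gives a per-minute figure that can be compared directly with CPaaS list prices.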
When NOT to Build a Custom Video Architecture
You expect fewer than 50 concurrent users. The overhead of maintaining an SFU is unjustified. Pick a CPaaS (Daily, LiveKit Cloud) at $0.006–0.015 per participant-minute. A self-hosted build only pays for itself after 6–12 months of heavy use.
Your product needs to launch in under 6 months. SFU builds take 4–12 months. CPaaS integration takes 2–4 weeks. Go SaaS; optimize later if unit economics require it.
You lack in-house WebRTC expertise. SFU debugging requires deep RTP/SRTP/codec knowledge. Hiring a contractor costs $50K–150K. Train your team or hire permanent staff; expect 12–24 months to proficiency.
Your business model is per-call or per-minute (not subscription). Margin erosion: if your CPaaS bill per participant-minute (up to $0.04–0.10 on the pricier platforms listed above) matches or exceeds the $0.05–0.10 you charge users, your margin is zero or negative. CPaaS only works for high-volume subscriptions or enterprise licenses.
Frequently Asked Questions
Can I record SFU calls without MCU?
Yes. Attach a WebRTC recorder bot to your SFU; it receives all streams and writes individual files to S3, or muxes them afterwards (via ffmpeg). Mediasoup and LiveKit support this. Cost: modest compute, since the bot stores streams without decoding or compositing them. Benefit: no MCU overhead; no 200ms additional latency. Drawback: post-processing required to create gallery view.
Is MCU better than SFU for large calls?
No. MCU is 10–20× more expensive and adds latency. Use SFU for 100+ participants; only add MCU if you need composited output (broadcast, gallery view). SFU scales to 1,000+ participants on modest hardware.
Does video conferencing require encryption?
Yes, for compliance (HIPAA, GDPR). WebRTC uses DTLS-SRTP for media encryption by default. Signaling (SDP, ICE) should flow over TLS (HTTPS or WSS). If you're handling sensitive data (medical, financial), add end-to-end encryption (E2EE), typically by encrypting encoded frames in the browser with keys handled by a library such as libsodium or TweetNaCl. Note that E2EE breaks server-side processing (MCU recording, transcription).
Can I scale SFU to 10,000 concurrent users?
Theoretically yes; practically, rarely. Bandwidth becomes the bottleneck. Even if each of 10,000 users receives only 1 Mbps on average, that is 10 Gbps of total network egress, or 4.5 TB/hour. At $0.15/GB that is $675/hour (about $486,000/month if sustained) in egress alone, and real multi-party calls forward several streams per user. Plus compute. For 10,000+ viewers, use CDN-based streaming (RTMP/HLS) instead of real-time WebRTC.
What's the best codec for mobile?
H.264 for decode (hardware accelerated on iOS/Android), VP8 for upload (if bandwidth-constrained). Let WebRTC negotiate; it'll prefer hardware-accelerated paths. Monitor battery drain; if decode is killing battery, reduce remote bitrate via receiver constraints.
Is it safe to use a custom TURN server?
Yes, but with rate-limiting. TURN is a bandwidth relay; misconfigure it and you become a DDoS amplifier. Use coturn (open-source TURN server), add per-peer bandwidth caps, and monitor egress. Cost: $0.05–0.10/GB (cheaper than AWS). Popular for self-hosted deployments.
How do I handle screen sharing in an SFU?
Add a second media stream (video track) labeled "screen". SFU forwards it like any video; client-side decides to show gallery (camera + screen side-by-side) or full-screen. Bandwidth: screen + camera = 2 Mbps per sharer. Mediasoup and LiveKit handle this natively.
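With plain WebRTC, the second track comes from `getDisplayMedia`. The sketch below is an assumption-laden illustration: `shareScreen` is a hypothetical helper, it works on a raw RTCPeerConnection, and SFU client SDKs usually wrap this step in their own publish call.

```javascript
// Captures the screen and adds it to an existing connection as a second video track.
// `mediaDevices` is injectable so the function can be exercised outside a browser.
async function shareScreen(peerConnection, mediaDevices = navigator.mediaDevices) {
  const screenStream = await mediaDevices.getDisplayMedia({ video: true });
  const [screenTrack] = screenStream.getVideoTracks();
  screenTrack.contentHint = 'detail'; // hint the encoder to favor sharpness over motion
  peerConnection.addTrack(screenTrack, screenStream);
  return screenTrack;
}

// Usage in a browser (must be triggered by a user gesture such as a click):
// const screenTrack = await shareScreen(peerConnection);
```

Adding a track to a live connection triggers renegotiation, so the signaling layer needs to handle a fresh offer/answer exchange.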
Can I mix video conferencing with streaming (RTMP)?
Yes. Record SFU output, re-encode to RTMP (via ffmpeg), and push to YouTube/Twitch. Or use MCU to composite, then broadcast. One-way is simple; interactive audience participation requires two-way signaling (back-channel WebRTC), which is complex. Most platforms choose one-way broadcast.
Summary: Choose Your Architecture and Launch
You now have a complete framework for choosing P2P vs MCU vs SFU. P2P delivers sub-100ms latency for 1–4 participants at zero server cost. SFU scales to 1,000+ users for $300–500/month at 100–200ms latency. MCU adds broadcast capability but costs $2,000–5,000/month and introduces 200–400ms latency. Use the 5-question decision framework and pitfall checklist to avoid costly rewrites. Fora Soft's video conferencing development team has guided 15+ projects through this decision and can architect your system, from open-source SFU integration to a CPaaS API wrapper. Get in touch to discuss your architecture roadmap.