Video Conferencing Systems Architecture: P2P vs MCU vs SFU

Nov 19, 2025 · Updated 5.6.2026
Figure: Video conferencing architecture comparing P2P, MCU, and SFU network topologies for scalability

Key Takeaways

  • P2P delivers the lowest latency for 1–4 participants at 50–100ms with zero server cost, making it ideal for intimate video calls and high-urgency interactions.
  • SFU scales to 1,000+ concurrent users by forwarding unchanged media streams, costing $300–500/month for 100 concurrent users and maintaining 100–200ms latency for superior real-time interaction.
  • MCU provides broadcast-ready output with server-side stream mixing, supporting 10–100 users at 200–400ms latency and costing $2,000–5,000/month, best suited for recordings, streaming, and large-scale events.
  • Mesh topology extends P2P to 6–10 participants with 100–300ms latency by adding peer discovery and topology awareness, eliminating server infrastructure for small group calls.
  • Hybrid P2P+SFU architecture automatically routes calls of up to 4 participants through P2P for ~50ms latency at zero server cost, then escalates to SFU for larger groups, delivering optimal latency and cost efficiency across all call sizes.

Why This Architecture Guide Matters for Your Video Conferencing App

Choosing the wrong video conferencing architecture can cost your startup hundreds of thousands in wasted server infrastructure, kill user experience with 500ms latency, or lock you into a CPaaS vendor with 40% monthly recurring costs. Fora Soft's video conferencing development team has guided 15+ clients through this decision, from BrainCert's 100K concurrent users on SFU to CirrusMED's HIPAA-compliant telemedicine platform. This guide gives you the technical and financial frameworks to select your architecture, estimate costs with precision, and avoid the 5 most common pitfalls that force expensive mid-project rewrites.


Quick Answer: Which Architecture for Your Call Size?

Use this decision tree to identify your starting point when evaluating P2P vs MCU vs SFU for your scale. Each architecture is covered in depth in the sections below; here's the 60-second version:

  • 1–4 users: P2P via WebRTC with STUN/TURN. Zero server cost, 50–100ms latency, ideal for 1:1 calls and high-fidelity audio.
  • 5–10 users: Mesh topology (P2P with peer discovery) or SFU with media forwarding. 100–300ms latency, minimal server overhead.
  • 11–100 users: SFU is your workhorse. $300–500/month for 100 concurrent, 100–200ms latency, no server-side mixing overhead.
  • 100+ users or recording/broadcast: SFU for ingestion, MCU for output mixing and broadcast. $2,000–5,000/month, 200–400ms latency, professional-grade output.

The WebRTC Foundation: getUserMedia, RTCPeerConnection, and Signaling

Every video conferencing architecture in this guide—P2P, SFU, MCU—rests on three WebRTC primitives that handle media capture, peer connection management, and real-time data exchange.

getUserMedia: Capture and Constraints

navigator.mediaDevices.getUserMedia() grants access to the user's camera and microphone. Specify constraints for resolution, frame rate, and audio echo cancellation:

const constraints = {
  video: { width: 1280, height: 720, frameRate: 30 },
  audio: { echoCancellation: true, noiseSuppression: true }
};
const stream = await navigator.mediaDevices.getUserMedia(constraints);

On mobile or bandwidth-constrained networks, reduce to 360p@15fps to cut bandwidth by 75% while maintaining acceptable quality (MOS 3.5–4.0).
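Here is a minimal sketch of picking a capture profile from a bandwidth estimate. It assumes you already have an estimate, for example from Chromium's Network Information API (`navigator.connection.downlink`, in Mbps) or from your own probe. The helper name `pickVideoConstraints` and the 1.5 Mbps cut-off are illustrative assumptions, not values from any library.

```javascript
// Hypothetical helper: choose capture constraints from an estimated downlink in Mbps.
// The 1.5 Mbps cut-off is an illustrative assumption; tune it for your audience.
function pickVideoConstraints(estimatedMbps) {
  const audio = { echoCancellation: true, noiseSuppression: true };
  if (estimatedMbps >= 1.5) {
    // Default profile: 720p at 30 fps
    return { video: { width: 1280, height: 720, frameRate: 30 }, audio };
  }
  // Constrained profile for mobile or weak links: 360p at 15 fps
  return { video: { width: 640, height: 360, frameRate: 15 }, audio };
}

// In a browser:
// const stream = await navigator.mediaDevices.getUserMedia(pickVideoConstraints(1.0));
```

The same object can also be passed to `MediaStreamTrack.applyConstraints()` (video part only) to downgrade a live track without re-prompting the user.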

RTCPeerConnection: The Media Pipeline

RTCPeerConnection manages the actual media transport. Add audio and video tracks from your local stream, then connect to a remote peer:

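// STUN lets each peer discover its public address; add TURN servers for restrictive networks
const peerConnection = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});
const localStream = await navigator.mediaDevices.getUserMedia(constraints);
// Send every local audio/video track to the remote peer
localStream.getTracks().forEach(track => {
  peerConnection.addTrack(track, localStream);
});
// Render remote media; remoteVideo is a <video> element on your page
peerConnection.ontrack = event => {
  remoteVideo.srcObject = event.streams[0];
};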

RTCPeerConnection also handles codec negotiation (VP8 for bandwidth efficiency, H.264 for hardware decode, AV1 for next-gen compression), bandwidth adaptation, and jitter buffer management automatically.

Signaling: The Invisible Choreographer

WebRTC itself handles media transport, but signaling—the exchange of SDP offers/answers and ICE candidates—requires your application layer. Use WebSocket, Firebase, or a dedicated signaling service to exchange connection metadata:

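// Initiator: create an offer and send it over your signaling channel
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
signaling.send({ type: 'offer', sdp: offer.sdp });

// Both peers: forward local ICE candidates as they are gathered
peerConnection.onicecandidate = event => {
  if (event.candidate) signaling.send({ type: 'candidate', candidate: event.candidate });
};

// Both peers: handle messages (`signaling` is your own WebSocket wrapper, assumed here)
signaling.onmessage = async msg => {
  if (msg.type === 'offer') { // receiver: apply the offer, reply with an answer
    await peerConnection.setRemoteDescription({ type: 'offer', sdp: msg.sdp });
    const answer = await peerConnection.createAnswer();
    await peerConnection.setLocalDescription(answer);
    signaling.send({ type: 'answer', sdp: answer.sdp });
  } else if (msg.type === 'answer') {
    await peerConnection.setRemoteDescription({ type: 'answer', sdp: msg.sdp });
  } else if (msg.type === 'candidate') {
    await peerConnection.addIceCandidate(msg.candidate);
  }
};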

ICE, STUN, and TURN: Crossing NAT Barriers

Most users sit behind a NAT (Network Address Translator). ICE (Interactive Connectivity Establishment) discovers each peer's public IP address using STUN servers (free public ones exist, such as Google's stun.l.google.com:19302). If a direct P2P path fails, TURN servers relay the traffic (typically $0.10–0.20/GB on AWS or Twilio). Budget 20–40% of your bandwidth costs for TURN relay.
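Here is a sketch of how the STUN and TURN servers end up in the `RTCPeerConnection` configuration. `buildIceConfig` is a hypothetical helper. `turn:turn.example.com:3478` and its credentials are placeholders for your own TURN deployment. `iceServers` and `iceTransportPolicy` are standard `RTCConfiguration` fields.

```javascript
// Hypothetical helper: assemble an RTCConfiguration with STUN and an optional TURN server.
// turnUrl, username and credential are placeholders for your own TURN deployment.
function buildIceConfig({ turnUrl, username, credential, relayOnly = false } = {}) {
  // Google's public STUN server: free, but it only discovers public addresses
  const iceServers = [{ urls: 'stun:stun.l.google.com:19302' }];
  if (turnUrl) {
    // TURN relays media when no direct path exists and requires credentials
    iceServers.push({ urls: turnUrl, username, credential });
  }
  // 'relay' forces every connection through TURN; 'all' lets ICE prefer direct paths
  return { iceServers, iceTransportPolicy: relayOnly ? 'relay' : 'all' };
}

const config = buildIceConfig({
  turnUrl: 'turn:turn.example.com:3478', // placeholder host
  username: 'demo-user',
  credential: 'demo-secret',
});
// In a browser: const peerConnection = new RTCPeerConnection(config);
```

Running a staging build with `relayOnly: true` is one way to measure what TURN relay actually costs before you commit to a budget for it.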

P2P (Peer-to-Peer): The Latency Champion

P2P connects two participants directly without any server in the path. Peer A sends video directly to Peer B; Peer B sends directly back. No mixing, no forwarding, no processing.

Performance Profile

  • Max concurrent participants: 2–4 (rarely more; the number of peer connections grows quadratically, N(N-1)/2, so 5+ participants quickly become unmanageable)
  • Latency: 50–100ms (end-to-end), dominated by network RTT and codec processing
  • Bandwidth per peer: 1 Mbps upload, 1 Mbps download (full video quality)
  • Server cost: $0/month (only STUN/TURN as backup)
  • Best use cases: 1:1 calls, high-touch sales calls, healthcare consults, therapy sessions

Why P2P Fails at Scale

In a 5-person call with full-mesh P2P, each peer sends and receives 4 video streams simultaneously. Peer A uploads a separate copy of its stream to each of B, C, D, and E (4 Mbps up) and downloads their four streams (4 Mbps down), 8 Mbps in total. Mobile users on LTE with 2–3 Mbps available cannot sustain this. Encoding 4 outgoing streams while decoding 4 incoming H.264 streams maxes out the CPU of even modern phones. The solution is SFU or MCU.
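The arithmetic above generalizes to any room size. This sketch assumes every peer sends a single 1 Mbps stream and that each peer connection carries its own encoded copy. That is how full-mesh WebRTC behaves when no server sits in the media path.

```javascript
// Full-mesh P2P load for a room of n peers, each sending one stream of bitrateMbps.
function meshLoad(n, bitrateMbps = 1) {
  const streamsPerPeer = n - 1;                 // one outgoing and one incoming stream per remote peer
  return {
    uploadMbps: streamsPerPeer * bitrateMbps,   // a separate copy is encoded for every peer
    downloadMbps: streamsPerPeer * bitrateMbps,
    peerConnections: (n * (n - 1)) / 2,         // total connections in the room
  };
}

console.log(meshLoad(5)); // { uploadMbps: 4, downloadMbps: 4, peerConnections: 10 }
```

Doubling the room to 10 peers raises each peer's upload to 9 Mbps and the room's connection count to 45. This is why mesh stops being practical well before SFU does.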

Mesh Topology: P2P with Peer Discovery and Awareness

Mesh is a hybrid between pure P2P and server-forwarded architectures. Peers still send directly to each other, but a lightweight server maintains peer discovery and topology awareness—who is connected, who left, optimal relay paths.

Performance Profile

  • Max concurrent participants: 6–10 (per-peer upload bandwidth and CPU grow with every added participant, which limits practical group size)
  • Latency: 100–300ms (direct P2P + potential relay hops)
  • Bandwidth per peer: (N-1) × 1 Mbps upload + (N-1) × 1 Mbps download
  • Server cost: $50–150/month (low-compute signaling and relay)
  • Best use cases: Small group calls (4–8 people), remote team standups, distributed training sessions

Open-Source Mesh Frameworks

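Jitsi Meet: Web app (with an Electron desktop client) that connects 1:1 calls directly via P2P and routes larger calls through its Jitsi Videobridge SFU. Runs on-premises, with no per-minute vendor fees if self-hosted. Popular in education and healthcare.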

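BigBlueButton: Open-source virtual classroom and webinar platform with its own server-side media stack (FreeSWITCH for audio, an SFU for video and screen sharing). Scales to 100+ participants with screen sharing and breakout rooms. $500–2,000/month to self-host.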

SFU (Selective Forwarding Unit): The Industry Standard

An SFU is a media server that receives a stream from each participant and forwards it to every other participant, without decoding or mixing. Peer A sends one VP8 stream; the SFU forwards that same encoded stream to B, C, and D without re-encoding. It is "selective" because it decides, per receiver, which streams (and, with simulcast, which quality layer) to forward. That combination of low server CPU and independent bitrate adaptation per receiver makes SFU the de facto standard for video conferencing.
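A rough sketch of the "selective" part: for each receiver, forward the highest simulcast layer that fits that receiver's bandwidth share. The three layer names and bitrates are illustrative assumptions. Production SFUs such as mediasoup or LiveKit typically also weigh packet loss, the receiver's rendered video size, and layer-switch timing.

```javascript
// Sketch of per-receiver simulcast layer selection in an SFU.
// Layer names and bitrates are illustrative assumptions (three tiers).
const LAYERS = [
  { rid: 'q', kbps: 150 },   // quarter resolution
  { rid: 'h', kbps: 500 },   // half resolution
  { rid: 'f', kbps: 1500 },  // full resolution
];

// Pick the highest layer that fits the receiver's bandwidth share per remote stream.
function selectLayer(receiverKbps, remoteStreamCount) {
  const budgetPerStream = receiverKbps / Math.max(remoteStreamCount, 1);
  let chosen = LAYERS[0]; // always forward at least the lowest layer
  for (const layer of LAYERS) {
    if (layer.kbps <= budgetPerStream) chosen = layer;
  }
  return chosen.rid;
}

console.log(selectLayer(9000, 5)); // f (1800 kbps available per stream)
console.log(selectLayer(3000, 5)); // h (600 kbps available per stream)
console.log(selectLayer(500, 5));  // q (100 kbps per stream, below every layer)
```

The sender encodes all three layers only once. The per-receiver choice is pure forwarding, which is why an SFU's CPU cost stays low compared to an MCU.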

How SFU Scales: Bandwidth Arithmetic

In a 10-person call on SFU:

  • Each peer uploads 1 Mbps (its own stream).
  • Each peer downloads 9 Mbps (streams from 9 others).
  • SFU ingests 10 Mbps total, egresses 90 Mbps (10 streams × 9 recipients).

On AWS, egress costs $0.15/GB (the rate used throughout this guide). One 1 Mbps stream sustained for an hour is 3,600 Mb = 450 MB = 0.45 GB.

A single 10-person call egresses 90 Mbps = 40.5 GB/hour. That costs about $6.08/hour, or roughly $0.61 per participant-hour (each participant receives 9 Mbps). For 100 concurrent users in ten such calls, egress is 900 Mbps = 405 GB/hour = $60.75/hour in bandwidth alone.

Add server compute: a c5.2xlarge EC2 instance (8 vCPU) handles 20–40 concurrent users and costs $0.34/hour. 100 users therefore need 3–5 instances = $1.02–1.70/hour.

Total: about $62/hour at full load, or roughly $45,000/month if sustained 24/7. Monthly budgets in the hundreds of dollars therefore assume far lower utilization or bitrates. For comparison, Daily.co's SFU rate of $0.006/participant-minute works out to $36 per hour of 100 concurrent participants.
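The same arithmetic as a small cost model, using only the rates stated in this paragraph: $0.15/GB egress, and a c5.2xlarge at $0.34/hour handling up to 40 users. The function name and parameter defaults are assumptions for illustration; swap in your own cloud pricing.

```javascript
// Back-of-envelope SFU cost per hour, using the rates stated above as defaults.
const GB_PER_MBPS_HOUR = (3600 / 8) / 1000; // 1 Mbps for 1 hour = 450 MB = 0.45 GB

function sfuHourlyCost({ rooms, roomSize, bitrateMbps = 1, egressPerGB = 0.15,
                         usersPerInstance = 40, instancePerHour = 0.34 }) {
  // Each room forwards every participant's stream to the other roomSize - 1 members
  const egressMbps = rooms * roomSize * (roomSize - 1) * bitrateMbps;
  const egressGB = egressMbps * GB_PER_MBPS_HOUR;
  // Round up: a partially used instance still costs a full instance
  const instances = Math.ceil((rooms * roomSize) / usersPerInstance);
  return {
    egressMbps,
    egressGB,
    bandwidthUSD: egressGB * egressPerGB,
    computeUSD: instances * instancePerHour,
  };
}

console.log(sfuHourlyCost({ rooms: 10, roomSize: 10 }));
```

With the defaults, ten 10-person rooms come to 405 GB/hour of egress, or about $60.75 in bandwidth and $1.02 in compute per hour. The model makes clear that bandwidth, not CPU, dominates SFU cost.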

Performance Profile

  • Max concurrent participants: 5–1,000+ (bandwidth and server CPU are the only limits)
  • Latency: 100–200ms (forward delay through SFU + network RTT)
  • Bandwidth cost: $0.10–0.20/person-hour egress
  • Compute cost: $300–500/month for 100 concurrent users
  • Best use cases: Team meetings, webinars, online classes, social video platforms

Open-Source SFU Frameworks

Mediasoup (C++ / Node.js) is the heavyweight champion. Used by Dyte, 100ms, and GetStream. Supports full custom bitrate adaptation, screen sharing, and simulcast (multiple quality tiers per peer). The learning curve is steep; expect 3–6 months of engineering to integrate.

LiveKit (Go, Apache 2.0) is built on the Pion WebRTC library and offers a simpler API and dashboard. Deploy on your own infrastructure or use LiveKit Cloud SaaS ($0.0075/participant-minute).

Janus (C) is modular and lightweight but older; good for embedded or legacy systems.

ion-sfu (Go) is newer and Kubernetes-native; used by GetStream.

Pion (Go library) is the most minimal option. Use it to build your own SFU from scratch if you want full control and are willing to spend 6–12 months on development.

MCU (Multipoint Control Unit): Broadcast-Ready Mixing

An MCU decodes all incoming streams, composites them into a single high-quality output, and re-encodes for distribution. It's used for recording, broadcasting, or when you need a single "gallery view" feed for 50+ participants.

MCU Bandwidth and Cost Calculation

In a 10-person call with recording via MCU:

  • Each peer uploads 1 Mbps (10 Mbps total ingress to MCU).
  • MCU composites the streams into a grid (e.g. 4×3 for 10 tiles) and encodes a 4 Mbps composite.
  • Each peer downloads the composite (4 Mbps × 10 = 40 Mbps total egress).
  • MCU CPU: decoding 10 streams + compositing + re-encoding = massive CPU load (100–300% higher than SFU).
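Here is a sketch of the layout step an MCU performs before re-encoding: placing N decoded tiles on one canvas. The near-square grid rule (columns = ceil(sqrt(N))) is an assumption. Production compositors often use active-speaker or pinned layouts instead.

```javascript
// Sketch of the tile layout an MCU compositor might use for n participants.
// Near-square grid: columns = ceil(sqrt(n)), rows = ceil(n / columns).
function gridLayout(n, canvasWidth, canvasHeight) {
  const cols = Math.ceil(Math.sqrt(n));
  const rows = Math.ceil(n / cols);
  const tileW = Math.floor(canvasWidth / cols);
  const tileH = Math.floor(canvasHeight / rows);
  const tiles = [];
  for (let i = 0; i < n; i++) {
    // Fill row by row, left to right
    tiles.push({ x: (i % cols) * tileW, y: Math.floor(i / cols) * tileH, w: tileW, h: tileH });
  }
  return { cols, rows, tiles };
}

const layout = gridLayout(10, 1280, 720);
console.log(layout.cols, layout.rows, layout.tiles[9]); // 4 3 { x: 320, y: 480, w: 320, h: 240 }
```

For 10 participants on a 1280×720 canvas this yields a 4×3 grid of 320×240 tiles. The compositor must therefore scale or letterbox 16:9 camera frames into 4:3 tiles on every frame, which is part of why MCU CPU load is so high.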

For 100 concurrent users with MCU recording (ten 10-person calls), each participant downloads the 4 Mbps composite. Egress is therefore 400 Mbps = 180 GB/hour, or $27/hour in bandwidth at $0.15/GB.

Compute for MCU: a GPU-accelerated instance (g3s.xlarge on AWS with NVIDIA Tesla M60) costs $3.06/hour and handles 8–10 concurrent MCU sessions. For 10 MCU sessions you need 1–2 GPU instances = $3.06–6.12/hour compute.

Total MCU cost: about $30–33/hour at full load, or roughly $22,000–24,000/month if sustained 24/7.

Performance Profile

  • Max concurrent participants: 10–100 (limited by MCU compositing CPU)
  • Latency: 200–400ms (decode, composite, re-encode pipeline)
  • Bandwidth cost: $0.15–0.30/person-hour egress
  • Compute cost: $2,000–5,000/month for 100 concurrent users
  • Best use cases: Recording, broadcasting, webinars with single gallery view, TV-like production

Hybrid Architecture: P2P + SFU Automatic Escalation

The smartest deployment uses P2P for small calls (zero cost, 50ms latency) and automatically escalates to SFU once group size exceeds 4 participants. This is what BrainCert implemented and it reduced their infrastructure costs by 60% while improving latency perception.

Implementation Logic

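// Choose a media topology from the current participant count
function chooseTopology(participantCount) {
  if (participantCount <= 4) {
    // Direct P2P mesh: each peer connects directly to all others
    return { mode: 'p2p' };
  } else if (participantCount <= 50) {
    // SFU with VP8 simulcast (3 quality tiers), e.g. via a mediasoup router
    return { mode: 'sfu', codec: 'VP8', simulcastLayers: 3 };
  } else {
    // SFU plus an optional GPU-backed MCU for composite recording
    return { mode: 'sfu_plus_record', codec: 'VP8', simulcastLayers: 3, compositeRecording: true };
  }
}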

Hybrid Cost Profile

  • 1–4 users: $0/month (P2P, STUN/TURN minimal)
  • 5–50 users: $300–500/month (SFU only)
  • 50–100 users: $500–1,000/month (SFU) + $2,000/month (MCU for recording)
  • Latency: 50ms (1:1 P2P) → 150ms (SFU 5–50) → 300ms (MCU composite)

Architecture Comparison Matrix

Here's the complete P2P vs MCU vs SFU comparison at a glance. Use it to shortlist your architecture before diving into implementation details.

| Architecture | Max Concurrent Users | Server Cost/Month (100 Users) | Upload Bandwidth/Peer | Latency | Best For |
|---|---|---|---|---|---|
| P2P | 2–4 | $0 | 1 Mbps | 50–100ms | 1:1 calls, high-urgency interaction |
| Mesh | 6–10 | $50–150 | (N-1) × 1 Mbps | 100–300ms | Small groups, team standups |
| SFU | 5–1,000+ | $300–500 | 1 Mbps | 100–200ms | Team meetings, webinars, social platforms |
| MCU | 10–100 | $2,000–5,000 | 1 Mbps | 200–400ms | Recording, broadcast, gallery view |
| Hybrid (P2P + SFU) | Unlimited | $0–500 (dynamic) | 1 Mbps | 50–200ms (dynamic) | All use cases, cost-optimal |


Bandwidth Mathematics: Calculating Real Costs

Use these formulas to estimate bandwidth costs before architecting.

P2P / SFU Egress Formula

Egress (Mbps) = (N participants) × (N-1 streams) × (bitrate Mbps); 1 Mbps sustained for one hour = 0.45 GB
For a 50-person call, bitrate 1 Mbps:
Egress = 50 × 49 × 1 = 2,450 Mbps = 1,102.5 GB/hour
Cost @ $0.15/GB = $165.38/hour

MCU Egress Formula (Single Composite)

Egress = (N participants) × (composite bitrate Mbps)
For a 50-person call with a 4 Mbps composite output:
Egress = 50 × 4 = 200 Mbps = 90 GB/hour
Cost @ $0.15/GB = $13.50/hour

Add GPU compute on top: 2× GPU instances at $3.06/hour ≈ $6.12/hour

TURN Relay Cost

For 20% of calls that require TURN relay (users behind restrictive NAT):

TURN Cost/hour = (Egress Mbps) × 0.45 GB × 0.20 × (TURN rate $/GB)
TURN rate on AWS: $0.10/GB (vs $0.15 public egress)
For the 2,450 Mbps SFU call:
TURN @ 20% = 490 Mbps = 220.5 GB/hour
Cost = 220.5 × $0.10 = $22.05/hour
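The three formulas above can be written as small functions, with the Mbps-to-GB conversion made explicit: 1 Mbps sustained for an hour is 3,600 Mb = 450 MB = 0.45 GB. The $0.15/GB egress rate, 20% relay share, and $0.10/GB TURN rate are this guide's assumptions, passed in as defaults.

```javascript
// The egress and TURN formulas above, with the unit conversion made explicit.
const mbpsToGBPerHour = mbps => (mbps * 3600) / 8 / 1000; // Mb/s -> GB per hour

// SFU (or full mesh): every stream goes to the N - 1 other participants
const sfuEgressMbps = (n, bitrateMbps) => n * (n - 1) * bitrateMbps;

// MCU: each participant receives one composite stream
const mcuEgressMbps = (n, compositeMbps) => n * compositeMbps;

// TURN: the relayed share of egress, billed at the TURN rate
function turnCostPerHour(egressMbps, relayShare = 0.2, turnRatePerGB = 0.1) {
  return mbpsToGBPerHour(egressMbps * relayShare) * turnRatePerGB;
}

const sfu = sfuEgressMbps(50, 1); // 50-person call at 1 Mbps
const mcu = mcuEgressMbps(50, 4); // 50-person call, 4 Mbps composite
console.log('SFU $/hour:', mbpsToGBPerHour(sfu) * 0.15);
console.log('MCU $/hour:', mbpsToGBPerHour(mcu) * 0.15);
console.log('TURN $/hour:', turnCostPerHour(sfu));
```

Run as-is, these come to roughly $165.38, $13.50, and $22.05 per hour. The quadratic N × (N-1) term is why SFU egress grows much faster than MCU egress as rooms get larger.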

Latency vs. Quality: The MOS Score Tradeoff

Mean Opinion Score (MOS) measures perceived quality on a 1–5 scale. RTCStats and Mux provide MOS via machine learning on codec metrics. Below 3.0 MOS, users perceive degradation; below 2.5, the call is unusable.

Latency Impact on MOS

  • 50–150ms: Imperceptible delay, natural conversation flow. MOS 4.5–5.0.
  • 150–300ms: Noticeable but acceptable for most calls. Slight "talk-over" effect. MOS 3.5–4.5.
  • 300–500ms: Unnatural. Telephone-like quality. MOS 2.5–3.5. Users adapt but dislike it.
  • 500ms+: Conversation breaks down. One speaker at a time. MOS below 2.5.
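The bands above can be encoded as a lookup for dashboards or alerts. This sketch takes the list's boundaries literally: where two bands touch, the lower-latency band wins. Real MOS also depends on packet loss, jitter, and codec, so treat the result as a rough label rather than a score.

```javascript
// Rough latency-to-experience label using the bands listed above.
// Where bands touch (150, 300, 500 ms) the lower-latency band is used.
function latencyBand(latencyMs) {
  if (latencyMs <= 150) return { experience: 'natural conversation', mos: '4.5-5.0' };
  if (latencyMs <= 300) return { experience: 'noticeable but acceptable', mos: '3.5-4.5' };
  if (latencyMs <= 500) return { experience: 'unnatural, telephone-like', mos: '2.5-3.5' };
  return { experience: 'conversation breaks down', mos: 'below 2.5' };
}

console.log(latencyBand(220).experience); // noticeable but acceptable
```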

Codec Selection for Quality

VP8: Open-source, royalty-free. Bandwidth-efficient at 500–2,000 kbps. Default for SFU forwarding. 90% of WebRTC deployments.

H.264: Hardware-accelerated decode on mobile/desktop. More efficient at 800–2,500 kbps. Patent-encumbered but widely licensed. Better for battery-constrained mobile.

AV1: Next-generation, 30% smaller files than VP8. Limited hardware support until 2025. Expect AV1-native webinars by 2026.
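The codec is chosen during negotiation, and a client can bias that choice. Here is a sketch of the reordering step. It assumes the input has the shape returned by `RTCRtpReceiver.getCapabilities('video').codecs`; in a browser you would pass the result to `RTCRtpTransceiver.setCodecPreferences()`. Plain objects stand in for the browser data so the logic runs anywhere.

```javascript
// Reorder codec capabilities so a preferred codec is negotiated first.
// Input mirrors RTCRtpReceiver.getCapabilities('video').codecs (plain objects here).
function preferCodec(codecs, mimeType) {
  const wanted = mimeType.toLowerCase();
  const preferred = codecs.filter(c => c.mimeType.toLowerCase() === wanted);
  const others = codecs.filter(c => c.mimeType.toLowerCase() !== wanted);
  return [...preferred, ...others]; // original order kept within each group
}

const caps = [
  { mimeType: 'video/VP8', clockRate: 90000 },
  { mimeType: 'video/H264', clockRate: 90000 },
  { mimeType: 'video/AV1', clockRate: 90000 },
];
console.log(preferCodec(caps, 'video/H264').map(c => c.mimeType));
// [ 'video/H264', 'video/VP8', 'video/AV1' ]
```

Keep the non-preferred codecs in the list rather than dropping them. That way negotiation can still fall back when the remote side lacks the preferred codec, for example a device without AV1 support.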

Open-Source SFU Options: Mediasoup, LiveKit, Janus, and More

Building your own SFU from open-source gives you full control but requires 3–12 months of engineering. Here's the landscape of production-ready frameworks.

Mediasoup: The Gold Standard

Language: C++ (media worker) + Node.js API (Rust bindings are also available). License: ISC (permissive). Adoption: Used by Dyte, 100ms, and GetStream. Strengths: Full bitrate adaptation, simulcast with quality tiers, VP8/H.264/AV1 support, SVC (scalable video coding), built-in bandwidth estimation. Learning curve: Steep. Requires understanding of RTP, codec internals, network adaptation. Time to production: 4–6 months for a team of 2–3 engineers.

LiveKit: Pion-Based SFU with Managed Cloud

Language: Go (server, built on the Pion WebRTC library) + client SDKs. License: Apache 2.0. Adoption: Rapid growth, used by GetStream and emerging startups. Strengths: Simpler API, dashboard UI, Kubernetes-native deployment, managed LiveKit Cloud ($0.0075/participant-minute). Trade-off: Less control than Mediasoup for advanced use cases. Time to production: 2–3 months to self-host, 1 week to integrate LiveKit Cloud SaaS.

Janus: Modular and Lightweight

Language: C. License: GPLv3 (requires derivative works to be open-source). Adoption: Mature ecosystem; popular in Europe and education. Strengths: Plugin architecture, low CPU footprint, works on ARM (Raspberry Pi). Trade-off: Older codebase; less active development than Mediasoup. Time to production: 2–4 months.

ion-sfu and Pion: Go-Native Frameworks

ion-sfu (GitHub: pion/ion-sfu) is a pure Go SFU with WebRTC-only support (no RTMP broadcast). Kubernetes-ready, cloud-native. Used by GetStream. Pion is the low-level library—use it if you want maximum flexibility and are willing to spend 6–12 months on development. Time to production: 4–8 months (ion-sfu), 12+ months (Pion from scratch).

CPaaS Platforms: Agora, Daily, 100ms, Dyte, Twilio, LiveKit Cloud

Communications Platform as a Service (CPaaS) vendors offer managed SFU infrastructure with SDKs, so you skip building your own. Trade flexibility for speed-to-market.

Agora: High-Volume, Global CDN

Pricing: $0.0075–0.015/participant-minute (1M+ minutes/month tier). Coverage: 200+ countries, edge nodes. Best for: Live streaming, low-latency broadcast. Drawback: Vendor lock-in; expensive per-participant model for long-duration calls.

Daily: Video-First SaaS

Pricing: $0.006–0.025/participant-minute. Strength: Developer-friendly API, generous free tier (100 hours/month). Best for: Startups, enterprise teams. Niche: Recorded webinars, async video.

100ms: Enterprise-Grade Features

Pricing: Custom, starting at $500/month. Built on: Mediasoup. Strengths: Live streaming, interactive webinars, breakout rooms, custom layouts. Best for: Enterprise events, education platforms.

Dyte: Emerging SFU Option

Pricing: $0.004–0.015/participant-minute. Built on: Mediasoup. Positioning: Cost-competitive with strong LiveKit integration. Best for: Cost-sensitive teams, interactive webinars.

Twilio: Traditional CPaaS

Pricing: $0.04–0.10/participant-minute (expensive). Strength: Unified SMS + video + voice API. Best for: Integrated customer communication (support calls + SMS follow-up).

LiveKit Cloud: Managed Service with a Self-Host Option

Pricing: $0.0075/participant-minute or self-host free (open-source). Hybrid: Option to deploy on your own cloud. Best for: Teams wanting control without building from scratch.

AI Agents in Video Conferencing: Transcription, Sentiment, and Recording

Modern video platforms embed AI agents for real-time transcription (speech-to-text), sentiment analysis, meeting summaries, and live translations. These agents require either MCU recording or access to raw media streams.

Architecture Options for AI Agents

Option 1: MCU Recording + Async Processing. Record via MCU, pipe output to cloud transcription (Google Speech-to-Text, Deepgram, Rev.ai). Easiest; adds 50–200ms latency for recording. Cost: $2–5/hour transcription + $2–5/hour MCU compute.

Option 2: SFU + Real-Time Agent Bot. Agent joins as a participant, receives live streams via SFU, transcribes on-the-fly. Harder; real-time processing requires low latency. Cost: agent compute (GPT-4 @ ~$0.03/minute) + SFU bandwidth.

Case Study: BrainCert's Scaling Journey (P2P → Hybrid → SFU)

BrainCert, an online learning platform with 100K concurrent users, faced a classic growth problem: their initial P2P implementation scaled to 4 users. When classroom sizes exceeded 10, latency exploded to 800ms and 30% of calls failed.

The Fix: Hybrid P2P + SFU Escalation

BrainCert implemented automatic escalation: 1:1 calls routed through P2P (50ms latency, zero cost), 2–4 participants in mesh, 5+ in SFU. Result: latency improved to consistent 100–150ms, infrastructure costs dropped 60% (from $5K/month SFU-only to $2K/month hybrid), and user satisfaction (NPS) increased by 15 points. They now handle 500M+ conference minutes annually.

5-Question Decision Framework

Ask these five questions to narrow your architecture choice:

  1. What's your largest call size?
    • 1–4: P2P is viable
    • 5–50: SFU is standard
    • 50+: SFU + MCU for recording
  2. Do you need recording or broadcast?
    • No: Pure SFU suffices
    • Yes, composited: MCU required
    • Yes, raw streams: SFU + storage (S3, Azure Blob)
  3. What's your latency target?
    • Under 100ms: P2P only
    • 100–200ms: SFU
    • 200–400ms: MCU acceptable if broadcast is the goal
  4. Do you have budget to build, or buy SaaS?
    • Build: Allocate 4–12 months engineering
    • Buy SaaS: LiveKit Cloud, Daily.co, Agora (1–2 weeks to launch)
  5. What's your acceptable monthly infrastructure cost?
    • $0–500: P2P + Mesh or low-volume SFU
    • $500–2,000: SFU for 50–500 concurrent users
    • $2,000+: SFU + MCU + AI agents
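Questions 1–3 can be encoded as a first-pass function, which is useful for sanity-checking requirement documents. This is a sketch. The thresholds come from the lists above, and the input field names are hypothetical. Questions 4–5 (build vs buy, budget) are business decisions and are left out.

```javascript
// Rough first-pass recommender for questions 1-3 (thresholds from the lists above).
// needsRecording: 'none' | 'composited' | 'raw' (hypothetical input names).
function recommendArchitecture({ maxCallSize, needsRecording = 'none', latencyTargetMs = 200 }) {
  // Q1 (+ Q2): small calls can stay P2P unless media must pass through a server
  const base = maxCallSize <= 4 && needsRecording === 'none' ? 'P2P' : 'SFU';
  const plan = [base];
  // Q2: composited output needs an MCU; raw streams only need storage
  if (needsRecording === 'composited') plan.push('MCU');
  else if (needsRecording === 'raw') plan.push('SFU recorder + object storage (S3, Azure Blob)');
  // Q3: a sub-100ms target is only realistic with P2P
  const latencyOk = latencyTargetMs >= 100 || base === 'P2P';
  return { plan, latencyOk };
}

console.log(recommendArchitecture({ maxCallSize: 30, needsRecording: 'composited' }));
```

A result with `latencyOk: false` flags a requirements conflict, such as a sub-100ms target for a 30-person room, that should be resolved before choosing a vendor or framework.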

5 Architecture Pitfalls That Cost Rewrites

1. Choosing P2P for Expected 10+ Participants. You launch with P2P thinking "we'll scale later." The user base grows, and calls fail at 5 participants. Now you're retrofitting SFU mid-launch. Cost: 3–6 months of engineering, plus revenue loss during the transition.

2. Underestimating TURN Relay Costs. You assume everyone has direct P2P connectivity. In reality, 15–25% of calls hit restrictive corporate or school firewalls and need TURN relay at $0.10–0.20/GB, so the bandwidth budget ends up double your estimate. Solution: Reserve 20% of the bandwidth budget for TURN.

3. Building a Custom SFU Without Bitrate Adaptation. You hand-roll an SFU on Mediasoup but skip bandwidth estimation and simulcast. The network degrades, calls freeze, and users leave. An SFU without adaptation is broken. Solution: Use LiveKit or Janus; don't skip this step.

4. Ignoring Mobile Client Battery Drain. The SFU works, but an iOS client decoding 9 streams maxes out the CPU and drains the battery in 30 minutes. Users complain your app is a "battery killer." Solution: Implement receiver-side video constraints; limit decoding to 3–4 streams even if the SFU sends 10.

5. Vendor Lock-In with CPaaS. You pick Agora early. By year 2, per-participant-minute pricing has become expensive and you want to self-host, but your client SDKs are deeply integrated with Agora's API. Rip-and-replace takes 6 months. Solution: Abstract video conferencing behind an interface and design for low switching costs.
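The fix for pitfall 4, limiting decode to a few streams, mostly comes down to choosing which remote videos to keep. Here is a minimal sketch. It assumes your SFU client SDK reports active speakers (most recent first) and lets you pause or unsubscribe individual video tracks. The function name and the default cap of 4 are illustrative.

```javascript
// Choose which remote videos to decode: recent active speakers first, capped at maxDecoded.
// activeSpeakerOrder is assumed to come from your SFU SDK (most recent speaker first).
function pickStreamsToDecode(remoteIds, activeSpeakerOrder, maxDecoded = 4) {
  const ranked = [
    ...activeSpeakerOrder.filter(id => remoteIds.includes(id)),   // speakers still in the room
    ...remoteIds.filter(id => !activeSpeakerOrder.includes(id)),  // everyone else, original order
  ];
  const decode = ranked.slice(0, maxDecoded);
  // Everything else should be paused or unsubscribed so the device stops decoding it
  const pause = remoteIds.filter(id => !decode.includes(id));
  return { decode, pause };
}

console.log(pickStreamsToDecode(['a', 'b', 'c', 'd', 'e', 'f'], ['e', 'b']));
```

Pausing at the subscription level, rather than just hiding the `<video>` element, matters. A hidden element still receives and decodes frames, so it saves no battery.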

Key Performance Indicators: Quality, Reliability, Business

Quality Metrics

  • MOS (Mean Opinion Score): Track via Mux or RTCStats. Target: 4.0+. Monitor codec bitrate, jitter, packet loss to predict MOS drops.
  • RTT (Round-Trip Time): Latency from you to a participant and back. Target: under 100ms globally. Anything over 300ms indicates network issues or MCU composition delays.
  • Jitter: Variance in packet arrival times. Target: under 20ms. High jitter (50ms+) causes audio stuttering.
  • Packet Loss: Percentage of dropped packets. Target: under 1%. Over 2%, video breaks; over 5%, audio unusable.
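Here is a sketch that checks one stats sample against the targets above. It assumes you have already converted `RTCPeerConnection.getStats()` output into milliseconds and percent. The `candidate-pair` report's `currentRoundTripTime` and the `inbound-rtp` report's `jitter` are in seconds, and loss can be derived from `packetsLost` and `packetsReceived`. The input field names are hypothetical.

```javascript
// Compare one stats sample against this guide's targets; returns a list of issues.
// Inputs are assumed to be pre-converted to ms and percent (getStats() reports RTT and jitter in seconds).
function checkQuality({ rttMs, jitterMs, packetLossPct }) {
  const issues = [];
  if (rttMs > 300) issues.push('RTT over 300ms: network issue or MCU composition delay');
  else if (rttMs > 100) issues.push('RTT above 100ms target');
  if (jitterMs > 20) issues.push('jitter above 20ms target');
  if (packetLossPct > 5) issues.push('packet loss over 5%: audio unusable');
  else if (packetLossPct > 2) issues.push('packet loss over 2%: video breaks');
  else if (packetLossPct > 1) issues.push('packet loss above 1% target');
  return issues;
}

console.log(checkQuality({ rttMs: 350, jitterMs: 12, packetLossPct: 3 }));
```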

Reliability Metrics

  • Call Setup Time: Time from "join" to first video frame. Target: under 3 seconds. Over 10 seconds, users hang up.
  • Call Failure Rate: Percentage of calls that drop within 5 minutes. Target: under 0.5%. Track by geography and network type (LTE vs WiFi).
  • Server Uptime: Target: 99.99% (four nines). 0.01% downtime = 52 minutes/year. Use multi-region failover and monitoring.

Business Metrics

  • Cost per Concurrent User: Total monthly infrastructure / peak concurrent users. Track by architecture (P2P $0, SFU $3–5, MCU $20–50).
  • Cost per Participant-Hour: (Total bandwidth + compute cost) ÷ (total participant-minutes ÷ 60). Use to benchmark against CPaaS pricing.
  • Revenue per Concurrent User: Monthly subscription revenue / peak concurrent users. Target: 5–10× cost-per-user to maintain margin.
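The business formulas above can be combined into one helper that takes monthly totals. The field names are hypothetical, and the inputs in the example call are made-up numbers, used only to illustrate the arithmetic.

```javascript
// Unit economics from monthly totals (field names are hypothetical).
function unitEconomics({ bandwidthUSD, computeUSD, participantMinutes,
                         peakConcurrentUsers, subscriptionRevenueUSD }) {
  const infraUSD = bandwidthUSD + computeUSD;
  const costPerConcurrentUser = infraUSD / peakConcurrentUsers;
  const costPerParticipantHour = infraUSD / (participantMinutes / 60);
  const revenuePerConcurrentUser = subscriptionRevenueUSD / peakConcurrentUsers;
  return {
    costPerConcurrentUser,
    costPerParticipantHour,
    revenuePerConcurrentUser,
    // This guide's target: revenue per user at 5-10x cost per user
    marginMultiple: revenuePerConcurrentUser / costPerConcurrentUser,
  };
}

// Made-up inputs: $300 bandwidth + $100 compute, 240,000 participant-minutes,
// 100 peak concurrent users, $3,000 subscription revenue
console.log(unitEconomics({ bandwidthUSD: 300, computeUSD: 100, participantMinutes: 240000,
                            peakConcurrentUsers: 100, subscriptionRevenueUSD: 3000 }));
```

These inputs give $4 per concurrent user, $0.10 per participant-hour, $30 revenue per concurrent user, and a 7.5× margin multiple, which falls inside the 5–10× target.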

When NOT to Build a Custom Video Architecture

You have fewer than 50 concurrent users expected. The overhead of maintaining an SFU is unjustified. Pick CPaaS (Daily, LiveKit Cloud) at $0.006–0.015 per participant-minute. Payback period: only 6–12 months of heavy use.

Your product needs to launch in under 6 months. SFU builds take 4–12 months. CPaaS integration takes 2–4 weeks. Go SaaS; optimize later if unit economics require it.

You lack in-house WebRTC expertise. SFU debugging requires deep RTP/SRTP/codec knowledge. Hiring a contractor costs $50K–150K. Train your team or hire permanent staff; expect 12–24 months to proficiency.

Your business model is per-call or per-minute (not subscription). Margin erosion: if you pay a CPaaS vendor $0.10–0.20 per person-minute and charge users $0.05–0.10, your margin is negative. CPaaS only works for high-volume subscriptions or enterprise licenses.


Frequently Asked Questions

Can I record SFU calls without MCU?

Yes. Attach a WebRTC recorder bot to your SFU; it receives all streams and writes individual files to S3, which you can later mux into a single file with ffmpeg. Mediasoup and LiveKit support this. Cost: negligible compute (the recorder bot is stateless). Benefit: no MCU overhead and no 200ms of added latency. Drawback: post-processing is required to create a gallery view.

Is MCU better than SFU for large calls?

No. MCU is 10–20× more expensive and adds latency. Use SFU for 100+ participants; only add MCU if you need composited output (broadcast, gallery view). SFU scales to 1,000+ participants on modest hardware.

Does video conferencing require encryption?

Yes, for compliance (HIPAA, GDPR). WebRTC encrypts media with DTLS-SRTP by default. Signaling (SDP, ICE) should flow over TLS/HTTPS. If you're handling sensitive data (medical, financial), add end-to-end encryption (E2EE) on top. One approach is to encrypt encoded frames using WebRTC Encoded Transforms (insertable streams) with a crypto library such as libsodium or TweetNaCl. E2EE breaks server-side processing (MCU recording, transcription).

Can I scale SFU to 10,000 concurrent users?

Theoretically yes; in practice, bandwidth becomes the bottleneck. 10,000 users × 1 Mbps average egress = 10 Gbps of total network egress, or 4.5 TB/hour. At $0.15/GB that is $675/hour in egress alone (roughly $490,000/month if sustained 24/7), plus compute. For 10,000+ users, use CDN-based streaming (RTMP/HLS) instead of real-time WebRTC.

What's the best codec for mobile?

H.264 for decode (hardware accelerated on iOS/Android), VP8 for upload (if bandwidth-constrained). Let WebRTC negotiate; it'll prefer hardware-accelerated paths. Monitor battery drain; if decode is killing battery, reduce remote bitrate via receiver constraints.

Is it safe to use a custom TURN server?

Yes, but with rate-limiting. TURN is a bandwidth relay; misconfigure it and you become a DDoS amplifier. Use coturn (open-source TURN server), add per-peer bandwidth caps, and monitor egress. Cost: $0.05–0.10/GB (cheaper than AWS). Popular for self-hosted deployments.
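Per-peer bandwidth caps are commonly enforced with a token bucket. This sketch shows the idea only. It assumes byte-sized packets and a millisecond clock passed in by the caller. coturn ships its own bandwidth-limit settings, which you would use in a real deployment instead of custom code.

```javascript
// Per-peer relay bandwidth cap as a token bucket (illustration of the rate-limiting idea).
class TokenBucket {
  constructor(bytesPerSecond, burstBytes) {
    this.rate = bytesPerSecond;   // refill rate
    this.capacity = burstBytes;   // maximum burst size
    this.tokens = burstBytes;     // start full
    this.last = 0;                // time of last refill, in ms
  }
  // Returns true if a packet of `size` bytes may be relayed at time `nowMs`.
  allow(size, nowMs) {
    const elapsedSec = (nowMs - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.rate);
    this.last = nowMs;
    if (this.tokens >= size) {
      this.tokens -= size;
      return true;
    }
    return false; // over budget: drop or queue the packet
  }
}
```

For example, `new TokenBucket(250000, 500000)` caps a peer at roughly 2 Mbps with a 0.5 MB burst. Keeping one bucket per TURN allocation stops a single client from turning the relay into a bandwidth amplifier.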

How do I handle Screen Sharing in SFU?

Add a second media stream (video track) labeled "screen". SFU forwards it like any video; client-side decides to show gallery (camera + screen side-by-side) or full-screen. Bandwidth: screen + camera = 2 Mbps per sharer. Mediasoup and LiveKit handle this natively.

Can I mix video conferencing with streaming (RTMP)?

Yes. Record the SFU output, re-encode it, and push it over RTMP (for example with ffmpeg) to YouTube/Twitch. Or use an MCU to composite, then broadcast. One-way broadcast is simple; interactive audience participation requires two-way signaling (back-channel WebRTC), which is complex. Most platforms choose one-way broadcast.


Summary: Choose Your Architecture and Launch

You now have a complete framework for choosing P2P vs MCU vs SFU. P2P delivers sub-100ms latency for 1–4 participants at zero cost. SFU scales to 1,000+ users for $300–500/month at 100–200ms latency. MCU adds broadcast capability but costs $2,000–5,000/month and introduces 200–400ms latency. Use the 5-question decision framework and pitfall checklist to avoid costly rewrites. Fora Soft's video conferencing development team has guided 15+ projects through this decision and can architect your system, from open-source SFU integration to CPaaS API wrapper. Get in touch to discuss your architecture roadmap.
