Kurento Media Server: Everything You Need To Know

Key takeaways

Kurento is a low-level media pipeline server, not a conferencing SDK. Use it when you need programmable media graphs, computer vision or custom RTP bridges — not when you just want rooms and dial-ins.

It is alive but niche in 2026. Kurento 7.x still ships, OpenVidu 3 is active, but the ecosystem is smaller than mediasoup, LiveKit or Janus — pick it for capability, not momentum.

Expect ~20 SFU sessions per vCPU as a working baseline. A 16 vCPU box realistically holds ~300 participants at 720p/30fps before you need to scale horizontally.

No Simulcast, SVC, AV1 or H.265. Kurento supports H.264, VP8, VP9 and OPUS. If your product leans on Simulcast or AV1, pick mediasoup, LiveKit or Janus instead.

Apache 2.0 license, you own the infra. Typical monthly infra for a 100-user group-video product sits in the $150–$600 range on Hetzner/AWS, before engineer time.

Why Fora Soft wrote this playbook

Fora Soft has shipped WebRTC, streaming and real-time video products since 2005. We have run Kurento, Janus, mediasoup, Jitsi Videobridge and LiveKit in production, paid the AWS egress bills and carried the pager for each of them. This guide is the opinion we actually give to founders and CTOs who ask us, in a 30-minute call, whether Kurento is the right pick for their 2026 roadmap.

We ship with Agent Engineering, which means our senior engineers drive AI agents to generate, review and test Kurento pipelines, OpenVidu rooms and Node.js signaling servers in parallel. That compresses classical SFU backend setup from 10–12 weeks to 4–6 weeks on a typical MVP, including recording, TURN and load tests. See how we do it on our WebRTC development service page and in cases like BrainCert, where we built a virtual classroom with live video, whiteboarding and multi-tenant routing.

The article does three things: it tells you what Kurento really is under the hood, it compares it honestly against modern alternatives, and it gives you a five-question decision framework so you stop arguing about “which media server is best” and start arguing about your actual product constraints.

Stuck choosing between Kurento, mediasoup and LiveKit?

Book a 30-minute scoping call and we’ll map your product to the media server that actually fits — no upsell, no fluff.

Book a 30-min call → WhatsApp → Email us →

Kurento in one look

Kurento is an open-source WebRTC media server that lets you build media pipelines out of programmable elements: WebRTC endpoints, recorders, RTSP players, mixers and computer-vision filters. It is written in C++ on top of GStreamer, released under Apache 2.0, and controlled from your application through a JSON-RPC protocol (Java, Node.js and Python clients exist).

Unlike SDK-first products such as LiveKit, Agora or Twilio, Kurento is not a “drop-in conferencing API”. You write the signaling yourself, design the pipeline topology yourself and decide when to tear rooms down. That flexibility is the whole point: when your product needs a face filter, an AR overlay, a QR code scanner in the stream, or a bridge between WebRTC and an IP camera, you can do it inside the same server that is already forwarding RTP.

The project was acquired by Twilio in 2016. After a period of uncertainty the core team kept it maintained, and as of 2026 it still ships point releases (7.x line). OpenVidu, a higher-level room framework from the same authors, remains the most common way teams adopt Kurento without writing the pipeline code by hand.

Should you use Kurento in 2026?

Short answer: only if you need what Kurento uniquely does — programmable media pipelines with built-in computer vision and RTP bridging. For generic “we need rooms with chat and recording”, newer tools are faster to ship and cheaper to operate.

The honest 2026 positioning looks like this:

  • Pick Kurento when you need custom server-side media processing — AR filters, face/QR/license-plate detection, live compositing, audio DSP, bridging WebRTC to SIP/RTSP/RTMP.
  • Pick mediasoup when you need the fastest, most scalable pure SFU in Node.js with Simulcast and good clustering primitives.
  • Pick LiveKit when you want a batteries-included SDK-first stack and can live with their abstraction.
  • Pick Janus when you need a mature plugin architecture with strong SIP, streaming and data-channel plugins.
  • Pick Jitsi Videobridge when you want a proven SFU conferencing backend with LastN routing.
  • Pick a SaaS (Agora / Twilio / 100ms / Daily) when you value time-to-market over per-minute economics.

Reach for Kurento when: your product needs server-side CV / AR / transcoding / protocol bridging, your team is comfortable with C++/GStreamer debugging, and you can invest 4–8 weeks on pipeline design before hitting your first production load.

How Kurento works: the media pipeline model

Everything in Kurento revolves around two concepts: the Media Pipeline (a container that holds the media graph for a session or room) and Media Elements (the nodes on that graph). You ask the server to create a pipeline, then you create elements inside it and connect them to each other. The server handles the actual RTP/SRTP, jitter buffers, NACK, FEC and packetization.

Typical elements

  • WebRtcEndpoint — one per participant, handles the ICE/DTLS/SRTP handshake.
  • RtpEndpoint — plain RTP in or out; used for SIP, legacy IP cameras, RTSP bridges.
  • RecorderEndpoint — writes a participant or a mix to WebM, MP4 or FLV on the local filesystem or to a network volume.
  • PlayerEndpoint — pulls in an MP4 / WebM / RTSP / RTMP source and injects it into the pipeline (music, pre-roll, IP-camera feed).
  • Composite — the MCU-style mixer that tiles multiple inputs into one output frame.
  • Dispatcher / DispatcherOneToMany — switch between speakers, active-speaker broadcast.
  • Filters — GStreamer-based elements (ZBarFilter, FaceOverlayFilter, custom GStreamerFilter).

Your application decides how to wire them up. For a 6-person meeting you create one pipeline, six WebRtcEndpoints and connect each endpoint to the other five — or to a Composite — depending on whether you want SFU or MCU behaviour. For a recorded webinar you add a RecorderEndpoint and connect the speakers’ outputs to it. For a QR-code scanner you drop in a ZBarFilter between the client and the forwarder.
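The wiring choice for the 6-person meeting can be counted out in a few lines. This is a minimal sketch in plain Python, not the Kurento client API: `sfu_connections`, `mcu_connections` and the `composite` label are hypothetical names used only to model which element feeds which.

```python
from itertools import permutations


def sfu_connections(participants):
    """Directed (source, sink) pairs when every endpoint feeds every other one."""
    return list(permutations(participants, 2))


def mcu_connections(participants, mixer="composite"):
    """Each endpoint sends to the mixer and receives the mixed output back."""
    uplinks = [(p, mixer) for p in participants]
    downlinks = [(mixer, p) for p in participants]
    return uplinks + downlinks


room = [f"user{i}" for i in range(1, 7)]  # the 6-person meeting from the text
sfu = sfu_connections(room)
mcu = mcu_connections(room)

print(len(sfu), "connections when wired endpoint to endpoint")
print(len(mcu), "connections when wired through a mixer")
```

The SFU-style wiring needs one connection per ordered pair of participants, while the mixer-style wiring needs only one uplink and one downlink per participant. That gap is why the SFU graph grows much faster as rooms get larger.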

SFU vs MCU inside Kurento

Kurento is one of the few open-source servers that natively supports both SFU and MCU modes in the same runtime. You pick by how you wire the pipeline.

SFU (Selective Forwarding Unit): the server receives each participant’s RTP and routes selected copies to other participants. Kurento does this when you connect WebRtcEndpoints directly to each other. Server CPU stays low (no transcoding), server outbound bandwidth grows quadratically with the number of participants, and each client renders N−1 independent streams.

MCU (Multipoint Control Unit): the server decodes, mixes and re-encodes all inputs into a single composite stream. Kurento does this via the Composite element. Server CPU spikes because every mix means a full decode-encode cycle, but each client only uploads one stream and downloads one stream, which is friendly to weak networks, SIP endpoints and mobile devices on 3G.

Hybrid: real products usually mix both. SFU for browser peers, MCU for a SIP leg, MCU for the recording output so you ship a single MP4, SFU for everything else. Kurento lets you build that inside one pipeline, which is a concrete advantage over pure-SFU engines.

Reach for MCU mode when: you serve SIP or IoT endpoints that can only handle a single incoming stream, you need a single clean recording track, or you want predictable downstream bandwidth per client. Everyone else should default to SFU.
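To put rough numbers on the SFU versus MCU trade-off, here is a small sketch. It assumes every participant publishes one stream at the same bitrate and, in SFU mode, subscribes to everyone else. The function names are illustrative, not part of any Kurento API.

```python
def sfu_server_egress_mbps(participants, bitrate_mbps):
    """Server forwards each publisher's stream to the other N-1 participants."""
    return participants * (participants - 1) * bitrate_mbps


def mcu_server_egress_mbps(participants, bitrate_mbps):
    """Server sends one mixed stream to each participant."""
    return participants * bitrate_mbps


def client_downlink_streams(participants, mode):
    """Number of streams a single client has to receive and render."""
    return participants - 1 if mode == "sfu" else 1


n, bitrate = 10, 1.5  # 10 participants at 1.5 Mbps each (assumed values)
print("SFU egress:", sfu_server_egress_mbps(n, bitrate), "Mbps")
print("MCU egress:", mcu_server_egress_mbps(n, bitrate), "Mbps")
print("SFU client streams:", client_downlink_streams(n, "sfu"))
print("MCU client streams:", client_downlink_streams(n, "mcu"))
```

The sketch covers bandwidth only. The CPU cost of the MCU decode-mix-encode cycle is not modelled here and has to be measured on your own hardware.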

For a deeper treatment of the architectural trade-offs see our P2P vs MCU vs SFU explainer and the 2026 WebRTC architecture guide for business.

Core modules you’ll actually use

WebRTC group calls

The WebRtcEndpoint is the unit of a participant. You create one per user, negotiate an SDP offer/answer, and feed ICE candidates. Several endpoints in the same pipeline can be connected as an SFU mesh or routed through a Composite for MCU mixing.

Server-side recording

RecorderEndpoint writes directly from the pipeline — no client upload, no MediaRecorder API pain. Common patterns: record each speaker separately (for post-processing), record the MCU-mixed stream (for a single MP4), or record audio only for transcription. The file ends up in WebM (VP8) or MP4 (H.264). Support for FLV (since v6.18) lets you push recordings into CDNs that prefer RTMP pull.

Third-party playback and IP cameras

PlayerEndpoint pulls RTSP, RTMP, HTTP or local files into the pipeline. This is the feature that makes Kurento attractive to surveillance and intercom products — you expose an IP camera as a WebRTC stream without writing your own RTSP parser.

Custom filters

GStreamerFilter lets you drop any GStreamer plugin into a pipeline — face blur, background removal, custom video effects, pose estimation plugged in via a tensor inference element. This is where Kurento genuinely beats mediasoup or LiveKit for AR and ML-enhanced video.

Computer vision and AR filters: Kurento’s secret weapon

Most SFUs are “dumb forwarders” by design — they do not decode frames. Kurento does. Because GStreamer is under the hood, every frame passes through pipelines you control, so you can run detection, tracking or overlay at line rate.

Out of the box you get:

  • ZBarFilter — QR / barcode detection inside the live stream. Useful for KYC, check-in apps and self-service kiosks.
  • FaceOverlayFilter — detects a face and overlays an image; demos use it for virtual masks, but in production it backs moderation, blur-for-privacy and AR glasses try-on.
  • kms-crowddetector — crowd density estimation; used in retail analytics and physical-security platforms.
  • kms-platedetector — ANPR (license-plate recognition) from the stream, widely used in parking and logistics SaaS.
  • Custom GStreamerFilter — plug in anything: OpenCV, ONNX runtime, TensorFlow Lite, FFmpeg filters, your own C++ element.

If your roadmap includes AI-assisted video — automatic highlights, violence detection, sign-language translation, smile-to-pay — Kurento is the server where that work belongs. We covered the wider landscape in our AI-powered video and streaming guide.

Codecs, Simulcast, and streaming limits

Kurento’s codec matrix is smaller than most 2026 competitors. Know this before you pick.

| Feature | Kurento | Notes |
| --- | --- | --- |
| H.264 | Yes | Default for MP4 recording and for Apple/Safari compatibility. |
| VP8 | Yes | Default for WebM recording, widely used in pipelines. |
| VP9 | Yes | Supported, no SVC exposure in the API. |
| AV1 | No | Pick LiveKit, mediasoup or Janus if you need AV1. |
| H.265 / HEVC | No | Not on the WebRTC roadmap anyway in most browsers. |
| OPUS | Yes | Primary audio codec, full quality. |
| Simulcast / SVC | Partial / No | No native Simulcast forwarding. SVC modes not exposed. |

The big gap is Simulcast. Modern WebRTC SFUs (mediasoup, LiveKit, Janus) use Simulcast to forward the right spatial layer to each subscriber on the fly — Kurento does not, which means you either transcode (expensive) or you ship a single resolution to everyone and let weaker peers lose frames. For a 50-participant webinar that is fine; for a 500-person all-hands on mixed networks it is not.
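The difference can be illustrated with a toy model of per-subscriber layer selection. This is a sketch of the general Simulcast idea, not the algorithm of any specific SFU. The three layers and their bitrates in `LAYERS` are made-up values.

```python
# Hypothetical simulcast ladder: (layer name, bitrate in kbps).
LAYERS = [("low", 150), ("mid", 500), ("high", 1500)]


def pick_layer(available_kbps, layers=LAYERS):
    """Highest layer that fits the subscriber's bandwidth, else the lowest one."""
    fitting = [layer for layer in layers if layer[1] <= available_kbps]
    if fitting:
        return max(fitting, key=lambda layer: layer[1])
    return min(layers, key=lambda layer: layer[1])


def starved_subscribers(subscriber_kbps, single_bitrate_kbps):
    """Without Simulcast everyone gets one bitrate; list who cannot keep up."""
    return [name for name, kbps in subscriber_kbps.items()
            if kbps < single_bitrate_kbps]


subscribers = {"fibre": 5000, "wifi": 600, "mobile": 200}
for name, kbps in subscribers.items():
    print(name, "->", pick_layer(kbps)[0])
print("starved at a single 1500 kbps stream:",
      starved_subscribers(subscribers, 1500))
```

With a ladder, each subscriber lands on a layer that fits. With a single stream, the slower subscribers in this example fall below the stream bitrate, which is the frame-loss case described above.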

Scalability: real capacity numbers

Published OpenVidu benchmarks and our own load tests give a consistent working range. Use these as planning anchors, not guarantees.

| Instance class | vCPU / RAM | SFU sessions @ 720p30 | MCU sessions | Recorded streams |
| --- | --- | --- | --- | --- |
| AWS c5.large | 2 / 4 GB | ~28 (4 rooms × 7 ppl) | ~10 | ~15 parallel |
| AWS c5.xlarge | 4 / 8 GB | ~80 | ~25 | ~40 parallel |
| AWS c5.2xlarge | 8 / 16 GB | ~150 | ~50 | ~80 parallel |
| Hetzner AX41-NVMe | 6c/12t / 64 GB | ~220 | ~70 | ~120 parallel |
| AWS c5.4xlarge | 16 / 32 GB | ~320 | ~100 | ~160 parallel |

Two caveats. First, Kurento does not cluster natively — you run N instances and put a room-to-instance router in your signaling server. Second, egress bandwidth becomes your real bottleneck around 200–300 concurrent peers at 1.5–2 Mbps each, which is why we often move heavy installs off AWS to Hetzner or a managed bare-metal provider.
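A back-of-the-envelope sizing helper built on these planning numbers might look like the sketch below. The 20 sessions per vCPU figure is the working baseline used in this article. The 25% headroom default is an assumption added for illustration, so adjust both to your own load tests.

```python
import math


def peers_per_instance(vcpus, sessions_per_vcpu=20):
    """Planning capacity of one instance in SFU mode."""
    return vcpus * sessions_per_vcpu


def instances_needed(peak_peers, vcpus, sessions_per_vcpu=20, headroom=0.25):
    """Instances required when a fraction of each box is kept as headroom."""
    usable = peers_per_instance(vcpus, sessions_per_vcpu) * (1 - headroom)
    return math.ceil(peak_peers / usable)


def egress_mbps(peers, mbps_per_peer):
    """Aggregate outbound bandwidth if every peer receives mbps_per_peer."""
    return peers * mbps_per_peer


print(peers_per_instance(16), "peers per 16 vCPU box")
print(instances_needed(1000, vcpus=16), "boxes for 1,000 peak peers")
print(egress_mbps(300, 1.5), "to", egress_mbps(300, 2), "Mbps for 300 peers")
```

The egress line shows why bandwidth becomes the bottleneck: a few hundred peers at 1.5–2 Mbps each already add up to several hundred Mbps of sustained outbound traffic.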

Cost model: self-hosted Kurento vs SaaS APIs

The Apache 2.0 license is free; the infrastructure and the engineers are not. A realistic monthly bill for a single-region Kurento backend looks like this:

| Profile | Concurrency | Media server cost | TURN / egress | Total / month |
| --- | --- | --- | --- | --- |
| MVP pilot | up to 50 peers | $60–$120 | $50–$200 | $150–$400 |
| Small SaaS | up to 200 peers | $150–$350 | $250–$900 | $500–$1,500 |
| Mid-scale | up to 1,000 peers | $600–$1,800 | $1,200–$4,500 | $2,200–$7,500 |
| Enterprise | 5,000+ peers | $3k–$10k | $6k–$25k | $12k–$45k |

Compare with SaaS per-minute pricing. A product with 1,000 daily active users spending 15 minutes each on video pays roughly $0.004–$0.006 per participant-minute on Agora/Twilio, which is ~$1,800–$2,700/month, before you add recording, transcription or SIP. Kurento becomes cheaper once you cross ~500 daily active users; below that, a SaaS is usually the economically sane pick. Our LiveKit vs Agora cost analysis unpacks the per-minute math with worked examples.
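The per-minute arithmetic is easy to reproduce. The sketch below assumes a 30-day month, which is what the $1,800–$2,700 figure implies. The function name and parameters are illustrative.

```python
def saas_monthly_cost(dau, minutes_per_user_per_day, rate_per_minute, days=30):
    """Monthly SaaS bill from daily active users and a per-participant-minute rate."""
    participant_minutes = dau * minutes_per_user_per_day * days
    return round(participant_minutes * rate_per_minute, 2)


for dau in (500, 1000):
    low = saas_monthly_cost(dau, 15, 0.004)
    high = saas_monthly_cost(dau, 15, 0.006)
    print(f"{dau} DAU: ${low:,.0f} to ${high:,.0f} per month")
```

Set the result next to the self-hosted totals in the table above to see where your own traffic crosses over. Recording, transcription and SIP are not included in either side of this calculation.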

Want a realistic Kurento infrastructure estimate for your traffic?

Share your expected peak concurrency and we’ll size the boxes, TURN egress and on-call cost in one page — free.


Kurento vs mediasoup, Janus, LiveKit, Jitsi, Ant Media

Side-by-side, this is how we talk about the main 2026 open-source options on a client call.

| Engine | Model | Simulcast | Server-side CV | SDK comfort | Best for |
| --- | --- | --- | --- | --- | --- |
| Kurento | SFU + MCU + pipelines | No | Yes (GStreamer) | Low (you write signaling) | CV / AR / RTP bridges |
| mediasoup | Pure SFU | Yes | No (forwarder) | Medium | High-perf conferencing in Node |
| Janus | Plugin-based SFU | Yes | Limited | Medium | SIP, streaming, flexibility |
| LiveKit | SFU + SDK | Yes (inc. AV1) | Via agents | High | Fast-ship startups, AI agents |
| Jitsi Videobridge | Selective SFU | Yes (LastN) | No | Medium | Large rooms, stable codebase |
| Ant Media | Hybrid streaming | Yes | Limited | High | Live broadcast, CMAF/HLS |

Reach for mediasoup or LiveKit when: your product is pure “rooms with chat”, you need Simulcast or AV1, and you want a Node.js- or Go-first stack. Reach for Kurento when you need server-side pixels, not just routing.

See our media streaming software development guide for a wider landscape, and the Vonage Video API alternatives post for proprietary-vs-open comparisons.

When Kurento is still the right pick

Seven concrete scenarios where we still put Kurento on the shortlist:

  • AR / face filters / background replacement done server-side, not on the client.
  • Computer-vision inline with the stream — QR, ANPR, crowd counting, smile detection, safety compliance.
  • RTSP/RTMP ingestion from IP cameras, drones or OBS into a WebRTC product (telemedicine carts, surveillance SaaS, remote inspection).
  • SIP bridging where you need an MCU leg towards a classical conferencing bridge.
  • Recording with custom layouts — tile speakers exactly the way your product demands, not the way a SaaS allows.
  • Text-to-speech and music injection into live rooms (e-learning, virtual classrooms, voice agents).
  • On-prem / air-gapped deployments where SaaS is not an option (defense, healthcare, government).

When not to use Kurento

Equally important is knowing when to walk away. We push clients off Kurento in these cases:

  • You need time-to-market in weeks, not months. Pick LiveKit, 100ms, Daily or Agora.
  • You need Simulcast or AV1. Pick mediasoup, LiveKit or Janus.
  • You plan for tens of thousands of simultaneous peers globally. Pick a cloud-native SFU with clustering or a managed service.
  • You do not have a DevOps/SRE resource. Self-hosted Kurento eats sysadmin time — TURN, codecs, Let’s Encrypt, crash recovery.
  • Your feature list is pure 1-on-1 calls. Use plain WebRTC P2P and save the media server cost altogether.

Mini case: what a Kurento project looks like

A healthcare client came to us with a telemedicine product that had outgrown a proprietary video API — per-minute bills were approaching $14,000/month and HIPAA audits demanded on-prem recording. They needed live video consultations with HIPAA-compliant session recording, a server-side “face blur” option for nurses who wanted privacy on camera, and RTSP ingestion from bedside cameras.

Our 12-week plan put Kurento behind a custom Node.js signaling layer, wired the FaceOverlayFilter + a GStreamer blur element for the privacy feature, used PlayerEndpoint for RTSP camera streams and RecorderEndpoint for HIPAA-logged sessions. The infrastructure was two Hetzner AX41 boxes plus a coturn cluster. AI agents inside our delivery pipeline generated the pipeline wiring code, the signaling messages and ~80% of the test suite, which is how we finished under the 12-week target.

Outcome: monthly infra dropped from ~$14,000 to ~$1,600. Median join time improved from 4.8s to 1.9s. On-prem recording and blur passed the security review in the first pass. For another end-to-end account, see how we built MyOnCallDoc’s telemedicine video chat. Want a similar assessment for your product? Book a 30-minute call and we’ll sketch the stack on the call.

A decision framework — pick Kurento in five questions

Answer these in order. If you answer “no” to any of Q1–Q3, Kurento is probably not your best pick.

1. Do you need server-side access to pixels or audio samples? AR, CV, ML, blur, overlays, transcription, DSP — all require decoded frames on the server. If yes, Kurento (or mediasoup + custom worker pipelines) is in play. If no, a plain SFU is cheaper.

2. Do you need to bridge non-WebRTC protocols? RTSP cameras, RTMP OBS feeds, SIP trunks, hardware codecs. Kurento’s Player/RtpEndpoint model is ideal for this. A pure SFU will force you to write the bridge yourself.

3. Can you self-host and staff DevOps? Kurento is not SaaS. You need someone who owns TURN, TLS, kernel tuning, log aggregation, crash recovery. No DevOps, no Kurento.

4. Can you accept no Simulcast and no AV1? These are real gaps. If your UX depends on layered video for mobile users, consider mediasoup/LiveKit first.

5. Do you value flexibility over speed? Kurento will let you do anything, but you will spend the first sprint wiring signaling before you see a single frame. SDK-first stacks hide that complexity; so does a good SaaS.
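Written out as code, the five questions read as follows. This is only a sketch that encodes the rule as stated above (a "no" on any of Q1–Q3 rules Kurento out). The function name, argument names and return strings are hypothetical.

```python
def recommend(needs_server_side_media, needs_protocol_bridging, can_self_host,
              accepts_no_simulcast_av1, prefers_flexibility):
    """Walk the five questions in order and return a coarse recommendation."""
    # Q1-Q3: a "no" on any of them means Kurento is probably not the best pick.
    if not (needs_server_side_media and needs_protocol_bridging and can_self_host):
        return "Kurento is probably not the best pick"
    # Q4: Simulcast and AV1 are real gaps.
    if not accepts_no_simulcast_av1:
        return "consider mediasoup or LiveKit first"
    # Q5: speed over flexibility points to SDK-first stacks or a SaaS.
    if not prefers_flexibility:
        return "consider an SDK-first stack or a SaaS"
    return "Kurento is a reasonable pick"


print(recommend(True, True, True, True, True))
print(recommend(True, True, False, True, True))
```

Treat the output as a conversation starter for the team, not as a verdict. The questions matter more than the strings returned.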

Five pitfalls we see on Kurento projects

1. Skipping TURN from day one. Teams test on the office LAN, ship, then watch 25–40% of real-world users fail to connect behind symmetric NAT. Stand up a coturn instance (or at least Twilio TURN as a paid fallback) on day one. Our WebRTC primer explains the NAT problem in detail.

2. Running Kurento and the app on the same box. GStreamer will happily take every CPU core it can see. Give Kurento its own box, budget 20–30% headroom for encoding spikes, and put TURN on yet another machine.

3. Assuming a single instance will scale. Kurento has no built-in clustering. You need a room-to-instance sharding policy in your signaling layer, a shared session store, and a clear “drain and rotate” deploy strategy.

4. Forgetting to tune the UDP port range. Kurento defaults to 49152–65535; on many cloud VMs this range is closed. Open it in your security group or your users will sit forever on “connecting”.

5. Using the abandoned JavaScript client without eyes open. kurento-utils-js has been unmaintained for years. Either use the bare WebRTC API directly (which is the current recommendation) or adopt OpenVidu, which wraps Kurento with a maintained client.
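For pitfall 3, a room-to-instance sharding policy can start as small as the sketch below. It is an in-memory illustration: the `RoomRouter` class and the instance names are hypothetical, the dictionaries stand in for the shared session store, and least-loaded placement is one possible policy among several.

```python
class RoomRouter:
    """Sticky room-to-instance routing with least-loaded placement."""

    def __init__(self, instances):
        self.load = {name: 0 for name in instances}  # expected peers per instance
        self.rooms = {}                               # room id -> instance name
        self.draining = set()                         # instances taking no new rooms

    def assign(self, room_id, expected_peers=1):
        """Return the instance for a room, placing new rooms on the least-loaded box."""
        if room_id in self.rooms:
            return self.rooms[room_id]  # existing rooms never move
        candidates = [name for name in self.load if name not in self.draining]
        if not candidates:
            raise RuntimeError("no instance available for new rooms")
        target = min(candidates, key=lambda name: self.load[name])
        self.rooms[room_id] = target
        self.load[target] += expected_peers
        return target

    def drain(self, instance):
        """Stop placing new rooms on an instance so it can be rotated out."""
        self.draining.add(instance)


router = RoomRouter(["kms-a", "kms-b"])
print(router.assign("room1", expected_peers=6))
print(router.assign("room2", expected_peers=4))
router.drain("kms-b")
print(router.assign("room3", expected_peers=2))
```

In production the two dictionaries would live in a store shared by all signaling nodes, and a drained instance would be restarted only after its last room has closed.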

KPIs: what to measure on a Kurento backend

Three buckets, each with numeric thresholds we use as defaults.

Quality KPIs. Median round-trip time under 150ms, sustained packet loss under 2%, MOS above 4.0 for OPUS audio. Freeze-rate above 5% of session time is a red flag — check CPU and the number of peers per room first.

Business KPIs. P50 join time under 2.5s, call setup failure rate under 1%, connection success rate (including TURN) above 98%, recording success rate above 99.5%. These are the numbers users feel.

Reliability KPIs. Kurento process uptime > 30 days between restarts, crash rate < 1 per 1000 sessions, CPU saturation < 75% at peak, outbound bandwidth saturation < 60%. Alert on any of these and you will catch bad deploys before users do.
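Most of these thresholds translate directly into an alert check. The sketch below is one way to express them. The metric names are hypothetical and would need to be mapped to whatever your monitoring stack actually exports.

```python
# (direction, limit): "max" means the value must stay at or below the limit,
# "min" means it must stay at or above it. Limits are the defaults quoted above.
THRESHOLDS = {
    "rtt_ms_median": ("max", 150),
    "packet_loss_pct": ("max", 2),
    "audio_mos": ("min", 4.0),
    "freeze_pct": ("max", 5),
    "join_time_s_p50": ("max", 2.5),
    "setup_failure_pct": ("max", 1),
    "connection_success_pct": ("min", 98),
    "recording_success_pct": ("min", 99.5),
    "cpu_peak_pct": ("max", 75),
    "egress_saturation_pct": ("max", 60),
}


def breaches(metrics):
    """Return the names of the supplied metrics that violate their threshold."""
    failed = []
    for name, value in metrics.items():
        direction, limit = THRESHOLDS[name]
        if direction == "max" and value > limit:
            failed.append(name)
        elif direction == "min" and value < limit:
            failed.append(name)
    return failed


sample = {"rtt_ms_median": 180, "packet_loss_pct": 1.0, "cpu_peak_pct": 80}
print(breaches(sample))
```

Uptime and crash rate are left out because they are measured over long windows rather than sampled per interval, so they fit better in a separate report.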

OpenVidu: when to pick the framework on top

OpenVidu is a higher-level framework that wraps Kurento with a room abstraction, maintained JS/Android/iOS SDKs, and optional paid add-ons for monitoring and clustering. It trades some of Kurento’s flexibility for a much faster start.

Pick OpenVidu if you want Kurento’s media capabilities but do not want to write signaling, session lifecycle and mobile clients from scratch. It is a reasonable default for e-learning, telehealth and SMB conferencing products that need recording plus modest CV.

Skip OpenVidu if you need deep customization of the media graph, integration with an existing signaling backend, or if your plan relies on clustering beyond what the commercial tier provides. At that point you want raw Kurento (or mediasoup).

Need a second opinion on Kurento vs OpenVidu vs mediasoup?

30 minutes with a senior engineer who has shipped all three in production — you bring the use case, we bring the receipts.


FAQ

Is Kurento dead in 2026?

No. Kurento 7.x still ships point releases and OpenVidu 3 is actively developed. The ecosystem is smaller than mediasoup or LiveKit, so expect fewer third-party tutorials — but production deployments run fine and security patches land.

What is the license of Kurento Media Server?

Apache 2.0. You can use it in commercial products, modify it, and bundle it without paying royalties. You must preserve attribution and follow the standard Apache clauses on modified source.

Does Kurento support Simulcast or SVC?

Not natively. You can run multiple encoding ladders as separate streams, but Kurento does not forward Simulcast layers the way mediasoup, Janus or LiveKit do. If Simulcast is a hard requirement, pick a different SFU.

How many concurrent users can one Kurento instance handle?

Roughly 20 SFU participants per vCPU at 720p30 is a reliable planning number. A 16 vCPU box realistically holds ~300 peers before outbound bandwidth becomes the new bottleneck. MCU mode drops that to ~6–8 participants per vCPU because every frame is transcoded.

What is the difference between Kurento and OpenVidu?

Kurento is the low-level media server. OpenVidu is a room-oriented framework from the same authors that sits on top of Kurento and ships maintained web, Android and iOS SDKs. Think of it as Kurento with the signaling, clients and session lifecycle already written for you.

Is Kurento better than mediasoup?

Neither is universally better. Kurento wins when you need server-side media processing, recording with custom layouts, filter pipelines and RTP bridging. mediasoup wins when you need Simulcast, high peer counts per node and a modern Node.js-first stack.

Can Kurento record video calls?

Yes, via RecorderEndpoint. You can record individual participants, a mixed MCU output, or audio-only tracks. Supported containers include WebM (VP8/OPUS), MP4 (H.264/AAC) and FLV.

How much does a Kurento deployment cost per month?

For an MVP-size product with up to 50 concurrent peers, plan for $150–$400/month including TURN and egress. For a 200-peer SaaS, $500–$1,500/month. Engineer-time is the bigger line — initial setup is usually 4–6 weeks with our Agent Engineering approach, more with a classical team.

Ready to ship Kurento-grade video without the heavy lifting

Kurento remains the right answer when your product needs programmable pixels — AR, CV, custom recording, RTSP bridges — and you are willing to own the infrastructure. It is the wrong answer when speed-to-market or Simulcast-heavy UX is the priority. Between those poles sits OpenVidu, which we recommend whenever the media capabilities of Kurento matter but the signaling complexity does not.

If you think Kurento might be your answer, the next step is a 30-minute scoping call. We will sketch the pipeline, size the boxes and estimate the engineering window on the call — most teams walk away with a cleaner picture than they had after a week of internal debate.

Ready to ship your Kurento or OpenVidu backend?

Fora Soft ships WebRTC and media-server products with Agent Engineering — faster, cheaper, production-ready.
