
Key takeaways
• Android calling is harder than iOS in 2026. Foreground service rules, OEM background killers, full-screen-intent restrictions and unreliable push delivery all conspire against you. Plan for them or your app will quietly miss calls.
• Twelve features cover >95% of real demand. ConnectionService integration, full-screen-intent incoming calls, foreground services with the right type, FCM high-priority data + push-to-wake, AAudio + AEC + NS, CameraX video, PiP, Bluetooth routing, recording with consent, ICE reconnection, MOS monitoring, and an honest battery story.
• SDK choice is a four-way fork: native WebRTC, LiveKit, mediasoup, or a SaaS (Twilio / Agora / SendBird Calls). The right one depends on regulation, scale and how much of the operational burden you can carry.
• Latency, drop-rate and battery are the only KPIs procurement cares about. Ship dashboards on day one: call-setup p95, drop rate, MOS, time-to-first-frame, and ICE-reconnect time.
• Test on real OEM devices, not just Pixel. Samsung, Xiaomi, Oppo and Vivo background-killer behaviour breaks call delivery in ways the emulator can’t reproduce.
Why Fora Soft wrote this guide
Fora Soft has been building Android calling apps since the WebRTC era opened on mobile. Recent and relevant builds include TransLinguist (real-time video interpretation), MyOnCallDoc (HIPAA-grade telemedicine calls), Speakk (anonymous voice chat) and Talensy (multi-party calling).
This piece is for the founder, head of product or platform engineer who is shipping or rebuilding an Android calling app and wants the short version of what we’d do today. It’s opinionated and Android-specific — the iOS equivalent is a different article.
Building or rebuilding an Android calling app?
30 minutes with a Fora Soft mobile architect — we’ll size the foreground-service plan, push-delivery story and SDK choice in one call.
Why Android calling is harder than iOS in 2026
iOS has CallKit. Android has a buffet. ConnectionService is the closest thing to CallKit, but it’s optional, and most apps end up combining it with custom in-app UI, foreground services and FCM. Doze and App Standby aggressively throttle background work, but a calling app still needs to wake within seconds when an incoming call arrives. Foreground service rules tightened again in Android 14, and OEM background killers (Samsung, Xiaomi, Oppo, Vivo) routinely terminate apps that are technically following all of Google’s rules. Push delivery is best-effort; you need a fallback signalling channel.
None of this is unsolvable. All of it has to be designed in from week one — retrofitting after launch is what kills calling apps in store reviews.
The twelve features every Android calling app must ship
| # | Feature | Android-specific surface |
|---|---|---|
| 1 | Reliable incoming call delivery | FCM high-priority data + WebSocket fallback + foreground service |
| 2 | Full-screen-intent incoming UI | USE_FULL_SCREEN_INTENT permission (Android 10+) + CallStyle notification (Android 12+) |
| 3 | ConnectionService / Telecom integration | MANAGE_OWN_CALLS + register account; native call log + DnD respect |
| 4 | Foreground service with mic/camera type | Manifest foregroundServiceType + 14+ permissions |
| 5 | Audio: AAudio + AEC + NS + AGC | Low-latency mode + AcousticEchoCanceler + NoiseSuppressor |
| 6 | Bluetooth audio routing (HFP / SCO) | BluetoothHeadset state machine + speaker fallback |
| 7 | Video capture & encode (CameraX + HW codec) | CameraX + H.264 / VP9 / AV1 with simulcast or SVC |
| 8 | Picture-in-Picture (PiP) | android:supportsPictureInPicture + lifecycle handling |
| 9 | Network resilience & ICE reconnect | ConnectivityManager.NetworkCallback + ICE restart |
| 10 | Recording with consent + retention | Banner before record + GDPR / two-party-consent compliance |
| 11 | Battery hygiene | PARTIAL_WAKE_LOCK only when needed; OEM allowlist guidance |
| 12 | Call quality telemetry (MOS / RTT / loss) | WebRTC stats → backend dashboards + alerting |
Reach for ConnectionService when: you want native dialer integration, do-not-disturb respect, and your buyer expects calls to appear in the OS call log. Skip it for niche in-app conferencing where the in-app UI is the only call surface.
The reference architecture — signalling, media, push, foreground
Every credible Android calling app has the same four planes. Where each one runs is the difference between “works in the demo” and “works on Android 14 on a Samsung after eight hours of standby.”
Signalling. WebSocket (or SIP / XMPP if you have a reason) for SDP offer/answer and ICE candidate exchange, room state, presence. Keep the connection alive in the foreground service when the app is open; reconnect quickly on resume.
Media. WebRTC PeerConnection on the device. DTLS-SRTP for transport. STUN for direct paths, TURN for everything blocked by symmetric NAT. Budget for TURN: roughly 10–30% of consumer calls need it, and closer to 50% on enterprise or regulated networks. Simulcast or SVC for video so weak peers don’t drag everyone’s quality down.
Push. FCM high-priority data messages (not notification messages) on incoming call. The receiver wakes the foreground service, raises a CallStyle notification with full-screen intent, and starts ringing. For OEMs known to drop FCM, layer a WebSocket keep-alive while the app process is allowed to live, and (where it matters) a proprietary push channel like MiPush.
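Foreground service. Started before any audio or camera capture, declared in the manifest with the matching foregroundServiceType (microphone, camera, mediaPlayback). For apps targeting Android 14+, the matching FOREGROUND_SERVICE_* permission must also be in the manifest. A missing type throws MissingForegroundServiceTypeException and a missing permission throws SecurityException when startForeground() runs; starting the service from the background without an exemption throws ForegroundServiceStartNotAllowedException (Android 12+). In production, all three look like a crash on first call.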
Manifest essentials, in one block
```xml
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
<uses-permission android:name="android.permission.CAMERA"/>
<uses-permission android:name="android.permission.POST_NOTIFICATIONS"/>
<uses-permission android:name="android.permission.USE_FULL_SCREEN_INTENT"/>
<uses-permission android:name="android.permission.MANAGE_OWN_CALLS"/>
<uses-permission android:name="android.permission.FOREGROUND_SERVICE"/>
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_MICROPHONE"/>
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_CAMERA"/>
<uses-permission android:name="android.permission.BLUETOOTH_CONNECT"/>

<service
    android:name=".CallForegroundService"
    android:exported="false"
    android:foregroundServiceType="microphone|camera"/>
```
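One way to keep these entries from drifting is a small build-time check. The sketch below is an assumption about how such a check could look, not part of any Android tooling: `ManifestFgsCheck` and `findProblems` are hypothetical names, and it only inspects services that already declare a `foregroundServiceType` (it cannot tell that an untyped service calls `startForeground()`).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical CI helper: flags foreground-service types that lack their permission. */
final class ManifestFgsCheck {
    private static final String BASE = "android.permission.FOREGROUND_SERVICE";

    /** Returns human-readable problems; an empty list means the manifest passed. */
    static List<String> findProblems(String manifestXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(manifestXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unreadable manifest", e);
        }

        // Collect every permission the manifest declares.
        Set<String> declared = new HashSet<>();
        NodeList perms = doc.getElementsByTagName("uses-permission");
        for (int i = 0; i < perms.getLength(); i++) {
            declared.add(((Element) perms.item(i)).getAttribute("android:name"));
        }

        List<String> problems = new ArrayList<>();
        NodeList services = doc.getElementsByTagName("service");
        for (int i = 0; i < services.getLength(); i++) {
            Element service = (Element) services.item(i);
            String types = service.getAttribute("android:foregroundServiceType");
            if (types.isEmpty()) continue; // not declared as a typed foreground service
            String name = service.getAttribute("android:name");
            if (!declared.contains(BASE)) {
                problems.add(name + ": missing " + BASE);
            }
            for (String type : types.split("\\|")) {
                // camelCase type -> UPPER_SNAKE suffix, e.g. mediaPlayback -> MEDIA_PLAYBACK
                String suffix = type.trim()
                        .replaceAll("([a-z])([A-Z])", "$1_$2")
                        .toUpperCase(Locale.ROOT);
                String needed = BASE + "_" + suffix;
                if (!declared.contains(needed)) {
                    problems.add(name + ": missing " + needed);
                }
            }
        }
        return problems;
    }
}
```

A build step could read the merged manifest, call `findProblems`, and fail when the list is non-empty.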
SDK options — pick the one that matches your team
Four broad paths. The right one depends on engineering depth, regulation and how much of the operational tail you want to own.
Google WebRTC native library
The foundation. Battle-tested, zero licensing cost, full control. You build signalling, UI, push handling, foreground services, audio routing, and recording. Right when you have 1–2 strong native Android engineers and chat/calling is core. Our WebRTC-on-Android primer is here.
LiveKit Android SDK — the multi-party fast lane
High-level wrapper that gives you rooms, multi-party, and an SFU. Great docs, growing ecosystem, sane defaults. Cloud-hosted or self-hosted. Reach for it when you want to ship in weeks rather than months and you’re comfortable with usage-based pricing or running an SFU. Pair with our LiveKit voice-agent build guide if you’re layering AI on top.
Reach for LiveKit when: calls are multi-party, your team is small or moving fast, and either cloud hosting fits the budget or you can run an SFU on your own infra (HIPAA self-host included).
mediasoup-client-android
Lower-level than LiveKit but more control over media; pairs with a self-hosted mediasoup SFU. Strong fit when you need fine-grained bandwidth adaptation, custom layouts, or tight cost control on dedicated infrastructure.
SaaS calling (Twilio, Agora, SendBird Calls) — the procurement-friendly path
Fully managed. Twilio leads on regulated workloads (HIPAA, SOC 2). Agora leads on low-latency interactive video at scale. SendBird Calls leads when you also need their chat. Trade time-to-launch for per-minute cost and lock-in. Agora alternatives covered separately.
Reach for SaaS calling when: regulation is heavy and you want compliance off the shelf, or your team has no realtime experience and time-to-market beats per-minute economics.
Comparison matrix — SDKs at a glance
| SDK | Effort to integrate | Licensing | Multi-party | Compliance | Best for |
|---|---|---|---|---|---|
| Google WebRTC | High | Free / OSS | P2P (or via your own SFU) | DIY | Custom builds, full control |
| LiveKit | Medium | Usage-based / OSS server | SFU, hundreds+ | Self-hostable for HIPAA | Modern multi-party builds |
| mediasoup | High | OSS server | SFU, hundreds+ | Self-host (you own it) | Tight cost control, custom topology |
| Twilio Programmable Video | Low | Per-minute | Yes | HIPAA, SOC 2 | Regulated, fast launch |
| Agora | Low | Per-minute | Yes (large) | Partial | Live interactive, low latency |
| SendBird Calls | Low | Usage-based | Yes (small) | Partial | Apps that also need SendBird chat |
Picking between WebRTC, LiveKit and a SaaS?
We’ve shipped on all four. 30 minutes and we’ll tell you which one fits your team, your buyer, and your unit economics.
Audio — the deceptively hard layer
Audio is where calling apps lose stars. Use AAudio in low-latency mode on Android 8+. Wrap the WebRTC AudioDeviceModule with AcousticEchoCanceler, NoiseSuppressor and AutomaticGainControl effects, falling back to software echo cancellation when hardware AEC isn’t available. Register an AudioDeviceCallback to learn about device changes; don’t poll.
Bluetooth is the trap. SCO/HFP connection lifecycles vary by OEM and headset; route changes mid-call (user pulls headset out, car Bluetooth connects) need a deterministic state machine and a clean speaker fallback. Test with at least four devices — AirPods (yes, on Android), Samsung Buds, a generic car system, and an enterprise headset.
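To make "deterministic state machine" concrete, here is a minimal sketch of the routing decision alone, with no Android APIs involved. `AudioRoute` and `AudioRouteSelector` are hypothetical names, and the priority order (explicit user choice, then Bluetooth, then wired headset, then speaker) is an assumption for illustration; a real implementation would map these routes onto the platform's audio devices and drive the SCO connection separately.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical route model; a real app maps these onto platform audio devices. */
enum AudioRoute { BLUETOOTH, WIRED_HEADSET, SPEAKER, EARPIECE }

final class AudioRouteSelector {
    // Built-in routes are always present; external ones come and go.
    private final Set<AudioRoute> available =
            EnumSet.of(AudioRoute.EARPIECE, AudioRoute.SPEAKER);
    private AudioRoute userChoice; // null until the user picks a route explicitly

    void onDeviceAdded(AudioRoute route) {
        available.add(route);
    }

    void onDeviceRemoved(AudioRoute route) {
        if (route == AudioRoute.SPEAKER || route == AudioRoute.EARPIECE) return;
        available.remove(route);
        if (route == userChoice) userChoice = null; // the chosen device vanished
    }

    /** Returns false if the requested route is not currently connected. */
    boolean selectByUser(AudioRoute route) {
        if (!available.contains(route)) return false;
        userChoice = route;
        return true;
    }

    /** The same set of inputs always yields the same route. */
    AudioRoute current() {
        if (userChoice != null) return userChoice;
        if (available.contains(AudioRoute.BLUETOOTH)) return AudioRoute.BLUETOOTH;
        if (available.contains(AudioRoute.WIRED_HEADSET)) return AudioRoute.WIRED_HEADSET;
        return AudioRoute.SPEAKER; // the clean fallback when a headset disappears
    }
}
```

Because `current()` is a pure function of the tracked state, every headset connect or disconnect event can simply update the state and re-apply the result, which makes mid-call route changes reproducible in unit tests.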
Reach for hardware AEC + NS when: the device exposes them (most flagships do, many low-end Androids don’t). Fall back to software AEC for the long tail; budget the CPU cost.
Push delivery — the most common production failure
FCM high-priority data messages are the right primary channel for incoming calls, but they are best-effort. OEMs aggressively kill apps to save battery, and the only reliable mitigation is a layered design.
Layer 1: FCM data, not notification. When the app is in the background, a notification message is rendered by the FCM SDK as an ordinary tray notification and your receiver code does not run; on Android 13+ it is also silenced if the user has not granted POST_NOTIFICATIONS. Data messages are handed to your app, which gives it the chance to display its own call UI.
Layer 2: foreground-service-on-receipt pattern. The FCM listener immediately starts the foreground service, raises a CallStyle notification with full-screen intent, and acquires a brief partial wake lock to ensure the device wakes.
Layer 3: OEM allowlist guidance. First-launch flow that detects Samsung / Xiaomi / Oppo / Vivo and walks the user through battery allowlisting. Without this step, a non-trivial share of users will miss calls regardless of how clean your code is.
Layer 4: WebSocket fallback. While the app process is allowed to live, keep a signalling WebSocket open with sane keep-alives; route incoming calls through it when present. Don’t rely on it for cold-start delivery.
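For Layer 1, the request body sent to the FCM HTTP v1 `messages:send` endpoint could look like the output of the sketch below. The `message`, `token`, `android.priority`, `android.ttl` and `data` fields belong to the FCM v1 message format; the data keys (`type`, `call_id`, `caller_name`), the 30-second TTL and the `IncomingCallPush` class are assumptions chosen for illustration. The important property is the absence of a `notification` object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical server-side helper that builds a data-only, high-priority FCM message. */
final class IncomingCallPush {

    static String buildMessage(String deviceToken, String callId, String callerName) {
        // FCM data values must be strings; these key names are illustrative only.
        Map<String, String> data = new LinkedHashMap<>();
        data.put("type", "incoming_call");
        data.put("call_id", callId);
        data.put("caller_name", callerName);

        StringBuilder json = new StringBuilder();
        json.append("{\"message\":{\"token\":").append(quote(deviceToken))
            // Short TTL (an assumption): a call that is 30 s late should not ring at all.
            .append(",\"android\":{\"priority\":\"HIGH\",\"ttl\":\"30s\"}")
            .append(",\"data\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : data.entrySet()) {
            if (!first) json.append(',');
            json.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return json.append("}}}").toString();
    }

    /** Minimal JSON string escaping for quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') out.append('\\').append(c);
            else if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
            else out.append(c);
        }
        return out.append('"').toString();
    }
}
```

In a real backend a JSON library or the Firebase Admin SDK would replace the hand-built string; it is written out here only to make the payload shape visible.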
Permissions you actually need to ask for
Request the dangerous ones at runtime, declare the rest in the manifest, and explain each one before the system prompt fires — conversion drops sharply if the prompt is the user’s first encounter.
Core (runtime). RECORD_AUDIO, CAMERA, POST_NOTIFICATIONS (13+).
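Telecom & UI. MANAGE_OWN_CALLS (for ConnectionService), USE_FULL_SCREEN_INTENT (10+; on 14+ it is a special app access granted by default only to calling and alarm apps, so check NotificationManager.canUseFullScreenIntent() before relying on it).
Foreground services (manifest). FOREGROUND_SERVICE (9+), plus FOREGROUND_SERVICE_MICROPHONE and FOREGROUND_SERVICE_CAMERA (14+).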
Bluetooth. BLUETOOTH_CONNECT (12+).
Optional. READ_PHONE_NUMBERS (display caller-ID), ACCESS_NETWORK_STATE (connectivity awareness).
Security & compliance — the buyer-side checklist
Calls go encrypted or they don’t ship. DTLS-SRTP for transport (handled by WebRTC), TLS 1.3 for signalling, AES-256 at rest for any recording, BAAs for HIPAA-eligible storage. For premium tiers, layer end-to-end encryption on top of the encoded frames (WebRTC Insertable Streams in browsers, the frame encryptor/decryptor hooks in the native library) so the keys stay on participants’ devices and never reach the server. Our WebRTC security primer covers the full picture.
Recording demands explicit consent banners shown to every participant before capture begins, plus retention policies that satisfy GDPR and the relevant US state two-party-consent laws. More on the NFR framing here.
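The "every participant before capture begins" rule is easy to get wrong when people join mid-call. A minimal sketch of the gating logic, assuming participants are identified by string IDs and that `RecordingConsentGate` is a hypothetical class owned by whatever component starts the recorder:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical gate: recording is allowed only while everyone present has consented. */
final class RecordingConsentGate {
    private final Set<String> participants = new HashSet<>();
    private final Set<String> consented = new HashSet<>();

    /** A newcomer has not consented yet, so recording must pause until they do. */
    void onParticipantJoined(String id) {
        participants.add(id);
    }

    void onParticipantLeft(String id) {
        participants.remove(id);
        consented.remove(id);
    }

    /** Consent only counts for someone who is actually in the call. */
    void onConsentGiven(String id) {
        if (participants.contains(id)) consented.add(id);
    }

    void onConsentWithdrawn(String id) {
        consented.remove(id);
    }

    boolean mayRecord() {
        return !participants.isEmpty() && consented.containsAll(participants);
    }
}
```

The recorder would check `mayRecord()` before starting and again on every join, leave or withdrawal, pausing capture whenever it flips to false. Retention and jurisdiction rules sit outside this sketch.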
KPIs that prove the calling app is healthy
Reliability KPIs. Call setup time p95 (target <1 s); call drop rate (target <0.5%); ICE-reconnect success on network change (target >95%); foreground-service uptime per call (target >99.9%).
Quality KPIs. MOS (target >4.0); time-to-first-frame video p95 (target <2 s); one-way audio latency p95 (target <200 ms in-region); jitter buffer underruns per minute.
Adoption & trust KPIs. Inbound-call answer rate; permission-grant rate (RECORD_AUDIO and CAMERA); battery drain per 30-min call (mAh); store rating regression on Android 14 vs 13 cohorts.
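Two of the reliability numbers above reduce to simple arithmetic over per-call records. The sketch below uses the nearest-rank definition of a percentile, which is one of several common definitions and an assumption here; `CallKpis` and its method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical KPI helpers for a call-quality dashboard. */
final class CallKpis {

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Dropped calls as a percentage of calls that actually connected. */
    static double dropRatePercent(long droppedCalls, long connectedCalls) {
        if (connectedCalls <= 0) return 0.0;
        return 100.0 * droppedCalls / connectedCalls;
    }
}
```

With 20 setup-time samples of 100, 200, ... 2000 ms, `percentile(samples, 95)` returns 1900, which would miss a 1 s target; 3 drops in 1,000 connected calls gives a 0.3% drop rate, which would meet a 0.5% target.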
Five pitfalls that kill Android calling apps
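1. Foreground service crashes on Android 14+. A missing foregroundServiceType throws MissingForegroundServiceTypeException, a missing matching permission throws SecurityException, and a background start without an exemption throws ForegroundServiceStartNotAllowedException. CI must fail any build that omits these manifest entries.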
2. Push delivery you don’t test on real OEMs. Pixel works. Samsung doesn’t. Test on at least Samsung, Xiaomi, Oppo, Vivo and a Pixel before launch. Add the OEM allowlist guidance flow.
3. Bluetooth audio routing left to chance. Headset disconnect mid-call without a clean fallback → silent dead air for the user. Build the state machine, don’t rely on framework defaults.
4. Battery drain in standby. Holding wake locks “just in case” or polling on a timer drains battery and gets you blocked by the OS. Use partial wake locks only on the critical path; rely on FCM for incoming-call wakes.
5. No telemetry on quality. Without MOS, drop-rate and ICE-reconnect dashboards, you discover problems through one-star reviews. Build dashboards in week one, not after a regression.
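For pitfall 5, a MOS figure can be approximated on the client from numbers WebRTC stats already expose. The sketch below is an assumption, not the article's method: the R-factor is derived with a widely circulated simplified heuristic (latency, jitter and loss penalties), and only the final R-to-MOS mapping follows the ITU-T G.107 E-model formula. `MosEstimator` is a hypothetical name, and one-way latency is approximated as half the round-trip time.

```java
/** Hypothetical MOS estimator for dashboards; a rough indicator, not a measurement. */
final class MosEstimator {

    /** ITU-T G.107 mapping from R-factor to MOS, clamped to the 1.0..4.5 range. */
    static double rToMos(double r) {
        if (r <= 0) return 1.0;
        if (r >= 100) return 4.5;
        double mos = 1.0 + 0.035 * r + 7.0e-6 * r * (r - 60.0) * (100.0 - r);
        return Math.max(1.0, mos);
    }

    /** Simplified heuristic R-factor from round-trip time, jitter and packet loss. */
    static double estimateMos(double rttMs, double jitterMs, double lossPercent) {
        // Assumption: one-way delay ~ RTT / 2, jitter counted twice, 10 ms for codec delay.
        double effectiveLatency = rttMs / 2.0 + 2.0 * jitterMs + 10.0;
        double r = effectiveLatency < 160.0
                ? 93.2 - effectiveLatency / 40.0
                : 93.2 - (effectiveLatency - 120.0) / 10.0;
        r -= 2.5 * lossPercent; // each percent of loss costs 2.5 R points in this heuristic
        return rToMos(r);
    }
}
```

Feeding it per-interval values from the WebRTC stats report and shipping the result with the call ID is usually enough to chart a MOS trend and alert on regressions, even though the absolute value is only approximate.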
Mini case — rebuilding an Android calling app for a telemedicine client
A telemedicine client came to us with a 1.0 Android calling app shipped a year earlier on a now-outdated SDK. Symptoms: incoming calls reliably missed on Samsung, foreground-service crashes appearing in Android 14 crash reports, audio routing broken on car Bluetooth, no MOS or call-drop dashboards. Pass-through rate to a connected doctor was sitting at 71%.
In ten weeks we rebuilt the calling layer on Google WebRTC + LiveKit (HIPAA-eligible self-hosted), added the manifest fixes for Android 14+, layered FCM data + WebSocket fallback + OEM allowlist flow, replaced the audio manager with a deterministic Bluetooth state machine, and shipped MOS / drop-rate / ICE-reconnect dashboards.
Outcome twelve weeks post-relaunch: connected-call rate up to 94%, P0 crash rate cut by 86%, store rating recovered from 3.6 to 4.5 over the next two release cycles. The biggest single contributor was push reliability — the SDK choice mattered far less than the four-layer push design.
A five-question decision framework for Android calling
Q1. Are calls 1:1 or multi-party? 1:1 → raw WebRTC peer-to-peer is fine. Multi-party → you need an SFU (LiveKit, mediasoup, or a SaaS).
Q2. Is the data regulated? HIPAA / GDPR-strict residency → self-hosted LiveKit / mediasoup, or Twilio. Standard B2C → any path.
Q3. Who’s on-call for this in 12 months? Strong Android team → native WebRTC. Smaller team or outsourced → LiveKit or SaaS.
Q4. What’s your unit-economics ceiling per minute? If per-minute SaaS pricing breaks the model, self-host. If your team-cost dwarfs minutes, buy SaaS and ship.
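Q5. Are you a default-dialer replacement? If yes, Telecom integration is mandatory: a default dialer implements InCallService, and a VoIP app that places its own calls registers a self-managed ConnectionService. If no, it’s optional, but even niche apps benefit from native call-log integration.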
When you should not build calling on Android first
If your audience is heavily iOS, your team has no Android calling experience, and your timeline is short, ship iOS first with a polished CallKit experience, then take the Android lessons (and budget) and build it properly. Half-finished Android calling apps are a leading indicator of one-star app-store reviews and burned engineering budgets.
Likewise, if you only need a chat-style voice note feature, don’t build a full calling stack. Use existing recording APIs and async audio messages.
Inheriting an Android calling app that’s breaking on 14+?
We do fixed-fee Android calling-app audits — written report, prioritised fix list, two-week turnaround.
FAQ
Should I integrate ConnectionService for my calling app?
If your app is a default-phone-replacement or a serious VoIP product, yes — ConnectionService gives you native call-log integration, do-not-disturb respect, and consistent UX with the dialer. For niche in-app conferencing or one-off video calls inside another product, it’s optional. The implementation cost is real (a few weeks), so balance against the UX win.
Is raw WebRTC enough or do I need an SDK like LiveKit?
Raw WebRTC handles the media plane. You still build signalling, UI, push handling, foreground services, audio routing and recording yourself. LiveKit (or Twilio / Agora / mediasoup) abstracts a lot of that, especially for multi-party calls. For 1:1 calling with a small experienced team, raw WebRTC is fine. For multi-party or fast launch, an SDK saves months.
How do I handle incoming calls when the app is killed?
Send an FCM high-priority data message. The FCM listener wakes the foreground service, which raises a CallStyle notification with full-screen intent and starts ringing. Add a partial wake lock for the brief moment between FCM receipt and notification display. Layer a WebSocket fallback while the app process is alive, and add OEM allowlist guidance for users on aggressive Samsung/Xiaomi/Oppo/Vivo devices.
What changed for foreground services in Android 14?
Foreground services now require both a manifest foregroundServiceType declaration and the matching FOREGROUND_SERVICE_* permission. For calling apps that’s usually microphone|camera. Skipping the type throws MissingForegroundServiceTypeException and skipping the permission throws SecurityException at runtime; both surface as crashes during call setup. Wire CI to fail any build missing these.
How much does a TURN server cost and can I avoid one?
TURN bandwidth is typically the most expensive line item per call — it’s relayed via your server, not direct peer-to-peer. For consumer apps assume 10–30% of calls need TURN; for enterprise / regulated networks closer to 50%. You can’t avoid TURN entirely without dropping the calls behind symmetric NAT. Optimise by preferring direct candidates (mDNS, IPv6) when available.
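A back-of-the-envelope estimate helps when budgeting the relay. The sketch below assumes 1:1 calls where both directions pass through the TURN server at the same average bitrate; `TurnBudget` and its parameters are hypothetical, and how a provider bills ingress versus egress is outside its scope.

```java
/** Hypothetical estimator for monthly TURN relay traffic. */
final class TurnBudget {

    /**
     * @param callMinutes       total call minutes per month
     * @param relayShare        fraction of calls that need TURN, e.g. 0.2 for 20%
     * @param kbpsPerDirection  average media bitrate each peer sends, in kbit/s
     * @return gigabytes (10^9 bytes) forwarded by the relay per month
     */
    static double relayedGigabytes(long callMinutes, double relayShare, double kbpsPerDirection) {
        double seconds = callMinutes * 60.0;
        // Both directions of a 1:1 call are forwarded by the relay.
        double bitsPerSecond = kbpsPerDirection * 1000.0 * 2.0;
        double bytes = seconds * bitsPerSecond / 8.0 * relayShare;
        return bytes / 1e9;
    }
}
```

For example, 100,000 call minutes a month with 20% relayed at 1,000 kbit/s per direction works out to about 300 GB of relayed traffic; at a 50% relay share the same traffic is about 750 GB.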
How do I handle a 4G ↔ Wi-Fi network change mid-call?
Listen on ConnectivityManager.NetworkCallback. On any change, gather fresh ICE candidates and trigger an ICE restart on the PeerConnection (renegotiation with new candidates). Clients should keep the call alive for at least 5–8 seconds across the transition; if no new media arrives by then, drop and reconnect cleanly with a user-visible toast.
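The grace-window logic can be kept separate from the WebRTC calls themselves. In the sketch below, `ReconnectSupervisor` is a hypothetical class that the network callback, the ICE connection-state listener and a periodic timer would feed; timestamps are passed in so it can be unit-tested. Treating a second network change as "restart again but keep the original deadline" is a design assumption that stops a flapping network from holding a dead call open indefinitely.

```java
/** Hypothetical supervisor for the keep-alive window around an ICE restart. */
final class ReconnectSupervisor {
    enum State { CONNECTED, RESTARTING, DROPPED }

    private final long graceMillis;
    private State state = State.CONNECTED;
    private long restartStartedAt;

    ReconnectSupervisor(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    /** Returns true if the caller should trigger an ICE restart now. */
    boolean onNetworkChanged(long nowMillis) {
        if (state == State.DROPPED) return false;
        if (state == State.CONNECTED) {
            restartStartedAt = nowMillis; // the deadline starts at the first change only
            state = State.RESTARTING;
        }
        return true;
    }

    /** Call when ICE reports a working connection again. */
    void onIceConnected() {
        if (state == State.RESTARTING) state = State.CONNECTED;
    }

    /** Call periodically; returns DROPPED once the grace window has expired. */
    State onTick(long nowMillis) {
        if (state == State.RESTARTING && nowMillis - restartStartedAt >= graceMillis) {
            state = State.DROPPED;
        }
        return state;
    }
}
```

With a grace window of 8,000 ms, a call that reconnects after 5 s stays up, while one that is still restarting at 8 s is reported as dropped so the UI can show the toast and reconnect cleanly.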
Do I need end-to-end encryption (E2EE)?
DTLS-SRTP encrypts media in transit between clients and the SFU, but the SFU itself can decrypt it. True end-to-end encryption (where even the server can’t decrypt) needs a second layer of frame encryption: WebRTC Insertable Streams in browsers, or the frame encryptor/decryptor hooks in the native Android library. It’s required for regulated or sensitive workloads (legal, healthcare, defence) and a strong differentiator for consumer privacy positioning. Plan for 4–6 weeks of additional engineering.
Can I support Android Auto and Wear OS for calls?
Yes, but only with ConnectionService integration as the foundation. Android Auto routes calls through Telecom; Wear OS apps use the standard Telecom + companion patterns. Plan extra engineering for both surfaces. For most apps, ship Android phone first, expand to Auto/Wear in a second wave.
What to Read Next
WebRTC
WebRTC in Android — the explainer
The foundation under every calling app, in plain language.
Screen sharing
Android WebRTC screen sharing — the implementation guide
Once your calls work, screen share is the next feature buyers ask for.
SDK choice
Agora alternatives in 2026
LiveKit, mediasoup, Jitsi and Janus compared honestly.
Voice command
Voice command tools for virtual meetings 2026
Layer voice command on top of your calling app for a competitive edge.
Performance
10 ways to optimise Android apps for smooth video
Performance patterns that translate directly to video calling.
Ready to ship an Android calling app that doesn’t miss calls?
An Android calling app in 2026 is a four-plane system — signalling, media, push, foreground service — engineered against OEM background killers, foreground-service rules, and Bluetooth audio routing that breaks in five different ways. Get the twelve features right, ship telemetry from day one, and pick the SDK that matches your team rather than the one that won the marketing page.
Most calling apps that fail in production fail on Android-specific surfaces, not on WebRTC itself. The good news: every one of those surfaces is solvable, and the playbook above is the one we’d follow on a fresh project tomorrow.
Let’s ship an Android calling app that survives Android 14+
30 minutes with a Fora Soft mobile architect — bring your product, leave with a stack picked, foreground-service plan and ship timeline.