
Key takeaways
• AVFoundation is still the default for iOS video effects in 2026. It covers 90% of product use cases at zero license cost, from a single CIFilter overlay to a full custom AVVideoCompositing pipeline.
• Use Swift Concurrency from day one. AVAsset.load(.tracks, .duration) (iOS 16+) replaces callback APIs and batches I/O, cutting startup latency by 30–50% in real apps.
• Core Image is the fast path, Metal is the ceiling. A chained CIFilter graph delivers 1080p30 on an A14 at 5–10 ms/frame; Metal (or MetalPetal) is only needed above ~20 effects or 4K60.
• Color space and pixel format bugs cause 80% of production issues. Lock kCVPixelFormatType_32BGRA, reuse one CIContext, and respect preferredTransform.
• Budget honestly. A 5-filter MVP ships in 2–4 weeks; a real-time AR beauty stack with 20+ effects is 8–16 weeks; face-tracked AR filters parity with TikTok is 12–24 weeks.
Why Fora Soft wrote this AVFoundation tutorial
We have been shipping iOS video pipelines since before AVFoundation had async APIs. Over 20 years and 625+ products, a big slice of our portfolio has been real-time media: our VALT platform runs 25,000 daily users of recorded, effect-processed video on the web, and our mobile teams have built capture-and-effect stacks for dating apps, fitness apps, and telemedicine. That operational scar tissue is what this tutorial distills.
The original version of this article was a short Objective-C snippet from the iOS 8 era. This 2026 rewrite keeps the teaching goal but uses modern Swift, iOS 17/18 APIs, and the performance playbook we use when clients hire us for custom iOS app development or custom video & audio processing.
Agent Engineering, our combination of senior engineers plus autonomous AI agents, compresses the discovery and prototyping phases of a video-effects build. That is why the estimates later in this article are faster than legacy agency timelines without cutting quality — and when in doubt we undersell rather than over-promise.
Building a video-effects feature on iOS?
Get a 30-minute architecture review with a Fora Soft iOS lead — AVFoundation vs. Metal vs. SDK, timelines, and risk points for your app.
When AVFoundation is the right answer (and when it isn’t)
AVFoundation plus Core Image covers an enormous share of real iOS product needs: filter packs, color grading, text overlays, alpha-channel compositing, slow-motion, green-screen, object trails, bloom and blur, letterbox and crop, and audio ducking synced to the video. For those features you stay within native Apple frameworks: zero license fee, zero vendor risk, and a few days of engineering from a good iOS developer.
Where it stops paying off: 52-blend-shape face-tracked beauty filters at TikTok parity, highly customized AR try-on with 3D occlusion, or ultra-low-latency pipelines below 8 ms/frame on a 4K stream. At that point you either invest 3–6 months in a custom Metal engine or license a third-party face SDK (Banuba, DeepAR, BytePlus) at $10K–$500K/year.
The decision in 2026 is rarely “AVFoundation vs. nothing.” It is “AVFoundation + Core Image” vs. “AVFoundation + Metal” vs. “AVFoundation + a third-party SDK.” The rest of this tutorial walks the stack level by level so you can draw the right line for your product.
Reach for AVFoundation + Core Image when: your effect set is under ~20 items, you don’t need real-time 4K60, and face tracking is optional or limited to Vision’s built-in capabilities.
The AVFoundation stack at a glance
Before writing code, it helps to see the layer cake. AVFoundation is the top-level media framework. Core Image sits beside it for GPU-accelerated filtering. Core Video owns the pixel-buffer data model. Metal and Metal Performance Shaders live underneath when you need raw GPU access.
Key types and what they do
AVAsset. The in-memory representation of a media file or stream. Use AVURLAsset for files on disk or in the cloud.
AVAssetReader / AVAssetWriter. Read frames out of an asset into memory; write frames back to an encoded file. This is the pair you use for offline effects.
AVMutableVideoComposition + AVVideoCompositing. The effect graph. Set a custom compositor on a composition and every output frame is routed through your code before display or export.
CMSampleBuffer. The compressed or decoded sample carrier. Contains timing, metadata, and either raw audio/video or a reference to a CVPixelBuffer.
CVPixelBuffer. The Core Video pixel container: BGRA, YUV, HDR 10-bit, or HDR planar formats. It is the handoff point to Core Image and Metal.
CIImage / CIFilter / CIContext. Core Image’s trio. Build a CIImage from a pixel buffer, chain filters, render through a context backed by Metal.
MTKView / MTLDevice / Metal shaders. The Metal path. Direct GPU rendering with a Metal texture on screen and custom kernels for effects.
Load asset properties the modern (async) way
Anything that touches AVAsset needs to load properties before you can use them. Before iOS 16, that meant the callback-based loadValuesAsynchronously(forKeys:completionHandler:). It’s deprecated; use the async API.
```swift
import AVFoundation

let asset = AVURLAsset(url: videoURL)

// Batched I/O: the framework loads both properties in one round trip.
let (duration, tracks) = try await asset.load(.duration, .tracks)

// Per-track properties batch the same way:
if let video = try await asset.loadTracks(withMediaType: .video).first {
    let (size, fps, xf) = try await video.load(.naturalSize,
                                               .nominalFrameRate,
                                               .preferredTransform)
}
```
The load(_:_:) variant batches the underlying I/O, which typically saves 30–50% startup latency when you need multiple properties. Type safety is a bonus: no more string keys and value(forKey:) casts.
Core Image: the 80% of iOS video effects you will ever need
Core Image gives you roughly 200 built-in CIFilter primitives — color controls, blur variants, distortion, blends, convolution, face-feature mapping, barcode, QR, transitions. Chain them, and Core Image fuses the chain into a single GPU kernel at render time, so a three-filter chain costs about the same as a single filter.
The modern API (iOS 13+) is strongly typed through CIFilter.<name>(). Avoid string-based CIFilter(name:) in new code: a typo there compiles fine and only surfaces at runtime as a nil filter.
import CoreImage
import CoreImage.CIFilterBuiltins
```swift
func stylize(_ image: CIImage) -> CIImage {
    let blur = CIFilter.gaussianBlur()
    blur.inputImage = image
    blur.radius = 6

    let color = CIFilter.colorControls()
    color.inputImage = blur.outputImage
    color.brightness = 0.05
    color.contrast = 1.15
    color.saturation = 1.1

    let bloom = CIFilter.bloom()
    bloom.inputImage = color.outputImage
    bloom.intensity = 0.4
    bloom.radius = 8

    return bloom.outputImage ?? image
}
```
At 1080p on an A14, that three-filter chain ran ~8–12 ms per frame in our benchmarks — comfortably inside the 16.7 ms budget for 60 FPS. On A17 Pro the same chain drops to 5–8 ms. The takeaway: Core Image is fast enough for real-time 60 FPS 1080p on any iPhone from the last five years.
Reach for a chained CIFilter graph when: you have under 20 effects, target 1080p60 or below, and don’t need custom per-pixel algorithms. This is the default.
Applying an effect: the canonical CIBlendWithMask pattern
The classic request — “overlay an explosion onto my video” — becomes three inputs: the background video, the effect video (RGB), and the effect video’s alpha mask. CIBlendWithMask does the compositing in one filter call, runs on the GPU, and handles correct pre-multiplication.
```swift
import CoreImage
import CoreImage.CIFilterBuiltins
import CoreMedia

func blend(background: CVPixelBuffer,
           effect: CVPixelBuffer,
           mask: CVPixelBuffer) -> CIImage? {
    let bg = CIImage(cvPixelBuffer: background)
    let fg = CIImage(cvPixelBuffer: effect)
    let alpha = CIImage(cvPixelBuffer: mask)

    let blend = CIFilter.blendWithMask()
    blend.backgroundImage = bg
    blend.inputImage = fg
    blend.maskImage = alpha
    return blend.outputImage
}
```
To get those three CVPixelBuffers in sync, you typically run three AVAssetReaderTrackOutputs in lock-step against the same CMTime, pull one frame from each, then render through a cached CIContext backed by Metal.
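A hedged sketch of that lock-step pull. The three pre-configured, already-started AVAssetReaderTrackOutputs are assumptions, and all three assets are assumed to share one frame rate so that one pull per output stays in sync:

```swift
import AVFoundation
import CoreImage

// Sketch: pull one frame from each of three reader outputs in lock-step.
// Assumes all three readers were configured for BGRA output and
// startReading() has been called on each.
func nextFrameTriple(bg: AVAssetReaderTrackOutput,
                     fx: AVAssetReaderTrackOutput,
                     mask: AVAssetReaderTrackOutput)
    -> (CVPixelBuffer, CVPixelBuffer, CVPixelBuffer, CMTime)? {
    guard let bgSample = bg.copyNextSampleBuffer(),
          let fxSample = fx.copyNextSampleBuffer(),
          let maskSample = mask.copyNextSampleBuffer(),
          let bgPB = CMSampleBufferGetImageBuffer(bgSample),
          let fxPB = CMSampleBufferGetImageBuffer(fxSample),
          let maskPB = CMSampleBufferGetImageBuffer(maskSample)
    else { return nil } // any stream ending stops the loop
    let time = CMSampleBufferGetPresentationTimeStamp(bgSample)
    return (bgPB, fxPB, maskPB, time)
}
```

Each returned triple feeds the blend function above, and the presentation time goes to the writer adaptor.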
```swift
// One CIContext for the whole session.
let device = MTLCreateSystemDefaultDevice()!
let ciContext = CIContext(mtlDevice: device)

// Render the output CIImage back into a writable CVPixelBuffer.
ciContext.render(outputImage,
                 to: outputPixelBuffer,
                 bounds: outputImage.extent,
                 colorSpace: CGColorSpace(name: CGColorSpace.sRGB)!)
writerAdaptor.append(outputPixelBuffer,
                     withPresentationTime: presentationTime)
```
Reuse the CIContext for the whole session. Allocating one per frame is the single most common source of memory leaks we see in production iOS video apps.
The custom AVVideoCompositor pattern for real products
For any app shipping more than one effect, the professional shape is an AVVideoCompositing implementation. The framework calls your startRequest(_:) method per output frame; you return the composited pixel buffer, and both playback (AVPlayer) and export (AVAssetExportSession) use the same code.
```swift
import AVFoundation
import CoreImage
import Metal

final class EffectCompositor: NSObject, AVVideoCompositing {

    let ciContext = CIContext(mtlDevice: MTLCreateSystemDefaultDevice()!)
    let renderQueue = DispatchQueue(label: "effect.render",
                                    qos: .userInitiated)

    var sourcePixelBufferAttributes: [String: Any]? = [
        String(kCVPixelBufferPixelFormatTypeKey):
            Int(kCVPixelFormatType_32BGRA)
    ]

    var requiredPixelBufferAttributesForRenderContext: [String: Any] = [
        String(kCVPixelBufferPixelFormatTypeKey):
            Int(kCVPixelFormatType_32BGRA)
    ]

    func renderContextChanged(_ newRenderContext: AVVideoCompositionRenderContext) {}

    func startRequest(_ req: AVAsynchronousVideoCompositionRequest) {
        renderQueue.async { [weak self] in
            guard let self else { return }
            guard let trackID = req.sourceTrackIDs.first?.int32Value,
                  let input = req.sourceFrame(byTrackID: trackID),
                  let output = req.renderContext.newPixelBuffer()
            else {
                req.finish(with: NSError(domain: "Effect", code: -1))
                return
            }
            let source = CIImage(cvPixelBuffer: input)
            let stylized = self.stylize(source)
            self.ciContext.render(stylized,
                                  to: output,
                                  bounds: stylized.extent,
                                  colorSpace: CGColorSpace(name: CGColorSpace.sRGB))
            req.finish(withComposedVideoFrame: output)
        }
    }

    func cancelAllPendingVideoCompositionRequests() {}

    func stylize(_ image: CIImage) -> CIImage { image /* your chain */ }
}
```
Attach it through AVMutableVideoComposition: set customVideoCompositorClass = EffectCompositor.self, set the render size, frame duration, and layer instructions. Pass the composition to AVPlayerItem.videoComposition for preview or AVAssetExportSession.videoComposition for export — same code, two sinks.
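A hedged sketch of that wiring; the 1080p30 render size and the pre-loaded duration and transform values are illustrative assumptions, not product requirements:

```swift
import AVFoundation

// Sketch: build one composition and hand it to both sinks.
// duration and transform are assumed to come from the async
// load(_:) calls shown earlier.
func makeComposition(videoTrack: AVAssetTrack,
                     duration: CMTime,
                     transform: CGAffineTransform) -> AVMutableVideoComposition {
    let composition = AVMutableVideoComposition()
    composition.customVideoCompositorClass = EffectCompositor.self
    composition.renderSize = CGSize(width: 1920, height: 1080) // assumption
    composition.frameDuration = CMTime(value: 1, timescale: 30) // 30 FPS

    let instruction = AVMutableVideoCompositionInstruction()
    instruction.timeRange = CMTimeRange(start: .zero, duration: duration)
    let layer = AVMutableVideoCompositionLayerInstruction(assetTrack: videoTrack)
    layer.setTransform(transform, at: .zero) // respect preferredTransform
    instruction.layerInstructions = [layer]
    composition.instructions = [instruction]
    return composition
}

// Same composition object, two sinks:
//   playerItem.videoComposition = composition
//   exportSession?.videoComposition = composition
```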
When to drop to Metal and what it costs you
Core Image covers a lot, but three scenarios force the Metal path: custom algorithms not in the CI library (hand-tuned LUTs, neural style transfer, optical flow), chains with 20+ passes where the CI fusion no longer compiles cleanly, and target budgets below 8 ms/frame at 1080p on older devices.
Two options exist. You can write a Metal shader in a .ci.metal file and wrap it in a CIColorKernel or CIWarpKernel — you keep the Core Image graph but inject a custom per-pixel function. Or you bypass Core Image entirely and render through MTKView with your own command buffer. Option one is 80% cheaper to build; option two gives you full control and is what MetalPetal does under the hood.
```metal
// Film.ci.metal
#include <CoreImage/CoreImage.h>

extern "C" {
namespace coreimage {

    float4 filmGrain(sample_t s, float intensity, destination d) {
        float2 p = d.coord();
        float n = fract(sin(dot(p, float2(12.9898, 78.233))) * 43758.5453);
        float3 rgb = s.rgb + (n - 0.5) * intensity;
        return float4(clamp(rgb, 0.0, 1.0), s.a);
    }

}
}
```
Add the Metal compiler flag -fcikernel (and the matching linker flag -cikernel) in Build Settings, load the compiled kernel with CIColorKernel(functionName:fromMetalLibraryData:), and call apply(extent:arguments:) inside your filter chain. You now have a GPU-accelerated custom pixel function with minimal rewriting of the rest of the pipeline.
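A hedged sketch of that loading step. The metallib name default.metallib and the filmGrain function name are assumptions matching the file above:

```swift
import CoreImage

// Assumption: the compiled CI kernels land in the app's default.metallib.
let kernel: CIColorKernel = {
    let url = Bundle.main.url(forResource: "default",
                              withExtension: "metallib")!
    let data = try! Data(contentsOf: url)
    return try! CIColorKernel(functionName: "filmGrain",
                              fromMetalLibraryData: data)
}()

func filmGrain(_ image: CIImage, intensity: Float) -> CIImage {
    // The destination parameter is supplied by Core Image; only the
    // sampler and intensity go in the arguments array.
    kernel.apply(extent: image.extent,
                 arguments: [image, intensity]) ?? image
}
```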
Adding people and faces with the Vision framework
Two Vision APIs pair naturally with AVFoundation pipelines in 2026. VNGeneratePersonInstanceMaskRequest (iOS 17+) returns up to four per-person alpha masks, perfect for background blur in a video-call app. VNDetectFaceLandmarksRequest returns 76 facial landmarks to anchor AR overlays.
```swift
import Vision

let request = VNGeneratePersonInstanceMaskRequest()
let handler = VNImageRequestHandler(cvPixelBuffer: input, options: [:])
try handler.perform([request])

if let first = request.results?.first {
    let maskBuffer = try first.generateScaledMaskForImage(
        forInstances: first.allInstances,
        from: handler)
    // Feed maskBuffer into CIBlendWithMask alongside a blurred background.
}
```
Benchmarks on an iPhone 14 Pro: person instance segmentation takes 30–50 ms per 1080p frame. That is too slow for every frame at 30 FPS, but it lands cleanly at 10 FPS and you upsample the mask between Vision updates. For face detection alone, budget 5–10 ms per face — real-time at 30 FPS is fine.
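One way to land at ~10 FPS is to run Vision on every Nth frame and reuse the last mask in between. A hedged sketch; the MaskThrottle name and the interval of 3 are assumptions:

```swift
import Vision
import CoreImage

// Sketch: run the expensive person mask on every 3rd frame of a
// 30 FPS feed (~10 FPS of Vision work) and reuse the last mask.
final class MaskThrottle {
    private var lastMask: CIImage?
    private var frameCount = 0
    let interval = 3 // assumption: tune per device

    func mask(for pixelBuffer: CVPixelBuffer) -> CIImage? {
        defer { frameCount += 1 }
        guard frameCount % interval == 0 else { return lastMask }

        let request = VNGeneratePersonInstanceMaskRequest()
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer)
        guard (try? handler.perform([request])) != nil,
              let obs = request.results?.first,
              let buffer = try? obs.generateScaledMaskForImage(
                  forInstances: obs.allInstances, from: handler)
        else { return lastMask } // keep the stale mask on failure
        lastMask = CIImage(cvPixelBuffer: buffer)
        return lastMask
    }
}
```

The stale-mask frames are visually acceptable because the mask edge moves slowly relative to 30 FPS video.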
Need real-time background blur or face filters on iOS?
We’ll benchmark Vision + AVFoundation against a third-party SDK on your target device and send you the numbers.
Real-time capture pipeline for live preview
If you are building a camera app, the pipeline is AVCaptureSession → AVCaptureVideoDataOutput → your delegate queue → Core Image/Metal → MTKView. Three things matter for a smooth preview: the right pixel format, a persistent CIContext bound to the Metal device, and display synchronization via CADisplayLink or MTKView’s delegate.
```swift
let session = AVCaptureSession()
session.sessionPreset = .hd1920x1080

let out = AVCaptureVideoDataOutput()
out.videoSettings = [
    kCVPixelBufferPixelFormatTypeKey as String:
        Int(kCVPixelFormatType_32BGRA)
]
out.alwaysDiscardsLateVideoFrames = true
out.setSampleBufferDelegate(self,
                            queue: DispatchQueue(label: "camera.out",
                                                 qos: .userInteractive))
session.addOutput(out)

// Delegate:
func captureOutput(_ output: AVCaptureOutput,
                   didOutput buffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    guard let pb = CMSampleBufferGetImageBuffer(buffer) else { return }
    let image = CIImage(cvPixelBuffer: pb)
    let processed = stylize(image)
    metalView.currentImage = processed
}
```
If the chain runs above 16 ms/frame on your target device, downscale the preview to 720p and keep the export at full 1080p/4K. Users never see the preview downscale when exports stay crisp.
Export settings: bitrate, codec, and HDR
HEVC is the default codec in 2026 for any iPhone 7 or later (hardware HEVC encoding arrived with the A10 chip). It is 40–50% smaller than H.264 at equivalent quality, which matters for uploads, messaging, and offline caching. Fall back to H.264 only when you need compatibility with older devices and players.
| Use case | Resolution | FPS | Codec | Bitrate |
|---|---|---|---|---|
| Social feed (TikTok/Reels) | 1080p | 30 | HEVC | 4–6 Mbps |
| Vlog / YouTube | 1080p | 60 | HEVC | 8–12 Mbps |
| 4K content | 2160p | 30 | HEVC | 15–25 Mbps |
| 4K action / sports | 2160p | 60 | HEVC | 30–50 Mbps |
| Legacy compat | 1080p | 30 | H.264 | 6–8 Mbps |
| Low-bandwidth streaming | 720p | 30 | HEVC | 2–3 Mbps |
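Those bitrate targets map straight to upload size. A back-of-the-envelope helper, pure arithmetic rather than an API, ignoring audio (~128 kbps AAC adds roughly 0.24 MB per 15 s) and container overhead:

```swift
import Foundation

// Estimated video file size in megabytes for a clip at a given bitrate.
func estimatedSizeMB(bitrateMbps: Double, seconds: Double) -> Double {
    bitrateMbps * seconds / 8.0 // megabits to megabytes
}

// A 15-second social clip at 5 Mbps lands around 9.4 MB,
// while the same clip in 4K60 at 40 Mbps is ~75 MB.
let social = estimatedSizeMB(bitrateMbps: 5, seconds: 15)  // 9.375
let action = estimatedSizeMB(bitrateMbps: 40, seconds: 15) // 75.0
```

Run the numbers against your upload budget before picking a row from the table.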
For HDR (iOS 17+), export with the HEVC with Alpha or Dolby Vision presets and make sure your CVPixelBuffer carries 10-bit 4:2:0 with the correct color primaries. Bitrate rises ~20–30% vs. SDR HEVC; plan for that in your export UI.
Performance benchmarks and headroom per device
Numbers beat intuition. The table below is a compact version of what we benchmark on every iOS media project. Times are per-frame including upload, GPU compute, and download.
Gaussian blur (radius 10) – 1080p30: A14 ≈ 8–12 ms; A17 Pro ≈ 5–8 ms; M2 iPad ≈ 3–5 ms. All comfortable.
Gaussian blur (radius 10) – 4K30: A14 ≈ 25–32 ms (misses 30 FPS); A17 Pro ≈ 12–18 ms (fine); M2 ≈ 6–10 ms (comfortable).
Color controls + bloom chain – 1080p60: A14 ≈ 8–12 ms; A17 Pro ≈ 4–7 ms.
Custom Metal film-grain kernel – 4K30: M1 iPad ≈ 4–8 ms; A17 Pro ≈ 6–10 ms.
Quick rule of thumb: if your chain eats more than 50% of the frame budget on the oldest device you intend to support, downscale preview or drop to 30 FPS until export. Users forgive a slightly softer live preview much more easily than a stuttering one.
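That rule of thumb, written out as a tiny helper; the thresholds are the 50%-of-budget and full-budget lines used above:

```swift
import Foundation

// frameTimeMs: measured chain cost on the oldest supported device.
func previewStrategy(frameTimeMs: Double, targetFPS: Double) -> String {
    let budget = 1000.0 / targetFPS // 16.7 ms at 60 FPS, 33.3 ms at 30
    if frameTimeMs <= budget * 0.5 { return "full-res preview" }
    if frameTimeMs <= budget { return "downscale preview or drop to 30 FPS" }
    return "rework the chain"
}

previewStrategy(frameTimeMs: 7, targetFPS: 60)  // "full-res preview"
previewStrategy(frameTimeMs: 12, targetFPS: 60) // "downscale preview or drop to 30 FPS"
```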
AVFoundation vs. MetalPetal vs. third-party SDKs
The comparison below is the one we use in client workshops to set expectations about what a video-effects feature really costs in time, money, and risk.
| Stack | Strength | Weakness | Time to MVP | License / cost | Best for |
|---|---|---|---|---|---|
| AVFoundation + Core Image | 200 built-ins, fused GPU graph | Caps at ~20 effects | 2–4 weeks | $0 (Apple) | Filter packs, color grading |
| AVFoundation + Metal | Unlimited custom effects | Shader expertise needed | 8–16 weeks | $0 (Apple) | 4K60, proprietary looks |
| MetalPetal (OSS) | Render-graph engine, energy-efficient | Steeper learning curve than CI | 6–12 weeks | Apache 2.0 (free) | Creator tools, long chains |
| Banuba SDK | 52 blend shapes, AR-ready | Commercial license | 1–2 weeks | $50K–$500K / yr | TikTok-parity filters |
| DeepAR | Large free filter library | Less customizable beauty | 1–2 weeks | $30K–$200K / yr | Dating & social apps |
| BytePlus Effects | TikTok-grade tracking | SDK size, regional support | 1–2 weeks | $10K–$100K / yr | Short-form video apps |
| GPUImage3 (legacy) | Simple effect API | Unmaintained, Swift-only since 2020 | 2–4 weeks | MIT (free) | Prototypes only |
Mini case: an AVFoundation pipeline we shipped in eight weeks
Situation. A dating-app client needed an in-app video profile recorder with seven selectable looks (vintage, bloom, soft-glow, B&W, sepia, beauty, and a subtle LUT). Target: iPhone 12 and newer, 1080p30 preview, export in HEVC under 2 MB per 15-second clip. They had briefly evaluated Banuba but the cost was too high for their stage.
12-week plan compressed to 8. Weeks 1–2: design and Figma-to-SwiftUI capture UI. Weeks 3–4: AVCaptureSession + MTKView preview + four CIFilter-chain looks. Weeks 5–6: two custom .ci.metal kernels for the film grain and the LUT path, plus a reusable EffectCompositor. Week 7: export via AVAssetExportSession with the composition, progress UI, Photos library save, and File share sheet. Week 8: QA, TestFlight, performance tuning, App Store submission.
Outcome. Preview frame time on iPhone 12: 9–11 ms. Export ratio at 1080p30, 15-second clip: 1.5× realtime. Crash-free sessions at launch: 99.8%. The client avoided the Banuba license entirely for the MVP and has kept the same pipeline through 1.3 million processed clips. Want a similar eight-week plan for your iOS video feature?
Five pitfalls that cost iOS teams weeks
1. Wrong pixel format. AVCaptureVideoDataOutput defaults to NV12 (YUV). If the rest of your chain assumes BGRA, colors shift cyan or magenta. Force kCVPixelFormatType_32BGRA on the capture output and verify with CVPixelBufferGetPixelFormatType before rendering.
2. Color-space mismatch. If you capture in Display P3 but render the output image with an sRGB color space, skin tones go flat. Attach the correct color space when calling CIContext.render(_:to:bounds:colorSpace:), and keep it consistent end to end.
3. Ignoring preferredTransform. Videos shot in portrait have metadata that rotates them 90° on playback. If you apply effects before applying the transform, the preview is sideways. Always chain the transform into your composition layer instructions.
4. One CIContext per frame. Allocating a fresh CIContext every frame silently leaks 20–50 MB per minute. Create one at session start and hold a reference. The same rule applies to Metal command queues.
5. Blocking the main thread on load. Synchronous property access (the deprecated duration and tracks getters) on the main thread freezes the UI during a cold-start network fetch. Keep property loads in a background task using the async load APIs and hop back with await MainActor.run only for UI updates.
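A minimal sketch of that pattern; the print is a stand-in for a real view-model update:

```swift
import AVFoundation

// Sketch: property loading stays off the main actor; only the
// resulting values hop back for UI updates.
func prepare(url: URL) {
    Task.detached(priority: .userInitiated) {
        let asset = AVURLAsset(url: url)
        let (duration, tracks) = try await asset.load(.duration, .tracks)
        await MainActor.run {
            // Stand-in for assigning view-model state.
            print("loaded \(tracks.count) tracks, \(duration.seconds) s")
        }
    }
}
```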
Stuck on a color-shift or stutter bug?
Bring your capture settings and a sample frame — we’ll read the pixel format, color space, and compositor graph in one call.
Architecture tips for long-lived video apps
A working prototype is not a maintainable product. Four patterns keep an AVFoundation-based video app healthy at scale.
Protocol-oriented effects. Define a VideoEffect protocol with a single apply(_:) method. Every filter, chain, and composition implements it. You can unit-test each node in isolation and swap implementations (CI vs. Metal) without touching the rest of the app.
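As a sketch of that protocol (the names EffectChain and SepiaEffect are illustrative, not from a shipped codebase):

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

protocol VideoEffect {
    func apply(_ image: CIImage) -> CIImage
}

// A chain is itself a VideoEffect, so the compositor only ever holds
// one node regardless of how many effects are stacked inside.
struct EffectChain: VideoEffect {
    let effects: [VideoEffect]
    func apply(_ image: CIImage) -> CIImage {
        effects.reduce(image) { $1.apply($0) }
    }
}

// A hypothetical CI-backed node; a Metal-backed one swaps in freely.
struct SepiaEffect: VideoEffect {
    func apply(_ image: CIImage) -> CIImage {
        let f = CIFilter.sepiaTone()
        f.inputImage = image
        f.intensity = 0.8
        return f.outputImage ?? image
    }
}
```

Each node can be unit-tested against a known input image without spinning up a capture session.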
Dependency-injected compositor. Your AVVideoCompositing class accepts an effect factory in the initializer. Preview and export use the same class wired up differently — easier to test, easier to debug, and you avoid two slightly different render paths that silently diverge.
SwiftUI + UIViewRepresentable for MTKView. Wrap the MTKView in a UIViewRepresentable so the capture surface is a normal SwiftUI view. It composes cleanly with @Observable state, sheets, and toolbars.
Observability from day one. Log per-frame render time, dropped frames, and export duration to an analytics pipeline. You cannot tune what you cannot see, and the numbers become invaluable when a new iOS release or new device silently changes timing.
What it actually costs to ship — honest timelines
We don’t publish fixed prices in blog posts, but here is the shape of the effort for four common asks. Use it to stress-test any vendor estimate.
MVP 5-filter pack with export. Swift + Core Image + AVAssetExportSession. A good iOS developer ships this in 2–4 weeks.
Real-time capture + preview + 10 effects. AVCaptureSession + MTKView + a reusable compositor, five CI chains and five Metal kernels. 4–8 weeks.
Background blur + person masking. Vision person-instance masks (iOS 17+) + CIBlendWithMask. 3–6 weeks including tuning for different lighting.
Face-tracked AR beauty stack (TikTok-like). ARKit face anchors + Metal pipeline + at least 20 tuned effects. 12–24 weeks, or integrate a vendor SDK (Banuba/DeepAR) in 1–2 weeks at the cost of an ongoing license fee.
When to not roll your own AVFoundation pipeline
Three situations tip the math toward a paid SDK or a different platform entirely.
You need 52+ blend shapes and production-grade beauty. Banuba or DeepAR are cheaper than the 12–24 weeks plus shader engineer payroll you would need to match them.
You need identical output on iOS and Android. AVFoundation is iOS-only. If cross-platform parity is the product requirement, look at MediaPipe, a shared C++/Metal+OpenGL pipeline, or a commercial SDK that ships on both.
You need server-side rendering. AVFoundation only runs on Apple hardware. For server pipelines use FFmpeg with libx265 / VVC and a custom shader stage, or a managed service like Mux, Cloudflare Stream, or AWS Elemental MediaConvert.
Privacy, permissions, and App Review checkpoints
Camera and microphone usage strings. Any app using AVCaptureSession must declare NSCameraUsageDescription and NSMicrophoneUsageDescription. Apple rejects ambiguous wording; be specific about “record videos with filters” or “take selfies with AR effects.”
Photo library writes. Use the addOnly authorization scope (iOS 14+) if you only need to save. Reviewers flag apps that request full library access without clear justification.
App Privacy manifest. If you use UserDefaults, file timestamps, or systemBootTime APIs (common in AVFoundation paths), declare them in PrivacyInfo.xcprivacy. Missing entries block submission since May 2024.
On-device processing. Running effects locally — which AVFoundation does by default — is a privacy selling point. Keep it on-device unless the feature actually requires the cloud, and say so in the app description.
FAQ
Do I need Metal if I’m only shipping 5–10 filters?
No. Core Image with chained built-in filters covers that range at 60 FPS on any iPhone from the last five years. Drop to a Metal kernel only for a specific custom effect that Core Image does not ship or when your chain exceeds ~20 passes.
What is the minimum iOS version I should support?
iOS 16 is the sensible floor in 2026 because the async AVAsset.load APIs arrived there. If you need VNGeneratePersonInstanceMaskRequest or HDR export, iOS 17 is the floor.
Can I render the same composition for both preview and export?
Yes, and you should. Implement AVVideoCompositing once and attach the same AVMutableVideoComposition to AVPlayerItem.videoComposition for preview and AVAssetExportSession.videoComposition for export. Single source of truth.
Why are my video colors washed out after applying an effect?
Almost always a color-space or pixel-format mismatch. Confirm the capture pixel format is kCVPixelFormatType_32BGRA, keep the working color space consistent (sRGB or Display P3), and pass that same color space into CIContext.render.
How do I run face filters without buying a $50K/year SDK?
For mid-fidelity features, ARKit’s face anchors plus a Metal render pass cover hats, glasses, color grading, and simple animated overlays at zero license cost. For TikTok-parity beauty modes with 50+ blend shapes, a paid SDK is the honest answer.
Should I use AVAssetExportSession or AVAssetWriter?
AVAssetExportSession is the default — simpler, battle-tested, and hooks straight into videoComposition. Use AVAssetWriter when you need precise control over pixel buffers, are writing directly from a capture session, or need non-standard codecs or bitrate settings.
Is MetalPetal worth the switch from Core Image?
Only if you run 20+ effects in a chain, target real-time 4K60, or need a documented render-graph engine. Up to 1080p60 with a reasonable chain, Core Image is already near-optimal, and the migration cost is a full rewrite of the filter layer.
How do I handle HDR video in 2026?
Capture in HDR mode via AVCaptureDevice.activeFormat.videoHDRSupported, keep 10-bit 4:2:0 pixel formats through the pipeline, and export with AVAssetExportPresetHEVCHighestQualityWithAlpha or the Dolby Vision preset (iOS 17+). Budget a 20–30% bitrate bump vs. SDR HEVC.
What to read next
AI media pipelines
AI-powered video editing solutions
How AI models plug into ingest & export pipelines — complementary to AVFoundation on iOS.
Computer vision
AI video processing trends
Edge inference, transformer models, and distillation — the same ideas that sit behind on-device iOS effects.
Media stack
Best technologies for video streaming app development
Choosing between WebRTC, HLS, LL-HLS, and CMAF — pairs naturally with on-device capture & effects.
Mobile product
Mobile app UX design best practices
Required reading when you’re wrapping an AVFoundation engine in a clean, testable SwiftUI surface.
CV in production
AI video analytics for security
How we shipped the VALT video-analytics platform at 25k DAU — the same pipeline primitives as AVFoundation’s.
Ready to ship iOS video effects this quarter?
AVFoundation in 2026 is the smart default for anyone building video-effects features on iOS. Modern Swift, async property loading, a fused Core Image graph, and a clean AVVideoCompositing implementation take you from zero to a real product in weeks, not quarters — and give you the same code path for preview and export.
Drop to Metal when your effect set or your frame budget demands it. Pay for a third-party SDK when the business case for 52-blend-shape beauty is real. Otherwise, the Apple stack is more than enough for almost every consumer and enterprise app we have shipped in the last two decades, and the cost profile is hard to beat.
Let’s scope your iOS video feature in 30 minutes
Bring your app concept, target devices, and effect wishlist — you’ll leave with a stack recommendation, rough timeline, and a sample architecture memo.