This might be my first editorial-style post here. Fippo’s Is everyone switching to MoQ from WebRTC? started some threads on MoQ vs. WebRTC. I started to respond, but my responses quickly became too long, so I decided to go even deeper with a post here. Fippo’s post shows hard data that Media over QUIC isn’t here yet and isn’t about to instantly take off, but that doesn’t mean it won’t in the near future. HTTP/3, QUIC, and WebTransport are new Internet protocols with inherent technical advantages. So, will MoQ eventually replace WebRTC?
I have issues with the framing of that question. There are many different use cases for sending media over the Internet. MoQ is pitched as a universal solution, but I think that is worth some scrutiny – at least to help determine where it is most likely to emerge first.
This post is not going to go into deep technical details on MoQ. For background and the latest on MoQ, I recommend checking out a couple talks from RTC.on last month – Will Law’s A QUIC Update on MoQ and WebTransport and Ali Begen’s Streaming Bad: Breaking Latency with MoQ. The latter shows a bunch of demos you can try yourself here: https://moqtail.dev/demo/
Special thanks to Renan Dincer and Sergio Garcia Murillo for their feedback and review of this post.

Contents
- MoQ Replacement Plausibility by Use Case
- Conclusions
MoQ Replacement Plausibility by Use Case
Media over the Internet evolved from two different camps – video calling and video streaming. Video calling has largely coalesced around WebRTC. Video streaming involves a variety of technologies, but transmission to viewers is mostly based on HLS today. Both camps have alternatives – e.g. SIP for video calling, and DASH and RTSP for streaming – but I am going to focus mostly on WebRTC and HLS, since those are the predominant protocols up for replacement by MoQ.

The overlap between these domains is particularly interesting, but I will cover the major video calling and video streaming use cases first.
I am also going to give my bet for WebRTC vs. MoQ in each use case.
Video calling
1:1 (human:human)
One-to-one calls are very common. In his post, Fippo makes a strong case that:
probably 40-50% of the time WebRTC is talking to a server
Our many blackbox evaluations show that no major service makes use of “full mesh”, so this means 50 to 60% of all calls are 1:1.
If there is no need for recording or server-side analysis, then 1:1 is generally implemented as pure peer-to-peer (p2p). This is a lightweight architecture where only a signaling server is needed to rendezvous the two clients (and maybe a TURN server for unusual network situations). WebRTC was made as a p2p protocol at its core. 1:1 implementations are optimized to avoid server usage, which adds cost and privacy concerns.
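To make that architecture concrete, here is a minimal sketch of the p2p setup in the browser. The STUN/TURN URLs, credentials, and the `signaling` helper are placeholders I made up for illustration, not any particular service:

```typescript
// Minimal 1:1 p2p setup sketch: the only infrastructure involved is a STUN/TURN
// config and an out-of-band signaling channel to exchange the offer/answer.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.example.com:3478" },
    // TURN is only a relay of last resort for restrictive networks
    { urls: "turn:turn.example.com:3478", username: "user", credential: "secret" },
  ],
});

// Capture and attach local media; it flows directly to the other peer once ICE completes
const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
stream.getTracks().forEach((track) => pc.addTrack(track, stream));

// The offer/answer and ICE candidates travel over your own signaling channel
// (WebSocket, HTTP, etc.); the media itself never touches your servers.
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
signaling.send({ type: "offer", sdp: offer.sdp }); // hypothetical signaling helper
```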
MoQ is a client-server protocol, so I don’t see how MoQ will ever be ideal here unless MoQ vendors can make an argument that putting more infrastructure in between two users is better. I have worked on several projects/services that tried to make that case, but I have never seen the actual results justify the extra cost.
| My bet in 2030: | WebRTC will still be used for 1:1 calls. |
Voice AI (1:1 but one party is AI)
Voice AI is a relatively new use case for WebRTC where you send your audio and video to a real-time LLM and it responds back with a media stream. As Cloudflare wrote in late August:
WebSockets are perfect for server-to-server communication and work fine when you don’t need super-fast responses, making them great for testing and experimenting. However, if you’re building an app where users need real-time conversations with low delay, WebRTC is the better choice.
On the other hand, this use case is client-server rather than peer-to-peer. WebRTC has a comprehensive but relatively slow Interactive Connectivity Establishment (ICE) process for creating a connection between two peers that could each be behind a NAT or firewall. Voice AI systems sit on a public server, so much of this process is overkill.
Do real-time LLM makers care about better media?
I wonder how much of a real-time media problem there is to solve here since WebRTC doesn’t seem to be a real priority for any of the AI hyperscalers. OpenAI uses WebRTC largely to send media to and from browsers. But it is implemented as an add-on gateway to its WebSocket system – not a native part of the platform (see my interview with Sean DuBois at OpenAI for more on this).
OpenAI is actually the furthest ahead on WebRTC. Ironically, Google – the main force behind WebRTC – doesn’t appear to use WebRTC at all in Gemini. You won’t see anything if you run chrome://webrtc-internals on https://aistudio.google.com/, https://gemini.google.com/app, Gemini in Chrome, etc. and I haven’t seen evidence it is any different on native mobile.
WebRTC people like to make the case for low latency in Voice AI. But if latency is really a major problem in Voice AI, why isn’t WebRTC a truly native part of speech-to-speech Voice AI systems?
Media for Humans vs. LLMs
webrtcHacks regular Gustavo Garcia commented that he thought Voice AI could be a fit for MoQ, so I asked him why.

To summarize his argument – WebRTC is really good at transmitting human voice, but doing that is intrinsically hard, so the solution is complex. LLMs don’t need all the extra processing that humans require and don’t need such a complex solution, so MoQ could be a better fit here.
Gustavo also brings up another point about local speech recognition – you can extend this to speech synthesis too. The browser already has JavaScript APIs for this. However, to act more human, voice LLMs also interpret the user’s tone and accent and adjust theirs. See Gemini’s Affective Dialog for just one example. Current commodity speech engines on devices don’t convey or interpret this extra information today. I do expect we will start to see on-device tokenization of media with newer neural chips embedded in our phones and laptops, but this isn’t happening yet.
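For what it is worth, the local recognition and synthesis Gustavo refers to are already exposed through the Web Speech API. A quick sketch below shows both directions; note it moves plain text only, which is exactly the tone/affect gap described above (the vendor prefix and support vary by browser):

```typescript
// Local speech recognition via the Web Speech API (prefixed in Chromium-based browsers)
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
const recognition = new SpeechRecognitionImpl();
recognition.continuous = true;
recognition.onresult = (event: any) => {
  const result = event.results[event.results.length - 1];
  console.log("recognized:", result[0].transcript); // only text survives; no tone or accent
};
recognition.start();

// Local speech synthesis for the reply, again with no channel for affect
speechSynthesis.speak(new SpeechSynthesisUtterance("Hello from the model"));
```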
Encoding tokens
There are some standardization efforts for encoding media streams for ML models already. For example, MPEG – the group that brought us the H.26x series of codecs – has a Video Coding for Machines (VCM) spec under MPEG-AI. Instead of building a codec for human viewing, this codec codes for what machine detectors need. The standard, ISO/IEC 23888-2, is currently in the Committee Draft phase of development. It also has a sibling spec, FCM (Feature Coding for Machines), that goes a step further by compressing intermediate features/tensors rather than pixels (see the demo video). On the audio side, ACoM (Audio Coding for Machines) is an official MPEG Exploration with a Call for Proposals (Apr/Jul 2025) to define formats for machine listening and audio-feature transport.
Splitting inference between the client and the server sounds like the future to me. Standardization efforts are still in the earlier phases here, but perhaps LLM vendors will make their own proprietary mechanisms for this. When it comes to coordinating transmission of these codecs, I could see these developers using MoQ because it is new and potentially easier to work with. However, I could not find much direct evidence of this outside of some academic papers (example here).
| My bet in 2030: | Hopefully something other than raw media over WebSocket, but it might not be WebRTC or MoQ |
Meetings
As Fippo mentioned, there were many video conferencing vendors prior to WebRTC, but almost all of them moved to using WebRTC (Zoom being the last significant holdout). Multi-party video meetings are the hardest use case of all the items here – maximizing video quality for a bunch of different users with constantly changing bandwidth and minimal latency tolerance is hard. Sophisticated bandwidth control mechanisms like gcc took years to develop and tune (see Kaustav’s great webrtcHacks post on gcc probing to get some appreciation for just part of that).
MoQ doesn’t have any of that today. It needs to be built. However, libwebrtc’s work here is effectively done and is open source for all to reference. Codex/Claude/your-favorite-coding-agent are very good at adapting existing code, so I don’t think it is impossible for MoQ to build something like gcc tuned for WebTransport in the foreseeable future if the motivation is there.
What would be a motivation? I think WebRTC app developers would consider switching to MoQ if it could achieve a measurable quality improvement. The bigger the improvement, the greater the motivation to switch. Would I spend a lot of effort on switching stacks for a 1-2% improvement? Nope. More than 10%? Likely, depending on the effort and risk.
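For comparison purposes, the numbers you would measure on the WebRTC side are already exposed through getStats(). Here is a rough sketch, assuming you care about send-side bandwidth and encoder quality limits; any MoQ stack would need to surface comparable metrics before a fair comparison is even possible:

```typescript
// Sample the congestion controller's output and the encoder's quality limits
async function sampleQuality(pc: RTCPeerConnection): Promise<void> {
  const stats = await pc.getStats();
  stats.forEach((report: any) => {
    if (report.type === "candidate-pair" && report.nominated) {
      // gcc's current estimate of how much we can send on this path
      console.log("available send bandwidth (bps):", report.availableOutgoingBitrate);
    }
    if (report.type === "outbound-rtp" && report.kind === "video") {
      // "none", "cpu", or "bandwidth" tells you what is capping video quality
      console.log("quality limited by:", report.qualityLimitationReason);
    }
  });
}
```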
Do we need better video quality?
Right now, MoQ has a long way to go to even be a contender. And if MoQ’s main adoption driver is better video quality, then there needs to be user demand for better video quality than they get today. If users start demanding 4K or AR for their video calls, then maybe any marginal gains MoQ can provide will start to matter a lot more.
We can use screen size as a prerequisite for larger video camera resolutions. If you have 4 callers with 1080p cameras in a 2×2 grid in full screen, then your screen size needs to be at least 3840×2160 to fit everyone at their full resolution. Unfortunately, there isn’t great public data on this, so you need to consider your own user data. Every time I have looked at the adoption of screens larger than Full HD (1920×1080, i.e. 1080p), I have been disappointed. statcounter shows that less than 8% of desktops had a resolution larger than 1920×1080, though there is a significant “other” category that could include high resolutions too. I looked at the Google Analytics stats for webrtcHacks in October too – you all have better hardware on average than most, but still less than 3% of you are viewing on screens with resolutions of 4K or better.
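The screen-size arithmetic above generalizes easily. A tiny, purely illustrative helper:

```typescript
// Minimum display resolution needed to show `count` tiles of tileW x tileH pixels each
function minScreenForGrid(count: number, tileW = 1920, tileH = 1080) {
  const cols = Math.ceil(Math.sqrt(count)); // squarish grid, e.g. 4 tiles -> 2x2
  const rows = Math.ceil(count / cols);
  return { width: cols * tileW, height: rows * tileH };
}

console.log(minScreenForGrid(4)); // { width: 3840, height: 2160 }, i.e. a 4K display
```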

The reality is that the hardware does not seem to be there today to display a bunch of users in ultra-high resolutions in video meetings, and that percentage hasn’t moved quickly. There are certainly many niche applications that need very high resolutions in meetings-style environments, but will it be worth it for them to invest heavily in optimizing MoQ when WebRTC already works?
| My bet in 2030: | WebRTC stays |
Video Streaming
I am going to only briefly cover “traditional video streaming” here – the variety that has been around since the Flash days and is mostly HTTP Live Streaming (HLS) today with some other formats like DASH. This is a huge market with many vendors in an established ecosystem. It is also not WebRTC, and this is a WebRTC blog covering the left side of my Venn diagram above, so I am only going to touch on this.
I will also review Low-Latency HLS (LL-HLS) and DASH’s low latency variant for live streaming and how those compete with WebRTC and now MoQ in the next section.
Traditional video streaming
Most video streaming has built-in latencies of many seconds – usually anywhere from 6 to 30 seconds. Video is converted into smaller chunks and distributed on a CDN. WebRTC has no place here. WebRTC is designed for other uses and no one is recommending it here.
Unlike WebRTC, MoQ does have some advantages in traditional longer-latency streaming, but those are most apparent on flaky networks. If you ignore latency, I am not sure MoQ is compelling here. This category is only distinguished from “live streaming” because low latencies weren’t technically feasible when HLS was initially developed. Lower latency options have evolved into a new market (or submarket).
What if you could get lower latencies out of the box without all the extra complexity? That is what MoQ is pitching. On that, let’s look at the “in-between” architectures.
| My bet in 2030: | Look at MoQ only if lower latency is important |
In-between Use Cases
As we just covered, WebRTC started out on the left for real-time interactivity with many streams for video calls and meetings. Video streaming is on the right for one input stream with many seconds of latency for thousands up to tens of millions of viewers.
Then meetings started getting bigger, and as they get bigger, most attendees end up being viewers – more like the streaming scenario. WebRTC systems have scaled up; some now allow up to 1000 viewers.
On the streaming side, services like Clubhouse, Periscope, Twitch, and Mixer demonstrated the value of letting a host interact with an audience. The lower the latency, the better the interactivity. Low latencies also allow internet streaming to act more like traditional over-the-air broadcast TV, which is expected for sports and other live events.
This created a gap between what WebRTC and HLS were doing. Both sides started to push into the center from their respective ends. As I’ll explain below, this is likely where MoQ’s sweet spot will be.

Live Streaming
No one ever says “I wish there was a longer delay between me and the live event I am watching.” The whole point of “live” is that it is actually happening in the moment – as close to physically being there as possible. The TV broadcast industry has set the live-event latency expectation at seconds.
By optimizing HLS and reducing chunk sizes, LL-HLS can get down to 2 to 4 seconds. Low-latency DASH is about the same. This seems to be the fastest this technology can get in practice.
Live streaming services generally use RTMP (or RTMPS) for ingest, which adds several more seconds of latency. Several seconds can be a long time if a live stream host is looking for the sort of instant audience feedback they are used to having in video meeting software.

The solution to go even faster: use WebRTC like meetings do. Instead of transferring chunks, a series of SFUs replicates and relays live streams to users. Latencies are typically less than a second. WebRTC HTTP Ingest Protocol (WHIP) and WebRTC HTTP Egress Protocol (WHEP) help facilitate plugging low latency streams into and out of real-time streaming networks.
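WHIP in particular is simple enough to show in a few lines. Below is a simplified sketch of the publish flow; the endpoint URL and token are placeholders, and it skips trickle ICE and resource teardown that a production client would handle:

```typescript
// Publish a local MediaStream to a WHIP ingest endpoint: one HTTP POST carries
// the SDP offer, and the SDP answer comes back in the response body.
async function whipPublish(endpoint: string, token: string, stream: MediaStream) {
  const pc = new RTCPeerConnection();
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/sdp", Authorization: `Bearer ${token}` },
    body: offer.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
  return pc;
}
```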

WebRTC vs. LL-HLS
LL-HLS has an advantage for replay – once the chunks are on the CDN, they are there for watching anytime. WebRTC streams are ephemeral – there is no replay mechanism built-in without an additional recording system. In practice most streamers encode multiple renditions – so LL-HLS for live viewing and multiple HLS streams for worse networks and high quality replay. WebRTC-based systems often also use HLS for replay.
Which is cheaper? I have analyzed this a few times in the past and the answer was never that definitive. It comes down to how much you pay for a CDN vs. how much you pay for SFU processing. However, both of these costs are dwarfed by egress bandwidth charges once you reach scale. At the protocol level, WebRTC’s better bandwidth adaptation usually means it sends less information, but these overhead differences are usually only a few percentage points. The major differences in bandwidth usually come down to the choice of codec; there are comparable codec options for both systems, and this is very much up to the implementer.

How is MoQ different?
MoQ aims to provide a best of both worlds approach between WebRTC and HLS/DASH.
Unlike LL-HLS, which relies on playlists and (partial) segments optimized for CDN delivery, MoQ publishes a continuous live feed that CDNs/relays can cache and fan-out immediately at the live edge, minimizing playlist churn and catch-up delays. This is basically the same approach a WebRTC SFU takes, but more formally specified through a bespoke publish/subscribe model in the architecture.
Also, because it rides on QUIC, MoQ benefits from independent streams and fast connection setup, which helps reduce startup/join time and avoids stalls during congestion bursts. Relays can also prefer the newest live data so late joiners land closer to “now.”
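To illustrate just the transport-level piece of that claim, here is a minimal WebTransport sketch. The relay URL is a placeholder, and the actual MoQ publish/subscribe messages that would flow over these streams are omitted entirely:

```typescript
// Open a WebTransport session: QUIC setup is one handshake, no separate TCP + TLS round trips
const wt = new WebTransport("https://relay.example.com/moq");
await wt.ready;

// Each media object can ride its own unidirectional stream, so a lost packet
// only delays that stream instead of blocking everything queued behind it.
const stream = await wt.createUnidirectionalStream();
const writer = stream.getWriter();
await writer.write(new TextEncoder().encode("one media object"));
await writer.close();
```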
MoQ aims to pair WebRTC-like interactivity/latency with HLS/DASH-like distribution and caching, making it especially attractive for one-to-many live events where you need sub-secondish join + Internet-scale fan-out without maintaining thousands of individual real-time sessions.
Much of what MoQ brings to live streaming, WebRTC already provides using different mechanisms over UDP. However, unlike WebRTC, MoQ also has a Fetch mode where it operates like a traditional HLS CDN. This is an advantage when you also want to make the video available for playback since theoretically you can have a single architecture (MoQ) vs. 2 architectures (WebRTC & HLS).
Finally, remember that RTMP is most commonly used as the ingestion protocol. WHIP replaces that with WebRTC. MoQ could be used for both ingest and egress, again simplifying the architecture with a single protocol.

| My bet in 2030: | |
Webinars / Townhalls
Meetings software is usually designed to allow interactivity from everyone, at any time. In webinar and townhall types of use cases, there are usually one or a few presenters and most of the audience is watching most of the time, with sporadic interactivity by only a few audience members.
There are a variety of different technology choices here, depending on how much interactivity is really needed and what process you want participants to go through to have a live interaction. This usually comes down to one of two approaches:
- Put everyone on WebRTC and treat it like a massive meeting – but put most of the audience in a view-only mode to conserve resources
- Broadcast over HLS and have the audience member join the live session (usually based on WebRTC) when they want to participate – this isn’t instant because they may be many seconds behind on the HLS stream
Some platforms let you pause or watch out of sync with the livestream. Others force a live-only experience.
Can MoQ help here? Maybe, but this use case is much more fragmented. For example, if an important feature is letting people form smaller breakout rooms, then they are likely to stay with WebRTC. On the other hand, if the webinar app looks more like a live streaming app without having audience members come on stage, then MoQ could be a good fit.
| My bet in 2030: | |
Conclusions
So here is my summary take:
| Use case | Major Tech Today | MoQ fit |
| 1:1 calls | WebRTC | 👎 |
| Meetings | WebRTC | 👎 |
| Voice AI | WebSocket | 🤷 |
| Streaming | HLS | 🤏 |
| Live Streaming | RTMP/LL-HLS & WebRTC | 👍 |
| Webinars | LL-HLS & WebRTC | 👍👎 |
Fit doesn’t equal commercial viability, but it is a prerequisite.
Meetings and streaming are the major use cases, but there are certainly many others that may be relatively small compared to those, yet significant to you (especially if it is your project). No doubt MoQ will be a good fit for many of those, but I wanted to examine the largest potential MoQ use cases today.
Will MoQ make a dent into WebRTC usage?
Probably not, but it is a very big Internet and there is plenty of space for both. I expect we will have many more MoQ posts here. In fact, I just bought moqHacks.com 😀





Great write-up Chad, lots of detail, thanks for this.
Thanks for the reference, I would appreciate if you could fix my last name (should be Begen).
Thanks Chad for the great blog post. How about the Use Case Cloud Gaming (or in general Remote Rendering and Streaming)?
If I do a v2 or look at other applications in a future post, I definitely will include some of the WebRTC Data Channel use cases.
WebTransport by itself has many advantages over WebRTC Data Channels for low latency client-server applications. I am not as sure about peer-to-peer or situations where the data is accompanying RTC media.
“WebTransport vs. WebRTC Data Channels” would be a great post topic if there are any qualified volunteers out there!