OpenAI is using WebRTC for its Realtime API! Even better, webrtcHacks friend and Pion founder Sean DuBois helped to develop it and agreed to a Q&A about the implementation. It is not often that a massive WebRTC use case like this emerges so rapidly, and Sean was extremely transparent about his work at OpenAI. He also provides some updates on Pion, and we covered audience Q&As, many of which overlapped with my own questions.
The replay of the original live stream from April 10, 2025 is below. The Table of Contents gives a good idea of what we covered. After that you will find an annotated transcript with my commentary and some follow-up information. I encourage you to follow along with it as you watch the video.
This is the full transcript. It has been lightly edited to help with readability, but should be close enough to follow along with the video. I include some extra notes and comments to add context and point to more information.
Introduction
Chad: Welcome to another webrtcHacks live stream. Today we’ll be discussing OpenAI’s WebRTC implementation.
If you don’t know me, I’m Chad Hart. I’m editor and blogger over at webrtcHacks. I’ve been a WebRTC enthusiast for a long time now.
Just a few logistics notes before we begin.
One, I’m rebroadcasting this stream to YouTube on an old computer, so that might not be the best [experience]. I’m using RingCentral Events to broadcast this stream. It’s a product I spend a good amount of my time on. So, I do strongly encourage viewers on YouTube to use the end-to-end WebRTC experience [instead]. Just go to webrtchacks.com/livestream to register. [Then] you can jump over here to get the fun stuff with everyone and get a lot more features.
I’m going to try to keep my questions to around 30 minutes. After that, we can take some audience Q&A. Make sure to use the Q&A feature in the chat bar to enter your questions. I will peek over there occasionally to see if there’s a question that’s relevant to some of the things I’m discussing, but I’ll save [most questions] for the end.
Unfortunately, Sean and I both need to get back to our day jobs at some point, so I am going to try to keep this whole thing to about 45 minutes. I am guessing we might go over a little bit. I will republish this video with the transcript to webrtcHacks. I usually add some annotations and links, so don’t worry too much if you feel like you’re going to miss something – it will be reposted.
We have so much to discuss today, and as I said, there is never enough time. So, let’s get started.
Sean (re-)Introduction
Chad: I’d like to welcome – or I should really say welcome back – Sean DuBois of OpenAI. I will ask Sean to give some background on himself and how he got here. But if you don’t know him, he’s the founder of the very popular open source WebRTC project Pion, among many, many other things.
Sean, could you give everyone a background on what you’ve been up to in the last couple years?
I think the last time I had you on video [live stream] was at Kranky Geek during the open source round table a couple of years ago. You were at Amazon at the time. I know you’ve done other things, but I’ll let you get us up to speed [including] how you ended up at OpenAI.
Sean: So my background – I got into WebRTC because I was at a startup that was doing… No, I was at Etsy and WebRTC had just come out. I thought it was just absolutely magical that you could exchange an offer/answer over IRC and do video. Etsy was paying tens of thousands of dollars to a video conferencing provider. And I was just like [I can build that now]. I’ve been in love with [WebRTC] since then. I was doing Asterisk at the time.
My interests bounce all over the place. I have Pion, which is the open source project that I work on. It’s funny to hear you say “the founder.” There are actually four people that started it with me: John Bradley, Noah Schrader… They all went and did other things, but I kind of just clung onto this thing that [we] started, and I’m still here. I also want [to give] a shout out: Joe’s in the channel – Joe Turki.
There are other people like that [who] make the project what it is. I [always] just want to give credit to everyone. I’m a very small part of all of this stuff.
So I do that and I work at OpenAI on the realtime API, on ChatGPT, anything that does WebRTC. And I do the embedded stuff. [Also], the other thing I love is WebRTC broadcasting. So, I added WebRTC support to OBS. I have Broadcast Box from my time at Twitch. [It is] sort of disjointed, but I have all these different parts of WebRTC I love and I bounce between them.
Broadcast Box is a WebRTC-HTTP Ingestion Protocol (WHIP) tool for receiving streams, with a public version Sean hosts on his site. I used it as part of my OBS Cracks the WHIP on WebRTC post where I reviewed the OBS implementation of WebRTC with WHIP.
Chad: You were over at LiveKit for a while too, right?
Sean: Yep. I was.
Chad: [That’s] a little relevant since OpenAI is using some LiveKit too, right?
How did you end up at OpenAI? I mean, I guess, other than applying for a job and getting it, was there anything else?
Sean: No, that was actually it. I was at Twitch, and then I went to LiveKit. And then at LiveKit, I did the SIP stuff and the telephony stuff. And then after that, I never found a good fit. I got excited about embedded and all these other things. I wasn’t really doing good by LiveKit. I feel like I made them kind of unhappy with my distractions. I feel I do better at companies [where] there’s room to be distracted.
Chad: You do like lots of side projects.
The WebRTC job at OpenAI
Sean: Yeah, I like lots of side projects, to the displeasure of my bosses. So when I went back to Twitch, it was hard… It was hard because the RTMP and the HLS stuff was always bigger than WebRTC, but I really cared about the WebRTC. I was always trying to push [WebRTC] forward, but it felt like it didn’t have the mindshare.
Then Dan Jenkins just sent me a link for OpenAI and was like, “Oh, this is… You should apply for this.” And I just did. I didn’t know anyone. [I just applied]. I can’t tell you how many times I haven’t gotten jobs like everyone else. I’ve interviewed at Google like six times and not gotten it and stuff like that. So, I apply all the time to things with no intention of getting them. [At OpenAI], I just kind of stumbled into the job. It was cool – it was my boss at the time, Jason Clark, and just me doing WebRTC stuff. It was just so incredibly fun those first couple [of months]. But I’m jumping ahead…
Different WebRTC Implementations
OpenAI has 3 different implementations that include some realtime functionality. We will talk through these, but to summarize:
- ChatGPT Advanced Voice – built on LiveKit
- The “old” Realtime API – WebSockets only
- The “new” Realtime API – WebRTC, using a Pion-based service
Chad: I thought maybe [your hire there] had something to do with LiveKit[, which OpenAI uses]. Given your LiveKit background, you would be a natural shoo-in.
All right. So, you joined OpenAI and they were already using LiveKit… well, maybe you could explain it? One of the confusing things there is that there are multiple different WebRTC implementations.
I think most people probably have used the ChatGPT advanced voice. That feature [uses LiveKit].
Philipp Hancke – [aka] Fippo – [and I] did a bunch of analysis and looked at all these different [OpenAI implementations] back in December.
There’s that [LiveKit] stack. There’s the “old” realtime API, and then there’s a “new” realtime API with WebRTC. Can you walk through how [OpenAI] ended up [with] three different stacks? What was the state when you started? I’m guessing the [realtime with] WebRTC stack wasn’t there…
Sean: When I joined, the LiveKit stuff was already there, built into ChatGPT. The Realtime API is an entirely different code base. The realtime API and ChatGPT are both front ends for using the inference [LLM] engines and stuff like that. They’re different front doors to the same backend.
The realtime API doesn’t use any of the LiveKit stuff, because [the Realtime API] was all WebSockets at the time. It was just a WebSocket API. It’s a big server that consumes WebSockets, takes that audio, decodes it and sends it to the [inference] engines, and all of that logic.
Chad: Are you a fan of the WebSockets approach? I’m guessing no – I’m curious to [hear] your thoughts on that. I have opinions there… Why did they do that?
Sean: When I joined, my initial perception of the WebSocket stuff was confusion. You have to write more code to use this, and it doesn’t, automatically do audio capture, and all these things.
The use case that I didn’t understand at first is “faster than realtime”. You can imagine, if someone pulls up to Burger King and they want to order something, you want the speech-to-text and the speech-to-speech to come back instantly. Then the customer will run checks on it, because it would be bad if you go and order something and it responds with something completely different or [something] unexpected. So, there are some use cases where you want just WebRTC audio that’s flowing in realtime. And then [there are] a lot of use cases where you want [the audio and text] faster than realtime. So [here] it makes more sense to send it over a WebSocket, where you can send [it] faster than [the speech over RTC].
Chad: That sounds like call center oriented [scenarios].
This is 10 years old now, but we had a post by Rob Welbourn here that covered WebRTC in call centers and gives some background on those architectures.
Sean: Yes. But then you have a lot of customers that are just like, “I just want WebRTC. I’m not gonna be auditing it. I just want the LLMs to go [to the user as fast as possible].”
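To make the “faster than realtime” idea concrete, here is a rough sketch of pushing a pre-recorded audio file to the Realtime API over its WebSocket interface as fast as the file can be read, rather than at 1x playback speed like an RTP audio track. This is my own illustration, not OpenAI’s code – the event names follow OpenAI’s published WebSocket API at the time of writing, and the model name, headers, and file are placeholders you should verify against the current docs.

```go
package main

import (
	"encoding/base64"
	"io"
	"log"
	"net/http"
	"os"

	"github.com/gorilla/websocket"
)

func main() {
	// Authenticate with an API key; additional beta headers may be required
	// depending on the current Realtime API docs.
	header := http.Header{}
	header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))

	// Model name is a placeholder - check the docs for the current one.
	url := "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
	conn, _, err := websocket.DefaultDialer.Dial(url, header)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Read a local PCM16 recording and push it as fast as the disk allows -
	// no pacing, unlike a WebRTC audio track that flows at playback speed.
	f, err := os.Open("order.pcm")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	buf := make([]byte, 32*1024)
	for {
		n, readErr := f.Read(buf)
		if n > 0 {
			evt := map[string]string{
				"type":  "input_audio_buffer.append",
				"audio": base64.StdEncoding.EncodeToString(buf[:n]),
			}
			if err := conn.WriteJSON(evt); err != nil {
				log.Fatal(err)
			}
		}
		if readErr == io.EOF {
			break
		}
		if readErr != nil {
			log.Fatal(readErr)
		}
	}

	// Tell the server we are done appending audio for this turn.
	if err := conn.WriteJSON(map[string]string{"type": "input_audio_buffer.commit"}); err != nil {
		log.Fatal(err)
	}
}
```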
ChatGPT Voice & LiveKit
Chad: Where was LiveKit? Walk us through what LiveKit was being used for.
Sean: For ChatGPT.
Chad: For the advanced voice feature?
Sean: Yep.
Chad: The ChatGPT advanced voice is LiveKit – it still is?
Sean: Yes.
Initial Design Considerations
Chad: I do want to come back to where all this is going and how it fits. So you came in and built out all this new WebRTC stuff. What [were the] problems you were trying to fix? Like, what did you build? I assume it uses Pion?
Sean: It quickly became apparent that implementing WebRTC inside Kubernetes was challenging since everything was designed around HTTP and WebSocket protocols. The first couple of weeks involved building this demo and attempting to integrate it with our infrastructure. Eventually, I realized that if we wanted to ship quickly, we’d need to run this WebRTC implementation outside Kubernetes.
Chad: Okay. I assume it worked, and you convinced the team? That ultimately led to a parallel API alongside the WebSocket one that uses WebRTC?
Sean: My hope is that this WebRTC API is a parallel to the existing WebSocket API, so if you’re familiar with the WebSocket API, the WebRTC API feels exactly the same. You can send messages over the data channel, and you no longer need to handle audio separately. I would like to see this real-time API extend to anything that people need. If in 5 years, Media Over QUIC [MoQ] takes over, I want to be able to slot in Media over QUIC and not have it be a painful transition – or SIP, or any other thing. That’s my hope.
Chad: Why start a second and really a third stack?
Sean: All the business logic of the real-time API was [in the WebSocket server]. That has all this business logic that ChatGPT doesn’t have – checking API keys, doing safety checks, doing all these things. I wanted to use the WebSocket stuff that we had already. So, I guess what I could have done is created a LiveKit room. I don’t even know how I would have done it, right? You can imagine, in this Go Pion server I’m just translating WebRTC to WebSockets. It’s really simple. I don’t know how I could have used LiveKit to do this WebRTC-to-WebSocket translation because it has to occur in a process. I can’t have the real-time API join the LiveKit room.
There was no consideration of, “Oh, I want to use LiveKit,” or, “I want to use something else.” I was racing against the clock to ship the code. I had just joined the job and you want everyone to like you, so you wanna build something quick.
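To illustrate the shape of what Sean describes – a Go/Pion process that accepts a WebRTC offer over HTTP and simply shuttles audio and data channel traffic to a WebSocket backend – here is a heavily simplified sketch I put together. It is not OpenAI’s code: the backend URL is hypothetical, the return path back to the browser is omitted, and a real bridge would decode the Opus RTP into the PCM format the WebSocket API expects rather than forwarding raw payloads.

```go
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"

	"github.com/gorilla/websocket"
	"github.com/pion/webrtc/v3"
)

func main() {
	http.HandleFunc("/session", handleOffer)
	log.Fatal(http.ListenAndServe(":8080", nil))
}

func handleOffer(w http.ResponseWriter, r *http.Request) {
	// Parse the client's SDP offer from the POST body.
	var offer webrtc.SessionDescription
	if err := json.NewDecoder(r.Body).Decode(&offer); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	pc, err := webrtc.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// Dial the existing (hypothetical) realtime WebSocket backend.
	ws, _, err := websocket.DefaultDialer.Dial("wss://backend.example/realtime", nil)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// Forward incoming Opus RTP payloads to the WebSocket backend.
	pc.OnTrack(func(track *webrtc.TrackRemote, _ *webrtc.RTPReceiver) {
		for {
			pkt, _, readErr := track.ReadRTP()
			if readErr != nil {
				return
			}
			// A real bridge would decode Opus to PCM16 here before forwarding.
			if writeErr := ws.WriteMessage(websocket.BinaryMessage, pkt.Payload); writeErr != nil {
				return
			}
		}
	})

	// Relay data channel messages as the same JSON events the WebSocket API uses.
	pc.OnDataChannel(func(dc *webrtc.DataChannel) {
		dc.OnMessage(func(msg webrtc.DataChannelMessage) {
			ws.WriteMessage(websocket.TextMessage, msg.Data)
		})
	})

	if err := pc.SetRemoteDescription(offer); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	answer, err := pc.CreateAnswer(nil)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	gatherComplete := webrtc.GatheringCompletePromise(pc)
	if err := pc.SetLocalDescription(answer); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	<-gatherComplete

	// Return the answer (with gathered candidates) to the client.
	json.NewEncoder(w).Encode(pc.LocalDescription())
}
```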
Chad: No one was like, “Why didn’t you just use LiveKit?”
Sean: Everyone’s so incredibly busy. Like, that’s the thing.
I’ve been at bigger companies where everyone really dives in on design docs, and you can spend weeks and weeks in conversation. Here, if you care about something, you can change core features very quickly because everyone’s just too busy to ask questions. That’s why I like it. It’s nice that if you care about something, you can go do it. You’re empowered to do it.
Also, it was just me. There was no one [else]. My boss was also incredibly busy and I was the only WebRTC person. There wasn’t really anyone else to talk to.
SFUs?
Chad: Okay. Is there a scenario where you would need to use an SFU for multiparty type calling? I guess you could talk to multiple LLMs at the same time – and one of them wins or whatever. Was that part of the design consideration?
Sean: No. Like, I imagine someday we’ll have an LLM joining. I saw this with Cloudflare – you can have an LLM join the conference room so you can ask questions. You can have multiple LLMs talk to each other. There’s totally stuff that will come up like that.
It’s always funny when people analyze stuff. I just didn’t think about any of this. I was just trying to solve the immediate problem. [The approach I used] was pretty simple in my head.
Implementation Details
TURN vs. ICE TLS
Chad: I have a couple of implementation questions. [First], you have no TURN server? I guess you’re using ICE TLS? Why didn’t you do a TURN server?
Sean: It was just like another thing to deploy. All I really care about is how many people can connect to [the API]. So, if I can offer UDP and TCP support with only having to run one service, I find that preferable.
It was the same with OBS. TURN is cool, but if you’re doing client/server communication, it’s another service that I have to run. It’s more latency, because you have to run through another server. I always prefer the ICE TCP stuff just because it brings down the complexity.
Chad: I assume you haven’t had too many complaints about not connecting? Or that you [would] need a TURN server [to connect] in those situations anyway?
Sean: We have metrics on how many people post an offer compared to how many people get to the connected [state]. It has never alarmed on a lot of users [failing to connect].
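For anyone wanting to try the same trade-off, Pion exposes ICE over TCP through its SettingEngine, so a single listener can serve clients that cannot use UDP without standing up a TURN relay. The sketch below is adapted from Pion’s public ice-tcp example and is not OpenAI’s configuration (which, per the section title, also fronts this with TLS); the port and public IP are placeholders.

```go
package main

import (
	"log"
	"net"

	"github.com/pion/webrtc/v3"
)

func main() {
	// Listen for ICE-TCP connections on a single well-known port.
	tcpListener, err := net.ListenTCP("tcp", &net.TCPAddr{IP: net.ParseIP("0.0.0.0"), Port: 8443})
	if err != nil {
		log.Fatal(err)
	}

	settingEngine := webrtc.SettingEngine{}

	// Multiplex every ICE-TCP candidate over that one listener.
	settingEngine.SetICETCPMux(webrtc.NewICETCPMux(nil, tcpListener, 32))

	// Offer both UDP and TCP host candidates so clients behind UDP-blocking
	// firewalls can still connect without a TURN relay in the path.
	settingEngine.SetNetworkTypes([]webrtc.NetworkType{
		webrtc.NetworkTypeUDP4,
		webrtc.NetworkTypeTCP4,
	})

	// Advertise the server's public IP as a host candidate (placeholder address).
	settingEngine.SetNAT1To1IPs([]string{"203.0.113.10"}, webrtc.ICECandidateTypeHost)

	api := webrtc.NewAPI(webrtc.WithSettingEngine(settingEngine))
	pc, err := api.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		log.Fatal(err)
	}
	defer pc.Close()

	// ... exchange offer/answer with the client as usual ...
}
```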
Audio Quality & Optimization
Chad: All right. The next area I am curious about is how you think about optimizing audio?
You’re not using Opus DTX. You’re not doing any RED audio redundancy. I suppose a client could implement client-side noise detection. Maybe you’re doing that server side?
There are two parts, right? You need to get audio into the LLM, and it needs to understand the audio well enough to interpret the speech. It probably doesn’t have to be perfect. That’s one aspect. Then [the user] wants to hear the audio coming back, so you’re synthesizing audio and sending it over [RTC]. You do want that to be very high quality, right?
How do you think about optimizing [both parts]?
Sean: I’m just using LibOpus via cgo in the server. Things like DTX I just haven’t even considered or worried about yet. I’m just like running it through an Opus encoder and decoder. The WebSocket API has Krisp. You can enable it in the session.update. You can turn on far field or near field.
Krisp is a commercial noise cancellation library – see RNNoise for an open source option.
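As an illustration, enabling that noise reduction is just another session.update event – sent over the WebSocket, or over the data channel when you connect with WebRTC. The sketch below assumes an already-open Pion data channel to the Realtime API; the input_audio_noise_reduction field name reflects OpenAI’s docs at the time of writing and should be double-checked.

```go
package realtime

import (
	"encoding/json"

	"github.com/pion/webrtc/v3"
)

// updateNoiseReduction sends a session.update event over an already-open
// Realtime API data channel, switching noise reduction between "near_field"
// and "far_field". Field names are assumptions based on current docs.
func updateNoiseReduction(dc *webrtc.DataChannel, mode string) error {
	evt := map[string]interface{}{
		"type": "session.update",
		"session": map[string]interface{}{
			"input_audio_noise_reduction": map[string]string{
				"type": mode, // "near_field" or "far_field"
			},
		},
	}
	payload, err := json.Marshal(evt)
	if err != nil {
		return err
	}
	return dc.SendText(string(payload))
}
```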
Chad: Okay. So some of these audio quality improvements – is that something you see yourself doing and just haven’t gotten to yet?
Sean: Yeah, I just haven’t gotten to it because it hasn’t been burning enough. Honestly, most of the time the WebRTC stuff is so good out of the box and libWebRTC is so mature that people just don’t complain too much about audio issues. Most people are having issues with the LLMs – steering them or other things. Most of the effort of the company is not on the audio/video transmission.
I love [RTC]. It’s super interesting to me, but on the burn-down chart it’s so far below the bar to convince someone that “Hey, this is the thing we need to work on.”
Chad: Maybe the LLM can live with poor audio anyway, at least on the inbound side [to the LLM]. Is there any evidence of that? Maybe you don’t care about sending great audio because [the LLM] doesn’t need it?
Sean: No, I do care about sending great audio. But even with like clear audio samples you’ll have these evals where [the LLM] will have trouble understanding numbers or trouble following instructions. So, people are so much more focused on things other than the audio quality.
I don’t know. I would like to come back to it and do more on it, but it’s just not what people are asking for. No customer has ever come to me and said, “Oh, the playback of the audio is really bad,” or, “it doesn’t understand me.”
Scaling architecture
Chad: I don’t know how many users ChatGPT has now. It’s going to be smaller for the API [vs. ChatGPT]. Still, [the API] is a pretty high scale service. How do you think about scaling this? I know I asked you to draw some stuff. I’ll put it on screen too if that helps.
Sean: This is a rough sketch of the VM – the Pion service that [is] doing that WebSocket-to-WebRTC translation.
A user request comes in via a POST of a WebRTC offer… and I have a GeoIP DNS, [so] I have it boxed to latitude and longitude. Depending on what your latitude and longitude is, I send you to a different regional load balancer in Azure. That region then has a bunch of VMs inside of it that are load balanced between them. I can spread the load out between a bunch of different virtual machines. All those virtual machines do is take that WebRTC-ness, convert it to a WebSocket, and use the same real-time API WebSocket that other people are using.
Chad: Do you think there’ll be a day when you don’t need to go through the WebSocket interface and you can send audio more directly?
Sean: Yeah. There are a couple of things we’ll need to solve for, like how do we get that business logic out of the real-time API WebSocket server – around billing, session limits, safety, and recording features.
But the question would be: why? If we move that code out, would it give customers lower latency? There has to be a clear benefit. I think there will be eventually. As time goes on and these things get more complicated and more latency-sensitive, it’ll happen then. But–
Chad: I’m not surprised there’s not more latency sensitivity already.
Sean: Well, since everything’s turn-based right now, it doesn’t seem like the latency’s too big of a problem. And also, since it’s all in the same data center, we’re talking milliseconds of latency. There’s no congestion. There’s not a lot of latency between these servers – those virtual machines and the real-time API WebSocket – because it’s all just sitting in the same data center.
WebRTC in ChatGPT Operator
Chad: What’s coming next? Do you have any roadmap plans or anything you can share?
Sean: The thing that I’ve been working on recently is Operator – how you tell a computer-use agent to go book tickets.
Right now, that uses video over WebSockets, and it has like a seven-second latency. So I’m working on switching that to WebRTC. It just uses an existing open-source project called Neko.
Chad: All right. Kind of like a screen capture to WebRTC?
Sean: Yeah. It just uses GStreamer and does an X11 grab, and then it sends the buffers into Pion to send them out.
I came across Neko recently in my trending open source WebRTC projects analysis. Neko is a neat project that gives you a browser you can record, broadcast, or connect to with WebRTC, among other features.
Demo of Neko from the Neko repo
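The pattern Sean describes – grab the desktop with GStreamer, encode it, and hand the buffers to Pion – looks roughly like the sketch below. Neko itself drives GStreamer through bindings; to keep this self-contained I shell out to gst-launch-1.0 and parse the H.264 byte stream with Pion’s h264reader, so treat the pipeline, frame duration, and omitted signaling as illustrative assumptions.

```go
package main

import (
	"log"
	"os/exec"
	"time"

	"github.com/pion/webrtc/v3"
	"github.com/pion/webrtc/v3/pkg/media"
	"github.com/pion/webrtc/v3/pkg/media/h264reader"
)

func main() {
	pc, err := webrtc.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		log.Fatal(err)
	}

	// A local track the remote peer will receive the screen capture on.
	videoTrack, err := webrtc.NewTrackLocalStaticSample(
		webrtc.RTPCodecCapability{MimeType: webrtc.MimeTypeH264}, "video", "screen")
	if err != nil {
		log.Fatal(err)
	}
	if _, err = pc.AddTrack(videoTrack); err != nil {
		log.Fatal(err)
	}

	// Grab the X11 display and encode it to a low-latency H.264 byte stream.
	// Neko drives GStreamer through bindings instead of a subprocess.
	gst := exec.Command("gst-launch-1.0",
		"ximagesrc", "use-damage=false", "!",
		"videoconvert", "!",
		"x264enc", "tune=zerolatency", "speed-preset=ultrafast", "key-int-max=60", "!",
		"video/x-h264,stream-format=byte-stream", "!",
		"fdsink", "fd=1")
	stdout, err := gst.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err = gst.Start(); err != nil {
		log.Fatal(err)
	}

	// Parse NAL units off the pipe and write them into the Pion track.
	reader, err := h264reader.NewReader(stdout)
	if err != nil {
		log.Fatal(err)
	}
	for {
		nal, readErr := reader.NextNAL()
		if readErr != nil {
			break
		}
		// Assumes a ~30 fps capture; a real implementation would timestamp properly.
		if writeErr := videoTrack.WriteSample(media.Sample{Data: nal.Data, Duration: time.Second / 30}); writeErr != nil {
			break
		}
	}

	// ... offer/answer signaling with the viewer omitted ...
}
```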
Chad: Awesome. I’ve only played around with Operator a little bit. Now you’ve got me scared that there’s a seven-second latency, but I suppose it’s not responding that quickly anyway. It is kind of slow. That would partly explain why.
Sean: I don’t really know where all the latency comes from when it’s doing, like, the VNC over WebSocket stuff today. I never really looked into it. I was just like, “I know Neko works out of the box with sub-second latency. Let’s just use this thing instead.”
What’s New in Pion?
Chad: I know there are a lot of Pion fans in the audience. I did want to spend a minute or two on what’s new with Pion too. I guess it does relate here, because you use Pion at [OpenAI]. My first question on Pion – are you doing things in Pion specifically for OpenAI?
Sean: Yeah, I made a couple small commits. You could only get the local candidates and I added the remote candidates. Nothing deep. Just little changes to make stuff easier.
The biggest thing that is exciting for me in Pion right now is that I would really like to see more of the RTP/RTCP stuff come to greater maturity. Before, we used the same SSRC for retransmissions, and then we finally switched to a distinct SSRC so you could tell: is the packet a late packet, or is it a retransmission that finally arrived? And there are other issues like that. Pion hit this point where you can build anything you want with it, but the quality of service can be better. There’s a very clear gap between standing up libWebRTC and Pion, and it’s hard to close that gap because it’s hard to find developers that want to go do that kind of stuff.
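Some quick background on that SSRC point: with RFC 4588, retransmissions sent on a distinct RTX SSRC carry the original sequence number (OSN) in the first two bytes of the payload, which is exactly what lets a receiver tell a late original packet apart from a retransmission. A tiny sketch of unwrapping such a packet with Pion’s rtp package:

```go
package rtxdemo

import (
	"encoding/binary"

	"github.com/pion/rtp"
)

// unwrapRTX recovers the original sequence number from an RFC 4588 retransmission.
// With a distinct RTX SSRC, the first two bytes of the payload carry the original
// sequence number, so the receiver can tell a late original packet apart from a
// retransmission that finally arrived - the distinction Sean mentions above.
func unwrapRTX(pkt *rtp.Packet) (originalSeq uint16, payload []byte, ok bool) {
	if len(pkt.Payload) < 2 {
		return 0, nil, false
	}
	return binary.BigEndian.Uint16(pkt.Payload[:2]), pkt.Payload[2:], true
}
```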
Chad: OK. Anything else on the Pion side you want to talk about?
OBS & WHIP with Simulcast
Sean: I’m continually excited about OBS WebRTC.
Chad: That was gonna be my next topic. Before this recent [OpenAI] topic, we talked a lot about the OBS with WebRTC launch. How is that going?
Sean: Good. The simulcast PR was approved and I’m just waiting for someone to hit that merge button. I just think it’ll make such a dramatic difference if you can do client-side encoding. It just kills me. I’m desperately waiting on that one.
Chad: Anyone that’s interested or wants some more background [on OBS with WebRTC], I had a post on that which Sean helped review and gave me insights on. That was 18 months ago now. It’s been out there for a little while.
That post where I reviewed the OBS implementation of WebRTC with WHIP is here again for reference: OBS Cracks the WHIP on WebRTC.
Chad: So no one has done the merge on Simulcast yet?
Sean: I don’t know why it’s so [hard]…
Chad: Just to explain the idea [of Simulcast on WHIP]: WebRTC supports Simulcast, where you can send streams at multiple different qualities. That’s expensive to do server side because then you have to process and re-encode the video [for each layer]. If you can just do that [encoding] client side – like we’re doing already in this video call – and just send those streams, it helps offload the server. I guess in theory there might be some cases where it’s going to be better quality [too] – less opportunity for the network to mess things up in between.
Sean: I think there’s definitely better quality that way. If you send 1080 and then the server decodes and reencodes it, you introduce generational loss—resizing, resampling—which is frustrating. Simulcast is also better from a security standpoint. I want to reach a point where I’m not sending my video unencrypted through servers—maybe I upload it with some [auth] tag, and only then does it get played out. I don’t want a world where someone can insert ads into my video. I can imagine a company thinking, “Hey, let’s start inserting ads in the background,” and then that becomes the new normal. I just want to get ahead of that and build something good for everyone. I don’t know.
Audience Questions
We let the audience ask some questions.
MoQ not on the horizon
The Web Platform has exposed many more APIs that open the opportunity to do much of what WebRTC does in different ways. Media over QUIC is a specification currently under development that leverages QUIC – the protocol underlying HTTP/3 – for providing media over the Internet at a range of latencies, including realtime.
MoQ is slowly emerging, so I wanted to know if Sean was looking at this with Pion.
Chad: I want to spend a few minutes talking about interesting developments in WebRTC. There are a few questions in the Q&A about QUIC—specifically Media over QUIC (MoQ). Maybe we can discuss that for a bit. It could potentially be relevant to what you’re doing at OpenAI too.
So, what are your thoughts on MoQ? I know you’ve been involved in some of this media-over-QUIC discussion. What’s Pion’s involvement, and how do you feel about the initiative?
Sean: So I haven’t been involved with Media over QUIC at all.
Sean: I chalk it up to just my personality. I want to build things that are useful to people today. I’m a very like hacky, scrappy kind of builder guy. So that’s kind of where [Pion’s] been. I just don’t have a lot of interest in going and arguing with people about how things are done.
Luke put a lot of effort into this stuff and then he wrote a post about how he’s very frustrated with the process. I’m like, “I’m very happy I’m not in that same position.” It does seem like an admirable effort to go build things better. I feel that with Pion. I’m very aspirational – I want to go build these things better. But the nice thing with Pion is that no one can stop me from doing that. I can just go build, [without] standards and all that. The politics of having to work with other people seems very frustrating. So that’s what keeps me away from it.
Chad: So the takeaway from that is you don’t see yourself getting too involved in [MoQ] until it gets a little further along in terms of actual things being used in browsers?
Sean: If it got to the point where like developers were coming to me and saying like, “Hey, I want to use QUIC and it would be great if I could do this, that, this, that,” [then] I would start building something with it. But at this point, I feel like I don’t have anything to add.
Sean: I think developers should build with whatever gets them to their goals quickly and in a way they enjoy. If you can use Go and build your app fast, go for it. If Rust is better for you, use Rust. A lot of people push their own tool because they just want to see it used, rather than focusing on what’s best for users. Pion came from my frustration that libWebRTC was always considered the default solution. But I’m not here to say WebRTC is the answer to everything; just use what’s right for you. For some, more control is essential – maybe for Media over QUIC – while for others, a straightforward WebRTC setup works fine. For example, I’ve done remote control projects where Neko was handy since it already used WebRTC. If there was a Media over QUIC solution that worked great and deployed easily, I’d choose that. There’s no perfect protocol or software – it’s about what solves your problem now.
So, short answer: it doesn’t look like there will be MoQ in Pion anytime soon.
How does Sean stay so motivated for Pion?
Chad: I do have a related question. You’ve got a family, a job, and yet you put out a huge amount of code. Do you ever sleep? Or do you just take time off work? Are you a super genius who never makes mistakes?
Sean: Well, you said it for me! Actually, I do vanish from Pion at times, and sometimes I’m not the best employee because I’ll work on Pion instead of my day job. Ultimately, I’m just trying to live life to the fullest, and Pion has played different roles for me. At first, I had low self-esteem and wanted to prove myself, so I was eager for attention and validation. Then, when my dad passed away, working on Pion was therapeutic. Eventually, people started using Pion, so it became part of my identity and what I do.
Really, I’m happiest when I’m doing things for others—helping people with open-source software. If someday I’m 70 and I can look back and say I spent time with my kids and got lost in this open-source work, that’ll be enough.
Chad: So Pion is basically your hobby?
Sean: Pretty much, along with projects like adding WebRTC to OBS. I see that I can’t fix everything in the world, but I can make WebRTC broadcasting simpler. That’s what I focus on. With Pion, I hope to help others, create opportunities, and support them however I can. That’s enough for me.
SIP into OpenAI?
Chad: An audience member asked the question: What are your thoughts on bridging SIP Trunking into OpenAI? I understand LiveKit has a SIP bridge, but I am curious if you have thoughts on the overall telephony ecosystem and AI today.
Sean: I did the SIP stuff for LiveKit.
My hope is that the real-time API will support SIP eventually. I can’t promise roadmaps or timelines or when I’ll find time to do it, but that is like the next thing I think that would help people the most.
I’m always frustrated by protocol bridging because information is lost. When you bridge SIP and WebRTC, the RTCP means different things across [those protocol implementations] because, you know, it’s different software. My goal – I just want to make things as easy as possible.
Chad: When we were getting ready a couple of minutes before [we went on air], you asked me how I got into this [WebRTC] stuff. I said I worked at Acme Packet, which was a Session Border Controller [company] – essentially a protocol bridge. I still find it amazing [this is still done by specialized software vendors] when it is so easy to bridge protocols – just go ask ChatGPT. You have a well-defined thing here and a well-defined thing there, so it’s very easy to just go map it, right? I’m still amazed that we’re still doing a lot of this protocol bridging.
Sean: We’ve got to, otherwise we’re gonna lose our jobs. This is all I get paid to do. You’re wishing me out of a job here.
Chad: I’ve been out of the telecom world a little bit. A lot of it was just “let’s just make more protocols for job security”. Right?
Sean: Absolutely.
Chad: It’s even worse in the 3GPP and the wireless telecom stuff. Anyway, that’s a different topic.
How to build a career in WebRTC?
Question: How can a dev build a career in WebRTC?
Chad: One other… This is a more general question from Anonymous. How can a dev build a career in WebRTC?
Sean: My advice is to build something that clearly reflects your work—where success can be attributed back to you. I’ve done open-source contributions to projects like GStreamer, but if you can’t show exactly what you built, it’s harder to gain recognition. I notice people who create interesting projects with Pion often turn that into a job. I encourage newcomers to Pion to tackle a self-contained feature—like bandwidth estimation—so it’s clear what they alone contributed. Simply picking off small tickets doesn’t earn you the appreciation you deserve.
I also believe in saying “yes” to everything. Whenever a new Pion user comes in with a problem, I help them. You never know where it’ll lead. Seven years ago, I could never have imagined that working on Pion would land me at OpenAI. But here I am.
Chad: Yeah, no one knew much about OpenAI five years ago, but I get your point.
Question: I noticed your recent work in bringing WebRTC for embedded devices. Do you see WebRTC emerging as a clear winner for resource constrained embedded devices in the future for realtime, low latency comms?
Chad: We were chatting [before the stream] and I told you I wanted to hold the embedded conversation [until now].
I personally have been playing around with WebRTC on embedded devices [for a long time]. My [webrtcHacks] post from a few weeks ago is on this. Well, I don’t know if you count the Raspberry Pi as an embedded device anymore, but it is kind of in that direction. The use case there certainly is. I am curious [to hear] your thoughts [here].
Sean: WebRTC is nice because you don’t have to write anything on the client side. You just exchange an offer and answer. WHIP is what I think will help make WebRTC the clear winner [vs. other embedded protocols].
I guess the only other alternative that could maybe be the winner for low latency comms would be SIP. But I don’t see anything that has all of the right pieces to be the winner for low latency comms. WebRTC is open source, it’s not owned by anyone, it’s sub-second latency, and it’s got implementations everywhere. But you’re asking one of the most heavily biased people of all time – all I do is WebRTC.
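Since WHIP keeps coming up, here is how little the device side actually involves: one HTTP POST of an SDP offer and an SDP answer back. Below is a minimal Pion-based WHIP publish sketch of my own – the endpoint URL and bearer token are placeholders, and the code that feeds media into the track is omitted.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"

	"github.com/pion/webrtc/v3"
)

func main() {
	pc, err := webrtc.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		log.Fatal(err)
	}

	// The track the device will publish; feeding it samples (e.g. from a camera
	// or microphone encoder) is omitted here.
	track, err := webrtc.NewTrackLocalStaticSample(
		webrtc.RTPCodecCapability{MimeType: webrtc.MimeTypeOpus}, "audio", "device")
	if err != nil {
		log.Fatal(err)
	}
	if _, err = pc.AddTrack(track); err != nil {
		log.Fatal(err)
	}

	// Create the offer and wait for ICE gathering so the SDP is complete.
	offer, err := pc.CreateOffer(nil)
	if err != nil {
		log.Fatal(err)
	}
	gatherComplete := webrtc.GatheringCompletePromise(pc)
	if err = pc.SetLocalDescription(offer); err != nil {
		log.Fatal(err)
	}
	<-gatherComplete

	// WHIP: POST the SDP offer, get the SDP answer back. URL/token are placeholders.
	req, err := http.NewRequest("POST", "https://example.com/whip/endpoint",
		bytes.NewBufferString(pc.LocalDescription().SDP))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/sdp")
	req.Header.Set("Authorization", "Bearer my-stream-key")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	answerSDP, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	if err = pc.SetRemoteDescription(webrtc.SessionDescription{
		Type: webrtc.SDPTypeAnswer, SDP: string(answerSDP),
	}); err != nil {
		log.Fatal(err)
	}

	select {} // keep publishing
}
```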
Chad: Early on, I thought there’d be a lot more WebRTC-based IP cameras. I think part of it is just that the stack is heavier to run versus RTSP. I did think there’d be a lot more replacement of basic RTSP cameras with WebRTC. I haven’t looked so deeply recently on this.
Can you talk a little bit more about some of the OpenAI embedded projects being able to run on an ESP32 device? Didn’t you mention that a bit in your 12 Days of Christmas video? You gave a little demo there – maybe you could talk a little bit more about that too.
Here is the link to that point in OpenAI’s Dev Day Holiday Edition – 12 Days of OpenAI: Day 9 video that featured Sean:
Sean: Back to RTSP – I’ve been around this problem for a while. When I was at AWS, we worked on making a pure C implementation for embedded cameras. They were going after customers that were doing RTSP and RTMP. But because [those are] not secure by default, it became a problem. RTSP and RTMP are baked into the cameras. So now I’m hoping with WHIP and WHEP that we can get WebRTC into the cameras. And then there is a spec.
There’s a standard that cameras use – there’s a specification, and the specification is adding WebRTC support. So, we’ll have WebRTC in security cameras by default coming soon, though I don’t have any insight into that industry.
Chad: Ultimately it comes down to the chipmakers needing to build something in. I agree that WHIP makes a big difference. [They] just [need to] write to the WHIP interface. I suppose it’ll start with the broadcast industry doing that for higher-end encoders, and it’ll make its way down to the cheaper IP cameras eventually, right?
Sean: Yep. So we’ll see. And then you’ve got SRT that’s floating around. That seems like the only big competitor in that space. It’s really just SRT versus WebRTC. I think SRT has already won in the big space. My opinion is SRT has a behemoth behind it – and how long will that last? You know, it’s like the big corporate Unixes.
Someone shared the ONVIF WebRTC spec in the chat. I’m very excited for that. I think that will be great for the future.
The Open Network Video Interface Forum (ONVIF) is an industry alliance for IP-based physical security products, like cameras and video recorders. They recently added a formal WebRTC specification aimed at reducing latency and simplifying connectivity between security devices and browser-based clients in this ONVIF-WebRTC-Spec.
WebRTC DataChannels and LLMs
Question: A more global question about LLMs and WebRTC. Do you think LLMs will communicate with one another over WebRTC data channels? It could be a great way to make several LLMs work together. What do you think – what are your views on it? Extremely low latencies to communicate from one LLM to another? I think there could be great enhancement with that, but I do not know. What is your vision for this?
Chad: We have a global question about LLMs and WebRTC. Do you think LLMs will communicate with each other over WebRTC data channels to achieve ultra-low latency? Could that be a good way to make several LLMs work together?
Sean: I’ve heard some people describe LLMs like a nervous system—you have fast ones that talk to slower ones. But no one’s mentioned a standardized way for them to communicate yet, because it’s unclear how that might work or what it’d look like in practice. I think it’d be interesting to use something like WebRTC or another open standard. Maybe you could have a powerful GPU box in your home, running multiple LLMs that talk to each other.
That said, if you standardize too early, it might be ineffective since you don’t fully understand the problem space. We know how to standardize audio/video with WebRTC, but LLM communication is different. I’m not an AI expert; I just enjoy playing with it. So I’m not sure what the “lingua franca” for LLMs would be.
Chad: Right. Well, when they do want that, they’ll probably ask you.
Sean: Sure. I’ll let them know.
Reaching Sean and More info
I asked Sean to share some links for anyone that wants to follow-up with him:
Sean: The Discord is really nice ’cause you can just come in and talk to everyone. But if you wanna talk to me personally, send me an email.
Joe has this really nice thing where he has a channel just of interesting links. So I think if you join that Discord, then you’ll [find] WebRTC for the Curious and other implementations, all that stuff. I think that’s a good jumping off point.
Chad: I always say, Sean, you always get me pumped up about WebRTC. I’ve been doing this for a long time – sometimes you get down because it’s always the same stuff, [not like the early days] when everything was new and exciting and we were changing the world. WebRTC has now penetrated the world [so it is not so new]. It is exciting to see [WebRTC] get into major use cases [at] OpenAI. It is extra exciting to see that you’re involved in it and that you’re still willing to make this WebRTC stuff your hobby. Hopefully you have other hobbies, but it seems like it’s your sole hobby.
Thank you for keeping me going.
Sean: I love it. You just always find new stuff and that’s the beauty of life. I feel so lucky to be here. I thought I would be stuck in Toledo, Ohio my entire life. I can’t imagine that I get to work with all these things and talk to all these people. This was a way more colorful life than I ever thought I would have. And it’s all thanks to open source and WebRTC. So I’m hooked.
Chad: Okay. Oh, awesome. Well, thank you again and see you all next time.