I have been updating a WebRTC in Open Source dataset derived from GitHub event data for 10 years now. I periodically update this to look for recent trends on WebRTC activity, popular repos, and new API usage. I hosted a live stream of my 2024 review back in December where Tsahi Levent-Levi joined to help moderate. You can see an edited version of that video below.
The main sections are:
- Introductions
- Methodology
- General WebRTC Trends
- Top Repos, Orgs, and Users
- WebRTC Technology Trends
- Conclusions
- What’s next – methodology improvements
This post is an edited and annotated transcript of that video, with all of the charts and some additional comments. If you watched the live stream, then this will be a refresher with some clarifications. If you missed the live stream, then you can use the post to follow along or jump to the sections you care about.
{“editor”, “chad hart“}
Introductions
Chad: So today, we’ll be going through a review of some quantitative GitHub WebRTC analysis I did. Maybe to start, we can introduce ourselves. I’m Chad Hart. Hopefully you know who I am because I invited you all here. I am the editor of webrtcHacks.
Because I hate talking to myself, I invited Tsahi here. You always have great commentary, which I’m sure you will interject.
Tsahi Levent-Levi is a long-time WebRTC compatriot. He offers a variety of WebRTC Industry services over at BlogGeek.Me. We also ran the Kranky Geek WebRTC Event series together so you will hear a few mentions of that too.
Chad: What are we doing here? We’ll get into some general health trends. We’ll talk about top repos, orgs – who the big contributors are in WebRTC. And then I did some analysis looking at different WebRTC plus different technology trends to see how things are shaping out and what’s interesting. In the world of WebRTC, which I’m sure we could spend a lot of time talking about, well, today we’ll just stick mostly to the data.
Methodology
Chad: Back in 2015, I noticed that Google BigQuery published all GitHub commits there [in a public database]. I thought it’d be interesting to mine that data to look for trends in WebRTC. I’ve been maintaining and improving this methodology – looking to see what’s interesting, who’s the best – that sort of thing.
The core of my methodology is the BigQuery GitHub [Events] database. Colab is another nice tool they use, because you can actually run queries. I use this mostly because it has a notebook to keep track of what I’m doing. Then to make pretty charts, there’s nothing better in my mind than Excel. So, I usually dump stuff in Excel and make a lot of the graphs there.
What is this data? In an ideal world, I could go through and search all the source code on GitHub and do a bunch of analysis there. But that’s way too big and not readily available, so that’s not practical or possible today. But we do have this Events database, which has all the events that come out of GitHub here. Let’s do some analysis based on the different types of events you can see here:
I mine for keywords within those events. The hard part is finding distinct keywords that don’t have multiple meanings or don’t mean something else – that is a lot of what I’ve improved over time. What really takes forever is going through and like cleaning out all the anomalies and fixing all the stuff in there. But I guess [data cleansing] is part of general data science.
Minimal dataset inclusion keywords | webrtc, getusermedia, peerconnection, rtpsender, rtpreceiver, rtptransceiver, rtcdtlstransport, icetransport, rtctrackevent, \b(stun, turn).?server, mediastreamtrackprocessor, mediastreamtrackgenerator, webcodecs |
Category | Keyword(s) |
keywordWebrtc | webrtc |
keywordGum | getusermedia |
keywordStunTurn | stun server, turn server |
keywordPc | peerconnection |
keywordPcExt | rtpsender, rtpreceiver, rtptransceiver, rtcdtlstransport, icetransport, rtctrack |
keywordInStr | mediastreamtrackprocessor, mediastreamtrackgenerator |
keywordWebCodecs | webcodecs, videoencoder, audioencoder, videodecoder, audiodecoder |
keywordAV1 | av1, aomedia, dav1d |
keywordH265 | h265, h.265, hevc |
keywordWhip | whip |
keywordWhep | whep |
keywordMoq | moq, webtransport, quic |
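To make the keyword matching more concrete, here is a minimal Python sketch of how event text could be tagged with the categories in the table. It is an illustration under assumptions, not the code from the notebook: the `CATEGORY_PATTERNS` dictionary, the `tag_keywords` function, and the exact regular expressions are hypothetical, and only a subset of the categories is shown. The word boundaries (`\b`) are one way to address the "multiple meanings" problem mentioned above.

```python
import re

# Category -> regex, loosely transcribed from the table above.
# The patterns are illustrative guesses, not the notebook's actual code.
CATEGORY_PATTERNS = {
    "keywordWebrtc": r"webrtc",
    "keywordGum": r"getusermedia",
    "keywordStunTurn": r"\b(?:stun|turn).?server",
    "keywordPc": r"peerconnection",
    "keywordWhip": r"\bwhip\b",  # word boundaries are an added assumption
    "keywordWhep": r"\bwhep\b",
}

COMPILED = {
    name: re.compile(pattern, re.IGNORECASE)
    for name, pattern in CATEGORY_PATTERNS.items()
}


def tag_keywords(text: str) -> set:
    """Return the set of categories whose pattern appears in the event text."""
    return {name for name, rx in COMPILED.items() if rx.search(text)}
```

For example, `tag_keywords("Add RTCPeerConnection with a TURN server fallback")` tags the text with `keywordPc` and `keywordStunTurn`, while `tag_keywords("return server_status")` tags nothing, because the leading `\b` keeps "return server" from matching "turn server".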
If anyone ever wants to check my work, you can see my Google Colab here: OSS Analysis – Dec 2024.ipynb
General WebRTC Trends
Chad: With that, let’s dive into some of the big WebRTC trends.
Monthly Unique WebRTC Activity
Chad: I probably don’t need to remind anybody the pandemic ended. Things are maybe not quite the same as before. This chart here shows the number of unique counts per month [for repos, orgs, and users].
To explain “unique counts” further – for example, if a repo/user/org has 500 counts of a given keyword a month, I only show it once in this chart. This helps to normalize when there is a lot of activity around one thing – like when a single line of code gets referenced many times. I also remove bots.
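The "unique counts" idea can be sketched in a few lines of Python. This is a simplified illustration, not the notebook's code: the `(month, repo, user)` tuple layout is a hypothetical input format, and filtering on a `[bot]` login suffix is only an assumed stand-in for whatever bot removal the real analysis uses.

```python
from collections import defaultdict


def monthly_unique_counts(events):
    """Count distinct repos and users per month.

    events: iterable of (month, repo, user) tuples, one per matching event.
    """
    repos = defaultdict(set)
    users = defaultdict(set)
    for month, repo, user in events:
        if user.endswith("[bot]"):  # assumed bot heuristic
            continue
        # Sets collapse repeated activity: 500 events from one repo count once.
        repos[month].add(repo)
        users[month].add(user)
    return {m: {"repos": len(repos[m]), "users": len(users[m])} for m in repos}
```

With this approach, a user who triggers the same keyword hundreds of times in a month contributes exactly one to that month's user count.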
I have a few different categories of events. The chart below shows the trends across the following event categories:
- Coding – commits, pull requests, pull request reviews
- Leecher – any event that doesn’t include the above
- Popularity – forks, watches, issue posts, and issue comments
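As a rough sketch, the categories above could be mapped onto GitHub event types like this. The mapping is an assumption: the event type names are real GitHub event types, but treating commits as `PushEvent` and treating Popularity as a subset of non-coding events is one reading of the list, not a confirmed detail of the analysis. The `categorize` function is hypothetical.

```python
# Assumed mapping from the categories above to GitHub event type names.
CODING_EVENTS = {"PushEvent", "PullRequestEvent", "PullRequestReviewEvent"}
POPULARITY_EVENTS = {"ForkEvent", "WatchEvent", "IssuesEvent", "IssueCommentEvent"}


def categorize(event_type: str) -> set:
    """Return the categories an event type falls into."""
    # Anything that is not a coding event counts as a leecher event.
    categories = {"coding"} if event_type in CODING_EVENTS else {"leecher"}
    if event_type in POPULARITY_EVENTS:
        categories.add("popularity")
    return categories
```

Under this reading, a `WatchEvent` is both a leecher event and a popularity event, while a `PushEvent` is only a coding event.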
Chad: The total number of GitHub users that have done something WebRTC-related peaked during the pandemic. But you can see, actually, we’re not doing too shabby [in comparison now]. Earlier in March, we had some new peaks:
The numbers are roughly around the same towards the end [of the year]. So, in general, I’d say we are doing okay.
Tsahi – I don’t know if you agree?
Tsahi: I’ve looked at the usage, less at the developers. In the usage, we had [WebRTC usage] growth by a factor of four between before the pandemic [compared] to after the pandemic. This [data] seems almost the same. The number of repos jumped by a factor of two and stayed [level], which makes sense. The number of users grew by a factor of 2.5 and then went down to 1.5 or two, somewhere in between that. It seems in-line.
Events by type of activity
Chad: If you look at code events, we had a new peak also in March, which I think actually is pretty interesting. WebRTC [code events] haven’t gone down a lot, even though WebRTC is definitely a maturing technology. It is encouraging, right, that we still have new peaks and there hasn’t been a big drop off. At some point this is going to happen, but we’re not there yet.
Here is another view of the same thing: distinct users that had some sort of coding event. This is higher than ever – actually higher than even during the pandemic times. So not too bad.
Tsahi: How do you explain it?
Chad: Well, there are a lot of people doing new stuff who weren’t here before. We’ll get to WHIP and WHEP and stuff like that. But, as an example, there are new WebRTC broadcasting tools, and WebRTC is in things like OBS for the first time. It expanded into new communities where it didn’t exist before.
Open Broadcaster Software (OBS) is an open source video production studio that is very popular for streaming. I did a review of the official introduction of WebRTC there in WebRTC cracks the WHIP on OBS.
I talk more about WHIP and WHEP specifically below. As I will cover later, there isn’t necessarily one community growing more than the rest. There are many new communities. In aggregate it seems these are helping to continue WebRTC’s coding event growth on top of maintenance of existing projects.
Tsahi: I have a nagging question that I’m sure others have. At least those with OCD. What happened in October 2021?
Chad: Oh, yeah. That’s in every graph. There’s an example where – for whatever reason – the volumes were like half of what they were supposed to be, and it didn’t make any sense. So I just excluded that whole month.
Even this October. I didn’t have time to fully investigate it. I didn’t see anything obvious there, but this current October is [also] down. And that’s unusual because October is usually busier, seasonality wise.
That’s why there’s a hole there.
Tsahi: Okay.
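For anyone reproducing the charts, excluding a suspect month can be as simple as blanking it out so it renders as a gap instead of a misleading dip. The sketch below is an illustration only: `EXCLUDED_MONTHS` and `drop_excluded` are hypothetical names, and any counts passed in would be placeholders, not real data.

```python
from typing import Dict, Optional

# Months excluded from the charts because the raw volumes looked wrong.
EXCLUDED_MONTHS = {"2021-10"}


def drop_excluded(monthly_counts: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Replace counts for excluded months with None so charts show a hole."""
    return {
        month: (None if month in EXCLUDED_MONTHS else count)
        for month, count in monthly_counts.items()
    }
```

Keeping the month as a key with a `None` value, instead of deleting it, preserves the time axis so the gap is visible.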
Coding vs. Leeching
I have also been studying the percentage of “coders” vs. “leechers” according to the definitions in the previous section. The start of the pandemic brought a sharp increase of leechers. This shouldn’t be too surprising in retrospect – we had a huge influx of new developers looking to do video-calling things. WebRTC has a relatively steep learning curve, so it isn’t realistic they would be able to contribute immediately.
Chad: I also took a look at coding and leecher events. No surprise – during the pandemic, a lot of people coming on just wanted to use the code. They didn’t know anything about [WebRTC] and weren’t contributing. The positive thing here is now the coder events and leecher events have flipped. There are more people contributing code than just passively using these repos. So that’s also a good sign of health.
We have effectively more active WebRTC than ever.
This was on an overall event count basis. I did not have time to go into this on the live stream, but I also looked at the average event count per user. This doesn’t show a switch between coding and leeching, but you can see a steady increase in coding events with a decline in leeching. 44% of 7226 distinct users with WebRTC activity this last November did a push, pull request, or pull request review.
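As a quick worked example of that last figure, the percentage can be turned back into an approximate head count. The two inputs come from the text above; because 44% is a rounded value, the result is only approximate.

```python
distinct_users = 7226  # distinct users with WebRTC activity in November
coder_share = 0.44     # share with at least one coding event (rounded)

# Approximate number of users with a coding event, and the remainder.
coders = round(distinct_users * coder_share)
non_coders = distinct_users - coders

print(coders, non_coders)
```

That works out to roughly 3,180 users with a coding event and about 4,050 without one.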
Top Repos, Orgs, and Users
I spent about 8 minutes reviewing this year’s top open-source repos, organizations, and individual developers. I revised my methodology from how I did this in the past. It can be hard to see all the individual repos and users in these tables. So instead, I will be giving this its own dedicated post with full 2024 numbers sometime soon.
See the video for most of it if you can’t wait.
WebRTC Technology Trends
In this next section, I filter the dataset around some specific technology trends within WebRTC:
- WHIP and WHEP – WebRTC-HTTP Ingestion Protocol and WebRTC-HTTP Egress Protocol – relatively new protocols to aid in real-time live streaming
- Media over QUIC (MoQ) – bring your own media communications stack leveraging the latest HTTP/3
- Artificial Intelligence (AI) & Machine Learning (ML) – both the use of WebRTC to help in this area (like with ChatGPT Real Time) and use of AI/ML to make RTC better
- WebXR – WebRTC use in Augmented, Virtual, and Mixed reality applications
Chad: Which of these interested you most?
Tsahi: That’s a great question. So for me, I am most interested in two and three.
Chad: Ok – the AI/ML and QUIC?
Tsahi: Yeah. AR/MR/XR – we’ve been talking about it for the last 10 years, but nothing interesting. WHIP and WHEP are just too simple. It’s like, OK, so we have signaling.
WHIP and WHEP
Chad: Let’s talk about WHIP and WHEP first.
Chad: This has some decent growth, but is it anything insane? It is still a relatively small community doing this.
Here’s some analysis [of top projects] I did:
Chad: This actually just looks through July and shows the top repos by popularity. You can see many kinds of broadcast and streaming-related companies in there. I’ll talk a little bit more about XR and that metahuman-stream [repo later].
So there is a community of broadcasters that are doing this.
WebCodecs, WebTransport, MoQ
For background on this topic, see our last livestream on WebCodecs, WebTransport, and the future of WebRTC.
Chad: And then the next one, we’ll look at WebCodecs, WebTransport and the Media over QUIC stuff. [MoQ] could potentially move [RTC apps] off WebRTC.
You can see, there’s a pretty clear uptick of just WebCodecs in here. WebCodecs plus WebRTC is not [what I] expected. It is not completely flat, but there’s not more than two or three [active repos]. These are very small numbers.
Tsahi: The issue is the numbers. They’re not interesting to begin with.
Chad: Exactly. I was surprised. It’s a lot, lot lower than I thought.
I also looked specifically at QUIC, Media over QUIC, and WebTransport.
The numbers here aren’t too bad, right? We’re getting up to like 160 repos here. This is actually a little more interesting.
I took a snapshot of some of the repos in 2024 that are doing some stuff here. So, some potential there, but still not a ton.
AR / MR / XR
Chad: Here’s a new one I just did – XR.
Chad: This one actually has pretty steady growth. This is another case where I explicitly look for WebRTC [plus an XR term]. You can see my insane regular expression that I used.
This actually had some decent growth, so I think this could be interesting. With all the stuff that Meta and Apple are doing here, I guess you’re bound to see some [activity].
I forgot to add the data table with some repo examples on the live stream, so here are the top 10 by number of users with an event:
Repo | Description | Events |
EtherealEngine/etherealengine | iR Engine: A customizable platform for immersive WebRTC social experiences. | 5803 |
ir-engine/ir-engine | iR Engine: WebRTC-enabled platform for immersive social experiences. | 1885 |
godotengine/godot | Godot Engine: Open-source game engine with Web export capabilities. | 65 |
WebKit/WebKit | WebKit: Cross-platform browser engine for web and media applications. | 140 |
mrlt8/docker-wyze-bridge | WebRTC bridge for Wyze cams; supports RTSP, RTMP, and HLS streaming. | 51 |
w3c/webappsec | Web App Security WG: Standards for secure web apps, including WebRTC. | 20 |
mozilla/hubs | Duck-themed WebVR spaces for multi-user collaboration using WebRTC. | 75 |
AlexxIT/WebRTC | Home Assistant component for real-time camera viewing via WebRTC. | 21 |
dart-lang/web | Dart package for lightweight browser API bindings with JS interop. | 33 |
dxos/dxos | DXOS: TypeScript SDK for real-time, peer-to-peer collaboration apps. | 26 |
The first two are the same project – IR Engine looks to be a mature and active project. Their readme says:
AI and ML
Chad: Lastly [we have] AI and ML. I got to reuse our old KrankyGeek logo here.
ML was the sole theme of our 2018 Event and we made sure to include it in some form in every event after.
Here I looked for “WebRTC” plus “ai”, “ml”, “gpt”, or “llm” (regex /\bai\b|llm|gpt|ml/i). I was expecting more hits on this topic. I previously tried to run a similar regular expression match in my GitHub WebRTC dataset. Unfortunately, that surfaced more noise than useful information, so I did not include it.
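One possible source of noise with a pattern like this can be shown with a small Python sketch. This is speculation for illustration, not a diagnosis from the original analysis: the `ai_hits` helper is hypothetical, and the sample strings are made up. The point is that only the `ai` term has word boundaries, so the unanchored `ml` term can match inside unrelated words.

```python
import re

# The regex from the text, expressed with Python's re module.
AI_PATTERN = re.compile(r"\bai\b|llm|gpt|ml", re.IGNORECASE)


def ai_hits(text: str) -> list:
    """Return every substring matched by the AI/ML pattern, lowercased."""
    return [m.group(0).lower() for m in AI_PATTERN.finditer(text)]
```

For instance, `ai_hits("Update index.html and config.yaml for the WebRTC demo")` reports two `ml` hits from "html" and "yaml", even though the text has nothing to do with machine learning, while `ai_hits("Stream WebRTC audio to an LLM")` correctly reports `llm`.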
Instead, I included an analysis I did a few months earlier based on StackOverflow posts and answers. In that case, I had to build the dataset from the http-archive.
Sadly that one has not been updated since April 2024. As a result, I did not spend much time on this topic this time around.
This is something I need to research more.
Chad: I didn’t find much.
Tsahi: Probably mostly OpenAI.
Chad: Yeah, not really. Open AI was one of the terms in there. The GitHub data shows something similar. It’s not nothing, but not as much as you would think.
More on OpenAI in the conclusions next.
Conclusion: nothing too exciting but WebRTC is still doing OK
Chad: I think we are actually doing pretty well. Certainly, we had a kind of a sugar high off of the pandemic, but there are still some new peaks. [We’re doing well] in terms of the number of actual developers, people understanding WebRTC. There is still plenty of activity and new stuff going on. But the new tech is still pretty small compared to the traditional WebRTC stuff that we’ve been working with for a long time.
Media over QUIC is not replacing WebRTC today.
I really want there to be more ML in WebRTC and I didn’t see it. I’ll say, we know there are some companies doing stuff, but like this is again, public repos. So if somebody is doing this on a private repo [then] it’s not going to show up in my data set.
Tsahi: So I think that Media over QUIC will take a long time. It’s not that quick. And then the ML stuff is done behind closed doors or not in open source. That would be my conclusion on these things.
I also speculate that there is AI in WebRTC going on in closed-source that isn’t being replicated in the open-source domain. Still, I was hoping to find something like a fork of LLAMA that provides real-time chat like Open AI’s recent real-time API with WebRTC. I didn’t see much of that. However, this area is moving fast. Maybe DeepSeek will have an R1 realtime soon?
Chad: I agree. In other areas you see [closed-source] copycats in open source, those trying to replicate some of the closed source stuff, right? There’s not as much of that in ML [with WebRTC].
Tsahi: Yes, but that’s because the only, let’s say, killer scenario or use case here is connecting WebRTC to OpenAI. And that’s something that was only announced or started lifting off in November this year when the real-time API from OpenAI was announced. So you’ll likely see that in next year’s numbers. And most of it is going to be around LLMs and OpenAI, and less around machine learning and AI in general, I think.
My dataset presented here was through November, and OpenAI’s Realtime API with WebRTC was released afterwards in December. We have some posts coming up around OpenAI and RTC, so I’ll hold comment on that topic until then.
What’s next – methodology improvements
We did a short Q&A section. I am just including this excerpt here. Tsahi started with a question for me.
Tsahi: I have a question around the methodology itself. You did that in the last couple of years. What do you plan or think that you should change or improve for next year? Where do you see gaps in the way that you do the analysis?
Chad: Well, I’d like to do more of the [updated 2024 methodology] I did. Ideally I would go and rerun that analysis [for] every year. But the cost – it’s actually getting pretty expensive. The [minor] updates I did in the last couple of days probably cost me like 75 bucks or something like that. It would be hundreds of dollars every time I go and rebuild the data set from scratch.
And unfortunately, I realize I screwed up stuff and there are little tweaks you want to make every time. So, you have to assume every time I [do a run], it probably needs to be done a few times. It starts to get expensive.
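To see why repeated runs add up, here is a back-of-the-envelope Python sketch of on-demand query cost. Everything in it is an assumption for illustration: the function names are hypothetical, the default rate of 6.25 USD per TiB scanned should be checked against current BigQuery pricing, and the byte counts in the usage note are invented, not measurements from the actual dataset.

```python
TIB = 2**40  # bytes in one tebibyte


def estimate_query_cost(bytes_scanned: int, usd_per_tib: float = 6.25) -> float:
    """Estimate on-demand cost for a query that scans bytes_scanned bytes."""
    return bytes_scanned / TIB * usd_per_tib


def estimate_rebuild_cost(bytes_per_run: int, runs: int, usd_per_tib: float = 6.25) -> float:
    """Estimate total cost when a rebuild has to be repeated several times."""
    return runs * estimate_query_cost(bytes_per_run, usd_per_tib)
```

For example, a hypothetical rebuild that scans 10 TiB and has to be rerun three times comes to `estimate_rebuild_cost(10 * TIB, 3)`, or 187.50 USD at the assumed rate.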
Probably not from the very beginning, because that’s not practical, but [I would like to] at least rebuild the data set for the last few [years], since the pandemic? Start with a clean data set with all the new repos and then include the larger expanded list of WebRTC[-related repos and include] not just the pure events that have WebRTC terms in them, but basically everything that happens in those repos. That actually might change some things. The keyword analysis is still based on the keywords, so [that approach] is not going to change anything there. But in terms of the overall trend in activity and users, it’s possible that could change in there.
We don’t have avatars built into this yet. That’ll make more sense.
Chad: Well. Thank you so much for joining. Sorry, it was good to hang out again. I miss the old KrankyGeek days, but this is maybe the next best thing. Thank you all for joining and we’ll be in touch!
Tsahi: Bye everyone.
Do you have any data analysis requests? Let me know in the comments so I can look the next time I dive into this data.