WebRTC had its heyday during the peaks of the pandemic, but how is it doing now? Did all those new projects die, putting the community back at pre-pandemic “normal” levels, or is WebRTC still going strong? Are there many new WebRTC-related repos? How many new users is WebRTC attracting? Is the community coding as much as it used to? I built a dataset from over a million GitHub events since 2019 to help answer these questions and examine other WebRTC developer trends.
I found we are certainly past peak, but by nearly every measure WebRTC is better off than it was before. I also dug into some trends within this dataset, specifically looking at newer APIs like Insertable Streams and WebCodecs.
Many details below…
Quick Methodology Review
The core question here is what counts as “WebRTC” to be included. Ideally, I would be able to search through all of GitHub’s file commits and changes over time, looking for usage of WebRTC-related APIs to build a list of repos, then analyze that. I actually attempted to do something like that with bigquery-public-data.github_repos.contents, which has copies of GitHub file contents, but I discovered that hadn’t been updated in about 5 years. So, I turned back to the methodology I used for my 2019 and 2015 posts using the GH Archive, which tracks every individual GitHub event.
The events don’t contain the raw file contents, so I search through that database looking for WebRTC-related keywords that show up anywhere in the event data. Most of these keywords are the same as my 2019 post, but I added some for Insertable Streams and WebCodecs:
Category | Keyword(s) |
keywordWebrtc | webrtc |
keywordGum | getusermedia |
keywordStunTurn | stun server, turn server |
keywordPc | peerconnection |
keyworkdPcExt | rtpsender, rtpreceiver, rtptransceiver, rtcdtlstransport, icetransport, rtctrackevent |
keywordInStr | mediastreamtrackprocessor, mediastreamtrackgenerator |
keywordWebCodecs | webcodecs, videoencoder, audioencoder, videodecoder, audiodecoder |
I found a lot of false positives on the audio and video encoder and decoder terms, so I only included them in my analysis if they contained another term in the table above. I’ll cover that more below.
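As a rough illustration of the keyword matching and the false-positive rule, here is a minimal Python sketch. The keyword lists come from the table above; the function name `match_categories`, the `NOISY_TERMS` set, and the choice to apply the rule per event are assumptions for illustration, not the actual queries from the notebook.

```python
# Keyword table from above; the matching logic is an illustrative assumption.
KEYWORDS = {
    "keywordWebrtc": ["webrtc"],
    "keywordGum": ["getusermedia"],
    "keywordStunTurn": ["stun server", "turn server"],
    "keywordPc": ["peerconnection"],
    "keyworkdPcExt": ["rtpsender", "rtpreceiver", "rtptransceiver",
                      "rtcdtlstransport", "icetransport", "rtctrackevent"],
    "keywordInStr": ["mediastreamtrackprocessor", "mediastreamtrackgenerator"],
    "keywordWebCodecs": ["webcodecs", "videoencoder", "audioencoder",
                         "videodecoder", "audiodecoder"],
}

# Terms that produced many false positives on their own.
NOISY_TERMS = {"videoencoder", "audioencoder", "videodecoder", "audiodecoder"}


def match_categories(event_text):
    """Return the set of keyword categories that an event's text matches."""
    text = event_text.lower()
    strong = set()  # categories hit by a reliable term
    noisy = set()   # categories hit by a noisy encoder/decoder term
    for category, terms in KEYWORDS.items():
        for term in terms:
            if term in text:
                (noisy if term in NOISY_TERMS else strong).add(category)
    # Count noisy hits only when some other term from the table also matched.
    return strong | noisy if strong else set()


print(match_categories("Fix VideoEncoder settings in ffmpeg wrapper"))
print(match_categories("Use VideoEncoder with RTCPeerConnection"))
```

The first call matches only a noisy term, so it is dropped; the second also mentions a peer connection, so both categories count.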
I also group the events into types for different analyses:
Category | Events |
code | Push, Pull, PullRequestReview |
Issue | Issues, IssueComment |
popularity | Fork, Watch, Issue, IssueComment |
contribution | PullRequest, PullRequestReview, Issues, IssueComment |
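The grouping in this table can be sketched as a simple lookup. This is only an illustration: it assumes GH Archive type names such as `PushEvent`, reads “Pull” in the table as PullRequest and “Issue” in the popularity row as Issues, and uses a hypothetical helper name `groups_for`.

```python
# Groups from the table above, keyed by GH Archive event type without the
# "Event" suffix. Assumption: "Pull" means PullRequest, "Issue" means Issues.
EVENT_GROUPS = {
    "code": {"Push", "PullRequest", "PullRequestReview"},
    "issue": {"Issues", "IssueComment"},
    "popularity": {"Fork", "Watch", "Issues", "IssueComment"},
    "contribution": {"PullRequest", "PullRequestReview",
                     "Issues", "IssueComment"},
}


def groups_for(event_type):
    """Return every analysis group a GH Archive event type falls into."""
    suffix = "Event"
    name = event_type[:-len(suffix)] if event_type.endswith(suffix) else event_type
    return {group for group, types in EVENT_GROUPS.items() if name in types}


print(groups_for("IssueCommentEvent"))
print(groups_for("PushEvent"))
```

Note that the groups overlap on purpose: an issue comment counts toward the issue, popularity, and contribution views at the same time.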
The methodology isn’t perfect, but I did quite a bit of examining the results and it holds up pretty well. One difference this time is I discovered a lot of bots that added noise to the data, so I did my best to exclude them. I also noticed Oct 2021 seems to be missing a large amount of data – I didn’t see a good explanation why. You might notice I removed that from some of the graphs below.
I ran my query for 2019 through September 2022 and it spit out the following:
Total Events | 1,069,145 |
Unique Repos | 69,148 |
Unique Users | 163,399 |
If you want more details, see a sanitized Jupyter notebook with final queries here. I made most of the graphs in Excel because I don’t have the patience to format matplotlib charts in Pandas.
How is WebRTC Doing?
We had a global pandemic and remote Real Time Communications really took off. Things are starting to get back to normal now. How does that look for WebRTC?
Distinct Users
Looking at the count of distinct users per month that showed up in a GitHub event, there is a very clear peak in April 2021 with a total of 10,218 users. It is flattening out in 2022 if not slightly declining, but we are still at 62% of peak, well above pre-pandemic levels.
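For readers who want to reproduce this kind of count, a minimal sketch follows. It assumes events reduced to (ISO date, user login) pairs; the function names and the sample rows are made up for illustration and are not taken from the notebook.

```python
from collections import defaultdict


def distinct_users_per_month(events):
    """Map 'YYYY-MM' to the number of distinct users with an event that month.

    `events` is an iterable of (iso_date, user_login) pairs.
    """
    users = defaultdict(set)
    for iso_date, login in events:
        users[iso_date[:7]].add(login)
    return {month: len(logins) for month, logins in sorted(users.items())}


def share_of_peak(monthly_counts, month):
    """Return a month's count as a fraction of the highest monthly count."""
    return monthly_counts[month] / max(monthly_counts.values())


# Made-up sample events, not real data.
sample = [
    ("2021-04-02", "alice"), ("2021-04-09", "bob"), ("2021-04-09", "alice"),
    ("2021-04-20", "carol"), ("2021-04-21", "dave"),
    ("2022-09-01", "alice"), ("2022-09-15", "erin"), ("2022-09-15", "erin"),
]
counts = distinct_users_per_month(sample)
print(counts)
print(share_of_peak(counts, "2022-09"))
```

Applied to the real numbers, 62% of the 10,218-user peak works out to roughly 6,300 distinct users in a month.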
The story is different if you look at contribution events – my interpretation of real activity:
You see a similar pandemic peak in May – I suppose it took new users a month of exploring WebRTC before they could really start contributing code and real feedback. However, 2022 shows a continued increase in activity with a recent peak this past August. This coincides with a similar pattern of the term “WebRTC” being used.
Who is driving the recent growth?
I looked to see if just a few repos were driving the growth – I selected the top 25 repos in Aug 2022 by contribution activity and looked to see how they have changed over time:
There’s a lot going on there, but peering into the data you notice it isn’t one repo driving the change. Most of the Aug-2022 Top 25 showed increases in 2022 with a few newer projects, but mostly established ones. My conclusion is that this isn’t just an anomalous single project or a few projects driving increased WebRTC development – it is a bunch of them. That’s a good sign for WebRTC’s health.
WebRTC vs. All of GitHub
WebRTC is just one of many technologies developers could be working on. I find it helpful to put WebRTC in perspective. In this case, I compared my WebRTC dataset against all of GitHub on a similar basis (unique repos / users / orgs with an event in a given month). Obviously, all of GitHub has more volume, but how is WebRTC trending in comparison?
The below graph looks at how the total repo count changes over time:
You can see WebRTC’s pandemic spike. GitHub has a bit of a spike, but is more of a steady incline. Again, remember these are repos with an event that month – not dormant ones.
What about on a unique-user per-month basis (unique users with an event to be more precise)?
The All of GitHub pattern looks the same as it does for Repos. WebRTC is becoming proportionally less relevant in the githubverse. However, this is what I would expect – GitHub is attracting new developers and projects that were previously in other systems and not everyone needs WebRTC.
Do we have more Leechers now?
One sign of health is how often users are submitting code vs. just using it. It’s hard to find how often a given repo is used, but we can use the percent of non-code events vs. all others as a proxy. I call these leechers. Some of these are legitimate bots, but most aren’t. The below shows distinct users without a code event per period:
Unsurprisingly, the pandemic initially brought an increase in leechers, but there was a meaningful dip in July 2020 – one of the periods with the highest number of users overall. Perhaps it took a couple of months with WebRTC to start contributing code?
The leecher % has actually been gradually declining. The last few months have the lowest percentages – I see this as a good sign.
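One way to compute a leecher percentage like this is sketched below. It assumes events as (ISO date, user login, event type) tuples and treats push, pull request, and pull request review events as code events; the `CODE_EVENTS` set and the function name are assumptions for illustration.

```python
from collections import defaultdict

# Assumed set of GH Archive event types that count as "code" events.
CODE_EVENTS = {"PushEvent", "PullRequestEvent", "PullRequestReviewEvent"}


def leecher_share_by_month(events):
    """Return {'YYYY-MM': fraction of distinct users with no code event}.

    `events` is an iterable of (iso_date, user_login, event_type) tuples.
    """
    all_users = defaultdict(set)
    coders = defaultdict(set)
    for iso_date, login, event_type in events:
        month = iso_date[:7]
        all_users[month].add(login)
        if event_type in CODE_EVENTS:
            coders[month].add(login)
    return {
        month: len(users - coders[month]) / len(users)
        for month, users in sorted(all_users.items())
    }


# Made-up sample events, not real data.
sample_events = [
    ("2022-08-01", "a", "PushEvent"), ("2022-08-02", "a", "WatchEvent"),
    ("2022-08-03", "b", "WatchEvent"), ("2022-08-04", "c", "ForkEvent"),
    ("2022-08-05", "d", "IssuesEvent"),
]
print(leecher_share_by_month(sample_events))
```

A user who both pushes code and stars a repo in the same month is not a leecher under this definition, since any single code event is enough to count them as a coder.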
Is WebRTC attracting new users?
I looked to see the number of new users who had any event. Since my data set starts in 2019, everyone at the beginning of that period would be counted as new so I removed 2019 from the chart below:
For this analysis, I would expect the number to go down slightly since the odds of a user doing something again with one of these repos will go up over time. We see the same pandemic peaks bringing in new users. Surprisingly, about half of all users in recent months are first-time appearances. That percentage has stayed fairly flat of late.
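The first-appearance logic can be sketched as follows, assuming events as (ISO date, user login) pairs. The function name and sample data are hypothetical; a user counts as “new” only in the month of their first event in the dataset.

```python
def new_user_share_by_month(events):
    """Return {'YYYY-MM': (new_users, active_users, new_share)}.

    `events` is an iterable of (iso_date, user_login) pairs. A user is "new"
    in the month of their first event in the dataset.
    """
    first_seen = {}
    active = {}
    for iso_date, login in sorted(events):
        month = iso_date[:7]
        first_seen.setdefault(login, month)  # keeps the earliest month only
        active.setdefault(month, set()).add(login)
    result = {}
    for month, users in active.items():
        new = sum(1 for login in users if first_seen[login] == month)
        result[month] = (new, len(users), new / len(users))
    return result


# Made-up sample events, not real data.
sample_pairs = [
    ("2019-12-05", "a"), ("2020-01-03", "a"), ("2020-01-10", "b"),
    ("2020-02-01", "b"), ("2020-02-02", "c"), ("2020-02-03", "d"),
]
shares = new_user_share_by_month(sample_pairs)
print(shares["2020-01"])
```

Every user in the earliest months is new by construction, which is why the first year needs to be dropped before charting.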
Keyword Trends
We can also gain some insights by seeing what keywords show up in these events.
The term “webrtc” is being used way more than the other keywords. It is showing a steady rise to an all-time high peak in August 2022. We’ll dig into that in a moment.
Here are the other keyword trends:
That chart is not so easy to follow, so I will dig into those in more detail too.
Why is “WebRTC” showing up so much?
I was curious about the increase in the keyword “WebRTC” in the past year, so I filtered the dataset to show contribution and code events vs. unique repos that use the term over time:
You can see the number of repos hasn’t changed much, but contributions and code events have both increased quite a bit. I also checked to make sure there wasn’t a recent increase in repos with WebRTC in the name – e.g. webrtc-rs/webrtc, which would give it extra weight in my methodology.
This took me down a rat hole to see if I could figure out what was going on. I tried to do some analysis of the commits that included the term “WebRTC”. Here were some of the top results:
phrase | cnt | repos |
Adds support for Tags on Amazon Chime SDK WebRTC sessions | 92 | 73 |
: Adds support for Tags on Amazon Chime SDK WebRTC sessions | 103 | 64 |
Source/ThirdParty/libwebrtc/libwebrtc.xcodeproj/project.pbxproj:\n | 171 | 55 |
feature: ChimeSDKMeetings: Adds support for Tags on Amazon Chime SDK WebRTC sessions | 114 | 53 |
Source/WebKit/WebProcess/GPU/webrtc/RemoteVideoFrameObjectHeapProxyProcessor.cpp:\n | 109 | 45 |
Adds support for Tags on Amazon Chime SDK WebRTC sessions | 93 | 44 |
Source/WebKit/GPUProcess/webrtc/RemoteAudioMediaStreamTrackRendererInternalUnitManager.cpp:\n | 101 | 42 |
Source/ThirdParty/libwebrtc/Configurations/Base.xcconfig:\n | 104 | 41 |
Source/WebKit/GPUProcess/webrtc/RemoteSampleBufferDisplayLayer.cpp:\n | 98 | 37 |
Source/WebKit/GPUProcess/webrtc/LibWebRTCCodecsProxy.mm:\n | 98 | 37 |
Source/WebCore/platform/mediastream/gstreamer/GStreamerWebRTCProvider.cpp:\n | 93 | 35 |
Source/WebKit/WebProcess/Network/webrtc/LibWebRTCProvider.h:\n | 98 | 34 |
Source/ThirdParty/libwebrtc/Configurations/libwebrtc.xcconfig:\n | 63 | 34 |
WebCore::GStreamerWebRTCProvider::initializeVideoEncodingCapabilities | 104 | 32 |
WebCore::GStreamerWebRTCProvider::initializeAudioDecodingCapabilities | 104 | 32 |
WebCore::GStreamerWebRTCProvider::senderCapabilities | 104 | 32 |
WebCore::GStreamerWebRTCProvider::receiverCapabilities | 104 | 32 |
WebCore::GStreamerWebRTCProvider::initializeVideoDecodingCapabilities | 104 | 32 |
WebCore::GStreamerWebRTCProvider::initializeAudioEncodingCapabilities | 104 | 32 |
Looking into these, many are related to Amazon Chime, GStreamer, or WebKit:
You can see a big increase in activity from repos that use these projects – especially for Chime. However, these 3 still account for only 3% of all the “WebRTC keyword only” repos in August 2022, so there is clearly more going on here.
What is driving STUN and TURN server peaks?
Looking just at the “STUN server” and “TURN server” keywords, we can see a few peaks with an increase last month (September 2022).
Here are the top repos for the latest peak in September 2022:
repoName | users | codes | contribs | events |
coturn/coturn | 50 | 294 | 294 | 294 |
libp2p/specs | 8 | 133 | 133 | 133 |
nextcloud/spreed | 17 | 68 | 68 | 68 |
nextcloud/talk-android | 12 | 67 | 67 | 67 |
libp2p/rust-libp2p | 3 | 32 | 32 | 32 |
eakraly/coturn | 1 | 28 | 28 | 28 |
bigbluebutton/bigbluebutton | 11 | 21 | 21 | 21 |
ddimaria/stun-server | 1 | 19 | 19 | 19 |
webrtc-rs/webrtc | 3 | 17 | 17 | 17 |
feugy/tabulous | 1 | 14 | 14 | 14 |
These 10 repos account for 50% of all the activity in that period. coturn has had a tremendous amount of activity recently. webrtcHacks will have a post with the new project leads there soon.
Insertable Streams
We have covered Insertable Streams a bunch of times recently, so I wanted to see if I could find any interesting projects that use it. This one doesn’t look too popular in terms of GitHub event messages, with only 114 repos referencing mediastreamtrackprocessor or mediastreamtrackgenerator.
Most of these were W3C, Chrome, and other new-spec-related activity. Removing those we only get a few repos:
repoName | org | users | codes | contribs | events |
HTTPArchive/legacy.httparchive.org | HTTPArchive | 4 | 19 | 19 | 19 |
Lightcord/Lightcord | Lightcord | 3 | 7 | 7 | 7 |
rustwasm/wasm-bindgen | rustwasm | 3 | 5 | 5 | 5 |
Vonage/media-processor | Vonage | 2 | 4 | 4 | 4 |
pixijs/pixijs | pixijs | 2 | 4 | 4 | 4 |
espeak-ng/espeak-ng | espeak-ng | 1 | 3 | 3 | 3 |
Automattic/jetpack | Automattic | 1 | 3 | 3 | 3 |
microphone-stream/microphone-stream | microphone-stream | 1 | 2 | 2 | 2 |
Xpra-org/xpra-html5 | Xpra-org | 2 | 2 | 2 | 2 |
gpac/mp4box.js | gpac | 1 | 1 | 1 | 1 |
highfidelity/Spatial-Audio-API-Examples | highfidelity | 1 | 1 | 1 | 1 |
php/php-src | php | 1 | 1 | 1 | 1 |
Scanning through these, most have minor references to those APIs, comments about how the project should use Insertable Streams, or some kind of API check. The most interesting to me:
- Vonage – stream effects
- highfidelity – spatial audio
- microphone-stream – mediaStream to node.js stream
- espeak-ng – browser text-to-speech
WebCodecs
WebCodecs is another new Web API with potential for interesting RTC use. I was also hoping to find some interesting WebCodecs use cases. Here is the event history, split by use of the specific term “webcodec” and the APIs which tend to have more false positives.
The “webcodec” peak in the summer of 2021 corresponds with Chrome’s non-origin trial release and more W3C activity. The recent peak did not reveal much either – of the 153 repos, most were WebKit-related (lots of forks) or W3C activities.
None of these had a lot of activity, but here are some of the interesting ones I found out of the top 100 or so:
repo | About |
StaZhu/enable-chromium-hevc-hardware-decoding | A guide that teach you enable hardware HEVC decoding for Chrome / Edge , or build a custom version of Chromium / Electron that supports hardware & software HEVC decoding. |
guest271314/WebCodecsOpusRecorder | WebCodecs Opus Recorder/Media Source Extensions Opus EncodedAudioChunk Player |
ziyunfei/render-video | Convert canvas animation to video in browser using WebCodecs API |
ennuicastr/libavjs-webcodecs-polyfill | A polyfill for the WebCodecs API. (uses libAV from ffmpeg) |
cvisionai/tator | Video analytics web platform |
davedoesdev/streamana | Stream from your Web browser to YouTube Live. No plugins or native apps required! |
I was hoping to see more. One of the bigger ones that utilizes WebCodecs is Intel’s Open WebRTC Toolkit (Kranky Geek video on that), but I only saw the upstream WebRTC source fork in the Aug 2022 data.
To be continued…
What about the repos, orgs, and users? I started playing around with a new methodology, but I forgot how time-consuming this kind of analysis can be. I blew way past my time budget on this, so that will be coming soon!
{“author”: “chad hart“}