I logged into YouTube on Tuesday and noticed a new camera icon in the upper right corner with a “Go Live (New)” option, so I clicked on it to try it out. It turns out you can now live stream directly from the browser. This smelled a lot like WebRTC, so I loaded up chrome://webrtc-internals to check and, sure enough, it was WebRTC. We are always curious to see how large-scale deployments are implemented, so I immediately asked WebRTC reverse engineering master Philipp “Fippo” Hancke to investigate further. The rest of this post is his analysis.
{“editor”, “chad hart“}
Chrome’s webrtc-internals page has served us well since the Hangouts analysis in 2014, and we are going to use it again this time. As YouTube had a cooldown of 24 hours after registration, we ended up analyzing a dump kindly provided by Tsahi Levent-Levi, which is available for download here. You can use this import tool to re-import the JSON dump into Chrome.
Note that the feature, which seems to work only in Chrome, uses WebRTC only for publishing the webcam stream; the receiver side does not use WebRTC. This means it’s not super-low latency, but our old buddy Chris Kranky reports less than five seconds at least. We eagerly await his think-piece on the topic!
Let’s dive into the technical details…
getUserMedia calls
At the top of the imported data we see the getUserMedia calls made by YouTube. getUserMedia gets called asking for a 1080p resolution from the webcam, and there is a separate getUserMedia call to acquire the microphone.
Note that while this is not shown in this dump the very first getUserMedia call will still ask for both audio and video, so that there will only be a single permission prompt in Chrome.
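The constraint objects from the dump are not reproduced here, so the following is only a sketch of what such a pair of requests could look like. The 1920x1080 `ideal` values and the helper names are assumptions for illustration, not YouTube's actual code.

```javascript
// Hypothetical helpers; names and exact constraint values are assumptions.

// Video-only constraints asking for a 1080p capture.
function videoConstraints() {
  return {
    audio: false,
    video: { width: { ideal: 1920 }, height: { ideal: 1080 } },
  };
}

// Audio-only constraints for the separate microphone request.
function audioConstraints() {
  return { audio: true, video: false };
}

// Acquire camera and microphone in two calls and merge the tracks into a
// single MediaStream. Browser-only: requires navigator.mediaDevices.
async function acquireMedia() {
  const video = await navigator.mediaDevices.getUserMedia(videoConstraints());
  const audio = await navigator.mediaDevices.getUserMedia(audioConstraints());
  return new MediaStream([...video.getTracks(), ...audio.getTracks()]);
}
```

Using `ideal` rather than `exact` lets the browser fall back to a lower resolution when the webcam cannot deliver 1080p.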
RTCPeerConnection calls
After getUserMedia we can dive into the RTCPeerConnection API calls. If you want to learn more about this see either the previous post on how Hangouts uses WebRTC or the more complete webrtc-internals documentation on the TestRTC blog.
ICE servers
We can see that the RTCPeerConnection object is created with an empty set of ICE servers:
{ iceServers: [], iceTransportPolicy: all, bundlePolicy: balanced, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 }
We will see later why TURN servers are not required for this use-case.
The client then adds a MediaStream using the addStream API. This API is deprecated, so the failure to dogfood the spec addTrack API, which shipped in Chrome 64 (and is polyfilled in adapter.js for older versions), is a bit disappointing.
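For comparison, here is a minimal sketch of the same setup using addTrack instead of addStream. The configuration values are the ones reported by webrtc-internals; the `attachTracks` helper is a hypothetical name of ours.

```javascript
// Configuration as reported by webrtc-internals: no STUN or TURN servers.
const rtcConfig = {
  iceServers: [],
  iceTransportPolicy: 'all',
  bundlePolicy: 'balanced',
  rtcpMuxPolicy: 'require',
  iceCandidatePoolSize: 0,
};

// Hypothetical helper: add every track of a stream with the spec addTrack
// API instead of the deprecated pc.addStream(stream).
// Returns whatever addTrack returns (RTCRtpSender objects in a browser).
function attachTracks(pc, stream) {
  return stream.getTracks().map((track) => pc.addTrack(track, stream));
}

// In a browser this would be used as:
//   const pc = new RTCPeerConnection(rtcConfig);
//   attachTracks(pc, stream);
```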
setLocalDescription signaling
The client then creates an offer with the full set of audio and video codecs supported by Chrome. Next, setLocalDescription is called without modifying that offer. In particular, that means it does not use simulcast.
As a result of the setLocalDescription call, the client generates some host candidates. Those are probably discarded, since it does not make much sense to send them to the server.
Update: the signaling server and protocol were relatively easy to find. Filtering for realtimemediaservice in Chrome’s network inspector shows the single HTTP request and response. No fancy signaling is required (and no trickle ICE); this is about as bare-bones as it gets.
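The captured request and response are not reproduced here. As a rough sketch of what a single-round-trip offer/answer exchange without trickle ICE can look like, consider the following. The `exchange` callback, and whatever URL or payload format it would use, are placeholders and not YouTube's actual protocol.

```javascript
// One-shot negotiation: create an offer, apply it locally unmodified,
// send it in a single request and apply the answer from the response.
// `exchange` is a hypothetical async function (offerSdp) => answerSdp,
// for example a thin wrapper around one HTTP POST.
async function negotiateOnce(pc, exchange) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const answerSdp = await exchange(offer.sdp);
  await pc.setRemoteDescription({ type: 'answer', sdp: answerSdp });
}
```

In this shape the local candidates are never sent anywhere, and the server's candidates arrive inside the answer SDP, so no separate candidate messages are needed.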
setRemoteDescription
Next comes the setRemoteDescription call with the server’s response. This is where it gets really interesting. The SDP looks exactly like what Chrome or the webrtc.org library usually generates, answering with the full list of supported codecs. Quite notably, it is not using ice-lite as Hangouts does, which suggests YouTube is using a different infrastructure.
H264 is preferred by the server in the m= line:
m=video 9 UDP/TLS/RTP/SAVPF 102 96 97 98 99 123 108 109 124
(the 102; see here for an SDP refresher), while the Opus codec is used for audio.
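Reading the preference off the m= line is mechanical: the payload types are listed in order of preference, so the first one wins. A small sketch, with helper names of our own choosing:

```javascript
// Parse an SDP m= line into its media kind, port, protocol and payload types.
function parseMLine(line) {
  const [kind, port, proto, ...pts] = line.replace(/^m=/, '').trim().split(/\s+/);
  return { kind, port: Number(port), proto, payloadTypes: pts.map(Number) };
}

// The first payload type listed is the most preferred one.
function preferredPayloadType(line) {
  return parseMLine(line).payloadTypes[0];
}

const mLine = 'm=video 9 UDP/TLS/RTP/SAVPF 102 96 97 98 99 123 108 109 124';
console.log(preferredPayloadType(mLine)); // 102
```

Mapping 102 back to a codec name requires the matching a=rtpmap line from the same SDP.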
Inspecting the statistics (some of which are visualized when importing the dump) from the raw dump also confirms H264 is used. Search for send-googCodecName if you are interested.
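If you would rather search the dump programmatically than by hand, a generic key search is enough. The sketch below walks any parsed JSON object and makes no assumption about where the statistics live; the tiny `sampleDump` object is made up for the example and only loosely modeled on a real dump.

```javascript
// Recursively collect every entry whose key ends with the given suffix.
function findBySuffix(node, suffix, path = [], out = []) {
  if (node === null || typeof node !== 'object') return out;
  for (const [key, value] of Object.entries(node)) {
    if (key.endsWith(suffix)) out.push({ path: [...path, key].join('.'), value });
    findBySuffix(value, suffix, [...path, key], out);
  }
  return out;
}

// Made-up stand-in for a dump. A real one would be loaded in Node with
// JSON.parse(fs.readFileSync(fileName, 'utf8')).
const sampleDump = {
  PeerConnections: {
    'pc-1': { stats: { 'ssrc_1_send-googCodecName': { values: '["H264","H264"]' } } },
  },
};

const hits = findBySuffix(sampleDump, 'send-googCodecName');
console.log(hits.map((h) => h.path));
```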
For connectivity we see a few ICE candidates included in the SDP:
a=candidate:3757856892 1 udp 2113939711 2a00:1450:400c:c06::7f 19305 typ host generation 0 network-cost 50
a=candidate:1687053168 1 tcp 2113939711 2a00:1450:400c:c06::7f 19305 typ host tcptype passive generation 0 network-cost 50
a=candidate:1545990220 1 ssltcp 2113939711 2a00:1450:400c:c06::7f 443 typ host generation 0 network-cost 50
a=candidate:4158478555 1 udp 2113937151 66.102.1.127 19305 typ host generation 0 network-cost 50
a=candidate:1286562775 1 tcp 2113937151 66.102.1.127 19305 typ host tcptype passive generation 0 network-cost 50
a=candidate:3430656991 1 ssltcp 2113937151 66.102.1.127 443 typ host generation 0 network-cost 50
There are UDP candidates for both IPv6 and IPv4, as well as ICE-TCP candidates and the Chrome-proprietary SSL-TCP variant, which Hangouts uses as well. This results in about the same chance of establishing a connection with the server as TURN UDP/TCP/TLS would, so no additional TURN servers are required.
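The candidate lines are easy to take apart, and the priority field can be decoded with the standard ICE formula, priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID). The sketch below uses two of the candidates from the SDP; the helper names are ours.

```javascript
// Parse the fixed leading fields of an a=candidate line.
function parseCandidate(line) {
  const f = line.replace(/^a=candidate:/, '').trim().split(/\s+/);
  return {
    foundation: f[0],
    component: Number(f[1]),
    transport: f[2],
    priority: Number(f[3]),
    address: f[4],
    port: Number(f[5]),
    type: f[7], // f[6] is the literal "typ"
  };
}

// Split an ICE priority into its three parts (inverse of the formula above).
function decodePriority(priority) {
  return {
    typePreference: Math.floor(priority / 2 ** 24),
    localPreference: Math.floor(priority / 2 ** 8) % 2 ** 16,
    componentId: 256 - (priority % 2 ** 8),
  };
}

const candidateLines = [
  'a=candidate:3757856892 1 udp 2113939711 2a00:1450:400c:c06::7f 19305 typ host generation 0 network-cost 50',
  'a=candidate:3430656991 1 ssltcp 2113937151 66.102.1.127 443 typ host generation 0 network-cost 50',
];
for (const c of candidateLines.map(parseCandidate)) {
  console.log(c.transport, c.address, c.port, decodePriority(c.priority));
}
```

Decoded this way, both candidates carry a type preference of 126, the usual value for host candidates, while the local preference is 40 for the IPv6 address and 30 for the IPv4 address, so the IPv6 candidates rank slightly higher.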
Encoding
Simulcast is not used. Given the lack of H264 simulcast in Chrome and the underlying WebRTC library (see the bug report; the lack of feedback is quite sad), that is not very surprising. H264 makes sense as a codec both for the encoder, which can use hardware acceleration, and to support a larger set of viewers without transcoding from VP8 to H264.
Note that while transcoding from VP8 to H264 is avoided on the server, there is probably still re-encoding to generate lower-resolution streams for receivers that do not have the bandwidth to receive the full-resolution stream from the client. This functionality has probably been part of YouTube’s livestreaming pipeline for a long time already.
WebRTC statistics
The statistics themselves do not offer much insight. The only interesting graph is the one for the picture loss indications (PLI) that the server sends.
The pliCount increases every ten seconds, and consequently the client sends a keyframe at that interval. This probably makes it easier for YouTube to generate a server-side recording or to convert the stream to whatever output format is used.
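The interval can be read off a stats time series by looking at when the counter steps up. The sketch below runs on synthetic samples, not on values taken from the dump, and the `{ timestamp, pliCount }` sample shape is an assumption for illustration.

```javascript
// Given stats samples of { timestamp (ms), pliCount }, return the time in
// seconds between successive increases of pliCount.
function pliIntervals(samples) {
  const increaseTimes = [];
  for (let i = 1; i < samples.length; i++) {
    if (samples[i].pliCount > samples[i - 1].pliCount) {
      increaseTimes.push(samples[i].timestamp);
    }
  }
  return increaseTimes.slice(1).map((t, i) => (t - increaseTimes[i]) / 1000);
}

// Synthetic series, one sample per second, counter stepping up every ten seconds.
const pliSamples = Array.from({ length: 31 }, (_, s) => ({
  timestamp: s * 1000,
  pliCount: Math.floor(s / 10),
}));
console.log(pliIntervals(pliSamples)); // [ 10, 10 ]
```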
Summary
YouTube uses WebRTC as a user-friendly way to grab the user’s webcam and stream it to their infrastructure. While this is probably not going to replace the advanced setups of people who broadcast, it will lower the barrier to entry significantly, removing the friction of getting started.
Unfortunately, the feature does not work in Firefox. This is another bad example of Google launching products that work only in Chrome. Mozilla’s Nils Ohlmeier tried getting it to work while spoofing the user agent and ran into issues because the JavaScript uses the deprecated registerElement API. From a WebRTC point of view the feature should work, however, so we look forward to coming back to this when the frontend bugs are fixed.
Update: as of May 2018, Firefox 53+ is supported as well.
{“author”: “Philipp Hancke“}
Ray says
I’m not seeing a camera icon at all?
I am running Chrome Version 65.0.3325.181 (Official Build) (64-bit)
Chad Hart says
What happens when you go to https://www.youtube.com/webcam ?
Dave says
It appears you need to be a member of YouTube Red to have the icon show up. You might try
https://www.youtube.com/live_dashboard
It too may be different between plain YouTube and Red
sourabh joshi says
Could you help me figure out how I am supposed to use the rtcstats API? Is there some example I could refer to?
Jeremie says
Just a question : Is it possible to stream webcam directly to youtube live ?
Or we need an encoder ?
I see that there is no way to send Webrtc stream directly to youtube live…
Chad Hart says
That’s what this feature does. The webcam streaming icon does not show here: https://www.youtube.com/live_dashboard, but if you go to the Creator Studio under Live Streaming or the main YouTube homepage you can start your stream and it will show-up in the live stream dashboard.