Sorry. We really wanted to do a recap of the W3C WebRTC and IETF RTCWEB meetings that took place at the end of October and in November, but we did not get to it. Victor and Reid provided some commentary on the codec debate prior to the IETF discussion. The outcome of that discussion was widely publicized, and we did not have much of value to add for the developer community.
Importantly, codecs were not the only thing discussed in this latest round of standards meetings. Several items, like the move to JavaScript promises, output device enumeration, and discussions of security implications, are very relevant to the average WebRTC developer but have gone under the general media radar. To get the whole story on standards straight from the horse’s mouth, I asked W3C WebRTC editor and founding author Dan Burnett for an update on the recent WebRTC standards meetings and for details on some of the more significant issues, like promises and screen sharing.
For more background on Dan, please see our last Q&A with him.
Intro by: Chad
webrtcHacks: Late October and November were busy months in the standards bodies. Can you give us some of the top-line highlights of some of the more significant decisions?
Dan: There were two significant standards meetings in the last month or so:
- one is the W3C WebRTC WG meeting and
- the other is the IETF RTCWEB meeting.
Probably the best-known outcome of the two meetings was the codec decision by the IETF. That decision has already been covered extensively, so I will not go into it here.
Mainly I want to talk about the API discussions.
The item that Google says is their most requested feature or change is the ability to enumerate output devices. getUserMedia today only allows you to select input devices. The proposal from Google is to also include output devices in the enumeration, with a new sinkId that is analogous to sourceId for inputs.
From the application’s perspective, getting a permission grant for an input device automatically gives you access to any output device with the same groupId. For example, if you request and get permission to access the microphone on a headset that has both a mic and speakers, that would automatically give you access to the headset speakers as well.
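To make that concrete, here is a sketch of how the proposed enumeration and routing might look in code. Keep in mind this is still a proposal, so the 'audiooutput' kind and the setSinkId() method may change; audioElement here is a hypothetical media element on the page.

```javascript
// Sketch of the proposed output-device enumeration (subject to change).
navigator.mediaDevices.enumerateDevices()
  .then(function (devices) {
    var outputs = devices.filter(function (d) {
      return d.kind === 'audiooutput';
    });
    outputs.forEach(function (d) {
      console.log('output device:', d.label, d.deviceId, 'group:', d.groupId);
    });
    if (outputs.length > 0) {
      // audioElement is a hypothetical <audio>/<video> element on the page.
      return audioElement.setSinkId(outputs[0].deviceId);
    }
  })
  .catch(function (err) { console.error('enumeration failed:', err); });
```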
Google says their most requested feature or change is to provide an ability to enumerate output devices.
webrtcHacks: So services like Hangouts are using proprietary mechanisms to do this today right?
Dan: Yes. It is important to note:
- This is still just a proposal at this point.
- Everyone else in HTML-land cares about this capability as well, so it may take a while to go through a broader standards discussion.
webrtcHacks: Can you give an example outside of WebRTC where this could be used?
Dan: It has been possible to play output from videos in HTML, but it has never been possible for the web application to select the device the audio goes to. There is also a tricky aspect to this. One example complication, one that will require involvement by others who are experts in audio output, is that the user may want audio from different portions of the rendered page to go to different output devices. For example, I may want my WebRTC call to go to my headphones, but if there is an ad in an iframe, I still may want that audio to come out of my speakers rather than my headphones.
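Setting aside the cross-origin complication Dan describes, the per-element routing the proposal would enable might look like this sketch; the element IDs and device IDs here are hypothetical, and setSinkId() is the proposed mechanism.

```javascript
// Hypothetical page with a call's remote audio and a separate ad player.
var callAudio = document.getElementById('call-audio'); // hypothetical element
var adAudio = document.getElementById('ad-audio');     // hypothetical element

// headphonesId and speakersId would come from device enumeration.
callAudio.setSinkId(headphonesId);  // WebRTC call audio to the headset
adAudio.setSinkId(speakersId);      // other page audio to the speakers
```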
webrtcHacks: Any other major issues that were discussed or resolved?
Dan: Some other key items were:
- promises
- authenticated origins
- movement from stream to track focus for Peer Connections
- the RTCRtpSender and RTCRtpReceiver objects, and
- screen sharing
There are others too, but those are the top ones.
webrtcHacks: Before we dive into some of those topics, can you comment on how the average “developer” is represented in these discussions? How do you prevent big companies with vested interests from completely controlling the standardization agenda?
Dan: Anyone is permitted to participate in discussions whether they are representing a company or just themselves. We have a number of people who participate occasionally and who consider themselves independent developers. In fact, some of the changes I mention, like promises and authenticated origins, were driven by developers outside of the WebRTC community.
webrtcHacks: Interesting. On the topic of promises – Callback hell and promises are a big topic in the JavaScript community. Can you walk us through that discussion a bit?
Dan: I’m not really active in the generic JavaScript discussion, but as a summary: promises are being added in ECMAScript 6. The syntax, as I understand it, is not completely finalized, but it is very close. There are plenty of sites online where you can get more information; www.promisejs.org and the Mozilla Developer Network are two of the most relevant today.
The motivation for promises is that JavaScript is a single-threaded language, meaning JavaScript can really only do one thing at a time. However, some things take a large amount of real-world time, such as fetching a file from a website. No developer wants their website to freeze while a file is being loaded. To allow for asynchronous handling of real-world events, JavaScript up to this point has required developers to use a callback: a developer-defined function that is called whenever the time-consuming function or method completes.
There are several problems with this. The first is that once your code requires a callback to be passed in somewhere, all following code that depends on it needs callbacks too. This is referred to as “callback hell”.
The other main problem, which is very relevant for WebRTC programmers, is that when you have multiple asynchronous functions that must all complete before you can proceed, you have to create special guard functions to verify that they are done. The promises approach instead returns, for any asynchronous capability, an object that can be passed around like any other object and whose value might not be determined until some future point. In practice this allows many asynchronous capabilities to be coded using the chaining syntax familiar to jQuery programmers.
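As a generic illustration (my example, not anything from the spec), here is the hand-rolled guard pattern next to the equivalent promise-based code; fetchOffer, getStream, their callback variants, and proceed are all hypothetical functions.

```javascript
// Callback style: a guard function to wait for two asynchronous results.
var results = {};
function guard(key, value) {
  results[key] = value;
  if (results.offer && results.stream) {
    proceed(results.offer, results.stream); // run once both have arrived
  }
}
fetchOfferWithCallback(function (offer) { guard('offer', offer); });
getStreamWithCallback(function (stream) { guard('stream', stream); });

// Promise style: Promise.all expresses the same coordination directly.
Promise.all([fetchOffer(), getStream()])
  .then(function (values) { proceed(values[0], values[1]); })
  .catch(function (err) { console.error(err); });
```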
If you need more details on this, check out the sites I mentioned earlier.
webrtcHacks: So what does this mean for WebRTC?
Dan: This was a very interesting discussion within the W3C WebRTC Working Group because it was largely pushed from outside the group. W3C as a whole is now trying to ensure that all of its new standards use the promise-based approach for APIs rather than the callback approach. Our first discussion was with respect to the media capture API, so getUserMedia. There are several methods defined in the Media Capture and Streams specification, but the getUserMedia method was problematic in that we wanted existing code to continue working. Almost all existing code uses one particular recommended definition for getUserMedia. Everybody uses:
```javascript
getUserMedia = navigator.getUserMedia ||
               navigator.webkitGetUserMedia ||
               navigator.mozGetUserMedia ||
               navigator.msGetUserMedia;
```
The reason everybody uses this is because all of us working on creating getUserMedia assumed that navigator.getUserMedia would be defined eventually. If we changed navigator.getUserMedia to not accept callbacks and return a promise instead, this would break existing applications.
There was strong pressure within W3C to completely remove the callbacks. Because the methods other than getUserMedia were not implemented yet by existing browsers without a prefix, we decided that navigator.getUserMedia would retain the existing callback syntax and that the new recommended getUserMedia call would live under navigator.mediaDevices and would return a promise.
```javascript
navigator.mediaDevices.getUserMedia({audio: true, video: true})
  .then(gotStream)
  .catch(logError);
```
webrtcHacks: How does use of the navigator top-level object vs. other objects get determined?
Dan: We had decided earlier this year that the getUserMedia method needed to move away from the navigator object into the navigator.mediaDevices object.
webrtcHacks: What was the reason for that?
Dan: That is also where the new enumerateDevices method lives. Essentially, as it says in the spec, the mediaDevices object “is the entry point to the API used to examine and get access to media devices available to the user agent.”
So the plan now is that all the methods under mediaDevices will use promises rather than taking callbacks.
webrtcHacks: Moving from callbacks to promises is a big paradigm shift. How do you think this will affect existing applications?
Dan: As I mentioned, for backwards compatibility, there is also a version of getUserMedia that lives under navigator that takes callbacks.
Also, all of the methods defined in the WebRTC specification, including all of the RTCPeerConnection APIs, will both accept callbacks and return promises. We do not anticipate any backwards-compatibility issues there.
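As a sketch of what that dual support would look like once implemented (pc, gotOffer, and logError are hypothetical):

```javascript
// Existing callback form continues to work:
pc.createOffer(gotOffer, logError);

// New promise form, per the planned change:
pc.createOffer()
  .then(gotOffer)
  .catch(logError);
```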
webrtcHacks: Does that include dataChannel too?
Dan: This applies to all of the APIs defined in the WebRTC specification. The data channel APIs don’t currently have any callbacks, so they don’t need any changes.
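For context, the data channel API is event-driven rather than callback-taking, which is why it needs no promise conversion. A minimal sketch, assuming pc is an existing RTCPeerConnection:

```javascript
var channel = pc.createDataChannel('chat');
channel.onopen = function () { channel.send('hello'); };
channel.onmessage = function (event) { console.log('received:', event.data); };
```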
webrtcHacks: Do you think that will lead to any implementation or support issues having so many options that essentially do the same thing?
Dan: No. However, we are strongly encouraging people to use the promise based version from this point on in their new code.
There are currently no plans to officially deprecate any of the callback versions, but that could conceivably happen in the future. W3C has pressured us very strongly to do so.
we are strongly encouraging people to use the promise based version from this point on in their new code
webrtcHacks: That is a pretty strong statement that WebRTC developers should start moving to promises.
Dan: Absolutely. But you do not have to panic about changing your existing applications.
Of course these promise-based APIs are not even implemented yet in the browsers anyway.
webrtcHacks: Good – so I have some time to update the many examples with callbacks we have here on webrtcHacks. 🙂
Dan: Yes, just like Alan and I have some time to update the examples in our book. 🙂
webrtcHacks: Security is another big topic that probably is not discussed enough with WebRTC. Fortunately, it sounds like security was a big topic in the recent standards discussions, with authenticated origins and screen sharing. Can you give us some more insight on those?
Dan: Yes, these are pretty big topics. There is no such thing as a short and quick security discussion anymore. Just like with promises, the authenticated origin discussion was initiated by W3C folks outside of the working group. W3C very strongly wants to forbid the use of unsecured HTTP or other unauthenticated origins. In short, the WebRTC WG decided that our specifications will recommend, but not require, that WebRTC content origins be authenticated.
the WebRTC WG decided that our specifications will recommend, but not require, that WebRTC content origins be authenticated.
webrtcHacks: So that means getting a certificate and using HTTPS?
Dan: In most cases yes. We do not really have time here to go into the edge cases of authenticating origins other than via HTTPS and certificates.
webrtcHacks: Ok, so how about the screen share topic?
Dan: Yes, this is the second highest request by Google Chrome users. So we all want it 🙂
However, security here is tricky. Let me give an example:
The web today protects users against malicious websites’ JavaScript by preventing JavaScript from one origin from accessing content from another origin. So even though one site can load an ad, or any other content, from another site in an iframe, the first site cannot see the content of the second site. This is important because security credentials, such as API keys or TURN server passwords, are often in the HTML and/or JavaScript.
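You can see this isolation in any browser today: attempting to read a cross-origin iframe’s document is blocked, typically with a SecurityError. A minimal sketch, assuming a cross-origin iframe with the hypothetical id 'ad-frame':

```javascript
var frame = document.getElementById('ad-frame');
try {
  // contentDocument is null cross-origin; contentWindow.document throws.
  var doc = frame.contentDocument || frame.contentWindow.document;
  console.log(doc.body.innerHTML); // only reachable for same-origin frames
} catch (e) {
  console.log('blocked by the same-origin policy:', e.name);
}
```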
Now think about the screen-sharing case: a malicious website can not only load another site in an iframe, but it can also cause a view-source window of that site’s source to pop up. If the screen is being captured by the first website, it can now see this source code, scan the content, and gain access to the credentials.
Even better, since we are talking about WebRTC: if the first site already has permission to use my camera, it can track my gaze direction so that it only pops up the other site’s source when I look away, and then clears it before I look back. So I never even know that anything happened.
This is obviously no good.
webrtcHacks: Wow – an interesting, but dangerous use of screen sharing. So is there any good way to prevent this?
Dan: This is why today, Chrome requires the use of an extension so that both the site and user are aware that they are enabling this capability. Firefox is going a different route, which is to require the use of a whitelist of applications that can be shared. This is the direction we are likely to go in developing a standard.
Screensharing is the second highest request by Google Chrome users
webrtcHacks: Can you clarify what you mean by “application” whitelist?
Dan: There are many details to be worked out here, but the idea would be that any applications not on the whitelist would be blanked out. So to co-browse, the user would have to provide permission in the browser configuration settings for that other browser to be shared. Same for any other application.
Again, this is all very vague right now. The key to understand is that we all want this. The standards groups are working on it. But they want to make sure that the user remains protected.
webrtcHacks: I asked you about standardization timelines when we talked a couple of months ago. You mentioned you saw stabilization happening for getUserMedia in early 2015, with peerConnection about 6 months behind that. Have your timelines changed as a result of these recent meetings?
Dan: Surprisingly, no. In fact, I have been describing the recent meetings to people, with the exception of the codec debate, as almost boring, in that we just steadily made progress on outstanding change requests. That is a very good sign for a standards group, because it means all we have left is work to do rather than disagreements to resolve.
I don’t want to say that everything is perfect, but the recent meetings were very, very encouraging for completing WebRTC in a reasonable timeframe.
the recent meetings were very, very encouraging for completing WebRTC in a reasonable timeframe.
webrtcHacks: That’s great news – so where can our readers go to get more details on what was discussed during the recent W3C and IETF meetings?
Dan: There are minutes for the meetings, but they are not always easy for even the participants to follow, much less external observers.
webrtcHacks: should we just wait for the specs to be updated?
Dan: The specs are gradually being updated with the results of the discussions. I have some high-level slides that I presented at the WebRTC Expo 2 weeks ago that might be helpful.
Otherwise the best way is to monitor the specs. You should click through to the editor’s draft of each spec; that is always the most recent coherent version.
If you want to follow more closely, you could always watch the spec editing directly on GitHub, if you care at that level.
I am also doing a training session with your fellow webrtcHacks colleagues, Victor and Tsahi in Paris on December 16 if you want to find me or ask questions during that session.
webrtcHacks: Cool – I’ll be in Paris too so I’ll see you there.
Interviewer: Chad
Interviewee: Dan Burnett