There have been many major WebRTC launches in the past months, including Facebook and KimDotCom. Before those, Mozilla started bundling a new WebRTC calling service right into Firefox. Of course, we wanted to check it out and see how it worked.
To help do this we called on the big guns – webrtcHacks guest columnist Philipp Hancke. Philipp is one of the smartest guys in WebRTC outside of Google. In addition to his paid work for &yet he is the leading non-googler to contribute to the webrtc demos and samples and is also a major contributor to the Jitsi Meet and strophe.jingle projects. Google even asks him to proof-read their WebRTC release notes.
Last time he used Chrome’s chrome://webrtc-internals to decode Google Hangouts. Hello is for Firefox, so he had to pull out some more advanced tricks. Please see below for some analysis of Hello and some great tips and tricks for debugging your WebRTC applications or even someone else’s.
{“intro-by”: “chad“}
Recently Mozilla, in collaboration with Telefonica and TokBox, unveiled a WebRTC calling application called “Firefox Hello”. It adds a small button to your browser that generates a call URL which you can share.
If you don’t have that button yet, check Dan York’s blog post which shows how to add the button. If the button still does not appear, you might have to change the value of loop.throttled to “false” in about:config. Mozilla and Telefonica code-named this project “loop”, which explains the many loop references we will see.
Note: this post was written when Firefox 33 was the latest version. Subsequent versions changed some of the behavior and user experience.
In the spirit of the Hangouts analysis let’s try to figure out what is going on here internally. In general, there are two interesting things going on in a WebRTC application:
- the exchange of the signaling messages between the clients and
- the RTCPeerConnection calls that result from these messages.
The Signaling Messages
Let’s look at the signaling path first. When using Chrome, the Network panel in the Developer Tools allows you to look at the networking connections. After clicking the “Start a conversation” button, you will see that a websocket connection to:
wss://loop.services.mozilla.com/websocket
is established. This is using secure websockets, indicated by the use of wss. Using the network tab, it is easy to inspect the websocket frames:
{"messageType":"hello","callId":"8990000bb6695c7add72aedb18dcaf05","auth":"25bbb6467fc932e9333a1c01b7bc8848"}
{"messageType":"hello","state":"init"}
{"messageType":"progress","state":"alerting"}
The protocol is based on sending JSON messages around. That is pretty much everything that flows via the websocket.
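To make the shape of these frames concrete, here is a minimal JavaScript sketch that parses the three frames shown above and extracts the message type and state from each. The `summarizeFrames` helper is hypothetical and exists only for illustration; it is not part of Hello's code.

```javascript
// Frames copied from the capture above; each websocket text frame holds one JSON object.
const capturedFrames = [
  '{"messageType":"hello","callId":"8990000bb6695c7add72aedb18dcaf05","auth":"25bbb6467fc932e9333a1c01b7bc8848"}',
  '{"messageType":"hello","state":"init"}',
  '{"messageType":"progress","state":"alerting"}',
];

// Hypothetical helper: parse each frame and keep only the fields describing call state.
function summarizeFrames(frames) {
  return frames.map((frame) => {
    const message = JSON.parse(frame);
    return {
      messageType: message.messageType,
      // The first "hello" carries callId and auth instead of a state, so state may be absent.
      state: message.state === undefined ? null : message.state,
    };
  });
}

const summary = summarizeFrames(capturedFrames);
for (const entry of summary) {
  console.log(entry.messageType, entry.state);
}
```

Reading the frames this way makes it easy to follow the call setup state as it changes, without any SDP or ICE candidates in sight.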
Note: it seems that in Firefox 34 the protocol for this was changed. Now it no longer seems to use a Websocket connection for anonymous users but goes directly to the TokBox servers.
Since WebRTC requires SDP and ICE candidates to be exchanged, where are those?
It turns out there is another Websocket connection:
wss://media030-lon.tokbox.com/rumorwebsocketsv2
This one is established with TokBox, which runs the infrastructure for Hello. Rumor is the codename for TokBox’s own publish-subscribe system described here.
Unfortunately, the websocket frames are binary and encrypted, which means the Chrome Developer Tools cannot easily decode them.
Fortunately my employer, &yet, also has a security division called ^lift which does security reviews of web applications. Looking at network traffic during such reviews turns up amazing things, so I got a recommendation for a tool called Burp Suite. Burp Suite is basically a man-in-the-middle proxy that allows you to wiretap (and modify) any request your browser makes. In order to intercept secure connections, you need to configure your browser to relay all traffic via Burp and install Burp’s certificate authority (CA) certificate, which is explained nicely in the documentation.
Note: make sure you remove the PortSwigger CA and uninstall Burp when you’re done.
After starting Burp, go to the Proxy tab. Since we only want to listen to the signaling messages exchanged without modifying them, make sure “Intercept” is off:
Now make a call using Hello again and switch to the “WebSockets history” tab:
The screenshot shows one of the signaling messages flowing over the websocket connection. You can inspect all messages in an easier way than Chrome Dev tools allow you to.
The protocol is basically a publish-subscribe protocol that transports JSEP-style SDP encoded in JSON objects. Pubsub works by having clients subscribe to a channel; any message published to that channel is then broadcast to all subscribers.
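The publish-subscribe idea can be sketched in a few lines of JavaScript. This is a generic in-memory illustration, not TokBox's Rumor implementation: the `PubSub` class, the `room-1` channel name, and the subscriber names are all made up, and the SDP string is a placeholder rather than captured data.

```javascript
// Minimal in-memory publish-subscribe registry (illustration only).
class PubSub {
  constructor() {
    this.channels = new Map(); // channel name -> Set of subscriber callbacks
  }

  // Register a callback that receives every message published to the channel.
  subscribe(channel, callback) {
    if (!this.channels.has(channel)) {
      this.channels.set(channel, new Set());
    }
    this.channels.get(channel).add(callback);
  }

  // Deliver the message to all subscribers; returns how many received it.
  publish(channel, message) {
    const subscribers = this.channels.get(channel) || new Set();
    for (const callback of subscribers) {
      callback(message);
    }
    return subscribers.size;
  }
}

const bus = new PubSub();
const received = [];
bus.subscribe('room-1', (msg) => received.push(['alice', msg.type]));
bus.subscribe('room-1', (msg) => received.push(['bob', msg.type]));

// JSEP-style payload: a type plus an SDP string (placeholder SDP, not real data).
const delivered = bus.publish('room-1', { type: 'offer', sdp: 'v=0\r\n' });
console.log(delivered, received);
```

In a real deployment the subscribers sit on different machines and the messages travel over the websocket, but the fan-out logic is the same.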
The PeerConnection API calls
Now that we have found the signaling messages, which are not really fancy, let us look at the RTCPeerConnection and getUserMedia usage, which was one of the most interesting points in the Hangouts analysis.
Chrome 38 added a “GetUserMedia Requests” tab to the chrome://webrtc-internals page. This tab shows all getUserMedia calls and the constraints used – e.g. to tweak echo cancellation or use a different camera. In the Hangouts Analysis post, this revealed quite a number of interesting constraints.
Here, there is less interesting stuff going on:
This shows that just the stock {audio: true, video: true} constraints are used. The only advantage those have is that they work in any browser.
In the PeerConnection section of webrtc-internals, one can see that there is a single constraint used: optional: {DtlsSrtpKeyAgreement:true}
This enables DTLS-SRTP in Chrome, but that has been enabled by default already for quite a while.
The PeerConnection API calls themselves are pretty standard: a createOffer->setLocalDescription->setRemoteDescription flow with no signs of the SDP mangling done by Hangouts.
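The two checks described here, call order and SDP mangling, can be expressed as a small JavaScript sketch. It assumes a hypothetical trace format (an array of `{ method, sdp }` entries in call order); webrtc-internals displays similar information, but this structure, the `analyzeTrace` helper, and the SDP fragments are invented for illustration.

```javascript
// Hypothetical trace format: one entry per API call, in the order the calls were made.
function analyzeTrace(trace) {
  const expected = ['createOffer', 'setLocalDescription', 'setRemoteDescription'];

  // Standard flow: the three calls appear in this relative order.
  let position = 0;
  for (const event of trace) {
    if (event.method === expected[position]) position += 1;
  }
  const standardFlow = position === expected.length;

  // Mangling check: the SDP handed to setLocalDescription differs from
  // the SDP that createOffer produced.
  const offer = trace.find((event) => event.method === 'createOffer');
  const local = trace.find((event) => event.method === 'setLocalDescription');
  const sdpMangled = Boolean(offer && local && offer.sdp !== local.sdp);

  return { standardFlow, sdpMangled };
}

// Made-up SDP fragments, only long enough to compare.
const offerSdp = 'v=0\r\nm=audio 9 RTP/SAVPF 111\r\n';
const answerSdp = 'v=0\r\nm=audio 9 RTP/SAVPF 111\r\na=recvonly\r\n';

const untouchedTrace = [
  { method: 'createOffer', sdp: offerSdp },
  { method: 'setLocalDescription', sdp: offerSdp },
  { method: 'setRemoteDescription', sdp: answerSdp },
];
const mangledTrace = [
  { method: 'createOffer', sdp: offerSdp },
  { method: 'setLocalDescription', sdp: offerSdp + 'a=x-hypothetical-attribute\r\n' },
  { method: 'setRemoteDescription', sdp: answerSdp },
];

const untouchedResult = analyzeTrace(untouchedTrace);
const mangledResult = analyzeTrace(mangledTrace);
console.log(untouchedResult, mangledResult);
```

When reading webrtc-internals by hand, the same comparison applies: if the SDP passed to setLocalDescription matches what createOffer returned, the application did not modify it.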
Why capture the camera on audio-only calls?
When testing with an audio-only call in Firefox 33, the camera stream was acquired, added to the PeerConnection and sent in the offer. From a UX/privacy perspective that is somewhat dubious, even if the camera stream is muted and sending blackness. The only reason for doing this is to avoid renegotiation, which, unsurprisingly, is not supported by Firefox. The ability to accept/reject calls seems to have been removed in Firefox 34.
My opinion from this analysis
Compared to Hangouts, which showed quite a number of highly elaborate features – many of which only work in Chrome – Hello’s WebRTC is extremely generic and not very interesting. I was hoping for more. The same functionality has been demonstrated on Google’s apprtc sample application for more than two years. As Tsahi Levent-Levi points out in his analysis, a very simple service like this might make more sense outside the traditional browser, on platforms like Firefox OS.
Hopefully this article was still useful as a step-by-step description of how to “look under the hood” of a WebRTC application. Techniques like using chrome://webrtc-internals and a proxy like Burp Suite are helpful for analyzing any WebRTC application.
{“author”: “Philipp Hancke“}
Gustavo Garcia says
Most of the advanced features of Google Hangouts are intended to improve the multiparty experience, and that’s something that the current version of Hello doesn’t support yet. In addition, most of those features are Chrome-specific, and Hello is multiplatform but mostly a Firefox application.
Regarding the signaling protocol it is not really JSON, it is binary with a JSON payload. But that’s a detail, the analysis is very good.
I’m not so sure the WS connection to Hello services is not there yet. I can double check it with Mozilla people but I was not aware of that change.
As you said, opening the camera for audio-only sessions is to avoid renegotiations, but the latest versions of Firefox actually do support renegotiation, so it is likely that OpenTok and Hello will make use of it very soon.
The use of DtlsSrtpKeyAgreement is because TokBox has partners still using very old versions of Chrome that we don’t officially support but we try not to break them on purpose either. Anyway we should review that, thank you.
Philipp Hancke says
Gustavo: it seems to be fetching a JSON thing with connection information for the room server now. Which makes sense, you don’t want to have a websocket connection open for the non-firefox users.
Adam Roach says
I’m amused at this analysis, mostly because a quick Yahoo (or Google, I guess) search would have found you all of these answers in a fraction of the time. Being a Mozilla product, all of this is being developed out in the open.
The change you’re seeing isn’t version-to-version — it has to do with whether you’re using the rooms (“conversation”) model or the call model. For unaccounted calls, all you can do is send room-oriented links. But if you log in with a Firefox account, you can perform direct calls to other logged-in users in an experience that closely mimics traditional phone calls. This direct-call experience still uses the websocket connection to Mozilla’s servers; if you look at the messages closely, they only serve the purpose of keeping the clients in sync regarding the call setup state.
By the way, the interfaces Mozilla uses for its servers are probably easier to read about than they are to reverse-engineer. For example: https://docs.services.mozilla.com/loop/apis.html#websockets-apis
You’ll also see mention that the websockets won’t be used for rooms (“conversations”) in the introductory section of the Rooms architecture document: https://wiki.mozilla.org/Loop/Architecture/Rooms
TokBox’s stuff is TokBox’s stuff. They don’t document the client interface, and I can’t speak to whether their future plans include doing so. The client library used by Hello, however, is open-source and part of the Firefox repository. Depending on your skillset, you may find reading the source easier than sniffing messages: https://dxr.mozilla.org/mozilla-central/source/browser/components/loop/content/shared/libs/sdk.js
Philipp Hancke says
hey Adam, the main point here is to demonstrate how to look at the signaling channel. Which makes it a nice companion to “how to look at the api usage” (hangouts) and “how to look at the wire” (mayday).
If one was just looking for a technical description of Hello I would actually recommend going to a source like hacks.mozilla.org. An article doing that would probably be good given concerns like https://lists.torproject.org/pipermail/tor-talk/2015-January/036615.html