WebRTC is supposed to be secure, a lot more so than previous VoIP standards. That isn’t because it uses any special new mechanism, but rather because it takes security seriously and mandates it for all sessions.
Alan Johnston decided to take WebRTC for a MitM spin – checking how easy it is to devise a man-in-the-middle attack on a naive implementation. This should be a reminder to all of us that while WebRTC may take care of security, we should secure our signaling path and the application as well.
{“editor”: “tsahi“}
Earlier this year, I was invited to teach a graduate class on WebRTC at IIT, the Illinois Institute of Technology in Chicago. Many of you are probably familiar with IIT because of the excellent Real-Time Communications (RTC) Conference (http://www.rtc-conference.com/) that has been hosted at IIT for the past ten years.
I’ve taught a class on SIP and RTC at Washington University in St. Louis for many years, but I was very excited to teach a class on WebRTC. One of the key challenges in teaching is to come up with ways to make the important concepts come alive for your students. Trying to make security more interesting for my students led me to write my first novel, Counting from Zero, a technothriller that introduces concepts in computer and Internet security (https://countingfromzero.net). For this new WebRTC class, I decided that when I lectured about security, I would – without any warning – launch a man-in-the-middle (MitM) attack (https://en.wikipedia.org/wiki/Man-in-the-middle_attack) on my students.
It turned out to be surprisingly easy to do, for two reasons.
- It is so quick and easy to prototype and test new ideas with WebRTC. JavaScript is such a fun language to program in, and node.js makes it really easy on the server side.
- Unfortunately, WebRTC media sessions have virtually no protection against MitM attacks. Not many people seem to be aware of the fact that although WebRTC uses DTLS to generate keys for the SRTP media session, the normal protections of TLS are not available.
So, a few weeks later, I had a WebRTC MitM attack ready to launch on my students that neither Chrome nor Firefox could detect.
How did it work? Very simple. First, I compromised the signaling server. I taught the class using the simple demo application from the WebRTC book (http://webrtcbook.com) that I wrote with Dan Burnett. (You can try the app with a friend at http://demo.webrtcbook.com:5001/data.html?turnuri=1.) The demo app uses a simple HTTP polling signaling server that matches up two users that enter the same key and allows them to exchange SDP offers and answers.
I compromised the signaling server so that when I entered a key using my MitM JavaScript application, instead of the signaling server connecting the users who entered that key, those users would instead be connected to me. When one of the users called the other, establishing a new WebRTC Peer Connection, I would actually receive the SDP offer, and I would answer it, and then create a new Peer Connection to the other user, sending them my SDP offer. The net result was two Peer Connections instead of one, and both terminated on my MitM JavaScript application. My application performs the SDP offer/answer negotiation and the DTLS Handshake with each of the users. Each of the Peer Connections was considered fully authenticated by both browsers. Unfortunately, the Peer Connections were fully authenticated to the MitM attacker, i.e. me.
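To make the rerouting concrete, here is a minimal sketch of a key-matching signaling server and the one change that compromises it. It is a toy in-memory model, not the demo application's code: `createSignalingServer`, `join`, `intercept`, and `recipientOf` are hypothetical names, and no SDP or network traffic is involved.

```javascript
// Toy in-memory model of a key-based signaling server (all names hypothetical).
// It only answers the question "who receives this user's signaling message?".
function createSignalingServer() {
  const rooms = new Map(); // key -> { users: [...], interceptor: string | null }

  function room(key) {
    if (!rooms.has(key)) rooms.set(key, { users: [], interceptor: null });
    return rooms.get(key);
  }

  return {
    // An ordinary user enters the shared key.
    join(key, user) {
      room(key).users.push(user);
    },
    // The compromise: a third party registers itself for the same key.
    intercept(key, attacker) {
      room(key).interceptor = attacker;
    },
    // Recipient of an offer or answer sent by the user `from`.
    // The interceptor's own replies to each user are not modelled here.
    recipientOf(key, from) {
      const r = room(key);
      if (r.interceptor !== null) return r.interceptor; // everything lands at the interceptor
      return r.users.find((u) => u !== from) || null;   // honest case: the other user
    },
  };
}
```

If `intercept` is never called, a message from one user reaches the other. Once it is called, both users' offers and answers land at the interceptor, which is what lets it terminate two Peer Connections instead of one.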
Here’s how things look with no MitM attacker:
Here’s how things look with a MitM attacker who acts as a man-in-the-middle to both the signaling channel and DTLS:
How hard was it to write this code? Really easy. I just had to duplicate much of the code so that instead of one signaling channel, my MitM JavaScript had two. Instead of one Peer Connection, there were two. All I had to do was take the MediaStream I received over one Peer Connection and attach it to the other Peer Connection as outgoing, and I was done. Well, almost. It turns out that Firefox doesn’t support this yet (but I’m sure it will one of these days) and Chrome has a bug in its audio stack so that the audio does not make it from one Peer Connection to another (see bug report https://code.google.com/p/webrtc/issues/detail?id=2192#c15). I tried every workaround I could think of, including cloning, but had no success. If anyone has a clever workaround for this bug, I’d love to hear about it.
But the video does work, and in the classroom, my students didn’t even notice that the MitM call had no audio. They were too busy being astonished that after setting up their “secure WebRTC call” (we even used HTTPS, which gave the green padlock – of course, this had no effect on the attack but showed even more clearly how clueless DTLS and the browsers were), I showed them my browser screen, which had both of their video streams.
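The forwarding step can be sketched roughly as follows. This assumes the track-based `RTCPeerConnection` API (`track` events and `addTrack`) found in current browsers rather than the stream-based calls of the time, and `relayTracks` is a hypothetical helper name, not code from the classroom demo.

```javascript
// Forward every media track that arrives on `incoming` out over `outgoing`.
// Both arguments are expected to behave like RTCPeerConnection objects:
// `incoming` emits 'track' events, `outgoing` offers addTrack(track, ...streams).
function relayTracks(incoming, outgoing) {
  incoming.addEventListener('track', (event) => {
    // event.track is the received MediaStreamTrack; event.streams lists the
    // MediaStreams the sender associated with it.
    outgoing.addTrack(event.track, ...event.streams);
  });
}

// Intended browser usage (assumption: pcToAlice and pcToBob already exist):
//   relayTracks(pcToAlice, pcToBob);
//   relayTracks(pcToBob, pcToAlice);
```

In a browser, adding a track to an established connection normally triggers a `negotiationneeded` event, so another offer/answer round would follow. That part is left out of the sketch.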
When I tweeted about this last month, I received lots of questions, some asking if I had disclosed this new vulnerability. I answered that I had not, because it was not an exploit and was not anything new. Everyone involved in designing WebRTC security was well aware of this situation. This is WebRTC working as designed – believe it or not.
So how hard is it to compromise a signaling server? Well, it was trivial for me since I did it to my own signaling server. But remember that WebRTC does not mandate HTTPS (why is that, I wonder?). So if HTTP or ordinary WebSocket is used, any attacker can MitM the signaling if they can get in the middle with a proxy. If HTTPS or secure WebSocket is used, then the signaling server is where the signaling would need to be compromised. I can tell you from many years of working with VoIP and video signaling that signaling servers make very tempting targets for attackers.
So how did we get here? Don’t TLS and DTLS have protection against MitM attacks?
Well, TLS as used in web browsing uses a certificate from the web server issued by a CA that can be verified and authenticated. On the other hand, WebRTC uses self-signed certificates that can’t be verified or authenticated. See below for examples of self-signed certificates used by DTLS in WebRTC from Chrome and Firefox. I extracted these using Wireshark and displayed them on my Mac. As you can see, there is nothing to verify. As such, the DTLS-SRTP key agreement is vulnerable to an active MitM attack.
The original design of DTLS-SRTP relied on exchanging fingerprints (essentially a SHA-256 hash of the certificate, e.g. a=fingerprint:sha-256 C7:4A:8A:12:F8:68:9B:A8:2A:95:C9:5E:7A:2A:CE:64:3D:0A:95:8E:E9:93:AA:81:00:97:CE:33:C3:91:50:DB) in the SIP SDP offer/answer exchange, and then verifying that the certificates used in the DTLS Handshake matched the fingerprints in the SDP. Of course, this assumes no MitM is present in the SIP signaling path. The protection against a MitM in signaling recommended by DTLS-SRTP is to use RFC 4474 SIP Enhanced Identity for integrity protection of the SDP in the offer/answer exchange. Unfortunately, there were major problems with RFC 4474 when it came to deployment, and the STIR Working Group in the IETF (https://tools.ietf.org/wg/stir/) is currently trying to fix these problems. For now, there is no SIP Enhanced Identity and no protection against a MitM when DTLS-SRTP is used with SIP. Of course, WebRTC doesn’t mandate SIP or any signaling protocol, so even this approach is not available.
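As a rough illustration of the fingerprint check, the sketch below hashes a DER-encoded certificate with SHA-256, formats the digest the way the `a=fingerprint` line does, and compares it with the value announced in the SDP. It uses Node's built-in `crypto` module; the function names are hypothetical, and the "certificate" in any usage is just whatever bytes the caller passes in.

```javascript
const crypto = require('crypto');

// SHA-256 digest of the certificate bytes, written as uppercase hex pairs
// separated by colons, matching the a=fingerprint:sha-256 format.
function certificateFingerprint(derBytes) {
  const hex = crypto.createHash('sha256').update(derBytes).digest('hex').toUpperCase();
  return hex.match(/.{2}/g).join(':');
}

// Extract the sha-256 fingerprint announced in an SDP blob, or null if absent.
function fingerprintFromSdp(sdp) {
  const m = /^a=fingerprint:sha-256 ([0-9A-Fa-f:]+)\s*$/m.exec(sdp);
  return m ? m[1].toUpperCase() : null;
}

// True only when the certificate seen in the handshake hashes to the
// fingerprint that was announced in the SDP.
function handshakeMatchesSdp(sdp, derBytes) {
  const announced = fingerprintFromSdp(sdp);
  return announced !== null && announced === certificateFingerprint(derBytes);
}
```

The check is only as trustworthy as the SDP itself. A MitM who controls the signaling path simply announces the fingerprint of its own certificate, and `handshakeMatchesSdp` returns true on both legs.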
For WebRTC, a new identity mechanism, known as Identity Provider, is currently proposed (https://tools.ietf.org/html/draft-ietf-rtcweb-security-arch). I will hold off on an analysis of this protocol for now, as it is still under development in an Internet-Draft, and is also not available yet. Firefox Nightly has some implementation, but I’m not aware of any Identity Service Providers, either real or test, that can be used to try it out yet. I do have serious concerns about this approach, but that is a topic for another day.
So are we out of luck with MitM protection for WebRTC for now? Fortunately, we aren’t.
There is a security protocol for real-time communications which was designed with protection against MitM – it is ZRTP (https://tools.ietf.org/html/rfc6189) invented by Phil Zimmermann, the inventor of PGP. ZRTP was designed to not rely on and not trust the signaling channel, and uses a variety of techniques to protect against MitM attacks.
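One of those techniques is the Short Authentication String (SAS) that the two users compare. The sketch below is a simplified stand-in, not the ZRTP algorithm: it hashes a session secret with plain SHA-256 where real ZRTP uses its own key derivation, then renders the leftmost 20 bits of the result as four base32 characters, which is my understanding of the base32 SAS rendering in RFC 6189. The function names are hypothetical.

```javascript
const crypto = require('crypto');

// z-base-32 alphabet, assumed here to be the one used for the base32 SAS.
const ZBASE32 = 'ybndrfg8ejkmcpqxot1uwisza345h769';

// Render the leftmost 20 bits of a 32-bit value as four base32 characters.
function renderSas(sasValue) {
  let out = '';
  for (let shift = 27; shift >= 12; shift -= 5) {
    out += ZBASE32[(sasValue >>> shift) & 0x1f];
  }
  return out;
}

// Simplified stand-in for the real derivation (an assumption for illustration):
// hash the session secret and keep the leftmost 32 bits of the digest.
function sasFromSecret(secret) {
  const digest = crypto.createHash('sha256').update(secret).digest();
  return renderSas(digest.readUInt32BE(0));
}
```

With no attacker, both ends hold the same session secret and so read out the same four characters. With a MitM, each user shares a different secret with the attacker, so the two strings almost certainly differ when the users compare them.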
Two years ago, I described how ZRTP, implemented in JavaScript and run over a WebRTC data channel, could be used to provide WebRTC the MitM protection it currently lacks (https://tools.ietf.org/html/draft-johnston-rtcweb-zrtp). During TADHack 2015 (http://tadhack.com/2015/), if my team sacrifices enough sleep and drinks enough coffee, we hope to have running code to show how ZRTP can detect exactly this MitM attack.
But that also is a subject for another post…
{“author”: “Alan Johnston“}
Timothy Panton says
Alan,
You can get the audio result you want in Firefox by using AudioContexts to pipe the audio from one peer-connection to another.
I’ve been working on solving this issue and have come up with a nice standards compliant way – which I’ve filed a patent on – https://yopet.us demos it in a way that avoids needing to read long (or short) hex strings to each other.
Happy to catch up if you want more info.
Tim.
Alan Johnston says
Hey Tim,
Thanks for the pointer on Firefox – I’ll give AudioContexts a try.
Yopet is cool! Very handy for talking to your “late” parrot. I’m presuming that it is essentially a cached shared secret between the two browsers. This allows you to authenticate the other browser for all future calls, which is useful. However, this doesn’t help if you and I haven’t talked before – we have no shared secret and hence no protection against MitM. Now ZRTP also has this feature, so that once the SAS has been verified, a shared secret is cached which means you don’t need to check the SAS again as long as the caches remain. Your approach of using the screen to set the shared secret is very cool!
Of course, ZRTP isn’t the only approach, but it does solve the MitM problem without trusting any third party or service. And ZRTP can use words instead of hex digits, so the users only need to compare two words.
– Alan –
Tim Panton says
The innovation is in the selection of the shared token, which isn’t a secret by the way, and the way we use it.
We not only have to have talked before, but, the way YoPet works, the two devices must have been in physical proximity to form the initial pairing visually. (Other methods are possible, of course.) From then on the cached token maintains the pairing with just a few lines of JavaScript.
Philipp Hancke says
forwarding should just work in Firefox — what version were you checking?
Samuel Erb says
I have a working webrtc data-channel MiTM demo online right now – http://blog.erbbysam.com/?p=149 (be careful to follow the directions regarding usernames).
It’s unclear to me why ZRTP is the answer here. You still need to verify SAS strings (on first connection), right? IdPs could be a lot more transparent to the end user (if I’m not mistaken) and increase MiTM complexity to include compromising IdPs (still not as secure as SAS strings, but much more secure than what exists now).
If ZRTP SAS strings exchanged out-of-band are the solution here, why not just exchange DTLS fingerprints?
Regarding identity providers – what are your concerns?
Alan Johnston says
Hi Samuel,
I’ll take a look at your link. This post isn’t about how ZRTP protects against this attack, so it is missing all the details you crave. ZRTP can detect the DTLS MitM attack immediately by checking the DTLS fingerprints, without the users checking the SAS, if the attacker is not also doing a MitM attack on ZRTP. If the attacker is also doing a MitM attack on ZRTP, then comparing the SAS will detect this very determined MitM as well.
Identity Provider is also another topic. My two main high level concerns are: 1. We don’t have it yet, either in browsers or providers, and 2. It is a third-party who has to be trusted, and could be compromised or compelled to lie. Using a peer-to-peer approach such as ZRTP provides MitM protection without trusting or relying on any third party.
– Alan –