One of WebRTC’s great features is its mandated strong encryption. The encryption mechanisms are built in, so developers don’t often need to deal with the details. However, these easy, built-in mechanisms assume that 1) media is communicated peer-to-peer and 2) the signaling channel is secure. Most group-calling services use a media server, such as a Selective Forwarding Unit (SFU), that terminates and re-encrypts the media, which prevents end-to-end encryption (e2ee). As we have covered here before, WebRTC e2ee is still possible with new APIs like Insertable Streams. That addresses the first assumption, but what about the second? How does one set up secure signaling for e2ee?
That isn’t so easy. Dave Baker of Matrix.org touched on this briefly in his Kranky Geek talk, Modern Call Signalling for WebRTC, earlier this year. We asked Dave to dive into the topic in more detail here. I particularly like Dave’s review of some of the hard-to-grok crypto mechanisms that make e2ee work.
For background, Matrix is an open-source project, backed by a non-profit foundation, that provides secure, decentralized signaling for real-time communication. They have been at it since the early days of WebRTC, and it is great to see the project grow into a global community over the years. Element is an affiliated commercial calling and messaging app and service based on Matrix.
{“editor”: “chad hart”}
As most of us who work with WebRTC are probably aware, mandatory, industry-standard encryption was incorporated into the standard some time ago, so nowadays all WebRTC calls are protected with DTLS. So, does that mean every WebRTC call is completely secure and confidential? As we will explain, not necessarily. In Matrix, we solve this problem by carrying call signaling over Matrix’s end-to-end encrypted messaging. In this blog post, we’ll look at the problem and how we solved it.
Identity and Fingerprints
First, a quick, high-level primer on how DTLS establishes a secure channel. When a caller (for the sake of tradition, let’s call her Alice) places a WebRTC call by creating an offer, the browser includes one or more DTLS ‘fingerprints’ as part of that offer. These are short strings computed by hashing the certificate (which contains the public key) that the browser intends to use to secure the call. The callee (Bob) includes his own fingerprint when replying with his answer. When the first packets are exchanged between Alice and Bob over the media channel, they perform a key agreement. We don’t need to go into the maths of this: all we need to know here is that Alice and Bob can agree on a number that only they know. Each of them produces a pair of mathematically related numbers, keeps one of them secret, and sends the other to the other side. (See RFC 6347 for the gory details.) Even if a third party (Eve) saw every message exchanged between Alice and Bob, she would not be able to compute the same number, because she doesn’t have the values Alice and Bob kept to themselves.
However, this all hinges on Alice and Bob knowing that they were talking to the right person in the first place. Eve could in fact be more than an eavesdropper (we’ll still call her Eve…): instead of just listening, she could intercept and modify their messages. In that case, she can make Alice and Bob think they’ve agreed on a secret key with each other, when actually each of them has agreed on a different secret key with Eve. (See this webrtcHacks post from Alan Johnston for more on man-in-the-middle attacks.)
This is where our DTLS fingerprints come in. During the DTLS handshake, each side checks that the certificate the other presents matches the fingerprint it received over signaling. So if we can get Alice’s WebRTC offer, and therefore her fingerprint, to Bob without Eve having the chance to modify it (and likewise Bob’s answer back to Alice), our call is secure: Eve cannot substitute her own certificate without the mismatch being detected.
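To make the key-agreement idea concrete, here is a minimal sketch of finite-field Diffie-Hellman. The modulus (the Mersenne prime 2**127 - 1) and base (3) are toy values chosen purely for illustration, and the helper names are hypothetical; real DTLS uses standardised groups, typically elliptic-curve Diffie-Hellman, so treat this as an illustration of the principle rather than what your browser actually runs.

```python
import secrets

# Toy parameters, assumed for illustration only; far too weak for real use.
P = 2**127 - 1  # a Mersenne prime used as the modulus
G = 3           # base

def keypair() -> tuple[int, int]:
    """Return (secret, public): the secret stays local, the public value is sent."""
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)

def shared_secret(my_secret: int, their_public: int) -> int:
    """Combine my own secret with the other side's public value."""
    return pow(their_public, my_secret, P)

alice_secret, alice_public = keypair()
bob_secret, bob_public = keypair()

# Each side combines its own secret with the other side's public value...
alice_key = shared_secret(alice_secret, bob_public)
bob_key = shared_secret(bob_secret, alice_public)

# ...and both arrive at the same number, G**(a*b) mod P. Eve only ever sees
# alice_public and bob_public, which (for properly sized groups) is not enough.
print(alice_key == bob_key)  # True
```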
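The man-in-the-middle trick can be shown with the same kind of toy Diffie-Hellman exchange. In this sketch (toy group parameters assumed, helper names hypothetical), Eve sits on the signaling path and replaces each public value with her own, so Alice and Bob each complete a perfectly valid key agreement, just with Eve rather than with each other.

```python
import secrets

# Toy Diffie-Hellman group, assumed for illustration only.
P = 2**127 - 1
G = 3

def keypair() -> tuple[int, int]:
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)

alice_secret, alice_public = keypair()
bob_secret, bob_public = keypair()
eve_secret, eve_public = keypair()

# Eve intercepts the exchange and forwards her own public value both ways.
received_by_bob = eve_public    # Bob believes this came from Alice
received_by_alice = eve_public  # Alice believes this came from Bob

alice_key = pow(received_by_alice, alice_secret, P)  # really shared with Eve
bob_key = pow(received_by_bob, bob_secret, P)        # really shared with Eve

# Eve can compute both keys, so she can decrypt, read and re-encrypt traffic
# in both directions while Alice and Bob notice nothing.
eve_key_with_alice = pow(alice_public, eve_secret, P)
eve_key_with_bob = pow(bob_public, eve_secret, P)

print(alice_key == eve_key_with_alice, bob_key == eve_key_with_bob)  # True True
# Alice's and Bob's keys differ from each other: neither is talking to the other.
print(alice_key == bob_key)
```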
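The browser performs the fingerprint check internally, but a short sketch shows what it amounts to. The SDP line format (`a=fingerprint:sha-256` followed by colon-separated uppercase hex bytes) follows RFC 8122; the certificate bytes below are hypothetical placeholders, and the helper names are our own. Note that the check only helps if the fingerprint itself arrives unmodified: if Eve can rewrite the offer, she simply inserts the fingerprint of her own certificate.

```python
import hashlib
from typing import Optional

def sdp_fingerprint(cert_der: bytes) -> str:
    """Format a SHA-256 certificate fingerprint the way SDP carries it."""
    digest = hashlib.sha256(cert_der).digest()
    return "sha-256 " + ":".join(f"{b:02X}" for b in digest)

def fingerprint_from_sdp(sdp: str) -> Optional[str]:
    """Return the value of the first a=fingerprint line in an SDP blob."""
    for line in sdp.splitlines():
        if line.startswith("a=fingerprint:"):
            return line[len("a=fingerprint:"):].strip()
    return None

def peer_is_authentic(offer: str, cert_presented_in_handshake: bytes) -> bool:
    """Accept the DTLS peer only if its certificate hashes to the signaled fingerprint."""
    return fingerprint_from_sdp(offer) == sdp_fingerprint(cert_presented_in_handshake)

# Hypothetical stand-ins for DER-encoded certificates; a real browser
# generates its own.
alice_cert = b"alice-certificate-bytes"
eve_cert = b"eve-certificate-bytes"

# The (heavily trimmed) offer Bob received over the signaling channel.
offer_sdp = ("v=0\r\no=- 0 0 IN IP4 127.0.0.1\r\ns=-\r\n"
             "a=fingerprint:" + sdp_fingerprint(alice_cert) + "\r\n")

print(peer_is_authentic(offer_sdp, alice_cert))  # True
print(peer_is_authentic(offer_sdp, eve_cert))    # False
```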
A Secure Transport
So how do we normally get an offer and an answer between Alice and Bob when setting up a WebRTC call? Well, in most cases, you’d probably send them over HTTPS, potentially via a WebSocket. This means the messages are secure as far as your web server, but no further. In some cases, perhaps this is enough, but it is not true end-to-end encryption: if your web server were compromised, an attacker could modify these messages without Alice and Bob knowing. If Alice and Bob have previously communicated, we can do better: we can ensure that, no matter what happens on the server or servers, Alice’s call connects directly and securely with the same party she was messaging. In fact, with Matrix’s verification and cross-signing we can do even better than that, but more on that later.
So, how might we do this? Since there’s no standard signaling for WebRTC calls, there’s also no standard way of encrypting or securing the signaling. This is left up to the developer to get right.
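As one illustration of what "getting it right" requires, here is a minimal sketch that makes tampering with signaling detectable. It rests on a big assumption: Alice and Bob already share a secret key that the server never sees. Establishing such a key is exactly the job of Matrix’s end-to-end encryption; here it is just random bytes. The `seal`/`open_sealed` helpers and the envelope layout are hypothetical, and unlike real end-to-end encryption this HMAC envelope gives integrity and authenticity only, not confidentiality.

```python
import hashlib
import hmac
import json
import secrets

# Assumption: a key shared by Alice and Bob that the signaling server never sees.
shared_key = secrets.token_bytes(32)

def seal(message: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 tag so any modification in transit is detectable."""
    body = json.dumps(message, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}

def open_sealed(envelope: dict, key: bytes) -> dict:
    """Return the message if the tag checks out, otherwise raise."""
    expected = hmac.new(key, envelope["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["tag"]):
        raise ValueError("signaling message was modified in transit")
    return json.loads(envelope["body"])

# Hypothetical, heavily trimmed offer; only the fingerprint matters here.
offer = {"type": "offer", "sdp": "a=fingerprint:sha-256 AA:BB"}
envelope = seal(offer, shared_key)

# A compromised server swaps in Eve's fingerprint but cannot recompute the tag.
tampered = dict(envelope, body=envelope["body"].replace("AA:BB", "EE:EE"))

print(open_sealed(envelope, shared_key) == offer)  # True
try:
    open_sealed(tampered, shared_key)
except ValueError as err:
    print(err)  # signaling message was modified in transit
```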
End-to-end Encryption in Matrix: A rough guide
With Matrix’s end-to-end encryption, Alice and Bob can be sure the messages they receive from each other really are from the other party, haven’t been modified, and are also confidential.
Let’s start with when Alice first wants to send a message to Bob. There are a few steps to this, but as a slight simplification: Bob’s device has its own keypair, and Alice’s device encrypts a message with the public part of that key and sends it to Bob’s device using an algorithm called Olm. Olm is a type of double ratchet algorithm, one of the state-of-the-art approaches in cryptography for keeping messages secure even if one of the user’s keys is compromised. The “ratchet” part is a mechanism that can only move forward: it cannot be cryptographically reversed, so knowing its current state does not reveal its earlier states.
The double ratchet algorithm was invented by Trevor Perrin and Moxie Marlinspike, popularised in Signal, and is now used in many popular instant messaging applications. The algorithm is very good at ensuring that if an attacker manages to get hold of any of the messages exchanged, or indeed any of the keys for those messages, the worst they can generally do is read the specific messages they obtained the keys for. As Alice and Bob continue to talk, an attacker can’t read their later messages: the encryption is self-healing, so even if a key is compromised, the channel becomes secure again as new keys are negotiated.
So, Alice can securely send a message to Bob, and in some instant messaging apps, that’s where it would end. But in Matrix, what is actually encrypted here isn’t Alice’s message, but another key. Why? Well, in Matrix Alice and Bob might each have several devices: a laptop, a phone, a tablet, and maybe more. Moreover, in general messaging they could be in a room with many participants, each of whom has several devices. Using only Olm, Alice would have to encrypt every single message she sends separately for each of these devices. This could end up being a lot of messages.
Matrix fixes this with another algorithm, Megolm: when she wants to speak, Alice generates a key (we call this a Megolm session) and sends it to every device via Olm, rather than sending the message itself. After that, whenever she wants to send a message, all she has to do is send a single message to the room, encrypted with that Megolm session, and every device can read it. A bonus is that Alice’s device can start sharing the session with other devices before she sends a message, so the message arrives sooner. Plus, Megolm is a ratchet algorithm too, so an attacker who captures the current key of a session won’t be able to decrypt previous messages.
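The "only moves forward" property can be sketched with a symmetric-key (hash) ratchet, the KDF-chain half of a double ratchet. This follows the style of the KDF chain recommended in the Signal Double Ratchet specification (HMAC-SHA256 keyed by the chain key, with distinct constant inputs); Olm’s exact construction may differ in detail, and the starting chain key here is a hypothetical placeholder.

```python
import hashlib
import hmac

def ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Advance a hash ratchet by one step; return (message_key, next_chain_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key

chain_key = b"\x00" * 32  # hypothetical starting chain key
message_keys = []
for _ in range(3):
    mk, chain_key = ratchet_step(chain_key)
    message_keys.append(mk)

# Every message gets a fresh key. Someone who steals the *current* chain key
# can derive later keys in this chain, but cannot run HMAC backwards to
# recover the keys for messages 0 to 2: the ratchet only turns forward.
print(len(set(message_keys)))  # 3
```

On its own, this chain cannot recover from a stolen chain key, since every later key follows from it. Mixing in fresh Diffie-Hellman results is what makes the double ratchet self-healing.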
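Self-healing comes from the Diffie-Hellman half of the double ratchet: every so often a side sends a brand-new ratchet public key, and the result of a fresh key agreement is mixed into the root key. The sketch below is a simplification under stated assumptions: a toy finite-field group instead of elliptic curves, and a hypothetical HMAC-based `kdf_root` standing in for the HKDF-based root KDF in the Signal specification. An attacker who stole the old root key still cannot follow along, because she has neither new ratchet secret.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman group, assumed for illustration only.
P = 2**127 - 1
G = 3

def keypair() -> tuple[int, int]:
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)

def kdf_root(root_key: bytes, dh_output: int) -> tuple[bytes, bytes]:
    """Mix a fresh DH result into the root key; return (new_root, new_chain)."""
    prk = hmac.new(root_key, dh_output.to_bytes(16, "big"), hashlib.sha256).digest()
    new_root = hmac.new(prk, b"root", hashlib.sha256).digest()
    new_chain = hmac.new(prk, b"chain", hashlib.sha256).digest()
    return new_root, new_chain

root_key = secrets.token_bytes(32)  # shared by Alice and Bob
stolen_root_key = root_key          # ...and compromised by Eve

bob_ratchet_secret, bob_ratchet_public = keypair()
alice_ratchet_secret, alice_ratchet_public = keypair()  # sent with Alice's next message

alice_root, alice_chain = kdf_root(root_key, pow(bob_ratchet_public, alice_ratchet_secret, P))
bob_root, bob_chain = kdf_root(root_key, pow(alice_ratchet_public, bob_ratchet_secret, P))

# Eve knows the old root key and both public values, but not either ratchet
# secret, so any DH output she can compute (here, a naive guess) is wrong.
eve_guess = (alice_ratchet_public * bob_ratchet_public) % P
_, eve_chain = kdf_root(stolen_root_key, eve_guess)

print(alice_chain == bob_chain)  # True: the conversation carries on
print(eve_chain == alice_chain)  # Eve's chain key does not match: the channel has healed
```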
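The fan-out saving and the forward-only session can be sketched together. This is a deliberately simplified, hypothetical model: a single hash chain stands in for Megolm’s actual four-part ratchet (which exists so receivers can skip ahead efficiently), there is no real encryption, and the class and function names are our own.

```python
import hashlib

def advance(ratchet: bytes, steps: int = 1) -> bytes:
    """Move a hash ratchet forward; there is no way to move it back."""
    for _ in range(steps):
        ratchet = hashlib.sha256(b"megolm-sketch" + ratchet).digest()
    return ratchet

def message_key(ratchet: bytes) -> bytes:
    return hashlib.sha256(b"message-key" + ratchet).digest()

class OutboundSession:
    """Hypothetical sender side: one ratchet, shared once, used for many messages."""

    def __init__(self, seed: bytes):
        self.ratchet = seed
        self.index = 0

    def export(self) -> tuple[int, bytes]:
        # What Alice would send to each device (over Olm) so that device can
        # decrypt from the current message index onward.
        return self.index, self.ratchet

    def next_message_key(self) -> tuple[int, bytes]:
        idx, key = self.index, message_key(self.ratchet)
        self.ratchet = advance(self.ratchet)
        self.index += 1
        return idx, key

def receiver_key(shared: tuple[int, bytes], index: int) -> bytes:
    """Derive the key for message `index` from a session shared at an earlier index."""
    start, ratchet = shared
    if index < start:
        raise ValueError("cannot decrypt messages sent before the session was shared")
    return message_key(advance(ratchet, index - start))

session = OutboundSession(seed=bytes(32))  # hypothetical all-zero seed
session.next_message_key()                 # message 0, sent before Carol joined
shared_with_carol = session.export()       # Carol's device can read index 1 onward
idx, key = session.next_message_key()      # message 1
print(receiver_key(shared_with_carol, idx) == key)  # True

# Rough counts for a room with 10 devices in which Alice sends 50 messages:
devices, messages = 10, 50
print(devices * messages)  # 500 encryptions if every message went to every device via Olm
print(devices + messages)  # 60: one Olm share per device, then one Megolm message each
```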
A World of Many Devices
So, we now understand that all of Alice’s messages to each of Bob’s devices are protected by the key of Alice’s device, using Olm and Megolm. What if Alice has several devices? What if she uses one device to send one message, then uses another to send her next message, or place a call? How does Bob know that both devices are really Alice?
The way this is solved in Matrix is with something called cross-signing. In addition to each of Alice’s devices having its own private key, Alice also has a master key for herself. Whenever Alice signs in on a new device, the new device’s key is signed with her master key, and this signature is published so that others know her new device is really her. Others can trust that signature because the only ways for a new device to gain access to the private part of the master key are for Alice either to verify her new device from one of her old ones (by scanning a QR code or comparing emoji symbols), or to enter her account’s Security Key or passphrase (which unlocks her master key). This also means that, even if an attacker gained access to her account, or her server was compromised, the attacker would not be able to sign a new device against her account, because the private part of her master key is never exposed to the server.
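Here is a minimal sketch of the trust logic behind cross-signing, with loudly stated simplifications. Matrix’s real cross-signing uses Ed25519 keys, and the master key signs a separate self-signing key, which in turn signs devices. This sketch instead uses toy Schnorr signatures over a small group built at runtime (far too weak for real use) and has the master key sign device keys directly. All names are hypothetical. The point is the check at the end: a device counts as Alice’s only if its key carries a valid signature from her master key, which a compromised server cannot produce.

```python
import hashlib
import secrets

def is_probable_prime(n: int) -> bool:
    """Miller-Rabin test; deterministic for n below about 3.3e24 with these bases."""
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41)
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

# Toy Schnorr group (assumed): Q = 2**61 - 1 is prime, P is the smallest prime
# of the form k*Q + 1, and G generates the subgroup of order Q.
Q = 2**61 - 1
k = 2
while not is_probable_prime(k * Q + 1):
    k += 2
P = k * Q + 1
G = next(g for g in (pow(h, k, P) for h in range(2, 100)) if g != 1)

def keygen() -> tuple[int, int]:
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def challenge(r: int, message: bytes) -> int:
    digest = hashlib.sha256(r.to_bytes(16, "big") + message).digest()
    return int.from_bytes(digest, "big") % Q

def sign(x: int, message: bytes) -> tuple[int, int]:
    nonce = secrets.randbelow(Q - 1) + 1
    r = pow(G, nonce, P)
    return r, (nonce + x * challenge(r, message)) % Q

def verify(y: int, message: bytes, sig: tuple[int, int]) -> bool:
    r, s = sig
    return pow(G, s, P) == (r * pow(y, challenge(r, message), P)) % P

def key_bytes(public: int) -> bytes:
    return public.to_bytes(16, "big")

alice_master_secret, alice_master_public = keygen()  # stays under Alice's control
_, laptop_public = keygen()
_, eve_device_public = keygen()

# Alice's new laptop gets its key signed with her master key once she has
# verified it from an old device or entered her Security Key.
published = {laptop_public: sign(alice_master_secret, key_bytes(laptop_public))}

# Eve controls the server and publishes her own device, but she can only
# sign it with a key she actually holds.
eve_secret, _ = keygen()
published[eve_device_public] = sign(eve_secret, key_bytes(eve_device_public))

def device_is_alices(device_public: int) -> bool:
    sig = published.get(device_public)
    return sig is not None and verify(alice_master_public, key_bytes(device_public), sig)

print(device_is_alices(laptop_public))      # True
print(device_is_alices(eve_device_public))  # False
```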
What this means is that once Bob knows what Alice’s master key is, he can be sure he’s really talking to Alice, even if she signs in on a new device: her master key remains the same.
Trust, but Verify
Once Bob has spoken to Alice, he can now be sure he’s talking to the same person each time he communicates with her, be that via text chat or call, but one last piece of the puzzle remains: how can he be sure that he really has the correct master key for Alice? What if Eve swapped in her own master key for Alice’s back when Alice and Bob first talked to each other?
We already have all the tools we need to solve this problem: Bob can verify Alice at any point in the same way Alice verifies her own devices: they can scan a QR code if they’re in the same room as each other, or if not, they can compare a sequence of emoji symbols. If they match, Alice and Bob have ensured out-of-band (that is, not via the Matrix network) that they each have the correct, authentic master key for each other, so their communications are secure.
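The emoji comparison works because both devices derive the symbols from a secret they just agreed on. If Eve sat in the middle, each side would be holding a different secret and would see different emoji. Below is a minimal sketch under clear assumptions. Matrix’s SAS verification uses X25519 and HKDF-SHA256, and takes 42 bits of output as 7 groups of 6 bits into a fixed table of 64 emoji. Here a toy finite-field group replaces X25519, the 64 animal emoji from U+1F400 onward stand in for Matrix’s real table, and the `info` string is hypothetical. The real protocol also adds a commitment step, so Eve cannot keep trying keys until the emoji happen to match. Once the emoji match, it uses the agreed secret to authenticate the keys being vouched for, including the master key. Both steps are omitted here.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # toy Diffie-Hellman group, assumed for illustration only
G = 3

# Illustrative 64-symbol table; Matrix's real SAS emoji table is different.
EMOJI = [chr(0x1F400 + i) for i in range(64)]

def keypair() -> tuple[int, int]:
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)

def hkdf_sha256(ikm: bytes, info: bytes, length: int) -> bytes:
    """Minimal HKDF (RFC 5869) with the default all-zero salt."""
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def sas_emoji(dh_output: int, info: bytes) -> list[str]:
    """Take the first 42 bits of derived key material as 7 groups of 6 bits."""
    bits = int.from_bytes(hkdf_sha256(dh_output.to_bytes(16, "big"), info, 6), "big") >> 6
    return [EMOJI[(bits >> (6 * (6 - i))) & 0x3F] for i in range(7)]

info = b"hypothetical-sas-info|@alice|@bob"
a_secret, a_public = keypair()
b_secret, b_public = keypair()
alice_sees = sas_emoji(pow(b_public, a_secret, P), info)
bob_sees = sas_emoji(pow(a_public, b_secret, P), info)
print(alice_sees == bob_sees)  # True: the same seven emoji on both screens

# With Eve substituting her key on both legs, each side derives emoji from a
# different shared secret, so the screens almost certainly disagree.
e_secret, e_public = keypair()
alice_sees_mitm = sas_emoji(pow(e_public, a_secret, P), info)
bob_sees_mitm = sas_emoji(pow(e_public, b_secret, P), info)
print(alice_sees_mitm == bob_sees_mitm)
```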
Putting It All Together
In this post, we’ve seen how WebRTC provides security for calls, but we’ve also seen what a developer using WebRTC needs to do so that Alice and Bob can really be sure their communication is secure and not being intercepted. We’ve learned about end-to-end encryption and the extra security it provides, and we’ve seen how Matrix provides the many different layers this requires when each participant has many devices.
Hopefully what we’ve talked about in this article will help developers working with WebRTC understand and decide what level of security is right for them, and implement that level of security within their application. Of course, the fully open-source Matrix project has SDKs in lots of different languages to get you started should you not want to implement this all yourself. Alternatively, you can refer to our implementation in Element, our open-source Matrix client, which is also used for our commercial service.
{“author”: “Dave Baker”}