As I anticipated in my post on WebRTC standardization, the 87th IETF meeting took place last week in Berlin, Germany. One of the agenda items for WebRTC was whether, and how, SDES should be part of WebRTC.
According to the IETF drafts, any WebRTC-compliant implementation must support the RTP/SAVPF profile, which builds on top of the Secure RTP profile RTP/SAVP. This means that media channels (e.g. audio, video) must be secured via Secure RTP (SRTP), which provides media encryption among other security features. In fact, the use of plain (unencrypted) RTP is explicitly forbidden by the WebRTC specifications.
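To make the profile requirement concrete, here is a minimal sketch of how an endpoint could tell a secure profile from plain RTP by looking at the proto field of an SDP media line. The function names are my own (hypothetical), and the check is deliberately simplistic: it only inspects the profile name and says nothing about how the keys are negotiated.

```python
# Hypothetical helpers, not taken from any WebRTC stack.
SECURE_PROFILES = {"SAVP", "SAVPF"}


def media_profile(m_line: str) -> str:
    """Return the proto field of an SDP media line, e.g. 'RTP/SAVPF'."""
    # Layout of a media line: m=<media> <port> <proto> <fmt> ...
    fields = m_line.strip().split()
    if len(fields) < 4 or not fields[0].startswith("m="):
        raise ValueError(f"not a media line: {m_line!r}")
    return fields[2]


def is_srtp_profile(proto: str) -> bool:
    """True when the profile is one of the SRTP-based ones."""
    # The profile name is the last path component, so this also accepts
    # 'UDP/TLS/RTP/SAVPF', the proto value used with DTLS-SRTP.
    return proto.split("/")[-1] in SECURE_PROFILES
```

With this sketch, `m=audio 49170 RTP/SAVPF 0` passes the check while `m=audio 49170 RTP/AVP 0` (plain RTP) does not.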
WebRTC Key Management Alternatives
SRTP needs to interact with key management protocols (e.g. MIKEY, ZRTP, SDES, DTLS-SRTP) in order to negotiate the security parameters for the media traffic session. It’s worth noting that the signaling (e.g. SIP, HTTP) and media (e.g. RTP) involved in a multimedia communication can be secured independently. For instance, SDP Security Descriptions for Media Streams (SDES) and Multimedia Internet KEYing (MIKEY) use the signaling plane to carry the security parameters of the session via SDP; this means the signaling should in turn be protected. However, securing signaling and media independently may be insufficient in some cases, as it provides no guarantee that the signaling user is the same as the media user. Hence, a cryptographic binding between the two planes is desirable, and this is what ZRTP and DTLS-SRTP provide.
A lot of literature can be found on the topic, but in a nutshell:
SDES – security parameters and keys to set up SRTP sessions are exchanged in clear text in form of SDP attributes, hence relying on the signaling plane to secure the SDP message, using for instance TLS.
MIKEY – performs the key exchange and negotiates cryptographic parameters on behalf of multimedia applications. Its messages are transported in the SDP payload and encoded in base64.
ZRTP – a shared secret and other security parameters are exchanged relying on Diffie-Hellman. Mutual authentication can use a Short Authentication String (SAS), so it doesn’t require support from a PKI. The ZRTP exchange takes place over the same port numbers used by the multimedia session for the RTP traffic (as opposed to the signaling path).
DTLS-SRTP – enables the exchange of cryptographic parameters and the derivation of keying material. The key exchange takes place in the media plane and is multiplexed on the same ports as the media itself. We will elaborate on this in a future post but, in short, once some of the ICE checks have completed, DTLS-SRTP allows the SRTP media channel to be established with no need to reveal keys in the SDP message exchange as is done with SDES.
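The four mechanisms above leave different traces in the SDP, which a rough sketch can use to guess which of them a session description advertises. The attribute names are the ones defined by the respective RFCs; the function name and the two sample descriptions (including the key and fingerprint values, which are placeholders) are made up for illustration. Treat it as a heuristic: an `a=fingerprint` line, for example, can also appear for other uses of (D)TLS.

```python
from typing import Set

# SDP attribute prefixes and the key-management scheme each one suggests.
KEYING_ATTRIBUTES = {
    "a=crypto:": "SDES",            # RFC 4568: SRTP key carried inline in the SDP
    "a=key-mgmt:mikey": "MIKEY",    # RFC 4567: base64 MIKEY message in the SDP
    "a=zrtp-hash:": "ZRTP",         # RFC 6189: hash only, keys agreed in the media path
    "a=fingerprint:": "DTLS-SRTP",  # certificate fingerprint only, no key material
}


def keying_mechanisms(sdp: str) -> Set[str]:
    """Return the key-management schemes hinted at by an SDP body."""
    found = set()
    for line in sdp.splitlines():
        for prefix, scheme in KEYING_ATTRIBUTES.items():
            if line.strip().startswith(prefix):
                found.add(scheme)
    return found


# Placeholder values: neither the key nor the fingerprint is real.
SDES_EXAMPLE = "\n".join([
    "m=audio 49170 RTP/SAVPF 0",
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PLACEHOLDERKEYANDSALT",
])
DTLS_EXAMPLE = "\n".join([
    "m=audio 49170 RTP/SAVPF 0",
    "a=fingerprint:sha-256 AA:BB:CC",
])

print(keying_mechanisms(SDES_EXAMPLE))
print(keying_mechanisms(DTLS_EXAMPLE))
```

The contrast between the two samples is the point: with SDES the SRTP key itself travels in the SDP, while with DTLS-SRTP the SDP only carries a certificate fingerprint.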
According to draft-ietf-rtcweb-rtp-usage-07 (current draft, July 2013), WebRTC:
Implementations MUST support DTLS-SRTP for key-management. Other key management schemes MAY be supported
DTLS-SRTP is the default and preferred mechanism, meaning that if an offer is received that supports both DTLS-SRTP and SDES, DTLS-SRTP must be selected, irrespective of whether the signaling is secured or not. From what I’m seeing in the field, most (I can’t say all) WebRTC trials are running signaling over TLS, usually via Secure HTTP (HTTPS) or Secure WebSockets (WSS).
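The selection rule can be sketched in a few lines, assuming an endpoint that implements both mechanisms. The names below are hypothetical. Note that the function deliberately takes no argument describing the signaling transport, since the preference does not depend on it.

```python
# Preference order for an endpoint that implements both schemes (assumption).
PREFERENCE = ("DTLS-SRTP", "SDES")


def select_keying(offered, supported=PREFERENCE):
    """Pick the key-management scheme to answer with, DTLS-SRTP first.

    Returns None when the offer and the local endpoint have nothing in common.
    """
    for scheme in PREFERENCE:
        if scheme in offered and scheme in supported:
            return scheme
    return None


print(select_keying({"SDES", "DTLS-SRTP"}))  # prints "DTLS-SRTP"
```

An endpoint that only implements DTLS-SRTP would call it with `supported=("DTLS-SRTP",)`, in which case an SDES-only offer yields `None`.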
In general there are few doubts that DTLS-SRTP should be the mandatory-to-implement (MTI) security mechanism for WebRTC real-time media (well, some argue ZRTP could potentially provide a simpler approach or even better protection in some scenarios). The discussion in Berlin was whether the IETF, beyond mandating support for DTLS-SRTP, could provide recommendations on how to reuse other mechanisms already in place today in order to provide backward compatibility; specifically, reusing SDES in WebRTC-to-SIP interworking scenarios. Yes, it’s true that the majority of RTP traffic in VoIP networks is not secured today – in fact, this is one of the very first features customers usually ask vendors to remove in order to meet their budgets. However, when secured, most of the deployments I’ve seen are using SDES (which, as mentioned, has a strong dependence on the signaling plane security).
When it comes to WebRTC implementations, Google’s Chrome supports both DTLS-SRTP and SDES while Mozilla’s Firefox only implements DTLS-SRTP. As a curiosity, I’m currently involved in field trials where some of the WebRTC-to-SIP gateways under evaluation only implement SDES and have DTLS-SRTP as a roadmap item. Those gateway vendors probably considered SDES to be good enough for the basic interworking scenario and expected it to be adopted at some point by the WebRTC specifications.
- WebRTC Browser security key support
The slides from the discussion in Berlin can be found here. As can be seen in the presentations, some of the arguments from the SDES supporters included:
Commercial incentive – it’s already there and works
Early media clipping
Tradeoff between complexity and cost
Allowance for end-to-end SRTP in interworking cases
On the last bullet, this means one would not need to decrypt the SRTP traffic on the interworking gateway if the VoIP endpoint at the other end is using SDES.
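A toy sketch of the decision an interworking gateway faces may help here. The function and its string labels are hypothetical, and real gateways also have to deal with ICE, codecs and much more; the only point is when the SRTP payload can be forwarded untouched.

```python
def gateway_media_action(browser_keying: str, sip_keying: str) -> str:
    """Decide what an interworking gateway has to do with the media.

    browser_keying: "DTLS-SRTP" or "SDES" (plain RTP is not allowed in WebRTC)
    sip_keying:     "DTLS-SRTP", "SDES" or "NONE" (plain RTP on the SIP side)
    """
    if browser_keying not in ("DTLS-SRTP", "SDES"):
        raise ValueError("the WebRTC side must use SRTP")
    if browser_keying == "SDES" and sip_keying == "SDES":
        # The same keys can be passed along in the SDP, so the gateway
        # can forward SRTP packets without decrypting them.
        return "relay"
    if sip_keying == "NONE":
        # The SIP side speaks plain RTP: decrypt towards it, encrypt back.
        return "decrypt"
    # Keys differ on each leg: decrypt on one side, re-encrypt on the other.
    return "re-encrypt"
```

Only the SDES-to-SDES case returns `"relay"`; a browser using DTLS-SRTP towards an SDES endpoint makes the gateway terminate SRTP on both legs.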
Of course, the main point of debate was whether SDES really degrades security in the interworking use case when compared to DTLS-SRTP. In my opinion, some of the arguments used against SDES also apply to DTLS-SRTP. Note that SDES supporters were not proposing it for all use cases; they advocated limiting its applicability to the interworking scenario.
Others argued that backward compatibility was not such an important factor when one will need to use media gateways anyway (at a minimum for ICE termination); so DTLS-SRTP-to-SDES interworking could be just yet another feature of the gateway function (as shown in the following diagram extracted from this presentation).
As a result of the discussion, not only was SDES not recommended for select use cases like WebRTC-to-SIP interworking, but it has been completely prohibited for WebRTC in general. Adopting SDES was interpreted by most as imposing a requirement on web browsers beyond the scope of browser-to-browser communications while potentially degrading their security properties.
I believe having a single solution might be beneficial from the interoperability perspective (i.e. fewer options usually mean better interoperability), but I’m wondering what the industry is really going to implement, especially in non-browser WebRTC scenarios that might have different security requirements or might want to directly interoperate with existing VoIP devices (as I’m seeing in a project, this can mean several million endpoints). Time will tell how vendors react, but this would definitely not be the first case where a specific feature/behavior is disabled by default to be RFC-compliant but a configurable, and sometimes hidden, option is provided “at one’s own risk” to support the feature 😉
Want to learn a bit more about WebRTC security? We will elaborate on this topic in future blog entries. In the meantime, you can find a couple of interesting references here and here. My friends and former colleagues Jiri, Dorgham, John, Uli and Henning include a very comprehensive description of multimedia security in their SIP Security book. You can also send me an email at [email protected] or follow me on Twitter at @victorpascual.