As I anticipated in my post on WebRTC standardization, the 87th IETF meeting took place last week in Berlin, Germany. One of the agenda items for WebRTC was whether, and how, SDES should be part of WebRTC.
According to the IETF drafts, any WebRTC-compliant implementation must support the RTP/SAVPF profile, which builds on top of the Secure RTP profile RTP/SAVP. This means that media channels (e.g. audio, video) must be secured via Secure RTP (SRTP), which provides media encryption among other security features. In fact, the use of plain (unencrypted) RTP is explicitly forbidden by the WebRTC specifications.
WebRTC Key Management Alternatives
SRTP needs to interact with key management protocols (e.g. MIKEY, ZRTP, SDES, DTLS-SRTP) in order to negotiate the security parameters for the media traffic session. It’s worth noting that the signaling (e.g. SIP, HTTP) and media (e.g. RTP) involved in a multimedia communication can be secured independently. For instance, SDP Security Descriptions for Media Streams (SDES) and Multimedia Internet KEYing (MIKEY) use the signaling plane to carry the security parameters of the session via SDP; this means the signaling must in turn be protected. However, securing signaling and media independently may be insufficient in some cases, as it provides no guarantee that the signaling user is the same as the media user. Hence, a cryptographic binding between the two planes is desirable – ZRTP and DTLS-SRTP do this.
A lot of literature can be found on the topic, but in a nutshell:
SDES – security parameters and keys to set up SRTP sessions are exchanged in clear text in the form of SDP attributes, hence relying on the signaling plane to secure the SDP message, using for instance TLS.
MIKEY – performs the key exchange and negotiates cryptographic parameters on behalf of multimedia applications. Its messages are transported in the SDP payload and encoded in base64.
ZRTP – a shared secret and other security parameters are exchanged relying on Diffie-Hellman. Mutual authentication can use a Short Authentication String (SAS), so it doesn’t require support from a PKI. The ZRTP exchange takes place over the same port numbers used by the multimedia session for the RTP traffic (as opposed to the signaling path).
DTLS-SRTP – enables the exchange of cryptographic parameters and the derivation of keying material. The key exchange takes place in the media plane and is multiplexed on the same ports as the media itself. We will elaborate on this in a future post but, in short, once some of the ICE checks have completed, DTLS-SRTP allows the SRTP media channel to be established with no need to reveal keys in the SDP message exchange, as is done with SDES.
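To make the difference visible on the wire, here is a minimal Python sketch that reports which keying mechanisms an SDP body signals, based on the attributes each scheme puts in the SDP. The function name, the sample offers and the placeholder key and fingerprint values are assumptions made for illustration; a real gateway would use a full SDP parser and look at each media section separately.

```python
def keying_mechanisms(sdp):
    """Return the set of SRTP keying mechanisms signaled in an SDP body."""
    # SDP attribute prefixes used by each scheme.
    markers = {
        "a=crypto:": "SDES",            # RFC 4568: master key sits inline in the SDP
        "a=key-mgmt:mikey ": "MIKEY",   # RFC 4567: base64 MIKEY message in the SDP
        "a=fingerprint:": "DTLS-SRTP",  # used by RFC 5763: only a certificate hash in the SDP
        "a=zrtp-hash:": "ZRTP",         # RFC 6189: optional hint, the exchange itself is in-band
    }
    found = set()
    for line in sdp.splitlines():
        for prefix, name in markers.items():
            if line.strip().startswith(prefix):
                found.add(name)
    return found


# Hypothetical offers; the key and fingerprint values are placeholders, not real material.
SDES_OFFER = (
    "m=audio 49170 RTP/SAVP 0\r\n"
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PLACEHOLDER_BASE64_KEY_AND_SALT\r\n"
)
DTLS_OFFER = (
    "m=audio 49170 UDP/TLS/RTP/SAVPF 0\r\n"
    "a=setup:actpass\r\n"
    "a=fingerprint:sha-256 PLACEHOLDER:HEX:FINGERPRINT\r\n"
)

print(keying_mechanisms(SDES_OFFER))  # {'SDES'}
print(keying_mechanisms(DTLS_OFFER))  # {'DTLS-SRTP'}
```

Note that the SDES offer carries the key itself, while the DTLS-SRTP offer only carries a hash of the certificate that will be presented during the handshake on the media path.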
According to draft-ietf-rtcweb-rtp-usage-07 (current draft, July 2013), WebRTC:
Implementations MUST support DTLS-SRTP for key-management. Other key management schemes MAY be supported
DTLS-SRTP is the default and preferred mechanism, meaning that if an offer is received that supports both DTLS-SRTP and SDES, DTLS-SRTP must be selected, irrespective of whether the signaling is secured or not. From what I’m seeing in the field, most (I can’t say all) WebRTC trials are running signaling over TLS, usually via Secure HTTP (HTTPS) or Secure WebSockets (WSS).
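The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the preference tuple and the `signaling_secured` parameter are hypothetical and not taken from any draft or browser implementation.

```python
# Preference order assumed from the rule described above: DTLS-SRTP always wins over SDES.
PREFERENCE = ("DTLS-SRTP", "SDES")


def select_keying(offered, signaling_secured=False):
    """Pick the keying mechanism for an answer from the mechanisms found in an offer.

    `signaling_secured` is accepted only to make the point that it does not
    influence the choice.
    """
    for mechanism in PREFERENCE:
        if mechanism in offered:
            return mechanism
    return None  # nothing usable was offered


print(select_keying({"SDES", "DTLS-SRTP"}, signaling_secured=True))  # DTLS-SRTP
print(select_keying({"SDES"}))                                       # SDES
```

An SDES-only offer is still answered with SDES in this sketch, which is the behavior the interworking discussion below is about.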
In general there are few doubts that DTLS-SRTP should be the mandatory-to-implement (MTI) security mechanism for WebRTC real-time media (well, some argue ZRTP could potentially provide a simpler approach or even better protection in some scenarios). The discussion in Berlin was whether the IETF, beyond mandating support for DTLS-SRTP, could provide recommendations on how to reuse other existing mechanisms in order to provide backward compatibility; namely, to reuse SDES in WebRTC-to-SIP interworking scenarios. Yes, it’s true that the majority of RTP traffic in VoIP networks is not secured today – in fact, media encryption is one of the very first features customers usually ask vendors to remove in order to meet their budgets. However, when secured, most of the deployments I’ve seen are using SDES (which, as mentioned, has a strong dependence on the signaling plane security).
When it comes to WebRTC implementations, Google’s Chrome supports both DTLS-SRTP and SDES while Mozilla’s Firefox only implements DTLS-SRTP. As a curiosity, I’m currently involved in field trials where some of the WebRTC-to-SIP gateways under evaluation only implement SDES and have DTLS-SRTP as a roadmap item. Those gateway vendors probably considered SDES to be good enough for the basic interworking scenario and expected it to be adopted at some point by the WebRTC specifications.
- WebRTC Browser security key support
The slides from the discussion in Berlin can be found here. As can be seen in the presentations, some of the arguments from the SDES supporters included:
Commercial incentive – it’s already there and works
Early media clipping
Tradeoff between complexity and cost
Allowance for end-to-end SRTP in interworking cases
On the last bullet, this means one would not need to decrypt the SRTP traffic on the interworking gateway if the VoIP endpoint at the other end is using SDES.
Of course, the main point of debate was whether SDES really degrades security in the interworking use case when compared to DTLS-SRTP. In my opinion some of the arguments used against SDES also apply to DTLS-SRTP. Note that SDES supporters were not proposing it for all use cases; they were advocating limiting its applicability to the interworking scenario.
Others argued that backward compatibility was not such an important factor when one will need to use media gateways anyway (at a minimum for ICE termination); so DTLS-SRTP-to-SDES interworking could be just another feature of the gateway function (as shown in the following diagram extracted from this presentation).
As a result of the discussion, not only was SDES not recommended for select use cases like WebRTC-to-SIP interworking, but it has been completely prohibited for WebRTC in general. Adopting SDES was interpreted by most as imposing a requirement on web browsers beyond the scope of browser-to-browser communications while potentially degrading their security properties.
I believe having a single solution might be beneficial from the interoperability perspective (i.e. fewer options usually mean better interoperability), but I’m wondering what the industry is really going to implement, especially in non-browser WebRTC scenarios that might have different security requirements or might want to directly interoperate with existing VoIP devices (as I’m seeing in a project, this can mean several millions of endpoints). Time will tell how vendors react, but this would definitely not be the first case where a specific feature/behavior is disabled by default to be RFC-compliant but a configurable, and sometimes hidden, option is provided “at one’s own risk” to support the feature 😉
Want to learn a bit more about WebRTC security? We will elaborate on this topic in future blog entries. In the meantime, find here and here a couple of interesting references. My friends and former colleagues Jiri, Dorgham, John, Uli and Henning include a very comprehensive description of multimedia security in their SIP Security book. You can also send me an email or follow me on Twitter at @victorpascual.
Vijay K. Gurbani says
The issue Victor is discussing is not endemic to WebRTC only. It is a general problem that the SIP world has tackled for a bit.
The paper below provides relative merits and disadvantages of various keying schemes in an Internet multimedia telecommunication environment. If you would like a copy, you can send me an email at vkg AT bell-labs DOT com.
Gurbani, V.K. and Kolesnikov, V., “A Survey and Analysis of Media Keying Techniques in the Session Initiation Protocol (SIP),” In IEEE Communications Surveys and Tutorials, 13(2), pp. 183-198, 2011.
Victor Pascual says
Thanks for the reference Vijay — very interesting paper.
BTW, this new Internet-Draft was published a few hours ago: “Using ZRTP to Secure WebRTC”
Victor Pascual says
Quick update: Google just announced that Chrome is phasing out SDES in a multi-step process:
“1) In Chrome 31, DTLS is now on by default, although SDES is still offered as well. You no longer need to pass the DtlsSrtpKeyAgreement:true constraint to enable DTLS. Since we are using per-origin certificate caching, the performance hit of generating a DTLS cert is no longer an issue, allowing us to make this the default. No application changes are needed at this time, although DTLS can be disabled by setting DtlsSrtpKeyAgreement:false, which reverts to SDES-only operation.
2) In an upcoming version of Chrome, probably Chrome 33, SDES will no longer be offered by default, and will only be used if a new TBD constraint is specified. For applications that require SDES, this will require an application change to specify this new constraint.
3) In a future version of Chrome, TBD at this point, this SDES constraint will be removed and only DTLS-SRTP will be supported. We expect this to occur sometime in 2014, so please begin migration of your applications to DTLS-SRTP as soon as possible.”
Thanks for the interesting post Victor.
I was just searching for information on WebRTC security issues and found this. The situation may have changed, since the post was published about a year ago. I am new to WebRTC, and everything suggests that WebRTC is using DTLS-RTSP. However, what happens with security on mobile phones? As far as I know, RTSP could be a little slow to decrypt video, for example, on mobile platforms.
Thanks in advance
Chad Hart says
Please don’t mix up the Real Time Streaming Protocol (RTSP) with the Secure Real-time Transport Protocol (SRTP). WebRTC uses DTLS-SRTP.
DTLS-SRTP, like all encryption, does require decryption, and there is some overhead associated with this, but it is minuscule on modern devices. Concerns about encryption costs have usually focused on server-side equipment that needs to handle high volumes, where the added cost could potentially increase the price of offering a service.
Right, I was referring to DTLS-SRTP, not DTLS-RTSP; sorry for my mistake.
I wanted some clarification before running some tests, and the minuscule overhead on modern devices is what I expected.
Many thanks for your response.