For the last year and a half I’ve been working with a number of customers helping them to understand what WebRTC is about, supporting them in the definition of new products, services, and in some cases even developing WebRTC prototypes/labs for them. I’ve spent time with Service Providers, Enterprise and OTT customers and the very first time I demoed WebRTC to them, after the initial ‘wow moment’ almost all of them complained about the ‘call setup delay’, as in some cases represented tens of seconds.
Why is there such delay? In short, because of the ICE (Interactive Connectivity Establishment) processing. As we have mentioned in our past networking posts, an endpoint using ICE needs to gather candidates, prioritize them, choose default ones, exchange them with the remote party, pair them and order into check lists. Once all this have been completed, and only then, the endpoints can begin a phase of connectivity checks and eventually select the pair of address candidates that will be used in the session; this process can lead to relatively lengthy session establishment times.
Fortunately, Emil Ivov (founder and current lead of the Jitsi project) and others have been working on an extension to ICE that allows agents to send and receive candidates incrementally rather than waiting to exchange a complete list. With such incremental provisioning, ICE agents can begin connectivity checks while they are still gathering candidates and considerably shorten the time necessary for ICE processing to complete; thereby, reducing the ‘call setup delay’ customers were complaining about.
Please read below for Emil’s review of the ICE trickle extension.
ICE always tastes better when it trickles! (by Emil Ivov)
“The one thing I love about WebRTC is how it makes all communication P2P”
This is a statement I read or hear very often when people discover WebRTC. NATs have been causing such a mess in VoIP, for so long that many people have come to believe that only a brand new technology, like WebRTC, could possibly ever save us from this ordeal.
Well … this is not quite the case. Just as with RTP, SDP, DTLS/SRTP, RTCP-MUX and many others, WebRTC borrows most of its NAT traversal utilities from pre-existing mechanisms. To be fair though, there is this one bit that WebRTC helps improve: establishing a connection as quickly as possible with the help of Trickle ICE.
We will dive into the details of Trickle ICE in just a minute, first however, let’s have:
A lil’ bit of history
The way that many VoIP protocols work is actually quite simple: Whenever Arya wants to talk to her brother Bran (probably to inform him on the imminence of a particular season), she would send him a message containing her IP address. If Bran chooses to accept the call, he would respond with his own IP address. From then on the actual exchange of audio and video can happen directly between them. Peer-to-peer media of parties across different domains is a key aspect of the traditional “VoIP Trapezoid” and allows us to keep VoIP lightweight. It reduces the need for infrastructure.
The model breaks as soon as you introduce NATs into the picture. If Arya was to get a private, NATed IP address from Bran she wouldn’t be able to send him audio or video. It would be pretty much the same if one day you got a letter in your box that urgently asked for a reply but where the return address said: “The house with the yellow door”.
While NATs could maybe be ignored back in the mid ‘90s when protocols like SIP were born, this was no longer the case a few years later. It became evident that they were here to stay and that VoIP needed patching. And patch we did …
First, we tried to handle things in efficient ways so that media could continue flowing (almost) directly between Arya and Bran. That’s how we came up with STUN. STUN works pretty much the same way as whatismyip.com. It allows VoIP clients to ask a server what their public address is so that Arya can then include that in her call invitation to Bran.
While STUN works in a number of cases, there are a number of others where it doesn’t. Some NATs, you see, would allocate a different port number for every connection you made. This means that the port number you get from the STUN server would not be the one you need for your VoIP call and process would once again fail.
We didn’t give up. Many vendors tried using things like UPnP or the Port Control Protocol (PCP), where you would ask your gateway to allocate a port for you. That worked … when it worked and it didn’t when the above protocols were not supported by the gateway. Overall we came up with the same partial success we had with STUN.
Even IPv6 was in that situation: it allowed for direct connections when supported by the network but one couldn’t count on it to always be there.
The only reliable techniques were those that relied on relaying all media traffic through the server. These could be transparent solutions like Hosted NAT Traversal and Latching or client-controlled ones like TURN and Jingle Relay Nodes.
The situation becomes particularly problematic due to the fact that the SDP Offer/Answer process that SIP uses for media negotiation, is a one shot thing: you “offer” an address and it either passes or it fails. SDP Offer/Answer doesn’t allow you to, say, start with STUN realise it doesn’t work and then simply fall back to TURN.
This left everyone with the following choice: VoIP providers could either relay everything with the cost implications that came with this or insist on using unreliable solutions like STUN and just let some percentage of their users bite the bullet when that fails.
As a result NAT traversal was often a cause of great frustration and many people arrived at the conclusion that there was simply no way out of that situation.
Then we started having ICE
That’s when a new solution emerged: the concept of interactive connectivity establishment. The mechanism got standardized by the IETF as the ICE protocol and it was in use by applications like Skype and Google Talk even before that.
The idea is rather simple. It’s about acknowledging that none of the mechanisms above are perfect, so instead of insisting on our favorite, well we simply try them all. Before a client is to start a call, it would gather all possible addresses it can. Well, technically they are not only addresses but rather address:port/transport triplets, where the transport can be UDP or TCP. ICE calls these triplets “candidates” and so shall we.
The ICE RFC5245 only talks about using STUN and TURN but in practice nothing prevents clients from also leveraging on the other NAT traversal tricks mentioned above … and most clients do.
So, once all the candidates are gathered, they are ordered in a list based on priorities. Highest priorities are assigned to candidates with the least overhead: those that you get from the device itself, the IPv6 and IPv4 “host” candidates. Next come STUN candidates. The ICE specification calls them server reflexive because they do work a little bit like a mirror. This is also where you would place candidates obtained with UPnP and PCP.
Finally come the “relayed” candidates that clients obtain from TURN or Jingle Nodes servers. The prioritization happens in a way that would only result in use of relaying when no other route is available between two endpoints.
So is that it? Did we reach El Dorado? It sounds like with ICE we could have direct or almost direct connectivity when that’s at all possible and fall back to relaying only when that’s necessary.
Awesome! No more NAT problems … right?
Having ICE will slow you down
While ICE was percolating through the IETF, it saw a lot of criticism, a fair amount of which was directed at its complexity. The IETF’s response to this was:
“ICE is complex because we live in a complex world.”
With time more and more endpoints added ICE because of the efficiency promise it carried, but even when implementations were becoming stabler and stabler one specific problem persisted: setting up a connection through ICE could take a considerable amount of time … as in tens of seconds.
Every time a client is preparing to setup an ICE connection it needs to gather candidate addresses. In addition to addresses that appear on the local interfaces, this also includes acquiring STUN bindings, Jingle Nodes and TURN allocations or executing UPnP or PCP queries.
Once the above are completed, the client would take all the resulting candidates and send them to the remote party where the exact same process would start.
Not only that but many of these have to be executed through all the interfaces on the client endpoint. There are many ways for this process to go wrong and make you wish for a coffee while waiting for your call to connect:
- A VPN interface with no Internet connectivity. This would mean that VoIP clients will try to get in touch with STUN and TURN servers through it but never get a response. Their transactions would have to timeout before the client proceeds
- A wireless interface that appears “up” but is otherwise not connected (e.g. because it attached to a hotspot with a captive portal)
- A STUN/TURN server that went down for some reasons and is not even returning “port unreachable” ICMP errors.
In all of the above cases ICE would be peacefully dripping STUN requests while (as Chef would put it):
… you wait, and you wait, and you wait and you wait …
… and you wait, and you wait, and you wait …
The ICE RFC 5245 recommends that instead of waiting for the user to make a call, clients should start their candidate harvest as soon as they have a cue that a connection might be attempted soon. Picking up the receiver off the hook would be a good indication for this. This however, only applies to hardware phones as soft phones don’t always have a reliable indication that a call is pending. Also that kind of optimization only solves half the problem and the time of a complete harvest would still have to be endured when the remote party receives the call.
An interesting side effect of all this waiting is that we have started seeing weird things happen, like soft phones generating long, fake DTMF sequences while dialing, only to amuse the user while rushing through the stages of ICE.
So, this is where we get to:
The idea behind the protocol is that rather than waiting for a whole bucket of candidate addresses to fill, clients would start exchanging, or trickling the newly found addresses one by one, as they discover them.
In other words, a Trickle ICE client would gather local host candidates and immediately send them to the remote party. It would then discover a TURN/STUN address and send that, then potentially UPnP and send that, etc.
Making candidate harvesting a non-blocking process, has a significant impact on the speed of the entire process.
First of all, given how the initial batch of host candidates is not delayed, the call invitation can leave very early and both clients can start the discovery process almost simultaneously.
Not only that but they would also be able to immediately start verifying whether the candidates they have can work, all the while receiving new candidates and adding them to their checklists.
As soon as every client is done harvesting addresses, it would send an “end-of-candidates” signal to its peer so that it would know that this is all there is and if nothing works then it is free to declare failure.
As a result both clients perform discovery and connectivity checks simultaneously and it is possible for call establishment to happen in milliseconds. Also, possibly even more importantly, it is no longer possible for half-dead interfaces and inaccessible servers to delay the process.
The best part is that there is no downside. ICE isn’t less efficient or less reliable as a result. It is a pure improvement … in every way but one: it is not backward compatible!
Trickle ICE with SIP
Lack of backward compatibility, that is being able to have a vanilla ICE implementation successfully talk to a Trickle ICE, is not a problem in cases where support for trickling can be predicted. XMPP, for example, has always had advanced discovery mechanisms, that allow clients to know whether or not they should be using trickle before they get into the call.
WebRTC does not have discovery, but it doesn’t really need it in this case because it makes support for Trickle ICE mandatory. Any web app can hence bravely enable and use it most of the time.
Things get complicated for SIP because it has neither of the above: it has neither the reliable discovery mechanisms of XMPP, nor the mandatory support for trickling that WebRTC comes with. Every call initiating SIP INVITE needs to look in a way that can be handled by legacy endpoints. This includes legacy ICE implementations.
To circumvent the problem Trickle ICE comes with a special mode of operation called “Half Trickle”. It basically implies that the calling side starts by performing ICE regularly. It gathers all candidates and only then it sends its first INVITE request. If the callee supports Trickle ICE, the rest of the process is still optimized because it can send candidates incrementally speed up the discovery of a working one.
Once “Half Trickle” has been performed at the initiation of a call, subsequent SIP INVITEs (reINVITEs) can use “Full Trickle” and have all the benefits of Trickle ICE. Similarly, SIP clients that support Trickle ICE and that run in controlled environments can be configured to assume support for Trickle ICE in every callee and avoid use of “Half Trickle”.
|NAT Traversal Standards||Acronym||Specification|
|Session Traversal Utilities for NAT||STUN||IETF RFC 5389|
|Port Control Protocol||PCP||IETF RFC 6887|
|Hosted NAT Traversal||HNT||IETF Draft MMUSIC-Latching|
|Traversal Using Relays around NAT||TURN||IETF RFC 5766|
|Jingle Relay Nodes||XSF XEP-0278|
|Interactive Connectivity Establishment||ICE||IETF RFC 5245|
|Incremental Provisioning of Candidates for the Interactive Connectivity Establishment (ICE) Protocol||Trickle ICE||IETF DRAFT MMUSIC-Trickle-ICE|
Summary, or, if you only read one paragraph
Going back to where we started from: many of the technologies adopted by WebRTC, such as NAT traversal, media encryption, port multiplexing, and many others, have existed for a while now and have even gained considerable levels of adoption.
The popularity of WebRTC does bring these technologies to the front however, and many of them are now seeing various improvements. Trickle ICE is one such improvement to the ICE protocol. It does not make it more or less likely for you to establish direct sessions across the Internet, but it does make the process considerably faster and less noticeable to users.
Come to think of it, many could argue that this is just as important as the actual optimization …