In my last post (a long time ago) I introduced the issue of NATs and Firewalls, and the tools WebRTC uses to overcome them. First off, my apologies for the lengthy hiatus after promising to continue the discussion of NAT/Firewall traversal. Since that entry, I became a Dad for the 2nd time, and lets just say that so far it seems like 1+1 > 2, in terms of time and energy spent on daddy-duty! Thankfully, Chad has saved me with a brand new baby monitor solution using WebRTC
So, picking up where we left off…To better understand how exactly STUN, ICE, and TURN overcome the problem of NAT and Firewalls, lets take a closer look at a standard WebRTC use case: Bob calls Alice using WebRTC.
The following are the high level steps that most WebRTC applications would take to make this call work:
-
Bob indicates (through the web UI) he is ready to start a session with Alice
-
Bob’s web application interacts with the browser API to prepare a peerConnection (a bunch of potential steps here, but summarized for brevity)
-
Bob’s browser returns a blob of data representing the details of Bob’s session offer, including the layer 3 information about where it is prepared to receive session data
-
Bob’s web application signals (through any number of means) the offer to Alice’s web application
-
Alice’s web application notifies Alice of the incoming session request, and Alice indicates (through the web UI) she wants to accept the session
-
Alice’s web application prepares it’s peerConnection, and provides it Bob’s offer
-
Alice’s browser returns a blob of data representing the details of Alice’s session answer, including the layer 3 information about where it is prepared to receive session data
-
Alice’s web application signals the answer to Bob’s web application
-
Bob’s web application feeds the answer to the browser peerConnection
-
Bob and Alice’s browsers do magic, the browsers send session data to the IP address/port each one indicated, and a session is established!
As shown in the previous post on NAT/Firewalls, if this process goes all the way to step 10 with one or more NAT/Firewalls in the middle, session data may not be sent to the right place, or might be blocked. If one browser wants to tell another where it can be reached for a session, but doesn’t know what the topology in between is, what’s a browser to do? Well…how about ask someone in the middle, that has visibility to both sides! This is where STUN comes in.
As Bob’s browser prepares it’s peerConnection (step 2-3), it has no way of knowing or predicting what kind of topology it might face in creating it’s peerConnection to Alice’s browser. The browser knows it’s local interface IP address and Port it has ready for the session, but doesn’t know what IP:Port might exists in a NAT binding in the middle. So rather than signal one address:port to the other endpoint, the browser goes into a process called “ICE candidate gathering” to generate a list of potential IP:Port endpoints with which to establish a connection on. This process begins by sending out message to a server on the public internet, using the STUN protocol. This initial message, called a binding request, originates from the local IP:Port the browser already has allocated, and is the equivalent of the browser asking the server, “hey, what was the source IP and port that you see this request come from?” The server returns a success response with the IP:Port that it saw. If no changes were made to the packet in transit, than the response would match the local IP:Port that the browser already knows about. If a NAT device is in the middle changing the source address or port, the response would equal the public IP:Port bound by NAT device, to the browser. Depending on how the NAT device operates, it will potentially forward packets received at this public IP:Port, to the internal IP:Port of the browser. So, this IP:Port gets added to the list of candidates where it might be able to receive session data (clearly any IPs and Ports already allocated to the browser by the local system are already on the list).
Here is an example of this initial STUN request. In this case, you can see that indeed, a NAT device has changed the packet in transit, and the public NAT binding address is reported back in the response from the server.
To see STUN message details, click on a STUN packet->Session Traversal for NAT->Attributes
Note that a consequence of this simple STUN transaction, is that a public STUN server is a required piece of infrastructure needed for a WebRTC service to work optimally. Since this STUN transaction is fairly lightweight, the cost for this is not huge. There are a number of public STUN servers are available, operated by Google and others. As a web app asks the browser for a peerConnection, the web app developer is able to provision the peerConnection with STUN servers of choice. Example below:
Once the process of ICE candidate gathering is complete, it has list of IPs and Ports where could potentially receive session data. The browser adds this to the information to be signaled to the far end. In our example case, Bob’s web app will signal this to Alice’s web app (steps 4-6 above).
As Alice accepts the session, her browser now prepares it’s peerConnection by going through the same ICE candidate gathering process. It completes, and sends it’s list of candidates in the signaling answer back to Bob’s web app and browser. Now both browsers have multiple options to use to try to establish transport between themselves for session data.
Thanks to STUN, WebRTC can negotiate a list of potential options for transporting session data that take into account a knowledge of NAT devices in the topology, all without yet sending a single Real-Time Transport Packet (RTP) containing session data. STUN serves a dual purpose in this case, in that it also instantiates a binding in the NAT table of any NAT/Firewall devices in the path. This NAT pinhole can potentially be used for session data. While this is a great start, the browsers cannot just start sending session data. While a NAT situation may be discovered, what if there is still a Firewall blocking the packets? Which candidate will be the optimal one for transmission? Next the browser must establish which candidates work the best, and this is where WebRTC makes use of the ICE protocol. We’ll look at this next phase of WebRTC session setup, including TURN relay, in a subsequent post (I promise I won’t wait long).
{“author”, “reid“}
Barry Dingle says
Very well explained Reid. Thank you.
robert says
Thank you reid! It’s been great series of talks about ICE. Is there updated talk about TURN? I’m so long for it!
RAMESH SUNDARAM says
Thanks Reid. Awesome explanation. Waiting for more updates like this from you.
Reader says
Hey, can you please post the link to the next article. I couldn’t find it. Thanks
Chad Hart says
Unfortunately Reid never made that post and he is no longer involved in WebRTC. However, I do recommend reading Emil’s post on ICE always tastes better when it trickles! post as a follow-up to this piece as it provides a lot of good background on the NAT traversal problem and methods to solve it ending at how WebRTC does it.