One evening last week, I was nerd-sniped by a question Max Ogden asked:
That is quite an interesting question. I somewhat dislike using Session Description Protocol (SDP) in the signaling protocol anyway and prefer nice JSON objects for the API and ugly XML blobs on the wire to the ugly SDP blobs used by the WebRTC API.
The question is really about the minimum amount of information that needs to be exchanged for a WebRTC connection to succeed.
WebRTC uses ICE and DTLS to establish a secure connection between peers. This mandates two constraints:
- Both sides of the connection need to send stuff to each other
- You need at minimum to exchange ice-ufrag, ice-pwd, DTLS fingerprints and candidate information
Now the stock SDP that WebRTC uses (explained here) is a rather big blob of text, more than 1500 characters for an audio-video offer not even considering the ICE candidates yet.
Do we really need all this? It turns out that you can establish a P2P connection with just a little more than 100 characters sent in each direction. The minimal-webrtc repository shows you how to do that. I had to use quite a number of tricks to make this work; it’s a real hack.
How I did it
Get some SDP
First, we want to establish a datachannel connection. Once we have this, we can potentially use it to negotiate a second audio/video peerconnection without being constrained in the size of the offer or the answer. Also, the SDP for the data channel is a lot smaller to start with since there is no codec negotiation. Here is how to get that SDP:
    var pc = new webkitRTCPeerConnection(null);
    var dc = pc.createDataChannel('webrtchacks');
    pc.createOffer(
        function (offer) {
            pc.setLocalDescription(offer);
            console.log(offer.sdp);
        },
        function (err) {
            console.error(err);
        }
    );
The resulting SDP is slightly more than 400 bytes. We also need some candidates included, so we wait for the end-of-candidates event:
    pc.onicecandidate = function (event) {
        if (!event.candidate) console.log(pc.localDescription.sdp);
    };
The result is even longer:
    v=0
    o=- 4596489990601351948 2 IN IP4 127.0.0.1
    s=-
    t=0 0
    a=msid-semantic: WMS
    m=application 47299 DTLS/SCTP 5000
    c=IN IP4 192.168.20.129
    a=candidate:1966762134 1 udp 2122260223 192.168.20.129 47299 typ host generation 0
    a=candidate:211962667 1 udp 2122194687 10.0.3.1 40864 typ host generation 0
    a=candidate:1002017894 1 tcp 1518280447 192.168.20.129 0 typ host tcptype active generation 0
    a=candidate:1109506011 1 tcp 1518214911 10.0.3.1 0 typ host tcptype active generation 0
    a=ice-ufrag:1/MvHwjAyVf27aLu
    a=ice-pwd:3dBU7cFOBl120v33cynDvN1E
    a=ice-options:google-ice
    a=fingerprint:sha-256 75:74:5A:A6:A4:E5:52:F4:A7:67:4C:01:C7:EE:91:3F:21:3D:A2:E3:53:7B:6F:30:86:F2:30:AA:65:FB:04:24
    a=setup:actpass
    a=mid:data
    a=sctpmap:5000 webrtc-datachannel 1024
Only take what you need
We are only interested in a few bits of information here:
- the ice-ufrag: 1/MvHwjAyVf27aLu
- the ice-pwd: 3dBU7cFOBl120v33cynDvN1E
- the sha-256 DTLS fingerprint: 75:74:5A:A6:A4:E5:52:F4:A7:67:4C:01:C7:EE:91:3F:21:3D:A2:E3:53:7B:6F:30:86:F2:30:AA:65:FB:04:24
- the ICE candidates
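As a rough sketch of that selection step, the four kinds of lines can be filtered out of the SDP by their prefixes. The helper name `extractEssentials` and the shape of the returned object are made up for illustration; the sample SDP is a shortened copy of the one above.

```javascript
// Hypothetical helper: keep only the SDP lines we actually need.
// Not part of the WebRTC API; the returned object shape is an assumption.
function extractEssentials(sdp) {
  var result = { ufrag: null, pwd: null, fingerprint: null, candidates: [] };
  sdp.split(/\r?\n/).forEach(function (line) {
    if (line.indexOf('a=ice-ufrag:') === 0) {
      result.ufrag = line.slice(12);
    } else if (line.indexOf('a=ice-pwd:') === 0) {
      result.pwd = line.slice(10);
    } else if (line.indexOf('a=fingerprint:sha-256 ') === 0) {
      result.fingerprint = line.slice(22); // colon-separated hex
    } else if (line.indexOf('a=candidate:') === 0) {
      result.candidates.push(line.slice(12));
    }
  });
  return result;
}

// Shortened version of the SDP shown above.
var sampleSdp = [
  'v=0',
  'm=application 47299 DTLS/SCTP 5000',
  'a=candidate:1966762134 1 udp 2122260223 192.168.20.129 47299 typ host generation 0',
  'a=ice-ufrag:1/MvHwjAyVf27aLu',
  'a=ice-pwd:3dBU7cFOBl120v33cynDvN1E',
  'a=fingerprint:sha-256 75:74:5A:A6:A4:E5:52:F4:A7:67:4C:01:C7:EE:91:3F:21:3D:A2:E3:53:7B:6F:30:86:F2:30:AA:65:FB:04:24',
  'a=setup:actpass'
].join('\r\n');

var essentials = extractEssentials(sampleSdp);
console.log(essentials.ufrag, essentials.pwd, essentials.candidates.length);
```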
The ice-ufrag is 16 characters due to randomness security requirements from RFC 5245. While it is possible to reduce that, it’s probably not worth the effort. The same applies to the 24 characters of the ice-pwd. Both are random so there is not much to gain from compressing them even.
The DTLS fingerprint is a representation of the 256 bits (32 bytes) of the sha-256 hash. Its length can easily be reduced from 95 characters to an almost optimal (assuming we want to be binary-safe) 44 characters:
    var line = "a=fingerprint:sha-256 75:74:5A:A6:A4:E5:52:F4:A7:67:4C:01:C7:EE:91:3F:21:3D:A2:E3:53:7B:6F:30:86:F2:30:AA:65:FB:04:24";
    var hex = line.substr(22).split(':').map(function (h) {
        return parseInt(h, 16);
    });
    console.log(btoa(String.fromCharCode.apply(String, hex)));
    // yields dXRapqTlUvSnZ0wBx+6RPyE9ouNTe28whvIwqmX7BCQ=
So we’re at 84 characters now (16 + 24 + 44). We can hardcode everything else in the application.
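The receiving side has to undo the compression before it can rebuild an `a=fingerprint` line. A minimal sketch of that reverse step, assuming `atob` is available (browsers and recent Node versions); `expandFingerprint` is a name invented here:

```javascript
// Hypothetical helper: turn the 44-character base64 form back into
// the colon-separated hex form that SDP expects.
function expandFingerprint(b64) {
  var bin = atob(b64); // one character per byte
  var hex = [];
  for (var i = 0; i < bin.length; i++) {
    var h = bin.charCodeAt(i).toString(16).toUpperCase();
    hex.push(h.length === 1 ? '0' + h : h); // zero-pad to two digits
  }
  return 'a=fingerprint:sha-256 ' + hex.join(':');
}

var restored = expandFingerprint('dXRapqTlUvSnZ0wBx+6RPyE9ouNTe28whvIwqmX7BCQ=');
console.log(restored);
```

The zero-padding matters: a byte such as 0x04 would otherwise come out as a single hex digit and produce a fingerprint line the peer cannot match.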
Dealing with candidates
Let’s look at the candidates. Wait, we got only host candidates. This is not going to work unless people are on the same network. STUN does not help much either since it only works in approximately 80% of all cases.
So we need candidates that were gathered from a TURN server. This can be achieved by setting the iceTransportPolicy to ‘relay’ which will not even gather host and srflx candidates.
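A relay-only configuration might look like the sketch below. The TURN URL, username and credential are placeholders, and `relayOnlyConfig` is just a convenience wrapper invented for this example; `iceServers` and `iceTransportPolicy` are the standard configuration fields.

```javascript
// Hypothetical wrapper that builds a relay-only RTCPeerConnection config.
// All three arguments are placeholders for your own TURN deployment.
function relayOnlyConfig(turnUrl, username, credential) {
  return {
    iceServers: [
      { urls: turnUrl, username: username, credential: credential }
    ],
    // 'relay' restricts gathering to TURN candidates only.
    iceTransportPolicy: 'relay'
  };
}

var config = relayOnlyConfig('turn:turn.example.org:3478', 'user', 'secret');
console.log(JSON.stringify(config));

// In a browser this would then be passed to the constructor:
// var pc = new RTCPeerConnection(config);
```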
If you use the minimal-webrtc demo you need to use your own TURN credentials; the ones in the repository will no longer work since they’re using the time-based credential scheme. On my machine, two candidates were gathered:
    a=candidate:1211076970 1 udp 41885439 104.130.198.83 47751 typ relay
      raddr 0.0.0.0 rport 0 generation 0
    a=candidate:1211076970 1 udp 41819903 104.130.198.83 38132 typ relay
      raddr 0.0.0.0 rport 0 generation 0
I believe this is a bug in Chrome, which gathers a relay candidate for an interface which is not routable, so I filed an issue.
Let’s look at the first candidate using the grammar defined in RFC 5245:
- the foundation is 1211076970
- the component is 1. This is another reason for using the datachannel: there are no RTCP candidates
- the transport is UDP
- the priority is 41885439
- the IP address is 104.130.198.83 (the IP of the TURN server I used)
- the port is 47751
- the typ is relay
- the raddr and rport are set to 0.0.0.0 and 0 respectively in order to avoid information leaks when iceTransportPolicy is set to relay
- the generation is 0. This is a Jingle extension of vanilla ICE that allows detecting ICE restarts
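The breakdown above can be reproduced with a small parser that splits the candidate line on spaces. This is only a sketch: `parseCandidate` and its field names are invented here, and it assumes a well-formed line with the fixed fields first and name/value extension pairs after the type.

```javascript
// Hypothetical parser for a single a=candidate line.
function parseCandidate(line) {
  var parts = line.replace(/^a=candidate:/, '').split(' ');
  var cand = {
    foundation: parts[0],
    component: parseInt(parts[1], 10),
    transport: parts[2].toLowerCase(),
    priority: parseInt(parts[3], 10),
    ip: parts[4],
    port: parseInt(parts[5], 10),
    type: parts[7] // parts[6] is the literal "typ"
  };
  // Everything after the type comes as name/value pairs
  // (raddr, rport, generation, ...).
  for (var i = 8; i + 1 < parts.length; i += 2) {
    cand[parts[i]] = parts[i + 1];
  }
  return cand;
}

var first = parseCandidate(
  'a=candidate:1211076970 1 udp 41885439 104.130.198.83 47751 ' +
  'typ relay raddr 0.0.0.0 rport 0 generation 0'
);
console.log(first.ip, first.port, first.type);
```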
If we were to simply append both candidates to the 84 bytes we already have we would end up with 290 bytes. But we don’t need most of the information in there.
The most interesting information is the IP and port. For IPv4, that is 32 bits for the IP and 16 bits for the port. Writing both as base-32 numbers (toString(32) in JavaScript) yields 7 + 4 characters per candidate. Actually, if both candidates share the same IP, we can skip encoding it again, reducing the size.
After consulting RFC 5245 it turned out that the foundation and priority can actually be skipped, even though that requires some effort. And everything else can be easily hard-coded in the application.
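Here is one way the IP and port encoding could look. It is a sketch that assumes base-32 numbers via `Number.prototype.toString(32)`, which happens to reproduce the `1k85hij,1ek7,157k` tail of the sample string in this post. The function names are invented, and the scheme assumes all candidates share one IPv4 address.

```javascript
// Convert dotted IPv4 to an unsigned 32-bit number. Multiplication is
// used instead of << so that addresses >= 128.0.0.0 stay positive.
function ipToInt(ip) {
  return ip.split('.').reduce(function (acc, octet) {
    return acc * 256 + parseInt(octet, 10);
  }, 0);
}

function intToIp(n) {
  return [n >>> 24, (n >>> 16) & 255, (n >>> 8) & 255, n & 255].join('.');
}

// Hypothetical encoder: the IP once, then one field per port.
function encodeRelay(ip, ports) {
  return [ipToInt(ip)].concat(ports).map(function (n) {
    return n.toString(32);
  }).join(',');
}

function decodeRelay(str) {
  var nums = str.split(',').map(function (s) {
    return parseInt(s, 32);
  });
  return { ip: intToIp(nums[0]), ports: nums.slice(1) };
}

var encoded = encodeRelay('104.130.198.83', [47751, 38132]);
console.log(encoded); // 1k85hij,1ek7,157k
var decoded = decodeRelay(encoded);
console.log(decoded.ip, decoded.ports);
```

On the receiving side, the foundation, priority, component, transport and type would have to be filled in with hard-coded values when the candidate line is rebuilt.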
sdp.length = 106
Let’s summarize what we have so far:
- the ice-ufrag: 16 characters
- the ice-pwd: 24 characters
- the sha-256 DTLS fingerprint: 44 characters
- the IP and port: 11 characters for the first candidate, 4 characters for subsequent candidates from the same IP.
Now we also want to encode whether this is an offer or an answer. Let’s use uppercase O and A respectively. Next, we concatenate this and separate the fields with a ‘,’ character. While that is less efficient than a binary encoding or one that relies on fixed field lengths, it is flexible. The result is a string like:
    O,1/MvHwjAyVf27aLu,3dBU7cFOBl120v33cynDvN1E,dXRapqTlUvSnZ0wBx+6RPyE9ouNTe28whvIwqmX7BCQ=,1k85hij,1ek7,157k
106 characters! So that is tweetable. Yay!
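Putting the pieces together and taking them apart again could be sketched as follows. `pack` and `unpack` are names invented for this example and the field order simply mirrors the string above. The comma is a safe separator because ice-ufrag and ice-pwd only use letters, digits, `+` and `/`, and base64 adds only `=`.

```javascript
// Hypothetical packer: type marker, ufrag, pwd, base64 fingerprint,
// then the already-encoded candidate fields.
function pack(type, ufrag, pwd, fingerprintB64, candidateFields) {
  return [type === 'offer' ? 'O' : 'A', ufrag, pwd, fingerprintB64]
    .concat(candidateFields)
    .join(',');
}

// Reverse step: split on ',' and name the fields again.
function unpack(str) {
  var f = str.split(',');
  return {
    type: f[0] === 'O' ? 'offer' : 'answer',
    ufrag: f[1],
    pwd: f[2],
    fingerprint: f[3],
    candidates: f.slice(4)
  };
}

var packed = pack(
  'offer',
  '1/MvHwjAyVf27aLu',
  '3dBU7cFOBl120v33cynDvN1E',
  'dXRapqTlUvSnZ0wBx+6RPyE9ouNTe28whvIwqmX7BCQ=',
  ['1k85hij', '1ek7', '157k']
);
console.log(packed.length); // 106
var fields = unpack(packed);
console.log(fields.type, fields.candidates);
```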
You better be fast
Now, if you try this it turns out it does not usually work unless you are fast enough pasting stuff.
ICE is short for Interactive Connectivity Establishment. If you are not fast enough in transferring the answer and starting ICE at the Offerer, it will fail. You have less than 30 seconds between creating the answer at the Answerer and setting it at the Offerer. That’s pretty tough for humans doing copy-paste. And it will not work via Twitter.
What happens is that the Answerer is trying to perform connectivity checks as explained in RFC 5245. But those never reach the Offerer since we are using a TURN server. The TURN server does not allow traffic from the Answerer to be relayed to the Offerer before the Offerer creates a TURN permission for the candidate, which it can only do once the Offerer receives the answer. Even if we could ignore permissions, the Offerer cannot form the STUN username without the Answerer’s ice-ufrag and ice-pwd. And if the Offerer does not reply to the connectivity checks by the Answerer, the Answerer will conclude that ICE has failed.
So what was the point of this?
Now… it is pretty hard to come up with a use-case for this. It fits into an SMS. But sending your peer a URL where you both connect using a third-party signaling server is a lot more viable most of the time. Especially given that to achieve this, I had to make some tough design decisions like forcing a TURN server and taking some shortcuts with the ICE candidates which are not really safe. Also, this cannot use trickle ICE.
¯\_(ツ)_/¯
So is this just a case study in arcane signaling protocols? Probably. But hey, I can now use IRC as a signaling protocol for WebRTC. IRC has a limit of 512 characters so one can include more candidates and information even. CTCP WEBRTC anyone?
{“author”: “Philipp Hancke“}
Torrey Searle says
When I first saw the title I thought it said “smallest valid SDP”, was thus a bit disappointed when I saw the answer 😛
Look forward to the first “Twitter DM as Signaling” webrtc solution! 🙂
Wessel Wessels says
This is very cool. I made a fork that uses only STUN. Haven’t tested it yet on anything but local network. http://wesselwessels.github.io/minisdp