So the New York Times uses WebRTC to gather your local IP addresses… Tsahi describes the non-technical parts of the issue in his blog. Let’s look at the technical details… it turns out that the Javascript code used is very clunky and inefficient.
First thing to do is to check chrome://webrtc-internals (my favorite tool since the hangouts analysis). And indeed, nytimes.com is using the RTCPeerConnection API. We can see a peerconnection created with the RtpDataChannels argument set to true and using stun:ph.tagsrvcs.com as a STUN server.
Also, we see that a data channel is created, followed by calls to createOffer and setLocalDescription. That pattern is pretty common to gather IP addresses.
Using Chrome’s devtools search feature it is straightforward to find out that the RTCPeerConnection is created in the following Javascript file:
http://s.tagsrvcs.com/2/4.10.0/loaded.js
Since it’s minified, here is the de-minified snippet that gathers the IPs:

    Mt = function() {
        function e() {
            this.addrsFound = {
                "0.0.0.0": 1
            }
        }
        return e.prototype.grepSDP = function(e, t) {
            var n = this;
            if (e) {
                var o = [];
                e.split("\r\n").forEach(function(e) {
                    if (0 == e.indexOf("a=candidate") || 0 == e.indexOf("candidate:")) {
                        var t = e.split(" "),
                            i = t[4],
                            r = t[7];
                        ("host" === r || "srflx" === r) && (n.addrsFound[i] || (o.push(i), n.addrsFound[i] = 1))
                    } else if (0 == e.indexOf("c=")) {
                        var t = e.split(" "),
                            i = t[2];
                        n.addrsFound[i] || (o.push(i), n.addrsFound[i] = 1)
                    }
                }), o.length > 0 && t.queue(new y("webRTC", o))
            }
        }, e.prototype.run = function(e) {
            var t = this;
            if (c.wrip) {
                var n = window.RTCPeerConnection || window.webkitRTCPeerConnection || window.mozRTCPeerConnection;
                if (n) {
                    var o = {
                            optional: [{
                                RtpDataChannels: !0
                            }]
                        },
                        i = [];
                    -1 == w.baseDomain.indexOf("update.") && i.push({
                        url: "stun:ph." + w.baseDomain
                    });
                    var r = new n({
                        iceServers: i
                    }, o);
                    r.onicecandidate = function(n) {
                        n.candidate && t.grepSDP(n.candidate.candidate, e)
                    }, r.createDataChannel(""), r.createOffer(function(e) {
                        r.setLocalDescription(e, function() {}, function() {})
                    }, function() {});
                    var a = 0,
                        s = setInterval(function() {
                            null != r.localDescription && t.grepSDP(r.localDescription.sdp, e), ++a > 15 && (clearInterval(s), r.close())
                        }, 200)
                }
            }
        }, e
    }(),

Let’s look at the run function first. It creates a peerconnection with the optional RtpDataChannels constraint set to true. There is no reason for that: it just makes Chrome create unnecessary candidates with an RTCP component, and Firefox ignores it.
As mentioned earlier, stun:ph.tagsrvcs.com is used as the STUN server. From Wireshark dumps it’s pretty easy to figure out that this is running the [coturn stun/turn server](https://code.google.com/p/coturn/); the SOFTWARE field in the binding response is set to Coturn-4.4.2.k3 ‘Ardee West’.
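For readers who would rather not eyeball the Wireshark dump, the SOFTWARE value can also be pulled out of the raw binding response in a few lines. This is only a sketch: the function name `stunSoftware` is made up for illustration, and it assumes the UDP payload has already been exported into a `Uint8Array`. The layout it relies on (20-byte header, magic cookie 0x2112A442, type-length-value attributes padded to 4 bytes, SOFTWARE as attribute type 0x8022) comes from the STUN specification, RFC 5389.

```javascript
// Sketch: read the SOFTWARE attribute (type 0x8022) from a raw STUN message.
// `bytes` is assumed to be a Uint8Array holding one UDP payload.
function stunSoftware(bytes) {
  var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (bytes.byteLength < 20 || view.getUint32(4) !== 0x2112a442) {
    return null; // too short, or the magic cookie is missing
  }
  // The header's length field counts attribute bytes only.
  var end = Math.min(20 + view.getUint16(2), bytes.byteLength);
  var offset = 20; // attributes start right after the 20-byte header
  while (offset + 4 <= end) {
    var type = view.getUint16(offset);
    var length = view.getUint16(offset + 2);
    if (offset + 4 + length > end) {
      return null; // truncated attribute
    }
    if (type === 0x8022) {
      var value = bytes.subarray(offset + 4, offset + 4 + length);
      return new TextDecoder("utf-8").decode(value);
    }
    // Attribute values are padded to a multiple of 4 bytes.
    offset += 4 + Math.ceil(length / 4) * 4;
  }
  return null;
}
```

The function returns `null` for anything that does not look like a STUN message or carries no SOFTWARE attribute, so it can be run over every packet in a capture.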
The code hooks up the onicecandidate callback and inspects every candidate it gets. Then, a data channel is created and createOffer and setLocalDescription are called to start the candidate gathering process.
Additionally, in the following snippet

    var a = 0,
        s = setInterval(function() {
            null != r.localDescription && t.grepSDP(r.localDescription.sdp, e), ++a > 15 && (clearInterval(s), r.close())
        }, 200)

the localDescription is searched for candidates every 200ms for a little over three seconds (16 ticks). That polling is unnecessary: once candidate gathering is done, onicecandidate is called with a null candidate, so there is no need to poll.
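To make the point concrete, here is a minimal sketch of the same gathering step without the timer. It is not the code from loaded.js. The name `gatherCandidates` is hypothetical, the constructor is passed in as a parameter so the function can be tried outside a browser, and it assumes the promise-based forms of createOffer and setLocalDescription that current browsers provide (the original code uses the older callback forms).

```javascript
// Sketch: collect ICE candidates and stop on the null candidate, no polling.
// PeerConnectionCtor is injected; in a page it would be window.RTCPeerConnection.
function gatherCandidates(PeerConnectionCtor, iceServers) {
  return new Promise(function (resolve, reject) {
    var pc = new PeerConnectionCtor({ iceServers: iceServers || [] });
    var found = [];

    pc.onicecandidate = function (event) {
      if (event.candidate) {
        found.push(event.candidate.candidate);
      } else {
        // A null candidate marks the end of gathering.
        pc.close();
        resolve(found);
      }
    };

    // A data channel gives the offer something to negotiate.
    pc.createDataChannel("");
    pc.createOffer()
      .then(function (offer) { return pc.setLocalDescription(offer); })
      .catch(reject);
  });
}
```

The promise resolves exactly once, when the browser reports that gathering has finished, and the peerconnection is closed at the same moment instead of after a fixed number of ticks.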
Let’s look at the grepSDP function. It is called in two contexts: once in the onicecandidate callback with a single candidate, and once with the complete SDP.
It splits the SDP or candidate into individual lines and then parses each line, extracting the candidate type at index 7 and the IP address at index 4.
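The index arithmetic is easier to follow with the fields named. The following is a sketch of the parsing step only, with a hypothetical helper name `parseCandidate`; the field order (foundation, component, transport, priority, address, port, the literal `typ`, then the type) is the candidate-attribute layout from the ICE specification.

```javascript
// Sketch: pull the address and type out of a single ICE candidate line.
// Layout: candidate:<foundation> <component> <transport> <priority>
//         <address> <port> typ <type> ...
function parseCandidate(line) {
  // Accept both "a=candidate:..." (from SDP) and "candidate:..." (from onicecandidate).
  var text = line.indexOf("a=") === 0 ? line.slice(2) : line;
  if (text.indexOf("candidate:") !== 0) {
    return null;
  }
  var fields = text.trim().split(" ");
  if (fields.length < 8 || fields[6] !== "typ") {
    return null;
  }
  return {
    component: Number(fields[1]),
    address: fields[4], // index 4, as in grepSDP
    port: Number(fields[5]),
    type: fields[7]     // index 7, as in grepSDP
  };
}
```

For a line such as `candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154` (addresses made up for illustration) this yields the address `203.0.113.7` and the type `srflx`.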
Since without a relay server one will never get anything but host or srflx candidates, the check for those two types is unnecessary. The rest of that line does eliminate duplicates, however.
Oddly, the code also looks for an IP in the c= line which is completely unnecessary as this line will not contain new information. Also, looking for the candidate lines in the localDescription.sdp will not yield any new information as any candidate found in there will also be signalled in the onicecandidate callback (unless someone is using a 12+ months old version of Firefox).
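Putting those observations together, the bookkeeping could shrink to something like the sketch below. It is an illustration and not what loaded.js does: `createAddressCollector` is a hypothetical name, it is meant to be fed only from onicecandidate, it skips c= lines and the type check, and it uses a `Set` in place of the addrsFound object.

```javascript
// Sketch: track unique candidate addresses; c= lines are never looked at.
function createAddressCollector() {
  // Pre-seed the placeholder address, as the original addrsFound object does.
  var seen = new Set(["0.0.0.0"]);
  return {
    // Returns the address if it has not been seen before, otherwise null.
    add: function (candidateLine) {
      var fields = candidateLine.split(" ");
      if (fields.length < 8 || fields[6] !== "typ") {
        return null; // not a candidate line
      }
      var address = fields[4];
      if (seen.has(address)) {
        return null;
      }
      seen.add(address);
      return address;
    },
    // All addresses collected so far, without the placeholder.
    addresses: function () {
      return Array.from(seen).filter(function (a) { return a !== "0.0.0.0"; });
    }
  };
}
```

Each call to `add` reports an address at most once, which is the only part of grepSDP that does real work.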
Since the JS is minified it is rather hard to trace what actually happens with those IPs.
If you’re going to hack people, at least do it cleanly!
Author: Philipp Hancke
Ben Klang says
I got curious about the hostname for that STUN server, ph.tagsrvcs.com. The WHOIS information shows it registered to a “White Ops, Inc.” in New York City. A quick search pulls up their home page. Their primary service? Digital advertising fraud mitigation. Seems like an interesting use of WebRTC, to fight ad fraud.
Philipp Hancke says
yeah, see the twitter discussion linked in tsahi’s post
Rena says
I would have guessed it’s for paywalls.
Jim says
How do I delete WebRTC from my browser? WebRTC sounds very dangerous. I do not want it in my web browser. I use Firefox.
– Jim
Dan Kaminsky says
Hi, I’m the security engineer responsible for this code. As co-founder of White Ops (http://whiteops.com) we’re doing something about the massive number of machines getting broken into to commit ad fraud. We ran an enormous study last year (http://whiteops.com/botfraud) and found 2/3rds of the global fraud wasn’t coming from server farms or Amazon; it was coming from home users.
Basically, they hack people’s machines so they can appear to view large numbers of advertisements. Of course, while they’re there…yeah.
This particular code found patterns that certain bot deployments had in common, using code that’s in multiple browsers by design, but in response to some concern we shut it down. As a rule we avoid personal information (partially because that’s the right thing to do, partially because it doesn’t help; the bots have all your cookies). We’re looking for various patterns in the bots themselves.
If we can stop people from getting paid for botting, we can make the Internet safer. Apologies if this concerned anyone.
Dillon says
Dan,
While I appreciate your response, I think that it’s disingenuous for you to say that you’re “doing something about the massive number of machines getting broken into” – your motivation is to detect ad fraud and not really end-user security. I’m going out on a limb here to assume that you are squarely focused on helping publishers avoid paying for automated clicks, and that if you framed it a little more truthfully, people would be less inclined to allow the sort of shenanigans you’re doing in their browsers.
As a decade-long user of AdBlock I simply don’t care about ad fraud, and CERTAINLY not enough to think it’s okay for you to try to enum my network. If you want to actually “do something about the massive number of machines getting broken into”, why not become a security researcher instead of designing code that no end-user really wants in their browser?
John says
Dan – you should jump on reddit:
https://www.reddit.com/r/netsec/comments/3dgwee/how_the_new_york_times_uses_webrtc_to_gather/
Chad Hart says
Note Google has modified Chrome and released an optional extension to limit IP leakage for those who want to do that: https://groups.google.com/forum/#!topic/discuss-webrtc/bMOsMFx7PFc
Victor Pascual says
WebRTC IP Address Handling Recommendations: https://tools.ietf.org/html/draft-shieh-rtcweb-ip-handling-00