So I talked about Skype and Viber at KrankyGeek two weeks ago. Watch the video on youtube or take a look at the slides. No “reports” or packet dumps to publish this time, mostly because it is very hard to draw conclusions from the results.
The VoIP services we have looked at so far which use the RTP protocol for transferring media. RTP uses a packet header which is not encrypted and contains a number of attributes such as the payload type (identifying the codec used), a synchronization source (which identifies the source of the stream), a sequence number and a timestamp. This allows routers to identify RTP packets and prioritize them. This also allows someone monitoring all network traffic (“Pervasive Monitoring“) to easily identify VoIP traffic. Or someone wiretapping your internet connection.
Skype and Viber encrypt all packets. Does that make them them less susceptible for this kind of attack?
Bear with me, the answer is going to be very technical. tl;dr:
- it is still pretty easy to determine that you are making a call.
- it is also pretty easy to tell if you muted your microphone.
- it is pretty apparent whether this is a videochat.
Skype
Not expecting to find much, I ran a standard set of scenarios with Skype of Android and iOS similar to those used in the Whatsapp analysis.
A first look did not show much. Luckily, when analyzing WhatsApp I had developed some tooling to deal with RTP. I modified those tools, removing the RTP parser, and was greeted with these graphs:
While the bitrate alone (blue is my ipad3 with a 172.16. ip address, black is my old Android phone) is not very interesting, the packet rate of exactly 50 packets was interesting. Also, the packet length distribution was similar to Opus. As I figured out later from the integrated debugging (on the Android device, this must be too technical for iOS users!), this was the Silk codec. In fact, if you account for some overhead the black distribution matches what we saw from WhatsApp earlier and what is now known to be Opus at 16khz or 8khz.
So the encryption did not change the traffic pattern. Nor does it hide the fact that a call is happening.
Keepalive Traffic
When muting the audio on one device, one can even see regular spikes in the traffic every then seconds. Supposedly, those are keepalive packets.
Similarly, when muting the microphone in the middle of the call, the drop in bandwidth is apparent.
Telling audio and video traffic apart
Let’s look at some video traffic. Note the two distinct distributions in the third graph? Let’s suppose that the left one is audio and everything else is video. This works well enough looking at the last graph which shows the ‘audio’ traffic in green and orange respectively.
The accuracy could possibly be improved a little by looking at the number of packets which is pretty much constant for audio.
In RTP, we would use the synchronization source (SSRC) field from the header to accomplish this. But that just makes things easier for routers.
Relay traffic
Last but not least relays. When testing this from Europe, I was surprised to see my traffic being routed through Redmond, Washington.
This is quite interesting in comparison to the first graph. The packet rate stays roughly the same, but the bitrate doubles to 100 kilobits/second. That is quite some overhead compared to the standard TURN protocol which has negligible overhead. The packet length distribution is shifted to the right and there are a couple of very large packets. Latency was probably higher but this is very hard to measure.
Viber
While I got some pretty interesting results from Skype, Viber turned out to harder. Thanks to the tooling it took now only a matter of seconds to discover that, like Whatsapp, it uses a relay server to help with call establishment:
Blue traffic is captured locally before it is sent to the peer, the black and green traffic is received from the remote end. The traffic shown in black almost vanishes after a couple of initial spikes (which contain very large packets at a low frequency). Visualizations of this kind are a lot easier to understand than the packet dumps captured with Wireshark.
And for the sake of completeness, muting audio on both sides showed keepalive traffic, visible as tiny period spikes in this graph:
Conclusions
VoIP security is hard. And this not really news, attacks on encrypted VoIP traffic have been known for quite a while, see e.g. this paper from 2008 and the more recent ‘Phonotactic Reconstruction’ attacks.
The fact that RTP does not encrypt the header data makes it slightly easier to identify, but it seems that a determined attacker could have come to the same conclusions about the encrypted traffic of services like Skype. Keep that in mind when talking about the security of your service. Also, keep the story of the ECB penguin in mind.
Or, as Emil Ivov said about the security of peer-to-peer: “Unless there is a cable going between your computer and the other guys computer and you can see the entire cable, then you’re probably in for a rude awakening”.
{“author”: “Philipp Hancke“}
Leave a Reply