Search Results for: Janus

Thanks to work initiated by Google Project Zero, fuzzing has become a popular topic within WebRTC since late last year. It was clear WebRTC was lacking in this area, but the community has shown its strength by giving the topic an immense amount of focus and resolving many issues. In a previous post, we showed how to break the Janus server’s RTCP parser. The Meetecho team behind Janus did not take that lightly and got to the bottom of what turned out to be quite a big project. In this post Alessandro Toppi of Meetecho walks us through how they fixed this problem and built an automated process to help make sure it doesn’t happen again. …

Pion seemingly came out of nowhere to become one of the biggest and most active WebRTC communities. Pion is a Go-based set of WebRTC projects. Golang is an interesting language, but it is not among the most popular programming languages out there, so what is so special about Pion? Why are there so many developers involved in this project? 

To learn more about this project and how it came to be among the most active WebRTC organizations, I interviewed its founder, Sean Dubois. We discuss Sean’s background and how he got started in RTC; see the interview for the details. I really wanted to understand why he decided to build a new WebRTC project and why he continues to spend so much of his free time on it. …

Fuzzing is a quality assurance and security testing technique that provides unexpected, often random data to a program input to try to break it. Natalie Silvanovich from Google’s Project Zero team has had quite some fun fuzzing various RTP implementations recently.

She found vulnerabilities in:

In a nutshell, she found a bunch of vulnerabilities just by throwing unexpected input at parsers. The range of applications that were vulnerable to this shows that the WebRTC/VoIP community does not yet have a process for doing this work itself. Meanwhile, the WebRTC folks at Google will have to improve their processes as well.

Let’s make it a New Year’s resolution to get better at this stuff in 2019, ok?

Natalie’s Project Zero WebRTC fuzzing was done using the video_replay tool in a mostly end-to-end way. That kind of testing is necessary, but we will start a bit lower-level, using libFuzzer, and walk you through an example of actual code taken from the Janus server: the janus_rtcp_get_remb function. This function is quite self-contained and turned out to contain a number of nasty issues, which makes it a great example.

Fuzzing REMB RTCP messages

Receiver Estimated Max Bitrate (REMB) is a kind of RTCP packet commonly used by WebRTC for coordinating send rates. The packet format is as follows, taken from the REMB draft:
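(The original figure is reproduced here from memory of draft-alvestrand-rmcat-remb; consult the draft itself for the authoritative version.)

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V=2|P| FMT=15  |   PT=206      |             length            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                  SSRC of packet sender                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                  SSRC of media source                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Unique identifier 'R' 'E' 'M' 'B'                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Num SSRC     | BR Exp    |  BR Mantissa                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   SSRC feedback                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ...                                                          |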

RTCP is an interesting target for fuzzing because it contains so much indirect control information. I was actually quite surprised the WebRTC code did not contain any issues with RTCP parsing. Then again, that RTCP parser is already being fuzzed.

Fuzzing Walkthrough

Setup

First, in order to fuzz a target in isolation we need to copy the function and some of its header dependencies into a separate file.

Then you need to set up the fuzzing tools and have clang installed on your machine. You can grab a copy of the fuzzit repository here and check out the individual versions later.

Next, we need to write a fuzz target as described in the libfuzzer tutorial. It is a simple function that looks like this:
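A minimal sketch of such a target, assuming the copied function keeps the janus_rtcp_get_remb(char *packet, int len) signature from Janus’s rtcp.c:

    #include <stdint.h>
    #include <stddef.h>

    /* the function copied out of Janus's rtcp.c */
    uint32_t janus_rtcp_get_remb(char *packet, int len);

    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        /* hand the fuzzer-generated bytes straight to the parser */
        janus_rtcp_get_remb((char *)data, (int)size);
        return 0;
    }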

This function will be called many times, each time with an array of bytes of a specified length. It is going to call our actual target, janus_rtcp_get_remb.

Check out the initial version and compile this with address sanitizer and fuzzer enabled:
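Assuming the fuzz target and the copied function live in a file called jrtcp.c (the file name here is an assumption), the compile step looks like this:

    clang -g -O1 -fsanitize=address,fuzzer jrtcp.c -o jrtcp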

Test the fuzzer

You should have a binary called jrtcp now. Run it with a maximum length of 1500, which is the maximum practical length of a UDP packet. For the sake of reproducibility we are providing a fixed random seed, 123456; without it the whole process is a bit more randomized and you might get different crashes.
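In terms of libFuzzer flags, that is something like:

    ./jrtcp -max_len=1500 -seed=123456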

If you need help on the usage, add -help=1.

Running the fuzzer will output something like this:

Examining issues

libfuzzer will provide you with an example of the input data as a file:

If you do a hexdump of that file you will see the actual packet:

So a single-byte input crashes the function. We can run the fuzz target with the crash sample as input to reproduce:
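Passing the sample file as the only argument makes libFuzzer execute just that one input; the file name below is a placeholder for whatever crash-… file libFuzzer wrote:

    ./jrtcp crash-<hash>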

Adjusting for the real RTCP header sizes

The problem is the cast to an RTCP header, which has a size of 4 bytes, while the input is only a single byte long. Arguably this is a bit of an artificial bug, since in production the input has to be at least two bytes long to make it through the DTLS/STUN/RTP demultiplexing process. The right place to enforce this kind of “external” behavior is to return early in the fuzz target.
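For reference, the 4-byte header in question looks roughly like this (paraphrased from Janus’s rtcp.h; the bit-field order depends on endianness):

    typedef struct janus_rtcp_header {
    #if __BYTE_ORDER == __BIG_ENDIAN
        uint16_t version:2;
        uint16_t padding:1;
        uint16_t rc:5;
        uint16_t type:8;
    #elif __BYTE_ORDER == __LITTLE_ENDIAN
        uint16_t rc:5;
        uint16_t padding:1;
        uint16_t version:2;
        uint16_t type:8;
    #endif
        uint16_t length:16;   /* packet length in 32-bit words, minus one */
    } janus_rtcp_header;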

Anyway… we can fix that by rejecting any input that is smaller than sizeof(janus_rtcp_header). Check out the next version or fix it yourself.
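A sketch of that early return in the fuzz target:

    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        if (size < sizeof(janus_rtcp_header))
            return 0;  /* too short to survive DTLS/STUN/RTP demultiplexing in production */
        janus_rtcp_get_remb((char *)data, (int)size);
        return 0;
    }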
Run the fuzzer again with the same input to verify the crash is fixed.

Compile again, run it again with no arguments. It still crashes…

More fixes

As you can see, the fuzzer is smart enough to have figured out that we reject all inputs that don’t pass the “RTCP version must be 2” check, so the first byte of its input now passes that.

This is a compound RTCP packet. Kind of… the first packet’s length field is 40 32-bit words which, per the RTCP rules (length in 32-bit words, minus one, including the header), puts the next packet at byte offset 164. The input is only 165 bytes long, leaving just one byte, so casting the next janus_rtcp_header reads out of bounds. Fixing this is pretty easy, or so it seems:
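A first attempt, sketched from the description above (variable names are assumed here, not the exact Janus code):

    /* inside janus_rtcp_get_remb: walk the compound packet */
    janus_rtcp_header *rtcp = (janus_rtcp_header *)packet;
    while (rtcp->version == 2) {
        int length = ntohs(rtcp->length);   /* 32-bit words, minus one */
        if ((length + 1) * 4 > len)         /* naive: bounds-check against the input size */
            break;
        /* ... look for the REMB identifier in the PSFB/AFB payload ... */
        rtcp = (janus_rtcp_header *)((uint32_t *)rtcp + length + 1);
    }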

Fixing this and repeating the above jrtcp call, the fuzzer runs through and everything seems fine.

Now, compile and run the fuzzer again without any argument. It still crashes… this time during the second iteration of the RTCP loop.
It turns out we need to be a bit more careful and actually take into account that our starting position is not zero.
Let’s keep track of this using an offset…
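Something like this (again a sketch with assumed names, not the exact Janus fix):

    int offset = 0;
    janus_rtcp_header *rtcp = (janus_rtcp_header *)packet;
    while (rtcp->version == 2) {
        int length = ntohs(rtcp->length);            /* 32-bit words, minus one */
        if (offset + (length + 1) * 4 > len)         /* would this packet overrun the buffer? */
            break;
        /* ... look for the REMB identifier in the PSFB/AFB payload ... */
        offset += (length + 1) * 4;                  /* advance past this packet */
        if (offset + (int)sizeof(janus_rtcp_header) > len)
            break;                                   /* no room left for another header */
        rtcp = (janus_rtcp_header *)(packet + offset);
    }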

See the GitHub repository for the next version. …

If you plan to have multiple participants in your WebRTC calls then you will probably end up using a Selective Forwarding Unit (SFU). Capacity planning for SFUs can be difficult – there are estimates to be made for where they should be placed, how much bandwidth they will consume, and what kind of servers you need.

To help network architects and WebRTC engineers make some of these decisions, webrtcHacks contributor Dr. Alex Gouaillard and his team at CoSMo Software put together a load test suite to measure load vs. video quality. They published their results for all of the major open source WebRTC SFUs. The suite is based on the Karoshi Interoperability Testing Engine (KITE), which Google funded and uses on webrtc.org to show interoperability status. The CoSMo team also developed a machine-learning-based video quality assessment framework optimized for real-time communications scenarios.

First an important word of caution – asking what kind of SFU is best is kind of like asking what car is best. If you only want speed then you should get a Formula 1 car, but that won’t help you take the kids to school. Vendors never get excited about these kinds of tests because they boil their functionality down to just a few performance metrics. These metrics may not have been a major part of their design criteria, and a lot of the time they just aren’t that important. For WebRTC SFUs in particular, just because you can load a lot of streams onto an SFU, there may be many resiliency, user behavior, and cost optimization reasons for not doing that. Load tests also don’t take a deep look at the end-to-end user experience, ease of development, or all the other functional elements that go into a successful service. Lastly, a published report like this represents a single point in time – these systems are always improving, so results might be better today.

That being said, I personally have had many cases where I wished I had this kind of data when building out cost models. Alex and his team have done a lot of thorough work here, and this is a great sign of maturity in the WebRTC open source ecosystem. I personally reached out to each of the SFU development teams mentioned here to ensure they were represented fairly. This test setup is certainly not perfect, but I do think it will be a useful reference for the community.

Please read on for Alex’s test setup and analysis summary.

{“editor”: “chad hart“}

Introduction

One recurring question on the discuss-webrtc mailing list is “What is the best SFU?”. This invariably produces a response of “Mine, obviously” from the various SFU vendors and teams. Obviously, they cannot all be right at the same time!

You can check the full thread here. Chad Hart, then with Dialogic, answered kindly, recognizing the problem and expressing a need:

In any case, I think we need a global (same applied to all) reproducible and unbiased (source code available, and every vendor can tune their installation if they want) benchmark, for several scalability metrics.

Three years later my team and I have built such a benchmark system. I will explain how this system works and show some of our initial results below.

The Problem

Several SFU vendors provide load testing tools. Janus has Jattack. Jitsi has jitsi-hammer and has even published some of its results. Jitsi in particular has done a great job with transparency, providing reliable data and enough information to reproduce the results. However, not all vendors have these tools and fewer still make them fully publicly available. In addition, each tool is designed to answer slightly different questions for its own environment, such as:

  • How many streams can a single server instance of chosen type and given bandwidth limit handle?
  • How many users can I support on the same instance?
  • How many users can I support in a single conference?
  • Etc.

There was just no way to make a real comparative study – one that is independent, reproducible, and unbiased. The inherent ambiguity also opened the door to some unsavory behavior from those who realized they could get away with any claim because no one could actually check it. We wanted to produce results that one does not have to take on faith and that could be peer-reviewed.

What use cases?

To have a good answer to “What is the best SFU?” you need to explain what you are planning to use it for.

We chose to work on the two use cases that seemed to gather the most attention, or at least those which were generating the most traffic on discuss-webrtc:

  1. Video conferencing – many-to-many, all equal, one participant speaking at a time (hopefully),
  2. Media streaming – one-to-many, unidirectional

Most video conferencing questions focus on a single server instance. Having 20+ people in a given conference is usually plenty for most. Studies like this one show that in most social cases the calls are 1-1 and the average size is around 3. This configuration fits very well on a single small instance from any public cloud provider (as long as you get a 1 Gbps NIC). You can then use very simple load balancing and horizontal scaling techniques since the ratio of senders to viewers is rarely high. Media streaming, on the other hand, typically involves streaming from a single source to thousands or tens of thousands of viewers. This requires a multi-server hierarchy.

We wanted to accommodate different testing scenarios and implement them in the same fashion across several WebRTC Servers so that the only difference is the system being tested, and the results are not biased.

For the purposes of this post I will focus on the video conferencing scenario. For those who are interested, we are finalizing our media streaming test results and plan to present them at Streaming Media West on November 14th.

The test suite

In collaboration with Google and many others, we developed KITE, a testing engine that allows us to support all kinds of clients – browsers and native, across mobile and desktop – and all kinds of test scenarios easily. It is used to test WebRTC implementations every day across browsers, as seen on webrtc.org.

Selecting a test client

Load testing is typically done with a single client to control for client impacts. Ideally you can run many instances of the test client in parallel in a single virtual machine (VM). Since this is WebRTC, it makes sense to use one of the browsers. Edge and Safari are limited to a single process, which does not make them very suitable. Additionally, Safari only runs on macOS and iOS, which only run on Apple hardware. It is relatively easy to spawn a million VMs on AWS if you’re running Windows or Linux. It’s quite a bit more difficult, and costly, to set up a million Macs, iPhones, or iPads for testing. (Note: I am still dreaming about this though.)

That leaves you with Chrome or Firefox, which both allow multiple instances just fine. It is our opinion that the webdriver implementation for Chrome is easier to manage, with fewer flags and plugins (i.e. H264) to handle, so we chose Chrome.

Systems Under Test

We tested the following SFUs:

To help make sure each SFU showed its best results, we contacted the teams behind each of these projects. We offered to let them set up the servers themselves or to connect to the servers and check their settings. We also shared the results so they could comment. This made sure each system was properly configured to perform optimally in our test.

Interestingly enough, during the life of this study we found quite a few bugs and worked with the teams to improve their solutions. This is discussed more in detail in the last section.

Test Setup

We used the following methodology to ramp traffic up to a high load. First we populated each video conference room with one user at a time until it reached 7 users total. We repeated this process, room by room, until the target of close to 500 simultaneous users was reached.

The diagram below shows the elements in the testbed:

Metrics

Most people interested in scalability questions will measure the CPU, RAM, and bandwidth footprints of the server as the “load” (streams, users, rooms…) ramps up. That is the traditional way of doing things, and it supposes that the quality of the streams, their bitrate, and so on all stay equal.

WebRTC’s encoding engine makes this much more complex. Because WebRTC includes bandwidth estimation, bitrate adaptation, and an overall congestion control mechanism, one cannot assume streams will remain unmodified across the experiment. In addition to the usual metrics, the tester also needs to record client-side metrics like sent bitrate, bandwidth estimation results, and latency. It is also important to keep an eye on the video quality, as it can degrade well before you saturate the CPU, RAM, and/or bandwidth of the server.

On the client side, we ended up measuring the following:

  • Rate of success and failures (frozen video, or no video)
  • Sender and receiver bitrates
  • Latency
  • Video quality (more on that in the next section)

Measuring different metrics on the client side can be as easy as polling the getStats API yourself or integrating a solution like callstats.io. On the server side, we measured:

  • CPU footprint,
  • RAM footprint,
  • ingress and egress bandwidth,
  • number of streams,
  • along with a few other less relevant metrics.

The metrics above were not published in the scientific article because of space limitations, but they should be released in a subsequent research report.

All of these metrics are simple to produce and measure, with the exception of video quality. What is an objective measure of video quality? Several proxies for video quality exist, such as Google’s rendering time, received frames, and bandwidth usage, but none of these gives an accurate measure.

Video quality metric

Ideally a video quality metric would make it visually obvious when impairments are present. This would allow one to measure the relative benefits of resilience techniques, such as Scalable Video Coding (SVC), where conceptually the output video has a looser correlation with jitter, packet loss, etc. than with other encoding methods. See the video from Agora below for a good example of a visual comparison:

https://www.youtube.com/watch?v=M71uov3OMfk

After doing some quick research on ways to automate this kind of visual quality measurement, we realized that nobody had developed a method to assess the video quality of a real-time stream, in the absence of reference media, as well as a human would. So we went on to develop our own metric leveraging machine learning with neural networks. This allows for real-time, on-the-fly video quality assessment. As an added benefit, it can be used without recording customer media, which is sometimes a legal or privacy issue.

The specifics of this mechanism are beyond the scope of this article, but you can read more about the video quality algorithm here. The specifics of this AI-based algorithm have been submitted for publication and will be made public as soon as it is accepted.

Show me the money results

We set up the following five open-source WebRTC SFUs, using the latest source code downloaded from their respective public GitHub repositories (except for Kurento/OpenVidu, for which the Docker container was used):

Each was set up in a separate but identical virtual machine with its default configuration.

Disclaimers

First a few disclaimers. All teams have seen and commented on the results for their SFUs.

The Kurento Media Server team is aware that their server currently crashes early, and we are working with them to address this. On Kurento/OpenVidu we tested a maximum of 140 streams (since it crashes so early).

In addition, there is a known bug in libnice which affected both Kurento/OpenVidu and Janus during our initial tests. After a libnice patch was applied as advised by the Janus team, their results improved significantly. However, the re-test with the patch on Kurento/OpenVidu actually proved even worse. Our conclusion was that there are other issues with Kurento. We are in contact with them and working on fixes, so the Kurento/OpenVidu results might improve soon.

The latest version of Jitsi Videobridge (up to the point of this publication) always became unstable at exactly 240 users. The Jitsi team is aware of this and working on the problem. They have however pointed out that their general advice is to rely on horizontal scaling with a larger number of smaller instances, as described here. Note that a previous version (from two months ago) did not have these stability issues but did not perform as well (more on this in the next section). We chose to keep version 0.1.1077 as its simulcast improvements made the results significantly better (up to 240 participants, that is). …

Slack is an über popular and fast-growing communications tool that has a ton of integrations with various WebRTC services. Slack acquired a WebRTC company a year ago and launched its own audio conferencing service earlier this year, which we analyzed here and here. Earlier this week they launched video. Does this work the same way? Are there any tricks we can learn from their implementation? Long-time WebRTC expert and webrtcHacks guest author Gustavo Garcia takes a deeper dive into Slack’s new video conferencing feature below to see what’s going on under the hood.

{“editor”: “chad hart“}

Early this year Slack added support for audio calls using WebRTC technology. Soon after that launch, Philipp Hancke wrote this blog post analyzing it. Yoshimasa Iwase followed soon after with even more detail.

This week Slack announced video support and generated some excitement in the WebRTC community again. Today some of us saw it enabled for the first time, so what was the first thing we did? We set up a meeting with our team to do sprint planning using this new feature, of course. To peek inside Slack’s WebRTC workings, we made some quick calls and looked at the SDPs and other stats available in Chrome’s awesome webrtc-internals.

No TCP or IPv6

The first thing you see in webrtc-internals is that they are still using TURN over UDP and disabling IPv6:

Ultimately Slack is aiming to be a corporate communication tool, and “enterprise” often means customers that block any “suspicious” UDP traffic. Given this, one would expect to see support for TURN over TCP and TLS, but surprisingly this isn’t the case. The same is true for IPv6 support – maybe there is a problem supporting it in Janus or in Slack’s signaling stack, but it is probably something easy to change in future versions.

Media Server Platform

The next thing to look at is the SDP coming from the server.

As expected, we see that Slack is still using the nice open source SFU called Janus from Meetecho (see Lorenzo from Meetecho talk about gateways here).

Simulcast

One of the interesting questions in multiparty WebRTC these days is how do you implement bandwidth adaptation for different participants? In the SFU world there is some agreement that simulcast is the right way to proceed. Simulcast has been available for some years, but standardization and support in WebRTC are not complete. As a result, there are many services where simulcast is still not used. It is good news that Slack is using it, and more and more people are starting to use it apart from Google Hangouts and TokBox.

Looking more closely at the Slack SDP (see below), you can see simulcast is being used by looking at the SIM group in the offer and x-google-flag:conference in the answer.
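For illustration only (these are made-up SSRC values, not Slack’s actual SDP), the simulcast group in the offer looks like this:

    a=ssrc-group:SIM 659497134 659497135 659497136

and the conference flag in the answer looks like this:

    a=x-google-flag:conference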

One of the benefits of enabling simulcast is that it automatically enables temporal scalability for further granularity. I did a quick check of the framerate received under different network conditions, and apparently Slack is not yet making use of this functionality. However, there has been interest in this feature from Slack employees on the WebRTC mailing lists, so we should review it more before drawing conclusions.

Another interesting thing to check is whether they are using multistream peer connections. In the case of Slack (and many other services/platforms), they use a new RTCPeerConnection for each sender and each receiver. This is slightly inefficient because of some overhead and extra establishment time. However, it is way easier to implement, so it is a very popular choice these days, particularly because there is no way to do multistream PeerConnection in a single, cross-browser way. …

Earlier this month Fippo published a post analyzing Slack’s new WebRTC implementation. He did not have direct access or a team account to do a thorough deep dive – not to mention he is supposed to be taking some time off this month. That left many open questions. Is there more to the TURN network? How does multi-party calling work? How exactly is Slack using the Janus gateway? Fortunately WebRTC has an awesomely active and capable community that quickly picked up the slack (pun intended).

Last week Yoshimasa Iwase published a great post giving Slack a deeper dive. Unfortunately Google Translate did not do the Japanese original justice, so I asked him if we could provide a translation with some added details here. Iwase-san has been deeply involved with WebRTC for several years at NTT Communications and as an editor at HTML5Experts.jp (which is another great resource you should check out). He has also helped organize WebRTC events in Japan, and if you’ve seen him speak you won’t question that he knows his stuff.

Check out his in depth analysis below.

{“editor”: “chad hart“}

Introduction

Inspired by Philipp Hancke’s (Fippo) Slack article, I started to analyze how Slack uses WebRTC more deeply. I really wanted to understand Slack’s WebRTC architecture to see if it could help inform other WebRTC engineers on how to build their own service or system. Slack has over 2 million active users at the time of writing, so seeing how they built their service should provide useful insight into how to build your own scalable WebRTC service.

Analysis method

As Philipp Hancke demonstrates in his Blackbox Exploration series of analyses, there are some useful tools for analyzing a WebRTC service. I chose the same tools:

  • Chrome webrtc-internals to see information like SDP and candidates and dump logs – Slack only supports Chrome so I couldn’t check about:webrtc in Firefox
  • JavaScript – their files are minified but we can check some functions like “RTCPeerConnection” anyway
  • Wireshark capture

Below is the result of my analysis.

Slack doesn’t use P2P

It’s common to use a P2P topology for 1-to-1 communication, since this optimizes the user experience and minimizes the number of servers you need to maintain. Despite this, Slack forces all streams through a TURN server. In addition to TURN, they use Janus as an endpoint.

Here is a picture of the topology they use:

 

Even if you use Slack to WebRTC chat with someone sitting next to you on the same LAN, the communication path will still follow the topology above. Their TURN servers are deployed on AWS. As far as I tested, there is no TURN server in the Tokyo region, and my WebRTC client (Chrome) connected to a TURN server in the Singapore region, which causes unnecessary latency. I’ll describe their TURN deployment in more detail next.

How do they force us to use TURN?

The answer is found in the SDP. Let’s check the SDP offer first and then the answer.

Offer from Janus

From *1, it is apparent that Slack uses Janus as a server-side WebRTC endpoint. Janus is a WebRTC gateway and can behave as an MCU, an SFU, and in other modes through its plugin architecture. (See Lorenzo Miniero’s post for more on WebRTC gateways.)

*2 shows “RTP/SAVPF” is still used. In fact Lorenzo changed this to “UDP/TLS/RTP/SAVPF” immediately after Fippo’s post, so once Slack updates their Janus we should see the SDP updated here.

From *3, Slack sends us some ICE candidates in vanilla ICE style, which is common in MCU or SFU usage since the candidates are already known to the MCU or SFU. The key point here is that the IP addresses are all private ones. That means we can’t connect to each other in a P2P topology.

How do we connect to these private IPs?

The answer is found in the ICE candidates received. If you check chrome://webrtc-internals you will find host/srflx/relay candidates as usual. (srflx should be absent when your machine has a global IP address.)

“host” and “srflx” are meaningless here because those addresses can’t reach *3’s IPs. “relay” – a.k.a. the TURN address – is the most important ICE candidate, and here is the concrete result:

Obviously you can find the global IP address, “52.77.208.161”. This is the address of a TURN server Slack deployed in their system. A reverse lookup of this global IP shows…

The TURN servers are deployed on AWS. Because an AWS EC2 instance has both a global IP and a private IP inside a VPC, the TURN server also has a private IP, and this private IP can connect to *3’s private IPs.

Slack’s TURN server

Speaking of the TURN server, Slack doesn’t use the actively maintained coturn project but its predecessor, the rfc5766-turn-server. This was found in the Wireshark dump:

The TURN server’s version is 3.2.3.96, which is actually an old version; the latest release is 3.2.5.9 according to GitHub.

In addition to the version, here is some more information about Slack’s TURN servers:

  • The URI of the TURN server is turn:slack-calls9.slack-core.com:22466, and it seems that there are about 23 TURN servers deployed around the world. You can check this by looking up slack-calls1.slack-core.com through slack-calls23.slack-core.com
  • I was able to find TURN servers deployed at least in Singapore, Ireland, and Northern California
  • The TURN authentication information (user name and password) is dynamically obtained from JavaScript

Why do they force users through a TURN server?

There might be some reasons:

To reduce call setup time, like FaceTime and WhatsApp do …

 

Dear Slack,

There has been quite some buzz this week about you and WebRTC.

WebRTC… kind of. Because actually you only do stuff in Chrome and your native apps:

I’ve been there. Launching stuff only for Chrome. That was in late 2012. In 2016, you need to have a very good excuse to launch something with WebRTC and not support Firefox like this:
 

Maybe you had your reasons. As usual, I tried to get a dump from chrome://webrtc-internals to see what is going on. Thanks to Dag-Inge Aas for providing one. The most interesting bit is the call to setRemoteDescription:

I would like to note that you reply to Chrome’s offer of UDP/TLS/RTP/SAVPF with a profile of RTP/SAVPF. While that is still tolerated by browsers, it is improper.
Your a=msid-semantic line looks very interesting: “WMS janus”. Sounds familiar – this is Meetecho’s Janus gateway (see Lorenzo’s post on gateways here), which by the way works fine with Firefox from what I hear.

Using Janus is somewhat surprising, too. I expected your own MCU thing. What happened?

Since you did not configure ICE-TCP and you are running on an odd port 12000 (on private IP addresses), you also configured a TURN server:

Now… this is only using TURN over UDP. On an odd port 22466 that is about as likely to be open as port 12000 is. It will not work in most corporate environments where UDP is blocked. For those, you need TURN/TCP or TURN/TLS running on port 443. And hopefully there is no proxy server involved that requires authentication.
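For comparison, a TURN configuration that can survive UDP-blocked enterprise networks would include entries along these lines (host and credentials here are illustrative, not Slack’s):

    {"iceServers": [{
        "urls": ["turn:turn.example.com:443?transport=tcp",
                 "turns:turn.example.com:443?transport=tcp"],
        "username": "user",
        "credential": "secret"
    }]}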

But Slack, do you really need to run a user-facing test to figure this out? This doesn’t look like the 21st century to me.

Best

Philipp
p.s.: Google Hangouts is moving away from sessions decrypted by a server in the middle. Maybe you should too?

{“author”: “Philipp Hancke“}

{“editor”: “Chad Hart“}

 

 
