27 comments on “Breaking Point: WebRTC SFU Load Testing” (Alex Gouaillard)

  1. Nice post with surprisingly low numbers of connections; Red5 Pro does at least double on regular hardware and on beefy servers we’re at almost 10x. Also you mention Wowza and Ant Media among others, without bias or marketing; not buying it, just because of the simple fact that you left us out and you know who we are.

  2. Dear Paul,

    Thank you for your comment.

    As explained in the “what use case” section, the results presented here are for the video conferencing use case, not the streaming use case, which will be presented separately at Live Streaming West in a month or so.

    For a video conferencing use case, 500 streams is a lot, and none of the previous studies went that far. For streaming, it is not that many. The main reason is that in the video conferencing use case, the number of streams going through the server is quadratic in the number of incoming streams: everyone receives everyone else’s stream from the server and sends one up, so the number of streams served by the server is n*(n-1). In streaming, if you have 500 viewers, you send 500 streams; in video conferencing, if you had 500 individuals in a single room, you would have around 250,000 streams. In their previous post, the Jitsi team showed that they can saturate a 1,000-stream server with only around 30 individuals (30*30 gets you close to 1,000). In our study, we limit the configurations to rooms of 7 individuals, so the load is only quadratic within each room. The formula giving the number of streams on the server for an audience of n is then:
    – number of full rooms: n/7
    – remaining individuals: n%7 (always fewer than 7)
    – each full room of 7 generates 7*6 = 42 streams
    – total load on the server: (n/7)*42 + (n%7)*((n%7)-1)
    simplified: roughly 6n.
    So in our case we have almost 3,000 streams on the server with 500 viewers in video conferencing mode with rooms of 7.
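    The arithmetic above can be sketched in a few lines of Python (a hypothetical helper for illustration, not part of the KITE test suite):

    ```python
    def sfu_stream_load(n_participants, room_size=7):
        """Streams relayed by the SFU when the audience is split into rooms.

        Each full room of k participants generates k*(k-1) streams
        (everyone receives everyone else's stream and sends one up);
        the remainder forms one smaller room.
        """
        full_rooms, remainder = divmod(n_participants, room_size)
        return full_rooms * room_size * (room_size - 1) + remainder * (remainder - 1)

    # 500 participants in rooms of 7: roughly 6n streams on the server
    print(sfu_stream_load(500))  # 2988, i.e. "almost 3,000"
    ```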

    As we indicated in our previous contact with your team, we would be happy to add Red5 to the result pool of the streaming use case.

    We usually request that each team set up their own server, to avoid mistakes on our side and bias in their results. We share results with the teams, and provide feedback and bug fixes. Most of the teams that have participated in this study got a better product out of it.

    Let me know what we can do to involve you.

    • Alex, I appreciate the response, and I will concede on one detail, which is conference (many-to-many) vs streaming (one-to-many). There is certainly a difference in overhead between the separation of pubs/subs and conference participants.

  3. Thx for the comparison @agouaillard, nice job!

    With the criteria being used, wouldn’t the simplest SFU (no SVC, no BWE, RTCP, NACK, VAD…) win because of the lower overhead of the processing being done?

    Could that explain why the most featured one (Jitsi?) has worse results?

    • Thanks Gustavo.

      The main goal of this study was to show that *comparative studies* are possible: a single test bed, all SFUs run under the same conditions, with the same use case, and so on and so forth. The goal was not to rank the SFUs per se, since the use case and the questions asked are by nature arbitrary. Similarly, whether Jitsi is the most feature-complete or not really depends on what you want to do with it. We wanted to provide an environment where people could ask their own questions and get answers, by themselves. Since KITE separates the test infrastructure, the grid management, the tests, and the reporting, we can now reuse all the settings of this study with a different test to ask a different question. Progress.

      I don’t think that Jitsi had worse results per se. The variations are big, and the differences are within the tolerance margin for the metrics we reported. Note that no server-side metrics like RAM and CPU were reported in the article.

      Putting several months of work into an 8-page article requires compromises. We had to remove from section VI (future work) our mention of simulcast and SVC. For this test, we switched simulcast OFF for all the SFUs that supported it, because we could not find a way to properly assess video quality with a mix of simulcast and non-simulcast streams. Also, our testing environment was not stressing the features that would allow evaluating simulcast or SVC well, opening the door to bias and misinterpretation.

      We are working with callstats.io, in the scope of the VERIFY project, to fully instrument the network layer, which will allow us to, well, verify that the values reported by getStats() are correct for callstats, but also to automatically assess ramping time, bandwidth estimation, simulcast and SVC layer-switching mechanisms, and so on and so forth.

      At that stage, we will be in an ideal situation to test simulcast and SVC implementation choices in different media servers. Think of the recent Jitsi vs Zoom comparison made by the Jitsi team, but fully automated and across all possible media servers / infrastructures. We’re almost done, and are trying really hard to generate results in time for the IETF meeting in Bangkok, Nov. 2 to 10.

      • I fully understand that the test you did is really hard to do; I really appreciate that effort and enjoyed reading this post.

        But at the same time, I think it is important to be very clear about what is measured and what is not. An SFU without SVC, BWE, and retransmissions is a very bad SFU for videoconferencing in real network conditions, yet in this specific (and useful) study it could get the highest score. Am I right?

        I think if we want to fully assess SFUs for videoconferencing use cases we should include these two aspects:
        1/ Environment: tests should be conducted in non-ideal network conditions. The audio quality is not the same across SFUs under packet loss, and the framerate is not the same across SFUs under constrained bandwidth. Under ideal network conditions, the one with just the best sockets&threads implementation could win.
        2/ Metrics: we should include audio metrics (buffer sizes, PLC occurrences…) and video metrics (buffering, framerate, lipsync). Maybe callstats.io or rtcstats could help here, like you say, although I don’t think it is easy to generate scores aggregating all those aspects (I tried in the past). Maybe the testRTC guys can help with it.
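        As an aside on why aggregating those aspects into a single score is hard, here is a deliberately naive sketch in Python; the weights and thresholds are invented purely for illustration and are not taken from any real scoring system:

        ```python
        def quality_score(packet_loss_pct, fps, target_fps=30.0):
            """Naive per-stream score in [0, 100]: penalize packet loss
            linearly (10% loss zeroes the score) and framerate shortfall
            proportionally. Real models (MOS-style) are far more nuanced,
            which is exactly why single-number aggregation is contentious."""
            loss_factor = max(0.0, 1.0 - packet_loss_pct / 10.0)
            fps_factor = min(1.0, fps / target_fps)
            return round(100.0 * loss_factor * fps_factor, 1)

        print(quality_score(0.0, 30.0))  # 100.0: ideal network
        print(quality_score(5.0, 15.0))  # 25.0: half the loss budget, half the framerate
        ```

        A multiplicative combination like this already makes an arbitrary choice (loss and framerate penalties compound), which illustrates the point that any such score bakes in debatable assumptions.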

        Regarding “whether Jitsi is the most feature-complete or not really depends on what you want to do with it”: I would at least mention explicitly in the report that there are very relevant features that are present in some SFUs and not in others. For example, some SFUs don’t have active speaker detection, and that can be much more critical than supporting 100 users more or less in a box.

        Looking forward to all the future work you plan to do. Great job!

  4. Pingback: Breaking Point: WebRTC SFU Load Testing (Part 1) | WebRTC中文网 – the most authoritative RTC real-time communication platform

  5. Pingback: WebRTC Janus – CPU intensive – Johann Savalle

  6. We designed the study as follows:
    – we wanted more than 3 SFUs to make the comparison valuable,
    – we wanted very active SFUs whose teams would be able to fix bugs if we found any,
    – we wanted SFUs with peer-reviewed published benchmarks and evaluations,
    – we wanted SFUs with their own testing tools in place, so we could compare against the individual SFUs’ results.

    Medooze, Jitsi, Kurento and Janus all met those targets. In terms of their own testing tools, Medooze was already using KITE, Jitsi had jitsi-hammer and a great benchmarking post, Kurento had the Kurento Testing Framework, which had evolved into the ElasTest European Union project with a lot of published results, and Janus had Jattack.

    When presenting early results at CommCon UK in April 2018, the main maintainer of mediasoup volunteered to participate, so we added the results for mediasoup in a second version of both the paper and the blog post, before the camera-ready versions. Other teams we reached out to either refused to participate or did not reply in time for publication. We have not reached out to the Licode team.

    While we have not added Licode at this stage, we want to keep the results updated on the CoSMo website, and if the Licode team wants to participate, or any team with a media server that supports WebRTC for that matter, they are more than welcome; just contact us through cosmosoftware.io.

  7. Pingback: Happy new year 2019 – so many good things #WebRTC to come from CoSMo. | WebRTC by Dr Alex

  8. Pingback: Breaking Point: WebRTC SFU Load Testing (Part 1) – WebRTC中文网 – the most authoritative RTC real-time communication platform

  9. Pingback: WebRTC 1.0 Simulcast vs ABR | WebRTC by Dr Alex

  10. Dear Alex,
    Thanks for your contribution on WebRTC SFU load testing.
    “After a libnice patch was applied as advised by the Janus team, their results significantly improved.”
    But what is the libnice patch mentioned in the article, and how did it improve the results?

    • Sorry for the late reply. There was a libnice patch, provided by RingCentral engineers, to address the fact that there was a single lock for all incoming packets, artificially creating a bottleneck. AFAIK this has been merged both in libnice upstream and in Janus as we speak. The results shown in our paper already integrate this optimisation.
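      To illustrate the kind of contention the patch removed, here is a toy Python model (the real fix is in libnice’s C code; this just mirrors the idea of one lock per connection instead of a single global lock, with hypothetical names throughout):

      ```python
      import threading
      from collections import defaultdict

      class PacketRouter:
          """Toy packet handler with one lock per connection, so traffic on
          different connections no longer serializes behind a global lock."""
          def __init__(self):
              self._locks = defaultdict(threading.Lock)  # conn_id -> lock
              self.packets_seen = defaultdict(int)

          def handle_packet(self, conn_id):
              # Only packets on the same connection contend for this lock;
              # other connections proceed in parallel.
              with self._locks[conn_id]:
                  self.packets_seen[conn_id] += 1

      router = PacketRouter()

      def pump(conn_id, n_packets=1000):
          for _ in range(n_packets):
              router.handle_packet(conn_id)

      # Four independent connections, each fed by its own thread.
      threads = [threading.Thread(target=pump, args=(c,)) for c in range(4)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()
      print(sorted(router.packets_seen.items()))  # [(0, 1000), (1, 1000), (2, 1000), (3, 1000)]
      ```

      With a single shared lock instead, every `handle_packet` call would contend with every other, which is exactly the bottleneck the reply describes.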

    • I’m not sure what you mean by “at end”. At the time of testing, Kurento Media Server crashed at 150 streams, and that was pretty much it. At the time of writing this answer, the team has addressed most of the problems and written a dedicated blog post. We have not run the test with their new version. Here is the link to the blog post for further reading:

      • Thank you. We use Kurento for our streaming app on a very small test server: 2 vCPU, 2 GB RAM, $15 on DigitalOcean. There are 3 RTP video streams in H.264 at 500 kbit/s each (1.5 Mbit/s total), and it eats 54% of this machine. Is that ok?
        I plan to have a source of 300 channels at 500 kbit/s each, and on the other side around 1,000 web client recipients (WebRTC), each receiving one of the channels (like TV).

        Tasks: 107 total, 1 running, 64 sleeping, 0 stopped, 0 zombie
        %Cpu0 : 42.5 us, 0.0 sy, 0.0 ni, 57.2 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
        %Cpu1 : 17.1 us, 0.3 sy, 0.0 ni, 82.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
        KiB Mem : 2041272 total, 162712 free, 810852 used, 1067708 buff/cache
        KiB Swap: 0 total, 0 free, 0 used. 1061232 avail Mem

        10724 kurento 20 0 1856628 385952 18716 S 54.6 18.9 5649:36 kurento-media-s
        29766 kurento 20 0 3166748 325372 28380 S 0.3 15.9 15:45.15 java
        30758 root 20 0 44544 3988 3368 R 0.3 0.2 0:00.34 top
        1 root 20 0 159992 8676 6136 S 0.0 0.4 0:30.35 systemd
        2 root 20 0 0 0 0 S 0.0 0.0 0:00.05 kthreadd
        4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H
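        As a rough way to reason about the sizing question above, here is a naive linear extrapolation in Python; real CPU usage rarely scales linearly with stream count, and this ignores the extra fan-out load from the 1,000 WebRTC viewers, so treat it as a ballpark only:

        ```python
        import math

        def cores_needed(cpu_pct_observed, streams_observed, streams_planned):
            """Linearly extrapolate observed per-core CPU% to a planned load
            and round up to whole cores. A deliberately naive estimate."""
            pct_per_stream = cpu_pct_observed / streams_observed
            return math.ceil(pct_per_stream * streams_planned / 100.0)

        # Observed above: kurento-media-server at ~54.6% of one core for 3 ingest streams.
        # Planned: 300 ingest channels (viewer fan-out would add much more on top).
        print(cores_needed(54.6, 3, 300))  # 55 cores, before any viewer load
        ```

        In other words, a 2-vCPU droplet is nowhere near the planned load, and measuring under realistic viewer fan-out is essential before trusting any such number.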

  11. Am I reading it wrong, or are the load test numbers of Jitsi and Janus swapped in the original paper?

                           Jitsi  Janus  Medooze  OpenVidu
    Number of rooms           70     35       70        20
    Number of client VMs     490    245      490       140

    The paper says Jitsi crashed at 245 users.

    • This is correct: the revision of Jitsi we tested was crashing at exactly 245 users. It was not the use case for which Jitsi had been designed, and it was later fixed anyway. Thanks for catching this.

  12. Pingback: Turning Point: WebRTC SFU Load Testing (Part 1) | WebRTC中文网 – the most authoritative RTC real-time communication platform

  13. Pingback: Turning Point: WebRTC SFU Load Testing (Part 2) | WebRTC中文网 – the most authoritative RTC real-time communication platform

  14. I can’t see any graphs or numbers for CPU and memory footprint.
    Are there any benchmarks that show how much CPU and RAM is required for the various SFUs at different numbers of participants?
    (The full paper and summary slides from the IIT Real-Time Communications Conference are not available anymore.)
