12 comments on “Does your video call have End-to-End Encryption? Probably not.”

  1. Ah, jeez. Another reactionary article. It would have been more original if published before this Zoom fiasco. Abysmal.

    • webrtcHacks has been publishing details on WebRTC and other RTC services since 2013, including our Blackbox series that attempts to reverse engineer how many popular RTC services work. We are happy a broader audience is taking some of the details of RTC technology more seriously and encourage readers to look through our archives.

    • We’re not making recommendations 🙂

      Jitsi uses a well-known and well-understood SFU architecture that has been covered here many times from various angles. It lacks E2EE because the server terminates encryption, but Emil Ivov is well aware of that. In theory, you can always run your own server and audit the code. It’s a matter of delegating trust.

      Signal has a very competent lead (Moxie) and recently added a very experienced WebRTC expert to the team. While I haven’t met Moxie, I trust he knows his domain.

  2. In the Zoom blog post you link to in your April 2nd update, they claim “in a meeting where all of the participants are using Zoom clients, and the meeting is not being recorded, we … do not decrypt it at any point before it reaches the receiving clients”. But people are saying that they do decrypt streams in order to scale their quality to each client’s available bandwidth. See, for example, https://twitter.com/maxhawkins/status/1248139006887342080 . This would explain how Zoom clients cope with more incoming video streams than, say, Jitsi Meet clients.

    So which is true? Was Zoom still spreading falsehoods when they clarified their non-use of end-to-end encryption? Or have they found some way to downscale video without decrypting it? Or are large Zoom video meetings only successful when everyone’s computers and connections are capable of handling many incoming high-definition video streams simultaneously?

    • There are two architectural approaches:
      – mesh, where every client sends to every other client (this works for up to four clients, maybe up to eight with a lot of tuning), and
      – SFU, where a server redistributes the streams. This typically involves simulcast (https://webrtchacks.com/sfu-simulcast/). SFUs terminate DTLS encryption (which is hop-by-hop) and so far have had access to the media. There are ways to avoid this by encrypting the media with a key that is not known to the SFU, which is what we did in https://jitsi.org/blog/e2ee/ after getting access to a new Chrome API.

      Zoom could do the same, since they fully control both client and server. What they do in reality…
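To make the mesh/SFU difference concrete, here is a rough count of video streams per client. It assumes every participant publishes one camera stream and, in the SFU case, uploads three simulcast layers; these are illustrative assumptions, not measurements of any particular service, and `streams_per_client` is a hypothetical helper name.

```python
def streams_per_client(n_participants: int, topology: str, simulcast_layers: int = 3) -> tuple[int, int]:
    """Return (streams sent, streams received) for one client (hypothetical helper).

    Assumes every participant publishes exactly one camera stream.
    """
    others = n_participants - 1
    if topology == "mesh":
        # Each client encodes and uploads a separate copy for every peer.
        return others, others
    if topology == "sfu":
        # Each client uploads its simulcast layers once; the SFU fans them out
        # and forwards one selected layer per remote participant.
        return simulcast_layers, others
    raise ValueError(f"unknown topology: {topology!r}")


for n in (4, 8, 25):
    print(n, "mesh:", streams_per_client(n, "mesh"), "sfu:", streams_per_client(n, "sfu"))
```

In the mesh case the upload grows linearly with meeting size, while with an SFU it stays constant (and the lower simulcast layers are much cheaper than full copies), which is why mesh tops out at a handful of participants.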

      • Thanks for your reply.

        As for what Zoom actually does, they claim they can “Bring HD video and audio to your meetings with support for up to 1000 video participants and 49 videos on screen”, which makes me wonder how many laptops on wifi connections would be able to handle putting 49 incoming HD video streams on screen in real time.

        So if that Zoom claim isn’t itself misleading (and from what I’ve heard, Zoom meetings actually do cope with many participants, though I haven’t seen it in action myself), then that makes it seem much more likely that (at least in large meetings) they are in fact decrypting the incoming video streams to process them in some way before sending them to the clients.

        The remaining possibility is that they might have some way of downscaling the video streams without decrypting them. Despite what I wrote in a comment awaiting moderation on Matthew Green’s blog ( https://blog.cryptographyengineering.com/2020/04/03/does-zoom-use-end-to-end-encryption/?unapproved=4240&moderation-hash=f55e0727d50a730e93eb40022e208eec#comment-4240 ), I no longer think it would require efficient fully homomorphic encryption to achieve this; you might be able to do it with a carefully designed compression scheme (think of something like FLIF) that would allow the SFU to simply drop specific parts of an encrypted video stream (for low-bandwidth recipients) without preventing it from being decoded into a lower-quality video stream.
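Dropping parts of an encrypted stream without decrypting it is roughly what layered (scalable) video coding allows, provided the layer information stays readable by the server. A minimal sketch, assuming each packet carries a hypothetical cleartext `layer` index next to an opaque encrypted payload; the `Packet` model and `thin_stream` are illustrative names, and the forwarder never looks inside `payload`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical packet model: the layer index is readable by the server,
    # the payload is end-to-end encrypted bytes the server cannot decode.
    layer: int
    payload: bytes


def thin_stream(packets: list[Packet], max_layer: int) -> list[Packet]:
    """Forward only packets whose layer index is <= max_layer.

    Assumes layer 0 is a base layer that decodes on its own and each higher
    layer only adds detail, so dropping the top layers still leaves a
    decodable, lower-quality stream.
    """
    return [p for p in packets if p.layer <= max_layer]


stream = [Packet(0, b"\x8f\x01"), Packet(1, b"\x22\x9c"), Packet(2, b"\x7e\x10"), Packet(0, b"\x03\x44")]
low_bandwidth = thin_stream(stream, max_layer=0)
```

The hard part is designing the encoding so that the base layer really is decodable alone; the forwarding logic itself stays trivial.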

        Also, it seems to me that the question of whether Zoom decrypts video streams to downscale them in large meetings could be answered by some packet inspection of the kind Citizen Lab did for their April 3 report on Zoom’s encryption: Someone could check whether the vast majority of the encrypted data received by a client in a large meeting is identical to encrypted data sent by other clients.
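The comparison proposed above could be scored with a simple hash lookup. This sketch assumes the encrypted payloads have already been extracted from packet captures on the sending clients and on one receiving client; that extraction step, and Zoom's actual packet framing, are not shown and would need to be worked out. `fraction_forwarded_verbatim` is a hypothetical helper name.

```python
import hashlib


def fraction_forwarded_verbatim(sent_payloads: list[bytes], received_payloads: list[bytes]) -> float:
    """Fraction of received bytes that exactly match some payload a sender emitted.

    Hypothetical helper. A value close to 1.0 suggests the server forwards the
    ciphertext unchanged; a low value suggests it changes the bytes (re-encrypting
    or transcoding), which requires decrypting them first.
    """
    sent_digests = {hashlib.sha256(p).digest() for p in sent_payloads}
    total = sum(len(p) for p in received_payloads)
    if total == 0:
        return 0.0
    matched = sum(len(p) for p in received_payloads if hashlib.sha256(p).digest() in sent_digests)
    return matched / total
```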

        As for Jitsi Meet, I think I’d find it easier to persuade my friends to use my Jitsi Meet server, rather than Zoom, if it did do downscaling at the server, thus making it less resource-intensive for clients. Having said that, I get excited by the prospect of end-to-end-encrypted group chats, too, and I recognize that there are limited resources available for Jitsi Meet development.

        • Well, the trick here is that since you can’t display 49 streams at 720p (usually 1280×720 pixels) on a normal screen anyway, the clients never receive that many streams at that quality.
          With simulcast, each participant (unless muted) sends three resolutions to the SFU: 720p, 640×360 and 320×180 (Zoom might use more or fewer, or different ratios).
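The pixel arithmetic behind this, assuming a 1920×1080 display (an assumption; many laptop screens are smaller) and a 7×7 grid for 49 tiles:

```python
SCREEN_W, SCREEN_H = 1920, 1080  # assumed 1080p display
HD_720P = 1280 * 720             # pixels in one 720p frame

tiles = 49
pixels_if_all_720p = tiles * HD_720P
screen_pixels = SCREEN_W * SCREEN_H
ratio = pixels_if_all_720p / screen_pixels

# In a 7x7 grid each tile gets roughly this many pixels.
tile_w, tile_h = SCREEN_W // 7, SCREEN_H // 7

print(f"{pixels_if_all_720p:,} px needed vs {screen_pixels:,} px on screen ({ratio:.1f}x)")
# prints: 45,158,400 px needed vs 2,073,600 px on screen (21.8x)
print(f"each tile is about {tile_w}x{tile_h}")
# prints: each tile is about 274x154
```

Each tile ends up smaller than even the 320×180 layer, so sending 720p to it would waste almost all of the bandwidth.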

          For each other participant, the client knows the size at which that participant’s video is displayed and can ask the SFU to forward only the resolution matching that size.

          What you receive therefore depends a lot on the layout. For a “large presenter with thumbnails” layout you’ll get one 720p stream plus a couple of 320×180 streams; for the “Brady Bunch” layout you might get 25 streams at 640×360. https://jitsi.org/blog/new-feature-brady-bunch-style-layout/ explains this a bit.
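The receiver-side choice described above can be sketched as picking the smallest simulcast layer that still covers the on-screen tile. The three layers are the ones listed above; the selection rule, the tile sizes, and the names `pick_layer` and `received_pixels` are illustrative assumptions, not Jitsi’s or Zoom’s actual logic.

```python
# Simulcast layers as (width, height), smallest first.
LAYERS = [(320, 180), (640, 360), (1280, 720)]


def pick_layer(tile_width: int, tile_height: int) -> tuple[int, int]:
    """Return the smallest layer at least as large as the displayed tile (hypothetical rule).

    Falls back to the largest layer if the tile is bigger than every layer.
    """
    for width, height in LAYERS:
        if width >= tile_width and height >= tile_height:
            return width, height
    return LAYERS[-1]


def received_pixels(tiles: list[tuple[int, int]]) -> int:
    """Total pixels per frame the client receives for a given layout."""
    return sum(w * h for w, h in (pick_layer(tw, th) for tw, th in tiles))


# "Large presenter with thumbnails": one big tile plus three small ones (assumed sizes).
presenter = [(1280, 720)] + [(240, 135)] * 3
# "Brady Bunch": a 5x5 grid on an assumed 1920x1080 screen, i.e. 384x216 tiles.
grid = [(1920 // 5, 1080 // 5)] * 25
```

With these assumed sizes, each grid tile maps to the 640×360 layer, matching the “25 streams at 640×360” estimate, and the presenter layout needs only one 720p stream.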

          There are further optimizations, such as dropping the frame rate for thumbnails. The same goes for speaker detection (easy when people explicitly mute, but also possible with just audio levels; the Jitsi folks have an adaptation of an algorithm that works with just the levels).
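Speaker detection from audio levels alone can be approximated with a moving average per participant plus some hysteresis. This is a much simpler heuristic than the algorithm the Jitsi folks adapted. It assumes levels arrive in the RFC 6464 style (0 to 127 in -dBov, so lower means louder); the class name, window size and switching margin are arbitrary illustrative choices.

```python
from collections import defaultdict, deque
from typing import Optional


class DominantSpeaker:
    """Pick the loudest participant from recent audio levels (hypothetical, simple heuristic).

    Levels use the RFC 6464 convention: 0 is loudest (0 dBov), 127 is silence.
    """

    def __init__(self, window: int = 10, margin: float = 6.0):
        self.margin = margin  # dB a challenger must exceed the current speaker by
        self.levels = defaultdict(lambda: deque(maxlen=window))
        self.current: Optional[str] = None

    def _loudness(self, participant: str) -> float:
        # Flip -dBov into a "higher is louder" score and average over the window.
        history = self.levels[participant]
        return sum(127 - level for level in history) / len(history)

    def update(self, participant: str, level: int) -> str:
        """Record one audio-level sample and return the current dominant speaker."""
        self.levels[participant].append(level)
        loudest = max(self.levels, key=self._loudness)
        if self.current is None:
            self.current = loudest
        elif self._loudness(loudest) > self._loudness(self.current) + self.margin:
            # Hysteresis: switch only when the challenger is clearly louder,
            # so a brief noise does not make the layout flicker.
            self.current = loudest
        return self.current
```

The margin trades responsiveness for stability: a larger value keeps the current speaker on the big tile longer.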

          SFUs are designed to make these routing decisions largely based on the RTP header, needing at most a single byte of the payload. See https://jitsi.org/blog/e2ee/ for a nice demo. The way we designed it there even left the SFU unaware that the content is encrypted with a second layer.
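To illustrate why an SFU can route media it cannot decrypt, here is a minimal parser for the 12-byte fixed RTP header defined in RFC 3550, plus a forwarding check keyed on the SSRC (in WebRTC simulcast each layer is typically sent with its own SSRC). It works on the RTP packet as the SFU sees it after removing the hop-by-hop SRTP layer; `should_forward` and the `wanted_ssrcs` set, which stands in for whatever layers the receiver requested, are hypothetical. A real SFU may also peek at the first payload byte, for example to spot keyframes before switching layers; that is not shown.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def should_forward(packet: bytes, wanted_ssrcs: set[int]) -> bool:
    """Hypothetical routing rule: decide from the header alone, never reading the payload."""
    return parse_rtp_header(packet).ssrc in wanted_ssrcs
```

Because the payload is passed through untouched, it can just as well carry a second, end-to-end encryption layer.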

          • Interesting. I hadn’t thought of getting the clients to send multiple streams. I guess I’ve been influenced by my experience of Jitsi Meet maxing out my decade-old laptop’s CPU in any meetings with more than a handful of participants, even in low bandwidth mode (so that when I’m speaking, the other participants report that it sounds as if my microphone is intermittently cutting out) — my inclination was to try to get the client to do as little as possible.

            But the way Jitsi Meet does it is probably suitable for most people’s machines (at least in richer places), even if it does require the client to do a little more work than downscaling at the server would.
