Guide

There is a cool new feature everyone has been trying to implement – background transparency. Virtual backgrounds have been around for a while. Rather than inserting a new background behind user(s), transparency removes the background altogether, allowing the app to place users over a screen share or together in a shared environment. There doesn’t seem to be a universal name for this feature. Zoom calls it Immersive View. Microsoft calls it Together Mode. RingCentral calls them overlays. Virtual green screens or newscaster mode are other names. …

Local Jitsi recording hack with getDisplayMedia audio capture and MediaRecorder

I wanted to add local recording to my own Jitsi Meet instance. The feature wasn’t built in the way I wanted, so I set out on a hack to build something simple. That led me down the road to discovering that:

  1. getDisplayMedia for screen capture has many quirks,
  2. MediaRecorder for media recording has some of its own unexpected limitations, and
  3. Adding your own HTML/JavaScript to Jitsi Meet is pretty simple

Read on for plenty of details and some reference code. My result is located in this repo.
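To make the moving pieces concrete, here is a minimal sketch of the overall approach, assuming nothing beyond the two standard APIs: capture the screen with getDisplayMedia (with audio where the browser supports it) and record the resulting stream locally with MediaRecorder. This is illustrative only, not the code from the repo; mimeType support and audio capture behavior vary by browser.

```javascript
// Minimal sketch: screen capture plus local recording.
async function startLocalRecording() {
  // Audio capture via getDisplayMedia only works for some sources
  // (e.g. Chrome tab audio); other combinations return video-only.
  const stream = await navigator.mediaDevices.getDisplayMedia({
    video: true,
    audio: true,
  });

  const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);
  recorder.onstop = () => {
    const blob = new Blob(chunks, { type: 'video/webm' });
    const url = URL.createObjectURL(blob); // e.g. wire this to a download link
    console.log('recording ready:', url);
  };

  recorder.start(1000); // emit a data chunk every second
  return recorder;
}
```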

 


A couple of weeks ago, the Chrome team announced an interesting Intent to Experiment on the blink-dev list about an API to do some custom processing on top of WebRTC. The intent comes with an explainer document written by Harald Alvestrand which shows the basic API usage. As I mentioned in my last post, this is the sort of thing that may be able to help add End-to-End Encryption (e2ee) in middlebox scenarios to WebRTC.

I had been watching the implementation progress with quite some interest when former webrtcHacks guest author Emil Ivov of jitsi.org reached out to discuss collaborating on exploring what this API is capable of. Specifically, we wanted to see if WebRTC Insertable Streams could solve the problem of end-to-end encryption for middlebox devices outside of the user’s control like Selective Forwarding Units (SFUs) used for media routing.

The good news is that it looks like it can! Read below for details.

Before we get into the project, we should first recap how media encryption works with media server devices like SFUs.

Media Encryption in WebRTC

WebRTC mandates encryption. It uses DTLS-SRTP for encrypting the media. DTLS-SRTP works by using a DTLS handshake to derive keys for encrypting the payload of the RTP packets. It is authenticated by comparing the a=fingerprint lines in the SDP that are exchanged via the signaling server with the fingerprints of the self-signed certificates used in the handshake. This can be called end-to-end encryption since the negotiated keys do not leave the local device and the application’s JavaScript does not have access to them. However, since the fingerprint comparison relies on the signaling server, a malicious signaling server could still mount a man-in-the-middle attack.
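For reference, a fingerprint line in the SDP looks like this (the sha-256 digest shown is illustrative; the real value is derived from each self-signed certificate):

```
a=fingerprint:sha-256 7B:8B:F0:65:5F:78:E2:51:3B:AC:6F:F3:3F:46:1B:35:DC:B8:5F:64:1A:24:C2:43:F0:A1:58:D0:A1:2C:19:08
```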

See our post about the mandatory use of DTLS for more background information on encryption and how WebRTC landed where it is today.

SFU Challenges

The predominant architecture for multiparty calls is the Selective Forwarding Unit (SFU). SFUs are basically packet routers that forward a single stream or a small set of streams from one user to many other users. The basics are explained in this post.

In terms of encryption, DTLS-SRTP negotiation happens between each peer endpoint and the SFU. This means that the SFU has access to the unencrypted payload and could listen in. This is necessary for features like server-side recording. On the downside, it means you need to trust the entity running the SFU and/or the client code to keep that stream private. Zero trust is always best for privacy.

Unlike a more traditional VoIP Multipoint Control Unit (MCU), which decodes and mixes media, an SFU only routes packets. It does not care much about the content (apart from a few bytes in the header and whether a frame is a keyframe). So theoretically the SFU should not need to decrypt and decode anything. SFU developers have been quite aware of that problem since the early days of WebRTC. Similarly, Google’s webrtc.org library has included a “frame encryption” approach for a while, which was likely added for Google Duo but doesn’t exist in the browser. However, the “right” API to solve this problem universally has only now arrived with WebRTC Insertable Streams.

Make it Work

Our initial game plan looked like the following:

  1. Make it work

End-to-End Encryption Sample

Fortunately, making it work was easier than expected since Harald Alvestrand had already been working on a sample, which simplified our job considerably. The approach taken in the sample is a very nice demonstration:

  1. opening two connections,
  2. applying the (intentionally weak, xor-ing the content with the key) encryption on both but
  3. only decryption on one of them.

You can test the sample here. Make sure you run the latest Chrome Canary (84.0.4112.0 or later) and that the experimental Web Platform Features flag is on.

The API is quite easy to use. A simple logging transform function looks like this:
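A minimal sketch, assuming the experimental API shape of that era, where RTCRtpSender exposed createEncodedVideoStreams() (later renamed createEncodedStreams()) and pc is an existing RTCPeerConnection:

```javascript
// Log every encoded frame, then pass it along unchanged.
function dump(chunk, controller) {
  console.log(chunk.type, chunk.timestamp, 'bytes:', chunk.data.byteLength);
  controller.enqueue(chunk); // forward the frame to the next step
}

// Wire the transform in between the encoder and the packetizer.
const sender = pc.getSenders().find((s) => s.track && s.track.kind === 'video');
const senderStreams = sender.createEncodedVideoStreams();
senderStreams.readableStream
  .pipeThrough(new TransformStream({ transform: dump }))
  .pipeTo(senderStreams.writableStream); // the writable side feeds the packetizer
```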

The transform function is then called for every video frame. It receives an encoded frame object (named chunk) and a controller object. The controller object provides a way to pass the modified frame to the next step. In our case this is defined by the pipeTo call above, which hands the frames to the packetizer.

Iterating improvements

With a fully working sample (video-only at first because audio was not yet implemented), we iterated quickly on some improvements such as key changes and not encrypting the frame header. The latter turned out to be very interesting visually. Initially, upon receiving the encrypted frame, the decoder of the virtual “middlebox” would just throw an error and the picture would be frozen. Exploiting some properties of the VP8 codec and not encrypting the first couple of bytes now tricks the decoder into thinking the frame is valid VP8. Which looks … interesting:
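As a sketch of that trick (the offsets are assumptions based on the VP8 payload layout, and the single-byte XOR key is a toy; the actual sample may differ): leave the first few bytes of each frame in the clear and XOR the rest.

```javascript
// Intentionally weak "encryption": XOR everything after the first few
// bytes so the receiving decoder still parses the frame as VP8.
// Assumed offsets: a VP8 keyframe carries a larger uncompressed header
// than a delta frame, so more bytes are left untouched.
const UNENCRYPTED_BYTES = { key: 10, delta: 3, undefined: 1 };
const keyByte = 0x55; // toy single-byte key; real keys would be negotiated

function encodeFunction(chunk, controller) {
  const data = new Uint8Array(chunk.data);
  for (let i = UNENCRYPTED_BYTES[chunk.type]; i < data.length; i++) {
    data[i] ^= keyByte;
  }
  chunk.data = data.buffer;
  controller.enqueue(chunk);
}
```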

Give the sample a try yourself here.

Insertable Streams iterates on frames, not packets

The Insertable Streams API operates between the encoder/decoder and the packetizer that splits the frames into RTP packets. While it is not useful for inserting your own encoder with WebAssembly …

WebRTC has made getting and sending real time video streams (mostly) easy. The next step is doing something with them, and machine learning lets us have some fun with those streams. Last month I showed how to run Computer Vision (CV) locally in the browser. As I mentioned there, local is nice, but sometimes more performance is needed so you need to run your Machine Learning inference on a remote server. In this post I’ll review how to run OpenCV models server-side with hardware acceleration on Intel chipsets using Intel’s open source Open WebRTC Toolkit (OWT).

Note: Intel sponsored this post. I have been wanting to play around with the OWT server since they demoed some of its CV capabilities at Kranky Geek and this gave me a chance to work with their development team to explore its capabilities. Below I share some background on OWT, how to install it locally for quick testing, and show some of the models.

 


Don’t touch your face! To prevent the spread of disease, health bodies recommend not touching your face with unwashed hands. This is easier said than done if you are sitting in front of a computer for hours. I wondered, is this a problem that can be solved with a browser?

We have a number of computer vision + WebRTC experiments here. Experimenting with running computer vision locally in the browser using TensorFlow.js has been on my bucket list and this seemed like a good opportunity. A quick search revealed somebody had already thought of this two weeks ago. That site used a model that requires some user training – which is interesting but can make it flaky. It also wasn’t open source for others to expand on, so I did some social distancing via coding isolation over the weekend to see what was possible.

Check it out at facetouchmonitor.com and keep reading below for how it works. All the code is available at github.com/webrtchacks/facetouchmonitor. I share some highlights and an alternative approach here.

 


When most people think of WebRTC they think of video communications. Similarly, home surveillance is usually associated with video streaming. That’s why I was surprised to hear about a home security project that leverages WebRTC not for video streaming, but for the DataChannel. WebRTC’s DataChannel might not demo as well as a video call, but as you will see, it is a very convenient way to set up peer-to-peer information transfer. …

WebRTC has a new browser – kind of. Yesterday Microsoft’s “new” Edge browser based on Chromium – commonly referred to as Edgium – went GA. This certainly will make life easier for WebRTC developers since the previous Edge had many differences from other implementations. The big question is how different is Edgium from Chrome for WebRTC usage?

The short answer is there is no real difference, but you can read below for background details on the tests I ran. If you’re new to WebRTC, the rundown may give you some ideas for testing your own product.

Background

Edge is a big deal because Windows 10 is a big deal – according to StatCounter, Windows has a 34% OS market share overall and 78% of the desktop OS market. Windows 10 is 65% of all Windows deployments, so Edge is the default-install browser that could potentially be used by more than 22% of all pageviews (34% × 65%) and more than 50% of desktop pageviews (78% × 65%). Now potential use is different from actual use and the original Edge just didn’t catch on in a big way, which is why Microsoft announced it would switch to using Chromium as its web engine.

Unlike the old Edge, which was limited to Windows 10 and Android, the new Chromium-based Edge is also available on OS X and iOS. Edgium follows Chromium’s Canary, Dev, Beta, and GA build process and the pre-GA releases have been available for a while. Microsoft started updating “legacy” Edge users to the new version automatically yesterday.

WebRTC Support

At last year’s Kranky Geek, Greg Whitworth, Edge Product Manager, basically said the initial goal for WebRTC was to keep parity with Chrome with some added support for perfect negotiation, screen capture improvements (lots of people share their Office documents), and bug fixes.

You should check out that video here (Greg’s section is during the first 5 minutes or so):

He also talks a bit about roadmap and what’ll be next.

Testing to see What’s Different

Methodology

I ran through a battery of tests, similar to what I did for Safari, to see what’s different between Edgium and Chrome. Most of my testing was done earlier this week on OSX using Edge Beta 80 and Edge Canary 81. I also ran a smaller number of checks with the GA Edge release on Windows 10 just to check for consistency.

Unless otherwise noted, I used the official WebRTC samples for various tests.

test.webrtc.org

test.webrtc.org is a simple way to check for WebRTC compatibility. Just load it up in Chrome and Edge and click run.

Results: nothing to see here – Chrome and Edge were both identical.

getUserMedia visualizations

There was no doubt the getUserMedia API would be supported, but I wondered if Microsoft chose to change the display or permissions behavior.

Whenever a camera or microphone source is active, you get a red notification dot to the left:

Chrome’s notification is a grey circle to the right that flashes briefly before going solid.

Within the URL bar, here are the Camera and Microphone in-use symbols:

The microphone icon appears when both the camera and microphone are in use. The camera icon only appears if only the camera is being used.

Results: same concept as Chrome with slight visual differences

Media permissions UI

The permissions UI also looks the same:

Controlling individual site access to your camera and microphone is also the same – you can jump there quickly by going to edge://settings/content/camera or edge://settings/content/microphone

I also checked to make sure there weren’t any differences with secure origins for accessing user media – i.e. you must use HTTPS or localhost if you want to use getUserMedia. This was exactly the same.
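A quick way to verify that behavior yourself (a generic check, not Edge-specific):

```javascript
async function checkUserMedia() {
  // On insecure origins (plain HTTP on anything but localhost),
  // navigator.mediaDevices is undefined, so a feature check doubles
  // as a secure-origin check in both Chrome and Edge.
  if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
    console.error('getUserMedia unavailable: HTTPS or localhost required');
    return;
  }
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  console.log('tracks:', stream.getTracks().map((t) => t.kind));
}
```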

Results: Edge is same as Chrome

Screenshare / getDisplayMedia

Microsoft mentioned screen share improvements, so I wondered if that user flow would look any different.

The screen / app / tab picker is the same as Chrome:

Like the media in-use indicator, the tab notification icon is on the left instead of on the right as in Chrome. Also like Chrome, if you share a tab it gives you a tab sharing indicator. Unlike Chrome, it does not blink for a few seconds when first initiated.

And if you are sharing a tab you also get the same tab sharing notification warning:

The window and application sharing notice is also the same as Chrome:

Screenshare Performance improvements

Greg also mentioned …