webrtcHacks

Guides and information for WebRTC developers

Guide getDisplayMedia, jitsi, mediaRecorder, Walkthrough Chad Hart · June 30, 2020

Using getDisplayMedia for local recording with audio on Jitsi

I wanted to add local recording to my own Jitsi Meet instance. The feature wasn’t built in the way I wanted, so I set out on a hack to build something simple. That led me down the road to discovering that:

  1. getDisplayMedia for screen capture has many quirks,
  2. mediaRecorder for media recording has some of its own unexpected limitations, and
  3. adding your own HTML/JavaScript to Jitsi Meet is pretty simple.

Read on for plenty of details and some reference code. My result is located in this repo.

jitsiRecorder


Editor’s note: see the comments section for some very relevant commentary and caveats from Jan-Ivar at Mozilla.


The Problem

I built a Jitsi Meet server a few months ago with the intention of updating my “Build your own phone company with WebRTC and a weekend” post. There are a billion posts/videos on how to set up Jitsi Meet, and I don’t have anything new or interesting to add to the technosphere there. However, one feature I really wanted to implement is recording. I often do demos and record the session for others and for future reference. On the surface, my requirements here are simple: record my audio plus the audio and video of the Jitsi Meet session on demand, and save the file locally. This sounds like a simple feature to add, but…

Jitsi Cloud Recording Challenges

Jitsi has a module called Jibri that is used for recording. Jibri is more complicated to install and configure than the base Jitsi Meet, but you can struggle through it in a few hours or less if you’re familiar with the underlying system. Jibri loads a headless browser that acts as a silent participant in the call, grabbing the audio and saving it to disk. This approach is fairly resource intensive, which forced me to upgrade my server from $5/mo to $20/mo. It also only handles a single recording at a time. If you want to record multiple sessions you can set it up to launch multiple Docker containers, which starts getting complex. On top of that, you also need to build a mechanism to transfer the files someplace after they are recorded, or set up the Dropbox integration. I really just wanted a quick way to record a session and share it afterwards, and this was all getting very complex. Time to look for a simpler way.

Local Recording

Another approach is to just record locally. Local recording is more secure by nature, as you are not leaving unencrypted media on a server somewhere. It is also less resource intensive, since you are using your local computer to save media it is already receiving instead of adding a new element in the cloud. Jitsi actually has an option for this, but it only records the audio. I need to record whatever I am looking at on the screen too. So I set out on a hack to add local screen recording.

How to get the Media

The lazy way to do this is to hack together some audio sources for QuickTime recording, or just use one of the WebRTC-based recording APIs or browser extensions, but that wouldn’t make for much of a post. Nor is it something that will work universally for all users who join over the web. I had previously made an audio recorder that overloads the createPeerConnection API, grabs all the tracks across multiple connections, and saves all the audio to a file. This actually worked with Jitsi Meet’s multiple audio connections. I could have set up a canvas that would capture the screen and save that too, but I found a much simpler way with getDisplayMedia.

getDisplayMedia review

The getDisplayMedia API for sharing desktop media was introduced a while ago, which we covered here. The good news is that all the major browsers implement getDisplayMedia. The bad news is that these implementations are all different, which can affect the user experience. Let’s take a look.

Note: when I refer to Edge below, I am using the new Chromium-based Edge.

Test code

I thought this would be an easy API to evaluate, but I should know better. To gather some data, I wrote some code that calls getDisplayMedia and tests different parameters. You can see that on GitHub here or run it on my site here.

Screen Share Pickers

There are big differences in the recording picker options. Chrome and Edge allow choosing among any full display, application window, or browser tab. Firefox excludes the browser tab option. Safari has no picker and only lets you choose the current display.

You can see the UI differences below:

Chrome

Version: 84
Selection options: Display, Window, Tab
[Screenshot: Chrome screen share picker]

Edge

Version: 84
Selection options: Display, Window, Tab
[Screenshot: Edge screen share picker]

Firefox

Version: 77
Selection options: Display, Window
[Screenshot: Firefox screen share picker]

Safari

Version: 13.1
Selection options: Current display
[Screenshots: Safari screen share prompts]

Chrome and Edge display a blue highlight box around the inside of the window frame to indicate the tab is being shared.

displaySurface selection constraints are useless

The getDisplayMedia API includes a displaySurface option for choosing between desktop display, window, application, or browser tab. In my case I really only care about recording my Jitsi Meet tab. I wanted to see if I could simplify the user interface by limiting the selection options. That seems like it should be easy to do with constraints.

However, the spec says:

> The user agent MUST let the end-user choose which display surface to share out of all available choices every time, and MUST NOT use constraints to limit that choice.

In fact, unlike getUserMedia:

> Constraints serve a different purpose in getDisplayMedia than they do in getUserMedia. They do not aid discovery, instead they are applied only after user-selection. (source)

In practice, this means these constraints do nothing to narrow the picker. See the spec for more, but essentially the constraints aren’t allowed to influence the user’s choice, so there isn’t much point in using them for that. You also can’t use enumerateDevices() to list display surfaces, or listen for devicechange events on them.
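To see this behavior for yourself, here is a minimal sketch that asks for a preferred surface and then reports what the user actually picked. The function name `captureAndReportSurface` and the injected `mediaDevices` parameter are my own illustrative choices, not part of any library; in a browser you would pass `navigator.mediaDevices`. It assumes the browser reports `displaySurface` in the track settings, which not every browser does.

```javascript
// Sketch: request a preferred display surface, then compare it with the
// surface the user really chose in the picker.
// `mediaDevices` is injected (normally navigator.mediaDevices) - hypothetical
// helper signature, used here only for illustration.
async function captureAndReportSurface(mediaDevices, preferred = "browser") {
  const stream = await mediaDevices.getDisplayMedia({
    // A plain value is only a preference; the user still chooses freely.
    video: { displaySurface: preferred },
  });
  const [track] = stream.getVideoTracks();
  // getSettings() describes the selected surface; this may be undefined
  // in browsers that do not report displaySurface.
  const actual = track.getSettings().displaySurface;
  return { stream, preferred, actual, matched: actual === preferred };
}
```

If `matched` comes back `false`, the only real option is to tell the user and let them try again, since the picker cannot be restricted.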

(As an aside: Perhaps these limitations are why Google Hangouts still uses its own extension mechanism instead of getDisplayMedia).

Note on Framerates

As with getUserMedia, you can apply video resolution and frame rate constraints. The video resolution constraint just resizes the video after capture. You can also lower the frame rate if you want to save some CPU cycles. In fact, many WebRTC video conferencing services that send a screen share reduce this to 10 frames per second or fewer, depending on the content shared, to minimize CPU usage.
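As a rough illustration of capping the frame rate, the sketch below builds the options object and reads back the frame rate the browser reports. The helper names `displayOptions` and `captureLowFps` are hypothetical, and `mediaDevices` stands in for `navigator.mediaDevices`. It uses `max` because, as far as I know, getDisplayMedia rejects `min` and `exact` constraints.

```javascript
// Build getDisplayMedia options with an upper bound on the frame rate.
// Hypothetical helper; 10 fps is just the figure mentioned above.
function displayOptions(maxFps = 10, withAudio = false) {
  return { video: { frameRate: { max: maxFps } }, audio: withAudio };
}

// Start a capture with the capped frame rate and report what we got.
// `mediaDevices` is injected (normally navigator.mediaDevices).
async function captureLowFps(mediaDevices, maxFps = 10) {
  const stream = await mediaDevices.getDisplayMedia(displayOptions(maxFps));
  const [track] = stream.getVideoTracks();
  // The reported frameRate may be missing or differ from the requested cap.
  return { stream, frameRate: track.getSettings().frameRate };
}
```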

User Gesture Requirements

Firefox and Safari require a user gesture, like a button click, before you can access getDisplayMedia. Chrome and Edge do not.

Try it yourself by pasting this into the JavaScript console:

JavaScript
navigator.mediaDevices.getDisplayMedia().then(console.log).catch(console.error);
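To stay within the user gesture requirement, one option is to start the capture from a click handler. This is only a sketch: `startOnClick` is a name I made up, and the `button` and `mediaDevices` arguments are assumed to be a DOM button element and `navigator.mediaDevices`.

```javascript
// Sketch: tie getDisplayMedia to a click so it runs inside a user gesture.
// Hypothetical helper; `button` is any element with addEventListener and
// `mediaDevices` is normally navigator.mediaDevices.
function startOnClick(button, mediaDevices, onStream, onError = console.error) {
  button.addEventListener("click", () => {
    // Call getDisplayMedia directly in the handler; deferring it (for
    // example behind a timer) may lose the gesture in stricter browsers.
    mediaDevices.getDisplayMedia({ video: true }).then(onStream).catch(onError);
  });
}
```

In a page this might be wired up as `startOnClick(document.querySelector("#share"), navigator.mediaDevices, stream => { /* use stream */ })`, where `#share` is an assumed button id.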

iFrame Permissions

After trying to do my tests in CodePen, I discovered there are restrictions on iFrames. Firefox and Safari won’t work in an iFrame without a special allow permission, and the permission each one requires is different.

Firefox requires