WebRTC Plumbing with GStreamer

GStreamer is one of the oldest and most established libraries for handling media. Given that it is a core media-handling component on Linux and in WebKit and launched near the turn of the century, it is not surprising that many early WebRTC projects used various pieces of it. Today, GStreamer offers expanded options for helping developers plumb their WebRTC stack by pipelining various elements together. In addition, GStreamer now has many new options for end-to-end WebRTC calling, including WHIP support.

We have never covered GStreamer directly here on webrtcHacks, so I asked Matthew Waters, the lead author of much of the recent GStreamer WebRTC support, if he would contribute a post. Happily, he obliged. Matthew is very active in the GStreamer community and is a consultant at the GStreamer consulting shop Centricular.

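In this post, Matthew covers a quick introduction to GStreamer, the history of WebRTC in GStreamer, the webrtcbin element and its features, the webrtcsink and webrtcsrc helper elements, other signaling integrations, and what is coming next.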

See below for the full post with many reference links.

{“editor”, “chad hart“}

A (Quick) Introduction to GStreamer

Some readers may know about GStreamer and may have used it before. For the uninitiated, GStreamer is a graph-based multimedia framework where components can be plugged together to perform a multimedia task. GStreamer is used in a wide variety of industries, including but not limited to space gadgets, security cameras, medical devices, conference calls, gravitational wave analysis, radio streaming, and non-linear video editing. GStreamer supports many different network transports such as RT(S)P, SRT, RIST, and HTTP(S), and WebRTC is one of the latest additions to that growing list. Readers looking for a general introduction to GStreamer can see resources such as the GStreamer documentation or this linux.conf.au 2018 talk by a seasoned GStreamer developer.

GStreamer has many plugins and elements that can transform data in many interesting ways. The project ecosystem also includes various out-of-tree plugins and elements for different purposes. Some of these elements are tied to specific hardware encoders or decoders. Others provide functionality in domains outside strict multimedia processing, such as gravitational wave analysis. This flexibility allows all of these building blocks (what GStreamer calls elements) to be combined into a pipeline that performs the required task.

High-level GStreamer architecture. Source: gstreamer.freedesktop.org

In the context of WebRTC, this flexibility allows the source of a WebRTC connection to be anything that GStreamer already supports (or that can be developed) such as local cameras, an RT(S)P/SRT/RIST/RTMP/HTTP network source, recording or reading files, HLS streaming, other WebRTC calls, etc. The reverse is also true. A received WebRTC stream can be forwarded to many different locations. Combined with support for many hardware encoding and decoding elements, GStreamer can take advantage of most aspects of the hardware where it is running.

Example GStreamer pipeline converting a file source to an audio and video sink. Source: gstreamer.freedesktop.org

GStreamer’s History with WebRTC

Let’s start our journey into the GStreamer world of WebRTC with a brief look at the prior art: what existed before GStreamer gained native support for speaking WebRTC. When we started to look at WebRTC in its early days, there were a few interesting projects: OpenWebRTC, Kurento, and Farstream.

OpenWebRTC

OpenWebRTC was an Ericsson Research project that aimed to provide a mobile-friendly SDK for WebRTC applications before libwebrtc had made some progress in that area. Some of the functionality currently available in GStreamer’s WebRTC implementation started as part of OpenWebRTC. Today OpenWebRTC is unmaintained and points to the native GStreamer WebRTC support.

Kurento

At the time, Kurento was limited to the functionality of an MCU and required re-encoding streams. Its aim was, and still is, to provide complete functionality for a WebRTC SFU/MCU server. As such, its implementation of WebRTC was very high-level and reasonably tightly integrated into the Kurento architecture, and Kurento’s signaling infrastructure was a required component. These days Kurento is in maintenance-only mode and has been superseded by OpenVidu, with which I do not have any experience.

Editor Note

I checked with the OpenVidu team – they verified that Kurento is indeed dependent on GStreamer for routing, transcoding, and processing media, but that there are no major developments planned for the project. OpenVidu is a higher-level API and, as such, supports other media servers – mostly mediasoup.

{“editor”, “chad hart“}

Farstream

Farstream is another older project that uses GStreamer. That project aimed to provide the necessary infrastructure for audio/video conferencing applications. Some of the GStreamer elements it helped support were the original RTP elements and libnice ICE GStreamer elements.

Making GStreamer like libwebrtc

In this environment, I was asked to implement a GStreamer element wrapping the functionality of libwebrtc for use in a GStreamer pipeline. This was reasonably successful; however, it had many limitations. Some of the most problematic ones were:

libwebrtc does not guarantee a stable API.
Support for application-provided video decoders and encoders was almost non-existent.
The build process is very complicated and relies on Google-specific tools.

It is in this light that I decided to investigate the possibility of combining the relevant components already available in GStreamer for building a WebRTC implementation.

The Initial Implementation: webrtcbin

WebRTC is available in GStreamer via the webrtcbin element. We started building webrtcbin by researching the functionality required for a minimal implementation. One strong motivator for this effort was the ability to use whichever encoders or decoders the user wants, along with the flexibility to accept any kind of source media. This means that webrtcbin is mostly concerned with the transport and network side of things rather than with the details of how data is encoded. The following list of needed components (and their respective implementations) was quickly produced:

ICE – libnice: a GObject library for initiating ICE connections
RTP – GStreamer’s existing battle-tested RTP stack
SRTP – libsrtp2 along with some associated GStreamer elements implemented within OpenWebRTC
DTLS – some GStreamer elements around OpenSSL written by some OpenWebRTC developers

Signaling

In addition to the above, one must consider how signaling is handled. We needed a way to handle signaling between webrtcbin elements, between webrtcbin and browsers, and from browsers to webrtcbin. We decided to mimic the API used by web applications as outlined in JSEP: JavaScript Session Establishment Protocol, now RFC 8829. Some parts are still missing from the implementation, but we aim to eventually support the entirety of JSEP.

You can see the open “JSEP” issues here or all of the open “WebRTC” issues here.

Media

On the media side, webrtcbin could have ingested one of these media types:

raw audio/video frames
encoded audio/video – Opus/H.264/VP8/AV1/etc.
RTP payload packets (“payloaded” in GStreamer terminology)

Because users must be able to use whatever encoder they choose, webrtcbin takes the lowest reasonable data format as input: individual RTP payloaded packets. This allows the greatest flexibility in how the data flowing through webrtcbin is encoded. From there, webrtcbin handles everything else, including RTP sessions, the DTLS connection, SRTP encryption/decryption, and the ICE connection.
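To make the "payloaded" input concrete, the sketch below assembles a pipeline description modelled on GStreamer's webrtc-sendrecv example: each branch encodes and payloads its own media, so only application/x-rtp caps reach webrtcbin. The encoder settings and payload types (96/97) are illustrative assumptions. Because gst-launch-1.0 cannot answer webrtcbin's negotiation signals, an application would typically pass the same description to gst_parse_launch() and drive the offer/answer exchange itself, so the script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: webrtcbin fed with RTP-payloaded audio and video, modelled on the
# pipeline in GStreamer's webrtc-sendrecv example. Encoder settings and the
# payload types (96/97) are illustrative choices, not requirements.
set -eu

set -- \
  webrtcbin name=sendrecv bundle-policy=max-bundle \
    stun-server=stun://stun.l.google.com:19302 \
  videotestsrc is-live=true pattern=ball ! videoconvert ! queue \
    ! vp8enc deadline=1 ! rtpvp8pay ! queue \
    ! application/x-rtp,media=video,encoding-name=VP8,payload=96 ! sendrecv. \
  audiotestsrc is-live=true wave=red-noise ! audioconvert ! audioresample ! queue \
    ! opusenc ! rtpopuspay ! queue \
    ! application/x-rtp,media=audio,encoding-name=OPUS,payload=97 ! sendrecv.

# Print the description; only launch it when RUN=1 is set, because
# gst-launch-1.0 cannot perform the offer/answer exchange webrtcbin waits for.
echo "gst-launch-1.0 -v $*"
if [ "${RUN:-0}" = 1 ]; then
  gst-launch-1.0 -v "$@"
fi
```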

Thanks to GStreamer’s existing RTP stack, some RTP features were supported out of the box. RTCP, the AVPF RTCP profile, and support for PLIs, FIRs, and NACKs were all available from day one.

After some development, an initial version of webrtcbin was released to the public at the annual GStreamer conference in October of 2017. A video recording of the public announcement is available from https://gstconf.ubicast.tv/videos/gstreamer-webrtc/. In early 2018, webrtcbin was merged and would become part of the GStreamer 1.14 release.

Closing the Feature Gap

After the initial release of webrtcbin, GStreamer had a very minimal implementation of WebRTC compared to libwebrtc’s functionality. To close the feature gap, we reimplemented much of what is in libwebrtc. To speed this along, we used many elements from the OpenWebRTC project.

Some features were implemented very quickly after the initial implementation because GStreamer elements for them already existed. Forward Error Correction (FEC) is implemented using ULPFEC for video and RED for audio, as in libwebrtc. libwebrtc’s version of ULPFEC differs from the RFC version in a subtle but significant way. Retransmissions (RTX, RFC 4588) are based on previously existing GStreamer elements originally written for other RTP scenarios. Other quick additions hooked up functionality that the lower-level components already supported, such as TURN server support, and added properties that tracked the WebRTC specification, which was still changing at the time.

Here is a quick rundown of some other core features:

Data channels: Integrated elements from OpenWebRTC for SCTP and implemented DCEP negotiation (RFC 8832: Data Channel Establishment Protocol).
RTP Bundling: GStreamer’s RTP implementation supported receiving bundled RTP streams. Added configuration for webrtcbin to group multiple RTP streams over a single socket for both sending and receiving.
Stream renegotiation: Initial support for adding/removing RTP streams was added in 2019, with some limitations such as ‘removed’ streams being marked inactive without transceiver/m-line removal or reuse.
DSCP: Added Differentiated Services Code Point support to mark streams with differing priorities for intermediate network devices supporting DSCP, improving performance during network congestion.
Port selection: Exposed libnice’s minimum and maximum RTP port range to control local port allocation for sending/receiving data, aiding applications behind network firewalls.
Transport Wide Congestion Control (TWCC): Implemented the TWCC RTP header extension for outgoing packets and TWCC feedback for received packets, so that bandwidth usage can be monitored. Configuring the encoder in response is the responsibility of the application or of a parent element such as webrtcsink. (Editor note: see our last post for more on congestion control)
Simulcast (send-only or receive-only): Added RTP-level simulcast support for sending/receiving different qualities of the same stream. Involves adding RTP header extensions (MID, RID) during RTP payloading.
Compatibility: Efforts to support remote peers with varying SDP formats, including different data channel formats, non-bundle peers, ICE-lite implementations, ICE candidates in SDP, and MSID support for browser peers to allow grouping of streams.
Statistics: An interface similar to the Web’s PeerConnection API for connection statistics, except for encoding/decoding and packetization statistics, which do not apply to webrtcbin. We have continuously improved the quality and quantity of reported statistics relevant to webrtcbin’s scope.

Additional Integrations

While webrtcbin is very powerful and supports many scenarios, writing a fully-fledged application on top of it takes considerable effort. For example, congestion control must be implemented for the specific encoders the application is using. RTX/FEC settings also have compatibility issues with some browsers, and TWCC may behave differently with different browsers. All of this means that additional work is needed for every webrtcbin application.

To alleviate some of this implementation cost, GStreamer has added helper elements for some limited scenarios – webrtcsink and webrtcsrc.

webrtcsink

The first helper GStreamer element built on top of webrtcbin was webrtcsink. webrtcsink is designed as a “batteries included” WebRTC send-only element. It implements several additional capabilities:

Selecting relevant encoders for a particular set of codecs through negotiation with the peer
Congestion control using TWCC information and a reimplementation of the GCC (Google Congestion Control) algorithm (rtpgccbwe element)
Custom signaling implementation for easy use on the command line.

Multiple consumers of this stream can have their own independent encodings of the source stream and can have that stream tailored to their particular network conditions. For example, assuming the sending peer’s network is not the issue, a peer on a poor network will not affect the quality of a peer on a good network.
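A sketch of tuning webrtcsink's congestion control from the command line. It assumes the congestion-control property (with gcc selecting the GCC-based controller) and the min-bitrate/max-bitrate properties, in bits per second, found in recent gst-plugins-rs builds; running gst-inspect-1.0 webrtcsink is the way to confirm the names and defaults for your version. The bitrate values are arbitrary, and SIGNALLING_URI is a placeholder for your own signalling server.

```shell
#!/bin/sh
# Sketch: webrtcsink with GCC congestion control and explicit bitrate bounds.
# Assumed properties: congestion-control, min-bitrate, max-bitrate (bits/s).
set -eu

# Placeholder: point this at your own signalling server.
SIGNALLING_URI="${SIGNALLING_URI:-ws://127.0.0.1:8443}"

set -- \
  videotestsrc is-live=true \
  ! video/x-raw,width=1280,height=720,framerate=30/1 \
  ! webrtcsink \
    congestion-control=gcc \
    min-bitrate=300000 \
    max-bitrate=2500000 \
    stun-server=stun://stun.l.google.com:19302 \
    "signaller::uri=$SIGNALLING_URI"

# Print the command; launch it only when RUN=1 is set.
echo "gst-launch-1.0 -v $*"
if [ "${RUN:-0}" = 1 ]; then
  gst-launch-1.0 -v "$@"
fi
```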

webrtcsink also supports ingesting pre-encoded streams (H.264, H.265, VP8, etc). However, it cannot perform the relevant congestion control for you. For congestion control to work with pre-encoded streams, the application must be involved to configure the encoder’s bitrate accordingly.
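For the pre-encoded case, a pipeline might look like the sketch below. It assumes webrtcsink accepts parsed H.264 on its video sink pad, and it turns off webrtcsink's own congestion control (congestion-control=disabled, an assumed enum value) because here only the application can change x264enc's bitrate property, which is in kbit/s. Constraining the encoder to the constrained-baseline profile through downstream caps is a common browser-compatibility choice rather than something webrtcsink is known to require. SIGNALLING_URI is a placeholder.

```shell
#!/bin/sh
# Sketch: pre-encoded H.264 into webrtcsink. The application owns the encoder,
# so webrtcsink's own congestion control is switched off here (assumption).
set -eu

# Placeholder: point this at your own signalling server.
SIGNALLING_URI="${SIGNALLING_URI:-ws://127.0.0.1:8443}"

set -- \
  videotestsrc is-live=true \
  ! video/x-raw,width=640,height=480,framerate=30/1 \
  ! x264enc tune=zerolatency bitrate=1500 key-int-max=60 \
  ! video/x-h264,profile=constrained-baseline \
  ! h264parse \
  ! webrtcsink \
    congestion-control=disabled \
    "signaller::uri=$SIGNALLING_URI"

# Print the command; launch it only when RUN=1 is set.
echo "gst-launch-1.0 -v $*"
if [ "${RUN:-0}" = 1 ]; then
  gst-launch-1.0 -v "$@"
fi
```

To adapt the bitrate anyway, the application would have to adjust the encoder itself, as described above.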

We are also considering some preliminary ideas for supporting bitrate ladders. A bitrate ladder is needed when a stream should be encoded a fixed number of times at different resolutions and bitrates, which limits the number of encodes for a particular stream.

The default set of codecs attempted is VP8, VP9, H.264, and H.265, and the application can limit the codec choices.

Note that AV1 support was also merged very recently in an upstream merge request.
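Limiting the codec choices might look like the sketch below. It assumes webrtcsink's video-caps and audio-caps properties, which list the encoded formats webrtcsink may propose to peers; here only VP8 video and Opus audio are offered. Raw test sources are linked to request pads on the element named ws, and SIGNALLING_URI is a placeholder.

```shell
#!/bin/sh
# Sketch: only offer VP8 video and Opus audio from webrtcsink.
# Assumed properties: video-caps, audio-caps. The URI is a placeholder.
set -eu

SIGNALLING_URI="${SIGNALLING_URI:-ws://127.0.0.1:8443}"

set -- \
  webrtcsink name=ws \
    video-caps=video/x-vp8 \
    audio-caps=audio/x-opus \
    "signaller::uri=$SIGNALLING_URI" \
  videotestsrc is-live=true ! video/x-raw,width=640,height=480,framerate=30/1 ! ws. \
  audiotestsrc is-live=true ! ws.

# Print the command; launch it only when RUN=1 is set.
echo "gst-launch-1.0 -v $*"
if [ "${RUN:-0}" = 1 ]; then
  gst-launch-1.0 -v "$@"
fi
```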

Finally, there is a custom signaller implementation that transfers the SDP and ICE candidates produced by webrtcbin to the peer. This signaller can be overridden if the application needs to communicate with a different signaling infrastructure. Integrating with different WebRTC-related services usually involves writing a signaling implementation for webrtcsink or webrtcsrc. There are also some signaling implementations upstream for external services like AWS KVS, LiveKit, WHIP, etc. (more on those below).

Running this from the command line can be as simple as:

gst-launch-1.0 -v \
  videotestsrc ! video/x-raw,width=640,height=480,framerate=30/1 ! \
  webrtcsink stun-server=stun://stun.l.google.com:19302 signaller::uri=ws://<your-signaling-server>:8443

The README for the plugin is the best resource for an overview of the available functionality.

webrtcsrc

The receive-only helper element built on webrtcbin is called webrtcsrc. webrtcsrc supports the same custom signaling as webrtcsink, and a single webrtcsink can easily serve multiple webrtcsrc consumers.

webrtcsrc supports internally decoding the received stream to raw audio/video but can also output the encoded (VP8, VP9, H.264, etc) data as-is without any decoding.

Optionally, a custom data channel protocol can be enabled to forward navigation events (such as mouse and keyboard input) from the receiver to the sender. This allows, for example, a scene rendered on a server to be controlled remotely from the receiver. webrtcsink supports the same data channel protocol.

On the command line, this is:

gst-launch-1.0 -v \
  webrtcsrc stun-server=stun://stun.l.google.com:19302 signaller::uri=ws://<your-signaling-server>:8443 ! \
  queue ! videoconvert ! autovideosink
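The navigation data channel mentioned above is opt-in. The sketch below assumes a boolean enable-data-channel-navigation property on webrtcsrc (webrtcsink has a property of the same name for the sending side); the intent is that a windowed video sink sends navigation events upstream and webrtcsrc forwards them over the data channel. SIGNALLING_URI is a placeholder, and depending on your signalling server you may also need to select which producer to connect to.

```shell
#!/bin/sh
# Sketch: receive with webrtcsrc and forward navigation events to the sender.
# Assumed property: enable-data-channel-navigation. The URI is a placeholder.
set -eu

SIGNALLING_URI="${SIGNALLING_URI:-ws://127.0.0.1:8443}"

set -- \
  webrtcsrc \
    enable-data-channel-navigation=true \
    stun-server=stun://stun.l.google.com:19302 \
    "signaller::uri=$SIGNALLING_URI" \
  ! queue ! videoconvert ! autovideosink

# Print the command; launch it only when RUN=1 is set.
echo "gst-launch-1.0 -v $*"
if [ "${RUN:-0}" = 1 ]; then
  gst-launch-1.0 -v "$@"
fi
```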

Other WebRTC Integrations

To simplify WebRTC pipeline development, GStreamer includes signaling integrations for a number of WebRTC services:

AWS Kinesis Video Streams – our first external signaling implementation targets AWS’ Kinesis Video Streams and supports webrtcsink functionality for streaming from GStreamer into AWS.
LiveKit – A signaller for LiveKit supports both webrtcsink and webrtcsrc roles, with a draft merge request in place for enabling simulcast with the LiveKit SFU.
Janus VideoRoom – The signaller for Janus’ VideoRoom supports webrtcsink functionality, and there is a draft upstream merge request to include webrtcsrc support.
WHIP – A signaller for the new general WHIP signaling specification is available for both client and server use, with whipclientsink and whipserversrc respectively.
WHEP – Two draft merge requests for WHEP signaling cover client and server implementations. This topic remains a work in progress within GStreamer.
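On the WHIP side, publishing a test stream might look like the sketch below. It assumes whipclientsink takes the WHIP endpoint through its signaller's whip-endpoint property; WHIP_ENDPOINT is a placeholder for the URL your WHIP server provides, and any authentication your server requires is left out.

```shell
#!/bin/sh
# Sketch: publish a test pattern to a WHIP endpoint with whipclientsink.
# Assumed property: signaller::whip-endpoint. The URL is a placeholder.
set -eu

WHIP_ENDPOINT="${WHIP_ENDPOINT:-http://127.0.0.1:8190/whip/endpoint}"

set -- \
  videotestsrc is-live=true ! videoconvert \
  ! whipclientsink "signaller::whip-endpoint=$WHIP_ENDPOINT"

# Print the command; launch it only when RUN=1 is set.
echo "gst-launch-1.0 -v $*"
if [ "${RUN:-0}" = 1 ]; then
  gst-launch-1.0 -v "$@"
fi
```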

The Future

Here is a look at some of the major areas we are working on now.

Rust

Memory-safe programming languages are a growing trend, especially where potentially untrusted data is handled. RTP packets received over the network are one example of such data, and malformed packets should not be able to cause erroneous application behavior. As a result, there has been an effort to reimplement some of the GStreamer RTP infrastructure in Rust, a memory-safe language. We have been reimplementing several RTP payloaders and depayloaders as new elements, and we have also rewritten RTP session management in Rust. We aren’t there yet, but the Rust implementations may eventually supersede the existing C-based ones.

librice

Another area where untrusted data may be involved is ICE. There has also been an effort to write a sans-IO implementation of an ICE state machine in Rust, available from https://github.com/ystreet/librice. sans-IO is a design pattern where all external state (such as received packets and the current time) is provided as inputs to a running state machine, which reports what should be sent rather than performing any IO itself. This allows both flexibility and reusability for users of the API. A helper Rust async API is provided on top for users who do not want to deal with the sans-IO API directly.

The main reason for using a sans-IO design pattern is that libnice, which provides ICE connectivity for webrtcbin, currently uses GLib and GIO for networking, which is not as performant as it could be. Adding support for newer, more performant IO APIs to libnice or GIO is a complicated piece of work. The sans-IO design lets the application provide its own (performant) IO or use librice’s async implementation.

Additionally, libnice has some compatibility modes (not useful for WebRTC) for aging environments where a non-RFC-compliant implementation of ICE is required. These add implementation complexity that an RFC-only ICE implementation does not need, and they hinder adding new features to libnice.

WebKitGTK/WebKitWPE

WebKit is a web engine, and there is a desire to support WebRTC in some of its ports. These ports already use GStreamer for media playback with MSE, <video> tags, etc., so it makes sense to use GStreamer for WebRTC as well. WebRTC support in these ports used to be based on libwebrtc, but libwebrtc is not easy to integrate for embedded use cases. The current default is an implementation based on webrtcbin.

{“author”: “matthew waters“}
