Popular this Month

Pion seemingly came out of nowhere to become one of the biggest and most active WebRTC communities. Pion is a Go-based set of WebRTC projects. Golang is an interesting language, but it is not among the most popular programming languages out there, so what is so special about Pion? Why are there so many developers involved in this project? 

To learn more about this project and how it came to be among the most active WebRTC organizations, I interviewed its founder – Sean DuBois. We discuss Sean’s background and how he got started in RTC. I really wanted to understand why he decided to build a new WebRTC project and why he continues to spend so much of his free time on it. ...  Continue reading

Interview with WebRTC standards co-chair and author, Bernard Aboba. We cover the current status of WebRTC and where it is headed including WebRTC-NV, Simulcast, SVC, AV1, WebTransport, WebCodecs, ML and more.

Chrome recently added the option of adding redundancy to audio streams using the RED format as defined in RFC 2198, and Fippo wrote about the process and implementation in a previous article. You should catch up on that post, but to summarize quickly: RED works by adding redundant payloads with different timestamps in the same packet. If you lose a packet on a lossy network, then chances are another successfully received packet will carry the missing data, resulting in better audio quality.
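To make that concrete, here is a rough sketch in Go of how an RFC 2198 payload is laid out: a 4-byte header for each redundant block (payload type, timestamp offset, block length), then a 1-byte header for the primary block, then the payloads themselves. This is illustrative only, not code from either article, and the function name is invented:

```go
// buildRED packs one redundant (older) audio payload plus the primary
// payload into a single RED payload per RFC 2198. payloadType is the
// negotiated PT of the inner codec; tsOffset is how far the redundant
// block lags the primary block's RTP timestamp (max 14 bits).
func buildRED(payloadType uint8, tsOffset uint16, redundant, primary []byte) []byte {
	out := make([]byte, 0, 5+len(redundant)+len(primary))
	// Redundant block header: F=1 | PT(7) | ts offset(14) | block length(10).
	out = append(out,
		0x80|payloadType,
		byte(tsOffset>>6),
		byte(tsOffset<<2)|byte(len(redundant)>>8),
		byte(len(redundant)),
	)
	// Primary block header: F=0 | PT(7). Its length is implied by the packet end.
	out = append(out, payloadType)
	out = append(out, redundant...)
	out = append(out, primary...)
	return out
}
```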

That was in a simplified one-to-one scenario, but audio quality issues often have the most impact on larger multi-party calls. As a follow-up to Fippo’s post, Jitsi Architect and Improving Scale and Media Quality with Cascading SFUs author Boris Grozev walks us through his design and tests for adding audio redundancy to a more complex environment with many peers routing media through a Selective Forwarding Unit (SFU).

{“editor”, “chad hart“}

Fippo covered how to add redundancy packets in standard peer-to-peer calls without any middle boxes like a Selective Forwarding Unit (SFU). What happens when you stick an SFU in the middle? There are a couple more things to consider.

  • How do we handle conferences where clients have different RED capabilities? It may be the case that only a subset of the participants in a conference support RED. In fact, this will often be the case today since RED is a relatively new addition to WebRTC/Chromium/Chrome.
  • Which streams should have redundancy? Should we add redundancy for all audio streams at the cost of additional overhead, or just the currently active speaker (or 2-3 speakers)?
  • Which legs should have redundancy? In multi-SFU cascading scenarios, do we need to add redundancy for the SFU-SFU streams?
  •  ...  Continue reading

    Back in April 2020, Citizen Lab reported on Zoom’s rather weak encryption and stated that Zoom uses the SILK codec for audio. Sadly, the report did not contain the raw data needed to validate that claim and look at it further. Thankfully, Natalie Silvanovich from Google’s Project Zero helped me out using the Frida tracing tool and provided a short dump of some raw SILK frames. Analyzing those frames inspired me to take a look at how WebRTC handles audio. Audio is much more critical to the perceived quality of a call than video, as we tend to notice even small glitches. A mere ten seconds of that audio were enough to set me off on quite an adventure investigating possible improvements to the audio quality provided by WebRTC.

     


    local Jitsi recording hack with getDisplayMedia audio capture and mediaRecorder

    I wanted to add local recording to my own Jitsi Meet instance. The feature wasn’t built in the way I wanted, so I set out on a hack to build something simple. That led me down the road to discovering that:

    1. getDisplayMedia for screen capture has many quirks,
    2. mediaRecorder for media recording has some of its own unexpected limitations, and
    3. Adding your own HTML/JavaScript to Jitsi Meet is pretty simple

    Read on for plenty of details and some reference code. My result is located in this repo.

     


    Software as a Service, Infrastructure as a Service, Platform as a Service, Communications Platform as a Service, Video Conferencing as a Service, but what about Gaming as a Service? There have been a few attempts at Cloud Gaming, most notably Google’s recently launched Stadia. Stadia is no stranger to WebRTC, but can others leverage WebRTC in the same way?

    Thanh Nguyen set out to see if this was possible with his open source project, CloudRetro. CloudRetro is based on the popular Go-based WebRTC library, Pion (thanks to Sean of Pion for helping review here). In this post, Thanh gives an architectural review of how he built the project along with some of the benefits and challenges he ran into along the way.

    {“editor”, “chad hart“}

    Introduction

    Last year, Google announced Stadia, and it blew my mind. The idea was so unique and innovative that I kept questioning how it was even possible with the current state of technology. The motivation to demystify its technology spurred me to create an open-source version of Cloud Gaming. The result is fantastic. I would like to share my one-year adventure working on this project in the article below.

    TLDR: the short slide version with highlights

    Why Cloud Gaming is the future

    I believe Cloud Gaming will soon become the new generation not only of games but also of other fields of computer science. Cloud Gaming is the pinnacle of the client/server model. It maximizes backend control and minimizes frontend work by putting game logic on a remote server and streaming images/audio to the client. The server handles the heavy processing, so the client is no longer limited by hardware constraints.

    Looking at Google Stadia, it essentially allows you to play AAA games (i.e. high-end blockbuster games) on an interface like YouTube. The same methodology can be applied to other heavy offline applications like operating systems or 2D/3D graphic design, so that we can run them consistently on low-spec devices across many platforms.

    The future of this technology: imagine running Microsoft Windows 10 in a Chrome browser?

    Cloud Gaming is technically challenging

    Gaming is one of the rare applications that require continuous, fast user reaction. If a page we click takes a 2-second delay once in a while, it is still acceptable. Live broadcast video streams typically run many seconds behind but still offer acceptable usability. However, if a game is frequently delayed by 500ms, it becomes unplayable. The target is to achieve extremely low latency so that the gap between game input and media output is as small as possible. Therefore, the traditional video streaming approach is not applicable here.

    Cloud Gaming common pattern

    The Open Source CloudRetro Project

    I decided to create a POC of Cloud Gaming so that I could verify whether it was possible under these tight network constraints. I picked Golang for my POC because it is the language I am most familiar with, and it turned out to work well for many other reasons: Go is simple, with fast development speed, and Go channels are excellent for dealing with concurrency and stream manipulation.

    The project is CloudRetro.io: an open source, web-based cloud gaming service for retro games. The goal is to bring the most comfortable gaming experience and to introduce network gameplay, like online multiplayer, to traditional retro games.

    You can reference the entire project repo here: https://github.com/giongto35/cloud-game

    CloudRetro Functionality

    CloudRetro uses retro games to demonstrate the power of Cloud Gaming. It enables many unique gaming experiences.

    • Portable gaming experience
      • Instant play when the page is opened; no download, no install
      • Runs in the browser on desktop and mobile, so you don’t need any software to launch
    • Gaming sessions can be shared across multiple devices and stored on cloud storage for next time
    • Games are both streamed and playable, and multiple users can join the same game:
      • Crowdplay, like TwitchPlaysPokemon but more real-time and seamless
      • Online multiplayer for offline games without any network setup. Samurai Shodown is now playable with 2 players over the network in CloudRetro

      Infrastructure

      Requirements and Tech Stack

      Below is the list of requirements I set before starting the project.

      1. Singleplayer:

      This requirement may sound irrelevant and straightforward, but it’s one of my key findings that sets cloud gaming apart from traditional streaming services. If we focus on singleplayer, we can get rid of a centralized server or CDN because we don’t need to distribute the stream session to massive numbers of users. Rather than uploading streams to an ingest server or passing packets to a centralized WebSocket server, the service streams to the user directly over a WebRTC peer connection.

      2. Low Latency media stream

      When I researched Stadia, some articles mentioned its use of WebRTC. I figured out that WebRTC is a remarkable technology that fits this cloud gaming use case nicely. WebRTC is a project that provides web browsers and mobile applications with Real-Time Communication via a simple API. It enables peer communication, is optimized for media, and has built-in standard codecs like VP8 and H264.

      I prioritized delivering the smoothest experience to users over keeping high-quality graphics. Some loss is acceptable in the algorithm. On Google Stadia, there is an additional step to reduce image size on the server, and image frames are rescaled to higher quality before rendering to peers.

      3. Distributed infrastructure with geographic routing.

      No matter how optimized the compression algorithm and the code are, the network is still the crucial factor contributing the most to latency. The architecture needs a mechanism to pair the user with the closest server to reduce Round Trip Time (RTT). The architecture contains a single coordinator and multiple streaming servers distributed around the world: US West, US East, Europe, Singapore, China. All streaming servers are fully isolated. The system can adjust its distribution when a server joins or leaves the network. Hence, under high traffic, adding more servers allows horizontal scaling.

      4. Browser Compatible

      Cloud Gaming shines brightest when it demands the least from users. This means being able to run in a browser. Browsers help bring the most comfortable gaming experience to users by removing software and hardware installs. Browsers also provide cross-platform flexibility across mobile and desktop. Fortunately, WebRTC has excellent support across different browsers.

      5. Clear separation of Game interface and service

      I see the cloud gaming service as a platform. One should be able to plug any game emulator into the platform. Currently, I integrated LibRetro with the cloud gaming service because LibRetro offers a beautiful gaming emulator interface for retro games like SNES, GBA, and PS.

      6. Room based mechanism for multiplayer, crowd play and deep-link to the game

      CloudRetro enables many novel gameplay styles, like CrowdPlay and Online MultiPlayer, for retro games. If multiple users open the same deep-link on different machines, they will see the same running game as a video stream and will even be able to join the game as any player.

      Moreover, game states are stored on cloud storage. This lets users continue their game at any time on any other device.

      7. Horizontal scaling

      As with every SaaS nowadays, it must be designed to be horizontally scalable. The coordinator-worker design enables adding more workers to serve more traffic.

      8. Cloud Agnostic

      CloudRetro’s infrastructure is hosted on various cloud providers (Digital Ocean, Alibaba, custom providers) to target different regions. I dockerized the infrastructure and configure network settings through bash scripts to avoid dependency on any one cloud provider. Combining this with WebRTC’s NAT traversal, we gain the flexibility to deploy CloudRetro on any cloud platform and even on users’ own machines.

      Architectural design

      Worker (or streaming server, as referred to above): spawns games, runs the encoding pipeline, and streams the encoded media to users. Worker instances are distributed around the world, and each worker can handle multiple user sessions concurrently.

      Coordinator: in charge of pairing the new user with the most appropriate worker for streaming. The coordinator interacts with workers over WebSocket.

      Game state storage: central remote storage for all game states. This storage enables some essential functionalities such as remote saving/loading.
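The coordinator-worker protocol is not spelled out here, but a hypothetical sketch of the kind of JSON envelope that could travel over that WebSocket connection looks like this (all field names are mine, not CloudRetro's):

```go
// Message is a hypothetical envelope for coordinator <-> worker signaling.
type Message struct {
	Type   string `json:"type"`              // e.g. "register", "spawn", "sdp", "ice"
	RoomID string `json:"room_id,omitempty"` // which game session this concerns
	Data   string `json:"data,omitempty"`    // SDP offer/answer, ICE candidate, etc.
}
```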

      User flow

      When a new user opens CloudRetro (steps 1 and 2 in the image below), the coordinator is asked for the frontend page along with the list of available workers. After that, at step 3, the client measures its latency to all candidates using HTTP ping requests. The list of latencies is then sent back to the coordinator so that it can determine the most suitable worker to serve the user. At step 4, the game is spawned and a WebRTC stream connection is established between the user and the designated worker.
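A minimal sketch of that selection step, assuming the client reports one measured RTT per worker (the type and function names are hypothetical):

```go
import (
	"errors"
	"time"
)

type candidate struct {
	Addr string        // worker address the client pinged
	RTT  time.Duration // round trip time the client measured
}

// pickWorker returns the lowest-latency worker from the client's measurements.
func pickWorker(candidates []candidate) (string, error) {
	if len(candidates) == 0 {
		return "", errors.New("no workers available")
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if c.RTT < best.RTT {
			best = c
		}
	}
	return best.Addr, nil
}
```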

      Inside the worker

      Inside a worker, the game and streaming pipelines are kept isolated and exchange information via an interface. Currently, this communication is done by in-memory transmission over Golang channels in the same process. Further segregation is the next goal – i.e., running the game independently in a different process.

      The main pieces are:

      • WebRTC: Client-facing component where user input comes in and the encoded media from the server goes out.
      • Game Emulator: The game component. Thanks to the Libretro library, the system is capable of running the game inside the same process and internally hooking the media and input flows. In-game frames are captured and sent to the encoder.
      • Image/Audio Encoder: The encoding pipeline, which accepts media frames, encodes them in the background, and outputs the encoded images/audio.

      Implementation

      CloudRetro relies on WebRTC as its backbone, so before going into the details of my Golang implementation, the first section is dedicated to introducing WebRTC, an awesome technology that greatly helped me achieve sub-second latency streaming.

      WebRTC

      WebRTC is designed to enable high-quality peer-to-peer connections on native mobile and browsers with simple APIs.

      NAT Traversal

      WebRTC is renowned for its NAT traversal functionality. WebRTC is designed for peer-to-peer communication. It aims to find the most suitable direct route, avoiding NAT gateways and firewalls, via a process named ICE. As part of this process, the WebRTC APIs find your public IP address using STUN servers and fall back to a relay server (TURN) when direct communication cannot be established.
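In Pion terms (CloudRetro's WebRTC library, introduced below), wiring up STUN with a TURN fallback is just configuration. A sketch assuming pion/webrtc v3 and made-up server addresses and credentials:

```go
import "github.com/pion/webrtc/v3"

func newPeerConnection() (*webrtc.PeerConnection, error) {
	return webrtc.NewPeerConnection(webrtc.Configuration{
		ICEServers: []webrtc.ICEServer{
			// STUN discovers the public address for direct candidates.
			{URLs: []string{"stun:stun.l.google.com:19302"}},
			// TURN relays media only when no direct route can be established.
			{
				URLs:       []string{"turn:turn.example.com:3478"},
				Username:   "user",
				Credential: "secret",
			},
		},
	})
}
```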

      However, CloudRetro doesn’t fully utilize this capability. Its peer connections are not between users and users but between users and cloud servers. The server side of the model has fewer restrictions on direct communication than a typical user device. This allows things like pre-opening inbound ports or using public IP addresses directly, as the server is not behind NAT.

      I previously had ambitions of developing the project into a game distribution platform for Cloud Gaming. The idea was to let game creators contribute games and streaming resources. Users would be paired with game creators’ providers directly. In this decentralized manner, CloudRetro is just a medium connecting third-party streaming resources with users, so it is more scalable because the burden of hosting no longer falls on CloudRetro. WebRTC NAT traversal will play an important role there, easing peer connection initialization to third-party streaming resources and making it effortless for creators to join the network.

      Video Compression

      Video compression is an indispensable part of the pipeline that greatly contributes to a smooth streaming experience. Even though it is not compulsory to know every detail of VP8/H264 video coding, understanding the concepts helps to demystify streaming speed parameters, debug unexpected behavior, and tune latency.

      Video compression for a streaming service is challenging because the algorithm needs to ensure that the total encoding time + network transmission + decoding time is as small as possible. In addition, the encoding process needs to be serial and continuous. Some traditional encoding trade-offs are not applicable – like trading long encoding time for smaller file size and decoding time, or compressing out of order.

      The idea of video compression is to omit non-essential bits of information while keeping an understandable level of fidelity for users. In addition to encoding individual static image frames, the algorithm infers the current frame from previous and future frames, so only the difference is sent. As you can see in the Pacman example below, only the differential dots are transferred.
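As a toy illustration of the "send only what changed" principle (nothing VP8/H264-specific; real codecs predict with motion compensation, not byte diffs):

```go
// delta returns only the bytes that differ from the previous frame,
// keyed by position; a receiver would patch its copy with them.
func delta(prev, cur []byte) map[int]byte {
	changed := make(map[int]byte)
	for i, b := range cur {
		if i >= len(prev) || prev[i] != b {
			changed[i] = b
		}
	}
	return changed
}
```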

      Audio Compression

      Similarly, the audio compression algorithm omits data that cannot be perceived by humans. The best-performing audio codec currently is Opus. Opus is designed to transmit audio over an ordered datagram protocol such as RTP (Real-time Transport Protocol). It produces lower latency than mp3 or aac, with higher quality. The delay is usually around 5 to 66.5 ms.

      Pion, WebRTC in Golang

      Pion is an open-source project that brings WebRTC to Golang. Rather than simply wrapping the native C++ WebRTC libraries, Pion is a native Golang implementation, offering better performance, better Golang integration, and version control over the constituent WebRTC protocols.

      The library provides sub-second latency streaming with many great built-ins. It has its own implementations of STUN, DTLS, SCTP, etc., and experiments with QUIC and WebAssembly. The open-source library itself is a really good source of learning, with great documentation, network protocol implementations, and cool examples.
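As a taste of the API, here is a hedged sketch (pion/webrtc v3 signatures; the function names and track IDs are mine, not CloudRetro's) of how a worker-style process could hand encoded VP8 frames to Pion, which then packetizes and streams them:

```go
import (
	"time"

	"github.com/pion/webrtc/v3"
	"github.com/pion/webrtc/v3/pkg/media"
)

func attachVideo(pc *webrtc.PeerConnection) (*webrtc.TrackLocalStaticSample, error) {
	// A sample track: we write whole encoded frames and Pion handles RTP.
	track, err := webrtc.NewTrackLocalStaticSample(
		webrtc.RTPCodecCapability{MimeType: webrtc.MimeTypeVP8}, "video", "game")
	if err != nil {
		return nil, err
	}
	if _, err := pc.AddTrack(track); err != nil {
		return nil, err
	}
	return track, nil
}

// In the encode loop, each VP8 frame goes out as one sample.
func sendFrame(track *webrtc.TrackLocalStaticSample, frame []byte) error {
	return track.WriteSample(media.Sample{Data: frame, Duration: time.Second / 30})
}
```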

      The Pion community, led by a very passionate creator, is lively and has many quality discussions about WebRTC. If you are interested in this technology, please join http://pion.ly/slack – you will learn many new things.

      Writing CloudRetro in Golang

      Go Channels in Action

      Thanks to Go’s beautiful channel design, event streaming and concurrency problems are greatly simplified. As in the diagram, there are multiple components running in parallel in different goroutines. Each component manages its own state and communicates over channels. Golang’s select statement enforces that one atomic event is processed per game tick. This means no locking is necessary with this design. For example, when a user saves, a complete snapshot of the game state is required. This state needs to remain uninterrupted by incoming input until the save is complete. During each game tick, the backend can process only a save operation or an input operation, so it is concurrency safe.
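A simplified, hypothetical sketch of such a loop (none of these names are from the real codebase, and the surrounding room/game types are assumed): exactly one select case fires per iteration, so a save can never observe a half-applied input.

```go
import "time"

func (r *room) run() {
	ticker := time.NewTicker(time.Second / 60) // roughly one game tick per frame
	defer ticker.Stop()
	for {
		// One case per iteration means state mutations never overlap.
		select {
		case in := <-r.inputCh: // a player keypress
			r.game.SetInput(in)
		case <-r.saveCh: // save request; the snapshot is always consistent
			r.storage.Save(r.game.Snapshot())
		case <-ticker.C: // advance emulation and hand a frame to the encoder
			r.frameCh <- r.game.Step()
		case <-r.done:
			return
		}
	}
}
```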

      Fan-in / Fan-out

      This Golang pattern perfectly matches my CrowdPlay and Multiple Player use case. Following this pattern, all user inputs in the same room are fanned in to a central input channel. Game media is then fanned out to all users in the same room. Hence, we achieve game state sharing between multiple gaming sessions from different users.
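A hedged sketch of the pattern (Input and Frame are placeholder types): inputs from every session merge into one channel, frames are copied out to every session, and a slow consumer drops frames rather than stalling the whole room.

```go
// fanIn forwards each session's inputs into the room's single input channel.
func fanIn(roomInput chan<- Input, sessions []<-chan Input) {
	for _, s := range sessions {
		go func(s <-chan Input) {
			for in := range s {
				roomInput <- in
			}
		}(s)
	}
}

// fanOut copies every encoded frame to all sessions in the room.
func fanOut(frames <-chan Frame, sessions []chan<- Frame) {
	for f := range frames {
		for _, s := range sessions {
			select {
			case s <- f:
			default: // drop for a slow session instead of blocking everyone
			}
		}
	}
}
```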

      Synchronization between different sessions

      Downsides of Golang

      Golang isn’t perfect. Channels are slow. Compared to a lock, a Go channel is just a simpler way to handle concurrency and streaming events, but it does not give the best performance; there is complex locking logic underneath a channel. Hence, I made some adjustments in the implementation, reapplying locks and atomic values in place of channels to optimize performance.
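For example, a hot path that only ever needs the most recent value can swap a channel for sync/atomic. A sketch of the idea (not the actual CloudRetro code; Frame is a placeholder type):

```go
import "sync/atomic"

var latestFrame atomic.Value // always holds the newest *Frame

// Encoder goroutine: publish without blocking, overwriting older frames.
func publish(f *Frame) { latestFrame.Store(f) }

// Sender goroutine: grab whatever is newest right now, if anything.
func newest() *Frame {
	if f, ok := latestFrame.Load().(*Frame); ok {
		return f
	}
	return nil
}
```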

      In addition, the Golang garbage collector is uncontrollable, so sometimes there are suspiciously long pauses. This greatly hurts the real-time performance of this streaming application.

      CGO

      The project uses some existing Golang open-source VP8/H264 libraries for media compression and Libretro for game emulators. All of these libraries are just wrappers of C libraries in Go using CGO. There are some drawbacks, which you can read about in this blog post by Dave. The issues I’m facing are:

      • Crashes in CGO are not caught, even with Golang recovery
      • Inability to pinpoint performance bottlenecks when we cannot detect granular issues under CGO

      Conclusion ...  Continue reading

    A couple of weeks ago, the Chrome team announced an interesting Intent to Experiment on the blink-dev list about an API for doing custom processing on top of WebRTC. The intent comes with an explainer document written by Harald Alvestrand which shows the basic API usage. As I mentioned in my last post, this is the sort of thing that may be able to help add End-to-End Encryption (e2ee) in middlebox scenarios to WebRTC.

    I had been watching the implementation progress with quite some interest when former webrtcHacks guest author Emil Ivov of jitsi.org reached out to discuss collaborating on exploring what this API is capable of. Specifically, we wanted to see if WebRTC Insertable Streams could solve the problem of end-to-end encryption for middlebox devices outside of the user’s control like Selective Forwarding Units (SFUs) used for media routing.

    The good news is that it looks like it can! Read below for details.

    Before we get into the project, we should first recap how media encryption works with media server devices like SFUs.

    Media Encryption in WebRTC

    WebRTC mandates encryption. It uses DTLS-SRTP for encrypting the media. DTLS-SRTP works by using a DTLS handshake to derive keys for encrypting the payload of the RTP packets. It is authenticated by comparing the a=fingerprint lines in the SDP that are exchanged via the signaling server with the fingerprints of the self-signed certificates used in the handshake. This can be called end-to-end encryption since the negotiated keys do not leave the local device and the browser does not have access to them. However, without authentication it is still vulnerable to man-in-the-middle attacks.
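For reference, the fingerprint each side checks travels in the SDP as a line like this (the hash value here is invented for illustration):

```
a=fingerprint:sha-256 3A:96:6D:57:4B:12:8C:0E:F0:65:5F:78:E2:51:3B:AC:6F:F3:3F:46:1B:35:DC:B8:5F:64:1A:24:C2:43:F0:0A
```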

    See our post about the mandatory use of DTLS for more background information on encryption and how WebRTC landed where it is today.

    SFU Challenges

    The predominant architecture for multiparty calls is the Selective Forwarding Unit (SFU). SFUs are basically packet routers that forward a single stream or a small set of streams from one user to many other users. The basics are explained in this post.

    In terms of encryption, DTLS-SRTP negotiation happens between each peer endpoint and the SFU. This means that the SFU has access to the unencrypted payload and could listen in. This is necessary for features like server-side recording. On the downside, it means you need to trust the entity running the SFU and/or the client code to keep that stream private. Zero trust is always best for privacy.

    Unlike a more traditional VoIP Multipoint Control Unit (MCU), which decodes and mixes media, an SFU only routes packets. It does not care much about the content (apart from a few bytes in the header and whether a frame is a keyframe). So theoretically the SFU should not need to decode and decrypt anything. SFU developers have been quite aware of that problem since the early days of WebRTC. Similarly, Google’s webrtc.org library has included a “frame encryption” approach for a while, which was likely added for Google Duo but doesn’t exist in the browser. However, the “right” API to solve this problem universally only arrived now with WebRTC Insertable Streams.

    Make it Work

    Our initial game plan looked like the following:

    1. Make it work

    End-to-End Encryption Sample

    Fortunately, making it work was a bit easier since Harald Alvestrand had been working on a sample which simplified our job considerably. The approach taken in the sample is a very nice demonstration:

    1. opening two connections,
    2. applying the (intentionally weak, xor-ing the content with the key) encryption on both (a toy version is sketched below), but
    3. applying decryption on only one of them.
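A toy version of that xor transform might look like this in JavaScript. The function name and single-byte key are invented here, and this is deliberately not real encryption:

```javascript
function encodeFunction(chunk, controller) {
  const data = new Uint8Array(chunk.data);
  for (let i = 0; i < data.length; i++) {
    data[i] ^= key; // xor each payload byte with the shared key byte
  }
  chunk.data = data.buffer;
  controller.enqueue(chunk); // hand the frame on toward the packetizer
}
```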

    You can test the sample here. Make sure you run the latest Chrome Canary (84.0.4112.0 or later) and that the experimental Web Platform Features flag is on.

    The API is quite easy to use. A simple logging transform function looks like this:
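Roughly like this, using the Chrome 84-era experimental names (createEncodedVideoStreams and readableStream/writableStream, which have since been renamed):

```javascript
const senderStreams = sender.createEncodedVideoStreams();
const transformStream = new TransformStream({
  transform: (chunk, controller) => {
    console.log(chunk); // one encoded frame per call
    controller.enqueue(chunk); // pass it through unchanged
  },
});
senderStreams.readableStream
  .pipeThrough(transformStream)
  .pipeTo(senderStreams.writableStream);
```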

    The transform function is then called for every video frame. It receives an encoded frame object (named chunk) and a controller object. The controller object provides a way to pass the modified frame to the next step. In our case this is defined by the pipeTo call above, which is the packetizer.

    Iterating improvements

    With a fully working sample (video-only at first because audio was not yet implemented), we iterated quickly on some improvements such as key changes and not encrypting the frame header. The latter turned out to be very interesting visually. Initially, upon receiving an encrypted frame, the decoder of the virtual “middlebox” would just throw an error and the picture would freeze. Exploiting some properties of the VP8 codec and not encrypting the first couple of bytes now tricks the decoder into thinking that the frame is valid VP8. Which looks … interesting:

    Give the sample a try yourself here.

    Insertable Streams iterates on frames, not packets

    The Insertable Streams API operates between the encoder/decoder and the packetizer that splits the frames into RTP packets. While it is not useful for inserting your own encoder with WebAssembly ...  Continue reading

    WebRTC has made getting and sending real-time video streams (mostly) easy. The next step is doing something with them, and machine learning lets us have some fun with those streams. Last month I showed how to run Computer Vision (CV) locally in the browser. As I mentioned there, local is nice, but sometimes more performance is needed, so you need to run your machine learning inference on a remote server. In this post, I’ll review how to run OpenCV models server-side with hardware acceleration on Intel chipsets using Intel’s open source Open WebRTC Toolkit (OWT).

    Note: Intel sponsored this post. I have been wanting to play around with the OWT server since they demoed some of its CV capabilities at Kranky Geek, and this gave me a chance to work with their development team to explore its capabilities. Below I share some background on OWT, how to install it locally for quick testing, and show some of the models.

     
