
Meet vs. Duo – 2 faces of Google’s WebRTC

Gustavo Garcia · June 15, 2022

Google recently announced their intent to merge Google Duo and Google Meet into one application. That, together with Gustavo Garcia’s recent blog post on how Google Meet uses WebRTC, made Philipp and Gustavo wonder what that means for the WebRTC bits of both applications. Hence we grabbed webrtc-internals dumps from both applications to do a back-to-back comparison. While there are a few Googly commonalities, the WebRTC usage of the two applications is surprisingly different, as we’ll see.

Meet vs. Duo TL;DR Summary

  • PeerConnections: a single PeerConnection in both.
  • ICE / TURN: no STUN or TURN servers in either; ICE-TCP and nonstandard SSLTCP candidates; port 443 is used for TCP/TLS.
  • SRTP encryption: Meet uses the newer AEAD_AES_256_GCM cipher suite; Duo uses the old AES_CM_128_HMAC_SHA1_80 cipher suite.
  • End-to-end encryption: none in Meet yet; Duo does end-to-end encryption using WebRTC Insertable Streams.
  • Audio capture: Meet passes additional constraints, but with default values; Duo passes no constraints.
  • Audio transmission: Meet uses Opus with inband-fec and dtx, with more audio codecs supported in the SDP; Duo uses Opus with inband-fec and no dtx as the only supported codec.
  • Video capture: Meet captures 720p at 24 fps; Duo captures 360p at 30 fps and then 720p at 30 fps, with a facingMode “user” constraint.
  • Video transmission: Meet uses VP9 SVC enabled through SDP munging, plus VP8 simulcast for screen sharing; Duo uses VP8 simulcast enabled through the addTransceiver API.
  • DataChannels: Meet has 4 DataChannels, reliable and unreliable; Duo has 3 reliable DataChannels.
  • RTP header extensions: Meet uses the AV1 dependency descriptor and video-layer-allocation extensions; Duo uses a nonstandard generic frame descriptor extension similar to the AV1 DD.
  • RTCP: Meet uses transport-wide-cc bandwidth estimation, NACK/RTX retransmissions and RRTR XR packets; Duo uses transport-wide-cc bandwidth estimation and NACK/RTX retransmissions.

Note: this post does not cover the native mobile versions of these apps. Duo’s native apps reportedly include relevant improvements in connectivity, more efficient video codecs for battery savings, and proprietary ML-based audio codecs for reliability on low-bandwidth networks, but we did not verify any of that as part of this research.

PeerConnection

Both Meet and Duo use a single RTCPeerConnection for outgoing and incoming video streams. Historically, Duo started with unified-plan enabled by default while Google Meet took until mid-2021 to switch the SDP format.

The connections have slight configuration differences:

Meet configuration:

{
  iceServers: [],
  iceTransportPolicy: all,
  bundlePolicy: max-bundle,
  rtcpMuxPolicy: require,
  iceCandidatePoolSize: 0,
  sdpSemantics: "unified-plan",
  extmapAllowMixed: true
}

Meet legacy (chrome) constraints:

{
  advanced: [{googScreencastMinBitrate: {exact: 100}}]
}

Duo configuration:

{
  iceServers: [],
  iceTransportPolicy: all,
  bundlePolicy: max-bundle,
  rtcpMuxPolicy: require,
  iceCandidatePoolSize: 0,
  sdpSemantics: "unified-plan",
  encodedInsertableStreams: true,
  extmapAllowMixed: true
}

Duo legacy (chrome) constraints: {}

Duo sets the encodedInsertableStreams flag to true (which is required to get end-to-end encryption with insertable streams). Meet sets a legacy constraint to configure a minimum bitrate for screencasts (even though this connection is not used for screen sharing as we will see later). This isn’t surprising as Meet supports screen sharing while Duo does not. Note however that this constraint is going away.
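As a rough sketch (our reconstruction, not Duo’s actual code), creating a connection with this configuration looks like the following; encodedInsertableStreams and sdpSemantics are Chrome-specific options, and the legacy constraints would go into the nonstandard second constructor argument:

// Hypothetical reconstruction of a Duo-style PeerConnection setup.
// encodedInsertableStreams is nonstandard and Chrome-only.
const pc = new RTCPeerConnection({
  iceServers: [],
  iceTransportPolicy: 'all',
  bundlePolicy: 'max-bundle',
  rtcpMuxPolicy: 'require',
  sdpSemantics: 'unified-plan',
  encodedInsertableStreams: true,
});
// Meet would instead pass its legacy constraint as the (nonstandard)
// second constructor argument:
//   new RTCPeerConnection(config,
//       {advanced: [{googScreencastMinBitrate: {exact: 100}}]});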

Things start to differ after that, however. Duo first creates an offer with three audio m-lines and a data channel, calls setLocalDescription, and then continues by adding two video m-lines. This actually violates the specification (which Chrome still allows) since you cannot set a new local offer while already in the have-local-offer state. Firefox forbids this, but Duo does not support Firefox so this was probably never noticed. The answer is an ice-lite answer with candidates listed in the first m-line.
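Reconstructed as WebRTC API calls, the offer side corresponds roughly to this hypothetical sequence (the actual Duo code is not public; this is inferred from webrtc-internals):

// Hypothetical reconstruction of Duo's negotiation order.
async function duoStyleNegotiation(pc) {
  pc.createDataChannel('gcWeb'); // channel name taken from the dump
  pc.addTransceiver('audio');    // three audio m-lines; the exact
  pc.addTransceiver('audio');    // directions are not visible in
  pc.addTransceiver('audio');    // webrtc-internals
  await pc.setLocalDescription(await pc.createOffer());
  // Still in 'have-local-offer', two video m-lines are added and a new
  // local offer is set - Chrome tolerates this, Firefox rejects it.
  pc.addTransceiver('video');
  pc.addTransceiver('video');
  await pc.setLocalDescription(await pc.createOffer());
}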

Meet adds audio and video (via the addTransceiver API these days) and then creates a simpler offer with just a single audio and video m-line at first. Then it adds three receive-only audio m-lines. ICE candidates are added via addIceCandidate instead of being listed in the SDP.
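Again as a rough sketch of what the API calls might look like (our guess, not Meet’s actual code):

// Hypothetical reconstruction of Meet's order: sendrecv audio and video
// first, then three receive-only audio m-lines for the mixed-in speakers.
async function meetStyleNegotiation(pc, audioTrack, videoTrack) {
  pc.addTransceiver(audioTrack);
  pc.addTransceiver(videoTrack);
  await pc.setLocalDescription(await pc.createOffer());
  for (let i = 0; i < 3; i++) {
    pc.addTransceiver('audio', {direction: 'recvonly'});
  }
}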

ICE / TURN connectivity

Neither Meet nor Duo configures any STUN or TURN servers – which is expected since in both cases the media servers are directly reachable and support ICE-TCP and the nonstandard ssltcp variant.

They do differ in where they put the ICE candidates – Duo puts them directly into the SDP of the answer:

a=candidate: 1 udp 2113939710 2001:4860:4864:4::81 19305 typ host generation 0
a=candidate: 1 tcp 2113939709 2001:4860:4864:4::81 19305 typ host tcptype passive generation 0
a=candidate: 1 ssltcp 2113939708 2001:4860:4864:4::81 443 typ host generation 0
a=candidate: 1 udp 2113932030 74.125.247.81 19305 typ host generation 0
a=candidate: 1 tcp 2113932029 74.125.247.81 19305 typ host tcptype passive generation 0
a=candidate: 1 ssltcp 2113932028 74.125.247.81 443 typ host generation 0

We see UDP candidates for both IPv6 and IPv4 on port 19305 as well as ICE-TCP (on port 19305) and ssltcp (on port 443) candidates which allow for TCP connectivity in cases where UDP is blocked (in particular corporate networks).

Meet sends the same types of candidates using addIceCandidate and in a slightly different order which suggests different server implementations. In addition, Meet also sends a UDP candidate for port 3478 which is the “standard” port used by many STUN servers. From our notes it is a relatively recent addition and may suggest that this port is often not blocked in enterprise environments that want to enable VoIP.
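Adding such a trickled server candidate looks roughly like this (the candidate string is a well-formed variant modeled on the lines above; the exact signaling format is Meet-internal):

// Sketch: adding a trickled server candidate; field values are illustrative.
await pc.addIceCandidate({
  sdpMid: '0',
  candidate: 'candidate:1 1 udp 2113932030 74.125.247.81 19305 typ host generation 0'
});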

SSL-TCP candidates

The ssltcp candidates are a variant of ICE-TCP candidates that also send a “fake” SSL handshake, which may fool some proxies into thinking this is a TLS connection. Whether this is still effective is unclear given that it uses a deprecated TLS variant, but using port 443 is pretty much common practice.

getUserMedia audio constraints

Duo and Meet differ quite a bit with their getUserMedia audio constraints. Duo calls getUserMedia with both audio and video (twice) and only specifies audio: true.

Meet audio constraints:

{
  deviceId: {exact: ["fbb1641ed4a296addf6969b8197d7143ab503cd4dfc96072b07a2d9ceab8eca3"]},
  advanced: [
    {googEchoCancellation: {exact: true}},
    {googEchoCancellation: {exact: true}},
    {autoGainControl: {exact: true}},
    {noiseSuppression: {exact: true}},
    {googHighpassFilter: {exact: true}},
    {googAudioMirroring: {exact: true}}
  ]
}

Duo audio constraints (combined with video capture – see the next section):

true

Meet specifies a bunch of “goog” constraints. The impact of those constraints is very low (probably none) given that most of them are set to the default values used by browser WebRTC implementations.
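Passed through getUserMedia, that roughly corresponds to the following sketch (shown here with the standard constraint names; the goog-prefixed ones in the dump are their legacy Chrome equivalents):

// Sketch of a Meet-style audio capture call using standard constraint names.
// The deviceId is whatever enumerateDevices() returned for the chosen mic;
// truncated here for readability (full value shown above).
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    deviceId: {exact: 'fbb1641e...'},
    advanced: [
      {echoCancellation: {exact: true}},
      {autoGainControl: {exact: true}},
      {noiseSuppression: {exact: true}},
    ],
  },
});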

Audio coding adjustments

The SDP audio answers from the server also differ considerably.

Duo only responds with Opus as the audio codec, specifying the usage of inband FEC and using transport-cc for bandwidth estimation. The answer also negotiates the audio level header extension; given Duo’s focus on end-to-end encryption, we’re not sure we would recommend sending the audio level over the network without encryption. It is, however, quite useful for active speaker selection in the SFU, and so far there are no known attacks that reconstruct the audio from the levels.

The Meet answer is longer: in addition to Opus we see all the other audio codecs supported by libwebrtc, including audio/red. We doubt those are actually supported, though.

Opus parameters

In terms of actual Opus audio encoding, we see both applications enable useinbandfec to provide a certain degree of robustness against packet loss. Other than this, Google Meet enables “dtx” support in Opus (in a later setRemoteDescription call); Duo doesn’t.
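In SDP terms both flags live on the Opus fmtp line. A hypothetical munging helper that turns them on could look like this (payload type 111 is Chrome’s default for Opus, and the naive replacement does not check whether the flags are already present):

// Hypothetical helper: enable Opus inband FEC and DTX in an SDP answer
// before setRemoteDescription. Assumes Opus is payload type 111.
function enableOpusFecDtx(sdp) {
  return sdp.replace(/(a=fmtp:111 [^\r\n]+)/,
                     '$1;useinbandfec=1;usedtx=1');
}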

Three audio stream sources

Both implementations make use of three “virtual” SSRCs. Since Chrome only renders the three loudest audio sources (a not widely known restriction that influences system design), it looks like both systems do rewriting of audio SSRCs and only forward the three loudest speakers.
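Conceptually, the server-side logic would look something like this sketch (our guess at the technique, not Google’s code):

// Conceptual sketch of top-3 audio forwarding with SSRC rewriting.
// Receivers always see the same three "virtual" SSRCs.
const VIRTUAL_SSRCS = [1001, 1002, 1003]; // illustrative values
function selectAndRewrite(audioPackets) {
  return audioPackets
    .sort((a, b) => a.level - b.level) // RFC 6464 level: lower dBov = louder
    .slice(0, VIRTUAL_SSRCS.length)
    .map((pkt, i) => ({...pkt, ssrc: VIRTUAL_SSRCS[i]}));
}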

getUserMedia video constraints

The video getUserMedia constraints also differ quite a bit:

Meet video constraints:

{
  deviceId: {exact: ["b8fa4f11cda3094f85e15d8e5396b20f35a32164c06e5c9418bffab85113f2e1"]},
  advanced: [
    {frameRate: {min: 24}},
    {height: {min: 720}},
    {width: {min: 1280}},
    {frameRate: {max: 24}},
    {width: {max: 1280}},
    {height: {max: 720}},
    {aspectRatio: {exact: 1.77778}}
  ]
}

Duo video constraints:

{
  width: {ideal: 360},
  height: {ideal: 640},
  frameRate: {ideal: 30},
  facingMode: {ideal: ["user"]}
}

Duo requests 640×360 (and later 1280×720) at 30fps with a facingMode constraint to get the user-facing camera. Duo comes from the mobile world where this constraint is necessary more often.

Meet splits its audio and video getUserMedia requests. It only requests a frameRate of 24 and is a bit more specific, asking first for 720p and then for 640p.

We still wonder why the 24fps restriction is in place – a frame rate first standardized for sound films in 1926.

Video coding

Duo uses the addTransceiver API and the so-called “spec simulcast”, which relies on the mid and rid header extensions in the SDP to enable VP8 simulcast.
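A minimal sketch of spec simulcast via addTransceiver (the rid values and encoding parameters here are illustrative, not Duo’s actual ones):

// Sketch: three simulcast layers via addTransceiver.
pc.addTransceiver(videoTrack, {
  direction: 'sendonly',
  sendEncodings: [
    {rid: 'l', scaleResolutionDownBy: 4, maxBitrate: 150000},
    {rid: 'm', scaleResolutionDownBy: 2, maxBitrate: 500000},
    {rid: 'h', maxBitrate: 1500000},
  ],
});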

VP8 vs. VP9

Google Meet has traditionally used SDP munging to enable VP8 simulcast. This continues to be the case, but with an additional twist.

The answer from the server lists the RTP payload type 98 first.

This is VP9 instead of VP8 (and it uses a useadaptivelayering flag which is nowhere to be found in the WebRTC or Chrome code). As Iñaki Baz Castillo pointed out on twitter (in a sadly deleted thread) back in 2019, this enables VP9 SVC without the need for the field trial (which never rolled out). Work on the actual WebRTC-SVC specification is progressing nicely (with a lot of delay); the lack of this API evidently did not stop Meet from using the capability. Obviously, given that VP9 SVC uses a single SSRC, munging in three SSRCs makes no sense for VP9 SVC at all.
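The codec preference itself comes from the server answer; a client-side equivalent (hypothetical, for illustration only) would move VP9’s payload type to the front of the video m-line:

// Hypothetical munging: move a payload type (98 = VP9 in this answer)
// to the front of the video m-line so it becomes the preferred codec.
function preferPayloadType(sdp, pt) {
  return sdp.replace(/^(m=video \d+ [A-Z\/]+ )(.*)$/m, (match, head, pts) => {
    const rest = pts.split(' ').filter((p) => p !== String(pt));
    return head + [String(pt), ...rest].join(' ');
  });
}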

Simulcast

In the case of Google Meet, the a=x-google-flag:conference attribute is added to all the negotiated channels. This is required to enable simulcast via SDP munging (as seen in Fippo’s Simulcast Playground). It is no longer needed in the case of Google Duo, since simulcast there is enabled using the standard addTransceiver API.
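The munging itself is essentially a one-liner; a sketch (assuming the flag is injected into each audio and video section, as in the simulcast playground):

// Sketch: inject the conference flag into the SDP before setLocalDescription.
function addConferenceFlag(sdp) {
  return sdp.replace(/(m=(?:audio|video)[^\r\n]*\r\n)/g,
                     '$1a=x-google-flag:conference\r\n');
}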

Screensharing

While Google Duo does not support screen sharing, we still find Google Meet’s approach to screensharing worth mentioning. It uses a separate peerconnection, probably for the reasons discussed by Tsahi Levent-Levi and Philipp in this fiddle of the month.

The initial connection setup is the same as for the video connection (it even negotiates the three virtual audio SSRCs again, which does not make much sense to us). However, the answer from the server prefers VP8 as the codec. Using this together with VP8 simulcast and the x-google-flag:conference attribute triggers a special screen sharing mode which mostly provides different frame rates rather than different resolutions (lowering the resolution is risky for screen sharing, where resolution matters much more).

DataChannels

Looking at the usage of RTCDataChannel we can see that both applications make use of them.

Google Meet creates 4 data channels named dataSendChannel, collections, audioprocessor and captions. The first two are created as reliable transports, the others as unreliable. Duo creates 3 datachannels named gcWeb, karma and collections. Most of them don’t show traffic and it is not easy to know what they are for, but we can see a few messages in the collections datachannel and a lot of messages in the audioprocessor channel when multiple participants are in the call. Google Meet uses the latter to provide active speaker / audio level information, given that the audio is multiplexed into just three audio tracks as described above in the Audio section.
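The reliable/unreliable split maps to the RTCDataChannelInit options; a sketch with Meet’s channel names (the names are from the dump, the exact init options are our assumption based on the observed reliability):

// Sketch: Meet-style channel creation.
const dataSendChannel = pc.createDataChannel('dataSendChannel'); // reliable (default)
const collections = pc.createDataChannel('collections');         // reliable (default)
const audioprocessor = pc.createDataChannel('audioprocessor',
    {ordered: false, maxRetransmits: 0});                         // unreliable
const captions = pc.createDataChannel('captions',
    {ordered: false, maxRetransmits: 0});                         // unreliable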

Security

Both Google Meet and Duo support standard WebRTC encryption: DTLS for key negotiation and DataChannel encryption, and SRTP for media encryption.

One small difference is the SRTP crypto suite used for the encryption. While Google Meet uses the more modern and efficient AEAD_AES_256_GCM cipher (added in mid-2020 in Chrome and late 2021 in Safari), Google Duo is still using the traditional AES_CM_128_HMAC_SHA1_80 cipher.

Source: Google Duo End-to-End Encryption Overview

The biggest difference in terms of security is Google Duo’s support for end-to-end encryption. The frame encryption explained here was initially implemented for mobile and then brought to the web application using WebRTC’s insertable streams API.

This has been previously explained in webrtcHacks and it is also done in other applications like Jitsi.
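The general insertable streams pattern looks like this minimal sketch (a pass-through transform; Duo’s actual cipher and key management are not visible from the outside):

// Minimal insertable-streams sketch. Requires the PeerConnection to be
// created with encodedInsertableStreams: true (Chrome-only at the time).
const sender = pc.addTrack(videoTrack, stream);
const {readable, writable} = sender.createEncodedStreams();
const transform = new TransformStream({
  transform(encodedFrame, controller) {
    // A real implementation would encrypt encodedFrame.data here,
    // e.g. with AES-GCM via WebCrypto, before enqueueing.
    controller.enqueue(encodedFrame);
  },
});
readable.pipeThrough(transform).pipeTo(writable);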

RTP header extensions

Even though both applications use standard RTP & RTCP for audio and video transmission – like most WebRTC video conferencing applications – there are some differences in the RTP header extensions used to convey additional information. They also differ in the RTCP messages supported for media control.

There are many header extensions that are common for both Meet and Duo. That includes the abs-send-time, video-orientation, audio-levels, transport-wide-cc and sdes header extensions.

However, Google Meet includes many more extensions related to timing and buffering:
a=extmap:5 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type
a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=extmap:9 http://www.webrtc.org/experiments/rtp-hdrext/abs-capture-time

Note the video-content-type extension, which is an in-band indication of whether the media source is a camera or a screen sharing source, allowing specific optimizations on the receiver side.

And Meet also includes an extension related to frame marking that is specified by AV1 but can be used for other codecs as well:
a=extmap:15 https://aomediacodec.github.io/av1-rtp-spec/#dependency-descriptor-rtp-header-extension
Google Duo handles this differently, with a nonstandard extension (which even lacks documentation) called generic-frame-descriptor instead of the AV1 dependency descriptor:
a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/generic-frame-descriptor-00
In addition to that, Google Meet also uses a nonstandard extension called “video layer allocation”:
a=extmap:12 http://www.webrtc.org/experiments/rtp-hdrext/video-layers-allocation00
This is (kinda) specified here. It is set in the remote description (with the id of an unused extension from the offer). This is actually a very interesting extension for SFUs since it solves two problems:

  1. When a simulcast or SVC layer is no longer sent due to bandwidth restrictions, this is communicated to the server. This allows the SFU to act faster and switch clients to a lower layer.
  2. It informs the server about the encoder target bitrates for each layer. This is more accurate than the server measuring this from the incoming packets. It also improves layer selection.

RTCP

In terms of RTCP there are no big differences: both applications use transport-wide congestion control messages for bandwidth estimation, NACKs with RTX for video retransmissions, and PLIs for keyframe requests. The only difference found so far is Google Meet’s use of extended reports in the form of RRTR messages, which make it possible to estimate the round trip time from the receiver’s point of view instead of the sender’s. See this WebRTC samples pull request for more information on how this enables the calculation of time offsets.

Conclusions

As an app, Duo started out as a newer, more focused successor to Meet and that’s reflected in its implementation. Parts, like the use of three virtual tracks, are somewhat unique to Google, and are implemented in a similar way in both apps. The use of ICE with the absence of TURN servers is exactly the same. However, other parts of it are completely different – in particular the use of the WebRTC APIs like addTransceiver with simulcast. Duo’s later market entry meant it was able to leverage stable standards with unified-plan and “spec-simulcast” from the beginning. Duo also introduced innovative features for WebRTC, like end-to-end encryption.

Google Meet was comparatively slow in adopting new WebRTC features that came to the specification – to the extent of seriously delaying some of them (like WebRTC-SVC, for which there is no product need since the feature is already enabled through a hack). Despite its spec compliance challenges, Meet has seen a tremendous amount of incremental improvement since early 2020. Examples include the video layer allocation extension, using GCM cipher suites for SRTP, a transition from the legacy addStream API towards addTransceiver (via addTrack) and of course the big transition from plan-b to unified-plan. Meet actually looks ahead in terms of encoding with its VP9 SVC support and RTP-based features (rrtr, red, timing header extensions). Meet is also a more feature-rich product, which helps to push its WebRTC implementation.

Fippo kicked off webrtcHacks’ blackbox reverse-engineering series back in 2014 with How does Hangouts use WebRTC? Hangouts has evolved into Google Meet, where it continues to be a great reference implementation to examine. We were first able to easily look at Duo’s webrtc-internals when it introduced multi-party calling and web support two years ago. Reflecting back after our analysis this week, it feels like Duo has been in maintenance mode while Meet continues to push features and performance optimizations needed to compete during the pandemic.

Continued maintenance of two distinct stacks for the same purpose is not a healthy thing to do for an engineering organization. Philipp bets that the Meet stack will be the one that survives but we’ll see how this develops over the course of the next year.

Authors: Gustavo Garcia, Philipp (fippo) Hancke



Comments

  1. Pablo says

    June 21, 2022 at 1:31 am

    How does using 3 audio tracks and rewriting SSRCs affect audio/video sync (lip-sync)?

  2. Sivaruban says

    July 28, 2022 at 5:26 am

    Hi,

    Great post, giving real insight into how Google Meet works. I have been studying the webrtc-internals logs for Google Meet and noticed that all invites and re-invites are sent as a single concatenated SDP from the browser client. I have tried the same in our POC SFU video conferencing, but the packets are fragmented and cause issues and instability on the client side. Is this approach correct, or is my understanding of concatenated SDP wrong?

    Thanks

