How to use the AWS API Gateway WebSocket API functionality with Lambda functions to implement a serverless WebRTC signaling architecture
webrtcH4cKS: ~ The Minimum Viable SDP
One evening last week, I was nerd-sniped by a question Max Ogden asked:
That is quite an interesting question. I somewhat dislike using Session Description Protocol (SDP) in the signaling protocol anyway and prefer nice JSON objects for the API and ugly XML blobs on the wire to the ugly SDP blobs used by the WebRTC API.
The question is really about the minimum amount of information that needs to be exchanged for a WebRTC connection to succeed.
WebRTC uses ICE and DTLS to establish a secure connection between peers. This mandates two constraints:
- Both sides of the connection need to send stuff to each other
- You need at minimum to exchange ice-ufrag, ice-pwd, DTLS fingerprints and candidate information
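To make the second constraint concrete, here is a minimal sketch in Python that pulls just those fields out of an SDP blob. It is not the code from the minimal-webrtc repository; the function name `extract_minimum` and the sample SDP (with made-up, shortened values) are assumptions for illustration only.

```python
# Hypothetical, shortened SDP fragment. Every value here is made up for
# illustration; a real fingerprint is much longer.
SAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=- 1 2 IN IP4 127.0.0.1",
    "s=-",
    "t=0 0",
    "m=application 9 DTLS/SCTP 5000",
    "c=IN IP4 0.0.0.0",
    "a=ice-ufrag:abcd",
    "a=ice-pwd:0123456789abcdefghijkl",
    "a=fingerprint:sha-256 AA:BB:CC",
    "a=setup:actpass",
    "a=candidate:1 1 udp 2122260223 192.0.2.1 54321 typ host",
    "",
])


def extract_minimum(sdp: str) -> dict:
    """Return only the ICE credentials, DTLS fingerprint and candidates."""
    lines = sdp.splitlines()

    def first(prefix: str):
        # Value of the first attribute line starting with `prefix`, or None.
        for line in lines:
            if line.startswith(prefix):
                return line[len(prefix):]
        return None

    candidate_prefix = "a=candidate:"
    return {
        "ufrag": first("a=ice-ufrag:"),
        "pwd": first("a=ice-pwd:"),
        "fingerprint": first("a=fingerprint:"),
        "candidates": [
            line[len(candidate_prefix):]
            for line in lines
            if line.startswith(candidate_prefix)
        ],
    }


minimum = extract_minimum(SAMPLE_SDP)
print(minimum["ufrag"], minimum["pwd"], minimum["fingerprint"])
```

Everything else in the blob would have to be agreed on in advance or reconstructed on the receiving side.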
Now the stock SDP that WebRTC uses (explained here) is a rather big blob of text: more than 1500 characters for an audio-video offer, not even counting the ICE candidates.
Do we really need all this? It turns out that you can establish a P2P connection with just a little more than 100 characters sent in each direction. The minimal-webrtc repository shows you how to do that. I had to use quite a number of tricks to make this work; it’s a real hack.
How I did it
Get some SDP ...
webrtcH4cKS: ~ Project WONDER: showing WebRTC NNI does not need SIP
While this approach is appropriate for many applications, it does leave many open questions for others:
- What happens when the caller wishes to retain some control of the call?
- Who determines the calling platform?
- How do you allow cross-domain calls?
- How do you avoid vendor lock-in to proprietary signaling protocols?
SIP-based IMS networks address all these problems by providing a vendor-independent, end-to-end signaling mechanism that works within and across service provider domains. As a result, end-to-end SIP proponents often argue, “why create something new when SIP already exists?”
Is it possible to ensure interoperability between different WebRTC service providers while using application specific signaling?
This is the root question that drove the WONDER (Webrtc interOperability tested in coNtradictive DEployment scenaRios) project, a partnership between Deutsche Telekom and Portugal Telecom that is partially funded by the European Commission.
Is WebRTC’s non-standardized signaling, triangular network model, and minimum network-side functionality fundamentally incompatible with the Telco model where distinct apps can natively communicate across any compliant environment? Or, is it possible to leverage the current WebRTC model within the Telco model? That is the core concept behind Project WONDER. The scientists from Portugal Telecom Inovação and Deutsche Telekom call this concept Signaling on-the-fly, and they explored and tested it as part of this project.
The results of this experiment are discussed below.
Signaling on-the-fly architecture
Before we review how the Signaling on-the-fly concept works, let’s start by defining a few of its terms and look at how these relate in a diagram (Figure 1):
- Domain Channel: the signaling channel that is established with the domain’s messaging server as soon as the user is registered and is online
- Transient Channel: the signaling channel that is established, typically with a foreign messaging server (i.e. from another domain) in the scope of a certain conversation
- Messaging Stub: the script containing the protocol stack and all the logic needed to establish a Channel to a certain Messaging Server
- Conversation Host: the Messaging Server that is used to support all conversation messages exchanged among peers belonging to different domains
Called-party Domain hosting
Let’s use the classic Alice and Bob example to explain the concept assuming they are registered in different Service Provider domains. Alice wants to talk to Bob by using Bob’s RTC identity e.g. [email protected], so the process as illustrated in Figure 1 is:
- Information about the Identity of Bob, including Bob’s Messaging Stub provider, is provided and asserted by Bob’s Identity Provider (IdP).
- Alice downloads and instantiates Bob’s Messaging Stub in her browser to set up a Transient Channel with Bob’s domain Messaging Server.
- As soon as the Transient Channel is established, Alice can send an Invitation message to Bob containing her SDP offer.
- Since Bob is connected to the same Messaging Server via his Domain Channel, he will receive Alice’s invitation in his browser. If Bob accepts the invitation, an Accepted message containing Bob’s SDP answer will be sent to Alice.
- As soon as Alice’s browser receives Bob’s SDP, the media and/or data streams can be directly connected between the two browsers.
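The steps above can be sketched as a small in-memory simulation. This is only an illustration of the message flow, not the WONDER project's code: the `MessagingServer` class, the `messaging_stub` helper, the identities `alice@a.example` and `bob@b.example`, and the placeholder SDP strings are all hypothetical, and the Identity Provider is reduced to a plain dictionary lookup.

```python
from dataclasses import dataclass, field


@dataclass
class MessagingServer:
    """Hypothetical stand-in for one domain's Messaging Server."""
    domain: str
    inboxes: dict = field(default_factory=dict)  # user id -> list of messages

    def open_channel(self, user: str) -> list:
        # A "channel" is modelled as an inbox list that the user can read.
        return self.inboxes.setdefault(user, [])

    def deliver(self, sender: str, recipient: str, msg_type: str, sdp: str) -> None:
        self.inboxes[recipient].append(
            {"from": sender, "type": msg_type, "sdp": sdp}
        )


def messaging_stub(server: MessagingServer):
    """Stand-in for the downloadable Messaging Stub bound to one server."""
    def connect(user: str) -> list:
        return server.open_channel(user)
    return connect


def run_called_party_hosting():
    bob_server = MessagingServer("b.example")
    # Step 1: the IdP asserts Bob's identity and names his stub provider.
    idp = {"bob@b.example": messaging_stub(bob_server)}
    # Bob is online, so his Domain Channel already exists.
    bob_inbox = bob_server.open_channel("bob@b.example")
    # Step 2: Alice fetches Bob's stub and opens a Transient Channel.
    alice_inbox = idp["bob@b.example"]("alice@a.example")
    # Step 3: Alice sends an Invitation carrying her SDP offer.
    bob_server.deliver("alice@a.example", "bob@b.example", "invitation", "<alice-offer>")
    # Step 4: Bob reads the invitation and replies with Accepted plus his SDP answer.
    invitation = bob_inbox.pop(0)
    bob_server.deliver("bob@b.example", invitation["from"], "accepted", "<bob-answer>")
    # Step 5: Alice now holds Bob's SDP; media would flow browser to browser.
    return invitation, alice_inbox.pop(0)


invitation, accepted = run_called_party_hosting()
print(invitation["type"], accepted["type"])
```

Note that both channels terminate on Bob's server here, which is what makes this the called-party hosting variant.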
Calling-party Domain hosting ...
webrtcH4cKS: ~ Signalling Options for WebRTC Applications
As I described in the standardization post, the model used in WebRTC for real-time, browser-based applications does not envision that the browser will contain all the functions needed to function as a telephone or video conferencing unit. Instead, it specifies that the browser will contain the functions that are needed to run a Web application which would work in conjunction with back-end servers to implement telephony functions as required. According to this, WebRTC is meant to implement the media plane but to leave the signalling plane up to the application. Different applications may prefer to use different protocols, such as SIP or something custom to the particular application. In this approach, the key information that needs to be exchanged is the multimedia session description, which specifies the configuration necessary to establish the media plane. In other words, WebRTC does not specify a particular signalling model other than the generic need to exchange SDP media descriptions in the offer/answer fashion. However, the browser is totally decoupled from the actual mechanism by which these offers and answers are communicated to the remote side. ...
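As a rough illustration of that decoupling, the sketch below wraps an offer or answer in a small JSON envelope and hands it to whatever send function the application supplies. The `Signaller` class, the helper names, and the `{"type", "sdp"}` envelope are assumptions made for this example (the two keys mirror the fields of a WebRTC session description); the transport here is just a Python list standing in for a WebSocket, SIP message, or anything else.

```python
import json
from typing import Callable, Tuple


def make_envelope(kind: str, sdp: str) -> str:
    """Serialize one half of an offer/answer exchange."""
    if kind not in ("offer", "answer"):
        raise ValueError(f"unsupported description type: {kind}")
    return json.dumps({"type": kind, "sdp": sdp})


def parse_envelope(raw: str) -> Tuple[str, str]:
    """Inverse of make_envelope; returns (kind, sdp)."""
    data = json.loads(raw)
    return data["type"], data["sdp"]


class Signaller:
    """Hypothetical helper that knows nothing about the transport in use."""

    def __init__(self, send: Callable[[str], None]):
        self._send = send  # any callable that ships a string to the peer

    def send(self, kind: str, sdp: str) -> None:
        self._send(make_envelope(kind, sdp))


# Usage: a plain list plays the role of the wire.
wire = []
Signaller(wire.append).send("offer", "v=0")
print(parse_envelope(wire[0]))
```

Swapping the list for a real transport changes only the callable passed to `Signaller`, not the offer/answer logic.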