We have had many posts on Session Description Protocol (SDP) here at webrtcHacks. Why? Because it is often one of the most confusing yet critical aspects of WebRTC. It has also been among the most controversial. Early WebRTC debates over SDP led to the development of the parallel ORTC standard, which is now largely merging back into the core specifications. However, the reality is that non-SDP based WebRTC is still a small minority of deployments, and many doubt this will change any time soon despite its formal acceptance.
To make an updated case against SDP, former guest author Iñaki Baz Castillo joins us again. Having been involved in WebRTC standardization and deployments for many years, SDP has been opening new wounds for him as he has been building the mediasoup Selective Forwarding Unit (SFU) from scratch. Even if you do not care about the SDP debate, Iñaki provides great background on what SDP does, how ORTC differs, and practical issues with SDP in some increasingly common use cases that many of you will find useful.
{“editor”: “chad hart“}
The inspiration for this post comes from an email I sent to the MMUSIC WG in which I basically complain about the usage of SDP – Session Description Protocol – as “the minimum irreducible unit for exchanging multimedia information”.
I think it’s time to expel SDP from our lives, especially in WebRTC where we, the developers, can do media signaling much better than the SDP way.
Before I dig into some of the new headaches SDP is causing, let’s first take another look at what SDP does in WebRTC and recap ORTC – the main alternative to using SDP with WebRTC.
A minimal recap of SDP in WebRTC 1.0
Although WebRTC 1.0 exposes an API to handle and transmit local audio, video and/or data to a remote WebRTC peer, the information exchange unit is the Session Description Protocol (SDP). This basically means that, whichever operation is locally executed, an SDP must be transmitted to the other party (or the other party must generate such an SDP representing our local operations) and “apply” it to its RTCPeerConnection by means of setRemoteDescription(). webrtcHacks has a line-by-line SDP guide here, which is a good reference to follow along in addition to the spec.
What SDP does
A WebRTC SDP conveys information regarding all the communication layers, including:
- ICE parameters and optional ICE candidates (can be sent later) for establishing a network path between both peers in the multimedia session.
- DTLS parameters for performing a DTLS handshake and exchanging the SRTP keys to be used for audio and video, and/or establishing the encrypted transport on which DataChannel messages will be exchanged.
- RTP parameters determining what the peer is willing to send and receive.
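To make this layering concrete, here is a minimal sketch (not a real SDP parser; `parseSdpLayers` is a hypothetical helper written for this post) that pulls the ICE, DTLS and RTP related lines out of an SDP blob, showing how a single text document mixes all three layers together:

```javascript
// Minimal illustrative sketch: group SDP lines by communication layer.
// parseSdpLayers is a made-up helper name, not a browser or library API.
function parseSdpLayers(sdp) {
  const lines = sdp.split(/\r?\n/).filter(Boolean);
  const get = (prefix) =>
    lines.filter((l) => l.startsWith(prefix)).map((l) => l.slice(prefix.length));
  return {
    ice: { ufrag: get('a=ice-ufrag:'), pwd: get('a=ice-pwd:'), candidates: get('a=candidate:') },
    dtls: { fingerprints: get('a=fingerprint:'), setup: get('a=setup:') },
    rtp: { media: get('m='), codecs: get('a=rtpmap:') },
  };
}

// A tiny SDP fragment for illustration.
const sdp = [
  'v=0',
  'm=audio 58960 UDP/TLS/RTP/SAVPF 109',
  'a=ice-ufrag:62c997d1',
  'a=ice-pwd:4650fd5710897bee3bf874465c372e87',
  'a=fingerprint:sha-256 CC:11:0E:0D',
  'a=setup:actpass',
  'a=rtpmap:109 opus/48000/2',
].join('\r\n');

const layers = parseSdpLayers(sdp);
console.log(layers.ice.ufrag);  // → [ '62c997d1' ]
console.log(layers.rtp.codecs); // → [ '109 opus/48000/2' ]
```

Notice that every layer lives in the same blob: there is no way to hand the browser "just the RTP part".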
Parameters & m= lines
The SDP offer contains m= (media) sections that must be accepted or rejected by the remote peer. Each m= section includes information regarding:
- The kind of media to be (bidirectionally?) transmitted, that is: audio, video or application.
- Whether the purpose of such a media section is to send media, receive media or both.
- RTP parameters (codec+configuration, supported RTCP feedback, redundancy mechanisms, etc).
- ICE and DTLS transport parameters.
The SDP also includes global parameters that allow, among others, the usage of BUNDLE. The purpose of BUNDLE is to group different m= sections into a “transport” group, so rather than establishing a separate ICE path and DTLS connection for each media section, a single one is used for all of them.
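For illustration, here is a small sketch (assuming an SDP string as input; `getBundledMids` is a hypothetical helper) of how the BUNDLE group can be read out of an SDP, listing which m= sections share a single ICE+DTLS transport:

```javascript
// Sketch: extract the a=group:BUNDLE line and return the a=mid values of
// the media sections that share one transport.
function getBundledMids(sdp) {
  const m = sdp.match(/^a=group:BUNDLE (.+)$/m);
  return m ? m[1].trim().split(/\s+/) : [];
}

const sdp = 'v=0\r\na=group:BUNDLE sdparta_0 sdparta_1\r\nm=audio 9 UDP/TLS/RTP/SAVPF 109\r\nm=video 9 UDP/TLS/RTP/SAVPF 120';
console.log(getBundledMids(sdp)); // → [ 'sdparta_0', 'sdparta_1' ]
```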
Although the meaning and purpose of a single media section in the SDP may be confusing, WebRTC 1.0 has adopted a philosophy by which each m= section carries information about a single MediaStreamTrack being sent, a single MediaStreamTrack being received, or both.
Assuming the common bidirectional audio and/or video between two peers, the media section RTP parameters (negotiated in a SDP offer/answer) determine the codecs to send and receive. Such a negotiation opens the door to use cases in which the SDP offerer transmits its media using codec A while the SDP answerer transmits its media using codec B. This is a typical source of problems and incompatibilities in SIP/VoIP implementations, and the same applies for current WebRTC endpoints.
One transport, one set of parameters, no exceptions…
Also, although each m= section carries its own RTP parameters for negotiation (which include codec payload types, codec configurations, etc.), the SDP BUNDLE specifications introduce a constraint by which all the media carried over the same transport must conform to the same parameters. This means that a given codec payload type must reference the same codec name and configuration across all media sections. Why? Because an old, legacy specification defines a “media session” as all the multimedia data carried over a 5-tuple (AKA the ICE+DTLS transport in WebRTC), and that same specification defines and constrains the scope of codec payload types per media session. Since BUNDLE allows multiplexing many media sections into a single transport (or media session), existing legacy RTP middleware would get confused if a specific PT value did not always reference the same codec+configuration.
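This constraint can be sketched as a simple validation pass over the SDP (a toy check written for this post, not part of any real stack): every payload type in an a=rtpmap line must map to the same codec wherever it appears.

```javascript
// Sketch: verify the BUNDLE constraint that a payload type maps to the
// same codec in every m= section of the SDP.
function checkPayloadTypeConsistency(sdp) {
  const seen = new Map(); // PT -> codec string
  for (const [, pt, codec] of sdp.matchAll(/^a=rtpmap:(\d+) (.+)$/gm)) {
    if (seen.has(pt) && seen.get(pt) !== codec) return false;
    seen.set(pt, codec);
  }
  return true;
}

const ok = 'a=rtpmap:109 opus/48000/2\na=rtpmap:120 VP8/90000';
const bad = 'a=rtpmap:109 opus/48000/2\na=rtpmap:109 G722/8000/1';
console.log(checkPayloadTypeConsistency(ok));  // → true
console.log(checkPayloadTypeConsistency(bad)); // → false
```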
Summary of SDP in WebRTC 1.0
The WebRTC 1.0 API provides some entities:
- RTCPeerConnection: It represents almost everything (transport or transports, sending and receiving tracks or data, local and remote SDP representations, etc).
- RTCRtpTransceiver: It references an m= section within the SDP.
- RTCRtpSender: The sending component of a transceiver. It sends a local track.
- RTCRtpReceiver: The receiving component of a transceiver. It receives a remote track.
For the purpose of this post, let’s focus on the following SDP summary:
- An m= section includes ICE and DTLS transport parameters for establishing a transport.
- On the browser side, an m=audio or m=video section conveys RTP parameters for sending and/or receiving a MediaStreamTrack.
- The classes/entities provided by the WebRTC 1.0 API are strongly tied to the anatomy and semantics of the SDP.
A minimal ORTC overview
After playing with ORTC for a long time, I can say that this is the proper way to go. It’s far from perfect, especially when it comes to handling sending/receiving RTP parameters (see this issue). However, unlike the SDP mechanism, it properly decouples the different communication layers into separate models and classes, leading to a much more developer-friendly, understandable and comfortable API.
ORTC proposes a completely different API based on classes that represent different communication layers (see webrtcHacks’ many posts on this topic here). Hence, in ORTC we have entities such as RTCIceGatherer, RTCIceTransport and RTCDtlsTransport, plus entities shared with WebRTC such as RTCRtpSender and RTCRtpReceiver (but no RTCRtpTransceiver…).
No SDP
ICE, DTLS and RTP parameters must be exchanged by peers in order to establish the multimedia session. However, no specific format (such as SDP) is mandated by the API. It is up to the developer to determine the format and negotiation procedure.
Architecture
ORTC’s layers are:
- ICE: By providing a RTCIceGatherer class (for retrieving our ICE candidates) and a RTCIceTransport class (for establishing a network path with a remote peer/server) we are free to signal ICE related parameters to the remote peer by any means.
- DTLS: The RTCDtlsTransport class is responsible for establishing a secure media/data channel over the network path provided by the RTCIceTransport. Again, it’s up to the developer to decide how to signal local DTLS parameters to the remote.
- Media: Once we have a secure path (ICE + DTLS) we can create RTCRtpSender and RTCRtpReceiver instances to send and receive media to/from the remote peer, or an RTCDataChannel instance to send/receive data. Yup, the parameters associated with these classes can be transmitted to the remote by just sharing a JSON object.
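The layering above can be sketched with plain JSON, which is the whole point: each layer's parameters travel independently. The field names below are illustrative only, not a standardized signaling format.

```javascript
// Sketch of ORTC-style signaling: per-layer parameters as plain JSON.
// All field names and values here are made up for illustration.
const localParams = {
  ice: { usernameFragment: '62c997d1', password: '4650fd5710897bee', candidates: [] },
  dtls: { fingerprints: [{ algorithm: 'sha-256', value: 'CC:11:0E:0D' }], role: 'auto' },
  rtp: { codecs: [{ name: 'opus', payloadType: 109, clockRate: 48000, channels: 2 }] },
};

// Adding or changing a stream later means signaling just the RTP part,
// never re-sending (or re-inspecting) the ICE and DTLS fields.
const wire = JSON.stringify(localParams.rtp);
const received = JSON.parse(wire);
console.log(received.codecs[0].name); // → 'opus'
```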
To be perfectly clear, it doesn’t matter much how we signal our RTC parameters to the remote peer. The ORTC API provides the ability for the remote to take these parameters and make the appropriate ORTC API calls with them.
The main advantage of the ORTC design is clear: no excessive renegotiation! Creating a new RTCRtpSender or RTCRtpReceiver (for sending or receiving a new stream) does not require renegotiating ICE and/or DTLS parameters when something changes with the RTP media. Instead, just the new RTP parameters must be transmitted to/from the remote.
So, what’s up with SDP in WebRTC 1.0?
In WebRTC 1.0 we don’t signal RTC parameters to the remote but, instead, a complete SDP blob. An SDP includes all the information regarding ICE, DTLS, RTP, etc.
Basically, peer A generates an SDP offer that includes information about its local ICE parameters, DTLS parameters, sending media information and its wish to receive media. This SDP offer is transmitted to the remote peer, which “applies” it as a remote description on its RTCPeerConnection.
Adding new streams
For future media stream additions (such as adding webcam video), the peer generates yet another full SDP offer and transmits it to the remote who, again, “applies” it as a remote description. Well, this is not exactly true. Theoretically there is no real need to transmit the full SDP again. In fact, the sender may inspect its new local SDP offer and notify the remote about just the addition (video stream SSRC, codec, PT, etc.). Still, the only way for the remote to instruct its browser about the new receiving media is to have or rebuild a full SDP offer and call setRemoteDescription() with it.
To be clear: regardless of how peers signal RTC parameters to each other, in the end the communication between the JavaScript and the browser’s WebRTC engine is achieved by passing/retrieving full SDP blobs to/from the RTCPeerConnection. Period. There is no “real API”, which means that the browser must perform a full “re-inspection” of the new remote SDP to figure out what has changed. This involves:
- Check whether remote ICE parameters have changed.
- Check whether remote DTLS parameters have changed.
- Check whether a new media track has been added.
- Check whether an existing ongoing media track has been modified or stopped.
- Check whether parameters for an ongoing media track have changed.
A real example on why this sucks
It’s clear how problematic this can be. And, if not, I will explain it with a real use case related to the SDP a=setup attribute (which indicates which peer takes the DTLS client role and which the server role):
- Let’s assume that Alice wants audio and DataChannel communication with Bob, so Alice creates her local RTCPeerConnection and gets the corresponding SDP offer.
- As per RFC 5763, the a=setup attribute of the SDP offer must be actpass, which means that the answerer (Bob) will decide who the DTLS client and DTLS server are.
- Bob creates the corresponding SDP answer which includes a=setup:active, meaning that Bob becomes the DTLS client and Alice the DTLS server.
- After the ICE and DTLS procedures, Alice and Bob exchange audio and custom data (via RTCDataChannel).
- Later, Bob wants to add his webcam video to the communication so he gets a new SDP re-offer including his webcam stream information.
- Again, as per RFC 5763 the SDP re-offer has a=setup:actpass.
- Alice receives the SDP re-offer and creates a re-answer.
- In order to keep the existing DTLS association open, such an SDP re-answer must have a=setup:passive (so Alice keeps the DTLS server role).
Did you notice it? In order to NOT change the transport, both the SDP re-offer and re-answer must carry a=setup values different from those in the initial SDP exchange.
Having to change things to change nothing seems surreal.
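The rule itself is tiny; the scenario above can be captured in a few lines of logic (a sketch written for this post; `answerSetupRole` is a hypothetical helper, and I assume the answerer defaults to the active role on first negotiation, which is the common choice):

```javascript
// Sketch of the RFC 5763 a=setup rules that keep tripping implementations up.
// The offer must be 'actpass'; the answer picks a role; on renegotiation
// the answerer must keep whatever role it already has, or the DTLS
// association gets torn down.
function answerSetupRole(offerSetup, previousRole /* 'active' | 'passive' | null */) {
  if (offerSetup !== 'actpass') throw new Error('offer a=setup must be actpass');
  // First negotiation: the answerer is free to choose (commonly 'active').
  if (previousRole === null) return 'active';
  // Renegotiation: changing roles would break the existing DTLS association.
  return previousRole;
}

console.log(answerSetupRole('actpass', null));      // → 'active' (Bob's first answer)
console.log(answerSetupRole('actpass', 'passive')); // → 'passive' (Alice's re-answer)
```

Three lines of state machine, and yet FreeSwitch, Asterisk and Firefox all got it wrong at some point, because the role is buried inside a full SDP that must be regenerated and re-inspected on every change.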
That one thing may not be so hard, right? Well, this is the reality:
- “Upon receipt of a SIP re-INVITE, FreeSwitch breaks the previous DTLS association by setting a wrong DTLS role in the SDP answer” (issue report). The issue was fixed on 24/Apr/17 (still hot!).
- “When sending a re-INVITE, Asterisk sets an invalid a=setup:active attribute which, as per spec, must always be actpass” (issue report). Also fixed AFAIR.
- “Firefox switches to DTLS active when generating a re-answer” (issue report). AFAIK not yet fixed.
SDP for Notification?
Let’s re-think about this again:
In all the scenarios above there was zero interest in establishing a new DTLS association. Instead, a full SDP is signaled to the other party (or re-generated by the other party) just to notify it about changes related to RTP streams. Nothing else. However, the “full re-inspection” mandated by the WebRTC “API” makes the SDP receiver re-check all the stuff regarding ICE, DTLS, RTP and so on, leading to issues like the ones referenced above.
Let’s remember that in ORTC there is no chance for this kind of ridiculous issue. This is because, in ORTC, signaling information about a new sending RTP stream does not also involve signaling ICE+DTLS parameters again.
The Full Re-inspection
I already touched on this above but, to be perfectly specific, these are all the tasks a WebRTC endpoint must perform when calling setRemoteDescription() with an SDP re-offer or re-answer (assuming BUNDLE usage, so all the media streams are carried over a single ICE+DTLS transport):
- Check whether a=ice-ufrag and/or a=ice-pwd have changed. If so, restart ICE.
- Check whether a=fingerprint has changed or the a=setup attribute in the re-answer has a value that breaks the previously negotiated DTLS role. If so, establish a new DTLS association.
- Check whether the direction attribute in a previously existing media section has changed to a=inactive or a=recvonly, or its media port is 0. If so, assume that such a media stream has been stopped (if in Plan-B, also check whether an existing a=ssrc line has been removed).
- Check whether a new m=audio/video/data section has been added (if in Plan-B, also check new a=ssrc lines).
- Check whether SSRCs, codecs, payload numbers, RTP header extension mappings, available per-codec RTCP feedback, etc., have changed. If so, be ready to receive the modified stream(s).
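The first two checks in that list can be sketched as a diff between the old and new SDP (a toy comparison written for this post, assuming a single bundled transport; real engines must do far more than this):

```javascript
// Sketch of part of the "full re-inspection": diff two SDP blobs to decide
// whether the transport actually changed, or only the RTP layer did.
function firstMatch(sdp, re) {
  const m = sdp.match(re);
  return m ? m[1] : null;
}

function transportChanged(oldSdp, newSdp) {
  return (
    firstMatch(oldSdp, /^a=ice-ufrag:(.+)$/m) !== firstMatch(newSdp, /^a=ice-ufrag:(.+)$/m) ||
    firstMatch(oldSdp, /^a=ice-pwd:(.+)$/m) !== firstMatch(newSdp, /^a=ice-pwd:(.+)$/m) ||
    firstMatch(oldSdp, /^a=fingerprint:(.+)$/m) !== firstMatch(newSdp, /^a=fingerprint:(.+)$/m)
  );
}

const v1 = 'a=ice-ufrag:62c997d1\na=ice-pwd:secret1\na=fingerprint:sha-256 AA:BB';
const v2 = v1 + '\na=rtpmap:120 VP8/90000'; // only RTP info was added
console.log(transportChanged(v1, v2)); // → false: no ICE restart, no new DTLS handshake
```

The point is not that this diff is hard to write; it is that every endpoint must re-derive it from a full SDP on every single change, instead of simply being told "only RTP changed".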
This is terribly problematic for WebRTC implementors and also for those developers building RTCPeerConnection shims on top of ORTC or any other media engine (such as a WebRTC media server).
Never disable audio!
OK, so we explained above how BUNDLE groups different media sections such that the ICE/DTLS transport parameters of just one of them are used for all the RTP streams and DataChannels in the SDP.
Let’s assume we have a bidirectional and BUNDLE’d audio+video session and that, as usual, the first m= section corresponds to the audio exchange.
Let’s also assume that I want to close/remove/switch-off my microphone. Typically I would get the corresponding RTCRtpSender and call stop() on it (I may also set direction = "recvonly" on the audio RTCRtpTransceiver). A new createOffer() call would generate an m=audio section with a=recvonly.
Fortunately, the transceiver (or media section) is still active since we are receiving the audio of the remote peer. Now let’s assume that the remote peer also removes/closes its audio input. After an SDP offer/answer re-negotiation, the m=audio section would have a=inactive, meaning that, currently, no audio is being sent or received on it.
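How the direction attribute collapses both peers' intentions into one value can be sketched like this (a toy model written for this post, from the local peer's point of view):

```javascript
// Sketch: compute the m= section direction attribute from what each peer
// is actually sending. Two independent "mute" operations end in a=inactive.
function direction(localSending, remoteSending) {
  if (localSending && remoteSending) return 'sendrecv';
  if (localSending) return 'sendonly';
  if (remoteSending) return 'recvonly';
  return 'inactive';
}

console.log(direction(true, true));   // → 'sendrecv' (normal call)
console.log(direction(false, true));  // → 'recvonly' (I muted my mic)
console.log(direction(false, false)); // → 'inactive' (both sides muted)
```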
Firefox behavior on a=inactive
And how does Firefox react to this? When the audio is completely removed, Firefox sets port 0 and a=inactive in its m=audio section. It also removes all the ICE and DTLS parameters in it, and removes the audio tag from the BUNDLE group. The result? The ICE+DTLS transport is closed, no matter that it was also used to carry video (and/or DataChannel or even other audio tracks).
Here is some real SDP taken from Firefox to illustrate the issue:
- Firefox generates an SDP offer with both audio (mic) and video (webcam) enabled. The RTCPeerConnection has bundlePolicy: 'max-bundle', so it only puts the ICE parameters into the first media section (the m=audio section):
v=0
o=mozilla...THIS_IS_SDPARTA-57.0a1 5362413390007385508 1 IN IP4 0.0.0.0
s=-
t=0 0
a=sendrecv
a=fingerprint:sha-256 CC:11:0E:0D:B2:7D:FC:F9:11:2F:68:74:11:B3:7E:13:C0:C3:05:78:F3:29:10:C6:E9:14:8A:0A:AB:A4:08:6A
a=group:BUNDLE sdparta_0 sdparta_1
a=ice-options:trickle
a=msid-semantic:WMS *
m=audio 58960 UDP/TLS/RTP/SAVPF 109 9 0 8 101
c=IN IP4 1.2.3.4
a=candidate:0 1 UDP 2122252543 192.168.1.34 58960 typ host
a=candidate:1 1 TCP 2105524479 192.168.1.34 9 typ host tcptype active
a=sendrecv
a=end-of-candidates
a=extmap:1/sendonly urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=fmtp:109 maxplaybackrate=48000;stereo=1;useinbandfec=1
a=fmtp:101 0-15
a=ice-pwd:4650fd5710897bee3bf874465c372e87
a=ice-ufrag:62c997d1
a=mid:sdparta_0
a=msid:{ee311a2b-98e0-dc49-b7ab-5d5058ef8a08} {0aac500e-e6f0-1f4a-88e4-706d215e3bfe}
a=rtcp-mux
a=rtpmap:109 opus/48000/2
a=rtpmap:9 G722/8000/1
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=setup:actpass
a=ssrc:858486657 cname:{18d5ea4e-9fdc-4844-90f6-10eabff59478}
m=video 0 UDP/TLS/RTP/SAVPF 120 121 126 97
c=IN IP4 1.2.3.4
a=bundle-only
a=sendrecv
a=extmap:1 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=fmtp:126 profile-level-id=42e01f;level-asymmetry-allowed=1;packetization-mode=1
a=fmtp:97 profile-level-id=42e01f;level-asymmetry-allowed=1
a=fmtp:120 max-fs=12288;max-fr=60
a=fmtp:121 max-fs=12288;max-fr=60
a=ice-pwd:4650fd5710897bee3bf874465c372e87
a=ice-ufrag:62c997d1
a=mid:sdparta_1
a=msid:{ee311a2b-98e0-dc49-b7ab-5d5058ef8a08} {8ba95436-369f-5b48-abe8-7c316b2b706d}
a=rtcp-fb:120 nack
a=rtcp-fb:120 nack pli
a=rtcp-fb:120 ccm fir
a=rtcp-fb:120 goog-remb
a=rtcp-fb:121 nack
a=rtcp-fb:121 nack pli
a=rtcp-fb:121 ccm fir
a=rtcp-fb:121 goog-remb
a=rtcp-fb:126 nack
a=rtcp-fb:126 nack pli
a=rtcp-fb:126 ccm fir
a=rtcp-fb:126 goog-remb
a=rtcp-fb:97 nack
a=rtcp-fb:97 nack pli
a=rtcp-fb:97 ccm fir
a=rtcp-fb:97 goog-remb
a=rtcp-mux
a=rtpmap:120 VP8/90000
a=rtpmap:121 VP9/90000
a=rtpmap:126 H264/90000
a=rtpmap:97 H264/90000
a=setup:actpass
a=ssrc:2390117206 cname:{18d5ea4e-9fdc-4844-90f6-10eabff59478}
- The remote provides its SDP answer, which does not contain any sending audio or video streams (so Firefox will send audio and video but won’t receive anything from the remote):
v=0
o=REMOTE 30596888 2 IN IP4 0.0.0.0
s=-
t=0 0
a=sendrecv
a=fingerprint:sha-512 E0:71:C6:F8:CD:DA:01:05:9A:9F:B7:E9:BD:BF:DA:5F:80:56:13:F4:48:F1:56:AC:2C:72:E1:54:8C:7C:9A:0C:2B:A1:B3:B3:A2:38:E7:83:87:D8:38:2B:4E:F0:CB:FD:FD:55:77:64:CF:26:45:FC:2C:59:A4:93:B8:3B:64:3B
a=group:BUNDLE sdparta_0 sdparta_1
a=ice-lite
a=msid-semantic:WMS *
m=audio 7 RTP/SAVPF 109
c=IN IP4 127.0.0.1
a=candidate:udpcandidate 1 udp 1082702079 99.99.99.99 42936 typ host
a=recvonly
a=end-of-candidates
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=fmtp:109 maxplaybackrate=0;stereo=0;useinbandfec=1
a=ice-options:renomination
a=ice-pwd:5rc2025mfkzff6cze0xj5ra34pogs7to
a=ice-ufrag:wardj9g9mvmeixf6
a=mid:sdparta_0
a=rtcp-mux
a=rtpmap:109 opus/48000/2
a=setup:active
m=video 7 RTP/SAVPF 120
c=IN IP4 127.0.0.1
a=candidate:udpcandidate 1 udp 1082702079 99.99.99.99 42936 typ host
a=recvonly
a=end-of-candidates
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:1 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=ice-options:renomination
a=ice-pwd:5rc2025mfkzff6cze0xj5ra34pogs7to
a=ice-ufrag:wardj9g9mvmeixf6
a=mid:sdparta_1
a=rtcp-fb:120 nack
a=rtcp-fb:120 nack pli
a=rtcp-fb:120 ccm fir
a=rtcp-fb:120 goog-remb
a=rtcp-mux
a=rtpmap:120 VP8/90000
a=setup:active
- Later, the app running in Firefox decides to close the microphone and stop sending it. Basically it does this by calling pc.removeTrack(micRtpSender), and this is what its new local SDP looks like:
v=0
o=mozilla...THIS_IS_SDPARTA-57.0a1 5362413390007385508 2 IN IP4 0.0.0.0
s=-
t=0 0
a=sendrecv
a=fingerprint:sha-256 CC:11:0E:0D:B2:7D:FC:F9:11:2F:68:74:11:B3:7E:13:C0:C3:05:78:F3:29:10:C6:E9:14:8A:0A:AB:A4:08:6A
a=group:BUNDLE sdparta_1
a=ice-options:trickle
a=msid-semantic:WMS *
m=audio 58960 UDP/TLS/RTP/SAVPF 0
c=IN IP4 1.2.3.4
a=inactive
a=end-of-candidates
a=mid:sdparta_0
a=rtpmap:0 PCMU/8000
m=video 58960 UDP/TLS/RTP/SAVPF 120 121 126 97
c=IN IP4 1.2.3.4
a=sendrecv
a=extmap:1 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=fmtp:126 profile-level-id=42e01f;level-asymmetry-allowed=1;packetization-mode=1
a=fmtp:97 profile-level-id=42e01f;level-asymmetry-allowed=1
a=fmtp:120 max-fs=12288;max-fr=60
a=fmtp:121 max-fs=12288;max-fr=60
a=ice-pwd:4650fd5710897bee3bf874465c372e87
a=ice-ufrag:62c997d1
a=mid:sdparta_1
a=msid:{ee311a2b-98e0-dc49-b7ab-5d5058ef8a08} {8ba95436-369f-5b48-abe8-7c316b2b706d}
a=rtcp-fb:120 nack
a=rtcp-fb:120 nack pli
a=rtcp-fb:120 ccm fir
a=rtcp-fb:120 goog-remb
a=rtcp-fb:121 nack
a=rtcp-fb:121 nack pli
a=rtcp-fb:121 ccm fir
a=rtcp-fb:121 goog-remb
a=rtcp-fb:126 nack
a=rtcp-fb:126 nack pli
a=rtcp-fb:126 ccm fir
a=rtcp-fb:126 goog-remb
a=rtcp-fb:97 nack
a=rtcp-fb:97 nack pli
a=rtcp-fb:97 ccm fir
a=rtcp-fb:97 goog-remb
a=rtcp-mux
a=rtpmap:120 VP8/90000
a=rtpmap:121 VP9/90000
a=rtpmap:126 H264/90000
a=rtpmap:97 H264/90000
a=setup:actpass
a=ssrc:2390117206 cname:{18d5ea4e-9fdc-4844-90f6-10eabff59478}
Notice it? There are no longer any a=candidate or a=ice-ufrag/pwd lines in the first media section, meaning that, following SDP rules, the browser closes the ICE+DTLS connection. So, even though video was still being sent, the media transmission has been stopped.
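The failure mode above can be detected mechanically (a toy check written for this post, assuming max-bundle so the first m= section owns the transport): the first media section simply no longer carries any ICE credentials.

```javascript
// Sketch: under max-bundle, the first m= section carries the transport.
// If its ICE credentials disappear, SDP rules imply the shared transport
// is gone for every bundled section, including video.
function firstSectionHasIce(sdp) {
  const sections = sdp.split(/^(?=m=)/m).slice(1); // split into m= sections
  if (sections.length === 0) return false;
  return /^a=ice-ufrag:/m.test(sections[0]);
}

// Condensed versions of the Firefox SDPs shown above.
const before =
  'v=0\nm=audio 58960 UDP/TLS/RTP/SAVPF 109\na=ice-ufrag:62c997d1\nm=video 0 UDP/TLS/RTP/SAVPF 120';
const after =
  'v=0\nm=audio 58960 UDP/TLS/RTP/SAVPF 0\na=inactive\nm=video 58960 UDP/TLS/RTP/SAVPF 120\na=ice-ufrag:62c997d1';

console.log(firstSectionHasIce(before)); // → true
console.log(firstSectionHasIce(after));  // → false: the transport info vanished with the audio
```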
References:
- BUNDLE nightmare when first media section becomes inactive
- Firefox PeerConnection stops sending video RTP when the first m= section (audio) becomes a=inactive
- Firefox attempts to establish a new ICE+DTLS transport when moving from 0 local tracks to N
This is not a bug in Firefox but a bug in the specification (in BUNDLE) which fails because it depends on the SDP semantics.
It’s just insane that an ICE+DTLS transport gets closed/restarted just because we want to remove an audio track. Obviously this does not happen in ORTC. But in WebRTC we cannot just signal “audio track closed”; instead, we signal a complete SDP. And the SDP includes information about all the layers and streams. Worse: each media section includes information about the transport. So, if we remove the first media section, BUNDLE goes crazy and, depending on the browser, the transport is closed.
Can this be fixed? Yes, of course, more hacks can be added into the SDP to avoid this issue. And the most brilliant minds in the WebRTC ecosystem can spend valuable time debating the best way to standardize these hacks (just check the hyper long “BUNDLE nightmare when first media section becomes inactive” thread above).
When SDP was about simple bidirectional audio it was just fine. But nowadays SDP is unsustainable.
Really!?!
This is insane. We don’t need it. We don’t need these kinds of problems, originated by the usage of a legacy media description format (the SDP). Even more, we don’t deserve a JavaScript API that conforms to the anatomy of the SDP (hello RTCRtpTransceiver). We just need an API to send and an API to receive (along with the ability to establish a connection or transport), nothing else.
{“author”: “Iñaki Baz Castillo“}
Eugene says
I feel the pain behind every line here, as I have already run into almost all the issues mentioned. Thanks for the great article!
Andrew says
Firefox has been pretty unpredictable when it comes to processing streams with no audio. Anyway thanks for the round-up!