Looking over the past few years of WebRTC growth, and the landscape of emerging WebRTC solutions, we see quite a number of WebRTC-centric JavaScript (JS) libraries on the scene. Indeed, not long after browser vendors began shipping WebRTC implementations, a bouquet of WebRTC signaling libraries bloomed.
Each delivers a JavaScript API surface in the browser, which provides signaling services that complement the WebRTC media services. This abundance of options raises a few questions:
- What is the real purpose of the JS library?
- Do I need one for WebRTC?
- What should I consider when evaluating this component of a WebRTC solution?
- What makes a good JS library, and what makes a bad one?
These are the nagging questions that I wrestled with in my own early research into WebRTC. I have certainly not used every WebRTC JS library out there, but I have tried quite a few. In this post I’ll attempt to address these questions, and share my own thoughts on what is in a WebRTC JS library.
A bit of history…
The very first time I was introduced to WebRTC, working groups in the IETF and W3C were newly formed, and only a few drafts were in place. In those days you got looks of confusion if you mentioned the term “WebRTC”, even among many CTOs, architects, and technology strategists. In my first exploration into the discussions and documents relating to WebRTC, an early impression stuck out to me: the realization that WebRTC’s power was going to be wielded by developers. Not vendor pushers trying to innovate through several layers of project management, and not telco spec jockeys that can quote endless architecture specifications, but Web developers. This is because WebRTC isn’t placed into the hands of the usual folks that speak SIP, or IMS, or VoIP of any kind, for that matter. WebRTC is given to those that speak JavaScript. After all, JavaScript provides the interface to the WebRTC media engine in the browser, and for a session to be established, JavaScript would need to deliver some form of signaling as well. It was easy to see that for communications to move at the speed of web, it would be driven by folks that understand and appreciate JavaScript, and how it works on the web.
As we look today at the maturing state of the JavaScript interface to WebRTC media in browsers, the media API is relatively straightforward and consistent (ok, some variations still exist between browsers). However, an interface to a signaling mechanism for WebRTC sessions comes with a tremendous number of options, and no clear standard for signaling. For someone coming up in traditional communications, the notion of no defined signaling standard can be a bit off-putting at first. However, if you consider the many different aspects of signaling in traditional communications, you can start to see that the flexibility that comes from having no standard can be a good thing. For example, the high level goals of a traditional communications signaling network (e.g. using SIP or SS7) have been to provide services like directory and routing, establish namespace (E.164) and identity (caller, callee), and facilitate information exchange between endpoints for call setup. Considering these signaling characteristics, and what happens all the time on the modern Web, you can quickly see that almost any website you can think of (Twitter, Facebook, etc.) can meet basic signaling requirements. You could say the web already had an established, full scale signaling plane that was just waiting to be used for WebRTC. As WebRTC was developed, special care was given to make sure that it was completely up to the JavaScript developer to choose how to signal, so that it could be used with any existing or new services. Because the mechanisms for signaling can vary widely depending on the back-end service being used, the WebRTC JS library helps out by providing an abstraction from the specifics of a given service, delivering simple and familiar JavaScript methods instead.
A quick primer
Before we jump into some of the options that are out there for WebRTC JS libraries, let’s have a quick Web primer for the uninitiated. First off, a given HTML5 web app typically consists of three basic components: HTML, CSS, and of course JavaScript. When you browse to a URL, your browser sucks up all the HTML, CSS and JavaScript files associated with the page, and renders it. HTML and CSS are used to present the structure and style that is shown to the user (i.e. the UI). The JavaScript is loaded into a sandbox environment, or virtual machine, and gives the page its brains.
The HTML and CSS for the page are represented inside the JavaScript sandbox as what is called a Document Object Model, or DOM. By interacting with the DOM, JavaScript has full control over the HTML and CSS in the page. It can modify the DOM to manipulate HTML or CSS to create dynamic changes in the UI, or do things like detect mouse movements and clicks to perform functions, and even add or remove elements seen on the page. Complementing the DOM inside the JavaScript sandbox is a host of other built-in Web objects. This includes things like Web Storage, which provides access to local system disk for storing data, and Web Workers, which allow the app to spin off additional processing threads on the local system. It includes objects for transporting data in and out of the browser, namely HTTP and WebSockets. And of course, it is in this sandbox where the objects for the local camera/microphone and the WebRTC peerConnection live. How exactly all these things work together is determined by the JavaScript files that get loaded with the page.
JavaScript libraries have long been a part of the Web, and they provide an effective way for development of complex web apps to scale. JavaScript tasks can be broken up into separate files or libraries, and it is common for a page to download many different JavaScript files that all get loaded together to run the app. The web developer’s main JavaScript code is able to call on the functions, or methods, provided by all the other files in the page, providing the interaction between the different chunks of JavaScript. This is essentially how a JavaScript API is created. A JavaScript API provided by a library is simply a set of interactions between chunks of JavaScript code that are loaded in the same sandbox. jQuery is a good example of one of the more popular libraries on the Web, whose API provides a number of functions for handling manipulation of the DOM in nifty ways, for doing things like animations or dynamic page updates.
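To make the idea of a library API concrete, here is a minimal sketch of two chunks of JavaScript sharing one sandbox. The name `TinyLib` and its functions are invented for illustration and do not belong to any real library; in a browser the first chunk would normally live in its own file.

```javascript
// Hypothetical "library" chunk: in a browser this would be a separate file
// that exposes its API through a shared object.
const TinyLib = (function () {
  // Private state, not reachable from the rest of the page.
  let callCount = 0;

  // Public API: the only functions the page's own code can call.
  return {
    greet(name) {
      callCount += 1;
      return `Hello, ${name}!`;
    },
    getCallCount() {
      return callCount;
    },
  };
})();

// "Application" chunk: the developer's own code calls the library's API.
const message = TinyLib.greet("WebRTC");
console.log(message);
console.log(TinyLib.getCallCount());
```

The application code never touches `callCount` directly; it only sees the functions the library chose to expose, which is all a JavaScript API really is.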
The options
Now, getting back to the topic of WebRTC JS libraries, what they are, and what they do. Just as with any other JS library, the goal of the WebRTC library is to provide a simple set of functions through its API that make it easy for a JavaScript developer to integrate WebRTC into the page. The functions of most WebRTC JS libraries typically abstract the more complex operations associated with signaling using a particular service, and they commonly handle the browser WebRTC media engine APIs (getUserMedia, peerConnection) as well. Whether it is a WebRTC Gateway, an application server, or a cloud based WebRTC PaaS, the WebRTC JS libraries have become a typical component of many solutions for WebRTC. This is not a definitive listing, but a look at some of what is out there:
Listing shows WebRTC JS libraries with public documentation. Check out our Tool Vendor Directory for a more complete list of WebRTC tools.
So many choices!
Because the interface to the WebRTC media engine exists inside the browser’s JavaScript sandbox, clearly it is important that those wanting to offer services or signaling for WebRTC should meet the Web Developer in that sandbox as well. But how can we evaluate this critical component of a WebRTC solution? Let’s break it down into a few areas that would be important to examine.
API Mechanics
Clearly the specifics of the functions and methods provided by the library are important. Given the plethora of WebRTC JS libraries, it might also seem that the APIs they offer would be very diverse as well. However, they might be more similar than you think. There are actually a number of functions that are common to most of them. Here are some mechanics that are typical for many of the WebRTC JavaScript libraries:
API Mechanic | Description |
---|---|
Initialization of the library | Using most WebRTC JS libraries begins with a kind of bootstrapping process to instantiate it. This commonly involves invoking some form of a constructor, to create an instance of the JavaScript object which represents the services provided by the library. During this initialization, most libraries require some kind of basic configuration information to be passed to the JS library. For example, end user information is typically given to the library so it can attach an identity to any WebRTC sessions it might set up. This end user information could be provided by user input, or by information stored in the web server and sent in the initial page download. |
Registration | After the library is initialized (and sometimes in conjunction with initialization), it is common for some sort of registration function to take place. This can serve to nail up a connection to a signaling server somewhere, and let a service know that the user is ready to receive inbound signaling messages. It is common that registration also involves some security procedures to provide authentication of the user and/or the app as it connects back to the network (see security notes below). |
Create and Manage WebRTC Sessions | Almost all the JavaScript libraries provide methods to create and manage WebRTC sessions. In most cases, the WebRTC JS library will handle getting access to the camera and mic (getUserMedia) and setting up the peerConnection, keeping the method very simple and straightforward. While the semantics of the method call may vary slightly from library to library, they tend to be very similar in practice. Typically when this method call is made, information is passed to the library to indicate who the call is going to (called party), and some event callbacks (more on Event Callbacks below) are passed as well. This method may also require the exchange of some information about the voice/video media streams which are about to be established, so they can be attached to the proper audio/video elements in the DOM. After the session is established, there is a usual set of additional methods provided to modify the session, or end it. |
Event Callbacks | A variety of event callbacks get passed to WebRTC JS libraries. These let the developer instruct the JavaScript library how it should notify, or “call back” to the developer’s JavaScript, in the event of something happening asynchronously within the internals of the library. Common events tend to be notifications regarding the success or failure of a particular request (e.g. a call, or a registration), or that there is an inbound call or message received by the library. When one of these events occurs, the WebRTC JS library calls whatever function it was given by the developer when the event callback was provisioned. WebRTC JS libraries that offer a rich set of callbacks expose better information to the developer’s app regarding the status of a WebRTC session, allowing the app to notify the user or log what is happening. |
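The four mechanics in the table can be sketched together in a few lines. Everything below is hypothetical: `SketchRtcClient`, its method names, and its callback names are invented for illustration and do not correspond to any particular library. To keep the sketch runnable outside a browser, it tracks state in memory and fires callbacks immediately, whereas a real library would talk to a signaling server and call back asynchronously.

```javascript
// Hypothetical library: every name here is invented for illustration.
class SketchRtcClient {
  // Initialization: the constructor takes basic configuration such as the
  // end user's identity.
  constructor(config) {
    if (!config || !config.userId) {
      throw new Error("userId is required");
    }
    this.userId = config.userId;
    this.registered = false;
    this.sessions = [];
  }

  // Registration: a real library would open a signaling connection here and
  // report the result later; this sketch succeeds immediately.
  register(callbacks) {
    this.registered = true;
    if (callbacks.onRegistered) callbacks.onRegistered(this.userId);
  }

  // Session creation: a real library would call getUserMedia and set up the
  // peerConnection internally; this sketch only tracks state.
  call(calledParty, callbacks) {
    if (!this.registered) {
      if (callbacks.onFailed) callbacks.onFailed("not registered");
      return null;
    }
    const session = { to: calledParty, state: "connected" };
    this.sessions.push(session);
    if (callbacks.onConnected) callbacks.onConnected(session);
    return session;
  }

  // Session management: end a session created by call().
  hangup(session, callbacks) {
    session.state = "ended";
    if (callbacks.onEnded) callbacks.onEnded(session);
  }
}

// Usage: the application supplies callbacks and reacts to events.
const events = [];
const client = new SketchRtcClient({ userId: "alice" });
client.register({ onRegistered: (id) => events.push(`registered:${id}`) });
const session = client.call("bob", {
  onConnected: (s) => events.push(`connected:${s.to}`),
  onFailed: (reason) => events.push(`failed:${reason}`),
});
client.hangup(session, { onEnded: (s) => events.push(`ended:${s.to}`) });
console.log(events);
```

Real libraries differ in naming and in how they deliver events (callbacks, event emitters, or promises), but the overall shape of initialize, register, call, and react tends to look much like this.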
Beyond these common API methods, many libraries provide some more advanced methods, which may be specific to the services they represent. For example, a library associated with a conferencing service might provide additional methods around conference controls, while a telephony service might provide additional methods for SMS or RCS, and finally a collaboration service might offer additional methods for screen sharing and file transfer. True differentiation likely lies in the more advanced service-specific methods offered by a WebRTC JS library, since the high level API mechanics for initialization, registration, making/receiving calls, and event callbacks are very similar across most of the libraries. If you are concerned that the variance in these APIs is too great, and you want to keep your code agnostic to the WebRTC JS library underneath, there are even industry efforts and vendors that offer to help with that (for example ORCA.js) by maintaining a bit of JavaScript wrapper around the WebRTC JS library API.
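The wrapper idea can be illustrated with a small adapter sketch. This is not the ORCA.js API; `vendorA`, `vendorB`, and the `call(target)` interface are all made-up stand-ins, assumed only to show how application code can stay independent of the library underneath.

```javascript
// Two hypothetical vendor libraries with different method names.
const vendorA = {
  makeCall(target) {
    return { vendor: "A", peer: target };
  },
};
const vendorB = {
  sessions: {
    create(options) {
      return { vendor: "B", peer: options.to };
    },
  },
};

// Adapters expose one common interface: call(target).
const adapters = {
  a: { call: (target) => vendorA.makeCall(target) },
  b: { call: (target) => vendorB.sessions.create({ to: target }) },
};

// Application code depends only on the common interface.
function startCall(adapterName, target) {
  const adapter = adapters[adapterName];
  if (!adapter) throw new Error(`unknown adapter: ${adapterName}`);
  return adapter.call(target);
}

console.log(startCall("a", "bob"));
console.log(startCall("b", "bob"));
```

Swapping libraries then means writing one new adapter rather than touching every place the app makes a call. The trade-off is that a common interface can only expose what all the wrapped libraries have in common.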
Ease of use
While getting up on stage at a trade show, and proclaiming how, “with our solution, I can make this call in just X lines of code!” can be a neat trick…in practice it always takes a bit more to make a decent WebRTC experience. For those that need convincing, I suppose this trick can be useful as a demonstration of the fact that web development can be fast and easy, but I wouldn’t take it much farther than that. I don’t mean to dispute the fact that JavaScript libraries are easy to use, because indeed they are. I just wouldn’t go so far as to make an ease-of-use judgement call based on the idea that because someone’s library can make a call in fewer lines of code, it must be so much easier than all the rest. In reality, I have found almost all the JavaScript libraries that I have worked with are sufficiently easy to use. While there may be significant complexities hidden beneath many of the libraries, the ones I’ve used do a good job of hiding this from the web developer. The WebRTC JS libraries I’ve played with were using a wide range of mechanisms under the hood, and my experience has been that I haven’t really needed to know a thing about the underlying signaling transport (HTTP, Comet, BOSH, WebSockets), signaling protocol (XMPP, REST, JSON, SIP, other), or internals of the library state machine. Just basic JavaScript web developer skills required. In fact, I’ve found them easy enough to use that I’ve switched from one to another with minimal effort, on multiple occasions (especially easy when it comes to the common mechanics discussed above). Also, the wrapper libraries (mentioned in API Mechanics) offered by vendors can make a swap-out even easier.
It is certainly very important that a JavaScript library be easy to use. After all, that is one of the whole points of using JavaScript libraries. However, because most libraries only require the developer to understand the bare minimum about the underlying WebRTC and signaling, I can’t say that I’ve found ones that are so much easier than the rest. While I haven’t come across it yet, I would certainly be on the lookout for a WebRTC JS library that is clunky and not easy to use. If you aren’t sure, just ask your friendly neighborhood web developer, and they should be able to provide quick feedback on this.
Extensibility and exposure
While ease of use is nice, it is also important that the library not sacrifice information and functionality for the sake of making it simpler. As mentioned previously, the “call in X lines of code” is nice to get started, but inevitably you will find yourself needing much more code for greater functionality.
It can be nice when JavaScript libraries do a good job of being simple when necessary, but also extensible for advanced use cases. I find this especially true for those libraries that take over complete management of the WebRTC media APIs (getUserMedia and peerConnection), along with the signaling. An example use case that I have routinely been frustrated by, is one where my app captures the device camera and microphone streams prior to setting up a session. A web developer may want to do this so they can let a user check that everything looks and sounds good before initiating the call. Because many of the libraries completely abstract the getUserMedia API beneath their code, there is no clear way to pass already obtained streams to the JavaScript library, once the user is ready to initiate the call. This results in annoying, and unnecessary extra prompts to the user to gain camera and mic access.
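One way a library could avoid the double prompt is to accept an already captured stream as an option. The sketch below assumes a hypothetical `startCall(calledParty, options)` method with an optional `localStream` field; neither comes from a real library. The function `fakeGetUserMedia` is a stand-in for `navigator.mediaDevices.getUserMedia`, which in a real browser returns a Promise and may prompt the user. Here it just counts how many times a "prompt" would have happened.

```javascript
// Stand-in for navigator.mediaDevices.getUserMedia. It is synchronous here
// only so the sketch runs outside a browser; it counts simulated prompts.
let promptCount = 0;
function fakeGetUserMedia() {
  promptCount += 1;
  return { id: `stream-${promptCount}` };
}

// Hypothetical library method: reuse a caller-supplied stream if present,
// and only capture a new one when none was given.
function startCall(calledParty, options = {}) {
  const localStream = options.localStream || fakeGetUserMedia();
  return { to: calledParty, localStream };
}

// The app captures media first, e.g. for a camera/mic preview...
const previewStream = fakeGetUserMedia();
// ...then hands the same stream to the library when the user starts the call.
const call = startCall("bob", { localStream: previewStream });

console.log(call.localStream === previewStream); // true
console.log(promptCount); // 1
```

A library that always captures media itself would have pushed the count to 2 in this flow, which is exactly the extra prompt described above.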
In addition to being extensible for advanced use cases, a WebRTC JavaScript library should not skimp on the information it provides back to the Web developer. This can be in the form of a good set of event callbacks, or exposure of WebRTC related variables and state information. Again, because many of these libraries take over handling the WebRTC media engine APIs, they should have access to all sorts of session related stats and information (especially via the WebRTC Stats API). Ideally a WebRTC JavaScript library would make this info conveniently available to the web developer, should they want to use it.
Under the hood
While the various WebRTC JS libraries converge on the novice JavaScript developer with their generally simple and similar APIs, as we look under the hood we start to see them diverge considerably. Let’s begin with the signaling channel used by the WebRTC JS library for transporting messages to and from the browser. As we know, WebRTC requires the exchange of SDP to establish the peerConnection. Because a request for a new peerConnection, or a modification to an established peerConnection, could happen at any time, a long-lived connection for asynchronously transporting signaling messages between browser and back-end servers is typically required. Certainly WebSockets is a modern approach, but it is not the only one. The Web has been doing this since long before WebSockets. The other tool available for getting data in and out of the browser is HTTP. This tool is commonly used for browser access to REST based services. The problem for HTTP, and especially REST, is that they are inherently stateless, with strict client-server roles (i.e. requests always come from the browser, responses always come from the server). For a good WebRTC signaling channel, a stateful connection is needed, where request messages can be pushed at any time from the server. For this, a number of HTTP-based techniques, such as Long Polling, Comet, BOSH, and others, provide persistent connections for server push of asynchronous events. They often get used in conjunction with REST to provide a robust, bi-directional signaling channel. From WebSockets to the various HTTP mechanisms, you will find many of these different techniques used behind the scenes of the various WebRTC JavaScript libraries, depending on the back-end infrastructure they are connecting to.
In addition to the varying signaling channel transport mechanisms, the WebRTC JavaScript libraries differ significantly in the message formats and protocols they use over the channel. For example XMPP, REST, XML, JSON, and even SIP are all used among the different WebRTC JavaScript libraries. Asking which one is “best” can evoke some flame wars, as strong preferences around signaling protocol exist.
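To show how long polling turns request/response HTTP into something that behaves like server push, here is a small in-memory sketch. No network is involved: `FakeLongPollServer` and `LongPollChannel` are hypothetical names, and the "held request" is modeled as a stored callback rather than a real pending HTTP response.

```javascript
// In-memory stand-in for a signaling server that supports long polling.
class FakeLongPollServer {
  constructor() {
    this.pending = []; // messages waiting for a poll request
    this.waiting = null; // a held poll request (its response callback)
  }

  // Client opens a poll; the server answers now if it has a message,
  // otherwise it holds the request open.
  poll(respond) {
    if (this.pending.length > 0) respond(this.pending.shift());
    else this.waiting = respond;
  }

  // Server-side event: deliver via the held request if there is one.
  push(message) {
    if (this.waiting) {
      const respond = this.waiting;
      this.waiting = null;
      respond(message);
    } else {
      this.pending.push(message);
    }
  }
}

// Client-side channel: re-issues the poll after every response, so the
// application sees what looks like a continuous stream of server pushes.
class LongPollChannel {
  constructor(server, onMessage) {
    this.server = server;
    this.onMessage = onMessage;
    this.open = true;
    this.pollOnce();
  }

  pollOnce() {
    if (!this.open) return;
    this.server.poll((message) => {
      if (!this.open) return; // ignore responses after close()
      this.onMessage(message);
      this.pollOnce();
    });
  }

  close() {
    this.open = false;
  }
}

const received = [];
const server = new FakeLongPollServer();
const channel = new LongPollChannel(server, (m) => received.push(m.type));
server.push({ type: "offer" });
server.push({ type: "candidate" });
channel.close();
console.log(received);
```

From the application's point of view the `onMessage` callback looks the same as a WebSocket message handler would, which is why a library can hide the choice of transport behind its API.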
Personally, I’ve come around (the hard way) to a fairly pragmatic stance on this point, and I hesitate to declare that “it has to be” one approach to signaling protocol or another. Let me explain: Initially, I must admit I used to obsess about what protocol the library was using under the hood. In fact, full disclosure: when I first started looking into JavaScript WebRTC signaling, I thought SIP over WebSockets was a bad idea. My rationale was that browsers have high performance JSON parsers already built in, making JSON the de facto preferred data format in JavaScript. Why use anything else? Why do all the extra parsing work to serialize and de-serialize JavaScript objects into strings formatted as SIP messages? I used to argue that a browser shouldn’t have to bother with processing information that is meaningless in a browser context. For example, storing and processing Via and Contact header information, when it can’t possibly contact the UDP/TCP SIP URIs given in those SIP headers. I was convinced that all the baggage of SIP in the browser would expose itself somehow in its usability.
However, after I did a few projects using libraries like JsSIP and sipML5, I started to have a change of heart. I found that in using the library, I was able to accomplish the goals of my projects, and the JavaScript interface to the library could be just as simple and powerful as all the others. I couldn’t find the fatal flaw I was expecting, and it just worked. In all this I started to care less and less about what was happening on the wire between the library and the service, and more about what features of the service I could access through the API. Perhaps some of this protocol bias comes from the traditional communications world, where it is common to specify every aspect of an interface between two things on the network. In the case of a web JS library that is part of a client API to a service, the WebRTC JS library is really an extension of the service, and the interface to it is the JavaScript API. In other words, it is a client API, not a network API. When solutions provide a WebRTC JS library to go with their service, application, or gateway, what happens under the hood is indeed important. But perhaps only to the extent that it performs well, delivers the functionality you need, and hides the protocol complexity behind an elegant JavaScript API.
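The point about a client API versus a network API can be sketched as two wire encoders behind one function. The names `encoders` and `buildOfferMessage` are hypothetical, the SDP value is a placeholder, and the SIP-style output is deliberately simplified: it leaves out headers that a real SIP message needs (Via, From, To, Call-ID, CSeq, and so on), so treat it only as an illustration of the difference in shape.

```javascript
// Two hypothetical wire encoders behind one API. The SIP-style output is
// deliberately simplified and omits headers a real SIP stack requires.
const encoders = {
  json(offer) {
    return JSON.stringify({ type: "offer", to: offer.to, sdp: offer.sdp });
  },
  sipStyle(offer) {
    return [
      `INVITE sip:${offer.to} SIP/2.0`,
      "Content-Type: application/sdp",
      // Assumes an ASCII body, so string length equals byte length.
      `Content-Length: ${offer.sdp.length}`,
      "",
      offer.sdp,
    ].join("\r\n");
  },
};

// The application-facing function is identical regardless of encoder.
function buildOfferMessage(encoderName, offer) {
  const encode = encoders[encoderName];
  if (!encode) throw new Error(`unknown encoder: ${encoderName}`);
  return encode(offer);
}

// Placeholder SDP; a real offer body is much longer.
const offer = { to: "bob@example.com", sdp: "v=0" };
console.log(buildOfferMessage("json", offer));
console.log(buildOfferMessage("sipStyle", offer));
```

The calling code is the same in both cases, which is the sense in which the wire protocol stops mattering to the web developer once the library API is good enough.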
So, that means the important part is the actual functionality and capabilities delivered by a service, via the WebRTC JS library. Beyond the basic common methods, naturally much of this depends on the infrastructure and use case. Whether it is general telephony, conferencing, collaboration, IMS/RCS, customer relationship management, or social networking, the requirements of the web application can be quite different. Sadly we can’t possibly get into a complete analysis of all use case requirements, cross referenced with the features and functionality of the services that are represented by all the different WebRTC JS libraries. However, here are some questions to ask:
- Do I need access to a communications network (e.g. IMS, PSTN)?
- Is there an existing web, or communication architecture I need to integrate with?
- Can I plug into an external directory, or use my own?
- What are my security requirements (see note on security)?
- Do I have specific user experience requirements, like multi-party conferencing, co-browsing, file sharing, or screen share?
- Any requirements for Data Channels?
The WebRTC JS library’s critical role in each requirement is delivering an elegant JavaScript interface to a web client developer, providing value by managing underlying complexity associated with WebRTC media and signaling services. As you start to define what you need and want from WebRTC in your app or service, your requirements should help you make sure that the JavaScript library component of a solution has what you need under the hood.
A Quick Note on Security
I know, I know…I just got done saying don’t sweat what’s under the hood of a WebRTC JS library. However, something to be aware of when it comes to WebRTC JS libraries is your user security requirements, and how the library handles them. For some applications that need to authenticate users, a basic username and password authentication works fine. Many of the WebRTC JS libraries can take user/password information provided by the end user (e.g. from a web form), and pass it to their associated service for verification. However, in some cases this may not be ideal. For example, your app may use a 3rd party identity provider for login, and you want to pass an access or identity token to the library instead, and have the service verify it. Or perhaps you are using shared communications infrastructure, and prefer that credentials in your comms domain not be exposed to web end users. Maybe you need auth information securely stored on a server, with only a hashed or encrypted form of it existing in the browser (remember that all JavaScript is inherently “open” and exposed to anyone that downloads the app). OAuth 2.0 has become a popular mechanism for user authentication and authorization on the web, with support for a variety of advanced web authorization flows. It eschews sending usernames and passwords around, opting instead for shorter lived tokens. In any case, it is important to make sure that your WebRTC JS library (and underlying service) does not inhibit the security mechanisms you require. The ability to fit into advanced web authentication schemes can be a clear differentiator among the options.
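As a rough illustration of what "accepting a token instead of a password" might look like at the API level, here is a minimal sketch. The function `buildAuthPayload`, the `scheme` field, and the credential shapes are all hypothetical; real libraries and services define their own formats, and verifying the token is the job of the service, not of this code.

```javascript
// Hypothetical helper: builds the auth part of a registration request.
function buildAuthPayload(credentials) {
  if (credentials.token) {
    // Token issued elsewhere (e.g. by an identity provider or app server);
    // the service behind the library is expected to verify it.
    return { scheme: "token", token: credentials.token };
  }
  if (credentials.username && credentials.password) {
    return {
      scheme: "password",
      username: credentials.username,
      password: credentials.password,
    };
  }
  throw new Error("no usable credentials supplied");
}

const withToken = buildAuthPayload({ token: "example-short-lived-token" });
const withPassword = buildAuthPayload({ username: "alice", password: "secret" });
console.log(withToken.scheme, "password" in withToken);
console.log(withPassword.scheme);
```

With the token path, the user's long-lived password never has to pass through the WebRTC library at all, which is the property the paragraph above is asking you to look for.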
Documentation
What else is important in a JavaScript library? Documentation, documentation, documentation. A great WebRTC JS API that has poor documentation can cause more frustration, and cost more time, than it is worth. Documentation should provide thorough information about all the properties and methods that the library exposes, along with a generous helping of code samples. Complete working demos can be very useful as well. Easy navigation and a clean, engaging interface to the docs are a bonus. Great documentation will allow your web developers to be up and running in no time, and can help you better evaluate what you might (or might not) be able to do with it. Twilio, Phono, TokBox, and JsSIP come to mind as some of my personal favorites.
So, What’s in a WebRTC JS Library?
With no defined signaling protocol for WebRTC, JavaScript libraries that handle the browser media engine, and offer signaling services are here to stay. Do you think a fragmented landscape of WebRTC JS libraries is a good thing, or a bad thing? While some high level functions are similar among them, good ones are simple and yet extensible, provide access to plenty of information, and are well documented. Most importantly, the best one for you is flexible enough to meet your security, architecture, and functional requirements (no matter what protocol it uses). Let us know what is important to you in a WebRTC JS library!
Author: Reid Stidolph
Doug Pelton says
Thanks for including EasyRTC in this fabulous post. I really like the Easy St and Functional Way picture.
I think two key considerations you could add in choosing an API is whether you need to control your own WebRTC signalling server on premise or in the cloud or use a PaaS,
AND whether the API and/or signalling server are open source or not.
Reid Stidolph says
Thanks Doug! You make a great point about deployment options and open source. Especially when it comes to the JS library, having it open source is definitely nice so you can contribute features, or make your own customization if needed.
Lorenzo Miniero says
Thanks for the interesting read, a really nice insight on what a library needs to provide.
I’ve recently worked on such an effort myself, writing a JavaScript library for our open source Janus WebRTC gateway (which does not appear on the list but I won’t hold a grudge! 🙂 ) and can confirm that I asked myself the same questions in the process, in terms of what to provide and how I mean.
One thing that had me thinking in particular was about the high level vs. low level approach for the library: this is a challenging question in general, as for instance it’s something that was part of the standardization process as well some time ago, which eventually led to JSEP. Eventually, considering the general purpose of the gateway, I chose a slightly lower level approach that does make things easier than using the WebRTC API, but also keeps some flexibility. After all, you can always tailor a higher level API on a lower level for some specific scenario, which is what I think I’ll try and face in the future.
Reid Stidolph says
Hi Lorenzo, thanks for the comments. Also, thanks for bringing Janus to my attention…I added Janus to the listing above.
You have a good memory on the early WebRTC API discussions, and what led to JSEP. I think developers will ultimately appreciate more, not less when it comes to functionality…but having an easy way to quickstart is always appreciated and can only help adoption. As you say, it’s good to start with all the power and functionality in mind, and then make it simpler as you need to.
I’ve been using AngularJS a lot lately, and I must say that there is just no easy way (that I’ve found) to quickstart the concepts of that API. I nearly gave up on it several times while climbing that learning curve, but in the end, I’m glad for all the super powers that you get with it.
Mark says
Thank you for this superlative article. I am particularly happy about that brief mention of AngularJS and its steep learning curve, which leads to my question:
Have you implemented any of this with Angular? Which will be your personal choice with Angular?
Thank you!
Reid Stidolph says
Hey Mark, thanks for the comments. Personally I am a big fan of Angular, but for me the learning curve was steep. I found many of the concepts, conventions, and lingo can be quite abstract at times (directives, data bindings, and factories, oh my 🙂 !), and I really had to commit myself to learning them. That said, once you get the hang of it, it really does give you super powers, and it is amazing how fast you can create awesome, dynamic web apps.
As for implementing Angular with WebRTC (and the WebRTC JS libraries), yes I use it all the time. Angular goes really great with WebRTC, because WebRTC implicitly requires you to maintain a somewhat complex signaling state machine in the browser, and at the same time, manage the browser’s media engine state through the WebRTC API. Both of these result in a lot of asynchronous activity that has to get handled via lots of events. Of course, to make your web app awesome, you need to handle many of the async changes in state, and use them to update information on your HTML (e.g. on incoming call, activate a div containing a call notification and start playing a ringing audio file).
Handling this sort of dynamic state is where Angular really shines. My basic approach has been to gather all the different handlers and methods into a single object, and I define a set of very basic properties that become my Angular data model (e.g. primitives for signaling channel state, incoming call state, and so on). Once you have your data model set up and working, it is a snap to use the Angular data-bindings and directives in your HTML to automatically trigger HTML5 awesomeness whenever your state changes!
Anyways…hope this helps!
venorin says
I am new to webrtc and your articles have been very informative. Thanks.
Please, I would like some clarification.
Since there is no defined signalling protocol, which part of WebRTC makes the signalling data consistent across all signalling protocols, and why and how does it make signalling data consistent?
Chad Hart says
It’s called JSEP and it can be a little complex: https://tools.ietf.org/html/draft-ietf-rtcweb-jsep-11
I recommend looking at one of the many resources that walk through establishing and controlling a peerConnection.
Sayali says
I had a question: is PeerJS a signalling server that we use with WebRTC?