An analysis of the most popular open-source WebRTC repos on GitHub with a review of how WebRTC itself is doing there.
webrtcH4cKS: ~ Part 2: Building an AIY Vision Kit Web Server with UV4L
In part 1 of this series, I showed how to use UV4L with the AIY Vision Kit to send the camera stream and any of the default annotations to any point on the Web with WebRTC. In this post I will build on this by showing how to send image inference data over a WebRTC dataChannel and render annotations in the browser. To do this we will use a basic Python server, tweak some of the Vision Kit samples, and leverage the dataChannel features of UV4L.
To fully follow along you will need to have a Vision Kit and should have completed all the instructions in part 1. If you don’t have a Vision Kit, you still may get some value out of seeing how UV4L’s dataChannels can be used for easily sending data from a Raspberry Pi to your browser application.
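As a rough illustration of what "inference data over a dataChannel" could look like, here is a minimal Python sketch that packs detections into a JSON string a browser could parse and draw. The function name, field names, and the example frame size are all assumptions made for illustration; they are not taken from the Vision Kit samples or from UV4L, and the step of actually writing the string to the data channel is left out.

```python
import json
import time


def build_inference_message(detections, frame_width, frame_height):
    """Serialize detections into a JSON string for a data channel.

    `detections` is a hypothetical list of (label, score, (x, y, w, h))
    tuples in pixels. Boxes are normalized to 0..1 so the browser can
    scale them to whatever size the video element is rendered at.
    """
    objects = []
    for label, score, (x, y, w, h) in detections:
        objects.append({
            "name": label,
            "score": round(score, 3),
            "x": x / frame_width,
            "y": y / frame_height,
            "width": w / frame_width,
            "height": h / frame_height,
        })
    return json.dumps({"timestamp": time.time(), "objects": objects})


# Example: one detected face in an assumed 1640x1232 frame.
message = build_inference_message(
    [("face", 0.92, (410, 308, 820, 616))], 1640, 1232
)
print(message)
```

Normalizing the coordinates on the Pi keeps the browser side simple: it only has to multiply each value by the displayed video width or height before drawing.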
A couple of years ago I did a TADHack where I envisioned a cheap, low-powered camera that could run complex computer vision and stream remotely when needed. After considering what it would take to build something like this myself, I waited patiently for this tech to come. Today with Google’s new AIY Vision Kit, we are pretty much there.
The AIY Vision Kit is a $45 add-on board that attaches to a Raspberry Pi Zero with a Pi 2 camera. The board includes a Vision Processing Unit (VPU) chip that runs TensorFlow image processing graphs super efficiently. The kit comes with a bunch of examples out of the box, but to actually see what the camera sees you need to plug the HDMI into a monitor. That’s not very useful when you want to put your battery-powered kit in a remote location. And while it is nice that the rig does not require any Internet connectivity, that misses out on a lot of the fun applications. So, let’s add some WebRTC to the AIY Vision Kit to let it stream over the web.
webrtcH4cKS: ~ Computer Vision on the Web with WebRTC and TensorFlow
TensorFlow is one of the most popular Machine Learning frameworks out there – probably THE most popular one. One of the great things about TensorFlow is that many libraries are actively maintained and updated. One of my favorites is the TensorFlow Object Detection API. The TensorFlow Object Detection API classifies and provides the location of multiple objects in an image. It comes pre-trained on nearly 1000 object classes with a wide variety of pre-trained models that let you trade off speed vs. accuracy.
Decoding video when there is packet loss is not an easy task. Recent Chrome versions have been plagued by video corruption issues related to a new video jitter buffer introduced in Chrome 58. These issues are hard to debug since they occur only when certain packets are lost. To combat these issues, webrtc.org has a pretty powerful tool called video_replay for reproducing and analyzing them. When I saw another video corruption issue filed by Stian Selnes, I told him about that tool. With an easy reproduction of the stream, the WebRTC video team at Google made short work of the bug. Unfortunately, this process is not too well documented, so we asked Stian to walk us through the process of capturing the necessary data and using the video_replay tool. Stian, who works at Pexip, has been dealing with real-time communication for more than 10 years. He has experience in large parts of the media stack with a special interest in video codecs and other types of signal processing, network protocols and error resilience.
Long have WebRTC developers waited for the day Apple would come around to WebRTC. It has not been simple for web developers and Apple due to Apple’s policy that requires web browsing functionality to use the WebKit engine along with Safari. This meant no WebRTC in Safari, no Firefox or Chrome WebRTC on iOS, and no native WebView with WebRTC or iOS APIs (but plenty of 3rd party ones). Despite community efforts and active development inside the WebKit project, it was not entirely clear when there would be a launch. That changed earlier this month when Apple announced that a WebRTC-enabled WebKit based on the Google-backed webrtc.org engine was coming to both High Sierra – the next version of macOS – and iOS 11. Even better, WebRTC is available today as part of the free Safari Technology Preview.
Media servers, server-side media handling devices, continue to be a popular topic of discussion in WebRTC. One reason for this is that they are the most complex elements in a VoIP architecture, which lends itself to differing approaches and misunderstandings. Putting WebRTC media servers in the cloud and reliably scaling them is even harder. Fortunately there are several community experts with deep expertise in this domain to help. One of those experts who has always been happy to share his learnings is past webrtcHacks guest author Luis López Fernández.
webrtcH4cKS: ~ Let’s Encrypt – how to get free SSL for WebRTC
Way back in 47 (version that is), Chrome started to mandate the use of HTTPS in conjunction with getUserMedia. To use HTTPS you need an SSL/TLS certificate. Xander Dumaine covered this a bit for us before, but I still see a lot of people out there struggle with it. As it so happens, the certificate for my own personal website is about to expire and I’m not too excited about paying $70/year to renew it. Fortunately, there is a new way to get certificates for free through Let’s Encrypt. Let’s Encrypt is a non-profit certificate authority that formed with the backing of many major industry players like Mozilla, Akamai, Cisco, and many others to simplify and automate the process of setting up encryption for your website. Oh, and it’s completely free.
webrtcH4cKS: ~ Optimizing video quality using Simulcast (Oscar Divorra)
Dealing with multi-party video infrastructure can be pretty daunting. The good news is the technology, products, and standards to enable economical multiparty video in WebRTC have matured quite a bit in the past few years. One of the key underlying technologies enabling some of this change is called simulcast. Simulcast has been an occasional sub-topic here at webrtcHacks in the past and it is time we gave it more dedicated attention.
To do that we asked Oscar Divorra Escoda, Tokbox’s Senior Media Scientist and Media Cloud Engineering Lead, to walk us through it. Tokbox was one of the first to market with an SFU and Oscar shares some of his learnings below.
Conference calling is a multi-billion dollar industry that is mostly powered by expensive, high-powered conferencing servers. Now you can replicate much of this functionality for free with a modern browser using the combination of WebRTC and WebAudio.
Like with video, multi-party audio can utilize a few architectures:
- Full mesh – each client sends their audio to every other client; the individual streams are then combined locally before they come out of your speaker
- Mixed with a conferencing server acting as a Multipoint Control Unit (MCU) – the MCU combines each stream and sends a single set to each client
- Routed with a conferencing server in a Selective Forwarding Unit (SFU) mode – each client sends a single stream to the server where it is replicated and sent to the others
This architecture represents a fourth type: a client-mixed type where one of the clients acts like the server. This provides the server-less benefits of mesh conferencing without the excessive bandwidth usage and stream management challenges.
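To make the bandwidth trade-off concrete, the sketch below counts how many audio streams a participant sends and receives under each of the four architectures. It is a simplified model built only from the descriptions above (one stream per participant, no simulcast or silence suppression), and the function and key names are made up for illustration.

```python
def stream_counts(architecture, participants, is_mixing_client=False):
    """Return (streams_sent, streams_received) for one participant.

    Simplified model: every participant produces exactly one audio
    stream. `is_mixing_client` only matters for "client_mixed", where
    one participant plays the role a server would otherwise play.
    """
    others = participants - 1
    if architecture == "mesh":
        # Send to and receive from every other client directly.
        return (others, others)
    if architecture == "mcu":
        # One stream up, one pre-mixed stream down.
        return (1, 1)
    if architecture == "sfu":
        # One stream up, the server forwards everyone else's down.
        return (1, others)
    if architecture == "client_mixed":
        if is_mixing_client:
            # The mixing client exchanges a stream with each peer.
            return (others, others)
        # Everyone else talks only to the mixing client.
        return (1, 1)
    raise ValueError(f"unknown architecture: {architecture}")


for arch in ("mesh", "mcu", "sfu", "client_mixed"):
    print(arch, stream_counts(arch, participants=4))
```

In this model only the mixing client carries mesh-like load in the client-mixed case; every other participant sees the same single stream up and down that an MCU would give them.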