WebRTC has made getting and sending real-time video streams (mostly) easy. The next step is doing something with them, and machine learning lets us have some fun with those streams. Last month I showed how to run Computer Vision (CV) locally in the browser. As I mentioned there, local is nice, but sometimes you need more performance, which means running your Machine Learning inference on a remote server. In this post I’ll review how to run OpenCV models server-side with hardware acceleration on Intel chipsets using Intel’s open source Open WebRTC Toolkit (OWT).
webrtcH4cKS: ~ Stop touching your face using a browser and TensorFlow.js
Don’t touch your face! To prevent the spread of disease, health bodies recommend not touching your face with unwashed hands. This is easier said than done if you are sitting in front of a computer for hours. I wondered, is this a problem that can be solved with a browser?
We have a number of computer vision + WebRTC experiments here. Experimenting with running computer vision locally in the browser using TensorFlow.js has been on my bucket list, and this seemed like a good opportunity. A quick search revealed that somebody had already thought of this 2 weeks ago. That site used a model that requires some user training – which is interesting but can make it flaky. It also wasn’t open source for others to expand on, so I did some social distancing via coding isolation over the weekend to see what was possible.
webrtcH4cKS: ~ Perfect Negotiation
Series preface: We generally lean toward long posts here at webrtcHacks, but not all interesting topics warrant a lot of new text. Sometimes briefer is better. To better address the many topics that fit into this category, we are starting a new Minimum Duration series. This is the first post in the series, covering Perfect Negotiation.
What is Perfect Negotiation and why do we need it?
Long ago the WebRTC specification designers settled on leaving the signaling mechanism between two WebRTC peers up to the application. This means your code needs to pass Session Description Protocol (SDP) messages back and forth and hand them to the peerConnection API. Today WebRTC implementations also almost universally use Trickle-ICE, an extension to Interactive Connectivity Establishment (ICE) that sends potential network paths between the peers asynchronously as they are discovered, so a connection can be established as soon as possible. The asynchronous but time-sensitive nature of all this means glare conditions can occur: situations where both sides make updates at the same time, causing their state machines to get out of sync. Differences in how developers implement their code and variances between browsers make this worse.
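Perfect Negotiation resolves glare by giving each peer a role: one is "polite" and the other "impolite". When an incoming offer collides with an offer a peer is making itself, the impolite peer ignores the incoming offer and keeps its own, while the polite peer rolls back its own offer and accepts the remote one. The sketch below isolates that decision rule as a pure TypeScript function, loosely modeled on the perfect negotiation example in the W3C WebRTC specification. The `NegotiationState` type, the `handleIncomingDescription` function and the `Decision` strings are hypothetical names for illustration, and the spec's example includes extra handling (for instance, for an answer that is still being applied) that is omitted here. In a real application the decision would drive calls on your peerConnection instead of returning a string.

```typescript
// Subset of RTCSignalingState values relevant to glare detection.
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";

// Hypothetical snapshot of one peer's negotiation state.
interface NegotiationState {
  polite: boolean;        // exactly one side of the call should be polite
  makingOffer: boolean;   // true while this side is creating/sending an offer
  signalingState: SignalingState;
}

type Decision = "ignore" | "accept-with-rollback" | "accept";

// Decide what to do with an incoming description using the
// polite/impolite rule from the Perfect Negotiation pattern.
function handleIncomingDescription(
  state: NegotiationState,
  descriptionType: "offer" | "answer",
): Decision {
  // Glare: an offer arrives while we are making (or have sent) our own offer.
  const offerCollision =
    descriptionType === "offer" &&
    (state.makingOffer || state.signalingState !== "stable");

  if (offerCollision) {
    // The impolite peer keeps its own offer; the polite peer yields.
    return state.polite ? "accept-with-rollback" : "ignore";
  }
  return "accept";
}

// Usage: both peers send offers at the same time.
const politePeer: NegotiationState = { polite: true, makingOffer: true, signalingState: "have-local-offer" };
const impolitePeer: NegotiationState = { polite: false, makingOffer: true, signalingState: "have-local-offer" };
console.log(handleIncomingDescription(politePeer, "offer"));   // accept-with-rollback
console.log(handleIncomingDescription(impolitePeer, "offer")); // ignore
```

Because the roles are fixed up front, both peers reach complementary decisions without any extra signaling round trip, which is what keeps their state machines in sync.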
webrtcH4cKS: ~ Does Chromium-based Edge’s WebRTC Look Like Chrome?
WebRTC has a new browser – kind of. Yesterday Microsoft’s “new” Edge browser based on Chromium – commonly referred to as Edgium – went GA. This certainly will make life easier for WebRTC developers, since the previous Edge had many differences from other implementations. The big question is: how different is Edgium from Chrome for WebRTC usage?
The short answer is that there is no real difference, but you can read below for background details on the tests I ran. If you’re new to WebRTC, the rundown may give you some ideas for testing your own product.
webrtcH4cKS: ~ and the WebRTC Open Source Popularity Contest Winner is…
webrtcH4cKS: ~ Part 2: Building an AIY Vision Kit Web Server with UV4L
In part 1 of this series, I showed how to use UV4L with the AIY Vision Kit to send the camera stream and any of the default annotations to any point on the Web with WebRTC. In this post I will build on this by showing how to send image inference data over a WebRTC dataChannel and render annotations in the browser. To do this we will use a basic Python server, tweak some of the Vision Kit samples, and leverage the dataChannel features of UV4L.
To fully follow along you will need a Vision Kit and should have completed all the instructions in part 1. If you don’t have a Vision Kit, you may still get some value out of seeing how UV4L’s dataChannels can be used to easily send data from a Raspberry Pi to your browser application.
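On the browser side, rendering annotations mostly comes down to parsing each dataChannel message and scaling the boxes from the camera's resolution to the size of the displayed video element. The sketch below assumes a hypothetical JSON message format (`frameWidth`, `frameHeight`, and a list of `objects` with `label`, `score`, and `box` as `[x, y, width, height]` in camera pixels); the real format depends on what your Python server sends. The `DetectionMessage`, `Annotation` and `toAnnotations` names are illustrative only. In the browser you would typically call a function like this from the data channel's `onmessage` handler and draw the results on a canvas overlaid on the `<video>` element.

```typescript
// Hypothetical message format sent by the Pi over the dataChannel.
interface DetectionMessage {
  frameWidth: number;   // camera frame width in pixels
  frameHeight: number;  // camera frame height in pixels
  objects: { label: string; score: number; box: [number, number, number, number] }[];
}

// Rectangle in the coordinate space of the displayed video element.
interface Annotation { label: string; x: number; y: number; width: number; height: number }

// Parse one dataChannel message and scale its boxes to the display size.
function toAnnotations(raw: string, displayWidth: number, displayHeight: number): Annotation[] {
  const msg = JSON.parse(raw) as DetectionMessage;
  const sx = displayWidth / msg.frameWidth;
  const sy = displayHeight / msg.frameHeight;
  return msg.objects.map(({ label, box: [x, y, w, h] }) => ({
    label,
    x: x * sx,
    y: y * sy,
    width: w * sx,
    height: h * sy,
  }));
}

// Usage: a 1640x1232 frame shown in an 820x616 element halves every coordinate.
const sample = JSON.stringify({
  frameWidth: 1640,
  frameHeight: 1232,
  objects: [{ label: "face", score: 0.9, box: [400, 200, 100, 120] }],
});
const [face] = toAnnotations(sample, 820, 616);
console.log(face.x, face.y, face.width, face.height); // 200 100 50 60
```

Sending coordinates in camera pixels and scaling in the browser keeps the Pi side simple and lets the page resize the video freely.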
A couple of years ago I did a TADHack where I envisioned a cheap, low-powered camera that could run complex computer vision and stream remotely when needed. After considering what it would take to build something like this myself, I waited patiently for the technology to arrive. Today, with Google’s new AIY Vision Kit, we are pretty much there.
The AIY Vision Kit is a $45 add-on board that attaches to a Raspberry Pi Zero with a Pi Camera v2. The board includes a Vision Processing Unit (VPU) chip that runs TensorFlow image processing graphs very efficiently. The kit comes with a bunch of examples out of the box, but to actually see what the camera sees you need to connect the HDMI output to a monitor. That’s not very useful when you want to put your battery-powered kit in a remote location. And while it is nice that the rig does not require any Internet connectivity, that misses out on a lot of the fun applications. So, let’s add some WebRTC to the AIY Vision Kit to let it stream over the web.
webrtcH4cKS: ~ Computer Vision on the Web with WebRTC and TensorFlow
TensorFlow is one of the most popular Machine Learning frameworks out there – probably THE most popular one. One of the great things about TensorFlow is that many of its libraries are actively maintained and updated. One of my favorites is the TensorFlow Object Detection API. The TensorFlow Object Detection API classifies and provides the location of multiple objects in an image. It comes pre-trained on nearly 1000 object classes, with a wide variety of pre-trained models that let you trade off speed vs. accuracy.
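Models exported from the Object Detection API return their results as parallel arrays, typically named `detection_boxes`, `detection_scores` and `detection_classes`: one normalized box in `[ymin, xmin, ymax, xmax]` order, one confidence score, and one class id per detection. A common post-processing step is to drop low-confidence detections and convert the boxes to pixel coordinates. A minimal sketch, assuming the outputs have already been read into plain arrays; the `Detection` type, the `postprocess` function and the 0.5 default threshold are illustrative choices, not part of the API.

```typescript
// One filtered detection in pixel coordinates.
interface Detection { classId: number; score: number; left: number; top: number; right: number; bottom: number }

// Filter raw detector outputs and convert normalized boxes to pixels.
// boxes[i] is [ymin, xmin, ymax, xmax], each in the range 0..1.
function postprocess(
  boxes: number[][],
  scores: number[],
  classes: number[],
  imageWidth: number,
  imageHeight: number,
  minScore = 0.5,
): Detection[] {
  const results: Detection[] = [];
  for (let i = 0; i < scores.length; i++) {
    if (scores[i] < minScore) continue; // skip low-confidence detections
    const [ymin, xmin, ymax, xmax] = boxes[i];
    results.push({
      classId: classes[i],
      score: scores[i],
      left: Math.round(xmin * imageWidth),
      top: Math.round(ymin * imageHeight),
      right: Math.round(xmax * imageWidth),
      bottom: Math.round(ymax * imageHeight),
    });
  }
  return results;
}

// Usage with made-up outputs for a 640x480 image.
const dets = postprocess(
  [[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]],
  [0.92, 0.3],
  [1, 18],
  640,
  480,
);
console.log(dets.length); // 1 (the 0.3-score detection is dropped)
console.log(dets[0].left, dets[0].top, dets[0].right, dets[0].bottom); // 128 48 384 240
```

The threshold is the main knob here: raising it trades missed objects for fewer false positives, independently of which speed/accuracy model you picked.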
Decoding video when there is packet loss is not an easy task. Recent Chrome versions have been plagued by video corruption issues related to a new video jitter buffer introduced in Chrome 58. These issues are hard to debug since they occur only when certain packets are lost. To combat them, webrtc.org has a powerful tool called video_replay for reproducing and analyzing such issues. When I saw another video corruption issue filed by Stian Selnes, I told him about that tool. With an easy reproduction of the stream, the WebRTC video team at Google made short work of the bug. Unfortunately this process is not well documented, so we asked Stian to walk us through capturing the necessary data and using the video_replay tool. Stian, who works at Pexip, has been dealing with real-time communication for more than 10 years. He has experience in large parts of the media stack, with a special interest in video codecs and other types of signal processing, network protocols, and error resilience.
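Since these bugs only show up when particular packets go missing, it helps to know which packets were lost in a captured stream. RTP packets carry a 16-bit sequence number that increases by one per packet and wraps around after 65535, so losses show up as gaps between consecutive sequence numbers. A minimal sketch of that gap detection, assuming you already have the sequence numbers in arrival order and ignoring reordering and duplicates; `findLostSequenceNumbers` is a hypothetical helper and is not part of video_replay.

```typescript
// Return the RTP sequence numbers missing between consecutive received packets.
// Sequence numbers are 16-bit and wrap from 65535 back to 0.
// Simplification: assumes in-order arrival (a reordered packet would look like a huge gap).
function findLostSequenceNumbers(received: number[]): number[] {
  const lost: number[] = [];
  for (let i = 1; i < received.length; i++) {
    // Forward distance modulo 2^16 handles the wrap-around.
    const delta = (received[i] - received[i - 1] + 65536) % 65536;
    for (let k = 1; k < delta; k++) {
      lost.push((received[i - 1] + k) % 65536);
    }
  }
  return lost;
}

// Usage: packets 65535 and 1 are missing, one of them across the wrap-around.
console.log(findLostSequenceNumbers([65533, 65534, 0, 2, 3])); // [ 65535, 1 ]
```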
Long have WebRTC developers waited for the day Apple would come around to WebRTC. It has not been simple for web developers and Apple alike, due to Apple’s policy that requires web browsing functionality on iOS to use WebKit, the engine behind Safari. This meant no WebRTC in Safari, no Firefox or Chrome WebRTC on iOS, and no native WebView with WebRTC or iOS APIs (but plenty of third-party ones). Despite community efforts and active development inside the WebKit project, it was not entirely clear when WebRTC would launch. That changed earlier this month when Apple announced that a WebRTC-enabled WebKit based on the Google-backed webrtc.org engine was coming to both High Sierra (the next version of macOS) and iOS 11. Even better, WebRTC is available today as part of the free Safari Technology Preview.