WebRTC has made getting and sending real time video streams (mostly) easy. The next step is doing something with them, and machine learning lets us have some fun with those streams. Last month I showed how to run Computer Vision (CV) locally in the browser. As I mentioned there, local is nice, but sometimes more performance is needed so you need to run your Machine Learning inference on a remote server. In this post I’ll review how to run OpenCV models server-side with hardware acceleration on Intel chipsets using Intel’s open source Open WebRTC Toolkit (OWT).
webrtcH4cKS: ~ Stop touching your face using a browser and TensorFlow.js
Don’t touch your face! To prevent the spread of disease, health bodies recommend not touching your face with unwashed hands. This is easier said than done if you are sitting in front of a computer for hours. I wondered, is this a problem that can be solved with a browser?
We have a number of computer vision + WebRTC experiments here. Experimenting with running computer vision locally in the browser using TensorFlow.js has been on my bucket list and this seemed like a good opportunity. A quick search revealed somebody already thought of this 2 week ago. That site used a model that requires some user training – which is interesting but can make it flaky. It also wasn’t open source for others to expand on, so I did some social distancing via coding isolation over the weekend to see what was possible.
webrtcH4cKS: ~ Smile, You’re on WebRTC – Using ML Kit for Smile Detection
Now that it is getting relatively easy to setup video calls (most of the time), we can move on to doing fun things with the video stream. With new advancements in Machine Learning (ML) and a growing number of API’s and libraries out there, computer vision is also getting easier to do. Google’s ML Kit is a recent example of a new machine learning based library that makes gives quick access to computer vision outputs.
To show how to use Google’s new ML Kit to detect user smiles on a live WebRTC stream, I would like to welcome back past webrtcHacks author and WebRTC video master Gustavo Garcia Bernardo of Houseparty. Joining him I would like to also welcome mobile WebRTC expert, Roberto Perez of TokBox. They give some background on doing facial detection, show some code samples, but more importantly share their learnings for optimum configuration of smile detection inside a Real Time Communications (RTC) app.
webrtcH4cKS: ~ Part 2: Building a AIY Vision Kit Web Server with UV4L
In part 1 of this set, I showed how one can use UV4L with the AIY Vision Kit send the camera stream and any of the default annotations to any point on the Web with WebRTC. In this post I will build on this by showing how to send image inference data over a WebRTC dataChannel and render annotations in the browser. To do this we will use a basic Python server, tweak some of the Vision Kit samples, and leverage the dataChannel features of UV4L.
To fully follow along you will need to have a Vision Kit and should have completed all the instructions in part 1. If you don’t have a Vision Kit, you still may get some value out of seeing how UV4L’s dataChannels can be used for easily sending data from a Raspberry Pi to your browser application.
A couple years ago I did a TADHack where I envisioned a cheap, low-powered camera that could run complex computer vision and stream remotely when needed. After considering what it would take to build something like this myself, I waited patiently for this tech to come. Today with Google’s new AIY Vision kit, we are pretty much there.
The AIY Vision Kit is a $45 add-on board that attaches to a Raspberry Pi Zero with a Pi 2 camera. The board includes a Vision Processing Unit (VPU) chip that runs Tensor Flow image processing graphs super efficiently. The kit comes with a bunch of examples out of the box, but to actually see what the camera see’s you need to plug the HDMI into a monitor. That’s not very useful when you want to put your battery powered kit in a remote location. And while it is nice that the rig does not require any Internet connectivity, that misses out on a lot of the fun applications. So, let’s add some WebRTC to the AIY Vision Kit to let it stream over the web.
webrtcH4cKS: ~ Computer Vision on the Web with WebRTC and TensorFlow
TensorFlow is one of the most popular Machine Learning frameworks out there – probably THE most popular one. One of the great things about TensorFlow is that many libraries are actively maintained and updated. One of my favorites is the TensorFlow Object Detection API. The Tensorflow Object Detection API classifies and provides the location of multiple objects in an image. It comes pre-trained on nearly 1000 object classes with a wide variety of pre-trained models that let you trade off speed vs. accuracy.
Sending real time communications from point A to point B? That functionality is relatively easy with WebRTC. Processing the media in real time to do something cool with it? That is an area I find a lot more interesting, but it is a lot tougher to do. When I was building my Motion Detecting Baby Monitor project, I wished I had some kind of media server to handle the motion detection processing. That would give me some flexibility to take the processor intensive algorithm off of my phone and stick that in the cloud if I wanted to save on battery. That also got me thinking – if you can do motion detection why not apply other more advanced image processing algorithms to the WebRTC stream? How about facial recognition, object detection, gesture tracking or many of the other cool features that are popping up all the time in the popular Open Source Computer Vision (OpenCV) project? I wrote this dream off as science fiction for another year or two.