In part 1 of this set, I showed how one can use UV4L with the AIY Vision Kit to send the camera stream and any of the default annotations to any point on the Web with WebRTC. In this post I will build on that by showing how to send image inference data over a WebRTC DataChannel and render annotations in the browser. To do this we will use a basic Python server, tweak some of the Vision Kit samples, and leverage the DataChannel features of UV4L.
To fully follow along you will need to have a Vision Kit and should have completed all the instructions in part 1. If you don’t have a Vision Kit, you still may get some value out of seeing how UV4L’s dataChannels can be used for easily sending data from a Raspberry Pi to your browser application.
Just let me try it
Have an AIY Vision Kit and want to see the project before you read? Even if you do want to read, use this as the setup. Here is what you need to do:
- Buy an AIY Vision Kit – they probably won’t be in stock again until Spring sometime
- Follow the Vision Kit Assembly Guide to build it
- Install UV4L:
```
curl http://www.linux-projects.org/listing/uv4l_repo/lpkey.asc | sudo apt-key add -
echo "deb http://www.linux-projects.org/listing/uv4l_repo/raspbian/stretch stretch main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install -y uv4l uv4l-raspicam uv4l-raspicam-extras uv4l-webrtc-armv6 uv4l-raspidisp uv4l-raspidisp-extras
```
- Install git if you don’t have it: sudo apt-get install git
- Clone the repo: git clone https://github.com/webrtcHacks/aiy_vision_web_server.git
- Go to the directory: cd aiy_vision_web_server/
- Install Python dependencies: python3 setup.py install
- Turn the default Joy Detection demo off: sudo systemctl stop joy_detection_demo.service
- Run the server: python3 server.py
- Point your web browser to http://raspberrypi.local:5000 or whatever you set your hostname or IP address to
If you reboot…
The default Joy Detection Demo is loaded as a system service and will start up again every time you boot. To permanently disable the service just run:
```
sudo systemctl disable joy_detection_demo.service
```
Environment changes
The original AIY kit code/image used a Python Virtual Environment that had to be loaded before running any of the Python commands. This was changed in the Feb 21 image so you don’t need the virtual environment. You might need to run sudo if you get errors.
Architecture
In the Computer Vision on the Web with WebRTC and TensorFlow post, I showed how to build a computer vision server with TensorFlow and how to send frames from a WebRTC stream to it. This project is different: here the local device – a Raspberry Pi Zero with a camera – generates the WebRTC stream and does the image processing itself. We will use a browser client to see what the Vision Kit sees and to draw the annotations.
The architecture of this project looks like this:
Hardware
On the hardware side, a Pi Camera v2 and the Vision Bonnet attach to a Raspberry Pi Zero W. I covered this a lot in part 1, so check back there for details.
WebRTC & DataChannels
Then for software we will use UV4L on the Pi Zero to manage all our WebRTC. UV4L includes a WebSocket-based signaling server, so we do not need to write any WebRTC logic on the server. This is one less thing to implement, but it also means we lose easy use of that WebSocket to transmit our data from the server to the client. We could build an additional WebSocket interface or some polling method, but there is an easier way – UV4L includes a gateway between Linux’s socket interface and a WebRTC DataChannel. Since the PeerConnection is already there, we just add a DataChannel to it and use that to send our inference and annotation data to the browser.
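To make the gateway concrete, here is a minimal sketch (not part of the project code) of the server side of that bridge, stripped of the threading and reconnection handling the real server.py adds later in this post. The socket path is the default one created by uv4l-raspidisp-extras:

```python
import json
import os
import socket

# default socket path set up by uv4l-raspidisp-extras; server.py below uses the same one
SOCKET_PATH = '/tmp/uv4l-raspidisp.socket'

if os.path.exists(SOCKET_PATH):
    os.unlink(SOCKET_PATH)

# our process listens on the UNIX socket; uv4l-raspidisp connects to it
# when a browser peer opens the DataChannel
s = socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET)
s.bind(SOCKET_PATH)
s.listen(1)

connection, _ = s.accept()  # blocks until a browser connects and UV4L dials in
connection.send(json.dumps({'hello': 'browser'}).encode())  # arrives as a DataChannel message
connection.close()
s.close()
```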
Web server & client
We will also run a Python-based server that interfaces with the Vision Bonnet and uses Flask as our web server. Finally, our browser client just needs to handle signaling over a WebSocket and receive the DataChannel and video stream from the Pi Zero so it can display our annotations.
Each of these components will be explained in a lot more detail below.
Code
I will not be going line-by-line in order with any of the code below, but I will touch on the main pieces. You can follow along with the code in the repo.
server.py
server.py has 3 main elements:
- Flask as our webserver
- The AIY inference module to run our computer vision
- Socket communications which UV4L will forward over the WebRTC DataChannel
I will describe each of these individually.
Server setup
Our main server code needs to do 2 things:
- Set up Flask to serve static content for our web server
- Spawn the inference and socket threads
Let’s do the Flask setup first.
Web Server
There isn’t much to this –
```python
from flask import Flask, Response

# other functions to be covered later

# Web server setup
app = Flask(__name__)


def flask_server():
    app.run(debug=False, host='0.0.0.0', threaded=True)  # use_reloader=False


@app.route('/')
def index():
    return Response(open('static/index.html').read(), mimetype="text/html")


def main(webserver):
    # I'll cover this next


if __name__ == '__main__':
    main(app)
```
Threads
Working in Node.js for many years made me forget how much I hate dealing with threads.
We need to run 3 threads to make this all work:
- The Flask code above that we will run in the main thread
- A thread for inference
- A thread for our socket communication
Since we need to pass data from our inference thread to the socket thread, we will also need an inter-thread communications mechanism. We will use the queue module to pass data and an Event from the threading module as a global flag the threads can check to see if they should shut down.
Here are the imports and globals:
```python
from threading import Thread, Event
import queue

q = queue.Queue(maxsize=1)  # we'll use this for inter-thread communication
```
Now we can finish our main function:
```python
def main(webserver):
    is_running = Event()
    is_running.set()

    # run this independent of a flask connection so we can test it with the uv4l console
    socket_thread = Thread(target=socket_data, args=(is_running, 1 / args.framerate,))
    socket_thread.start()

    # thread for running AIY Tensorflow inference
    detection_thread = Thread(target=run_inference,
                              args=(is_running, args.model, args.framerate, args.cam_mode, args.hres, args.vres,))
    detection_thread.start()

    # run Flask in the main thread
    webserver.run(debug=False, host='0.0.0.0')

    # close threads when flask is done
    print("exiting...")
    is_running.clear()
    detection_thread.join(0)
    socket_thread.join(0)
```
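One thing main() glosses over is where args comes from – the repo parses these values from the command line. Here is a sketch of what that parser looks like; only the -m/--model flag is confirmed by the python3 server.py -m object example later in this post, so treat the other flag names as assumptions and check the repo for the real ones:

```python
import argparse

# command-line arguments used by main() and run_inference()
# defaults mirror the run_inference() signature shown below
parser = argparse.ArgumentParser(description='AIY Vision Kit web server')
parser.add_argument('-m', '--model', default='face',
                    help="which model to run: 'face' or 'object'")
parser.add_argument('--framerate', type=int, default=15, help='camera capture framerate')
parser.add_argument('--cam_mode', type=int, default=5, help='PiCamera sensor mode')
parser.add_argument('--hres', type=int, default=1640, help='horizontal capture resolution')
parser.add_argument('--vres', type=int, default=922, help='vertical capture resolution')
args = parser.parse_args()
```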
Running Inference
The Vision Kit comes with several examples of how to run various inference models. In my setup I cared about 2 – the object detection and face detection models. There is no reason this would not work with the other models, but those just provide a label and that is not really relevant for realtime web-based annotations.
Make sure you include the relevant AIY libraries along with the PiCamera library:
```python
from aiy.vision.leds import Leds
from aiy._drivers._rgbled import PrivacyLed
from aiy.vision.inference import CameraInference
from aiy.vision.models import object_detection, face_detection
from picamera import PiCamera
```
I also included the PrivacyLed just to make it easier to see when the camera is on. Note the Leds code changed in the Feb 21 image.
Much like in the Computer Vision on the Web with WebRTC and TensorFlow project, we will eventually need to convert our inference output to JSON. To make this easy, we’ll make a small helper class:
```python
# helper class to convert inference output to JSON
class ApiObject(object):
    def __init__(self):
        self.name = "webrtcHacks AIY Vision Server REST API"
        self.version = "0.0.1"
        self.numObjects = 0
        self.objects = []

    def to_json(self):
        return json.dumps(self.__dict__)
```
Now let’s stub out what we want our inference function to look like:
```python
def run_inference(run_event, model="face", framerate=15, cammode=5, hres=1640, vres=922):
```
You’ll see the run_event argument a few times – that is a cross-thread event used to signal when to shut down the function so its thread can be stopped. The model parameter lets you choose between the face or object detection models. The other arguments let you configure the camera – I left these here to help with optimizations between image quality, frame rate, CPU, battery, and bandwidth. For more on camera initialization parameters, see the PiCam v2 docs.
Then we setup our camera with the parameters provided:
```python
leds = Leds()  # instantiate the LED interface imported above so PrivacyLed can use it

with PiCamera() as camera, PrivacyLed(leds):
    camera.sensor_mode = cammode
    camera.resolution = (hres, vres)
    camera.framerate = framerate
    camera.start_preview(fullscreen=True)
```
In addition to initializing the camera above, we also started the Privacy LED.
Next we choose our TensorFlow model:
```python
if model == "object":
    tf_model = object_detection.model()
elif model == "face":
    tf_model = face_detection.model()
else:
    print("No tensorflow model or invalid model specified - exiting..")
    camera.stop_preview()
    os._exit(0)
    return
```
Now the fun part – running inference:
```python
# open an inference session on the Vision Bonnet with the selected model
# (CameraInference was imported above)
with CameraInference(tf_model) as inference:
    for result in inference.run():

        # exit on shutdown
        if not run_event.is_set():
            camera.stop_preview()
            return

        output = ApiObject()
```
There isn’t much here other than an infinite loop that keeps yielding new inference data into result. We’ll exit the loop if we notice the run_event is cleared.
Object Detection Model
Next we will need separate handlers for that result object. Let’s do the object detection model first:
```python
# handler for the AIY Vision object detection model
if model == "object":
    output.threshold = 0.3
    objects = object_detection.get_objects(result, output.threshold)

    for obj in objects:
        # print(object)
        item = {
            'name': 'object',
            'class_name': obj._LABELS[obj.kind],
            'score': obj.score,
            'x': obj.bounding_box[0] / capture_width,
            'y': obj.bounding_box[1] / capture_height,
            'width': obj.bounding_box[2] / capture_width,
            'height': obj.bounding_box[3] / capture_height
        }

        output.numObjects += 1
        output.objects.append(item)
```
The AIY Kit comes with an object_detection library that takes the inference result and a minimum score threshold, since the model will often return lots of values with low scores. It outputs a bounding box with x & y coordinates and a pixel width and height, a score, and a class_name corresponding to each object it saw. The built-in object detection model only includes 3 classes – person, cat, and dog. I am not totally sure why they limited this to just 3, but I have people and cats in my house to test on so this is an ok demo model for me. (Note: after more testing I think this is limited due to performance – see that section later on.)
I hardcoded the input threshold to 30% (0.3) – I guess I could make this a parameter, but as we’ll see in the next section, there is not a corresponding input parameter in that model.
After that, this is very similar to what I did in the Computer Vision on the Web with WebRTC and TensorFlow project – I add each item to its own object so it can be converted to JSON nicely. On the bounding box – unlike the TensorFlow Object Detection API, where the result was a percentage you could then multiply by the image dimensions, the Vision Kit returns actual pixel coordinates. To align this with my previous project, I needed to convert them to percentages by dividing by the capture dimensions (the capture_width and capture_height used above).
Face Detection Model
The face detection model is simpler:
```python
# handler for the AIY Vision face detection model
elif model == "face":
    faces = face_detection.get_faces(result)

    for face in faces:
        print(face)
        item = {
            'name': 'face',
            'score': face.face_score,
            'joy': face.joy_score,
            'x': face.bounding_box[0] / capture_width,
            'y': face.bounding_box[1] / capture_height,
            'width': face.bounding_box[2] / capture_width,
            'height': face.bounding_box[3] / capture_height,
        }

        output.numObjects += 1
        output.objects.append(item)
```
This model just takes the result and returns a face_score, bounding box coordinates in the same format as the object detection model, and a joy_score.
Lastly, we take this data and send it to the console and to the socket if the socket is connected (more on that next) before we repeat the loop:
```python
# No need to do anything else if there are no objects
if output.numObjects > 0:
    output_json = output.to_json()
    print(output_json)

    # Send the json object if there is a socket connection
    if socket_connected is True:
        q.put(output_json)
```
Socket
As I explained in the Architecture section above, UV4L provides a bridge between Linux’s built-in sockets and the WebRTC DataChannel. This means you just need to use Python’s socket module and do not need to touch any WebRTC code inside Python. There is an example of how to do this here.
We do not need to do anything fancy for our example – just send the output_json we created in the previous section to our client. That part is simple, but I found the control loop logic to run this in a single thread – managing clients connecting and disconnecting while still being able to exit the thread cleanly on shutdown – to be less than straightforward. Coming from more of a Node.js background where I am used to callbacks and promises, I found managing the flow through exception handling to be strange. After I took a step back and made a flow chart diagram, I managed to figure it out.
That flow looks complicated for 58 or so lines of Python. If that did not scare you off, read on for the code.
Setup
Import the socket module and make a global variable to track if it is connected:
```python
import socket

socket_connected = False
```
The socket_data function
My main socket_data function is below. As I illustrated above, this includes 2 sub-functions that I will just leave as placeholders here and cover in the following sections. The Python socket library requires that you bind to something. That can be any address, like an IP address, but in this case we will use the socket_path that is set up by default by uv4l-raspidisp-extras when we installed UV4L. Make sure you see the note on the owner of this file in the Just Let Me Try It section.
As a precaution, we unlink this file in case it is used somewhere else. Then we set up our socket, bind it, set how many connections to listen for, and set a timeout before moving to the wait_to_connect sub-function.
```python
# Control connection to the linux socket and send messages to it
def socket_data(run_event, send_rate):
    socket_path = '/tmp/uv4l-raspidisp.socket'

    # wait_to_connect sub-function goes here

    # send_data sub-function goes here

    try:
        # Create the socket file if it does not exist
        if not os.path.exists(socket_path):
            f = open(socket_path, 'w')
            f.close()

        os.unlink(socket_path)
        s = socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET)
        s.bind(socket_path)
        s.listen(1)
        s.settimeout(1)

        wait_to_connect()

    except OSError:
        if os.path.exists(socket_path):
            print("Error accessing %s\nTry running 'sudo chown pi: %s'" % (socket_path, socket_path))
            os._exit(0)
            return
        else:
            print("Socket file not found. Did you configure uv4l-raspidisp to use %s?" % socket_path)
            raise

    except socket.error as err:
        print("socket error: %s" % err)
```
Waiting for a connection
The socket library’s s.accept() waits for an incoming connection, but only for the time you specified with settimeout in the previous section. The problem is that a timeout while we are waiting for our browser to connect is expected, while a timeout when we are trying to send data is not. To handle this we will just catch the timeout exception and continue. If there is another socket error then we’ll assume something is wrong and close the connection down.
Once we are connected, we will proceed to the send_data sub-function.
```python
# wait for a connection
def wait_to_connect():
    global socket_connected

    print('socket waiting for connection...')
    while run_event.is_set():
        try:
            socket_connected = False
            connection, client_address = s.accept()
            print('socket connected')
            socket_connected = True
            send_data(connection)

        except socket.timeout:
            continue

        except socket.error as err:
            print("socket error: %s" % err)
            break

    socket_connected = False
    s.close()
    print("closing socket")
```
Sending data
The last thing we will do is check our global q to see if there is any data. If there is we will send it. To keep from blocking the thread we will use a sleep command.
This will go on until there is a socket error, which would happen if there is a disconnect.
```python
# continually send data as it comes in from the q
def send_data(connection):
    while run_event.is_set():
        try:
            if q.qsize() > 0:
                message = q.get()
                connection.send(str(message).encode())
            sleep(send_rate)

        except socket.error as err:
            print("connected socket error: %s" % err)
            return
```
Client
Lastly we have our web client which consists of:
- A few lines of html
- JavaScript to interface with UV4L’s signaling, WebRTC video feed, and DataChannel
- JavaScript to draw our annotations (i.e. bounding boxes and labels) on the screen
HTML – index.html
In our HTML file we will link to WebRTC’s adapter.js (as you always should), define some minimal styles for proper display, define our video element to play the WebRTC stream, and initialize our other JavaScript libraries:
```html
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>AIY Vision Detection View and Annotation</title>
    <script src="https://webrtc.github.io/adapter/adapter-latest.js"></script>
    <style>
        video {
            position: absolute;
            top: 0;
            left: 0;
        }

        canvas {
            position: absolute;
            top: 0;
            left: 0;
            z-index: 1
        }
    </style>
</head>
<body>
<video id="remoteVideo" autoplay></video>
<script src="static/drawAiyVision.js"></script>
<script src="static/uv4l.js"></script>
</body>
</html>
```
Receiving WebRTC – uv4l.js
Handling the WebRTC connection is one of the most complex elements of this project, especially if you are not familiar with WebRTC. Unfortunately there is no JavaScript client library, but there is some WebRTC signaling documentation. If you know the typical WebRTC flow, getting started with this is not so bad. I started with the built-in UV4L WebRTC app you can see at https://raspberrypi.local/stream/webrtc and cut it down to the bare minimum. This application only needs to receive a video stream, not send one, so that helps to minimize what we need to do.
After that I cleaned up the code to more closely match best practices for a modern WebRTC implementation – thanks to Fippo for his guidance there.
I do not have room to dive deep here, but let me give some highlights. See the uv4l.js file on GitHub for the whole thing in order.
Setup the WebSocket
The first thing we’ll do is connect to the UV4L websocket server:
```javascript
const uv4lPort = 9080; //This is determined by the uv4l configuration. 9080 is default set by uv4l-raspidisp-extras
const protocol = location.protocol === "https:" ? "wss:" : "ws:";
const signalling_server_address = location.hostname;
let ws = new WebSocket(protocol + '//' + signalling_server_address + ':' + uv4lPort + '/stream/webrtc');
```
Then we need to setup our WebSocket logic:
```javascript
function websocketEvents() {

    ws.onopen = () => {
        console.log("websocket open");
        startCall();
    };

    /*** Signaling logic ***/
    ws.onmessage = (event) => {
        let message = JSON.parse(event.data);
        console.log("Incoming message:" + JSON.stringify(message));

        if (!message.what) {
            console.error("Websocket message not defined");
            return;
        }

        switch (message.what) {
            case "offer":
                offerAnswer(JSON.parse(message.data));
                break;
            case "iceCandidates":
                onIceCandidates(JSON.parse(message.data));
                break;
            default:
                console.warn("Unhandled websocket message: " + JSON.stringify(message))
        }
    };

    ws.onerror = (err) => {
        console.error("Websocket error: " + err.toString());
    };

    ws.onclose = () => {
        console.log("Websocket closed.");
    };
}
```
When the WebSocket is opened we will start our call with a startCall function. When we get an offer message we’ll run an offerAnswer function. When we get an iceCandidates message we’ll run an onIceCandidates function. We’ll cover those below in a moment.
PeerConnection setup
Then we will set up our PeerConnection.
```javascript
//////////////////////////
/*** Peer Connection ***/
function setupPeerConnection() {
    const pcConfig = {
        iceServers: [{
            urls: [
                //"stun:stun.l.google.com:19302",
                "stun:" + signalling_server_address + ":3478"
            ]
        }]
    };

    //Setup our peerConnection object
    pc = new RTCPeerConnection(pcConfig);

    //Start video
    pc.ontrack = (event) => {
        if (remoteVideo.srcObject !== event.streams[0]) {
            remoteVideo.srcObject = event.streams[0];
            remoteVideo.play()
                .then(() => console.log('Remote stream added.'));
        }
    };

    pc.onremovestream = (event) => {
        console.log('Remote stream removed. Event: ', event);
        remoteVideo.stop();
    };

    //Handle datachannel messages
    pc.ondatachannel = (event) => {
        dataChannel = event.channel;

        dataChannel.onopen = () => console.log("Data Channel opened");
        dataChannel.onerror = (err) => console.error("Data Channel Error:", err);

        dataChannel.onmessage = (event) => {
            //console.log("DataChannel Message:", event.data);
            processAiyData(JSON.parse(event.data));
        };

        dataChannel.onclose = () => console.log("The Data Channel is Closed");
    };

    console.log('Created RTCPeerConnnection');
}
```
First we create a new RTCPeerConnection with some STUN servers. UV4L actually acts as a STUN server and I do not want any traffic leaving my LAN, so I just used that. Stick in the Google STUN server and the TURN server of your choice if your network topology requires that.
Then we assign a bunch of handlers to various events. All of this is pretty standard. The unique one for this application is handling the dataChannel.onmessage event to pass the data to the processAiyData function covered in the annotation section below.
Start the call
The way we start a call with UV4L is to signal the server to call us back with a basic WebSocket command. We can pass some options as we do this. As I covered in Part 1, force_hw_vcodec will use hardware encoding. Without that our CPU usage is likely to be too high to do anything else. If you omit these parameters it should use the defaults set in /etc/uv4l/uv4l-raspidisp.conf. vformat: 55 corresponds to 720p at 15 Frames Per Second (FPS).
```javascript
function startCall() {
    //Initialize the peerConnection
    setupPeerConnection();

    //Send the call command
    let req = {
        what: "call",
        options: {
            force_hw_vcodec: true,
            vformat: 55
        }
    };

    ws.send(JSON.stringify(req));
    console.log("Initiating call request" + JSON.stringify(req));
}
```
ICE Candidates
Part of a usual WebRTC exchange is handling ICE Candidates. Unfortunately UV4L does not seem to support Trickle-ICE (see here for more on what that is), so we get them all at once which adds some delay to the setup. They also do not appear to be in a standard format that adapter.js likes so I had to take the data provided and regenerate them.
```javascript
function onIceCandidates(remoteCandidates) {

    function onAddIceCandidateSuccess() {
        console.log("Successfully added ICE candidate")
    }

    function onAddIceCandidateError(err) {
        console.error("Failed to add candidate: " + err)
    }

    remoteCandidates.forEach((candidate) => {
        let generatedCandidate = new RTCIceCandidate({
            sdpMLineIndex: candidate.sdpMLineIndex,
            candidate: candidate.candidate,
            sdpMid: candidate.sdpMid
        });
        console.log("Created ICE candidate: " + JSON.stringify(generatedCandidate));
        pc.addIceCandidate(generatedCandidate)
            .then(onAddIceCandidateSuccess, onAddIceCandidateError);
    });
}
```
Offer/Answer
The final piece is handling the offer/answer exchange. This looks kind of complicated, but we are really only doing a few things:
- Setting our remote SDP based on what was sent (see here for our many posts covering SDP)
- Generating a local SDP
- Setting that local SDP
- Responding to the UV4L server with that SDP
Concurrent with all of this, we will ask the UV4L server to generate some ICE candidates which we handle above.
```javascript
function offerAnswer(remoteSdp) {

    //Start the answer by setting the remote SDP
    pc.setRemoteDescription(new RTCSessionDescription(remoteSdp))
        .then(() => {
                console.log("setRemoteDescription complete");

                //Create the local SDP
                pc.createAnswer()
                    .then(
                        (localSdp) => {
                            pc.setLocalDescription(localSdp)
                                .then(() => {
                                    console.log("setLocalDescription complete");

                                    //send the answer
                                    let req = {
                                        what: "answer",
                                        data: JSON.stringify(localSdp)
                                    };
                                    ws.send(JSON.stringify(req));
                                    console.log("Sent local SDP: " + JSON.stringify(localSdp));
                                }, (err) => console.error("setLocalDescription error:" + err));
                        },
                        (err) => console.log('Failed to create session description: ' + err.toString())
                    );
            },
            (err) => console.error("Failed to setRemoteDescription: " + err));

    //Now ask for ICE candidates
    console.log("telling uv4l-server to generate IceCandidates");
    ws.send(JSON.stringify({what: "generateIceCandidates"}));
}
```
Annotation – drawAiyVision.js
The annotation code in drawAiyVision.js is very similar to the annotation code in Computer Vision on the Web with WebRTC and TensorFlow. Enough so that I am not going to explain it again – see the client section in that post to see how to use the canvas to draw some squares and add labels.
The main modification I made here was to change the bounding box color to match the joy_score variable in face detection mode. This is very similar to how the arcade button changes color depending on the mood of the detected faces in the joy_detection_demo.py that runs by default on the AIY Vision Kit.
Here is what the main function looks like:
```javascript
//Main function to export
function processAiyData(result) {
    console.log(result);

    lastSighting = Date.now();

    //clear the previous drawings
    drawCtx.clearRect(0, 0, drawCanvas.width, drawCanvas.height);

    result.objects.forEach((item) => {
        if (item.name === "face") {
            let label = "Face: " + Math.round(item.score * 100) + "%" +
                " Joy: " + Math.round(item.joy * 100) + "%";
            let color = {
                r: Math.round(item.joy * 255),
                g: 70,
                b: Math.round((1 - item.joy) * 255)
            };
            drawBox(item.x, item.y, item.width, item.height, label, color)
        }
        else if (item.name === "object") {
            let label = item.class_name + " - " + Math.round(item.score * 100) + "%";
            drawBox(item.x, item.y, item.width, item.height, label)
        }
        else
            console.log("I don't know what that AIY Vision server response was");
    });
}
```
See the whole file on GitHub for the helper functions around this.
Test it
When you are all done, you should be able to run:
```
python3 server.py -m object
```
Then point your browser to http://raspberrypi.local:5000 and you should see something like this (if you have a cat, dog, or person):
Optimizations
The face detection model runs very fast without any tinkering. Not so much for the object detection model. I have a lot more work to do here, but here are some starting thoughts for making this work better.
Use the latest AIY code
The Raspberry Pi image was updated 3 times since I purchased the kit and the repo has had a few performance updates. As stated by weiranzhao here, the AIY Kit team has already improved performance with more to come.
Tweaking UV4L
There are a lot of things running on the Raspberry Pi image – most of which we do not need for this application – so we can turn some of them off. I found that disabling the uv4l_raspicam service helped with stability.
```
sudo systemctl disable uv4l_raspicam.service
```
There are also a few parameters that can be updated in /etc/uv4l/uv4l-raspidisp.conf. We can set the resolution and frame rate to 720p at 15 FPS or lower if we will never be sending a stream higher than that. The UV4L server also does not need to send audio or receive anything. UV4L also has a CPU overuse detection feature that appears to help the most.
```
framerate = 15                                        #default: 0
resolution = 7                                        #default: 0
server-option = --enable-webrtc-audio=no              #default: yes
server-option = --webrtc-receive-video=no             #default: yes
server-option = --webrtc-receive-audio=no             #default: yes
server-option = --webrtc-cpu-overuse-detection=yes    #default: no
```
Make sure to restart the service or reboot after any changes. Checking the service status output will tell you what the current resolution settings are.
```
sudo service uv4l_raspidisp restart
sudo service uv4l_raspidisp status
```
I found I had to go into sudo raspi-config and set the resolution of the display there to match what I had configured elsewhere to make sure the preview window showed full screen:
Performance Tests
I did some quick tests to check the inference speed and how much CPU I was using in various modes. You can see that the object model is much more CPU intensive than the face detector.
| Mode | UV4L | CPU | Inference time | FPS |
| --- | --- | --- | --- | --- |
| face | off | 20% | 0.737 | ~13 |
| face | on | 80% | 0.743 | ~13 |
| object | off | 61% | 0.138 | ~7 |
| object | on | 94% | 0.37 | ~3 |
What Else
The CPU usage is still too close to 100%. This can be lowered further by reducing the resolution that UV4L runs at. Running it at 640×480 seemed to help a lot, but you lose a lot of the field of view from the camera. Lowering the resolution going into the inference code does not seem to make much difference – I guess the VPU hardware is more than enough to handle that piece.
One area that could use some throttling is the object_detection.get_objects method. This seems to consume a lot of CPU and we probably do not need to output those results multiple times per second. I would like to play around with putting a maximum output rate on that whole function so processing the inference results would be skipped when we do not need them as fast as we can produce them – a rough sketch of that idea is below. Also, when the CPU gets overrun the socket stream falls behind – that could be controlled better too.
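Here is a rough sketch of the kind of throttle I have in mind – the rate value and helper name are placeholders, and this is not in the repo code:

```python
from time import monotonic

# hypothetical cap on how often we post-process inference results
MAX_RESULTS_PER_SECOND = 2
_last_processed = 0.0


def should_process_result():
    """Return True at most MAX_RESULTS_PER_SECOND times per second."""
    global _last_processed
    now = monotonic()
    if now - _last_processed >= 1.0 / MAX_RESULTS_PER_SECOND:
        _last_processed = now
        return True
    return False


# inside the inference loop, before calling get_objects/get_faces and building the ApiObject:
#     if not should_process_result():
#         continue  # skip the expensive result handling for this frame
```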
Thanks
Thanks to Fippo for his review of my WebRTC code and thank you to Luca Risolia at Linux Projects for his inputs on UV4L and looking into my feature requests.
{“author”, “chad hart“}
FYI – if you are looking for more things you can do with the AIY Vision Kit, I wrote a post on how to do custom TensorFlow training for the kit over at cogint.ai. Check that out here: https://cogint.ai/custom-vision-training-on-the-aiy-vision-kit/
Rachel says
Hi,
I’m running into an error when trying to run the server.py script:
(env) pi@raspberrypi:~/AIY-projects-python $ sudo ../../../../env/bin/python3 server.py
Traceback (most recent call last):
File “server.py”, line 15, in
from aiy.vision.leds import Leds
ImportError: No module named ‘aiy.vision.leds’
It’s not able to find this module. Is there a step I’m missing somewhere?
Thanks
Rachel
Chad Hart says
Do you know what aiy image you loaded? It looks like you are on an older one. They removed the virtual environment and changed some of the library references, including the LEDs. You should download and reflash your SD card with the latest image that has some improvements over the previous ones: https://dl.google.com/dl/aiyprojects/aiyprojects-latest.img.xz
My code is working with aiyprojects-2018-02-21.img.xz.
The old source under the virtual environment used to be:
aiy._drivers._rgbled import PrivacyLED
That module just illuminates the privacy LED, so worst case you can comment it out if that is the only thing that is giving you trouble and you don’t care about the light.
roberto says
I’ve been unable to find any documentation the face object and its methods and attributes (such as joy_score). could you point me in the right direction?
Chad Hart says
The AIY team has not published any docs on this that I know of. I learned enough for my projects just by reviewing their code samples in the repo and looking for related comments in their the repo’s issues section. Most of the samples have a good amount of comments in the code.
S aroj says
Good work!
David says
Hi!
Great project! This was a smooth ride so far!
Any idea how I could go about adding a layer of security, like a login and password page before accessing the stream?
I would like to set it up as a security camera.
Also, I notice that only 1 device can access the stream with the setup in part 1. Do you have any pointers as to how to overcome that?
Chad Hart says
Its good to hear you are having success with this. The UV4L Server has an HTTP Basic authentication option – see the manual or just run through the config file to set this. I use this at home just by passing the user name and password as part of the URL. My first thought on improving this would be to setup a webpage that does some user authentication, then use Flask to mediate between the user authentication and to hide the UV4L credentials so they aren’t exposed to the browser.
When you say only 1 device can access the stream, I assume you mean only 1 WebRTC stream at a time? I believe best practice there is to use a media server like Janus or Jitsi for that. I know Janus runs well on a Pi 3, but I am not sure if you would have enough CPU to run it on a Pi Zero while doing inference processing too.
David says
Thank you for your response! I will try the username, password url method.
I wonder if we could just record the stream and do the inference processing on the pi and upload the result as a video file on the cloud. This way we can then stream it directly from there. I don’t mind if my real-time has little delay. What do you think?
Unfortunately, although part I was working like a breeze, part II doesn’t work for me yet. Here’s my setup.
I am using Windows 10 and Putty. I am also using the latest AIY vision kit (1.1 I think). Joy detector is disabled.
The script runs and the inference processing seems to work well also (based on the terminal output). Although, when I use Chrome to connect on my local pi(with local IP) and port 5000. All I get is a blank page. The html is the same as described on this page.
Do I need to stop UV41 process? Why is the host 0.0.0.0 in your script ? Do we need to put our local pi IP address ?
Chad Hart says
If you want to setup a server in the cloud instead of running inference on device, see this post: https://webrtchacks.com/webrtc-cv-tensorflow/
I ran through the “Just let me try it” instructions above on the new 1.1 kit last week and did not have any issues.
You are connecting from your Windows 10 machine to the Pi Zero. “http://raspberry.pi:5000” should work in your browser unless you changed your Pi Zero host name and as long as you are on the same LAN.
What does the Python console and javascript console say when you try to connect?
The 0.0.0.0 just tells Flask to bind to the localhost on the Pi Zero.
I can give mine a try with my Win 10 machine tomorrow.
Chad Hart says
I gave this a try again with a fresh install on my 1.1 Vision Kit and did not have any issues. I did notice the websocket connection seems to be way slower now for some reason, but the video feed and annotation worked fine otherwise. I’ll need to investigate the websocket performance issue
David says
Hi Chad!
Thanks for the update and the link for the tutorial. I will def give it a try too.:)
I have the webRTC running from the first tutorial. Should I disable it like you suggest in your tweaking section?
I will run some more tests tonight with part 2.
Right now, I am using a cronjob to start a tweaked version of face_detection_camera.py script to take a pictures and email it to me but it’s CPU intensive and I am afraid I am just opening sockets without closing them… I looked over your code and it seems to take care of that. Am I right?
Chad Hart says
You don’t need to keep the raspicam service going. That’s not used in part 2. You can disable that or even remove the uv4l-raspicam-extras package. However, that should not cause a conflict unless you connect to the uv4l raspicam somehow – so you don’t have to remove it.
I only open a single socket.
One other suggestion – check out the https://motion-project.github.io. You could set this up to snap a picture when motion is detected (with many parameters to choose from) and then run inference on that. That project is very CPU friendly.
Davesdere says
Thanks Chad! Your code works perfectly fine! I was going about it all wrong. I was using the uv4l on port 8080 to stream as stated in the previous tutorials. I re-read your comments and went back to your original code, on the LAN. It worked like a charm until it got too CPU intensive(I assume) and crashed.
Here’s an example of what was printing in loop in the terminal.
Message from syslogd@raspberrypi at May 11 16:31:40 …
kernel:[ 780.303823] Internal error: : 11 [#1] ARM
Message from syslogd@raspberrypi at May 11 16:31:40 …
kernel:[ 780.304228] Process Thread 0x0xb390 (pid: 1059, stack limit = 0xc2190188)
But then I re-read the section about Optimizations. You really thought about everything!
I am very grateful about these blog posts. It got me to read about WebRTC, UV4l and to do some pretty cool tests with the face recognition scripts and live streaming!
Since my last post, I have a camera that records if there’s 2 faces and take a picture every minutes if there’s only 1 face. I wanted to do a timelapse but that was too heavy for the pi. The files are uploaded to Dropbox after being captured and are being deleted locally. I was thinking of using the Dropbox API and using it as a base to put my UI on it. I got it running in minutes but it’s too slow. Google Drive might be the next winner but I didn’t get so lucky with the Python 3 implementation.
I will give a try to the Motion project and the other webrthacks tutorial you suggested and run inference on the cloud or on some other machine at home. I wish I could use my old Xbox one lol.
luminous says
You can use a raspberry pi 3b or 3b+, just connect the aiy bonnet and the raspberry using a standard camera adapter cable, just cut the 3.3v line.
I suggest doing the cut on the raspberry side of the cable as the cable is larger and easier to not screw up the cut.
on the 3b+ be careful of the poe plug.
Should give you a nice bump in the cpu area.
Zhou Justin says
Hey guy, this is very cool project. I got some issues when run the server.py, seems like the uv4l was broken after about one minute. Then the process uv4l’s cpu was over 70% and ram was over 50%. Any ideas?
Chad Hart says
Did you apply the config changes in “Tweaking UV4L”? If the CPU gets overrun the whole thing stops working. If you are on a bad network connection with a lot of packet loss, UV4L will consume more CPU since it will need to work harder to encode the WebRTC stream. If you are going to use this outside of a tightly controlled environment I would recommend using a 640×480 resolution.
Zhou Justin says
Thanks! That’s RIGHT, I config the uv4l and tweak it. The webrtc works well. But there still some problems. Such as the delay of the RECT drawed on the web.
Chad Hart says
I did not experience the rectangle drawing delay when I first released the post, but noticed the issue on the new AIY Kit image. I’ll need to look into that. Make sure to keep an eye on the repo for updates when I get around to doing that: https://github.com/webrtcHacks/aiy_vision_web_server
Or better yet, submit a pull request if you figure it out.
Chad Hart says
I did some investigation on the annotation delay. It was a problem with UV4L’s socket-to-dataChannel handling and Luca at UV4L fixed it today. Please run the following to fix this:
sudo apt-get install --reinstall uv4l uv4l-webrtc-armv6 uv4l-raspidisp uv4l-raspidisp-extras
Add
uv4l-raspicam and uv4l-raspicam-extras
if you are using those. Things will still get messed up if you overrun the CPU, but the UV4L fix should bring the annotation updates back up to the video framerate.
Zhou Justin says
That’s very COOL! I will try it this evening, and give you a result!
Christina Donovan says
Thanks for the writeup. I’m trying to output the recognition feed and inference to ffmpeg. Do you think that’s possible?
Chad Hart says
You are looking to save the video with annotations overlaid on it while streaming with WebRTC? The challenge is avoiding simultaneous access to the raspicam, which is why I did the uv4l-raspivid approach in the first place. In theory you should be able to write the uv4l-raspivid feed to disk. I would try some of the comments here: https://raspberrypi.stackexchange.com/questions/43708/using-the-uv4l-driver-to-stream-and-record-to-a-file-simultaneously.
Another approach would be to just do a camera.start_recording, and then use something like opencv to put the annotations over it later.
Either way, you will need to be sensitive to your CPU consumption. If you are streaming constantly it might be easier to just use WebRTC to record remotely.
luminous says
Just plug the thing into a raspbery pi 3 or 3b+,
https://photos.app.goo.gl/jETGuDE3UUnD1XRi6
It works fine, you just have to cut the 3.3v line which is the first wire in the flex cable.
Chad Hart says
I am curious why you need to cut the 3.3 line? I didn’t see any references to that in other posts on that topic (like https://www.raspberrypi.org/forums/viewtopic.php?t=205926)
luminous says
Its 3.3v+ on both sides, my vision bonnet is from the first batch before they were pulled from shelves temporarily, it is possible the issue has been fixed. Its the version where you have to flip the 22 to 22 pin flex cable and connect the arrow for the pi to the bonnet instead when connecting to a pi zero.
Mine won’t work without clipping that line.
I asked google about this however and they did not seem to be aware of any change to fix the issue above, or acknowledge that it was an issue. From the couple of emails I had tho, I very much got the impression that at least the support guys had no clue about the hardware.
Maybe I will buy another kit and see what is different.
Chad Hart says
I was able to get the AIY Vision Kit to work with a Pi 3 without adjusting any of the cables. Just make sure the cables are connected properly and everything works with the stock SD card image right out of the box.
Chad Hart says
Just a note for others on using the Pi3 – make sure to use a 3 mode B, not the B+. There are some issues on the latest image and the 3B+: https://github.com/google/aiyprojects-raspbian/issues/310
QI XIANJUN says
pi@raspberrypi:~$
sudo service uv4l-raspidisp restart
sudo service uv4l_raspidisp restart
Failed to restart uv4l-raspidisp.service: Unit uv4l-raspidisp.service not found.
Chad Hart says
It sounds like uv4l-raspidisp might not be installed. Are you sure it was included in the
sudo apt-get install
command to install the packages in the “Just Let me Try it” section above? You can see all the available services if you do a sudo service status-all
QI XIANJUN says
pi@raspberrypi:~$ sudo apt-get install -y uv4l uv4l-raspicam uv4l-raspicam-extras uv4l-webrtc-armv6 uv4l-raspidisp uv4l-raspidisp-extras
Reading package lists… Done
Building dependency tree
Reading state information… Done
uv4l is already the newest version (1.9.16).
uv4l-raspicam is already the newest version (1.9.60).
uv4l-raspicam-extras is already the newest version (1.42).
uv4l-raspidisp is already the newest version (1.6).
uv4l-raspidisp-extras is already the newest version (1.7).
uv4l-webrtc-armv6 is already the newest version (1.84).
0 upgraded, 0 newly installed, 0 to remove and 113 not upgraded.
Yes. I installed all the installation packages!
QI XIANJUN says
I found the problem!
Your command was wrong:
What you wrote is: sudo service uv4l-raspidisp restart
The correct one should be: sudo service uv4l_raspidisp restart
“_” is not “-”
Let’s change the content of the article. Lest other people be misled like me!
Chad Hart says
fixed. sorry for the typo
QI XIANJUN says
http://10.197.229.44:5000/
I access the page through the LAN is blank!
There is video on the monitor!
However, no face is identified after the face is recognized
i@raspberrypi:~$ ls
AIY-projects-python AIY-voice-kit-python bin Documents drivers-raspi Music Public Templates
aiy_vision_web_server assistant-sdk-python Desktop Downloads models Pictures python_games Videos
10.197.229.6 – – [22/Jun/2018 02:36:43] “GET / HTTP/1.1” 200 –
INFO:werkzeug:10.197.229.6 – – [22/Jun/2018 02:36:43] “GET / HTTP/1.1” 200 –
10.197.229.6 – – [22/Jun/2018 02:36:44] “GET /static/drawAiyVision.js HTTP/1.1” 200 –
INFO:werkzeug:10.197.229.6 – – [22/Jun/2018 02:36:44] “GET /static/drawAiyVision.js HTTP/1.1” 200 –
10.197.229.6 – – [22/Jun/2018 02:36:45] “GET /static/uv4l.js HTTP/1.1” 200 –
INFO:werkzeug:10.197.229.6 – – [22/Jun/2018 02:36:45] “GET /static/uv4l.js HTTP/1.1” 200 –
Swann Schilling says
Hey there, thanks a lot for sharing your knowledge! Its been a great leap forward to my robotics project.
I am not a programmer or software engineer, so I depend on good tutorials like the one you did! 🙂
My understanding of the code only scratches the surface of understanding what is going on here…so my question would be, is it possible to also send the numerical values for the face.bounding_box in tandem with the camera stream?
I would like to be able to access my robots camera from within Unity, and drive my model with the position of the face.bounding_box. I would use http GET request from within Unity to fetch the values, if possible? Getting the Motion JPEG stream already works great, but it would be great to also have those values available! 🙂
Chad Hart says
So if I understand correctly you have already figured out how to get a video stream into Unity and now you just want a REST API on the Pi that will return the bounding box coordinates?
I did something along those lines here: https://webrtchacks.com/webrtc-cv-tensorflow/
Swann Schilling says
With the SampleUnityMjpegViewer I am able to stream the Pi camera to Unity http://192.168.178.60:8080/stream/video.mjpeg
I am not able to import the http://raspberrypi.local:5000 though!
Also it seems that I am not able to import http://192.168.178.60:9080/stream/webrtc…
I would like to find a way to send face.bounding_box as numerical values, plus the camera stream to Unity!
At the moment I am running Blynk local server on my Pi, which works great but I could never wrap my head around how to stream images…so this here was just what I was looking for!
Let me know if I am on the right way, or if it is a dead end… 🙂
Chad Hart says
You will not be able to use the uv4l WebRTC with Unity unless you setup a WebRTC stack inside Unity and adapt it to use uv4l’s signaling.
If you have already figured out how to send a video stream, then I would recommend setting up a simple REST API (just using HTTP GETs) to send that information from the Pi to Unity.
Swann Schilling says
Thanks a lot for the info, that will safe me the time trying to set this up…so I will use a simple mjpg-streamer to get my camera stream to Unity and sent the data with my REST API!
But thanks again for the help and for introducing uv4l WebRTC, it has some great features!
All the other tutorials are also great btw, they are all on my to do agenda!! 🙂
Swann Schilling says
One thing that would be nice, is to get the http://raspberrypi.local:5000 as a video.mjepg, is this possible?
Then I could just send the values from within the Python script to my REST server, I cannot import to Unity in another form than Motion JPEG…
Swann Schilling says
Hello, again…so I got to explore this project a little more, and it seems to be the most reliable option for streaming after all! I just had to let go the idea of showing the camera feed inside of Unity, which is not the worst thing. So here I am again! 🙂
I am redesigning my robot a bit, and I was wondering, is there a way to flip the camera while using raspidisp? It would be beneficial to my robot design to do so!
Chad Hart says
I haven’t tried this, but there are hflip and vflip options if you look in the raspicam.conf configuration file that should help you.
Swann Schilling says
Hey, thanks for the quick reply…it was a bit late and I was overthinking!! It turned out that I just had to do this in your server.py script! 🙂
camera.vflip = True
Swann Schilling says
Hey, I just checked back on this since I am rebuilding my robot…
It seems like I cannot open the stream from my Windows 10 PC,
Android works fine with Firefox though.
Strange…is there a workaround? 🙂
Chad Hart says
Hi Swann – what browser are you using on Windows 10?
Swann Schilling says
Hey, sorry for the late reply…I am using Firefox on Windows 10!
If there is any other browser you are recommending, I would surely switch!! 🙂
Swann Schilling says
Hey, can you confirm the issue with desktop browsers?
Firefox on Android is the only browser which still shows the video stream…
If there is a solution how to fix this, it would be highly appreciated!! 🙂
Swann Schilling says
Hey there, I updated the WebRtc packages…now Firefox Desktop on Windows 10 works fine, but Firefox Mobile on Android is only showing the bounding box, but no image!!
Just to let you know whats going on!! 🙂
LP says
For those curious about object detection, now the UV4L supports live, real-time object detection with accelerated TensorFlow Lite models, object tracking and WebRTC streaming “out-of-the-box”. Here is how to realize a Raspberry Pi-based robot doing the above:
https://www.linux-projects.org/uv4l/tutorials/video-tracking-with-tensor-flow/
Chad Hart says
Thanks Luca. I am ordering the hardware to try this out.