When most people think of WebRTC they think of video communications. Similarly, home surveillance is usually associated with video streaming. That’s why I was surprised to hear about a home security project that leverages WebRTC not for video streaming, but for the DataChannel. WebRTC’s DataChannel might not demo as well as a video call, but as you will see, it is a very convenient way to set up peer-to-peer information transfer.
Ivelin Ivanov is a long-time open source contributor in a variety of projects and organizations like RedHat, Mobicents, and Telestax. Recently he started to address some home IoT privacy concerns with a new open source project, Ambianic. His project includes many interesting elements, including computer vision on a Raspberry Pi (one of my favorite topics), but for this post I asked him to talk about how he leveraged WebRTC and the DataChannel in his architecture.
{“editor”, “chad hart“}
Home surveillance systems are now mainstream. Easy to set up and easy to use. Mobile app to see who is at the door. Cloud backup for a subscription fee. All great. Except that privacy issues are getting out of control. In this post, I’ll dive into the technical design and implementation of a solution that leverages WebRTC to prevent sensitive data from being stored on devices that aren’t directly controlled by the user.
Requirements
First let’s clarify that the goal is not to implement yet another variation of Ring. When I started the project, I did consider the possibility, but that was not what I would end up using at my own house.
I am not too interested in talking to people who show up unannounced at my door while I am away. If they are friends, they know how to reach me. Nor am I remotely interested in watching mostly uneventful surveillance camera streams.
What I really wanted to have is a system that would tell me about things I actually care to know about. Meaningful, actionable observations. For example I want to know if:
- UPS delivered the package today as their SMS said they did: is the package at my house, or did they drop it at my neighbor’s again?
- Someone uninvited grabs a package from my front door?
- My kids stage a zombie attack at midnight while I am sound asleep?
And since the data comes from my private property, it must not be shared, leaked or sold without my explicit knowledge and permission.
High Level Architecture
What I ended up with is a three-layer architecture:
- Edge IoT device (Raspberry Pi, gstreamer, Python, TensorFlow Lite, aiortc)
  - Running on the same local network as my cameras
  - Constantly observing the video streams from the cameras
  - Occasionally inferring that there is something interesting I should see
  - Storing observations and sending events to my mobile device
- User Interface (web browser on mobile)
  - Progressive Web App (PWA) accessible from anywhere
  - Able to communicate securely and directly with the edge device
  - Installable on desktop and mobile as a native app
  - Storing in a local database my favorite picks from the event timeline
- Communication layer (REST over WebRTC)
  - Secure peer-to-peer data channel between UI app and Edge device
  - Enable UI app to see Edge device as a regular REST service
  - Enable Edge device to serve data over plain REST interface
  - Privacy preserving signaling – no ability to identify peers with a physical address or a person
Signaling Flow
The signaling part of the system has two main responsibilities:
- Initial discovery and pairing between user’s UI app and Edge device
- Brokering terms between UI and Edge for consecutive peer-to-peer (p2p) sessions
Device Discovery and Pairing
The initial discovery part relies on physical proximity and trust, similar to Bluetooth. However, web browsers don’t readily support Bluetooth, zeroconf, mDNS or any other standard p2p pairing protocols yet. The next best idea I found was implemented successfully by sharedrop.io. They call it an HTML5 clone of Apple’s AirDrop.
The idea makes two important assumptions:
- The user only allows trusted devices on their local home network.
- Devices connected on the same local network share the same public IP for Internet traffic. The shared public IP does not need to be permanent. It is only used for a few seconds during initial pairing.
Both assumptions are pragmatic with the caveat that some users still leave their home networks poorly secured. There is still no cure for human negligence.
During the initial pairing process, both the UI app and Edge device ask the signaling server for a room ID. The signaling server calculates a SHA-1 hash from the public IP of each client and a secret string. Clients behind the same public IP get the same room ID, while it is practically impossible for two clients in different households to end up with a matching one.
Once UI and Edge obtain their local room ID, they join the room and ask the signaling server for a list of room members. This is the point where UI and Edge discover, for the first and only time, each other’s unique IDs (which are generated as cryptographically strong UUID4 values).
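The pairing steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual signaling server code: the secret value, the way the IP and secret are concatenated, and the names `room_id_for`, `join_room` and `room_members` are all assumptions made for this example.

```python
import hashlib

# Hypothetical server-side secret; a real deployment would load it from config.
SERVER_SECRET = "replace-with-a-long-random-string"

# Rooms live in memory only, so nothing about the peers is persisted.
rooms = {}


def room_id_for(public_ip, secret=SERVER_SECRET):
    """Derive a room ID from the client's public IP and the server secret."""
    # Assumption: simple concatenation of IP and secret before hashing.
    return hashlib.sha1((public_ip + secret).encode("utf-8")).hexdigest()


def join_room(public_ip, peer_id):
    """Add a peer to the room that matches its public IP; return the room ID."""
    room_id = room_id_for(public_ip)
    rooms.setdefault(room_id, set()).add(peer_id)
    return room_id


def room_members(room_id):
    """List the peer IDs currently in a room."""
    return sorted(rooms.get(room_id, set()))
```

Two clients that reach the server from the same public IP land in the same room and can see each other’s IDs, while a client from another network gets a different room ID and sees nothing.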
Peer to Peer Connection Negotiation
From that point forward, UI and Edge can establish p2p sessions as needed by exchanging SDP offers and answers via the signaling server.
To minimize the possibility of man-in-the-middle attacks or peer impersonation, the signaling server is ephemeral. It has no permanent storage of client IDs or passwords. It’s not even able to identify its clients (peers) with anything other than the dynamic (non-static) public IP address their ISP assigned them at the time of pairing.
The following sequence diagram shows a simplified discovery and negotiation flow:
Identity Management
I wanted to avoid forcing users to trust yet another cloud provider with their account information. The chosen design delegates trust to the user’s local network and personal device access. Hopefully users are well aware by now that if they don’t lock down their phones and Wi-Fi networks well, lots of bad things can happen.
Since the pairing mechanism assigns cryptographically strong IDs to each peer, that also serves as an identity management solution. The peers simply have to store these IDs locally, which is simple enough for both Edge (Pi local file) and UI (PWA localStorage/IndexedDB).
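On the Edge side, keeping these IDs in a local file could look roughly like the sketch below. The JSON layout and both function names are hypothetical and are not taken from the Ambianic code; the only parts grounded in the design above are the UUID4 peer ID and the local file.

```python
import json
import uuid
from pathlib import Path


def load_or_create_identity(path):
    """Return the stored peer identity, creating a new one on first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    identity = {
        "my_peer_id": str(uuid.uuid4()),  # random UUID4 for this device
        "remote_peer_id": None,           # filled in after pairing
    }
    path.write_text(json.dumps(identity), encoding="utf-8")
    return identity


def remember_remote_peer(path, remote_peer_id):
    """Persist the ID of the peer discovered during pairing."""
    identity = load_or_create_identity(path)
    identity["remote_peer_id"] = remote_peer_id
    path.write_text(json.dumps(identity), encoding="utf-8")
```

After pairing, the device only needs to read this file to know which remote peer it is willing to talk to.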
More layers of security can be added in the future such as Edge 2FA and PWA biometrics as needed.
P2P Data Flow
Once a p2p connection is negotiated and the shortest Internet path between the peers is established, they can begin bi-directional data (and media) exchange.
While it’s possible to stream camera feeds directly to the UI over WebRTC media, it’s not the primary goal of this project. As mentioned earlier, the main goal is to inform the user of any events of interest. Like a social app news feed. Skip the boring stuff. This of course is subjective, which means that over time the system has to learn about the user’s particular preferences. But let’s stay on track; the AI part of the system is the subject of a different blog post.
WebRTC provides the DataChannel API as a means of sending and receiving data messages asynchronously.
The main function of the UI is showing a timeline of interesting events such as cars, license plates, pets, people and faces near the house. That’s a relatively straightforward problem to solve if we have something like a newsfeed REST API. Plenty of good web browser and server side libraries for that.
Unfortunately, the WebRTC DataChannel API is a relatively low-level transport API, which is not as intuitive for app developers as the HTML5 Fetch API.
However, it is possible to write a Fetch implementation over DataChannel. So that’s what I did on the browser side for the UI.
Fetch for DataChannel
Browser-side HTML
Here is an example snippet of code from the /timeline UI web page (Vue.js), which pulls the latest events from the Edge REST API as a JSON array and then fills image blobs into the HTML body:
    <v-list-item v-for="(sample, index) in timeline" :key="index" class="pa-0 ma-0">
      <v-list-item-content class="pa-0 ma-0">
        <v-img v-if="sample.args.thumbnail_file_name" :src="imageURL[index]" class="white--text align-start" alt="Object Detection" contain @load="setImageLoaded(index)">
If you are familiar with Vue.js or a similar front end framework (React, Angular, etc.) the code above looks quite typical. The part that is not so typical is how the front end framework gets the data from the “back end”. Since there is no SaaS back end here, our WebRTC abstraction has to take care of the non-trivial async mechanics of mapping fetch requests to a sequence of async DataChannel interactions.
Browser-side JavaScript
Here are a few lines from the get(url, params) method of the PeerFetch class:
    async get ({ url = '/', params = {} }) {
      console.debug('PeerFetch.get enter', { url, params })
      // build a URL-encoded query string from the params
      const esc = encodeURIComponent
      const query = Object.keys(params)
        .map(k => esc(k) + '=' + esc(params[k]))
        .join('&')
      url += '?' + query
      console.debug('PeerFetch.get', { url, query })
      console.debug('PeerFetch.get post process', { url })
      const request = {
        url,
        method: 'GET'
      }
      // get a ticket that matches the request
      // and use it to claim the corresponding
      // response when available
      const ticket = this._enqueueRequest(request)
      const response = await this._receiveResponse(ticket)
      return response
    }
Notice it may need to multiplex simultaneous request/response pairs over a single DataChannel. This is separate from the issue of handling large payloads in a single request or response, which is an ongoing topic in the WebRTC community with various non-standard interim solutions.
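One simple way to let several callers share a single channel is to allow only one request in flight at a time, so each response can be matched to its request by order alone. The sketch below shows that idea in Python for brevity. It is not the PeerFetch implementation, which is JavaScript and may work differently: `SerializedPeerFetch`, `send`, `incoming` and the fake peer in `demo` are all hypothetical stand-ins, and the header-then-body framing is modeled on the two messages the Edge proxy sends.

```python
import asyncio
import json


class SerializedPeerFetch:
    """Share one message channel among callers, one request at a time."""

    def __init__(self, send, incoming):
        self._send = send          # async callable that writes one message
        self._incoming = incoming  # asyncio.Queue of messages from the peer
        self._lock = asyncio.Lock()

    async def get(self, url):
        request = json.dumps({"url": url, "method": "GET"})
        # Hold the lock until the whole response has arrived, so responses
        # for different callers can never interleave.
        async with self._lock:
            await self._send(request)
            header = json.loads(await self._incoming.get())
            body = await self._incoming.get()
        return header, body


async def demo():
    outgoing, incoming = asyncio.Queue(), asyncio.Queue()

    async def fake_edge():
        # Stand-in for the remote peer: answers with a header, then a body.
        while True:
            request = json.loads(await outgoing.get())
            body = ("body of " + request["url"]).encode("utf-8")
            header = {"status": 200, "content-length": len(body)}
            await incoming.put(json.dumps(header))
            await incoming.put(body)

    edge = asyncio.create_task(fake_edge())
    fetch = SerializedPeerFetch(outgoing.put, incoming)
    results = await asyncio.gather(fetch.get("/a"), fetch.get("/b"))
    edge.cancel()
    await asyncio.gather(edge, return_exceptions=True)
    return results
```

Running `asyncio.run(demo())` issues two concurrent requests and each caller gets back the body that belongs to its own URL. The trade-off is throughput: a slow response blocks everything queued behind it.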
Edge-side (Python)
Likewise, on the Edge side there is no immediate way to serve web resources over DataChannel, but it is possible to write an HTTP proxy over DataChannel that hides the mechanics from the web server (Flask).
Here is what the receiving end of a PeerFetch request looks like in peerjs.ext.HttpProxy:
    @peerConnection.on(ConnectionEventType.Data)
    async def pc_data(data):
        log.debug('data received from remote peer \n%r', data)
        request = json.loads(data)
        # check if the request is just a keepalive ping
        if (request['url'].startswith('ping')):
            log.debug('received keepalive ping from remote peer')
            await _pong(peer_connection=peerConnection)
            return
        log.info('webrtc peer: http proxy request: \n%r', request)
        # schedule frequent pings while waiting on response_header
        # to keep the peer data channel open
        waiting_on_fetch = asyncio.Event()
        asyncio.create_task(_ping(peer_connection=peerConnection,
                                  stop_flag=waiting_on_fetch))
        response = None
        try:
            response, content = await _fetch(**request)
        except Exception as e:
            log.exception('Error %s while fetching response'
                          ' with request: \n %r',
                          e, request)
        finally:
            # fetch completed, cancel pings
            waiting_on_fetch.set()
        if not response:
            response_header = {
                # internal server error code
                'status': 500
            }
            response_content = None
            return
        response_content = content
        response_header = {
            'status': response.status,
            'content-type': response.headers['content-type'],
            'content-length': len(response_content)
        }
        log.info('Proxy fetched response with headers: \n%r', response.headers)
        log.info('Answering request: \n%r '
                 'response header: \n %r',
                 request, response_header)
        header_as_json = json.dumps(response_header)
        await peerConnection.send(header_as_json)
        await peerConnection.send(response_content)
Request-response flow
The following sequence diagram is a simplified version of the request-response flow:
In addition to keeping track of request-response correlation, as would be expected from an HTTP proxy, there is also a need to deal with firewalls that tend to close port mappings within a few seconds of inactivity. While the HTTP proxy is waiting on a response from the Edge REST server, it also needs a keepalive mechanism for the DataChannel to the remote peer.
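A keepalive of this kind can be built from an `asyncio.Event` that acts as a stop flag, in the spirit of the `_ping(peer_connection, stop_flag)` call in the proxy handler. The sketch below is an assumption about how such a loop might look, not the project’s actual `_ping`: the `keepalive` name, the `send` callable, the one-second default interval and the ping message format are all hypothetical.

```python
import asyncio
import json

# Hypothetical wire format, modeled on the handler's check for a 'ping' url.
PING_MESSAGE = json.dumps({"url": "ping"})


async def keepalive(send, stop_flag, interval=1.0):
    """Send a ping every `interval` seconds until `stop_flag` is set.

    Returns the number of pings sent.
    """
    sent = 0
    while not stop_flag.is_set():
        await send(PING_MESSAGE)
        sent += 1
        try:
            # Sleep for one interval, but wake up early if the flag is set.
            await asyncio.wait_for(stop_flag.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass  # interval elapsed, send the next ping
    return sent


async def demo():
    pings = []

    async def send(message):
        pings.append(message)  # stand-in for writing to the DataChannel

    stop_flag = asyncio.Event()
    task = asyncio.create_task(keepalive(send, stop_flag, interval=0.01))
    await asyncio.sleep(0.05)  # stands in for a slow REST fetch
    stop_flag.set()
    sent = await task
    return sent, pings
```

Waiting on the event instead of calling a plain sleep means the loop stops as soon as the fetch completes, rather than lingering for the remainder of an interval.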
Notable external resources
While working on this project, it was of great help to be able to leverage two other open source WebRTC projects:
- PeerJS – a browser-side abstraction which has been around for a while and has an active community.
- Aiortc – a relatively new, but quickly evolving WebRTC implementation in Python.
PeerJS comes with a simple signaling server, which fits this project perfectly, and a client that simplifies the stages of signaling negotiation. I forked it and added the local room discovery capabilities. There is a discussion with the core PeerJS team to bring the room extension to the main project.
PeerJS did not have a Python version, so I ported that. There is a separate discussion to include the Python port in the main PeerJS repo.
Aiortc’s rise in popularity seems to reflect the growing number of practical use cases for WebRTC in the IoT and AI communities, where Python is a major force. Aiortc is far easier to work with in a Python context. The code is clear and easy to read, debug and contribute to from a Python developer’s perspective.
Summary
Overall it turned out to be a feasible and fun project to implement remote camera surveillance without trusting any cloud services with private data.
The version of the code as of this writing has been operational at my house for a few weeks without unplanned downtime. I am looking forward to feedback from WebRTC experts on potential weak spots and areas of improvement. Check out the repo and give it a try!
{“author”: “Ivelin Ivanov“}
Aswath Rao says
Can you use QR Code based pairing first used by Tim Panton and then subsequently used by WhatsApp and possibly others?
Chad Hart says
Sharing the thread about this on Twitter for others: https://twitter.com/aswath/status/1232510736708050945
Javeed says
Hi,
Is webrtc the fastest and scalable way to transfer live video feed to server? can’t we use WebSockets?
Ivelin Ivanov says
Javeed,
For two-way real time video, WebRTC is the way to go.
For video streaming from a web server to a browser, you have many options. WebSockets is one of them.
Ivelin