webrtcHacks

Guides and information for WebRTC developers
Guide: audio level, FFT, getUserMedia, RMS, VAD, volume, Walkthrough, Web Audio

Philipp Hancke · December 10, 2015

Shut up! Monitoring audio volume in getUserMedia

[Image: Speak No Evil]

A few days back my old friend Chris Koehncke, better known as “Kranky”, asked me how hard it would be to implement a wild idea he had: monitoring what percentage of the time you spend talking instead of listening on a call when using WebRTC. When I said “one day,” he wondered whether he could offshore it to save money. Well… good luck!

A week later Kranky showed me some code. Wait, he is writing code? It was not bad – it used the WebAudio API, so it was going in the right direction. It was enough to prod me into finishing the app for him.

The audio stream volume sample application from Google calculates the root mean square (RMS) of the audio signal, which is extracted from the input stream using a script processor every 200ms. There are a lot of tuning options here, of course.
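The RMS step can be sketched as a pure function over one buffer of samples (a Float32Array of values in [-1, 1], as a ScriptProcessorNode's onaudioprocess handler delivers them). This is an illustration of the formula, not Google's actual sample code:

```javascript
// Hedged sketch: root mean square of one buffer of PCM samples.
// In the sample app this would run inside a ScriptProcessorNode's
// onaudioprocess callback roughly every 200ms; here it is a plain
// function so the math itself is visible.
function rms(samples) {
  var sumOfSquares = 0;
  for (var i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}
```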

Instead of starting from scratch, I decided to use hark, a small open source module for this task that my coworker Philip Roberts had built in mid-2013, when the WebAudio API first became available.

Instead of the RMS, hark uses a Fast Fourier Transform (FFT) to obtain a frequency-domain representation of the input signal. Then hark picks the maximum amplitude as an indication of the signal’s volume. Let’s try this (full code here):

    var hark = require('../hark.js')
    var getUserMedia = require('getusermedia')

    getUserMedia(function(err, stream) {
      if (err) throw err

      var options = {};
      var speechEvents = hark(stream, options);

      speechEvents.on('volume_change', function(volume) {
        console.log('current volume', volume);
      });
    });
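Under the hood, that volume number is essentially the loudest FFT bin. A minimal sketch of the idea, assuming input shaped like what AnalyserNode.getFloatFrequencyData returns (dB values, at most 0); the function name is mine, not hark's:

```javascript
// Hedged sketch of a max-amplitude volume measure over frequency-domain
// data. Each element is a dB value for one frequency bin; the "volume"
// is simply the loudest bin, ignoring invalid (non-negative) readings.
function maxVolume(fftBins) {
  var max = -Infinity;
  for (var i = 0; i < fftBins.length; i++) {
    if (fftBins[i] > max && fftBins[i] < 0) {
      max = fftBins[i];
    }
  }
  return max;
}
```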

On top of this, hark uses a simple speech detection algorithm that considers speech to be started when the maximum amplitude stays above a threshold for a number of milliseconds. Much less complicated than typical voice activity detection algorithms but pretty effective. And easy to use as well, just subscribe to two additional events:

    speechEvents.on('speaking', function() {
      console.log('speaking');
    });

    speechEvents.on('stopped_speaking', function() {
      console.log('stopped_speaking');
    });
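The trigger logic behind those events can be sketched as a small stateful check: keep the last few volume readings and report “speaking” only once every reading in the window has exceeded the threshold. The names and numbers below are illustrative, not hark's source:

```javascript
// Hedged sketch of threshold-plus-history speech detection.
// `threshold` is a volume in dB; `historyLength` is how many
// consecutive readings must exceed it before we call it speech.
function createSpeechDetector(threshold, historyLength) {
  var history = [];
  return function update(volume) {
    history.push(volume > threshold);
    if (history.length > historyLength) {
      history.shift(); // keep only the most recent readings
    }
    // speaking once the window is full and every reading is loud enough
    return history.length === historyLength && history.every(Boolean);
  };
}
```

Run with a reading interval of 100ms and a history length of 4, this would reproduce the roughly 400ms trigger delay discussed below.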

Tuning the threshold for accurate speech detection is pretty tricky, so I needed visualization (and since wiring up hark had only taken five minutes, I had plenty of time). Using the awesome Highcharts graph library, I quickly added plot bands to the graph I was generating.

With the visualization I could easily see that the speech detection events happened a bit later than I expected, since hark requires the signal to stay above the threshold for a certain history (say 400ms) before triggering. To adjust for this in the graph, I had to subtract this trigger time from my x-axis (now() - 400ms, for example).

That graph is still visible on the more techie variant of the website, so if you think the results are not accurate… it might help you figure out what is going on. I am happy with the current behavior.

The percentage of speech is then calculated as the sum of the intervals in which speech is detected, divided by the duration of the call. As a display, a gauge chart is used with three different colors:

  • up to 65% speech time: green
  • 65% to 80%: yellow
  • more than 80%: red
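That calculation and the color bands can be sketched together; the function and field names here are my own, not the app's:

```javascript
// Hedged sketch: talk-time percentage and gauge color bands.
// `speechIntervals` is a list of [startMs, endMs] pairs in which
// speech was detected; `callDurationMs` is the total call length.
function talkStats(speechIntervals, callDurationMs) {
  var spokenMs = speechIntervals.reduce(function(sum, interval) {
    return sum + (interval[1] - interval[0]);
  }, 0);
  var percent = 100 * spokenMs / callDurationMs;
  // color bands: <=65% green, <=80% yellow, above that red
  var color = percent <= 65 ? 'green' : percent <= 80 ? 'yellow' : 'red';
  return { percent: percent, color: color };
}
```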

Adding remote audio to this would be awesome. However, while the WebAudio API is supported for local media streams in Chrome, Firefox and Edge, it is only supported for remote streams in Firefox. Hooking this up with the getStats API (in Chrome) to get the audio level would certainly be possible, but would require calling getStats at a very high frequency to get proper averages.

Check out the app in action at talklessnow and let us know what you think.

