Saturday, August 12, 2017

Voice recognition for your Linux VM

I remember the good old days of Internet Explorer 6, when Dragon NaturallySpeaking let you do almost anything in the browser by voice. Technology then moved on: other versions of Dragon came and went, and Flash and dancing monkeys became ubiquitous on webpages. I have not been able to re-experience that level of control since. Dragon is still mighty for web browsing on native systems, but it has become quite expensive now that the home versions have been dropped, and it is not available for Linux at all.

Is it possible to get some of that control back, and for a Linux virtual machine guest at that?

As it turns out, it can be done, albeit in a limited fashion, thanks to the time generously spent by independent developers. On a Windows 10 host, this is what you'll need:

  • GlovePIE .23 by Carl Kenner
  • The latest version of Firefox for your VM, though an older one might still work
  • The Mouseless Browsing add-on by Rudolf Noe for Firefox (as far as I know, there is no Chrome equivalent)
  • A good microphone. Shop around for the best one you can afford.
  • A script for GlovePIE (mine is included further down)
  • Your virtual machine client (I use VirtualBox)

With this setup, the Mouseless Browsing add-on puts a numbered label on every clickable item of a webpage rendered by Firefox. The GlovePIE script, leveraging Windows 10's own voice engine, translates your voice commands into keystrokes, which the VM passes on to the add-on. The add-on in turn reacts to the keystrokes it receives, and magic happens. In other words, you browse by speaking the numbered labels and your own custom commands.
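To make the voice-to-keystroke step concrete, here is a minimal sketch in Python rather than in GlovePIE's own script language. It only shows the translation logic: a recognized phrase goes in, a list of key names comes out. The vocabulary, the custom commands, the key names and the function name `phrase_to_keys` are all made up for illustration, and the trailing Enter assumes the add-on is set to confirm a typed number with Enter; none of this is taken from GlovePIE or from Mouseless Browsing.

```python
# Hypothetical vocabulary: spoken digit words and the key each one should type.
DIGIT_KEYS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Hypothetical custom commands; the key names are illustrative only.
COMMAND_KEYS = {
    "go back": ["alt+left"],
    "scroll down": ["pagedown"],
}


def phrase_to_keys(phrase):
    """Translate one recognized phrase into the list of keys to send to the VM."""
    text = phrase.lower().strip()

    # Whole-phrase custom commands take priority over digit words.
    if text in COMMAND_KEYS:
        return list(COMMAND_KEYS[text])

    keys = []
    for word in text.split():
        if word not in DIGIT_KEYS:
            return []  # unknown word: send nothing rather than guess
        keys.append(DIGIT_KEYS[word])

    # Assumption: the typed label number is confirmed with Enter.
    if keys:
        keys.append("enter")
    return keys


if __name__ == "__main__":
    print(phrase_to_keys("four two"))  # ['4', '2', 'enter']
    print(phrase_to_keys("go back"))   # ['alt+left']
```

Returning an empty list for anything unrecognized is a deliberate choice here: a misheard phrase then does nothing instead of clicking a random link.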

There is a bit of a lag while each page is populated with numbers, though some benefit might be gained now that Firefox has gone multi-process as of version 54 or so. For my own setup, I added the Control key as a switch and set the Control + Y key combination to hide the numbers.