Vocal Joystick Home

Vocal Joystick Video Demonstrations

Gnomedex 2008

We were honored to be invited to Gnomedex 2008. This conference featured many speakers on a variety of topics related to technology, blogging and the internet in general. More information can be found at the conference web site. Note that the talk is about half an hour and the videos are relatively large.

Winter 2008

After our first university press release, two news organizations came to film segments about the Vocal Joystick. These were sent to local news stations; a version of one of them can be found under the video section on our news page.


The following video presents Voicebot, a robotic arm controlled entirely through the Vocal Joystick. The arm itself is a Lynx 6 robotic arm by Lynxmotion. The two clips in this video demonstrate forward and inverse kinematic control modes, and show that the Vocal Joystick is a viable option for 3-D control of a robotic limb.


The first clip in the video demonstrates forward kinematic control: each joint angle is controlled explicitly by a vocal parameter, and the discrete "ck" sound toggles between shoulder/elbow joint control and wrist control. In the second clip, inverse kinematic control is used to position the gripper in Cartesian space using vocal parameters, and again the "ck" sound toggles between gross positioning and wrist orientation. Both clips demonstrate picking up and moving a small object on a table, with the discrete "ch" sound used to open and close the gripper.
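The two modes above can be sketched in a few lines of Python. This is purely illustrative: the parameter names (`vowel_x`, `vowel_y`), step sizes, and the two-link planar geometry are assumptions, not the actual VJ engine interface.

```python
import math

def forward_kinematic_step(angles, vowel_x, vowel_y, wrist_mode):
    """Forward kinematic mode: continuous vocal parameters drive joint
    angles directly. `wrist_mode` is toggled by the discrete "ck" sound.

    angles: [shoulder, elbow, wrist] in radians.
    vowel_x, vowel_y: continuous vowel-quality parameters in [-1, 1].
    """
    shoulder, elbow, wrist = angles
    if wrist_mode:
        wrist += 0.05 * vowel_x       # vocal parameter drives the wrist
    else:
        shoulder += 0.05 * vowel_x    # ...or the shoulder/elbow pair
        elbow += 0.05 * vowel_y
    return [shoulder, elbow, wrist]

def end_effector(angles, l1=1.0, l2=1.0):
    """Planar forward kinematics for the shoulder/elbow pair: where the
    gripper ends up for a given set of joint angles."""
    s, e, _ = angles
    x = l1 * math.cos(s) + l2 * math.cos(s + e)
    y = l1 * math.sin(s) + l2 * math.sin(s + e)
    return x, y
```

Inverse kinematic mode would run the opposite way: vocal parameters move the target `(x, y)` position, and a solver computes the joint angles needed to reach it.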


The following videos were shown as a part of the presentation of the Vocal Joystick system at the ASSETS 2006 conference in Portland, Oregon.


The videos below show a VJ system with more degrees of freedom. The first video demonstrates playing video games using the VJ, which allows movement in eight directions. The second video demonstrates using five vowels plus pitch to control a simulated three-joint robotic arm. The third video demonstrates a four-way vowel classifier which enables diagonal movements via adaptive filtering. In addition, we have implemented a new velocity control scheme based on human perception of loud, normal, and quiet voice. The vowel quality, loudness, and discrete sounds can be adapted using an adaptation tool, which is shown in the last video.
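The loudness-based velocity scheme can be sketched as a simple three-level mapping. The dB thresholds and speeds below are made-up placeholders, not the VJ system's tuned values.

```python
def velocity_from_loudness(energy_db, quiet_db=-40.0, loud_db=-15.0):
    """Map frame energy (dB) to one of three cursor speeds (px/frame),
    mirroring the perceptual quiet/normal/loud distinction."""
    if energy_db < quiet_db:
        return 2.0    # quiet voice -> slow, precise movement
    if energy_db < loud_db:
        return 8.0    # normal voice -> medium speed
    return 20.0       # loud voice -> fast traversal
```

In practice the adaptation tool would calibrate the thresholds to each user's voice rather than using fixed constants.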

Early videos


The following was part of a submission to the ICSLP conference, now known as Interspeech.

ICSLP Submission



UIST Submission

An edited video showing several real-world VJ applications, including browsing a news site, playing a computer game, and using Google Maps and a visualization tool. The system is based on a four-way vowel classifier, and diagonal movements can be enabled by filtering techniques. The velocity is normalized for each vowel so that cursor speed is more uniform across directions. Voiceless consonants are again used as discrete sounds, with a more robust scheme for rejecting breathing and extraneous speech.
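The per-vowel velocity normalization amounts to dividing each vowel's raw speed by a per-vowel calibration gain, so the cursor moves at a comparable rate in every direction. The calibration values here are invented for illustration.

```python
# Hypothetical per-vowel calibration gains (e.g. each vowel's typical
# raw speed for a given user); these numbers are placeholders.
CALIBRATION = {"ae": 1.4, "a": 1.0, "uw": 0.8, "iy": 1.2}

def normalized_speed(vowel, raw_speed):
    """Scale raw speed by the inverse of that vowel's calibration gain,
    equalizing cursor speed across directions."""
    return raw_speed / CALIBRATION[vowel]
```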


In this system, both vowel classification and velocity control are adapted to the user's voice, and the VJ performs more reliably than our first system. In addition, consonant-vowel patterns are used for discrete sounds to avoid false positives caused by breathing.


Below are videos demonstrating the first VJ system, in which velocity is determined by amplitude and direction by the four cardinal compass directions. Note also that mouse clicks in this version are simulated using simple voiceless consonants, to avoid confusion between the vowel detector and the mouse-up/down controls. The two video demonstrations were performed by Jon Malkin and Xiao Li, the two graduate students who implemented the VJ engine.
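The first system's control law can be sketched as follows: pick the cardinal direction of the most likely vowel, and scale the step by vocal amplitude. The vowel labels and their direction assignments are illustrative placeholders, not the classifier's actual categories.

```python
# Hypothetical vowel-to-direction mapping for a four-way classifier.
DIRECTIONS = {
    "ae": (0, -1),   # up
    "a":  (1, 0),    # right
    "uw": (0, 1),    # down
    "iy": (-1, 0),   # left
}

def cursor_step(vowel_posteriors, amplitude, gain=10.0):
    """Direction from the highest-probability vowel, speed from amplitude.

    vowel_posteriors: dict mapping vowel label -> classifier probability.
    amplitude: vocal amplitude in [0, 1].
    Returns an (dx, dy) cursor displacement.
    """
    vowel = max(vowel_posteriors, key=vowel_posteriors.get)
    dx, dy = DIRECTIONS[vowel]
    return (gain * amplitude * dx, gain * amplitude * dy)
```

A loud, sustained "ae" would then move the cursor quickly upward, while a quiet one would nudge it slowly.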