Friday, July 9, 2010

Get started with speech recognition - Hack a Day

posted Jul 9th 2010 8:14am by Rachel Fee
filed under: how-to

Headset and microphone

Speech recognition makes it easier for us to be lazy with our devices – or perhaps set up the coolest voice-controlled project around. After the voice controlled home automation post, we received a lot of emails asking “how can I make it recognize my voice?” Whether your project involves a PC or an Android phone, a big budget or no budget at all, there is a solution out there. Join us after the break for a complete set of instructions on setting up speech recognition, and some of the best software options to meet your needs.

Got a Microphone?

Using a microphone is the only way to get your voice commands to the computer for interpretation. If you’ve got a laptop, you’re probably set to go, as most laptops come with microphones already built in. Not sure? Look for a small hole around the screen or keyboard. It may be labeled, but not always. You can also try checking the list of features in your computer’s manual, or head to your control panel and select “Sound”. In this pop-up window, you’ll find a tab titled “Recording”. If you’ve got a mic installed, it will be listed here.

A built in mic

If you’re using a desktop, you’ll likely have to buy an external microphone. Many webcams include a built-in mic – check the package to make sure. Some newer media keyboards also include a microphone. If this is the case for you, you may have to move your keyboard out of a confined space to reduce echo. If you’re a PC or Xbox 360 gamer, you might have a headset used to communicate with other players live. This can double as a mic for voice recognition. Don’t have any of these? Head to your nearest store that sells computer accessories – try Best Buy, Future Shop, RadioShack, or your favorite locally-owned retailer. A basic, usable microphone can cost anywhere from a few dollars to hundreds of dollars. While a six hundred dollar microphone is unnecessary unless you plan to record a studio album with your computer, it might be a good idea to stay away from the cheapest of the cheap – these often produce choppy, uneven sound that your computer cannot interpret. Generally a headset mic (or gaming headset) is the best way to go, as it sits close to your mouth for minimal interference. Make sure the mic you choose is compatible with your computer’s operating system and has a connector your computer supports, and buy away!
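If you want a rough, do-it-yourself check of whether a mic is too quiet or is clipping, one option is to record a few seconds of speech to a WAV file and look at the signal levels. The sketch below is only an illustration: it assumes a 16-bit PCM recording, the function names are made up for this example, and the thresholds are guesses rather than values taken from any speech recognition package.

```python
import math
import struct
import wave


def wav_levels(path):
    """Return (peak, rms) of a 16-bit PCM WAV file, each scaled to 0.0-1.0."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    # Unpack little-endian signed 16-bit samples (channels are interleaved).
    count = len(frames) // 2
    samples = struct.unpack("<%dh" % count, frames)
    if not samples:
        return 0.0, 0.0
    peak = max(abs(s) for s in samples) / 32768.0
    rms = math.sqrt(sum(s * s for s in samples) / count) / 32768.0
    return peak, rms


def describe(peak, rms, quiet=0.02, clipped=0.99):
    """Give a rough verdict; the default thresholds are illustrative guesses."""
    if peak >= clipped:
        return "clipping: move the mic away or lower the input gain"
    if rms < quiet:
        return "too quiet: move the mic closer or raise the input gain"
    return "usable"
```

Calling `describe(*wav_levels("test.wav"))` on a short recording (the file name is a placeholder) gives a quick hint before you spend time training the recognizer.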

A headset microphone

Flickr: [Yoppy] [Link]

Set Up Your Voice Recognition Software

Windows 7 or Windows Vista

Voice recognition on either of these operating systems is as easy as a few clicks. The voice commands are thorough and simple, allowing you to control everything from form and menu navigation to Office programs and more. For almost anything you need to do, there is a voice command. To get started, head to the control panel and select “Speech Recognition”.

Control Panel in Windows 7

From here, you can test your microphone, train your computer to understand your individual style of speech, or view and print a reference card containing the commands your computer will understand.

Speech Recognition Dashboard in Windows 7

You can also take a tutorial which teaches you the ins and outs of speech recognition in one simple lesson. Select the “Start Speech Recognition” option when you’re ready to get started. This leads you through optimizing your computer’s sound input with positioning tips and speech tests, and guides you through the rest of the configuration in a very user-friendly manner. When you finish the wizard, you’ll be ready to go!

Speech Recognition Wizard in Windows 7

You can refer back to the speech recognition reference card as often as you need to review the commands your computer will understand.

Speech Recognition Reference Card in Windows 7

Windows XP

Voice recognition in XP is as easy to set up as it is in the newer Windows operating systems, but it lacks the vast array of features that Vista and 7 offer. Within Microsoft Office, speech recognition is supported only by the 2002 and 2003 versions. With a version earlier than 2002, or with the 2007 or 2010 versions in XP, you’re out of luck, as built-in speech recognition is not supported. Even with a supported version, basic commands are not always available. In general, you will have to enable speech recognition specifically for each program with which you wish to use it, and it will not be available in all programs.

Windows XP uses a speech recognition engine which comes with Office XP, though it is not always installed by default. Open the control panel, and from the classic view, select the “speech” option. If you’re using the newer, categorical menu in XP, you’ll have to first select the “Sounds, speech, and audio devices” option.

Speech Recognition Icon in Windows XP

Youtube: [mickmoose429992] [Link]

If you see a “speech recognition” tab in “speech properties”, you’re ready to go, as the engine has already been installed.

Speech Properties in Windows XP

Youtube: [mickmoose429992] [Link]

If this option is missing, you’ll need to install it. From the control panel, select the “add or remove programs” option.

Add or Remove Programs in Windows XP

Youtube: [mickmoose429992] [Link]

Find Microsoft Office XP, and select the “change” option. Be careful not to uninstall!

Change Microsoft Office XP

Youtube: [mickmoose429992] [Link]

Find “features to install”, select the “alternative user input” option, followed by the “speech” option. Select “run from my computer” and click update. This automatically includes speech recognition in all Office programs, and makes the feature available to other programs.

Add Speech to Microsoft Office XP

Youtube: [mickmoose429992] [Link]

Mac OS X

Apple was one of the first to come out with speech recognition – a crazy idea at the time. This was back in 1993. We’ve come a long way since then, from more fluid, user-friendly controls to the ability to perform almost any action without ever touching your keyboard. Setting up speech recognition in OS X is a breeze. Once you’ve got your mic ready, select “system preferences” from the Apple drop-down menu. From this menu, select the “speech” option.

Mac OS X Speech Feature

Youtube: [fifedjdomo] [Link]

Enabling “Speakable Items” will turn on the default commands, allowing you to perform most basic tasks.

Mac Speakable Items

Youtube: [fifedjdomo] [Link]

Through available options, you can set up your microphone and further customize the use of the program. The set of commands used to control your computer is fully customizable. Pair this with VoiceOver, a program designed for the blind, and you’ll hardly need to touch your computer in order to use it.

Linux Ubuntu

Linux does not currently have a complete solution for speech recognition. Though several projects have been started, none have been finished. There are several pieces of software that can perform some of the speech recognition tasks that Windows or Mac can accomplish, but nowhere near as thoroughly or easily. There is also no proprietary software for speech recognition with Linux. However, there are some partially-completed open source solutions for Ubuntu. The Julius speech recognition engine is one of these utilities – a program used to interpret and execute a set of pre-determined voice commands. Detailed instructions for installation can be found [here].

Julius Main Page

Youtube: [jgraves1141] [Link]

Documentation on the installation and use of Julius is very limited because the program is not completely finished, so you may not want to attempt an install unless you are completely comfortable with Linux. The Julius package available for download contains two parts – an installer, and the program. First run the installer, which will take you through the installation of Julius.
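The general idea of interpreting and executing a set of pre-determined voice commands can be sketched in a few lines. This is not Julius’s own API or configuration format; it is a hypothetical Python illustration that assumes some recognizer has already turned the audio into text, and every name in it is invented for the example.

```python
import re


def normalize(text):
    """Lower-case and strip punctuation so 'Lights on!' matches 'lights on'."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


class CommandTable:
    """Maps a fixed set of spoken phrases to Python callables."""

    def __init__(self):
        self._actions = {}

    def register(self, phrase, action):
        self._actions[normalize(phrase)] = action

    def dispatch(self, recognized_text):
        """Run the action for recognized_text; return None if nothing matches."""
        action = self._actions.get(normalize(recognized_text))
        if action is None:
            return None
        return action()


# Hypothetical commands; real ones would launch programs or toggle hardware.
table = CommandTable()
table.register("lights on", lambda: "turning lights on")
table.register("open browser", lambda: "starting browser")
```

Because only exact phrases from the table are accepted, anything else the recognizer hears is simply ignored, which is part of why a small, fixed command set tends to be more forgiving than free dictation.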

Another great solution is to use a Windows-based program such as Dragon NaturallySpeaking in combination with WineHQ. However, there are lapses in fluidity that often have to be worked around. For example, in some cases, a basic paragraph must be dictated to Dragon’s text editor and then copy-pasted into the appropriate location rather than transcribed directly into the appropriate program.

WineHQ

WineHQ: [Link]


Top Third Party Proprietary Software

If you have an older operating system, or simply don’t like the speech recognition software included with your operating system, a third party program may be what you need. There are dozens of free and paid speech recognition programs out there: customizable, non-customizable, open source, for business, for personal use, and more. With so many options, you’re sure to find exactly what you want at a cost you can afford. Some of the most popular:

Dragon NaturallySpeaking

Dragon is a name that pops up over and over when searching for speech recognition software. Made for PC, it’s highly regarded for its speed, accuracy, ease of use, and large number of commands. The basic version of Dragon for home use is around $100 US, though many versions are available with more specific features, such as packages for medical or legal offices. These packages can cost over $1000 US, though they are unnecessary for the basic user. Dragon NaturallySpeaking software packages also include a mic, so you won’t have to try and find your own. In addition to Windows, many users have claimed great success with Dragon in combination with WineHQ for Ubuntu.

Dragon NaturallySpeaking

Dragon NaturallySpeaking: [Link]

MacSpeech Dictate

MacSpeech is produced by the same makers as Dragon NaturallySpeaking. It was built from the ground up, rather than being ported, so it is free of the bugs that typically come with adapted software. Similar to Dragon, MacSpeech offers not only dictation recognition, but customizable speech commands as well, and includes a mic in the package. Also following the Dragon theme, medical and legal versions are available, as well as an international edition which supports Italian, French, and German in addition to English. These speech recognition tools for Mac range from $150 US to $600 US.

MacSpeech Dictate

MacSpeech Dictate: [Link]

IBM ViaVoice

IBM’s ViaVoice recognition software is designed primarily for use with small mobile devices and vehicle automation systems, though it’s quite highly regarded amongst computer users as well. ViaVoice offers text-to-speech in addition to voice recognition. The command library is intuitive, and the user does not need to stick to a standard set of commands to make use of all the features – the program can interpret most commands as they are given. The speech library contains over 200 thousand words, far more than the average person’s vocabulary. In addition to many mobile operating systems, ViaVoice supports standard Windows and Mac operating systems.

IBM ViaVoice

Third Party Open Source and Free Software

Open source or free voice recognition software that works well is extremely difficult to find – there is really no winner in the open source race for free voice software. In fact, there is hardly a race at all. Numerous open source Linux projects have been started, but due to the sheer scale of the task, none have been finished. Below is a project you can contribute to in order to get the ball rolling on some great open source speech recognition software, as well as a toolkit for your own uses.

VoxForge

VoxForge is a project working to compile a collection of transcribed speech for use with both open source and free voice recognition engines. Once this collection is complete, free open source speech recognition programs should get the jump-start they need to improve significantly. If you’d like to help the project, you can visit the VoxForge website [here].

VoxForge

VoxForge: [link]

CMUSphinx

Sphinx is now on version 4 (Sphinx 4). Perhaps the most popular (or only popular) open source speech recognition tool, Sphinx 4 is released under a BSD-style license and is written in Java. The project also offers a mobile version called “PocketSphinx”. Sphinx may be more useful to developers than to the average user, but it’s one of the only solutions available, not to mention a versatile and thorough one. It does not come ready to go out of the box; it is a toolkit that developers build on, and it needs some work before it can be used day to day.

CMUSphinx

CMUSphinx: [Link]


How to Install CMUSphinx

Setting up CMUSphinx is not the easiest task, but it is likely to pay off with a great product. This install needs to be done manually.

Before you get started, you’ll need a few things – Perl, in order to run the scripts, and a C compiler for the source code. Perl is free, and included with most Linux distributions. GCC (GNU Compiler Collection) is a good tool for the C portion of the source code. A word alignment program is also necessary – CMU suggests “Sclite”, a tool specifically used for speech recognition programs.
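A quick way to see which of these prerequisites are already on your machine is to look them up on your PATH. The following is a small sketch rather than part of the CMU instructions; the executable names `perl`, `gcc` and `sclite` are assumptions about how the tools are installed on your system, and the helper names are invented for this example.

```python
import shutil


def find_tools(names):
    """Return {tool name: full path or None} for each name looked up on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found):
    """List the tools that were not found, in sorted order."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    # Assumed executable names for Perl, the C compiler and the scoring tool.
    found = find_tools(["perl", "gcc", "sclite"])
    for name, path in found.items():
        print(name, "->", path or "not found")
    if missing_tools(found):
        print("Install these before continuing:", ", ".join(missing_tools(found)))
```

If you use a different C compiler, swap its executable name in for `gcc`.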

The databases you will need are available [here]. You’ll need either AN4 or RM1. Next, you’ll need to set up the trainer. A trainer helps your computer interpret your commands. Set up the tutorial – this will include copying the scripts to the proper area. The decoder is next. Though you can use any decoder you choose, CMU describes the installation with Sphinx 3, and encourages you to perform your testing with Sphinx 3. Once you have all of the appropriate files in the correct directory, it’s time to compile and finish setting up the tutorial. Perform a training run, and finally, perform a decode. This set-up is extremely complicated, and is likely best left to the professionals – certainly not something for most average users.
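After a decode, the recognizer’s output is compared against the known transcripts, which is where a word alignment tool such as Sclite comes in. The sketch below is not Sclite and does not reproduce its report format; it is a simplified Python illustration of the usual word error rate idea (substitutions, deletions and insertions divided by the number of reference words), with a function name chosen for this example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by the
    number of words in the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words seen so
    # far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("turn the lights on", "turn the light on")` counts one substitution against four reference words. Note that the rate can exceed 1.0 when the recognizer inserts many extra words.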

Full instructions can be found on Carnegie Mellon University’s Sphinx website [here].

You’re Ready To Go!

Once you’ve got your mic functional and in place, and your speech software set up and configured, you’ll be ready to get started! Sit back and get talkin’!


Reader Comments

Go there...
http://hackaday.com/2010/07/09/get-started-with-speech-recognition/

I used MS Speech and Voice Recognition Software a lot back in the early Win XP days, along with some other freeware that I found. And I actually bought Dragon NaturallySpeaking (can't remember the version number now). I found it all fun and good, but not very practical for daily use in operating my computer. I was hoping to never have to type again! ;) Maybe it's because I grew up in TX, but that darn thing never could understand a word I said! ;) Dragon NaturallySpeaking was so hard to train that I finally gave up on it altogether. I kept meaning to go back and try training it again, but I have been using Linux as my main OS for 5 years now and just never really want to boot up Win XP just to play with the voice stuff.

Fedora and Debian Linux do have decent text-to-speech apps available, but I haven't found anything good for speech to text, like they said in the article. Maybe if I liked talking a lot more, I would have used it more. But then again, there's the problem of everyone else in the house listening in while you talk to your computer. I don't like typing out long sets of text either, and honestly I got into the whole voice thing because I have a hard time typing without making a lot of mistakes, and I'm pretty slow too. But sadly, speech to text produced even more errors than typing, for me.

And since I registered my Dragon software right after purchasing it, I still get constant ads from Nuance, the company that produces Dragon NaturallySpeaking, to this day!!! :( Here's an excerpt from one of their ads... "Dragon NaturallySpeaking, the Nuance Speak & See suite includes speech recognition, a screen reader and magnifier, a reading and writing toolbar, and a language learning assistant. The benefits include: Easily use your computer with four unique accessibility tools. Includes speech recognition, a screen magnifier and more - in one package. Read and write documents or e-mail with greater independence. Get more done at work, home or school."

Nowadays, you can get most all of that functionality for free in open source and freeware! Most of these are even free from MS! So why would I pay $199 ("on sale") for their software!!!? So, if you want to try out voice recognition, just look into the programs available for your OS and look for open source and freeware... And if you are shopping for a mic, headset or any other electronics stuff, check out http://www.parts-express.com. I'm a reseller for them. My site is at http://bishopco.com, but http://www.musiciansfriend.com/ pretty much put me out of business. They can buy and sell cheaper than I can...

By the way, good info in the article.

Thanks Rachel:)

Don
