Ease of use: speech recognition versus keyboard

Question

We see speech recognition being implemented more and more, along with requests for libraries that recognize speech well. What is the rationale (in terms of usability) behind it compared to a keyboard or keypad? What reasons would you have to invest in this development?

Take call centers, for example. A few years ago, almost all call centers used an IVR that prompted you to press a key for each menu option. We now see more and more menus that prompt for a spoken keyword and/or a key press: "Please say 'score' or press 1 to hear your score." We see the same thing in company phone directories: "Please say the name of the person you are trying to reach" ... "Franck Loyd" ... "Did you say Jack Freud? Please say yes if you want to reach that person, or say no to try again."

I guess that's a plus when you're in the car without holding your phone, but is it worth the extra waiting time? Longer interaction to hear all the options, longer time spent analyzing whether something was said, and so on? Also, the reliability is better than it was, definitely, but sometimes it looks more like a toy that someone decided to plug into a system to make it feel futuristic.

Any experience with IVR design or software that used (or chose not to use) speech recognition?

Thanks!

speech-recognition usability voice ivr

lpfavreau May 22 '09 at 15:13

4 answers

I believe that speech recognition, like any input method, has its pros and cons.

Pros

No learning curve; we have all been talking since a very early age.
Very intuitive for the user.
On a phone, you don't have to keep moving the handset away from your ear to press keys.

Cons

Longer waiting times.
If the sound quality is poor, it can take several tries to get the right result.

Dmitri Farkov May 22 '09 at 15:17

In some cases, a company must support rotary telephones. It may then turn out to be more cost-effective to simply tune the speech recognition system instead of maintaining both input methods.

Voice recognition has a lot more overhead than touch tones. If you want the best results, you need to constantly tune the application and train the system on pronunciations it fails to recognize. You also need to be very careful about how you word the prompts for voice input, or you may get unexpected responses.

Overall, touch tone is much simpler, as there is only a limited set of possible inputs at any given time.

If your application is straightforward enough, voice recognition may only complicate it. "Press 2 for another language" is hard to beat.

cwhite May 23 '09 at 2:46

Speech recognition is definitely the wave of the future when combined with touchscreen technology. I am using Tazti speech recognition as an example. It is available in XP and Vista versions. Since the Microsoft touchscreen "Surface" platform runs on Vista, I am confident that Tazti will work with touch technology. When I tried Tazti, the built-in commands worked great. It also lets me create my own speech commands, and they work great too. Voice search on Google, Yahoo, Wikipedia, YouTube and many other search engines works great. It has many other features, but it does not do dictation. I found that I am eliminating 70% or more of my mouse clicks when browsing the internet, maybe more. Note: Tazti is a free download from their website.

saraG May 24 '09 at 1:36

Jim rush · Accepted Answer · 2009-08-17T13:09:55+0000

What is the rationale (in terms of usability) behind it compared to a keyboard or keypad?

Usability is a very broad term. If I try to enter my address using the keypad, it is not very usable. Some would argue that a speech engine with an overall success rate of 70-80% is also not very usable. As stated in other posts, hands-free entry can be much easier for those on mobile phones. However, using words instead of numeric input may be less usable than touch tone if the subject matter is somewhat obscure. A caller who hears unfamiliar terms and phrases may not remember them over a 10-30 second prompt, but they can hover a finger over the best-sounding choice or remember the position of the selection.

What reasons would you have to invest in this development?

This is a strange question. Usually, the decision to use speech or not in an IVR environment is made independently of real-world considerations. Unless you have a specific requirement that really needs speech, you will almost always lower your overall success rates. Speech is usually chosen for the sake of corporate image, or to have the latest technology toy.

I guess that's a plus when you're in the car without holding your phone, but is it worth the extra waiting time?

Speech recognition delays are not very high these days when using a modern ASR. In most cases, the input is processed in parallel with the speech, and the time between the end of speech and the recognition result is 0.5 to 1 s. Keep in mind that many IVRs then have to look up data after some inputs, and this can make the system seem slower. Normal inputs that take longer than about 1 s are usually indicative of poor tuning.

It may not have been apparent when the system was originally implemented, but during tuning you make many trade-offs between efficiency and accuracy. To get the next 0.1% of accuracy, resource usage can be pushed beyond what it should be at peak load.

Also, the reliability is better than it was, definitely, but sometimes it looks more like a toy that someone decided to plug into the system to make it feel futuristic.

In general, yes. On the subject of reliability, you really need to look at the overall numbers to understand the system. This is a statistics battle in which the individual caller is not very important (unless they have a VP title or higher). Through input tuning (prompt adjustments), resource utilization, and other speech flow settings, you are trying to maximize accuracy. For basic, non-natural-language style answers, individual prompts can get into the upper 90s. However, the overall success rate is much lower. Imagine 5 prompts that are all 98% accurate (in practice, you usually have a bunch at 99% and then a few in the mid-90s or just below): 0.98 * 0.98 * 0.98 * 0.98 * 0.98 ≈ 0.90, or about 90%. This means roughly 1 call in 10 is unsuccessful, and that is before accounting for caller confusion and business rules. DTMF input is usually close to 100% even after multiple inputs.
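The arithmetic above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that each prompt succeeds or fails independently of the others; the function name `overall_success_rate` and the "mixed" accuracy figures are made up for the example and are not measurements from any real IVR.

```python
from math import prod

def overall_success_rate(prompt_accuracies):
    """Probability that every prompt in a call is recognized correctly.

    Assumes (for illustration only) that prompts fail independently,
    so the per-prompt accuracies simply multiply.
    """
    return prod(prompt_accuracies)

# Five prompts, each 98% accurate, as in the example above.
uniform = overall_success_rate([0.98] * 5)

# Hypothetical mix: a bunch at 99% and a couple in the mid-90s.
mixed = overall_success_rate([0.99, 0.99, 0.99, 0.95, 0.94])

print(f"five prompts at 98%: {uniform:.1%}")
print(f"mixed prompts:       {mixed:.1%}")
```

Even with most prompts at 99%, a couple of weaker prompts pull the whole-call figure below the uniform 98% case, which is why the per-prompt numbers alone say little about the system.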

Any experience with IVR design or software that used (or chose not to use) speech recognition?

Yes. But I suspect this is not really the question you want to ask. As someone on the technology side, this is usually not your decision, and you have limited influence over it. If you're really looking for the pros and cons of speech:

Pros:

Cool / hip (mind you, speech alone is not enough; you need good VUI design and voice talent).
Good for the highly mobile crowd who avoid earpieces. The future is supposed to mix speech with touch input. Maybe. It will probably not come from the IVR market.
Good for tasks that cannot be done with DTMF. Note: many of these tasks tend to have low success rates with speech. Cost (compared to human agents) tends to be the driving force, not usability. Diverting a call to a voicemail box for things like a change of address can be very cost-effective.

Cons:

Expensive to develop, deploy, and maintain. Adding new options can make a big difference in your success rates if you're not careful. Always track the impact of changes.
Often deployed inappropriately, for example by just having callers say their choice from a numeric menu. This most often happens when someone wants the coolness of speech but cannot afford what is really needed to achieve it.
Success rates will be lower, and therefore call center costs will be higher.
Failures usually cluster around specific prompts and individual callers. A caller who regularly experiences problems with your system will be very unhappy with you.
Callers get angry when they are not understood. Is your goal to identify a subset of your customer base and really piss them off?
