If you’ve ever used your voice to ask a question of a digital assistant – such as Siri, Alexa, Google Assistant or Cortana – then you know this technology is already amazing… and it’s getting better by the day.
That said, it’s also a bit mysterious. What’s going on with voice recognition, exactly? Do computers have ears now?!
In this article, we’ll explain how voice recognition technology works and highlight a few of its most innovative applications.
By the end, you may not know EXACTLY what happens to your voice once it enters the machine – we’ll save that for the engineers – but you WILL know enough to understand how this breakthrough technology is changing the way we live for the better.
What is Voice Recognition?
Voice Recognition (or “Speech Recognition”) describes any technology that translates spoken language to computer data.
Voice Recognition requires both hardware and software to work. While your computer doesn’t need “ears” exactly, it DOES need a way to receive sound – like a microphone. That’s the hardware part.
Once the computer receives the sound, software analyzes it and transforms it into data the computer can understand. Every voice recognition program is different, and some are more sophisticated than others. That means the quality of Voice Recognition depends largely on the software, not necessarily the hardware.
In other words, if you test two identical computers with different voice recognition software, one will be better than the other – even if the microphones (“ears”) are the same.
What is Voice Recognition used for?
These days, voice recognition software is everywhere!
One of the most common applications is search, through assistants such as Google Assistant or Alexa. Instead of typing in a command, you activate the voice recognition software (often with a voice command like “OK Google” or “Hey Siri”) and then ask your question verbally.
Consumer research firm ComScore reports that by 2020, half of all our online searches will be done by voice. Good news for those with carpal tunnel, eh?
Of course, we’re discovering new uses for this tech beyond search on a regular basis – especially as it matures. We’ve found applications in banking, dispensing medications, and even reading bedtime stories to children!
When was Voice Recognition invented?
The technology dates back to the 1950s, when simple programs were able to identify simple speech patterns. These early programs worked well with the people who invented them… but of course, everyone speaks differently.
Early voice recognition software was trained to look for a single person’s speech patterns. However, when it tried to understand someone with an accent… embarrassing situations occurred. (“No, I said I want a new display… not a nudist play!”)
It’s taken a long time for the technology to reach its current state, but now that it’s here, most of the kinks have been worked out and everyone can make use of the functionality.
Okay, so one last question – how does the technology work?
How does voice recognition work?
Things may get a bit complicated here, so bear with us…
As you are probably aware, sound is transmitted in soundwaves. These analog signals can be picked up by a microphone, then converted into digital data. That’s where things get interesting…
Once the sound is in digital form, the software divides the signal into small identifiable segments. (The software learns how to identify these segments over time, with practice and input from humans.)
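To make the "divided into small segments" step a little more concrete, here is a minimal Python sketch. It is not how any particular product does it: the 16,000 samples-per-second rate, the 25-millisecond segment length, the 10-millisecond step, and the function names `record_fake_audio` and `split_into_frames` are all assumptions chosen for illustration, and a pure tone stands in for a real microphone recording.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed; a common rate for speech)
FRAME_MS = 25         # length of each segment in milliseconds (assumed)
HOP_MS = 10           # how far we slide forward between segments (assumed)


def record_fake_audio(seconds, pitch_hz=220.0):
    """Stand in for a microphone: sample a pure tone SAMPLE_RATE times per second."""
    total = int(seconds * SAMPLE_RATE)
    return [math.sin(2 * math.pi * pitch_hz * n / SAMPLE_RATE) for n in range(total)]


def split_into_frames(samples, frame_ms=FRAME_MS, hop_ms=HOP_MS):
    """Cut the list of samples into short, overlapping segments."""
    frame_len = SAMPLE_RATE * frame_ms // 1000
    hop_len = SAMPLE_RATE * hop_ms // 1000
    frames = []
    start = 0
    # Stop as soon as a full-length segment no longer fits.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


audio = record_fake_audio(1.0)
frames = split_into_frames(audio)
print(len(audio), "samples ->", len(frames), "segments of", len(frames[0]), "samples each")
```

In this sketch the segments overlap, so a sound that falls on the edge of one segment still appears whole in a neighbouring one. Real systems usually go a step further and summarize each segment as a handful of numbers before comparing it with anything.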
Those segments are then matched against “phonemes” – the smallest units of sound that distinguish one word from another.
For example, “cat” has three phonemes: “/k/”, “/æ/” and “/t/”.
If we replace one of the phonemes in “cat” with another, such as replacing “/t/” with “/p/”, we get another word – “cap”. This has a completely different meaning, and all it took was one changed phoneme.
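The "one changed phoneme, one new word" idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the tiny `PRONUNCIATIONS` dictionary is made up for this example, the phoneme labels are written in a simplified ARPAbet-style spelling (K, AE, T and so on) rather than taken from any real product, and the helper names are hypothetical.

```python
# Hypothetical mini pronunciation dictionary (ARPAbet-style labels, for illustration only).
PRONUNCIATIONS = {
    "cat": ["K", "AE", "T"],
    "cap": ["K", "AE", "P"],
    "bat": ["B", "AE", "T"],
    "cut": ["K", "AH", "T"],
    "dog": ["D", "AO", "G"],
}


def differs_by_one_phoneme(a, b):
    """True when two phoneme lists have the same length and exactly one mismatch."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def one_phoneme_neighbours(word):
    """List every dictionary word that is one swapped phoneme away from `word`."""
    target = PRONUNCIATIONS[word]
    return sorted(
        other
        for other, phones in PRONUNCIATIONS.items()
        if other != word and differs_by_one_phoneme(target, phones)
    )


print(one_phoneme_neighbours("cat"))
```

With this toy dictionary, "cat" has three near neighbours and "dog" has none, which hints at why short, similar-sounding words are the ones software is most likely to mix up.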
To identify the correct phonemes, the software uses an advanced statistical model to compare a given phoneme with a large database of words and phrases.
One of the best-known ways of doing this is the Hidden Markov Model (HMM), which uses probability scores and the context of the phonemes before and after each sound to guess what the speaker most likely meant.
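For the curious, here is a toy Python sketch of how a model like this can use context to make its guess. It uses the Viterbi algorithm, a standard way of finding the most likely sequence of hidden states in such a model. Every number below is invented for illustration (real systems learn their probabilities from large amounts of recorded speech), and the labels such as "p-like" are hypothetical stand-ins for whatever the software actually measures in each segment.

```python
STATES = ["K", "AE", "T", "P"]  # the phonemes the toy model knows about

# Made-up probability that a word starts with each phoneme.
START = {"K": 0.7, "AE": 0.1, "T": 0.1, "P": 0.1}

# Made-up probability of moving from one phoneme (row) to the next (column).
TRANS = {
    "K":  {"K": 0.05, "AE": 0.85, "T": 0.05, "P": 0.05},
    "AE": {"K": 0.10, "AE": 0.05, "T": 0.60, "P": 0.25},
    "T":  {"K": 0.25, "AE": 0.25, "T": 0.25, "P": 0.25},
    "P":  {"K": 0.25, "AE": 0.25, "T": 0.25, "P": 0.25},
}

# Made-up probability that a phoneme (row) produces a given kind of sound (column).
EMIT = {
    "K":  {"k-like": 0.80, "a-like": 0.05, "t-like": 0.10, "p-like": 0.05},
    "AE": {"k-like": 0.05, "a-like": 0.85, "t-like": 0.05, "p-like": 0.05},
    "T":  {"k-like": 0.10, "a-like": 0.05, "t-like": 0.55, "p-like": 0.30},
    "P":  {"k-like": 0.05, "a-like": 0.05, "t-like": 0.30, "p-like": 0.60},
}


def viterbi(observations):
    """Return (probability, phoneme path) for the most likely explanation of the sounds."""
    # best[s] holds the most likely path so far that ends in phoneme s.
    best = {s: (START[s] * EMIT[s][observations[0]], [s]) for s in STATES}
    for obs in observations[1:]:
        new_best = {}
        for s in STATES:
            # Pick the previous phoneme that makes arriving at s most likely.
            prob, path = max(
                ((best[prev][0] * TRANS[prev][s], best[prev][1]) for prev in STATES),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * EMIT[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


probability, phonemes = viterbi(["k-like", "a-like", "p-like"])
print(phonemes, probability)
```

With these invented numbers, the last sound is a little more "p-like" than "t-like", yet the model still settles on K, AE, T, because in its table a T is much more likely than a P to follow AE. That is the "context" part of the guess at work.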
Now it gets even more complicated…
Sadly, things don’t always go smoothly.
While we humans can easily tell the difference between the words “Car” and “Cat”, computers need a LOT of practice. They can only make an educated guess each time a word is used, and in the early days they often got it wrong.
You can imagine they run into a lot of confusion, too – especially when you consider words like “often” (which some pronounce as “offen” – see how it removes a phoneme?) or even words that are often mispronounced, like “bidet”.
And don’t even get us started on “they’re”, “their” and “there”!
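Words that sound identical can only be told apart by the words around them. Here is a deliberately tiny Python sketch of that idea, assuming the software already knows which word comes next. The counts in `NEXT_WORD_COUNTS` are invented for illustration and the function name `pick_spelling` is hypothetical; real systems weigh far more context than a single neighbouring word.

```python
HOMOPHONES = ["their", "there", "they're"]

# Invented counts of how often each spelling is followed by a given next word.
NEXT_WORD_COUNTS = {
    "their":   {"car": 40, "house": 35, "is": 1},
    "there":   {"is": 50, "are": 45, "car": 1},
    "they're": {"going": 30, "happy": 25},
}


def pick_spelling(next_word):
    """Choose the spelling that was seen most often before `next_word`."""
    # Unseen pairs count as zero; ties go to the first spelling in the list.
    return max(HOMOPHONES, key=lambda w: NEXT_WORD_COUNTS[w].get(next_word, 0))


for following in ["car", "is", "going"]:
    print(pick_spelling(following), following)
```

The sound never changes in this example. Only the neighbouring word does, and that alone is enough to flip the spelling the sketch picks.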
Tech helping tech: Why voice recognition has matured so quickly in recent years
Think about it – even humans have trouble understanding each other sometimes. This means voice recognition software has had to account for TONS of variations across ALL types of people – plus it needs context.
But despite this seemingly impossible task, voice recognition has become extremely reliable in recent years. How, you ask? Simple: It learns from its mistakes – using the power of Artificial Intelligence!
Improvements in AI technology are the reason voice recognition has become ubiquitous today – and it will continue to make our lives easier as it improves.
Those are the essentials when it comes to voice recognition. Of course, there’s more to know – and if you’d like to learn even more, you’ll love our article about Artificial Intelligence!