Voice commands are an accepted part of our daily lives — to the point where smart systems could replace the dog as a person’s best friend. When was the last time you heard someone tell their dog to fetch the newspaper? Today, you’re more likely to ask Alexa or Google to read you the news headlines.
Whether it’s opening an app, checking the weather, browsing news headlines, reading snippets of incoming emails, playing music, navigating around town, controlling IoT devices, checking the time, or even just using a timer, we are using voice commands more and more in our daily lives.
Amazon recently announced even more hardware with Alexa embedded, further increasing access to one of the most popular digital assistants available today. But do you know how a system like Amazon’s Alexa actually works?
What happens when I ask Alexa, “Where is the nearest taco?”
That question kicks off a series of steps, as outlined in a recent Slate podcast by one of Alexa’s chief engineers, and it’s fascinating. But as we will see, the only things you can impact are the facts (and, if you build a Skill, the Skills). It’s important to note that you can’t control the user interface and you can’t control the artificial intelligence, but you can control the knowledge and facts about your business.
When you ask Alexa to fetch an answer for you, this is what the process looks like.
“Hey Alexa, where’s the nearest taco?”
“Hey Alexa” invokes the system. This wakes it from a rest state in which it is listening for one thing only: its name. It generally pays no attention to anything else until it hears that name.
Upon hearing the correct invocation, the device in your room (or the app on your phone) connects to the internet and to the cloud where Alexa lives. You see, we don’t each get our own copy of Alexa in our devices. There is only one Alexa, and we all access her through our devices. She can handle millions upon millions of requests at the same moment, though, so don’t worry that your question won’t be heard. Once the connection between your device and the cloud is established, the device sends your request to the cloud system.
When the system in the cloud receives your request, it filters the words to determine object, intent, and action. Its goal is to understand your question. It knows your location and can access local business data, reviews, pricing information, hours of operation, and so on. It does all this to determine the best answer for your unique request.
Once the components of the request are recognized and established as individual facts, they are handed off to internal Skills which Alexa can access to sort the data and formulate a response.
The response is then sent back to your local device, which speaks the answer to you out loud.
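The steps above can be sketched as a toy pipeline. This is purely illustrative and not Amazon’s actual implementation; the wake word check, the `parse_request` keyword matcher, and the `SKILLS` lookup table are all simplified assumptions standing in for large-scale speech recognition, natural-language understanding, and skill dispatch.

```python
# Toy sketch of the request flow described above -- not Amazon's real system.
# All names (WAKE_WORD, SKILLS, parse_request, handle_request) are illustrative.

WAKE_WORD = "alexa"

# Hypothetical registry of skills: handlers keyed by the intent they serve.
SKILLS = {
    "find_place": lambda slots: f"The nearest {slots['object']} is 0.4 miles away.",
}

def heard_wake_word(audio_text: str) -> bool:
    """Step 1: the device idles until it hears its name."""
    return WAKE_WORD in audio_text.lower()

def parse_request(utterance: str) -> dict:
    """Step 3 (cloud side): filter the words into object and intent.
    A real system uses statistical NLU; this is simple keyword matching."""
    words = utterance.lower().split()
    if "nearest" in words or "where" in words:
        return {"intent": "find_place", "slots": {"object": words[-1].strip("?")}}
    return {"intent": "unknown", "slots": {}}

def handle_request(utterance: str) -> str:
    """Steps 2-5: parse the request, hand it to a skill, return the spoken answer."""
    parsed = parse_request(utterance)
    skill = SKILLS.get(parsed["intent"])
    if skill is None:
        return "Sorry, I don't know how to help with that."
    return skill(parsed["slots"])

if heard_wake_word("Hey Alexa, where's the nearest taco?"):
    print(handle_request("where's the nearest taco"))
```

In practice the interesting engineering lives in the parts this sketch waves away: recognizing speech accurately, resolving the user’s location, and ranking candidate answers before one is sent back to the device.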
It takes a while to explain the operation, but in real life, the time from asking the question to hearing an answer is usually less than two seconds. This should feel familiar: it’s about the same response time a human takes when asked a question they need to think about.
The new normal.
Voice commands have grown popular because they are easy, accessible, and people have largely overcome the early aversion to speaking to their technology. There is a learning curve, but most people today are equipped with everything they need to use voice commands — they can speak, and they know how to form questions. For aging users, voice commands offer a way to avoid small keyboards and small on-screen fonts, which can be a significant hindrance as eyesight declines.
Voice commands are also gaining favor because of two other important factors. The first is that the newest generations of digital natives are growing into the primary purchasers of devices with voice access. Millennials and Gen Z expect devices to be voice-enabled, and companies like Google and Amazon are doing their best to ensure their systems are embedded in a wide range of first- and third-party devices.
You’ve probably heard a story told by friends where their youngest child (from Gen Z) walked up to a device, like a landline telephone, and started asking it things, expecting it to respond. These stories, while circulating as urban legends, started from a kernel of truth. Somewhere, someone’s child tried to speak to a random item expecting a response.
Whether true or myth, the fact remains that for generations to come, starting with Gen Z, expectations will drive behavior. They expect voice interaction and will turn away from systems that don’t support it as they enter their prime earning years.
The second reason people are using these devices more is that the systems have become smarter — capable of doing more, of answering more questions, and in general, of being more useful. Google, Amazon, Microsoft, and every other player with an AI-powered digital assistant has been working consistently to make their systems smarter. For them, this means increasing the accuracy of voice recognition, the level of comprehension around each entity, and the breadth of answer coverage.
As we look back and recall the familiar jokes about the first generation of our digital assistants being dumb, we should be aware that the advances made in voice systems over these past few years will only continue as we move forward. Voice commands work today because a complex series of challenges has been overcome, ensuring that most people get accurate answers to most of their questions. In the next few years, nearly everyone will get answers to nearly all of their questions as the machine learning systems that power voice commands fill in the remaining blanks with data, information, and answers.