
Let’s Talk About Talking: Digital Assistants & VUI

As we move closer to a world filled with Internet of Things (IoT) devices, we will want to think about how we interact with them. Touchscreens and button presses are certainly one way to go, but who doesn’t want to be like Iron Man and communicate with their own J.A.R.V.I.S.?

Look who’s talking now

Several big-name companies have had success with digital assistants, most notably Apple with Siri and her ability to tell wonderful jokes. In this section I will break down some of the more popular examples and what each one brings to the table.

Apple Siri. Siri does a great job of understanding natural speech patterns (no need to memorize commands) and recognizes different ways a person might ask the same question. It is the most sophisticated of the four major options, with an impressive range of capabilities including answering follow-up questions (remembering the context in which the original question was asked), setting reminders and appointments, posting to Facebook and Twitter, and making dinner reservations.

Amazon Alexa. The best option for controlling a smart home, with support for Samsung SmartThings, Philips Hue, fans, and thermostats. It also has the Alexa Skills Kit, a collection of self-service APIs that lets you add your own skills (voice commands) to Alexa-controlled devices. One thing that sets Alexa apart from the other three on the list is that its main supporting device is not a smartphone or PC but the Amazon Echo, a hands-free speaker with several mics that can pick up commands from anywhere in the room (think HAL 9000, but not evil).

Google Now. Google with a voice. It is great at answering questions that you would normally type into Google. It is also great at learning your normal travel patterns and alerting you to traffic congestion and road construction delays. The major drawback (in my opinion) is that more often than not it defaults to opening a browser with its response instead of telling you the answer, making it less handy for spoken conversation.

Microsoft Cortana. Those of you familiar with the Halo video game franchise will recognize the name Cortana and her voice actor Jen Taylor, who is also the voice you hear when speaking with Microsoft’s Cortana (imagine that). The problem with having a real person voice the digital assistant is that the vocal responses are limited, and most of the time (similar to Google) the responses are handled through search results.

Not all talk

Wit.ai is a Y Combinator startup that was acquired by Facebook last year for its ability to build voice-activated interfaces, which it exposes through an HTTP API and client libraries for Node.js, Python, and Ruby. In the following example, I will show a simple project created to work with a set of smart door locks (one for the front door and one for the garage).

I created a new project called Door Helper. The story starts with the user wanting to know the current status of either the front or garage door. Each of the highlighted words is considered an entity that will be passed to a function that interprets what is being asked of it.

1.png

In this first case, we want the function to check the status of the front door to see if it is unlocked or locked.

2.png

The action checkDoorStatus is not hooked up for the purpose of this example. In a real-world application this could be handled in Node.js by adding the checkDoorStatus function to the actions constant:
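A minimal sketch of how such an action might look, written as plain Node.js with no Wit.ai client wired in. It is not the original project's code: the request shape (a `context` object plus an `entities` map whose values are arrays of objects with a `value` field), the `firstEntityValue` helper, the `missingDoor` flag, and the in-memory `lockStates` table standing in for a real smart lock call are all assumptions made for illustration.

```javascript
// Hypothetical stand-in for a call to a real smart lock.
const lockStates = { front: 'locked', garage: 'unlocked' };

// Assumed entity shape: { door_type: [{ value: 'front' }], ... }
function firstEntityValue(entities, name) {
  const matches = entities && entities[name];
  if (!Array.isArray(matches) || matches.length === 0) {
    return null;
  }
  return matches[0].value;
}

const actions = {
  // Looks up the requested door and stores its status in a copy of the
  // context so that the bot's reply can refer to it.
  checkDoorStatus({ context, entities }) {
    const doorType = firstEntityValue(entities, 'door_type');
    const updated = Object.assign({}, context);
    if (doorType && Object.prototype.hasOwnProperty.call(lockStates, doorType)) {
      updated.door_type = doorType;
      updated.door_status = lockStates[doorType];
      delete updated.missingDoor;
    } else {
      // No recognizable door was mentioned in the request.
      updated.missingDoor = true;
    }
    return Promise.resolve(updated);
  },
};

// Usage example with a hand-built request.
actions
  .checkDoorStatus({ context: {}, entities: { door_type: [{ value: 'garage' }] } })
  .then((ctx) => console.log(`The ${ctx.door_type} door is ${ctx.door_status}.`));
```

Returning a promise keeps the action ready for a real lock API, which would almost certainly be asynchronous.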

3.png

I repeated the process above for a follow-up question, and now we are ready to send our bot to school. You might have noticed that door_status and door_type appear to be fixed to unlocked and front respectively. We want them to be options we can at least somewhat control, since we still need to interpret what is being asked, and we also want to be able to check the garage door and the locked status. We will accomplish this by adding keywords for the different entities: the keywords for door_status are unlocked and locked, and the keywords for door_type are front and garage. Wit.ai gives us a nice text box to test how our questions would be parsed and whether the entities map properly.

4.png

We can see that our three entities mapped correctly, but what happens if I try to phrase the question differently?

5.png

Well, it isn’t a fan of that. So how do we get the bot to identify this as a valid request? Luckily, in Wit.ai we can add synonyms to a keyword so that it understands that lock and locked have the same entity meaning.

6.png

Now it is happy.
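To make the keyword and synonym idea concrete, here is a rough sketch of the lookup in plain Node.js. Wit.ai does this with a trained model rather than exact word matching, so this is only a simplification of the concept. The keywords are the ones from the walkthrough, while the synonym lists, the `entityKeywords` table, the `extractEntities` function, and the sample question are made up for illustration.

```javascript
// Each entity maps a canonical keyword to the words that should resolve to it.
// The synonym lists are illustrative guesses.
const entityKeywords = {
  door_status: {
    locked: ['locked', 'lock'],
    unlocked: ['unlocked', 'unlock'],
  },
  door_type: {
    front: ['front'],
    garage: ['garage'],
  },
};

// Returns the canonical keyword for every entity mentioned in the utterance.
function extractEntities(utterance) {
  // Split into lowercase words so that "lock?" and "Lock" both match "lock".
  const words = utterance.toLowerCase().match(/[a-z']+/g) || [];
  const found = {};
  for (const [entity, keywords] of Object.entries(entityKeywords)) {
    for (const [keyword, synonyms] of Object.entries(keywords)) {
      if (words.some((word) => synonyms.includes(word))) {
        found[entity] = keyword;
        break; // first matching keyword wins for this entity
      }
    }
  }
  return found;
}

// "lock" resolves to the canonical keyword "locked".
console.log(extractEntities('Is the front door lock?'));
```

Matching on whole words rather than substrings matters here, because "unlocked" contains "locked" and would otherwise resolve to the wrong status.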

So let’s try to talk with the bot and see what happens. Keep in mind that this example is strictly to prove out the conversation; testing a real-life scenario would require at least a call to an actual smart door lock.

7.png

What if a request is not understood because a synonym mapping is missing from the keywords? It is a good idea to always have a way to jump back to listening for a command after failing to understand one. You certainly hope that every request is handled correctly, but there may be times when that is not the case, and you don’t want the user met with silence or an incorrect response.
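One way to picture that fallback is a reply function that always produces something to say, even when nothing was understood. This is a sketch under stated assumptions: `understand` is a toy parser standing in for the language service, and `reply`, `lockStates`, and the wording of the responses are hypothetical.

```javascript
// Stand-in data instead of a real smart lock call.
const lockStates = { front: 'locked', garage: 'unlocked' };

// Toy parser standing in for the language service.
function understand(utterance) {
  const text = utterance.toLowerCase();
  const doorType = ['front', 'garage'].find((door) => text.includes(door));
  return doorType ? { door_type: doorType } : {};
}

// Always returns something to say, so the user is never met with silence.
function reply(utterance) {
  const { door_type: doorType } = understand(utterance);
  if (!doorType) {
    return {
      understood: false,
      speech: "Sorry, I didn't catch that. Which door do you mean, front or garage?",
    };
  }
  return {
    understood: true,
    speech: `The ${doorType} door is ${lockStates[doorType]}.`,
  };
}

// The first request falls through to the fallback, the second is answered.
for (const utterance of ['Is the porch secure?', 'Is the garage door locked?']) {
  console.log(reply(utterance).speech);
}
```

Phrasing the fallback as a question that names the valid options nudges the user toward wording the bot already understands.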

8.png

What to listen out for

• Background noise can make it hard for the microphone to pick up commands.
• Different ways of saying the same thing that are not covered by the synonym keyword mapping.
• Responses that would produce a large list, where a browser-based experience may serve better. For example, if I owned a warehouse with 40 doors and asked, “What doors are locked?”, I wouldn’t want the voice response to list off every door and its current status. I would much rather have it open a dashboard showing the physical layout of the building with each door labeled locked or unlocked.
• Words that sound the same (their, they’re, and there; read vs. red).
• Accents.
• Region-specific dialect, for example pop vs. soda vs. coke.
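The warehouse example above boils down to choosing a response mode based on how much there is to say. A small sketch of that decision, where the cut-off of three spoken items, the `chooseResponse` function, and the `mode` values are arbitrary assumptions rather than anything prescribed by a particular platform:

```javascript
// Assumed cut-off: beyond this many items, a spoken list gets tedious.
const MAX_SPOKEN_ITEMS = 3;

// Decides whether to read the answer aloud or hand off to a visual dashboard.
function chooseResponse(lockedDoors) {
  if (lockedDoors.length === 0) {
    return { mode: 'voice', speech: 'No doors are locked.' };
  }
  if (lockedDoors.length <= MAX_SPOKEN_ITEMS) {
    return { mode: 'voice', speech: `Locked doors: ${lockedDoors.join(', ')}.` };
  }
  return {
    mode: 'dashboard',
    speech: `${lockedDoors.length} doors are locked. Opening the dashboard.`,
  };
}

// A two-door home gets a spoken answer; a 40-door warehouse gets the dashboard.
const warehouseDoors = Array.from({ length: 40 }, (_, i) => `door ${i + 1}`);
console.log(chooseResponse(['front', 'garage']).speech);
console.log(chooseResponse(warehouseDoors).speech);
```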

Over and Out

We have looked at how several companies are implementing digital assistants to make our lives easier, how we can build custom applications to interact with smart devices, and some pitfalls to be aware of when developing voice-first user interfaces. I hope to hear more from digital assistants and smart devices in the future.

 

Lawrence Valiquette II

About Author Lawrence Valiquette II

Lawrence Valiquette is a former Software Engineer at Omni. He graduated from Lake Superior State University with a bachelor’s degree in Network Administration and soon after found his calling in application development, where he has spent the past 8 years as a .NET Developer. He is always looking to learn new things and couldn’t think of a better field to be in for that. Outside of work, he enjoys spending time with his wife Danielle and their three daughters: Adilynn, Sophia, and Aria. He also enjoys playing video games, board games, and card games with friends.


