How to Build an Amazon Echo Clone with IBM Watson and Intel Edison

Straight out of Kubrick's 2001: A Space Oddessy, Amazon's popular Echo AI device has brought artifical intelligence right into our living rooms. If you have Alexa in your home, you know how convenient it is to get the weather or hear a random joke on command (she's actually pretty funny). Here's a DIY version that's as fun to make as it is to interact with after it's done.

Overview

In this project we will create an Amazon Echo clone based on the Intel Edison hardware and IBM Watson platform. Note that your "Alexa" may not be as fully capable as Amazon's but it will be a whole lot cheaper and a lot more fun to build.

During the project we will covering the following topics:

  • Capturing audio with a USB microphone.
  • Sending audio to a Bluetooth speaker.
  • Using Johnny-Five to interface with the Edison's IO.
  • Using IBM's Watson Speech-to-Text and Text-to-Speech services.

Getting Started

What you'll need to complete this project:

If you haven't already done so, you'll need to setup your Edison and get the latest firmware flashed. You can follow our quick article on Getting Started with the Intel Edison or check out Intel's Getting Started Guide.

NOTE: I'm using the Intel XDK IoT Edition because it makes debugging and uploading code to the board very easy. To learn more about the IDE and how to get started using it check out Getting Started with the Intel XDK IoT Edition. It is not required for this project though.

Connect Bluetooth Speaker

IMG_20160126_105433996_HDR-139355-edited.jpgEstablish a terminal to your Edison using either of the guides above.

Make your Bluetooth device discoverable. In my case I needed to push the pair button on the back of the speaker.

In the terminal to your board type the following:

root@edison:~# rfkill unblock bluetooth
root@edison:~# bluetoothctl
[bluetooth] scan on

This starts the Bluetooth Manager on the Edison and starts scanning for devices. The results should look something like:

Discovery started
[CHG] Controller 98:4F:EE:06:06:05 Discovering: yes
[NEW] Device A0:E9:DB:08:54:C4 OontZ Angle

Find your device in the list and pair to it.

[bluetooth] pair A0:E9:DB:08:54:C4

In some cases, the device may need to connect as well.

[bluetooth] connect A0:E9:DB:08:54:C4

Exit the Bluetooth Manager.

[bluetooth] quit

Let's verify that your device is recognized in pulse audio:

root@edison:~# pactl list sinks short

If all is good, you should see your device listed as a sink device and the name should start with bluez_sink like the example output below.

0  alsa_output.platform-merr_dpcm_dummy.0.analog-stereo  module-alsa-card.c  s16le 2ch 48000Hz  SUSPENDED
1  alsa_output.0.analog-stereo  module-alsa-card.c  s16le 2ch 44100Hz  SUSPENDED
2  bluez_sink.A0_E9_DB_08_54_C4  module-bluez5-device.c  s16le 2ch 44100Hz  SUSPENDED

Now let's set our Bluetooth device as the default sink for the pulse audio server:

root@edison:~# pactl set-default-sink bluez_sink.A0_E9_DB_08_54_C4

Connect USB Microphone

IMG_20160126_105516118_HDR-362475-edited.jpgThe Edison has two USB modes: host mode and device mode. To use a USB microphone you'll need to switch the Edison into host mode by flipping the microswitch, located between the standard sized USB port and the micro USB port, towards the large USB port. You will also need to power the Edison with an external DC power supply and not through the micro USB.

Then simply plug your microphone in the large USB port.

Let's make sure the Edison recognizes our microphone as an audio source by using the arecord command.

root@edison:~# arecord -l

The output contains all of the hardware capture devices available. Locate your USB Audio device and make note of its card number and device number. In the example output below my mic is device 0 on card 2.

...
card 2: DSP [Plantronics .Audio 655 DSP], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

Let's Get Coding

In less than 200 lines of code (including comments) we'll have a system that will:

  1. Listen for a button press
  2. Play a sound to let the user know it's listening
  3. Capture 5 seconds of audio input
  4. Convert the audio input to text
  5. Perform a command or search on the input text
  6. Convert the text results to speech
  7. Play the speech audio to the user

I've broken the code up into easy to understand blocks. Let's walk through them and explain along the way.

Requires and Globals

Nothing special here. Just require the modules we need and declare some vars to use a little later.

Initialize Watson Services

Another simple block of code but this one requires a little pre-work. IBM Watson Cloud Services requires credentials for each specific service used. Follow the Obtaining credentials for Watson services guide to get credentials for both the Speech-To-Text and the Text-To-Speech services.

Text-to-Speech and Speech-to-Text Magic

First let's take a look at the Text-to-Speech (TTS) function. There are two parts to TTS: 1) Converting the text to audio and 2) Playing the audio.

For the first, we are obviously using the IBM Watson Cloud Services which couldn't make it any easier. All we need to do is pass the text we would like converted and the audio format we would like back into the synthesize method and it returns a Stream.

For the second, we are using GStreamer. More specifically gst-launch. We take the Stream returned from synthesize and pipe it directly into the stdin on the child process of gst-launch-1.0. GStreamer then processes it as a wav file and sends it to the default audio output.

Next let's look at the Speech-to-Text (STT) function. As with the TTS function, there are two main parts.

The first is capturing the audio. To capture the audio we are using arecord. arecord is fairly straightforward with the exception of the -D option. Earlier when we set up the USB microphone, we used arecord -l to confirm the system saw it. That also gave us the card and device numbers associated with the mic. In my case, the mic is device 0 on card 2. Therefor, the -D option is set to hw:2,0 (hardware device, card 2, device 0.) By not providing a file to record the audio to, we are telling arecord to send all data to its stdout.

Now we take the stdout from arecord and pass that into the recognize method on the STT service as the audio source. The arecord process will run forever unless will kill it. So we set a timeout for five seconds then kill the child process.

Once we get the STT result back, we grab the first transcript from the response, trim it and return it.

Play Local Wav File

We have already covered using GStreamer but to play a local the args are a little different.

Setup Johnny-Five

IMG_20160126_105804448_HDR-544907-edited.jpgHere we setup the Edison IO in Johnny-Five and listen for the board to complete initialization. Then attach a button to GPIO pin 4 and an LED to GPIO 6. To do this I used a Grove Base Shield along with the Grove button and LED modules. You could attach a button and LED using a breadboard instead.

Last we add a listener on the button press event which will call the main function that we will look at next.

Main

We now have all the supporting pieces so let's put together the main application flow. When main is run, we first play a chime sound to let the user know we are listening by using the playWav defined earlier. You can download the wav file I used from the projects repo. We then listen for a command, perform the search, and play the results which we will all look at next.

Last we handle any errors that may have happened and get ready to do it all again.

The Bread-and-Butter

The listen function simply turns on the LED to show we are listening then calls stt to capture the command.

The search function uses the Duck Duck Go Instant Answer API to perform the search. Then returns the best answer.

Last we have the speak function that takes the search results and passes that into the tts function.

Wrap Up

Deploy the code to your Edison and run it. Wait a few seconds for the app to initialize then press the button. You'll hear a sound and the LED will light up. Speak your search phrase clearly into the mic then sit back and enjoy your new toy.

IMG_20160126_104245854.jpg

You'll find it's great at handling single words and simple phrases. You can also use it to do simple math problems by starting your phrase with "calculate", like "calculate five plus five."

Below you'll find a list of additional resources used while making this project but not linked to above. I encourage you to take a look at them to learn a little more about the technologies used. You can also find all the code for this project at https://github.com/Losant/example-edison-echo.

Enjoy!

Additional Resources

 

There are 4 Comments

What is Losant?

Losant is an enterprise IoT platform that makes it easy to build secure, complex connected experiences.

Losant’s platform enables you to intelligently manage your devices, collect and visualize data, and ultimately take action on data through highly customizable workflows - all in real-time.

Get Started For Free