6.894 PSET #4: Working with SpeechBuilder

 

As you probably saw in class last Thursday, SpeechBuilder is essentially a server that processes speech once you specify a domain that it can understand. The objective of this problem set is to incorporate speech so that anyone can just talk to your pystatasearch Python program. Instead of having to type in “G842” and “G804”, you are going to build a domain on SpeechBuilder and incorporate speech commands into the program. You should be able to say, “Please take me from G832 to G809,” and SpeechBuilder will parse that command, and the path will be drawn in for you. This will make interacting with the program much easier, since typing room numbers into those little boxes is difficult and sometimes frustrating.

 

The first thing you need to do is go to

http://ozone.csail.mit.edu/SpeechBuilder/SpeechBuilder.cgi

and register your username. Once you do this, you can log in and start editing a domain.

Create a new domain called search. Within this domain, you specify what kinds of inputs to expect while it is running. Take a minute to think about what you would say to the search program. You want it to recognize the different rooms that you could be in (attributes) and certain commands (actions), like “take me from” or “load a new map”, or maybe even “scroll down the map”. You can be creative. Imagine that you are incredibly lazy and want to interact with this program entirely through speech. What commands would you support? Build these phrases into your domain, and make sure to compile it.
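Before you start clicking around in SpeechBuilder, it can help to jot the domain down as plain data. The sketch below is only a planning aid in Python: it is not SpeechBuilder's own domain format (you build the real domain on the website), the names ACTIONS, ROOM_PATTERN and interpret are made up for illustration, and the room pattern assumes rooms look like “G” followed by three digits, as in the examples above.

```python
import re

# Hypothetical planning sketch, not SpeechBuilder's format.
# Each action maps to example phrases taken from the handout.
ACTIONS = {
    "route": ["take me from"],
    "load_map": ["load a new map"],
    "scroll": ["scroll down the map"],
}

# Assumption: a room is "G" followed by exactly three digits (e.g. G832).
ROOM_PATTERN = re.compile(r"\bG\d{3}\b", re.IGNORECASE)


def interpret(utterance):
    """Return (action, rooms); action is None if no known phrase is found."""
    text = utterance.lower()
    rooms = [room.upper() for room in ROOM_PATTERN.findall(utterance)]
    for action, phrases in ACTIONS.items():
        if any(phrase in text for phrase in phrases):
            return action, rooms
    return None, rooms


print(interpret("Please take me from G832 to G809"))
# prints: ('route', ['G832', 'G809'])
```

Writing out a handful of sentences like this and checking which ones fall through to None is a quick way to spot commands you forgot to put in the domain.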

 

Testing it out:

Once you have built the domain and it compiles correctly, the next step is to get galaudio working on your ipaq. First, edit /etc/ipkg.conf and add the following lines to the bottom of the file:

 

“src base http://familiar.handhelds.org/releases/v0.7.2/base/armv4l” and

“src/gz oxygen http://oxy.lcs.mit.edu/feeds/6.893_spring_2004”

 

Since we’re working with GPE, we want to make the same change in /etc/ipkg/gpe.conf.
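If you would rather not edit both files by hand, the change can be sketched as a small Python helper that appends the two feed lines only when they are missing. This assumes a Python 3 interpreter is available wherever you run it and that you have permission to write to the files; the function name add_feeds is made up for illustration, and editing the files in a text editor works just as well.

```python
from pathlib import Path

# The two feed lines given in the handout.
FEED_LINES = [
    "src base http://familiar.handhelds.org/releases/v0.7.2/base/armv4l",
    "src/gz oxygen http://oxy.lcs.mit.edu/feeds/6.893_spring_2004",
]


def add_feeds(conf_path, feed_lines=FEED_LINES):
    """Append feed lines that are not already present; return the lines added."""
    path = Path(conf_path)
    text = path.read_text() if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [line for line in feed_lines if line not in present]
    if missing:
        with path.open("a") as conf:
            # Start on a fresh line if the file does not end with a newline.
            if text and not text.endswith("\n"):
                conf.write("\n")
            conf.write("\n".join(missing) + "\n")
    return missing


# Example use on the ipaq (commented out so nothing is modified by accident):
# for conf_file in ("/etc/ipkg.conf", "/etc/ipkg/gpe.conf"):
#     add_feeds(conf_file)
```

Because the helper skips lines that are already there, running it twice does not duplicate the feeds.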

Then run:

 

“ipkg update”

 

This command updates the list of packages available for download onto the ipaq. Once this is done, you can begin downloading new ipkgs automatically. The first thing we need to install is galaudio:

 

“ipkg install mit-galaudio”

 

This package is located in the oxygen feed you added to ipkg.conf above, and it should begin downloading automatically. Once it has installed successfully, run:

“ipkg install sound-modules”

 

This enables sound on the ipaq. Once sound is working, galaudio should work as well.

 

The next step is to test the system to see whether galaudio is in fact working correctly. You can do this by tying the SpeechBuilder server to an echo script, which just repeats whatever you say. Go back to the SpeechBuilder website and click on the drop-down list “Edit backend URL script”. Change whatever is there to:

 

http://ozone.csail.mit.edu/cgi-bin/echo.cgi

 

and hit Apply, then Compile. Make sure both steps succeed. Once you start galaudio, your speech will be parsed by SpeechBuilder, and whatever it parses will be sent to the echo script above, which simply repeats what you said. To check that this is working, ssh into ozone.csail.mit.edu.


Once you’re sshed into ozone, go to /usr/sls/Galaxy/SpeechBuilder/users/USERNAME/DOMAIN.search

 

That’s if you named your new domain search. If you named it something different, say different, go to DOMAIN.different instead. Now look at the file called menu. It is just one long string of various things. Look at the very last part of the string; it should be a 5-digit number. Write this number down, as this is the port that you want to talk to. Once you have this number, you can run the command:
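If you don't feel like squinting at the end of a long string, pulling the port out can be sketched in a few lines of Python. The only thing assumed about the menu file is what is stated above: the string ends in a 5-digit number. The rest of the file's layout is unknown, the function name port_from_menu is made up, and the sample string in the example is invented.

```python
import re


def port_from_menu(menu_text):
    """Return the 5-digit number at the very end of the menu string."""
    # (?<!\d) makes sure we match exactly five digits, not the tail of a longer number.
    match = re.search(r"(?<!\d)(\d{5})\s*$", menu_text)
    if match is None:
        raise ValueError("menu text does not end in a 5-digit number")
    return int(match.group(1))


# Invented sample content; your real menu file will look different.
print(port_from_menu("some other things 12345\n"))
# prints: 12345

# On ozone, from inside your domain directory:
# with open("menu") as menu_file:
#     print(port_from_menu(menu_file.read()))
```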

 

“./oxclass.cmd yes yes”

 

The first yes tells the script to pop up windows showing the output from the servers that will run, which is useful for debugging, and the second yes tells the domain to use Festival, an open source speech synthesizer.

 

Once this is running, you can run over to your ipaq and start up galaudio:

 

“galaudio /dev/sound/dsp ozone.csail.mit.edu username portnumber push none 16000 8000”

 

/dev/sound/dsp is the Linux audio input/output device that galaudio uses to talk to the microphone and the speaker. ozone.csail.mit.edu is the location of the speech server, and username identifies the domain. portnumber is the number you wrote down from menu. push tells the audio app to record while you hold down the record button, which is located on the left-hand side of the ipaq, near the top. The last few parameters tell galaudio the rates at which to sample and play back speech.
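Since the order of the arguments matters and it is easy to mistype, here is a small Python sketch that assembles the galaudio command line in the same order as the one above. The function name galaudio_command and the placeholder values “alice” and 12345 are made up; substitute your own username and the port from menu. The “none” argument and the two trailing numbers are passed through exactly as given in the handout.

```python
import shlex


def galaudio_command(username, port, host="ozone.csail.mit.edu",
                     device="/dev/sound/dsp", mode="push",
                     extra="none", rates=(16000, 8000)):
    """Build the galaudio argument list in the order used in the handout."""
    args = ["galaudio", device, host, username, str(port), mode, extra]
    args.extend(str(rate) for rate in rates)
    return args


# Placeholder username and port, for illustration only.
command = galaudio_command("alice", 12345)
print(shlex.join(command))
# prints: galaudio /dev/sound/dsp ozone.csail.mit.edu alice 12345 push none 16000 8000
```

The printed line can be pasted into a shell on the ipaq, or the list can be handed to subprocess.run if Python is available there.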

 

Once you have gotten this far, you should be able to talk to your domain. Executing the line above should connect galaudio to the server that started when you ran the oxclass command.
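If galaudio does not seem to connect, one thing worth ruling out is that nothing is listening on the port you wrote down. The Python sketch below is a rough check only: it assumes the server accepts ordinary TCP connections on that port, which the handout does not state, and it opens and immediately closes a bare connection without speaking any protocol. The function name server_is_listening is made up.

```python
import socket


def server_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


# Example with a placeholder port; use the number from your menu file:
# print(server_is_listening("ozone.csail.mit.edu", 12345))
```

A result of False usually means the oxclass command is not running or the port number is wrong; a result of True only tells you that something is listening, not that it is your domain.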

 

Look at your screen. A new GUI should pop up and say “Welcome to echo-script. I will repeat whatever you say.” At this point, hold down the record button and say something you specified in your search domain, like “Take me from G802 to G804”. Since this is something you specified, the echo script should be able to recognize it, and if it recognizes everything successfully, it will repeat back “I think you said… take me from G802 to G804.” If it didn’t recognize what you said, it will say so. If it is having a hard time understanding you, just talk louder and slower, and it should work. Once you get this part to work, congratulations! You are now one big step closer to making your program work with speech commands.