Speech Kit Library Guide
The Speech Kit library provides the classes necessary to perform network-based speech recognition and text-to-speech synthesis. This library provides a simple, high-level speech service API that automatically performs all the tasks necessary for speech recognition or synthesis, including audio recording, audio playback, and network connection management.
Organization of This Document
The following sections describe how to connect to a speech server and perform speech recognition or synthesis:
- “Speech Kit Basics” provides an overview of the Speech Kit library.
- “Connecting to a Speech Server” details the top-level server connection process.
- “Recognizing Speech” describes how to use a network recognizer to transcribe speech.
- “Converting Text to Speech” shows how to use the network-based vocalizer to convert text to speech.
Speech Kit Basics
The Speech Kit library allows you to add voice recognition and text-to-speech services to your applications easily and quickly. This library provides access to speech processing components hosted on a server through a clean asynchronous network service API, minimizing overhead and resource consumption. The Speech Kit library lets you provide fast voice search, dictation, and high-quality, multilingual text-to-speech functionality in your application.
Speech Kit Architecture
The Speech Kit library is a full-featured, high-level library that automatically manages all the required low-level services.
At the application level, there are two main components available to the developer: the recognizer and the text-to-speech synthesizer.
Internally, the library coordinates several processes:
- The library fully manages the audio system for recording and playback.
- The networking component manages the connection to the server and, at the start of a new request, automatically re-establishes connections that have timed out.
- The end-of-speech detector determines when the user has stopped speaking and automatically stops recording.
- The encoding component compresses and decompresses the streaming audio to reduce bandwidth requirements and decrease latency.
The server system is responsible for the majority of the work in the speech processing cycle. The complete recognition or synthesis procedure is performed on the server, consuming or producing the streaming audio. In addition, the server manages authentication as configured through the developer portal.
Using Speech Kit
To use Speech Kit, you will need to have the Android SDK installed. Instructions for installing the Android SDK can be found at http://developer.android.com/sdk/index.html. You can use the Speech Kit library in the same way that you would use any standard JAR library.
To start using the Speech Kit library, add it to your new or existing project, as follows:
- Copy the libs folder into the root of the project folder for your Android project. The libs folder contains an armeabi subfolder that contains the file libnmsp_speex.so.
- From the menu select Project ‣ Properties....
- In the popup menu, select Java Build Path from the menu at the left.
- In the right panel of the popup menu, select the Libraries tab.
- Use the Add External JARs button to add nmdp_speech_kit.jar.
Enabling Javadoc for the Speech Kit Library in Eclipse
To view the Javadoc for Speech Kit in Eclipse, you must tell Eclipse where to find the class documentation. This can be done with the following steps:
- In the Package Explorer tab for your project, expand Referenced Libraries.
- Right-click nmdp_speech_kit.jar and select Properties.
- In the popup menu, select Javadoc Location from the menu at the left.
- In the right panel of the popup menu, select the Javadoc URL option.
- Click the Browse button to the right of the Javadoc location path field.
- Browse to and select the Speech Kit Javadoc.
You also need to add the necessary permissions to AndroidManifest.xml:
- In the Package Explorer tab for your project, open AndroidManifest.xml
- Add the following lines immediately before the end of the manifest tag:
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"></uses-permission>
<uses-permission android:name="android.permission.INTERNET"></uses-permission>
<uses-permission android:name="android.permission.RECORD_AUDIO"></uses-permission>
<uses-permission android:name="android.permission.READ_PHONE_STATE"></uses-permission>
...
</manifest>
- If you want to use prompts that vibrate, you will need to include the following additional permission:
<uses-permission android:name="android.permission.VIBRATE"></uses-permission>
You are now ready to start using recognition and text-to-speech services.
Speech Kit Errors
While using the Speech Kit library, you will occasionally encounter errors. In this library, errors are reported as instances of the SpeechError class, which carries an error code from the SpeechError.Codes enumeration.
There are effectively two types of errors that can be expected in this framework.
- The first type is service connection errors, which include the SpeechError.Codes.ServerConnectionError and SpeechError.Codes.ServerRetryError codes.
- The second type is speech processing errors, which include the SpeechError.Codes.RecognizerError and SpeechError.Codes.VocalizerError codes.
It is essential to always monitor for errors, as signal conditions may generate errors even in a correctly implemented application. The application’s user interface needs to respond appropriately and gracefully to ensure a robust user experience.
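For example, a minimal error handler might branch on these two families of codes. This is only a sketch: it assumes SpeechError exposes its code through a getErrorCode() accessor and that the SpeechError.Codes values are integer constants; check the class documentation for the actual accessors.

// Sketch of error triage; getErrorCode() is an assumed accessor.
void handleSpeechError(SpeechError error) {
    switch (error.getErrorCode()) {
        case SpeechError.Codes.ServerConnectionError:
        case SpeechError.Codes.ServerRetryError:
            // Connection problem: ask the user to check connectivity and retry.
            break;
        case SpeechError.Codes.RecognizerError:
        case SpeechError.Codes.VocalizerError:
            // Speech processing problem: ask the user to try speaking again.
            break;
        default:
            // Anything else: present a generic failure message.
            break;
    }
}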
Connecting to a Speech Server
The Speech Kit library is a network service and requires some basic setup before you can use either the recognition or text-to-speech classes.
This setup performs two primary operations:
- First, it identifies and authorizes your application.
- Second, it optionally establishes a connection to the speech server immediately, allowing for fast initial speech requests and thus enhancing the user experience.
Note
This network connection requires authorization credentials and server details set by the developer. The necessary credentials are provided through the Dragon Mobile SDK portal at http://dragonmobile.nuancemobiledeveloper.com.
Speech Kit Setup
The application key, SpeechKitApplicationKey, is the credential that authorizes your application with the speech server.
Your unique credentials, provided through the developer portal, include the necessary line of code to set this value. Thus, this process is as simple as copying and pasting the line into your source file. You must set this key before you initialize the Speech Kit system. For example, you configure the application key as follows:
static final byte[] SpeechKitApplicationKey = {
    (byte)0x12, (byte)0x34, ..., (byte)0x89
};
The setup method, SpeechKit.initialize(), takes the following parameters:
- An application Context (android.content.Context)
- An application identifier
- A server address
- A port
- The SSL setting
- The application key defined above.
The appContext parameter is your application’s context, which can be retrieved as follows:
Context context = getApplication().getApplicationContext();
The ID parameter is the application identifier provided with your credentials through the developer portal.
The host and port parameters specify the speech server address, also provided through the developer portal.
The ssl parameter indicates whether the connection to the server uses SSL.
The applicationKey parameter is the application key defined above.
The library is configured in the following example:
SpeechKit sk = SpeechKit.initialize(context, speechKitAppId, speechKitServer,
        speechKitPort, speechKitSsl, speechKitApplicationKey);
Note
This method is meant to be called once per application execution to configure the underlying network connection. This method does not attempt to establish the connection to the server.
At this point the speech server is fully configured. The connection to the server will be established automatically when needed. To make sure the next recognition or vocalization is as fast as possible, connect to the server in advance using the optional connect method:
sk.connect();
Note
This method does not indicate failure. Instead, the success or failure of the setup is known when the Recognizer and Vocalizer classes are used.
When the connection is opened, it will remain open for some period of time, ensuring that subsequent speech requests are served quickly as long as the user is actively making use of speech. If the connection times out and closes, it will be re-opened automatically on the next speech request or call to connect().
The application is now configured and ready to recognize and synthesize speech.
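As a recap, a minimal end-to-end setup is sketched below. The identifier, server, and port values are placeholders, not working credentials; substitute the values issued with your account through the developer portal.

// Placeholder configuration values; replace with your portal credentials.
static final String speechKitAppId = "your_app_id";          // hypothetical
static final String speechKitServer = "speech.example.com";  // hypothetical
static final int speechKitPort = 443;                        // hypothetical
static final boolean speechKitSsl = false;

// Configure the library once per application execution.
Context context = getApplication().getApplicationContext();
SpeechKit sk = SpeechKit.initialize(context, speechKitAppId, speechKitServer,
        speechKitPort, speechKitSsl, SpeechKitApplicationKey);
// Optionally open the connection early so the first request is fast.
sk.connect();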
Recognizing Speech
The recognizer allows users to speak instead of type in locations where text entry would generally be required. The speech recognizer returns a list of text results. It is not attached to any UI object in any way, so the presentation of the best result and the selection of alternative results are left up to the application’s UI.
Speech Recognition Process
Initiating a Recognition
- Before you use speech recognition, ensure that you have set up the core Speech Kit library with the SpeechKit.initialize method.
- Then create and initialize a Recognizer:
recognizer = sk.createRecognizer(Recognizer.RecognizerType.Dictation,
        Recognizer.EndOfSpeechDetection.Short, "en_US", this, handler);
- The SpeechKit.createRecognizer method creates the recognizer and takes the parameters described below.
- The type parameter is a String, generally one of the recognition type constants defined in the Speech Kit library and available in the class documentation for Recognizer. Nuance may provide you with a different value for your unique recognition needs, in which case you will enter the raw String value.
- The detection parameter determines the end-of-speech detection model and must be one of the Recognizer.EndOfSpeechDetection types.
- The language parameter is a String that defines the spoken language in the format of the ISO 639 language code, followed by an underscore “_”, followed by the ISO 3166-1 country code.
Note
For example, the English language as spoken in the United States is en_US. An up-to-date list of supported languages for recognition is available on the FAQ at http://dragonmobile.nuancemobiledeveloper.com/faq.php.
- The this parameter defines the object to receive status, error, and result messages from the recognizer. It can be replaced with any object that implements the Recognizer.Listener interface.
- handler should be an android.os.Handler instance used to deliver the listener messages; it can be created as follows:
Handler handler = new Handler();
- Start the recognition by calling start(). A complete sketch follows this list.
- The Recognizer.Listener passed into SpeechKit.createRecognizer receives the recognition results or error messages, as described in the following sections.
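Putting these steps together, a minimal sketch of starting a dictation from an Activity that implements Recognizer.Listener might look like the following. The SDK package name in the import is an assumption; the listener bodies are filled in over the next sections.

import android.app.Activity;
import android.os.Handler;
import com.nuance.nmdp.speechkit.*; // package name assumed for the Dragon Mobile SDK

public class DictationActivity extends Activity implements Recognizer.Listener {
    private SpeechKit sk; // initialized as described in “Connecting to a Speech Server”
    private Recognizer recognizer;
    private Handler handler;

    private void startDictation() {
        handler = new Handler(); // delivers listener messages on this thread
        recognizer = sk.createRecognizer(Recognizer.RecognizerType.Dictation,
                Recognizer.EndOfSpeechDetection.Short, "en_US", this, handler);
        recognizer.start(); // begins recording and streaming audio to the server
    }

    // Recognizer.Listener callbacks; see the following sections for real bodies.
    public void onRecordingBegin(Recognizer recognizer) { }
    public void onRecordingDone(Recognizer recognizer) { }
    public void onResults(Recognizer recognizer, Recognition results) { }
    public void onError(Recognizer recognizer, SpeechError error) { }
}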
Using Prompts
Prompts are short audio clips or vibrations that are played during a recognition. Prompts may be played at the following stages of the recognition:
- Recording start: the prompt is played before recording. The moment the prompt completes, recording will begin.
- Recording stop: the prompt is played when the recorder is stopped.
- Result: the prompt is played if a successful result is received.
- Error: the prompt is played if an error occurs.
The SpeechKit.defineAudioPrompt method defines an audio prompt from a raw resource ID packaged with the Android application. Audio prompts may consume significant system resources until release is called, so try to minimize the number of instances. The Prompt.vibrate method defines a vibration prompt.
Call SpeechKit.setDefaultRecognizerPrompts to specify audio or vibration prompts to play during all recognitions by default. To override the default prompts in a specific recognition, call setPrompt on the recognizer prior to calling start(). A sketch follows.
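The following sketch is illustrative only: the raw resource IDs are hypothetical, and the signatures of defineAudioPrompt, Prompt.vibrate, and setDefaultRecognizerPrompts (assumed here to take one prompt per stage, in the order listed above) must be checked against the class documentation.

// Hypothetical resources res/raw/start_tone.wav and res/raw/stop_tone.wav.
Prompt startPrompt = sk.defineAudioPrompt(R.raw.start_tone);
Prompt stopPrompt = sk.defineAudioPrompt(R.raw.stop_tone);
Prompt errorBuzz = Prompt.vibrate(300); // assumed: vibration length in milliseconds

// Assumed parameter order: recording start, recording stop, result, error.
sk.setDefaultRecognizerPrompts(startPrompt, stopPrompt, null, errorBuzz);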
Receiving Recognition Results
To retrieve the recognition results, implement the Recognizer.Listener.onResults method:
public void onResults(Recognizer recognizer, Recognition results) {
    String topResult;
    if (results.getResultCount() > 0) {
        topResult = results.getResult(0).getText();
        // do something with topResult...
    }
}
This method will be called only on successful completion, and the results list will have zero or more results.
Even in the absence of an error, there may be a suggestion from the speech server present in the recognition results object. This suggestion should be presented to the user.
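For instance, a minimal check using the getSuggestion accessor mentioned below might look like this; treating an empty string as “no suggestion” is an assumption.

String suggestion = results.getSuggestion();
if (suggestion != null && suggestion.length() > 0) {
    // Present the server’s suggestion to the user, e.g. in a dialog.
}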
Handling Errors
To be informed of any recognition errors, implement the onError method of the Recognizer.Listener interface. In the case of errors, only this method will be called; conversely, on success this method will not be called. In addition to the error, a suggestion, as described in the previous section, may or may not be present. Note that both the Recognition and the SpeechError classes have a getSuggestion method.
public void onError(Recognizer recognizer, SpeechError error) {
    // Inform the user of the error and suggestion
}
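A slightly fuller handler might surface both pieces of information; getErrorDetail is an assumed accessor for a readable message, so verify it against the SpeechError class documentation.

public void onError(Recognizer recognizer, SpeechError error) {
    String detail = error.getErrorDetail(); // assumed accessor
    String suggestion = error.getSuggestion();
    // Show the detail, plus the suggestion when one is present,
    // then return the UI to its idle state.
}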
Managing Recording State Changes
Optionally, to be informed when the recognizer starts or stops recording audio, implement the onRecordingBegin and onRecordingDone methods of the Recognizer.Listener interface. There may be a delay between initialization of the recognizer and the actual start of recording, so the onRecordingBegin message indicates when the system actually begins listening:
public void onRecordingBegin(Recognizer recognizer) {
    // Update the UI to indicate the system is now recording
}
The onRecordingDone message is sent when recording stops:
public void onRecordingDone(Recognizer recognizer) {
    // Update the UI to indicate that recording has stopped and the speech is still being processed
}
This message is sent both with and without end-of-speech detection models in place; it is sent regardless of whether recording was stopped by a call to the stopRecording method or by the end-of-speech detector.
Power Level Feedback
In some scenarios, especially for longer dictations, it is useful to provide a user with visual feedback of the volume of their speech. The Recognizer interface supports this feature with the getAudioLevel method, which returns the relative power level of the recorded audio in decibels. This value is a float ranging from -90.0 to 0.0 dB, where 0.0 is the highest power level and -90.0 is the lowest. This method should be called during recording, specifically between receiving the onRecordingBegin and onRecordingDone messages.
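One plausible way to drive a level meter is to poll getAudioLevel from a Runnable that re-posts itself on the handler while recording is active. The 50 ms interval, the recording flag, and the updateLevelMeter helper are illustrative choices, not part of the API.

private boolean recording; // set in onRecordingBegin, cleared in onRecordingDone

private final Runnable levelPoller = new Runnable() {
    public void run() {
        if (recording) {
            float db = recognizer.getAudioLevel(); // 0.0 (loudest) to -90.0 dB
            updateLevelMeter(db);                  // hypothetical UI helper
            handler.postDelayed(this, 50);         // poll again in 50 ms
        }
    }
};

Start the poller from onRecordingBegin (for example, with handler.post(levelPoller)) and let it stop itself once onRecordingDone clears the flag.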
Converting Text to Speech
The Vocalizer class provides the text-to-speech interface: text is sent to the speech server and the synthesized audio is streamed back for playback.
Text-to-Speech Process
Initiating Text-To-Speech
- Before you use speech synthesis, ensure that you have set up the core Speech Kit library with the SpeechKit.initialize method.
- Then create and initialize a Vocalizer:
Vocalizer voc = sk.createVocalizerWithLanguage("en_US", this, handler);
- The SpeechKit.createVocalizerWithLanguage method creates a speech synthesizer and takes the following parameters:
- The language parameter is a String that defines the spoken language in the format of the ISO 639 language code, followed by an underscore “_”, followed by the ISO 3166-1 country code. For example, the English language as spoken in the United States is en_US.
Note
An up-to-date list of supported languages for text-to-speech is available at http://dragonmobile.nuancemobiledeveloper.com/faq.php. The list of supported languages will be updated when new language support is added; the new languages will not necessarily require updating an existing Dragon Mobile SDK.
- The this parameter defines the object to receive status and error messages from the speech synthesizer. It can be replaced with any object that implements the Vocalizer.Listener interface.
- handler should be an android.os.Handler instance used to deliver the listener messages; it can be created as follows:
Handler handler = new Handler();
- The SpeechKit.createVocalizerWithLanguage method uses a default voice chosen by Nuance. To select a different voice, use the createVocalizerWithVoice method instead, as sketched after the note below.
- The voice parameter is a String that defines the voice model. For example, the female US English voice is Samantha.
Note
The up-to-date list of supported voices is provided with the supported languages at http://dragonmobile.nuancemobiledeveloper.com/faq.php.
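A sketch of selecting a specific voice is shown here; the parameter order (language, voice, listener, handler) is an assumption to verify against the class documentation.

Vocalizer voc = sk.createVocalizerWithVoice("en_US", "Samantha", this, handler);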
- To begin converting text to speech, use either the speakString or speakMarkupString method:
voc.speakString("Hello world.", context);
Note
The speakMarkupString method is used in exactly the same manner as speakString, except that it takes a String filled with SSML, a markup language tailored to describing synthesized speech. An advanced discussion of SSML is beyond the scope of this document; however, you can find more information from the W3C at http://www.w3.org/TR/speech-synthesis/.
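For illustration, a call with a small SSML document might look like the following; whether the server accepts every SSML element is not covered here.

// Minimal SSML: a greeting with a 300 ms pause in the middle.
String ssml = "<?xml version=\"1.0\"?>"
        + "<speak version=\"1.0\" xml:lang=\"en-US\">"
        + "Hello <break time=\"300ms\"/> world."
        + "</speak>";
voc.speakMarkupString(ssml, context);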
As speech synthesis is a network-based service, these methods are all asynchronous, and in general an error condition is not immediately reported. Any errors are reported as messages to the Vocalizer.Listener that was passed to createVocalizerWithLanguage or createVocalizerWithVoice.
The speakString and speakMarkupString methods may be called multiple times for a single Vocalizer instance. To change the language or voice without creating a new Vocalizer, call setLanguage or setVoice on the existing instance, as sketched below.
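For example, reusing one instance across languages might look like this; the assumption is that setLanguage takes the same language-code format as createVocalizerWithLanguage.

voc.speakString("Hello world.", context);    // spoken with the current en_US settings
voc.setLanguage("fr_FR");                    // switch the same instance to French
voc.speakString("Bonjour tout le monde.", context);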
Managing Text-To-Speech Feedback
The synthesized speech will not immediately start playback. Rather, there will be a brief delay as the request is sent to the speech server and speech is streamed back. For UI coordination, to indicate when audio playback begins, implement the optional Vocalizer.Listener.onSpeakingBegin method:
public void onSpeakingBegin(Vocalizer vocalizer, String text, Object context) {
    // update UI to indicate that text is being spoken
}
The context in the message is a reference to the context that was passed to the speakString or speakMarkupString method; it can be used to match the message to a particular request.
On completion of the speech playback, the Vocalizer.Listener.onSpeakingDone message is sent. This message is always sent, on both successful completion and on error. In the success case, error is null:
public void onSpeakingDone(Vocalizer vocalizer, String text, SpeechError error, Object context) {
    if (error != null) {
        // Present error dialog to user
    } else {
        // Update UI to indicate speech is complete
    }
}