
The multi-decoder is not a single program but a collection of programs that pass audio streams into several decoding engines simultaneously. The Audio Server captures the audio streams and requests decodings of them from the decoding engines. The multi-decoder has two kinds of decoding engines: the DTMF Engine and the Sphinx Engine. The DTMF Engine uses the DTMF Decoder to detect and decipher DTMF signals in the audio stream. The Sphinx Engine uses Sphinx2 to decode speech from the audio signal. In a typical setup, in addition to the DTMF Engine, the Audio Server also requests decodings from two running copies of the Sphinx Engine, one configured for male speech and the other for female speech.


Audio Server

The Audio Server captures audio signals from an audio source, e.g. a microphone or a Gentner device (see the Gentner component). It then sends the processed signal segments to one or more decoding engines. The decoding engines act as decoding servers for the Audio Server and return their decoded hypotheses for the speech signal.
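The fan-out described above can be illustrated with a minimal Python sketch. It is not the Audio Server's actual code: the engine names, the `Engine` callable type, and `request_decodings` are hypothetical, and the real engines are separate processes reached over ports rather than in-process functions. The sketch only shows the idea of sending one segment to every engine and collecting one hypothesis per engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# Hypothetical engine interface (an assumption for illustration): an engine
# takes one audio segment as raw bytes and returns a hypothesis string, or
# None if it found nothing to decode.
Engine = Callable[[bytes], Optional[str]]


def request_decodings(segment: bytes,
                      engines: Dict[str, Engine]) -> Dict[str, Optional[str]]:
    """Send one audio segment to every engine concurrently and collect
    the hypothesis returned by each, keyed by engine name."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(engine, segment)
                   for name, engine in engines.items()}
        return {name: future.result() for name, future in futures.items()}


# Stand-in engines so the sketch runs without any audio hardware or decoder.
def fake_dtmf_engine(segment: bytes) -> Optional[str]:
    return None  # pretend no DTMF tone was present


def fake_male_engine(segment: bytes) -> Optional[str]:
    return "hello olympus"


def fake_female_engine(segment: bytes) -> Optional[str]:
    return "hello olympics"


if __name__ == "__main__":
    hypotheses = request_decodings(b"\x00" * 320, {
        "dtmf": fake_dtmf_engine,
        "sphinx_male": fake_male_engine,
        "sphinx_female": fake_female_engine,
    })
    for name, hyp in hypotheses.items():
        print(name, "->", hyp)
```

Running the engines concurrently mirrors the typical setup, where the DTMF Engine and both copies of the Sphinx Engine decode the same segment and each returns its own hypothesis.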

The Audio Server is a Galaxy Communicator server.

Because it is the interface between the audio device and the Olympus architecture, the Audio Server does not take its input in the form of a Galaxy frame (it only generates output frames). It must, however, handle a number of frames that control its state; these are emitted by the Session Manager or the Interaction Manager.

See Configuring the Audio Server.

Input Frames

Output Frames

DTMF Engine

The DTMF Engine receives audio segments on a port from an Audio Server and uses the DTMF Decoder to detect and decode DTMF signals.
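This page does not say how the DTMF Decoder works internally. As an illustration of what detecting and decoding a DTMF signal involves, the sketch below uses the Goertzel algorithm, a common technique for measuring signal power at a few known frequencies. The function names, the 8000 Hz sample rate, and the `min_power` threshold are assumptions made for the example. Only the eight DTMF frequencies and the keypad layout are standard.

```python
import math
from typing import List, Optional, Sequence

# Standard DTMF frequencies in Hz: one row tone plus one column tone per key.
ROW_FREQS = (697.0, 770.0, 852.0, 941.0)
COL_FREQS = (1209.0, 1336.0, 1477.0, 1633.0)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples: Sequence[float], sample_rate: float,
                   freq: float) -> float:
    """Return the signal power near `freq` using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_dtmf_key(samples: Sequence[float], sample_rate: float = 8000.0,
                    min_power: float = 1.0) -> Optional[str]:
    """Pick the strongest row and column tone; return the key, or None if
    either tone is weaker than `min_power` (an arbitrary threshold)."""
    row_powers = [goertzel_power(samples, sample_rate, f) for f in ROW_FREQS]
    col_powers = [goertzel_power(samples, sample_rate, f) for f in COL_FREQS]
    row = max(range(4), key=row_powers.__getitem__)
    col = max(range(4), key=col_powers.__getitem__)
    if row_powers[row] < min_power or col_powers[col] < min_power:
        return None
    return KEYPAD[row][col]


def synth_tone(key: str, sample_rate: float = 8000.0,
               duration: float = 0.05) -> List[float]:
    """Synthesize the two-tone signal for `key` (helper for trying the sketch)."""
    row = next(i for i, keys in enumerate(KEYPAD) if key in keys)
    col = KEYPAD[row].index(key)
    n = int(sample_rate * duration)
    return [0.5 * math.sin(2.0 * math.pi * ROW_FREQS[row] * t / sample_rate)
            + 0.5 * math.sin(2.0 * math.pi * COL_FREQS[col] * t / sample_rate)
            for t in range(n)]


if __name__ == "__main__":
    print(detect_dtmf_key(synth_tone("5")))
```

A production decoder would also check tone duration, the power ratio between the row and column tones, and the energy at non-DTMF frequencies to reject speech. The sketch omits these checks for brevity.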

Sphinx Engine

The Sphinx Engine (also known as the PocketSphinxEngine) receives audio segments on a port from an Audio Server and uses PocketSphinx to decode speech. The PocketSphinx libraries are compiled into the Sphinx Engine.

This engine is also responsible for computing some of the features used by Helios, including the per-word language model backoff score. This scoring is based on Rong Zhang and Alexander Rudnicky's 2001 Eurospeech paper (see Publications#Helios).
