The multi-decoder is not a single program but a collection of programs that pass audio streams into several decoding engines simultaneously. The audio streams are captured by the Audio Server, which requests decodings of the streams from the decoding engines. There are two kinds of decoding engines in the multi-decoder: the DTMF Engine and the Sphinx Engine. The DTMF Engine uses the DTMF Decoder to detect and decipher DTMF signals in the audio stream; the Sphinx Engine uses Sphinx2 to decode speech from the audio signal. In a typical setup, in addition to the DTMF Engine, the Audio Server requests decodings from two running copies of the Sphinx Engine, one configured for male speech and the other configured for female speech.
The Audio Server captures audio signals from an audio source, e.g. a microphone or a Gentner device (see the Gentner component), and sends the captured signal segments to one or more decoding engines. The decoding engines act as decoding servers for the Audio Server, returning decoding hypotheses for the speech signal.
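The fan-out described above can be sketched as follows. This is an illustrative model only, not the actual Olympus API: all class and method names are invented, and the engines are stubs standing in for the real DTMF Decoder and Sphinx decoders.

```python
# Hypothetical sketch of the Audio Server's fan-out: each captured audio
# segment is sent to every registered decoding engine, and each engine
# returns its own hypothesis. Names are illustrative, not the real API.

class DecodingEngine:
    """Interface that each decoding engine implements."""
    def __init__(self, name):
        self.name = name

    def decode(self, segment):
        raise NotImplementedError

class DTMFEngine(DecodingEngine):
    def decode(self, segment):
        # Stand-in for the DTMF Decoder: collect any tone markers,
        # represented here as string elements mixed into the samples.
        tones = [s for s in segment if isinstance(s, str)]
        return ("dtmf", "".join(tones))

class SphinxEngine(DecodingEngine):
    def __init__(self, name, model):
        super().__init__(name)
        self.model = model  # e.g. "male" or "female" acoustic model

    def decode(self, segment):
        # Stand-in for a real speech decode of the raw samples.
        return ("speech", f"<hyp from {self.model} model>")

class AudioServer:
    def __init__(self, engines):
        self.engines = engines

    def process_segment(self, segment):
        # Request a decoding of the same segment from every engine.
        return {e.name: e.decode(segment) for e in self.engines}

# Typical setup: one DTMF engine plus two Sphinx engines (male/female).
server = AudioServer([
    DTMFEngine("dtmf"),
    SphinxEngine("sphinx-male", "male"),
    SphinxEngine("sphinx-female", "female"),
])
results = server.process_segment([0.1, -0.2, "5", 0.0])
```

In this sketch every engine sees the same segment, which mirrors the typical three-engine setup: the DTMF Engine reports the touch-tone it found while each Sphinx Engine returns a speech hypothesis from its own acoustic model.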
The Audio Server is a Galaxy Communicator server.
As the interface between the audio device and the Olympus architecture, the Audio Server does not accept input in the form of a Galaxy frame (it only generates output frames). It must, however, handle a number of frames that control its state and are emitted by the Session Manager or the Interaction Manager.
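One way to picture this control-frame handling is a small dispatcher that maps frame names to state changes. This is a hedged sketch, not the Galaxy Communicator API: the frame names and fields below are invented placeholders, and a real Galaxy frame is a richer structure than the dict used here.

```python
# Illustrative sketch (not the real Galaxy Communicator API) of how the
# Audio Server might dispatch control frames arriving from the Session
# Manager or Interaction Manager. Frame names are invented placeholders.

class AudioServerState:
    def __init__(self):
        self.listening = False
        self.session_id = None

    def handle_frame(self, frame):
        # A Galaxy frame is modeled here as a dict with a name and keys.
        name = frame["name"]
        handler = {
            "start_listening": self._start,
            "stop_listening": self._stop,
            "begin_session": self._begin_session,
        }.get(name)
        if handler is None:
            raise ValueError(f"unhandled control frame: {name}")
        handler(frame)

    def _start(self, frame):
        self.listening = True

    def _stop(self, frame):
        self.listening = False

    def _begin_session(self, frame):
        self.session_id = frame.get("session_id")

state = AudioServerState()
state.handle_frame({"name": "begin_session", "session_id": "s1"})
state.handle_frame({"name": "start_listening"})
```

The point of the sketch is the asymmetry stated above: these frames only mutate the Audio Server's internal state; the server never answers them with a decoded result, since its output frames flow the other way.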
The Sphinx Engine (also known as the PocketSphinxEngine) receives audio segments on a port from an Audio Server and uses PocketSphinx to decode speech. The pocketsphinx libraries are compiled into the Sphinx Engine.
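The receive-and-decode loop can be sketched as below. The wire format between the Audio Server and the Sphinx Engine is not documented here, so this sketch assumes a simple length-prefixed framing, and a stub decoder stands in for the compiled-in PocketSphinx calls; both are assumptions for illustration only.

```python
# Hypothetical sketch of the Sphinx Engine's receive loop. Assumes a
# simple length-prefixed framing of audio segments (an assumption, not
# the documented protocol); the decoder callback is a stub standing in
# for the compiled-in PocketSphinx decoding.

import struct

def iter_segments(stream: bytes):
    """Yield raw audio segments from a length-prefixed byte stream."""
    offset = 0
    while offset < len(stream):
        # Each segment is preceded by a 4-byte big-endian length.
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        yield stream[offset:offset + length]
        offset += length

def decode_stream(stream: bytes, decoder):
    """Run the decoder on every segment and collect the hypotheses."""
    return [decoder(segment) for segment in iter_segments(stream)]

# Stub decoder: report the segment size instead of a real hypothesis.
framed = struct.pack(">I", 3) + b"abc" + struct.pack(">I", 2) + b"de"
hyps = decode_stream(framed, lambda seg: f"<{len(seg)} bytes>")
```

In the real engine the per-segment callback would feed samples into PocketSphinx and retrieve a text hypothesis; the framing-and-dispatch structure is what this sketch is meant to show.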
This Engine is also responsible for computing some of the features used by Helios, including the per-word language model backoff score. This scoring is based on Rong Zhang and Alexander Rudnicky's 2001 Eurospeech paper (see Publications#Helios).
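To give a feel for this kind of feature, here is a deliberately simplified sketch in the spirit of a language-model backoff score. The actual definition follows Zhang and Rudnicky (2001) and is not reproduced here; this illustrative version merely computes the fraction of hypothesis words for which the language model had to back off to a lower-order estimate.

```python
# Hedged sketch of a backoff-style confidence feature. This is NOT the
# per-word backoff score defined in Zhang & Rudnicky (2001); it is a
# simplified illustration: the fraction of words in a hypothesis whose
# LM probability came from a backed-off (lower-order) estimate.

def backoff_fraction(word_backed_off):
    """word_backed_off: list of booleans, one per hypothesis word,
    True if the LM backed off for that word. Returns the fraction of
    backed-off words, or 0.0 for an empty hypothesis."""
    if not word_backed_off:
        return 0.0
    return sum(word_backed_off) / len(word_backed_off)

# Example: 2 of 4 words required backoff.
frac = backoff_fraction([False, True, True, False])
```

Features like this are useful to a confidence annotator such as Helios because heavy reliance on backoff often signals that the decoder wandered outside well-modeled word sequences.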