Configuring the Audio Server

From Olympus

The configuration properties that the Audio Server accepts, along with their valid values, are listed below. See Agent Runtime Configuration Overview for the general configuration file format.

sps

The sampling rate of the audio that the Audio Server receives, in samples per second.

The default value is 8000.

engine_list

Comma-separated list of Sphinx recognition engines. Each entry gives the engine's name, host, and port number, separated by colons.

If this is not specified, the engine list is read from sphinx_engines.txt in the current configuration directory.

Example

 engine_list = oneEngineName:localhost:9990, anEngineName:localhost:9991, theEngineName:localhost:9992
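The name:host:port layout described above can be split apart mechanically. The following is a minimal Python sketch of that parsing, not the Audio Server's own code; the `Engine` type and the `parse_engine_list` function are hypothetical names introduced only for illustration.

```python
from typing import List, NamedTuple


class Engine(NamedTuple):
    """One engine_list entry (hypothetical helper type)."""
    name: str
    host: str
    port: int


def parse_engine_list(value: str) -> List[Engine]:
    """Split 'name:host:port, name:host:port' into Engine records."""
    engines = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        name, host, port = entry.split(":")
        engines.append(Engine(name, host, int(port)))
    return engines


engines = parse_engine_list(
    "oneEngineName:localhost:9990, anEngineName:localhost:9991, "
    "theEngineName:localhost:9992"
)
for engine in engines:
    print(engine.name, engine.host, engine.port)
```

An entry that does not have exactly three colon-separated fields raises a ValueError in this sketch; how the Audio Server itself treats malformed entries is not documented here.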

log_full_session_input

Boolean integer value (0 or 1) that specifies whether the Audio Server should log the full session audio. The name of the file containing the full session audio appears in the audio server log as full_session_audio_file= |path\filename.raw|

Default value is 0 (disable the full session audio log).

log

Boolean integer value (0 or 1) that specifies whether all messages are written to a log file.

Default value is 1 (enable the log).

verbosity

Verbosity defines which types of log messages the audio server displays in the console. The default value is nodisplay.

all

All messages are displayed and logged.

stdonly

Do not display debugging messages.

nodisplay

Don't display any messages.

run_mode

Specifies the run mode of the audio server. The default value is live.

live

In live mode the audio server records utterances, endpoints them, and returns the result of the utterances to the rest of the system.

batch_vad

The batch_vad mode is used for testing a VAD (voice activity detector) on a set of given RAW audio files that are specified in a CTL (control) file.

  • It reads in RAW audio files that are each listed on their own line in a control file.
  • It then prints out information from the VAD into the specified output file for each file specified in the CTL file.

The output file uses the following tab-delimited format:

 <filename>  <total bytes read>  <time stamp of the event>  <speech level>  <noise level>  <smoothed speech level>  <smoothed noise level>  <new dialog state>
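Because the output is tab-delimited, it can be loaded with a standard CSV reader. The sketch below assumes one event per line with the eight fields in the order shown above. The field names in `FIELDS`, the `read_vad_output` helper, and the sample line are all made up for illustration; values are kept as strings because this page does not state their types.

```python
import csv
import io

# Hypothetical column names, in the order given by the format line above.
FIELDS = [
    "filename",
    "total_bytes_read",
    "timestamp",
    "speech_level",
    "noise_level",
    "smoothed_speech_level",
    "smoothed_noise_level",
    "new_dialog_state",
]


def read_vad_output(stream):
    """Return one dict per line of a batch_vad output file."""
    reader = csv.DictReader(stream, fieldnames=FIELDS, delimiter="\t")
    return list(reader)


# Made-up sample line, not real Audio Server output.
sample = "utt001.raw\t16000\t1.25\t42.0\t10.5\t40.1\t11.0\tspeech\n"
rows = read_vad_output(io.StringIO(sample))
print(rows[0]["filename"], rows[0]["new_dialog_state"])
```

To read a real output file, pass an open file object instead of the `io.StringIO` wrapper.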

batch_asr

NOTE: This mode currently does nothing. It is disabled in Olympus 2.0 because it has not been updated to interact with Apollo and the various VADs in the audio server.

In Olympus 1.0, batch ASR (automatic speech recognition) mode had the audio server read RAW audio files from a specified location and send the decoding results to the rest of the system.

root_dir

The root directory where the RAW audio files are stored, when running in batch mode.

ctl_file

The CTL (control) file to use in batch mode. It lists the RAW audio files to process, one per line.

output_file

The output file to which results are written in batch mode.

vad

The type of VAD (voice activity detection) algorithm to use.

Possible Values

Default value is power.

power

Use a simple energy/power based VAD.

vad_config Parameters

window_width

The number of milliseconds that the VAD works with at a time.

Default value is 800.

power_threshold

The threshold that the average power must exceed for the audio to be classified as speech.

Default value is 20,000.
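As a rough illustration of how the two parameters interact, the sketch below classifies one window of samples by comparing its average power against the threshold. It is not the Audio Server's implementation: it assumes that power means the mean of the squared sample amplitudes, and the function names `average_power`, `is_speech`, and `window_size` are hypothetical.

```python
from typing import Sequence


def average_power(samples: Sequence[int]) -> float:
    """Mean of squared amplitudes (an assumed definition of power)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def is_speech(samples: Sequence[int], power_threshold: float = 20000.0) -> bool:
    """True when the window's average power exceeds the threshold."""
    return average_power(samples) > power_threshold


def window_size(sps: int = 8000, window_width_ms: int = 800) -> int:
    """Number of samples in one VAD window at the given sampling rate."""
    return sps * window_width_ms // 1000


n = window_size()        # samples per window with the default settings
quiet = [0] * n          # silence
loud = [1000] * n        # constant high-amplitude signal
print(n, is_speech(quiet), is_speech(loud))
```

With the default sps and window_width, each window in this sketch covers 6400 samples.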

gmm

vad_config Parameters

no_model
window_width
energy_threshold
model_dir
fe_frame_rate
fe_window_length
fe_fb_type
fe_num_cepstra
fe_num_filters
fe_fft_size
fe_lower_filter_freq
fe_upper_filter_freq
fe_pre_emphasis_alpha
prior_noise_level
prior_speech_level
prior_noise_level_weight
prior_speech_level_weight
snr_estimation_step
fe_normalize_c0
log_frame_info

sphinx

New in Olympus 2.5: Use the energy/power VAD provided by the SphinxBase library in PocketSphinx 0.5.

vad_config Parameters

delta_silence

Integer that represents the necessary energy change to go from speech to silence.

delta_speech
min_noise

Minimum background noise energy level.

max_noise

Maximum background noise energy level.

window_width

The number of milliseconds that the VAD works with at a time.

speech_onset

Change in power

silence_onset
leader

Number of expected leading (beginning) frames to add once speech starts.

trailer

Number of expected trailing (ending) frames to add once speech ends.

adapt_rate

Float that represents the adapt rate.

vad_config

A comma-separated list of key=value configuration parameters for the VAD currently in use. See vad for the configuration parameters appropriate for each VAD.

Default value is empty (no parameters).

Example

 vad_config = no_model=true, energy_threshold=7, sampling_rate=16000
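The key=value layout of vad_config lends itself to a simple dictionary representation. The following Python sketch shows one way to parse it; `parse_vad_config` is a hypothetical helper, not part of the Audio Server, and it leaves every value as a string because the expected type depends on the VAD in use.

```python
def parse_vad_config(value: str) -> dict:
    """Split 'key=value, key=value' into a dict of string values."""
    params = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # an empty vad_config yields no parameters
        key, _, raw = item.partition("=")
        params[key.strip()] = raw.strip()
    return params


params = parse_vad_config(
    "no_model=true, energy_threshold=7, sampling_rate=16000"
)
for key, raw in params.items():
    print(key, "=", raw)
```

An empty string, matching the documented default, produces an empty dictionary.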