

Playing Audio Output

One aspect of the system (some might say the most important) is playing synthesized utterances. Unfortunately this is not easy: different machines have very different hardware for playing audio signals. CHATR attempts to support as many methods as possible.

Ideally we should have a uniform, machine-independent method of playing waveforms. Consider for a moment computer displays, in contrast to computer audio. A long time ago, computer graphics required user programs to access device registers to control line drawing and text placement. Now we have systems like X Windows, which insulate the user from the hardware so that the same graphics program can run on many different types of hardware.

The same convenience should also be available for audio input and output. An audio server would offer a uniform way of taking a waveform and playing it. The server would deal with encoding forms and byte order. If sample rate conversion were necessary, it would do the best it could.

Unfortunately there is not yet a standard audio server system. There are, however, at least two systems available which offer some of the desired capabilities, and CHATR supports both. A number of other methods are also offered, as well as a generic command-driven one where the user can specify an arbitrary command to play the waveform.

Audio Systems

The audio output method is selected using the Audio command. The syntax is

     (Audio Device name)

where name is one of the following

     AU_CONN
     DATLINK
     NA_CONN
     SUN_AU
     AU_COMMAND

Each method invoked by the above names will be briefly described in the following sections.

DEC AudioFile

This system is network transparent. It can connect to servers across the network, so CHATR is not restricted to playing the waveform on the machine it is actually running on. AudioFile only works on a few particular pieces of hardware. It is currently supported on Suns, on some proprietary DEC hardware devices, and on DEC Alphas. You must have an audio server running on the machine you wish to connect to before you can use AudioFile (Asparc on Suns). AudioFile output is selected using the command

     (Audio Device AU_CONN)

You can select which audio server to connect to by setting the UNIX environment variable AUDIOSERVER. (Environment variables can be set within CHATR using the setenv command.) If that is unset, CHATR uses the X Windows DISPLAY variable. This is usually what you want, as the display you are sitting at is usually the machine nearest the speaker.
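
The lookup order described above can be sketched as follows. This is a minimal illustration in Python, not CHATR's own code: the function name resolve_audio_server is hypothetical, and treating an empty AUDIOSERVER as unset is an assumption.

```python
import os


def resolve_audio_server(env=None):
    """Return the server name to try, or None if neither variable is set.

    Hypothetical helper: AUDIOSERVER takes priority, DISPLAY is the fallback.
    An empty value is treated the same as an unset one (an assumption).
    """
    env = os.environ if env is None else env
    return env.get("AUDIOSERVER") or env.get("DISPLAY") or None
```

Passing a dictionary instead of the real environment makes the fallback easy to check by hand, for example resolve_audio_server({"DISPLAY": "hostB:0"}).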

On Suns, AudioFile accepts only 8K ulaw waveforms, which makes it less than useful.

DAT-Link

CHATR can directly access a DAT-Link audio server if compiled with such support. To select direct DAT-Link access use

     (Audio Device DATLINK)

If DAT-Link support is not compiled into your version of CHATR, but you still want CHATR's output to be sent to the DAT-Link, you can use the AU_COMMAND option as in

     (Audio Device AU_COMMAND)
     (Audio Command "naplay -f raw -e Linear -o mono -s $SR $FILE")

Note this will always output messages to the terminal.

NCD NetAudio

This audio server is also network transparent. Unlike AudioFile, it accepts waveforms at any sample rate and encoding, as it will perform conversion. As distributed, the system is only supported on Suns and uses 8K ulaw as the actual output to the hardware device, though it will convert any waveform given to it. A local ATR modification has made the output 16K linear, which sounds much better. The server's sample rate conversion is not very good, so CHATR uses its own when outputting to NetAudio.

A NetAudio server (`ausun' on sparcs) must be running before this will work. To select NetAudio for output use

     (Audio Device NA_CONN)

Hopefully the problems with NetAudio will be solved soon. A start would be improved hardware support, better sample rate conversion, and even the introduction of dynamic sample rate selection.

Sun /dev/audio

Sun's `/dev/audio' offers an audio output device. On older Suns this device only supports ulaw at 8kHz. CHATR has a method for directly using this form, which is also compatible with newer Sparc10s. To use the audio directly on the Sun, select

     (Audio Device SUN_AU)

CHATR will automatically resample to 8K (if necessary) and convert the waveform to ulaw before writing it directly to the device.
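
For reference, the conversion from 16-bit linear samples to ulaw can be sketched as below. This follows the widely used G.711 mu-law companding scheme with the usual bias and clipping constants. It is an illustration only and is not taken from CHATR's source; the function name is hypothetical.

```python
BIAS = 0x84    # 132, added before the segment search (standard G.711 value)
CLIP = 32635   # largest magnitude that still fits after the bias is added


def linear_to_ulaw(sample):
    """Encode one signed 16-bit linear sample as an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS

    # Find the segment (exponent): the position of the highest set bit,
    # searching downwards from bit 14.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1

    # The four bits below the leading bit form the mantissa.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F

    # mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```

With this encoding, silence (a sample of 0) becomes the byte 0xFF, and full-scale positive and negative samples become 0x80 and 0x00 respectively.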

Later Suns (Sparc10s) offer a more powerful audio device with a number of different sample rates and encodings. CHATR does not currently support these directly. It is possible to dump the waveform at any sample rate (see section Using UNIX Commands), but users must provide their own program to send it to /dev/audio. This is usually not a problem, as Sun provides such programs in its audio distribution.

Using UNIX Commands

Since servers are not available on all machines, a more generic method of audio output is also available. The system will dump the waveform to a file (raw shorts in native byte order), and the user can specify a command that will play the file.

The skeleton command may use the shell variables $FILE and $SR to refer to the file and its sample rate. The sample rate is given in Hz, so you may need to divide it by 1000 for commands that use kHz. As an example, for a DASBOX on a DEC MIPS, the command string should be set as

     (Audio Command "daout $FILE -f `expr $SR / 1000`")

and the selection of output command is

     (Audio Device AU_COMMAND)
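
The mechanism can be pictured with the following Python sketch: dump the samples as raw native-order shorts, expose the file name and sample rate as the shell variables FILE and SR, and run the user's command. This is only an illustration of the idea under those assumptions, not CHATR's implementation; the function name play_with_command is hypothetical, and a POSIX shell is assumed.

```python
import array
import os
import subprocess
import tempfile


def play_with_command(samples, sample_rate, command):
    """Dump samples to a raw file and run `command` on it through the shell.

    Returns (exit status, captured standard output).
    """
    # "h" is a signed short in the machine's native byte order.
    with tempfile.NamedTemporaryFile(suffix=".raw", delete=False) as tmp:
        tmp.write(array.array("h", samples).tobytes())
        path = tmp.name
    try:
        # The command string sees the file and rate as $FILE and $SR.
        env = dict(os.environ, FILE=path, SR=str(sample_rate))
        result = subprocess.run(command, shell=True, env=env,
                                capture_output=True, text=True)
        return result.returncode, result.stdout
    finally:
        os.remove(path)
```

For example, play_with_command(wave, 16000, "expr $SR / 1000") runs the same kHz conversion used in the DASBOX command above without needing any audio hardware.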

Some systems may not have dynamically switchable sample rates in their audio hardware, so CHATR provides a sample rate converter. See section Wave Modification, for more details.

Audio Modes

CHATR offers two output modes, synchronous and asynchronous. The difference is whether CHATR waits for a waveform to finish playing before continuing execution of other commands. In synchronous mode, waveforms must be fully played before CHATR can continue. In asynchronous mode, the command to play the waveform is queued in a separate audio spool process and CHATR can continue executing. Asynchronous mode is useful when generating a large amount of continuous speech (e.g. in tts mode).

Modes are changed by a simple command. To change to asynchronous mode, use

     (Audio Mode Async)

To change back to synchronous mode, use

     (Audio Mode Sync)

The overall system default is synchronous. However, in tts mode, asynchronous output is automatically selected.
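
The difference between the two modes can be illustrated with a small queue-based sketch in Python. CHATR uses a separate spool process; a background thread stands in for it here, and the class name AudioSpooler and its methods are hypothetical.

```python
import queue
import threading


class AudioSpooler:
    """Queue waveforms for playback by a background worker."""

    def __init__(self, play):
        self._play = play              # callable that blocks until playback ends
        self._jobs = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        # Play queued waveforms one at a time, in the order they arrived.
        while True:
            wave = self._jobs.get()
            try:
                self._play(wave)
            finally:
                self._jobs.task_done()

    def output(self, wave, mode="Sync"):
        self._jobs.put(wave)
        if mode != "Async":
            # Synchronous: wait until everything queued has been played.
            self._jobs.join()
        # Asynchronous: return at once and let the worker catch up.

    def wait(self):
        """Block until the spool queue is empty."""
        self._jobs.join()
```

In this sketch an Async call returns immediately, so the caller can start synthesizing the next utterance while the previous one is still playing.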

If resampling is to be performed on a waveform before playing, it is advisable to do this within CHATR before sending it to the spooler. As resampling can take 10% (or more) of the time it takes to say an utterance, it is wise to do this asynchronously. See section Wave Modification, for information on getting CHATR to resample.

Wave Modification

A limited form of wave modification is available in CHATR. It is not really CHATR's job, but as many people do not have audio tools and hardware can be pretty primitive, CHATR goes some way to offer support.

When a specific sample rate is required for output, CHATR can resample the data to alternative frequencies. The command is

     (Audio Required_Rate number)

where number is an integer specifying the frequency in Hz. When in audio command mode, this will cause the saved file to be at the specified frequency.
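
To make the idea of resampling concrete, here is a minimal Python sketch using linear interpolation. It is not CHATR's converter, whose algorithm is not described here, and it applies no low-pass filter, so downsampling with it can alias. The function name is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a list of integer samples from src_rate to dst_rate (Hz)."""
    if not samples or src_rate == dst_rate:
        return list(samples)

    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out
```

Halving the rate of [0, 100, 200, 300] from 16000 to 8000 Hz keeps every second sample, giving [0, 200].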

As the sample rate conversion may take a significant amount of time, CHATR allows specification of a conversion-quality factor. The command is

     (Audio Resample_Quality number)

number may take one of three values: 0, 1 or 2. The default, 0, gives the highest quality but takes the longest time; 2 is the quickest but gives the worst quality (though still quite acceptable).

CHATR supports several forms of output encoding, which are specified using the command

     (Audio Required_Form type)

type may be one of ulaw, lin16MSB or lin16LSB.
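
The two linear forms differ only in byte order. The Python sketch below shows the difference, assuming that lin16MSB means most significant byte first (big-endian) and lin16LSB means least significant byte first (little-endian); the function name encode_lin16 is hypothetical.

```python
import struct


def encode_lin16(samples, form):
    """Pack signed 16-bit samples as bytes in the requested byte order."""
    if form == "lin16MSB":
        prefix = ">"   # most significant byte first (assumed meaning)
    elif form == "lin16LSB":
        prefix = "<"   # least significant byte first (assumed meaning)
    else:
        raise ValueError("unsupported form: %r" % (form,))
    return struct.pack("%s%dh" % (prefix, len(samples)), *samples)
```

For instance, the sample value 1 is written as the bytes 00 01 in lin16MSB and as 01 00 in lin16LSB.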

