

Graphic Display

Although audio output is the main purpose of a speech synthesis system, a graphical representation of an utterance can be useful during development or research. CHATR supports such displays. Rather than building graphics functionality directly into CHATR, software has been written to interface with existing graphics packages.

Note that color-hungry applications, such as some Internet browsers, may leave too few resources for these graphics packages to run efficiently, or at all. If this occurs, the simplest solution is to exit that application while using the graphics.

XWAVES

XWAVES is a waveform display package developed by Entropic. It has its own excellent `help' facility, so only the method of invocation will be described here.

Basic Use

XWAVES is started from within CHATR using the command

     (Display Open mode)

where `mode' is one of

XWAVES
Displays Wave, Voicing Probability, Target F0, Segment Stream, Intone Stream, and the Word Stream.
XWAVES+
Displays Wave, ac Peak, rms Value, Voicing Probability, Actual F0, and the Segment Stream.
XWAVES2
Displays as per XWAVES and XWAVES+ simultaneously.

Initially two windows will open, a `Signal Display' control window and a `Miscellaneous Controls' panel. Broadly speaking, the former determines what is displayed and the latter how it is displayed. `Help' may be obtained by clicking on the `xwaves MANUAL' button in the `Signal Display' window. Note that no waveform or stream display will take place yet.

Once an utterance has been defined and synthesized, it may be displayed using the command

      (Display utt1)

Several windows will now open, displaying various waveforms and streams of the specified utterance. The number and type are dependent on the display `mode' selected, as described earlier.

If no argument is given with the `Display' command, the last synthesized utterance will be displayed. If no utterance exists, a non-fatal error message to that effect is returned. If an utterance has been changed but not re-synthesized, a `No currrently generated wave' error message is returned.

To finish with XWAVES, click on the `QUIT!' button in the `Signal Display' window. Future versions of CHATR may support use of the `Display Close' command, a function not currently implemented.

A library file may be loaded before using XWAVES which sets up appropriate paths, etc. An example can be found at `chatr/lib/data/xwaves.ch'. To use this facility, issue the command

     (load_library "xwaves.ch")

This line may of course be added to your `.chatrrc' file.
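
Putting the steps above together, a typical session might look like the following sketch. It uses only the commands described in this section and assumes that `utt1' has already been defined and synthesized; `XWAVES2' is chosen arbitrarily from the three modes.

     (load_library "xwaves.ch")
     (Display Open XWAVES2)
     (Display utt1)

Since the `Display' command may be given without an argument, the final line could presumably also be written as (Display) to show the most recently synthesized utterance.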

Specific Display Selection

By default, the parts of an utterance that are displayed are determined by the choice of `Display Method'. However, an optional second argument to the `Display' command allows the user to select what is displayed. This argument may be a single atom, any one of

     wave
     f0
     segment
     word
     intone
     unit

or a list of these, for example

     (Display utt1 (wave intone segment))

The `unit' argument will display marks of the selected units and the file name in the database where they were selected.
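
For illustration, the single-atom and list forms might be used as follows, again assuming `utt1' has been synthesized. Going by the description above, the first command would show only the waveform, and the second the waveform together with the unit marks.

     (Display utt1 wave)
     (Display utt1 (wave unit))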

Displayed Unit Alignment

As the selected units will in general not be the same length as the targets specified in the Segment Stream, the Unit Stream and Segment Stream will not line up. If no signal processing is done on the waveform after unit selection (i.e. `DUMB' concatenation method) the Unit Stream will line up while the Segment Stream will not. If signal processing is done (e.g. `PS_PSOLA' or `NUUCEP'), the Segment Stream will (generally) line up, while the Unit Stream will not. Be aware that all durations are recommendations and not absolutes--boundaries may often not be exact.
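
One way to examine this mismatch is to request both streams together with the waveform. The following is a sketch only, assuming `utt1' has been synthesized using unit selection:

     (Display utt1 (wave segment unit))

Comparing the boundaries of the two streams against the waveform should indicate which of them follows the signal for the concatenation method in use.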

XMG

XMG is a graphic display system developed in the Centre for Speech Technology Research at the University of Edinburgh. It is available free at time of writing; see section Glossary of Terms and Acronyms for the URL. A `help' facility is included, so only the method of invocation will be described here.

XMG is started from within CHATR using the command

     (Display Open XMG)

Two windows will now open, a `command' window and a display called `graph0'. Note that no waveform or stream display will yet take place.

Once an utterance has been defined and synthesized, it may be displayed using the command

      (Display utt1)

Features displayed are: Wave, Target F0, Segment Stream, the Word Stream and elapsed time.

In server mode, XMG is started using the command xmg - server.

To send additional commands to the display server, the command Display Command is used. This command does not evaluate any of its arguments. As an example, to start a new window issue the command

     (Display Command new)
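
As a sketch, a complete XMG session using only the commands described above might run as follows, assuming `utt1' has been defined and synthesized:

     (Display Open XMG)
     (Display utt1)
     (Display Command new)

Because `Display Command' does not evaluate its arguments, `new' is expected to reach the display server exactly as written.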

Inspector

An X-windows program is included within CHATR to look at the internal format of an utterance. It is not graphical as such but should not be dismissed lightly for that; for any utterance, all existing streams, their cells and cell contents are shown textually. By this means much information which would normally require searching of databases and opening of many files can be instantly displayed at the click of a window `button'.

With advanced versions of Inspector, cell contents are displayed superimposed on buttons. By merely clicking on these buttons, users may change CHATR-selected units to an alternative in the candidate list and have the utterance re-synthesized and played, either phoneme-by-phoneme or in its entirety.

There are currently three versions of Inspector.

     Inspect
     Inspect2
     Inspect3

Features of each version and method of use will now be explained.

Inspect

Inspect is called from CHATR using the command

     (Inspect utt1)

If no argument is given, the most recently synthesized utterance is displayed. If no synthesis has taken place, just the utterance will be displayed.
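
For example, assuming at least one utterance has already been synthesized, the no-argument form would presumably be written

     (Inspect)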

A command window named `xchatr' will now open containing `buttons' representing streams. Clicking on a particular button opens another window which displays that stream. If the stream is made up of a series of concatenated elements, each will be represented by another button which must be clicked to view details of that particular element.

A command line in the command window allows selection and loading of a different but already synthesized utterance.

This is a fairly basic version of Inspector and is really only kept for compatibility with previous versions of CHATR.

To finish with Inspect, click on the `Quit' button in the command window.

Inspect2

Inspect2 is called from CHATR using the command

     (Inspect2 utt1)

If no argument is given, the most recently synthesized utterance is displayed.

A window named `Inspect2' will now open to display the WordStream, PhonemeStream and directory path of the database from which units were selected. All elements are superimposed on `buttons'. Clicking on an individual word gives a list of phonemes making up that word, plus details of chosen unit, index number, start/stop timings and selection/joint costs. Clicking on a syllable gives a list of unit candidates with similar information. Clicking on the `wave file' or `start-end' buttons plays either that portion of the original corpus recording or that particular phoneme, respectively. Other buttons at the top of the window allow the playing of the utterance.

Perhaps the most powerful function of all is that by clicking on a button, alternative units may be selected from the candidate list and the utterance re-synthesized and played with those units in place. The result of this operation can be saved.

To finish with Inspect2, click on the `Quit' button in the top left-hand corner of the `Inspect2' window.

Inspect3

Inspect3 is called from CHATR using the command

     (Inspect3 utt1)

If no argument is given, the most recently synthesized utterance is displayed.

A window named `inspect.tcl' will now open which displays utterance phoneme, unit cost, joint cost, index, wave file directory path, start/length and F0, in vertical columns. A scroll-bar on the right allows viewing of the whole utterance. All elements of each stream are in the form of clickable buttons. Clicking on a phoneme will cause another window to open which displays the unit candidate list, and other details as in the previous window. Clicking on the wave file or start/length buttons results in the playing of either that portion of the original corpus recording or that particular phoneme respectively.

The most powerful function is run by clicking on a candidate phoneme; that phoneme is automatically concatenated into the utterance currently displayed. The new version may be saved.

To finish with Inspect3, click on the `quit' button in the top right-hand corner of the `inspect.tcl' window.

