

CHATR Interaction

There are several ways to communicate with CHATR. It can be used interactively via a command line interface, in batch mode, via a pipe, or as a server handling multiple requests from a network. Another possible mode of communication is as a sub-module of some larger system, in which CHATR C functions are called directly by linking your executable with `libchatr.a'. Such an interface has not yet been fully determined: it can be done now, but it is not as clean as it should be.

CHATR Interaction Modes

The four currently available interaction modes are: interactive, pipe, batch and server. In each of these cases there are two distinct interpretation modes: command mode or text-to-speech (tts) mode. The default is interactive, command mode.

Interactive Mode

When interactive, a prompt is given and the user can type either commands to the command line interpreter, or text to be spoken in tts mode. The command line interpreter is based on the GNU readline library and hence allows command line editing and history (as in all good shells). The edit commands are EMACS-like. CHATR offers command completion, argument completion, variable name completion, and filename completion. TAB is the default completion key.

To exit interactive mode type either end-of-file (typically ctrl-D) or the letter `q' and return. If the variable chatr_confirm_exit is set to a non-nil value, confirmation is asked for before CHATR exits.

Commands may go over onto other lines. A secondary prompt is given when a command is incomplete (no closing bracket(s)). At any time the interrupt character (typically ctrl-C) will interrupt whatever CHATR is doing and return to the top level.

Pipe Mode

In pipe mode CHATR will read commands (or text in tts mode) from standard input without prompting. No command line editing is available. End of stream is signaled by end of file. This mode is designed for use in communicating with other programs that generate CHATR commands or text to be spoken. It is not intended to be used for typed (keyboard) input.

Batch Mode

In batch mode CHATR does not read commands (or text) from its input channel. Only files specified on the command line are read and processed. This is designed for long jobs where user interaction is not required.

Server Mode

CHATR has a server mode based on a BSD socket. CHATR may be run as a server on a known host and process commands or text received over the network. This version can have many preloaded databases, and particular files can be loaded before server mode is initiated. This allows faster speech synthesis than if the user had to start a new version of CHATR each time--client programs do not need to wait for requested synthesizers to become available. See section Using Server Mode, for a full explanation and example.

In server mode, CHATR first processes its command line arguments (thus allowing configuration) then loops waiting for connections on a known socket. By default this is port number 2234, but it may be user selected by setting the variable chatr_server_portnum. When a connection occurs, a new version of CHATR is forked giving the client a version of CHATR with much information already preloaded.

One audio output mode that is specifically designed for server use is the socket mode. This allows synthesized audio output to be given back to the client machine. When a client first connects, it should identify a socket that is waiting for (ulaw 8k) data. This is done by the Audio command. For security reasons that command may not be available to random network client programs, so a single high level function is provided. Once connected, a client program should send something like

     (Output_To `as71.itl.atr.co.jp' 4444)

Note the apostrophes. The machine name may be either a name or a number (e.g. IP address `133.186.36.171'). There should be a server on that machine waiting to receive connections on the specified socket number. The audio output may then be received.
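
The receiving side described above can be sketched in Python. This is only an illustration under stated assumptions and is not part of CHATR: it assumes the server connects to the given port, writes raw ulaw 8k bytes, and closes the connection when it has finished. The function names read_until_eof and receive_audio are hypothetical.

```python
import socket


def read_until_eof(conn: socket.socket, chunk_size: int = 4096) -> bytes:
    """Collect every byte sent on conn until the peer closes it."""
    chunks = []
    while True:
        data = conn.recv(chunk_size)
        if not data:  # an empty read means the sender closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def receive_audio(port: int, host: str = "") -> bytes:
    """Listen on port, accept one connection, and return the bytes it sends.

    Assumption: the sender delivers one stream of audio data and then
    closes the connection. The text does not specify this behaviour.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))
        listener.listen(1)
        conn, _addr = listener.accept()
        with conn:
            return read_until_eof(conn)
```

A client would need to be listening, for example via receive_audio(4444), before it sends the Output_To command, and could then write the returned bytes to an audio device or a file.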

Next, a few safe commands may be given. CHATR offers a number of commands which would allow clients access to the server's filesystem, so the server will want to restrict which commands are available to arbitrary clients. Therefore, for security reasons, the server side may set the variable chatr_secure_functions to a list of functions which a client is allowed to call. These functions may call other functions, but in that case the server is responsible for their content and can ensure that they are safe. This may have to change as security becomes a bigger issue.

Basically, the commands available are Output_To and tts commands. Speaker selection should be available too. The exact availability depends on how the server is started, but it does offer control over the services being offered.

The program chatr_pipe offers a very simple example of a CHATR client program. It opens a connection to a server and sends down some initialization commands, then reads text from standard input and sends it to the server to be synthesized. chatr_pipe also starts a server to receive the synthesized waveform and writes the ulaw 8k data directly to standard out. Commands can then take the form

     echo hello world | chatr_pipe -h as71 >/dev/audio

CHATR Interpretation Modes

In any of the four interaction modes the input data may be interpreted in one of two modes.

Command Mode

In command mode everything given to CHATR is treated as a CHATR command from CHATR's Lisp-like command language. Commands are of the general form

     ( <command name> <arg1> <arg2> ... )

Commands start with an opening bracket which may seem awkward for those not used to Lisp. Single (unbracketed) atoms are treated as variables and their value returned. There is one special command which does not require parentheses: `q', which quits the system. See section CHATR Command Language, for more details of the language.

Text-to-Speech Mode

In tts mode, everything that is given to CHATR is treated as text to be rendered as speech. There are a number of options available depending on what form the text is presented in. Some exist to deal with Japanese written in romaji or Kanji/Kana. See section Making CHATR Speak, for some examples. Files containing mixed English/Japanese can also be used as input. The appropriate language synthesis system will be selected. See section Multi-lingual Text Processing, for further details.

A simple method is provided for embedding commands within the text, allowing more control for the user without having to use the Lisp-based command interface. Key words may be defined which denote CHATR commands. These typically select different speakers etc. In the default system, all these commands start with the character `@', but users may define any symbol to be a command. Embedded commands are defined in the variable tts_esc in the library file `lib/data/tts.ch'. They include speaker selection commands such as @f2b, @wnc600, @MHT etc.
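
The idea of embedded commands can be modelled with a short Python sketch. This is not CHATR's parser: it simply assumes that a whitespace-delimited word beginning with the escape character is a command and that everything else is text to be spoken. The function name split_embedded is hypothetical, and the real key words are whatever tts_esc defines.

```python
def split_embedded(text, escape="@"):
    """Split text into ("command", name) and ("text", words) segments.

    Illustrative model only: a word that starts with the escape character
    is treated as a command, and runs of other words are grouped as text.
    """
    segments = []
    words = []
    for token in text.split():
        if token.startswith(escape) and len(token) > len(escape):
            if words:
                # Flush the text collected before this command.
                segments.append(("text", " ".join(words)))
                words = []
            segments.append(("command", token[len(escape):]))
        else:
            words.append(token)
    if words:
        segments.append(("text", " ".join(words)))
    return segments
```

For example, split_embedded("@f2b And here is the news") yields a command segment for f2b followed by one text segment.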

Command Line Options

When started, unless -q is specified, CHATR will first load the user's `.chatrrc' file. Then, if files have been specified as arguments, each will be loaded and evaluated in turn. If interactive or pipe, CHATR will then read from standard input.

The command line syntax is

     chatr [options] file0 file1 ...

Where options are

-i or --interactive
Force CHATR to be interactive, giving a prompt (using readline, if installed with that option) and printing the resulting values of executed commands.
-p or --pipe
Force CHATR to act as a pipe, irrespective of the type of standard input. This causes commands to be read from standard input, without prompts or printing values of evaluated commands.
-b or --batch
Force CHATR to work in batch mode. This option means that no input from standard input will be read, but files specified as arguments are loaded and evaluated.
--server
Start in server mode. No interaction is made through a terminal, but named files are processed. CHATR waits for clients to attach on a socket, by default port 2234. Connections and disconnections are displayed on standard out.
--libdir Library-Directory-Name
Start with library directory set to Library-Directory-Name. This allows a version of CHATR to run with a different library directory than the one named at compile time.
-h or --help
Print summary of CHATR's command line options.
-q
Do not load user's `.chatrrc' file or system initialization files.
-v or --version
Print current version number of CHATR and then exit.
-tts
Run in text-to-speech mode. By default this means say the contents of each file given as an argument (batch mode). The -i and -p options may be used to override this.
-jtts
Run in Japanese (Kana/Kanji) text-to-speech mode. By default this means `say the contents of each file given as an argument - in batch mode'. The -i and -p options may be used to override this. The synthesizer will use the synthesis methods defined in the user's startup file. It is the user's responsibility to ensure a suitable method exists. The input may be in Kana and/or Kanji. If any English text is encountered it will be spoken letter-by-letter in Japanese phonetics.
-mtts
Run in mixed language text-to-speech mode. Kana/Kanji will be rendered in Japanese and alpha-numerics in English. Note this may not always be what is desired. In addition to automatic language detection, tts commands may be inserted in-line in the text. Such commands typically start with an @. See variable tts_esc in `lib/data/tts.ch' for a list of the defined in-line commands.

After every option is processed, remaining arguments are treated as filenames and loaded as CHATR command files. However, if the file name starts with a left parenthesis, it is treated as a CHATR command and interpreted as such. In tts mode the files are treated as text and spoken. Commands (in non-tts mode) may be specified on the command line. For example, suppose we have a file with many commands in it and wish to run the function test2 after loading. This may be done in batch mode on the command line as in

     chatr -b tests.ch "(test2)"
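
The rule for the remaining arguments can be sketched in Python as follows. This is a model of the behaviour described above, not CHATR source: the function name classify_args is hypothetical, and the sketch assumes that only the very first character of an argument is examined.

```python
def classify_args(args):
    """Label each non-option argument as a command or a file name.

    Assumption: an argument counts as a command only when its first
    character is a left parenthesis. Anything else is a file to load.
    """
    classified = []
    for arg in args:
        kind = "command" if arg.startswith("(") else "file"
        classified.append((kind, arg))
    return classified
```

Applied to the arguments of the example above, classify_args(["tests.ch", "(test2)"]) labels tests.ch as a file and (test2) as a command, in that order.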

CHATR Command Language

The command language for CHATR is basically a small Lisp system, though it is in fact more like `Scheme'. As has already been shown, commands can be called directly. In addition to top level commands that call CHATR internal C functions, the input language also supports variables and functions. It also supports the basic library functions one would expect from a tiny Lisp interpreter.

Variables may be set using the set command. This is more like the Lisp setq command (or Scheme set!), as it does not evaluate its first argument. Variables are often used to hold utterances, though they can hold any value, including lists.

Function definitions have the following syntax

     (define name arg-list comm0 ...)

If the first command in a function is a string, it is treated as a document string and returned by various help functions.

For those interested, CHATR is dynamically scoped. A variable name will evaluate to its most locally named occurrence, set will set its most locally named variable--that is argument names may be the same as global variables, but you will always refer to the argument while within that function. Also, because it is dynamically scoped, you may refer to argument names in the scope of the function caller. Anonymous functions (cf. lambda functions) are available via the function function.
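
For readers unfamiliar with dynamic scoping, the lookup rule can be modelled in Python with a stack of binding frames. This is a teaching sketch and not how CHATR is implemented. The class DynamicEnv and its method names are hypothetical, and the choice to create a global when set is given an unbound name is an assumption.

```python
class DynamicEnv:
    """Toy model of dynamic scoping using a stack of binding frames."""

    def __init__(self):
        self.frames = [{}]  # frames[0] holds the global variables

    def lookup(self, name):
        # Search from the most recent call outwards, so a callee can see
        # the argument names of whichever function called it.
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        raise NameError(name)

    def set(self, name, value):
        # Update the most local existing binding of the name.
        for frame in reversed(self.frames):
            if name in frame:
                frame[name] = value
                return
        # Assumption: an unbound name becomes a global variable.
        self.frames[0][name] = value

    def call(self, params, args, body):
        # Bind arguments in a new frame for the duration of the call.
        self.frames.append(dict(zip(params, args)))
        try:
            return body(self)
        finally:
            self.frames.pop()
```

With a global x set to 1, a function called with an argument named x sees its own value, and so does any function it calls in turn. The global value is visible again once the call returns.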

Basic standard flow of control, logical operators and equal are included. Two looping functions are available, for and mapc. A simple so-called `naive reverse' of lists can be defined within CHATR as

     (define append (a b)
       (if (not a)
           b
         (cons (car a) 
               (append (cdr a) b))))
   
     (define reverse (a)
       (if (not a)
           a
         (append (reverse (cdr a))
                 (cons (car a) nil))))

A complete list of functions can be obtained by typing Help. Also see section CHATR Commands.

Another example more appropriate to CHATR shows how the command language can be used to make CHATR synthesize a number of utterances and save them in a particular directory. For example

     (set outdir "/tmp/")
     (set examples '("MHT_0001" "MHT_0002" "MHT_0003"))
     (speaker_MHT)

     (define synth_and_save (name)
        (set utt1 (test_seg name))
        (Save Wave utt1 (strcat outdir name)))

     (mapc synth_and_save examples)

The function test_seg is defined for every database. It loads in a segmental description of the given utterance, excludes that utterance from the database and then synthesizes it from the remaining units. The function mapc applies the first argument (a function) to each member of the list given as the second argument. The function test_txt is also defined for many (though not all) databases and offers a textual representation of an utterance. Again, it loads the description, excludes that example from the database, and synthesizes from text. Another option is test_pf, which loads a detailed structure representation of an utterance, excludes it from the database and synthesizes it. However, only a few databases have this defined.

There is currently no comprehensive automatic garbage collection in this Lisp. However, utterances (typically the biggest structures) do have reference counts allowing them to be garbage collected. It is possible to free data by hand via a free function, but this is difficult to do safely. Later versions will require a real garbage collector. However, for normal use, the small amount of garbage generated will not be a problem. Note the tts function generates no garbage. It has been used for hours (can even be days if you are willing to listen) at a time.

Some Common Commands

CHATR offers what may appear to be a bewildering set of commands. This section gives a brief summary of some of the more commonly used. Functions may be built in to CHATR, i.e. the function names are directly linked to functions written in C. Also, many user functions have already been defined in Lisp (some are loaded automatically at startup time) to make CHATR easier to use.

Each definition is followed by a few examples.

set
Used to set a value to a variable.
     (set radiodir "/usr/pi/BU-RADIO/f2b/")
     (set test_utts '("f3ast01p1" "f3ast01p3"))
     (set utt1 (Utterance Text "A simple test"))
load
Loads a file of CHATR commands and interprets them. This allows you to put a set of commands in a file and load that each time you want those commands. The file can also contain text to be spoken by CHATR, either plain or coded (e.g. HLP, PhonoWord or similar).
     (load "cep_setup.ch")
     (load "f2b_stats.ch")
     (load "HLP_coded_announcement")
load_library
Loads a file by searching the paths in the variable load_path. This works in the same way as EMACS library access. The standard library includes a number of useful files.
     (load_library "xwaves.ch")            - sets up for Xwaves use
     (load_library "f2b_dur_nnet.ch")      - f2b durations
Utterance
This function builds the basic utterance, returning an utterance structure. Its simplest use is for text, but it supports many different types of input. (See section Basic Utterance Types, for a full list). Note this function only builds the basic input structure; it does not do any synthesis. However, see section Basic Utterance Types, for information about the variable utt_hook.
     (set utt1 (Utterance Text "This is a simple utterance."))
Synth
This function does the real work. It will generate a waveform from the input values in an utterance, using the current speaker parameters.
     (Synth utt1)
Say
If you want to hear a synthesized utterance, you must send the waveform to your audio device. Say does just that.
     (Say utt1)
As Lisp is a functional language, the above three steps may be combined.
     (Say (Synth (Utterance Text "This is a simple utterance.")))
Of course if this is still too difficult you can use the SayText function.
SayText
This puts the above steps together and takes a string as an argument. It constructs an utterance, synthesizes it and sends it to the audio device.
     (SayText "This is an even simpler utterance.")
Save
This function allows a user to save parts of an utterance to files. Save takes three arguments, a type, (e.g. Wave, Segments, UnitLabels, F0 or other), an utterance, and a filename. If the filename is `"-"', the output is sent to stdout, though you probably don't want this option for waveforms.
     (Save Wave utt1 "ex1")
     (Save UnitLabels utt1 "-")
Waveforms are saved in the format specified by the Wave_FileType command. Possible values are RAW, NIST, ULAW or ESPS. In the case of ESPS, the command btosps (binary convert to sps) must be in the user's path.
tts
The function takes a single argument, a file name, and will say the contents of the file. If the argument is "-", it will read from standard input. It will then say everything you type at the prompt. Note that this is performed sentence-by-sentence, so you must complete a sentence before it will do anything. Sentences are terminated by a full stop, question mark, exclamation mark, or blank line. The standard input mode is terminated by an empty sentence, best accomplished by finishing a sentence and then entering a single full stop on a new line.
     (tts "war_and_peace")

               or

     (tts "~/RMAIL")

               or

     (tts "-")
     Hello, my name is Albert Einstein.
     Can you tell me how this system
     works? I would be interested to know.
     .
speaker_<ref number>
A number of functions are available to select different speakers. Various parameters may need to be changed when switching between speakers, so functions are provided to automatically select the appropriate values. The number and type of speakers available is constantly increasing, so any list will soon be out of date. See section Speakers at ITL - CHATR Version 0.9.1 Alpha, for a reasonably up to date list. Here we detail only a few core speakers:
speaker_f2b
Female American English speaker built from news stories.
speaker_wnc600
Male British English speaker (Nick Campbell) built from 650 phonetically balanced sentences.
speaker_MHT
Male Japanese speaker built from ATR 503 sentences (Bset).
speaker_fmp559
Female Japanese speaker built from stories and conference registration dialogues.
These functions load the necessary databases and intonation statistics etc. If called repeatedly they do minimal work to reselect the speaker, thus databases are only loaded once.
     (speaker_f2b)
     (Say (Synth (Utterance Text "And here is the news")))
     (speaker_wnc600)
     (Say (Synth (Utterance Text "But it is not the BBC")))

Library Load-path

CHATR has a notion of a library directory or directories. The value of the CHATR variable load-path is a list of directory names. By default it contains the name of the CHATR library directory set at installation time. The CHATR library directory contains a number of CHATR command files you will find useful: phoneme set definitions, duration models, intonation statistics, etc. The load-path variable is used in the same way GNU EMACS uses its load-path variable. Certain CHATR commands (particularly load_library) search for file names relative to the directories listed in load-path. The given file name is appended to each directory in load-path in turn and the file is searched for. The first occurrence is used. You may set load-path to include path names of your own private CHATR libraries too. You may also set the initial run time value of load-path to point to a library directory other than that which was set at compile time. The command line option --libdir allows an alternative initial library directory.
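
The search described above amounts to a first-match lookup over a list of directories. A minimal Python sketch of that idea follows. It is not CHATR code, and the function name find_in_load_path is hypothetical.

```python
import os


def find_in_load_path(filename, load_path):
    """Return the first existing path formed from load_path and filename.

    Directories are tried in order and the first occurrence wins.
    Returns None when the file is in none of them.
    """
    for directory in load_path:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Placing a private library directory ahead of the system one in the list means that a private copy of a file is found first.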

EMACS Interface

To make the use of CHATR more convenient an EMACS interface is provided. It allows buffers (and regions) to be selected and spoken by CHATR. The interface is menu driven and hence requires EMACS version 19 or later (or MULE-2.1 or later). Japanese is also supported in normal EMACS but of course it will not be displayed properly.

The interface falls into two categories, a menu driven tts mode and an EMACS mode for editing CHATR source files. The second of these is basically a modification of lisp-mode but with a few CHATR specific functions and facilities defined. The menu driven tts mode allows users to select regions (or whole buffers) of text (may be mixed Japanese/English) and have them rendered as speech. Crude control is also available so that various voices may be selected and commands to CHATR explicitly given. Note you must have set up your audio device in your `.chatrrc' file before this interface will work.

To use the EMACS interface, first you must ensure that the file `chatr.el' is in your EMACS load-path, and that the appropriate EMACS Lisp CHATR functions are loaded. To do this add the following lines to the end of your `.emacs' file in your home directory.

     ;;; Add chatr.el to your load-path 
     (setq load-path (cons "/DB/PI/chatr/lib/etc" load-path))
     ;;; Add chatr-menu to top menu-bar 
     (autoload 'chatr-minor-mode "chatr" "Menu for using chatr." t) 
     ;;; Switch chatr-menu on always 
     (chatr-minor-mode 1) 
     ;;; run chatr as inferior process
     (autoload 'run-chatr "chatr" "CHATR as inferior process." t) 

More advanced users may also wish to add the following which defines a CHATR mode for editing CHATR source files.

     ;;; Lispish mode for editing CHATR command files 
     (autoload 'chatr-mode "chatr" "Mode for editing chatr files." t)
     (setq auto-mode-alist 
           (append '(("\\.utt$" . chatr-mode) 
                     ("\\.chatrrc$" . chatr-mode) 
                     ("\\.tts$" . chatr-mode)
                     ("\\.ch$" . chatr-mode)) auto-mode-alist))

A further variable you may wish to set identifies the CHATR binary. By default this is set to chatr (which should be in your path). If you wish to use a more recent version of CHATR (that is a less stable one, but one that will offer more options, and possibly better synthesis), add the following line to your `.emacs' too.

     ;;; To get the latest *dangerously exciting* version of CHATR
     (setq chatr-program-name "chatr-alpha")

It is also possible to talk to CHATR as an EMACS inferior process. In fact that is how the chatr-minor-mode (menu mode) works. You may start CHATR by the EMACS command run-chatr. This will start CHATR in a buffer called `*chatr*'.

