

Glossary of Terms and Acronyms

Programmers and engineers of all disciplines or nationalities love their TLAs; speech synthesis is no different. We hope to have covered all those used in this manual, and perhaps a few more. If you find any we missed (or got wrong!), please let us know for future versions...

ACL-DCI
Association for Computational Linguistics - Data Collection Initiative.
ACPA
Audio Capture and Playback Adapter.
ANN
Artificial Neural Network.
ASCII
American Standard Code for Information Interchange.
ASR
Automatic Speech Recognition.
ATR
Advanced Telecommunications Research Institute International.
http://www.atr.co.jp/
BEEP
British English Example Pronunciation.
btosps
Binary TO Signal Processing System.
car
Not an acronym. `Lisp' expression which refers to (and selects) the first item of a list held in a variable. See also `cdr'.
CAT
Not an acronym. CATegory of a word. HLP tags used to categorize words into Nouns, Verbs, Prepositions, etc. See NP, VP and PP.
cdr
Not an acronym. `Lisp' expression which refers to (and selects) the items of a list held in a variable, less the first item. See also `car'.
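As a rough illustration of `car' and `cdr' for readers who do not know Lisp, the two operations can be mimicked on a Python list. This is only a sketch: the function names simply mirror the Lisp ones, and the empty-list behaviour differs (many Lisp dialects return an empty value, whereas this version raises an error).

```python
def car(items):
    """Return the first item of a list, like Lisp `car`.

    Assumption: the list is non-empty; an empty list raises IndexError here.
    """
    return items[0]


def cdr(items):
    """Return a new list holding everything except the first item, like Lisp `cdr`."""
    return items[1:]


words = ["the", "quick", "fox"]
print(car(words))       # the
print(cdr(words))       # ['quick', 'fox']
print(car(cdr(words)))  # quick
```

Combining the two, as in the last line, is the usual way to walk down a list one item at a time.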
CELP
Code-book Excited Linear Prediction.
CEPLMA
CEpstral Resynthesis using a Logarithmic Moving Average filter.
CHATR
Collective Hacks from the Advanced Telecommunications Research laboratories. Well you did ask...
http://www.itl.atr.co.jp/chatr/
CMU
Carnegie Mellon University.
http://www.contrib.andrew.cmu.edu/
CSTR
Centre for Speech Technology Research. A department of Edinburgh University, UK.
http://www.ed.ac.uk/
CSLU
Center for Spoken Language Understanding. A department of Oregon Graduate Institute of Science and Technology, USA.
http://www.cse.ogi.edu/CSLU/
CVS
Concurrent Versions System. CVS is a front end to the RCS revision control system. It extends the notion of revision control from a collection of files in a single directory, to a hierarchical collection of directories consisting of revision controlled files. These directories and files can be combined together to form a software release. CVS provides the functions necessary to manage these software releases and to control the concurrent editing of source files among multiple software developers. CVS keeps a single copy of the master sources. This copy is called the source `repository'; it contains all the information to permit extracting previous software releases at any time based on either a symbolic revision tag, or a date in the past.
DARPA
Defense Advanced Research Projects Agency. The central research and development organization for the Department of Defense (DoD), USA.
http://www.darpa.mil/
DTW
Dynamic Time Warping.
EGG
Electro-Glottal Graph. Device for measuring throat movement caused by speaking.
EMACS
Editor MACroS. A Macro-based editor and complete computing task environment.
ESPS
Entropic Signal Processing System.
FSF
Free Software Foundation.
http://www.gnu.ai.mit.edu/fsf/
HLCB
High Low Continuation Boundary. Tags used to mark intonation on syllables.
HLP
High Level Phrasing. Method of tagging speech with prosodic information.
HMM
Hidden Markov Model.
Holmes
John Holmes, one of the founders of speech synthesis.
HTK
Hidden (Markov model) Tool Kit. A product of Entropic Research Laboratory, Inc.
http://www.entropic.com
IFT
Illocutionary Force Type. Strength or emphasis put on a phrase. Speech act information: the meaning you want to convey above and beyond just the words spoken. As an example, the English phrase `I understand' can mean `Thank you for informing me (I'm happy)', or `Now I know what you intend, and I'm not happy', or even `I heard what you said but haven't a clue what you mean', depending on how and when it's said. That's IFT at work. The simplest case is the difference between a question and a statement using the same words.
IntoneStream
Series of symbols representing the intonation required on an utterance. Attached to the WordStream.
IPA
International Phonetic Association. Representative organisation for phoneticians.
http://www.arts.gla.ac.uk/IPA/ipa.html
JToBI
Japanese Tones and Break Indices.
jtts
Japanese Text-To-Speech.
LDC
Linguistic Data Consortium. A group established to broaden the collection and distribution of speech and natural language databases for the purposes of research and technology development in automatic speech recognition, natural language processing and other areas where large amounts of linguistic data are needed.
http://www.ri.cmu.edu/comp.speech/Section1/Data/ldc.html
LFG
Lexical Functional Grammar.
LISP
LISt Processing language. A programming language originally developed for Artificial Intelligence (AI) and now also widely used in the speech synthesis field.
LMA
Logarithmic Moving Average. Mathematical reference to a method used in audio filtering. See CEPLMA.
LPC
Linear Predictive Coding.
LTS
Letter To Sound.
LVQ
Learning Vector Quantization.
M-ACPA
Multimedia - Audio Capture Playback Adapter.
MARSEC
MAchine-Readable Spoken English Corpus.
MFCC
Mel Frequency Cepstral Coefficients.
mtts
Multi-lingual Text-To-Speech.
mrpa
Machine Readable Phonetic Alphabet.
Mu-law
Not an acronym. Pronounced `mew-LAW' - the `Mu' is actually the Greek letter `Mu'. An 8-bit compression code for audio signals including speech. It is widely used in the telecommunications field because it improves the signal-to-noise ratio without increasing the amount of data. It is a companding technique, which means it carries more information about the smaller signals than the larger. Sometimes appears in documents written as `ULAW'.
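The companding idea can be sketched with the textbook continuous Mu-law curve. This is an illustration only: it assumes samples normalized to the range -1 to 1 and the value Mu = 255 used in 8-bit telephony, it is not the bit-exact segmented encoding found in telephone hardware, and the function names are hypothetical rather than anything provided by CHATR.

```python
import math

MU = 255  # assumed value: the constant used for 8-bit telephony Mu-law


def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] onto the companded range [-1, 1].

    Small magnitudes are stretched and large ones squeezed, so quiet
    signals receive more of the available code values.
    """
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress: recover the original sample value."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize_8bit(y):
    """Round a companded value in [-1, 1] to one of 256 levels (0..255)."""
    return int(round((y + 1.0) / 2.0 * 255))


for sample in (0.01, 0.1, 0.5, 1.0):
    companded = mu_law_compress(sample)
    print(sample, round(companded, 3), quantize_8bit(companded))
```

A quiet sample of 0.01 already lands more than a fifth of the way up the companded scale, which is where the improved signal-to-noise ratio for small signals comes from.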
MULE
MUlti Language Editor. Extended part of EMACS.
NFS
Network File System. A distributed file system that provides transparent access to files residing on remote disks. Developed at Sun Microsystems in the 1980s.
NIST
(American) National Institute of Standards and Technology.
NLP
Natural Language Processing.
NN
Neural Network.
NP
Noun Phrase. HLP tag used to denote an input word as a Noun.
nus
Non-Uniform (unit) Selection.
nuuph
Not an acronym. The `nuu' is the Greek letter `Nuu'. Japanese phoneme set.
NUUCEP
Not an acronym. The `NUU' is the Greek letter `Nuu'. NUUtalk CEPstral synthesis routines.
OAPD
Oxford Acoustic Phonetic Database. Contains data on vowel-consonant and consonant-vowel combinations in both stressed and unstressed locations.
PhoneStream
Series of symbols representing the phonemes of an utterance. Attached to the WordStream.
PhonoWord
Type of input accepted by CHATR. Allows specification of prosodic phrases and intonation features. Utterance is tagged with four letters (D=Discourse, S=Sentence, C=Clause and P=Phrase) to specify phrase levels, and other letters (e.g. H and L) to indicate emphasis and accent.
PP
Prepositional Phrase. HLP tag used to denote an input word as a Preposition.
PphraseStream
Series of symbols representing the prosodic phrases of an utterance. Attached to the WordStream.
PSOLA
Pitch Synchronous Over-Lap and Add. Algorithm to independently modify the fundamental frequency and duration of a speech signal. Used when concatenating units selected from a finite speech database, so that mismatch between the target and the selected units causes minimal prosodic damage.
RCS
Revision Control System. A system that keeps track of different versions of files. If one person is editing a source, no other developer may do so. Thus all sources are by default read-only. When a file is checked out by a developer, they may change it, but no other developer may check it out at the same time. When a developer is finished, they may check in the file, thus allowing others to check it out.
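The check-out and check-in discipline described for RCS can be modelled in a few lines. The sketch below is a toy model of the locking rule only; the class and method names are hypothetical, and it does not reproduce the real RCS commands or file formats.

```python
class LockingRepository:
    """Toy model of lock-based revision control (hypothetical, not RCS itself)."""

    def __init__(self):
        self._locks = {}      # file name -> developer currently holding the lock
        self._revisions = {}  # file name -> list of checked-in contents, oldest first

    def check_out(self, name, developer):
        """Lock a file for editing; fails if someone else already holds it."""
        holder = self._locks.get(name)
        if holder is not None:
            raise RuntimeError(f"{name} is already checked out by {holder}")
        self._locks[name] = developer

    def check_in(self, name, developer, contents):
        """Store a new revision and release the lock."""
        if self._locks.get(name) != developer:
            raise RuntimeError(f"{developer} does not hold the lock on {name}")
        self._revisions.setdefault(name, []).append(contents)
        del self._locks[name]

    def holder(self, name):
        """Return the developer holding the lock, or None if the file is free."""
        return self._locks.get(name)

    def revisions(self, name):
        """Return a copy of all checked-in revisions of a file."""
        return list(self._revisions.get(name, []))
```

With this model, a second developer calling check_out on a locked file gets an error until the first developer has called check_in, which is the behaviour the entry describes.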
RFC
Rise Fall Continuation. A now dated method of tagging phoneme-sized segments with duration and frequency values.
SegStream
Series of symbols representing the segments of an utterance. Attached to the WordStream.
SGML
Standard Generalized Markup Language.
SphraseStream
Series of symbols representing the syntactic phrases of an utterance. Attached to the WordStream.
Stream
One of a sequence of cells containing symbols generated and/or interpreted by CHATR and linked to an utterance (and other streams). Causes changes in the timing, intonation and prosody of the synthesized output.
SylStream
Series of symbols representing the syllables of an utterance. Attached to the WordStream.
TIMIT
A large speech corpus from Texas Instruments (TI) and the Massachusetts Institute of Technology (MIT).
TLA
Three (or sometimes more or less) Letter Acronym. Initials represent a well (or often un-) known title or description.
ToBI
Tones and Break Indices.
tts
Text-To-Speech.
ULAW
Not an acronym. Pronounced `mew-LAW' - the `U' is actually the Greek letter `Mu'. An 8-bit compression code for audio signals including speech. It is widely used in the telecommunications field because it improves the signal-to-noise ratio without increasing the amount of data. It is a companding technique, which means it carries more information about the smaller signals than the larger. Sometimes appears in documents written as `Mu-law'.
utterance
A series of words you wish CHATR to synthesize as speech. Basically the input to CHATR, in whichever form it may take.
VP
Verb Phrase. HLP tag used to denote an input word as a Verb.
VQ
Vector Quantization.
WordStream
Series of words to be `spoken' by CHATR, derived from the utterance.
XMG
X Multi-Graph. A graphics display program written at CSTR, Edinburgh University, UK.
http://www.ed.ac.uk/
XWAVES
Not an acronym. A graphics display program from Entropic Research Laboratory, Inc.
http://www.entropic.com

