
Glossary of Terms and Acronyms

Programmers and engineers of all disciplines or nationalities love their TLAs; speech synthesis is no different. We hope to have covered all those used in this manual, and perhaps a few more. If you find any we missed (or got wrong!), please let us know for future versions...

ACL-DCI: Association for Computational Linguistics - Data Collection Initiative.
ACPA: Audio Capture and Playback Adapter.
ANN: Artificial Neural Network.
ASCII: American Standard Code for Information Interchange.
ASR: Automatic Speech Recognition.
ATR: Advanced Telecommunications Research.
http://www.atr.co.jp/
BEEP: British English Example Pronunciation.
btosps: Binary TO Signal Processing System.
car: Not an acronym. `Lisp' expression which refers to (and selects) the first item of a list held in a variable. See also `cdr'.
CAT: Not an acronym. CATegory of a word. HLP tags used to categorize words into Nouns, Verbs, Prepositions, etc. See NP, VP and PP.
cdr: Not an acronym. `Lisp' expression which refers to (and selects) the items of a list held in a variable, less the first item. See also `car'.
CELP: Code-book Excited Linear Prediction.
CEPLMA: CEpstral Resynthesis using a Logarithmic Moving Average filter.
CHATR: Collective Hacks from the Advanced Telecommunications Research laboratories. Well you did ask...
http://www.itl.atr.co.jp/chatr/
CMU: Carnegie Mellon University.
http://www.cs.cmu.edu/People/air/consortium/description.html
CSTR: Centre for Speech Technology Research. A department of Edinburgh University, UK.
http://www.cstr.ed.ac.uk/
CSLU: Center for Spoken Language Understanding. A department of Oregon Graduate Institute of Science and Technology, USA.
http://www.cse.ogi.edu/CSLU/
CVS: Concurrent Versions System. CVS is a front end to the RCS revision control system. It extends the notion of revision control from a collection of files in a single directory, to a hierarchical collection of directories consisting of revision controlled files. These directories and files can be combined together to form a software release. CVS provides the functions necessary to manage these software releases and to control the concurrent editing of source files among multiple software developers. CVS keeps a single copy of the master sources. This copy is called the source `repository'; it contains all the information to permit extracting previous software releases at any time based on either a symbolic revision tag, or a date in the past.
DARPA: Defense Advanced Research Projects Agency. The central research and development organization for the Department of Defense (DoD), USA.
http://www.darpa.mil/
DTW: Dynamic Time Warping.
EGG: Electro-Glottal Graph. Device for measuring throat movement caused by speaking.
EMACS: Editor MACroS. A Macro-based editor and complete computing task environment.
ESPS: Entropic Signal Processing System.
FSF: Free Software Foundation.
http://www.gnu.ai.mit.edu/fsf/
HLCB: High Low Continuation Boundary. Tags used to mark intonation on syllables.
HLP: High Level Phrasing. Method of tagging speech with prosodic information.
HMM: Hidden Markov Model.
Holmes: John Holmes, one of the founders of speech synthesis.
HTK: Hidden (Markov model) Tool Kit. A product of Entropic Research Laboratory, Inc.
http://www.entropic.com
IFT: Illocutionary Force Type. Strength or emphasis put on a phrase. Speech act information: the meaning you want to convey above and beyond just the words spoken. As an example, the English phrase `I understand' can mean `Thank you for informing me (I'm happy)' or `Now I know what you intend, I'm not happy' or even `I heard what you said but haven't a clue what you mean', depending on how and when it's said. That's IFT at work. The simplest case is the difference between a question and a statement using the same words.
IntoneStream: Series of symbols representing the intonation required on an utterance. Attached to the WordStream.
IPA: International Phonetic Association. Representative organization for phoneticians.
http://www.arts.gla.ac.uk/IPA/ipa.html
JToBI: Japanese Tones and Break Indices.
jtts: Japanese Text-To-Speech.
LDC: Linguistic Data Consortium. A group established to broaden the collection and distribution of speech and natural language databases for the purposes of research and technology development in automatic speech recognition, natural language processing and other areas where large amounts of linguistic data are needed.
http://www.ri.cmu.edu/comp.speech/Section1/Data/ldc.html
LFG: Lexical Functional Grammar.
LISP: LISt Processing language. A programming language originally developed for Artificial Intelligence (AI) but now used mainly in the speech synthesis field.
LMA: Logarithmic Moving Average. Mathematical reference to a method used in audio filtering. See CEPLMA.
LPC: Linear Predictive Coding.
LTS: Letter To Sound.
LVQ: Learning Vector Quantization.
M-ACPA: Multimedia - Audio Capture Playback Adapter.
MARSEC: MAchine-Readable Spoken English Corpus.
MFCC: Mel Frequency Cepstral Coefficients.
mtts: Multi-lingual Text-To-Speech.
mrpa: Machine Readable Phonetic Alphabet.
Mu-law: Not an acronym. Pronounced `mew-LAW' - the `Mu' is actually the Greek letter `Mu'. An 8-bit compression code for audio signals including speech. It is widely used in the telecommunications field because it improves the signal-to-noise ratio without increasing the amount of data. It is a companding technique: the dynamic range is compressed logarithmically before quantization, so smaller signals are represented with finer resolution than larger ones. Sometimes appears in documents written as `ULAW'.
MULE: MUlti Language Editor. Extended part of EMACS.
NFS: Network File System. A distributed file system that provides transparent access to files residing on remote disks. Developed at Sun Microsystems in the early 1980s.
NIST: (American) National Institute of Standards and Technology.
NLP: Natural Language Processing.
NN: Neural Network.
NP: Noun Phrase. HLP tag used to denote an input word as a Noun.
nus: Non-Uniform (unit) Selection.
nuuph: Not an acronym. The `nuu' is the Greek letter `Nu'. Japanese phoneme set.
NUUCEP: Not an acronym. The `NUU' is the Greek letter `Nu'. NUUtalk CEPstral synthesis routines.
OAPD: Oxford Acoustic Phonetic Database. Contains data on vowel-consonant and consonant-vowel combinations in both stressed and unstressed locations.
PhoneStream: Series of symbols representing the phonemes of an utterance. Attached to the WordStream.
PhonoWord: Type of input accepted by CHATR. Allows specification of prosodic phrases and intonation features. Utterance is tagged with four letters (D=Discourse, S=Sentence, C=Clause and P=Phrase) to specify phrase levels, and other letters (e.g. H and L) to indicate emphasis and accent.
PP: Preposition Phrase. HLP tag used to denote an input word as a Preposition.
PphraseStream: Series of symbols representing the prosodic phrases of an utterance. Attached to the WordStream.
PSOLA: Pitch Synchronous Over-Lap and Add. Algorithm to independently modify the fundamental frequency and duration of a speech signal. Used during concatenation of selected units from a finite speech database such that minimal prosodic damage occurs due to target/selected unit mismatch.
RCS: Revision Control System. A system that keeps track of different versions of files. While one person is editing a source, no other developer may do so; thus all sources are read-only by default. When a file is checked out by a developer, they may change it, but no other developer may check it out at the same time. When the developer is finished, they check the file in, allowing others to check it out.
RFC: Rise Fall Continuation. A now-dated method of tagging phoneme-sized segments with duration and frequency values.
SegStream: Series of symbols representing the segments of an utterance. Attached to the WordStream.
SGML: Standard Generalized Markup Language.
SphraseStream: Series of symbols representing the syntactic phrases of an utterance. Attached to the WordStream.
Stream: One of a sequence of cells containing symbols generated and/or interpreted by CHATR and linked to an utterance (and other streams). Causes changes in the timing, intonation and prosody of the synthesized output.
SylStream: Series of symbols representing the syllables of an utterance. Attached to the WordStream.
TIMIT: A large speech corpus from TI and MIT.
TLA: Three (or sometimes more or less) Letter Acronym. Initials represent a well (or often un)-known title or description.
ToBI: Tones and Break Indices.
tts: Text-To-Speech.
ULAW: Not an acronym. Pronounced `mew-LAW' - the `U' is actually the Greek letter `Mu'. An 8-bit compression code for audio signals including speech. It is widely used in the telecommunications field because it improves the signal-to-noise ratio without increasing the amount of data. It is a companding technique: the dynamic range is compressed logarithmically before quantization, so smaller signals are represented with finer resolution than larger ones. Sometimes appears in documents written as `Mu-law'.
utterance: A series of words you wish CHATR to synthesize as speech. Basically the input to CHATR, in whichever form it may take.
VP: Verb Phrase. HLP tag used to denote an input word as a Verb.
VQ: Vector Quantization.
WordStream: Series of words to be `spoken' by CHATR, derived from the utterance.
XMG: X Multi-Graph. A graphics display program written at CSTR, Edinburgh University, UK.
http://www.cstr.ed.ac.uk/
XWAVES: Not an acronym. A graphics display program from Entropic Research Laboratory, Inc.
http://www.entropic.com
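
The companding idea behind the `Mu-law' (`ULAW') entry can be illustrated with a short sketch. It is not taken from CHATR. It assumes the textbook continuous mu-law curve with mu = 255 and a plain uniform 8-bit quantizer, which is close to, but not bit-identical with, the segmented encoding used in telephony. All function names are hypothetical and chosen for illustration only.

```python
import math

MU = 255  # assumption: the value commonly used for 8-bit mu-law


def compress(x: float) -> float:
    """Map a sample in [-1, 1] onto [-1, 1] along the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y: float) -> float:
    """Inverse of compress(): recover the linear sample from a companded value."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x: float) -> int:
    """Clip, compress, then quantize uniformly to one of 256 codes (0..255)."""
    y = compress(max(-1.0, min(1.0, x)))
    return int(round((y + 1.0) / 2.0 * 255))


def decode_8bit(code: int) -> float:
    """Map an 8-bit code back to an approximate linear sample in [-1, 1]."""
    y = code / 255 * 2.0 - 1.0
    return expand(y)


if __name__ == "__main__":
    # Compare how many codes separate two quiet samples and two loud samples
    # that are the same linear distance (0.01) apart.
    quiet_step = encode_8bit(0.01) - encode_8bit(0.0)
    loud_step = encode_8bit(1.0) - encode_8bit(0.99)
    print("codes spanned near zero:", quiet_step)
    print("codes spanned near full scale:", loud_step)
```

Run as a script, this prints how many 8-bit codes cover the same 0.01-wide slice of amplitude near silence and near full scale. The quiet slice receives many more codes, which is the sense in which the code `carries more information about the smaller signals than the larger'.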
