One of the uses of CHATR is to take high-level, linguistically labeled output from a language generation module and produce natural sounding speech. We can expect to receive information about syntax, semantic relations, focus and speech act type, albeit at a high level. However, we cannot expect a language generation module to know about F0 or durations, so those must be generated within CHATR.
This type of input takes the form of a tree labeled with several feature structures, a common method used to represent various linguistic properties in natural language processing systems. The basic syntax is
feature-structure: (feature-pair feature-pair ...)
feature-pair:      (feature-name feature-value)
feature-name:      atom
feature-value:     atom | feature-structure
This allows the representation of tree structures from the simple to the complex.
The HLP input is itself such a tree structure, taking the form
input:    (feature-structure daughter daughter ...)
daughter: feature-structure
The HLP input may contain any features and represent any structure. It will probably contain syntactic information, but it should not necessarily be thought of as simply a syntactic tree.
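For readers more comfortable with a conventional programming language, the tree of feature structures with daughters can be modelled as nested dictionaries and lists. This is only an illustrative sketch; the names `make_node`, `features` and `daughters` are not part of CHATR:

```python
# A minimal sketch of the HLP input representation, assuming a
# feature structure is a dict and a node pairs it with daughters.
# These names (make_node, features, daughters) are illustrative only.

def make_node(features, *daughters):
    """A node is a feature structure plus zero or more daughters."""
    return {"features": features, "daughters": list(daughters)}

# (((CAT NP) (LEX you))) expressed as a Python node:
np_you = make_node({"CAT": "NP", "LEX": "you"})

# A tiny sentence node dominating it:
s = make_node({"CAT": "S", "IFT": "Statement"}, np_you)

print(s["features"]["IFT"])   # Statement
print(len(s["daughters"]))    # 1
```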
The HLP module performs the following actions
In order to make the above possible, a number of reserved feature names are used. These need not all be specified, but any that are will be used by this module in a pre-determined way. The reserved names are
LEX
NAccent
PhraseLevel
IFT
PitchRange
    May take the value `one', `two', `three' or `four'. These are
    directly related to the pitch range statistics -- see section
    Intonation.
Two forms of rule may be specified within CHATR that affect the HLP process: default rules and pattern rules.
Default rules add default features based on those already present, whether specified explicitly or supplied by earlier defaults. They are applied to all feature structures in the input. They have the following form
def-rule: ( pattern => action )
pattern:  feature-structure
action:   feature-structure
These rules are not as elaborate as they should be, and more complex patterns are probably required. This will be addressed in later versions of CHATR.
The rules are defined by simply setting a particular CHATR variable. An example set of rules is
(set HLP_Rules
  '( ( ((Focus +))       => ((NAccent +)) )
     ( ((Focus ++))      => ((NAccent ++)) )
     ( ((Contrastive +)) => ((NAccent ++)) )
     ( ((Focus -))       => ((NAccent -)) )
     ( ((CAT S))         => ((PhraseLevel :S)) )
   ))
These rules are applied to all categories in an input tree. If all the features on the left hand side of a default rule exist in a feature structure, the features on the right hand side are added where they do not already exist.
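The rule application just described can be sketched in a few lines of Python. The dict-based representation and the helper name `apply_default_rules` are assumptions for illustration, not CHATR's actual implementation:

```python
# Sketch of default-rule application: a rule is a (pattern, action)
# pair as in HLP_Rules; feature structures are flat dicts here.

HLP_RULES = [
    ({"Focus": "+"},       {"NAccent": "+"}),
    ({"Focus": "++"},      {"NAccent": "++"}),
    ({"Contrastive": "+"}, {"NAccent": "++"}),
    ({"Focus": "-"},       {"NAccent": "-"}),
    ({"CAT": "S"},         {"PhraseLevel": ":S"}),
]

def apply_default_rules(features, rules):
    """If all left-hand-side features are present with the right
    values, add each right-hand-side feature that does not already
    exist (existing values are never overwritten)."""
    for pattern, action in rules:
        if all(features.get(k) == v for k, v in pattern.items()):
            for k, v in action.items():
                features.setdefault(k, v)
    return features

fs = {"CAT": "N", "LEX": "hotel", "Focus": "+"}
apply_default_rules(fs, HLP_RULES)
print(fs["NAccent"])   # +
```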
There are currently three accent assignment algorithms available in CHATR. These strategies assign information primarily about the position of accents rather than their type. The rules add features from which later rules derive actual accent types. This split is important, as it means the prediction algorithms are independent of the underlying intonation theory that is to be used. Selection between these strategies is made by setting the Lisp variable HLP_prosodic_strategy. It may take one of the following values
Hirschberg
Monaghan
DiscTree
Each of these assigns new features to the tree structure which can then be realized as intonation accents by the tune pattern rules.
These three strategies are compared in detail in Black 95a.
After the above rules are applied, the input structure is mapped to a prosodic structure. It is important to realise that the input hierarchy structure is not (necessarily) the same as the prosodic structure. The dominant relations between nodes in the tree may be different in the prosodic tree from those in the original tree. All terminal nodes will, of course, remain terminal nodes.
Two methods for predicting prosodic phrase breaks are included. The first is an automatic mapping from syntactic structure to prosodic structure, based on syntactic information, grammatical function and constituent length as described in Bachenko 86. The second method is based on a CART decision tree inspired by the work of Hirschberg 94.
Selection between the two methods is made by setting the variable HLP_phrase_strategy. Three values are in fact possible

Bachenko_Fitzpatrick

DiscTree
    The decision tree is held in the variable HLP_phr_disc_tree. It
    should return the values 0, 1, 2, 3 or 4. See section Decision
    Trees, for the format of the tree. An example is shown in
    `$CHATR_ROOT/lib/data/tobi.ch'.

None
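The kind of decision tree involved can be sketched as follows. The questions asked here (punctuation, category of the next word) are purely hypothetical illustrations of how a tree might assign a break value of 0 to 4 at a word boundary; they are not the tree shipped in `tobi.ch':

```python
# Illustrative sketch of a decision tree assigning a prosodic break
# index (0-4) at the boundary after a word. The questions below are
# hypothetical, chosen only to show the shape of such a tree.

def predict_break(word, next_word):
    if next_word is None:            # end of utterance
        return 4
    if word.get("punct") == ",":     # comma suggests a phrase break
        return 3
    if next_word.get("CAT") in ("Prep", "Conj"):
        return 2                     # minor break before a preposition
    return 0                         # no break

print(predict_break({"LEX": "hotel", "punct": ","},
                    {"LEX": "with", "CAT": "Prep"}))   # 3
```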
Currently the decision tree method seems most promising, especially for raw text-to-speech.
Once the prosodic tree has been generated the `tune' can be applied. The tune is specified in terms of intonation features--HLCB, Tilt, ToBI, or whatever. Each prosodic phrase has an Illocutionary Force Type (statement, question, etc.) feature. A set of pattern rules are defined in CHATR that relate IFT to tune. For each IFT, the intonation features for the start and tail of a prosodic phrase (marked `:S') can be specified, along with any features that appear in words in that phrase.
If HLP_realise_strategy is set to Simple_Rules, the following rules apply; otherwise the ToBI and JToBI systems effectively ignore these predictions and do their own.
Pattern rules are of the following general form
rule: ( ift-type
        (START intonation-features)
        (<feature> choice-features) ...
        (TAIL intonation-features) )
ift-type:            * | <ift-feature-value>
intonation-features: /* empty */ |
                     intonation-feature intonation-features
intonation-feature:  <valid intone features>
choice-features:     /* empty */ |
                     choice-feature choice-features
choice-feature:      (feature-value intonation-features)
Pattern rules are set by the variable HLP_Patterns. The <intonation-features> may be any valid intonation parameters--though the same intonation system must be followed throughout a set of rules. The <choice-features> enable the inclusion of different intonation features depending on a feature's value.
The following small example realizes accents and boundary tones using the ToBI intonation system. (It is assumed HLP_realise_strategy has been explicitly set to Simple_Rules.)
(set HLP_Patterns
  '( (Statement
       (START )
       (HAccent (+ (H*)) (++ (L+H*)))
       (PHRASE (H-))
       (TAIL (L-L%)))
     (YNQuestion
       (START )
       (HAccent (+ (L*)))
       (TAIL (H-H%)))
     (Question
       (START )
       (HAccent (+ (L*)))
       (TAIL (L-L%)))
     (*
       (START )
       (HAccent (+ (H*)))
       (PHRASE (H-))
       (TAIL (H-L%)))
   ))
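How such a table might be consulted can be sketched in Python. The dict layout and the helper `accents_for_word` are assumptions made for illustration; CHATR's internal realization machinery may differ:

```python
# Sketch of pattern-rule lookup: for a word in a phrase with a given
# IFT, choose intonation features from the matching rule, falling
# back to the * rule when the IFT has no rule of its own.

HLP_PATTERNS = {
    "Statement":  {"HAccent": {"+": ["H*"], "++": ["L+H*"]},
                   "TAIL": ["L-L%"]},
    "YNQuestion": {"HAccent": {"+": ["L*"]},
                   "TAIL": ["H-H%"]},
    "*":          {"HAccent": {"+": ["H*"]},
                   "TAIL": ["H-L%"]},
}

def accents_for_word(ift, word_features):
    """Return the intonation features chosen for one word's features."""
    rule = HLP_PATTERNS.get(ift, HLP_PATTERNS["*"])
    out = []
    for name, value in word_features.items():
        choices = rule.get(name)
        if isinstance(choices, dict) and value in choices:
            out.extend(choices[value])
    return out

# A word accented `+' receives H* in a statement but L* in a
# yes/no question:
print(accents_for_word("Statement", {"HAccent": "+"}))    # ['H*']
print(accents_for_word("YNQuestion", {"HAccent": "+"}))   # ['L*']
```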
With sets of rules defined for defaults and patterns, we can now generate tune, prosody (and pitch range) from high level representations. An example input (assuming the above rules) is
(set utt1
 (Utterance HLP
  (((CAT S) (IFT Statement))
   (((CAT NP) (LEX you)))
   (((CAT VP))
    (((CAT Aux) (LEX can)))
    (((CAT V) (LEX pay)))
    (((CAT PP))
     (((CAT Prep) (LEX for)))
     (((CAT NP))
      (((CAT Det) (LEX the)))
      (((CAT N) (LEX hotel) (Focus +))))
     (((CAT PP))
      (((CAT Prep) (LEX with)))
      (((CAT NP))
       (((CAT Det) (LEX a)))
       (((CAT Adj) (LEX credit)))
       (((CAT N) (LEX card))))))))))
This input is a simple syntactic tree with the speech act labeled in the IFT feature as a statement. The `Focus' feature is specified on `hotel'; from it the default rules add the `NAccent' feature.
The results of generating low level intonation and prosody will not be as good as those achieved when individual words are labeled and the prosodic phrasing is explicitly stated (as in the PhonoWord input). Even more control over the intonation can be achieved by using the RFC input method. That is, it is not CHATR that restricts the control of intonation, but the HLP module itself.