One of the uses of CHATR is to take high-level, linguistically labeled output from a language generation module and produce natural sounding speech. We can expect to receive information about syntax, semantic relations, focus and speech act type, albeit at a high level. However, we cannot expect a language generation module to know about F0 or durations, so those must be generated within CHATR.
This type of input takes the form of a tree labeled with several feature structures, a common method used to represent various linguistic properties in natural language processing systems. The basic syntax is
feature-structure: (feature-pair feature-pair ...)
feature-pair:      (feature-name feature-value)
feature-name:      atom
feature-value:     atom | feature-structure
This allows the representation of tree structures from the simple to the complex.
The HLP input is itself such a tree structure, taking the form
input:    (feature-structure daughter daughter ...)
daughter: feature-structure
The HLP input may contain any features and represent any structure. It will probably contain syntactic information, but it should not necessarily be thought of as simply a syntactic tree.
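For readers more comfortable with a conventional programming language, the tree of feature structures with daughters can be modelled as nested dictionaries and lists. This is only an illustrative sketch; the names `make_node`, `features` and `daughters` are not part of CHATR:

```python
# A minimal sketch of the HLP input representation, assuming a
# feature structure is a dict and a node pairs it with daughters.
# These names (make_node, features, daughters) are illustrative only.

def make_node(features, *daughters):
    """A node is a feature structure plus zero or more daughters."""
    return {"features": features, "daughters": list(daughters)}

# (((CAT NP) (LEX you))) expressed as a Python node:
np_you = make_node({"CAT": "NP", "LEX": "you"})

# A tiny sentence node dominating it:
s = make_node({"CAT": "S", "IFT": "Statement"}, np_you)

print(s["features"]["IFT"])   # Statement
print(len(s["daughters"]))    # 1
```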
The HLP module performs the following actions
In order to make the above possible, a number of reserved feature names are used. These need not all be specified, but any that are will be used by this module in a pre-determined way. The reserved names are
LEX
NAccent
PhraseLevel
IFT
PitchRange
    May take the value `one', `two', `three' or `four'. These are
    directly related to the pitch range statistics -- see section
    Intonation.
Two forms of rule may be specified within CHATR that affect the HLP process: default rules and pattern rules.
Default rules add default features based on those already present, whether specified explicitly or supplied by earlier defaults. They are applied to all feature structures in the input. They have the following form
def-rule: ( pattern => action )
pattern:  feature-structure
action:   feature-structure
These rules are not as elaborate as they should be, and more complex patterns are probably required. This will be addressed in later versions of CHATR.
The rules are defined by simply setting a particular CHATR variable. An example set of rules is
(set HLP_Rules
  '( ( ((Focus +))       => ((NAccent +)) )
     ( ((Focus ++))      => ((NAccent ++)) )
     ( ((Contrastive +)) => ((NAccent ++)) )
     ( ((Focus -))       => ((NAccent -)) )
     ( ((CAT S))         => ((PhraseLevel :S)) )
   ))
These rules are applied to all categories in an input tree. If all the features on the left hand side of a default rule exist in a feature structure, the features on the right hand side are added where they do not already exist.
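The rule application just described can be sketched in a few lines of Python. The dict-based representation and the helper name `apply_default_rules` are assumptions for illustration, not CHATR's actual implementation:

```python
# Sketch of default-rule application: a rule is a (pattern, action)
# pair as in HLP_Rules; feature structures are flat dicts here.

HLP_RULES = [
    ({"Focus": "+"},       {"NAccent": "+"}),
    ({"Focus": "++"},      {"NAccent": "++"}),
    ({"Contrastive": "+"}, {"NAccent": "++"}),
    ({"Focus": "-"},       {"NAccent": "-"}),
    ({"CAT": "S"},         {"PhraseLevel": ":S"}),
]

def apply_default_rules(features, rules):
    """If all left-hand-side features are present with the right
    values, add each right-hand-side feature that does not already
    exist (existing values are never overwritten)."""
    for pattern, action in rules:
        if all(features.get(k) == v for k, v in pattern.items()):
            for k, v in action.items():
                features.setdefault(k, v)
    return features

fs = {"CAT": "N", "LEX": "hotel", "Focus": "+"}
apply_default_rules(fs, HLP_RULES)
print(fs["NAccent"])   # +
```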
There are currently three accent assignment algorithms available in CHATR. These strategies assign information primarily about the position of accents rather than their type. The rules add features from which later rules derive actual accent types. This split is important, as it means the prediction algorithms are independent of the underlying intonation theory that is to be used. Selection between these strategies is made by setting the Lisp variable HLP_prosodic_strategy. It may take one of the following values
Hirschberg
Monaghan
DiscTree
Each of these assigns new features to the tree structure which can then be realized as intonation accents by the tune pattern rules.
These three strategies are compared in detail in Black 95a.
After the above rules are applied, the input structure is mapped to a prosodic structure. It is important to realise that the input hierarchy structure is not (necessarily) the same as the prosodic structure. The dominant relations between nodes in the tree may be different in the prosodic tree from those in the original tree. All terminal nodes will, of course, remain terminal nodes.
Two methods for predicting prosodic phrase breaks are included. The first is an automatic mapping from syntactic structure to prosodic structure, based on syntactic information, grammatical function and constituent length as described in Bachenko 86. The second method is based on a CART decision tree inspired by the work of Hirschberg 94.
Selection between the two methods is made by setting the variable HLP_phrase_strategy. Three values are in fact possible

Bachenko_Fitzpatrick

DiscTree
    The decision tree is held in the variable HLP_phr_disc_tree. It
    should return the values 0, 1, 2, 3 or 4. See section Decision
    Trees, for the format of the tree. An example is shown in
    `$CHATR_ROOT/lib/data/tobi.ch'.

None
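The kind of decision tree involved can be sketched as follows. The questions asked here (punctuation, category of the next word) are purely hypothetical illustrations of how a tree might assign a break value of 0 to 4 at a word boundary; they are not the tree shipped in `tobi.ch':

```python
# Illustrative sketch of a decision tree assigning a prosodic break
# index (0-4) at the boundary after a word. The questions below are
# hypothetical, chosen only to show the shape of such a tree.

def predict_break(word, next_word):
    if next_word is None:            # end of utterance
        return 4
    if word.get("punct") == ",":     # comma suggests a phrase break
        return 3
    if next_word.get("CAT") in ("Prep", "Conj"):
        return 2                     # minor break before a preposition
    return 0                         # no break

print(predict_break({"LEX": "hotel", "punct": ","},
                    {"LEX": "with", "CAT": "Prep"}))   # 3
```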
Currently the decision tree method seems most promising, especially for raw text-to-speech.
Once the prosodic tree has been generated the `tune' can be applied. The tune is specified in terms of intonation features--HLCB, Tilt, ToBI, or whatever. Each prosodic phrase has an Illocutionary Force Type (statement, question, etc.) feature. A set of pattern rules are defined in CHATR that relate IFT to tune. For each IFT, the intonation features for the start and tail of a prosodic phrase (marked `:S') can be specified, along with any features that appear in words in that phrase.
If HLP_realise_strategy is set to Simple_Rules, the following rules apply; otherwise the ToBI and JToBI systems effectively ignore these predictions and do their own.
Pattern rules are of the following general form
rule: ( ift-type
        (START intonation-features)
        (<feature> choice-features) ...
        (TAIL intonation-features) )
ift-type:            * | <ift-feature-value>
intonation-features: /* empty */ |
                     intonation-feature intonation-features
intonation-feature:  <valid intone features>
choice-features:     /* empty */ |
                     choice-feature choice-features
choice-feature:      (feature-value intonation-features)
Pattern rules are set by the variable HLP_Patterns. The <intonation-features> may be any valid intonation parameters--though the same intonation system must be followed throughout a set of rules. The <choice-features> enable the inclusion of different intonation features depending on a feature's value.
The following small example realizes accents and boundary tones using the ToBI intonation system. (It is assumed HLP_realise_strategy has been explicitly set to Simple_Rules.)
(set HLP_Patterns
  '( (Statement
       (START )
       (HAccent (+ (H*)) (++ (L+H*)))
       (PHRASE (H-))
       (TAIL (L-L%)))
     (YNQuestion
       (START )
       (HAccent (+ (L*)))
       (TAIL (H-H%)))
     (Question
       (START )
       (HAccent (+ (L*)))
       (TAIL (L-L%)))
     (*
       (START )
       (HAccent (+ (H*)))
       (PHRASE (H-))
       (TAIL (H-L%)))
   ))
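How such a table might be consulted can be sketched in Python. The dict layout and the helper `accents_for_word` are assumptions made for illustration; CHATR's internal realization machinery may differ:

```python
# Sketch of pattern-rule lookup: for a word in a phrase with a given
# IFT, choose intonation features from the matching rule, falling
# back to the * rule when the IFT has no rule of its own.

HLP_PATTERNS = {
    "Statement":  {"HAccent": {"+": ["H*"], "++": ["L+H*"]},
                   "TAIL": ["L-L%"]},
    "YNQuestion": {"HAccent": {"+": ["L*"]},
                   "TAIL": ["H-H%"]},
    "*":          {"HAccent": {"+": ["H*"]},
                   "TAIL": ["H-L%"]},
}

def accents_for_word(ift, word_features):
    """Return the intonation features chosen for one word's features."""
    rule = HLP_PATTERNS.get(ift, HLP_PATTERNS["*"])
    out = []
    for name, value in word_features.items():
        choices = rule.get(name)
        if isinstance(choices, dict) and value in choices:
            out.extend(choices[value])
    return out

# A word accented `+' receives H* in a statement but L* in a
# yes/no question:
print(accents_for_word("Statement", {"HAccent": "+"}))    # ['H*']
print(accents_for_word("YNQuestion", {"HAccent": "+"}))   # ['L*']
```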
With sets of rules defined for defaults and patterns, we can now generate tune, prosody (and pitch range) from high level representations. An example input (assuming the above rules) is
(set utt1
 (Utterance HLP
  (((CAT S) (IFT Statement))
   (((CAT NP) (LEX you)))
   (((CAT VP))
    (((CAT Aux) (LEX can)))
    (((CAT V) (LEX pay)))
    (((CAT PP))
     (((CAT Prep) (LEX for)))
     (((CAT NP))
      (((CAT Det) (LEX the)))
      (((CAT N) (LEX hotel) (Focus +))))
     (((CAT PP))
      (((CAT Prep) (LEX with)))
      (((CAT NP))
       (((CAT Det) (LEX a)))
       (((CAT Adj) (LEX credit)))
       (((CAT N) (LEX card))))))))))
This input is a simple syntactic tree with the speech act labeled in the IFT feature as a statement. The `Focus' feature is specified on `hotel'; from it the default rules add the `NAccent' feature.
The results of generating low level intonation and prosody will not be as good as those achieved when individual words are labeled and the prosodic phrasing is explicitly stated (as in the PhonoWord input). Even more control over the intonation can be achieved by using the RFC input method. That is, it is not CHATR that restricts the control of intonation, but the HLP module itself.