CHATR is a large program containing many parallel modules, any of which may or may not be selected by the user at any one time. For the newcomer to CHATR, it is difficult to establish where the system starts. With that in mind, the following sections describe a possible route taken through the software, module by module, by several different inputs.
The following diagram shows each module, referred to by its software name.
 ----------------     ----------     ----------
|PhonoWord Tagged|   |Plain Text|   |HLP Tagged|
|   Utterance    |   |  Input   |   |Utterance |
 ----------------     ----------     ----------
        |                 |              |
        |                 |              |
        |                 |              |
 PhonoWord_input     text_input      hlp_input
        |                 |              |
        |                  -----    -----
        |                       |  |
        |                    hlp_module
        |                        |
         ----------     ---------
                   |   |
                word_module
                     |
                     |
             phonology_module
                     |
                     |
               intone_module
                     |
                     |
              duration_module
                     |
                     |
             int_target_module
                     |
                    \|/
                 synthesis
By default CHATR will eventually run any utterance through the same five modules. Prior to this, the input will be applied to either a function or a function followed by a module, depending on the input type. See section Design Philosophy, for a definition of the difference between function and module.
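The routing in the diagram can be summarised in a few lines. This is only an illustrative sketch in Python, not CHATR code: the step names are copied from the diagram, while `default_route` and the two tables are hypothetical helpers introduced here.

```python
# Hypothetical sketch of the default routing; names mirror the diagram only.
COMMON_MODULES = [
    "word_module",
    "phonology_module",
    "intone_module",
    "duration_module",
    "int_target_module",
]

# Input-specific steps that run before the five common modules.
INPUT_STEPS = {
    "PhonoWord": ["PhonoWord_input"],
    "Text": ["text_input", "hlp_module"],
    "HLP": ["hlp_input", "hlp_module"],
}

def default_route(input_type):
    """Return the ordered list of steps for one utterance type."""
    return INPUT_STEPS[input_type] + COMMON_MODULES + ["synthesis"]

for kind in ("PhonoWord", "Text", "HLP"):
    print(kind, "->", " -> ".join(default_route(kind)))
```

Note that PhonoWord input skips hlp_module entirely, which is why its intonation must be supplied by the user.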
The PhonoWord_input function is in file
$CHATR_ROOT/src/input/pw_input.c
This function creates two streams
The PhonoWord_input function calls the
build_phrase_tree function. This in turn calls the
make_new_word and build_sub_phrases functions which
cycle as often as required. Each time the build_phrase_tree
function encounters a word in the given utterance, the
make_new_word function is called. This adds the new word
(plus the intones and features if present) to the WordStream. Thus
for the PhonoWord input
(Utterance
PhonoWord
(:D ()
(:S ()
(:C ()
(you (H*))
(can)
(pay))
(:C ()
(for)
(the)
(hotel (H*)))
(:C ()
(with)
(a)
(credit)
(card (H*) (L-L%)))))),
execution of the PhonoWord_input function creates the
WordStream
(<you> -------- intones ------ ((H*))
 <can>
 <pay>
 <for>
 <the>
 <hotel> ------ intones ------ ((H*))
 <with>
 <a>
 <credit>
 <card>) ------ intones ------ ((H*)(L-L%))
The information contained in the intones field of each word cell is used later to build the IntoneStream.
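The traversal described above can be imitated with a small recursive walk. The following Python sketch is not the code in pw_input.c; `parse_sexp` and `collect_words` are hypothetical helpers, and it is assumed that any label beginning with `:' is a phrase and anything else is a word.

```python
import re

def parse_sexp(text):
    """Parse one parenthesised expression into nested lists of strings."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1  # step over the closing parenthesis
        return items

    return read()

def collect_words(node, words):
    """Phrase labels start with ':'; anything else is treated as a word."""
    if node[0].startswith(":"):
        for child in node[2:]:  # node[1] holds the phrase features
            collect_words(child, words)
    else:
        words.append((node[0], [tone[0] for tone in node[1:]]))
    return words

utterance = """(Utterance PhonoWord
 (:D () (:S ()
   (:C () (you (H*)) (can) (pay))
   (:C () (for) (the) (hotel (H*)))
   (:C () (with) (a) (credit) (card (H*) (L-L%))))))"""

word_stream = collect_words(parse_sexp(utterance)[2], [])
for name, intones in word_stream:
    print(name, intones)
```

Each printed pair corresponds to one cell of the WordStream shown above: the word, followed by the contents of its intones field.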
The text_input function is in file
$CHATR_ROOT/src/input/hlp_input.c
This function creates two streams
It also converts the text input into an HLP input.
The text_input function calls two further functions:
hlp_build_sphrase (cycling as often as required), which in
turn calls the function hlp_make_word; and text_to_hlp,
which in turn calls the functions text_read_sentence and
text_build_phrased. These functions are in file
$CHATR_ROOT/src/text/text.c
The text_to_hlp function converts the text input into an HLP
input. First the input text is read, sentence by sentence, by the
text_read_sentence function. The text_build_phrased
function then builds LEX word-cells for each sentence, and adds an
IFT type to the beginning. Thus for the text input
(Utterance Text
"You can pay for the hotel with a credit card."),
execution of the text_to_hlp function creates the HLP input
(Utterance HLP
(((CAT D))
(((CAT S) (IFT Statement))
(((LEX You)))
(((LEX can)))
(((LEX pay)))
(((LEX for)))
(((LEX the)))
(((LEX hotel)))
(((LEX with)))
(((LEX a)))
(((LEX credit)))
(((LEX card))))))
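The shape of this conversion can be sketched as follows. This Python fragment is an illustration only and makes strong assumptions: it handles a single sentence, always emits `(IFT Statement)', and uses a hypothetical helper name, `text_to_hlp_sketch`, rather than the real text_to_hlp function.

```python
import re

def text_to_hlp_sketch(sentence):
    """Build an HLP utterance string with one LEX word-cell per word."""
    # Assumption: words are runs of letters and apostrophes; punctuation is dropped.
    words = re.findall(r"[A-Za-z']+", sentence)
    lines = ["(Utterance HLP", " (((CAT D))", "  (((CAT S) (IFT Statement))"]
    lines += ["   (((LEX %s)))" % word for word in words]
    # Close the S node, the D node and the Utterance form.
    return "\n".join(lines) + ")))"

hlp = text_to_hlp_sketch("You can pay for the hotel with a credit card.")
print(hlp)
```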
The WordStream is created by the function hlp_make_word.
Since there is no other information provided by plain text input
besides words, the intones fields of each word are set to nil.
Similarly, the features fields are filled with only a LEX word-cell.
Thus for the same text input as above, execution of the
hlp_make_word function creates the WordStream
---> intones ----> NIL
<you> ----<
---> features ---> ((LEX you))
---> intones ----> NIL
<can> ----<
---> features ---> ((LEX can))
---> intones ----> NIL
<pay> ----<
---> features ---> ((LEX pay))
---> intones ----> NIL
<for> ----<
---> features ---> ((LEX for))
---> intones ----> NIL
<the> ----<
---> features ---> ((LEX the))
---> intones ----> NIL
<hotel> --<
---> features ---> ((LEX hotel))
---> intones ----> NIL
<with> ---<
---> features ---> ((LEX with))
---> intones ----> NIL
<a> ------<
---> features ---> ((LEX a))
---> intones ----> NIL
<credit> -<
---> features ---> ((LEX credit))
---> intones ----> NIL
<card> ---<
---> features ---> ((LEX card))
Finally, the hlp_build_sphrase function builds the
SphraseStream.
The hlp_input function is in file
$CHATR_ROOT/src/input/hlp_input.c
This function creates two streams
The hlp_input function calls the hlp_build_sphrase
function which cycles as often as required. This in turn calls the
hlp_make_word function. Since HLP input does not include
intonation information, the intones field of the WordStream is set to
nil. The features field is filled with all the syntactic and
prosodic information of the relevant word. Thus for the HLP input
(Utterance HLP
(((CAT S) (IFT Statement))
(((CAT NP) (LEX you) (Focus ++)))
(((CAT VP))
(((CAT Aux) (LEX can)))
(((CAT Verb) (LEX pay)))
(((CAT PP))
(((CAT Prep) (LEX for)))
(((CAT NP))
(((CAT Det) (LEX the)))
(((CAT Noun) (LEX hotel) (Focus ++)))))
(((CAT PP))
(((CAT Prep) (LEX with)))
(((CAT NP))
(((CAT Det) (LEX a)))
(((CAT Adj) (LEX credit)))
(((CAT Noun) (LEX card) (Focus ++)))))))),
execution of the hlp_input function creates the WordStream
---> intones ----> NIL
<you> ----<
---> features ---> ((CAT NP)(LEX you)(Focus ++))
---> intones ----> NIL
<can> ----<
---> features ---> ((CAT Aux)(LEX can))
---> intones ----> NIL
<pay> ----<
---> features ---> ((CAT Verb)(LEX pay))
---> intones ----> NIL
<for> ----<
---> features ---> ((CAT Prep)(LEX for))
---> intones ----> NIL
<the> ----<
---> features ---> ((CAT Det)(LEX the))
---> intones ----> NIL
<hotel> --<
---> features ---> ((CAT Noun)(LEX hotel)(Focus ++))
---> intones ----> NIL
<with> ---<
---> features ---> ((CAT Prep)(LEX with))
---> intones ----> NIL
<a> ------<
---> features ---> ((CAT Det)(LEX a))
---> intones ----> NIL
<credit> -<
---> features ---> ((CAT Adj)(LEX credit))
---> intones ----> NIL
<card> ---<
---> features ---> ((CAT Noun)(LEX card)(Focus ++))
Finally, the hlp_build_sphrase function builds the
SphraseStream.
The `hlp' module is in file
$CHATR_ROOT/src/input/hlp.c
hlp_module calls five functions and one module in the
following order
hlp_apply_default_rules
hlp_phr_module
hlp_predict_pros_events
hlp_rephrase
add_boundaries
hlp_realise_accents
Each of the above will now be described in detail.
The hlp_apply_default_rules function calls the
hlp_traverse_add_defaults function which further calls the
hlp_apply_rule function. Both called functions cycle as often
as necessary.
The function hlp_apply_default_rules sequences through the
HLP tree input (an HLP input can be seen as a tree) and
tries to apply the user defined rules. An example of such rules
(contained in the HLP_Rules variable) is
( ( ((Focus +)) => ((NAccent +)) )
( ((Focus ++)) => ((NAccent ++)) )
( ((Contrastive +)) => ((NAccent ++)) )
( ((Focus -)) => ((NAccent -)) )
( ((CAT S)) => ((PhraseLevel :S)) ) )
Every element contained in each features field is looked at. If an
element matches an expression on the left side of the HLP_Rules list,
the expression on the right is added to the features field by the
hlp_apply_rule function.
For Text input, since there is no information in the features fields to which to apply rules, execution of this function will not change the WordStream.
For HLP input, the WordStream becomes
---> intones ----> NIL
<you> ----<
---> features ---> ((NAccent ++)(CAT NP)(LEX you)(Focus ++))
^^^^^^^^^^
---> intones ----> NIL
<can> ----<
---> features ---> ((CAT Aux)(LEX can))
---> intones ----> NIL
<pay> ----<
---> features ---> ((CAT Verb)(LEX pay))
---> intones ----> NIL
<for> ----<
---> features ---> ((CAT Prep)(LEX for))
---> intones ----> NIL
<the> ----<
---> features ---> ((CAT Det)(LEX the))
---> intones ----> NIL
<hotel> --<
---> features ---> ((NAccent ++)(CAT Noun)(LEX hotel)(Focus ++))
^^^^^^^^^^
---> intones ----> NIL
<with> ---<
---> features ---> ((CAT Prep)(LEX with))
---> intones ----> NIL
<a> ------<
---> features ---> ((CAT Det)(LEX a))
---> intones ----> NIL
<credit> -<
---> features ---> ((CAT Adj)(LEX credit))
---> intones ----> NIL
<card> ---<
---> features ---> ((NAccent ++)(CAT Noun)(LEX card)(Focus ++))
^^^^^^^^^^
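The rule application just described amounts to a simple match-and-add loop. The sketch below is in Python rather than CHATR's C; the rule table copies the HLP_Rules example above, while the tuple representation of features and the name `apply_default_rules` are assumptions made for illustration.

```python
# The example HLP_Rules from the text, as (condition, addition) pairs.
HLP_RULES = [
    (("Focus", "+"), ("NAccent", "+")),
    (("Focus", "++"), ("NAccent", "++")),
    (("Contrastive", "+"), ("NAccent", "++")),
    (("Focus", "-"), ("NAccent", "-")),
    (("CAT", "S"), ("PhraseLevel", ":S")),
]

def apply_default_rules(features):
    """Return features with the right side of each matching rule added in front."""
    result = list(features)
    for condition, addition in HLP_RULES:
        if condition in features and addition not in result:
            result.insert(0, addition)
    return result

you = [("CAT", "NP"), ("LEX", "you"), ("Focus", "++")]
can = [("CAT", "Aux"), ("LEX", "can")]
print(apply_default_rules(you))
print(apply_default_rules(can))
```

The first call adds `(NAccent ++)' in front of the existing features; the second leaves the features unchanged because no rule matches.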
hlp_phr_module predicts phrasing using either the default
or user-selected method. The two presently available are
It will be assumed that the default DiscTree method is selected.
The module hlp_phr_module calls the disc_tree_phrase
function which in turn calls the function dt_decide.
The `break index' is a measure of how strongly a particular word is linked to the previous. The DiscTree phrasing prediction method takes each word and determines the break index. Possible values are 1, 2, 3 or 4. A break index of 1 indicates the two words are closely linked--such as the `the' and `hotel' in the example currently being used. A break index of 4 means that the words are very dissociated. These are usually (but not solely) the ending and beginning words of successive sentences.
The dt_decide function returns the break index for each word.
It looks at the type of preceding and succeeding words and uses a
decision tree to determine a value. Currently only values 1 or 4 are
utilized. Thus 4 does not only indicate the end of a sentence, but
also marks pauses within sentences.
When a break index 4 is returned, the disc_tree_phrase
function adds `PhraseLevel :C' to the features field of the relevant
word.
For Text input, execution of this module changes the WordStream to
---> intones ----> NIL
<you> ----<
---> features ---> ((LEX you))
---> intones ----> NIL
<can> ----<
---> features ---> ((LEX can))
---> intones ----> NIL
<pay> ----<
---> features ---> ((LEX pay))
---> intones ----> NIL
<for> ----<
---> features ---> ((PhraseLevel :C)(LEX for))
^^^^^^^^^^^^^^
---> intones ----> NIL
<the> ----<
---> features ---> ((LEX the))
---> intones ----> NIL
<hotel> --<
---> features ---> ((LEX hotel))
---> intones ----> NIL
<with> ---<
---> features ---> ((PhraseLevel :C)(LEX with))
^^^^^^^^^^^^^^
---> intones ----> NIL
<a> ------<
---> features ---> ((LEX a))
---> intones ----> NIL
<credit> -<
---> features ---> ((LEX credit))
---> intones ----> NIL
<card> ---<
---> features ---> ((LEX card))
For HLP input, the WordStream becomes
---> intones ----> NIL
<you> ----<
---> features ---> ((NAccent ++)(CAT NP)(LEX you)(Focus ++))
---> intones ----> NIL
<can> ----<
---> features ---> ((CAT Aux)(LEX can))
---> intones ----> NIL
<pay> ----<
---> features ---> ((CAT Verb)(LEX pay))
---> intones ----> NIL
<for> ----<
---> features ---> ((PhraseLevel :C)(CAT Prep)(LEX for))
^^^^^^^^^^^^^^
---> intones ----> NIL
<the> ----<
---> features ---> ((CAT Det)(LEX the))
---> intones ----> NIL
<hotel> --<
---> features ---> ((NAccent ++)(CAT Noun)(LEX hotel)(Focus ++))
---> intones ----> NIL
<with> ---<
---> features ---> ((PhraseLevel :C) (CAT Prep) (LEX with))
^^^^^^^^^^^^^^
---> intones ----> NIL
<a> ------<
---> features ---> ((CAT Det)(LEX a))
---> intones ----> NIL
<credit> -<
---> features ---> ((CAT Adj)(LEX credit))
---> intones ----> NIL
<card> ---<
---> features ---> ((NAccent ++)(CAT Noun)(LEX card)(Focus ++))
The hlp_predict_pros_events function calls
hlp_phr_module. This module decides which prosodic prediction
strategy to use and applies it. The three presently available are
It will be assumed that the default Hirschberg strategy is selected.
hlp_phr_module causes hlp_predict_pros_events to call
hlp_addacc_module. This module is in file
$CHATR_ROOT/src/hlp/hlp_addacc.c
The module hlp_addacc_module calls three functions;
hlp_mark_aux, aa_complex_nominals and
aa_assign_accents. These functions perform three actions
Each time the hlp_mark_aux function finds a verb, it is tested
to determine if it may actually be an auxiliary.(5) If this proves so, a `(CAT Aux)' is added to the
features field and the `(CAT Verb)' (if it exists) removed. In our
present example the auxiliary verb `can' has been correctly tagged
(this is tough enough already without adding problems for effect!),
so this function will not need to make any changes.
The aa_complex_nominals function calls two further functions;
aa_cn_simple_assign and aa_cn_assign. Their purpose is
to assign the correct stress to complex nominals. A complex nominal
is a noun and adjective pair which forms a single concept, such as
`credit card'. For each word of a complex nominal, the
aa_cn_assign function decides which one has to be stressed and
which one unstressed. The former have a `(CN Stress)' added to the
features field, and the latter a `(CN Unstress)'.
The aa_assign_accents function calls aa_accent_assign
which calls the function aa_aaa. This in turn calls two
further functions, hlp_closed_deaccented and
hlp_closed_accented. Influenced by pre-existing features and
those added since the start of processing, these functions decide the
type of accents required (`(HAccent +)', `(HAccent -)', `(HAccent
++)' or `(HAccent c)') and add them to the features fields. Should a
`HAccent' or `NAccent' already exist in a features field, none is
added. The IntoneStream will be built from these features later.
For Text input, execution of this module changes the WordStream to
---> intones ----> NIL
<you> ----<
---> features ---> ((HAccent +)(LEX you))
^^^^^^^^^
---> intones ----> NIL
<can> ----<
---> features ---> ((HAccent -)(LEX can))
^^^^^^^^^
---> intones ----> NIL
<pay> ----<
---> features ---> ((HAccent +)(LEX pay))
^^^^^^^^^
---> intones ----> NIL
<for> ----<
---> features ---> ((HAccent -)(PhraseLevel :C)(LEX for))
^^^^^^^^^
---> intones ----> NIL
<the> ----<
---> features ---> ((HAccent -)(LEX the))
^^^^^^^^^
---> intones ----> NIL
<hotel> --<
---> features ---> ((HAccent +)(LEX hotel))
^^^^^^^^^
---> intones ----> NIL
<with> ---<
---> features ---> ((HAccent -)(PhraseLevel :C)(LEX with))
^^^^^^^^^
---> intones ----> NIL
<a> ------<
---> features ---> ((HAccent -)(LEX a))
^^^^^^^^^
---> intones ----> NIL
<credit> -<
---> features ---> ((HAccent +)(LEX credit))
^^^^^^^^^
---> intones ----> NIL
<card> ---<
---> features ---> ((HAccent +)(LEX card))
^^^^^^^^^
For HLP input, the WordStream becomes
---> intones ----> NIL
<you> ----<
---> features ---> ((NAccent ++)(CAT NP)(LEX you)(Focus ++))
---> intones ----> NIL
<can> ----<
---> features ---> ((HAccent -)(CAT Aux)(LEX can))
^^^^^^^^^
---> intones ----> NIL
<pay> ----<
---> features ---> ((HAccent +)(CAT Verb)(LEX pay))
^^^^^^^^^
---> intones ----> NIL
<for> ----<
---> features ---> ((HAccent -)(PhraseLevel :C)(CAT Prep)
^^^^^^^^^ (LEX for))
---> intones ----> NIL
<the> ----<
---> features ---> ((HAccent -)(CAT Det)(LEX the))
^^^^^^^^^
---> intones ----> NIL
<hotel> --<
---> features ---> ((NAccent ++)(CAT Noun)(LEX hotel)(Focus ++))
---> intones ----> NIL
<with> ---<
---> features ---> ((HAccent -)(PhraseLevel :C)(CAT Prep)
^^^^^^^^^ (LEX with))
---> intones ----> NIL
<a> ------<
---> features ---> ((HAccent -)(CAT Det)(LEX a))
^^^^^^^^^
---> intones ----> NIL
<credit> -<
---> features ---> ((HAccent -)(CN Unstress)(CAT Adj)(LEX credit))
^^^^^^^^^ ^^^^^^^^^^^
---> intones ----> NIL
<card> ---<
---> features ---> ((CN Stress)(NAccent ++)(CAT Noun)(LEX card)
^^^^^^^^^ (Focus ++))
Comparing WordStreams, it can be seen that the one generated from HLP input contains far more accurate features than that from Text. This is a direct result of the superior information offered by HLP input.
The hlp_rephrase function calls the hlp_phrase_flatten
function which in turn calls the hlp_remove_empty_phrase
function which then calls the hlp_rebuild_phrase function.
The last three functions cycle as often as required.
The hlp_rephrase function operates on the SphraseStream. (`S'
stands for `Syntax'.) Three tasks are performed. Referring to the
SphraseStream from the HLP input of the current example
(((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
(((NAccent ++) (CAT NP) (LEX you) (Focus ++)))
(((CAT VP))
(((HAccent -) (CAT Aux) (LEX can)))
(((HAccent +) (CAT Verb) (LEX pay)))
(((CAT PP))
(((HAccent -) (PhraseLevel :C) (CAT Prep) (LEX for)))
(((CAT NP))
(((HAccent -) (CAT Det) (LEX the)))
(((NAccent ++) (CAT Noun) (LEX hotel) (Focus ++)))))
(((CAT PP))
(((HAccent -) (PhraseLevel :C) (CAT Prep) (LEX with)))
(((CAT NP))
(((HAccent -) (CAT Det) (LEX a)))
(((HAccent -) (CN Unstress) (CAT Adj) (LEX credit)))
(((CN Stress) (NAccent ++) (CAT Noun) (LEX card) (Focus ++))))))),
The hlp_phrase_flatten function deletes the HLP nodes
(viz. `(CAT NP)', `(CAT VP)' or `(CAT PP)') since they have served
their purpose and are no longer useful. If the HLP input is viewed
as a tree in which the leaves are words, this function puts the
leaves all at the same level. The `tree' becomes
(((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
((NAccent ++) (CAT NP) (LEX you) (Focus ++))
((HAccent -) (CAT Aux) (LEX can))
((HAccent +) (CAT Verb) (LEX pay))
((PhraseLevel :C))
((HAccent -) (CAT Prep) (LEX for))
((HAccent -) (CAT Det) (LEX the))
((NAccent ++) (CAT Noun) (LEX hotel) (Focus ++))
((PhraseLevel :C))
((HAccent -) (CAT Prep) (LEX with))
((HAccent -) (CAT Det) (LEX a))
((HAccent -) (CN Unstress) (CAT Adj) (LEX credit))
((CN Stress) (NAccent ++) (CAT Noun) (LEX card) (Focus ++))),
The hlp_remove_empty_phrase function cleans the SphraseStream
by locating empty phrases and removing them. In the current example
there are none present, so nothing will change.
The hlp_rebuild_phrase function rebuilds the SphraseStream
into a tree form by extracting the `PhraseLevel' features and making
nodes of them. For HLP input the SphraseStream becomes
((((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
(((NAccent ++) (CAT NP) (LEX you) (Focus ++)))
(((HAccent -) (CAT Aux) (LEX can)))
(((HAccent +) (CAT Verb) (LEX pay)))
(((PhraseLevel :C))
(((HAccent -) (CAT Prep) (LEX for)))
(((HAccent -) (CAT Det) (LEX the)))
(((NAccent ++) (CAT Noun) (LEX hotel) (Focus ++))))
(((PhraseLevel :C))
(((HAccent -) (CAT Prep) (LEX with)))
(((HAccent -) (CAT Det) (LEX a)))
(((HAccent -) (CN Unstress) (CAT Adj) (LEX credit)))
(((CN Stress) (NAccent ++) (CAT Noun) (LEX card) (Focus ++))))))
For Text input (already having a flat HLP tree), the SphraseStream changes to
((((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
(((HAccent +) (LEX you)))
(((HAccent -) (LEX can)))
(((HAccent +) (LEX pay)))
(((PhraseLevel :C))
(((HAccent -) (LEX for)))
(((HAccent -) (LEX the)))
(((HAccent +) (LEX hotel))))
(((PhraseLevel :C))
(((HAccent -) (LEX with)))
(((HAccent -) (LEX a)))
(((HAccent +) (LEX credit)))
(((HAccent +) (LEX card))))))
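The flatten and rebuild steps can be pictured with the following Python sketch. It is not the CHATR implementation: nodes are represented as hypothetical (features, children) pairs, a word is assumed to be any node carrying a LEX feature, and the removal of empty phrases is not modelled.

```python
C_MARK = ("PhraseLevel", ":C")

def flatten(node, out):
    """Collect words in order, lifting PhraseLevel :C into separate markers."""
    features, children = node
    if any(key == "LEX" for key, _ in features):
        if C_MARK in features:
            out.append([C_MARK])
        out.append([f for f in features if f != C_MARK])
    for child in children:  # syntactic nodes (NP, VP, PP) are simply dropped
        flatten(child, out)
    return out

def rebuild(flat):
    """Group the words that follow each PhraseLevel marker under a new node."""
    top, current = [], None
    for features in flat:
        if features == [C_MARK]:
            current = (features, [])
            top.append(current)
        elif current is None:
            top.append((features, []))
        else:
            current[1].append((features, []))
    return top

def word(lex, *extra):
    return (list(extra) + [("LEX", lex)], [])

tree = ([("CAT", "S")], [
    word("you"),
    ([("CAT", "VP")], [
        word("can"),
        word("pay"),
        ([("CAT", "PP")], [
            word("for", C_MARK),
            ([("CAT", "NP")], [word("the"), word("hotel")]),
        ]),
    ]),
])

flat = flatten(tree, [])
phrases = rebuild(flat)
print(flat)
print(phrases)
```

After rebuilding, `you', `can' and `pay' remain direct children of the sentence, while `for the hotel' hangs under a single `(PhraseLevel :C)' node, as in the listings above.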
The add_boundaries function is in file
$CHATR_ROOT/src/lex
This function calls two further functions, find_left_boundary
and find_right_boundary.
The purpose of these functions is to locate and mark the left and
right boundaries between each word. Remember that speech will
eventually be formed by concatenation of phonemes to form words
and the spaces (silence) between them. So not just the
position of a break is noted; a value is assigned which indicates the
unit space to be allocated later between those words. The figures
are based on the break indexes already determined by
hlp_phr_module. These values are adjusted, however; a break
index of 1 becomes a boundary value of 0, and a break index of 4
becomes a boundary of 2. In case of conflict the highest value is
chosen. The left boundary of the first word and the right boundary
of the last are set to 4.
The boundary values for the WordStream of the present example are
4  you     0
0  can     0
0  pay     2
2  for     0
0  the     0
0  hotel   2
2  with    0
0  a       0
0  credit  0
0  card    4
Boundary values are kept in the left_boundary and right_boundary fields of each word.
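The mapping from break indexes to boundary values can be sketched as below. This is a Python illustration under stated assumptions, not the add_boundaries source: the break indexes are supplied by hand (standing in for the output of the phrasing module), only the two mappings given in the text are covered, and conflict resolution is not modelled.

```python
BREAK_TO_BOUNDARY = {1: 0, 4: 2}  # the two mappings stated in the text

def add_boundaries_sketch(words, break_before):
    """break_before[i] is the break index between word i-1 and word i (i >= 1)."""
    count = len(words)
    left = [0] * count
    right = [0] * count
    for i in range(1, count):
        value = BREAK_TO_BOUNDARY[break_before[i]]
        left[i] = value        # left boundary of this word
        right[i - 1] = value   # right boundary of the previous word
    left[0] = 4                # utterance-initial boundary
    right[-1] = 4              # utterance-final boundary
    return list(zip(left, words, right))

words = "you can pay for the hotel with a credit card".split()
break_before = [None, 1, 1, 4, 1, 1, 4, 1, 1, 1]  # assumed phrasing result
boundaries = add_boundaries_sketch(words, break_before)
for left_value, name, right_value in boundaries:
    print(left_value, name, right_value)
```

With a break index of 4 before `for' and `with', this reproduces the boundary values listed above.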
The hlp_realise_accents function calls the
hlp_apply_patterns function which in turn calls the
hlp_apply_pattern function. This function cycles as often as
necessary and calls the function hlp_apply_actions which
cycles too. Finally the hlp_apply_actions function calls
hlp_apply_simple_actions.
The hlp_realise_accents function applies the pattern rules
stored in the HLP_Patterns variable. These rules take the form
(Statement (START )
(HAccent (+ (H*))
(++ (L+H*)))
(PHRASE (H-))
(TAIL (L-L%)))
(YNQuestion (START )
(HAccent (+ (L*)))
(TAIL (H-H%)))
(Question (START )
(HAccent (+ (L*)))
(TAIL (L-L%)))
(* (START)
(HAccent (+ (H*)))
(PHRASE (H-))
(TAIL (H-L%)))
Some actions, like START, PHRASE or TAIL, are considered special
because they concern phrases. These are applied by the
hlp_apply_actions function. Others, like HAccent, are said to
be simple because they concern words. They are applied by the
function hlp_apply_simple_actions.
The current example is a `Statement' utterance type, so the part of the pattern rules that will be used is
(Statement (START )
(HAccent (+ (H*))
(++ (L+H*)))
(PHRASE (H-))
(TAIL (L-L%)))
The hlp_realise_accents function is the first to affect the
`intones' field of the WordStream. If a word has an `(HAccent +)'
feature, a `(H*)' intone will be added to its intones field. If it
is the last word of a phrase, a `(H-)' intone will also be added.
For Text input, execution of this module changes the WordStream to
---> intones ----> ((H*))
<you> ----< ^^
---> features ---> ((HAccent +) (LEX you))
---> intones ----> NIL
<can> ----<
---> features ---> ((HAccent -) (LEX can))
---> intones ----> ((H*) (H-))
<pay> ----< ^^ ^^
---> features ---> ((HAccent +) (LEX pay))
---> intones ----> NIL
<for> ----<
---> features ---> ((HAccent -) (PhraseLevel :C) (LEX for))
---> intones ----> NIL
<the> ----<
---> features ---> ((HAccent -) (LEX the))
---> intones ----> ((H*) (H-))
<hotel> --< ^^ ^^
---> features ---> ((HAccent +) (LEX hotel))
---> intones ----> NIL
<with> ---<
---> features ---> ((HAccent -) (PhraseLevel :C) (LEX with))
---> intones ----> NIL
<a> ------<
---> features ---> ((HAccent -) (LEX a))
---> intones ----> ((H*))
<credit> -< ^^
---> features ---> ((HAccent +) (LEX credit))
---> intones ----> ((H*) (L-L%))
<card> ---< ^^ ^^^^
---> features ---> ((HAccent +) (LEX card))
For HLP input, the WordStream becomes
---> intones ----> NIL
<you> ----<
---> features ---> ((NAccent ++) (CAT NP) (LEX you) (Focus ++))
---> intones ----> NIL
<can> ----<
---> features ---> ((HAccent -) (CAT Aux) (LEX can))
---> intones ----> ((H*) (H-))
<pay> ----< ^^ ^^
---> features ---> ((HAccent +) (CAT Verb) (LEX pay))
---> intones ----> NIL
<for> ----<
---> features ---> ((HAccent -)(PhraseLevel :C)(CAT Prep)
(LEX for))
---> intones ----> NIL
<the> ----<
---> features ---> ((HAccent -) (CAT Det) (LEX the))
---> intones ----> ((H-))
<hotel> --< ^^
---> features ---> ((NAccent ++)(CAT Noun)(LEX hotel)(Focus ++))
---> intones ----> NIL
<with> ---<
---> features ---> ((HAccent -)(PhraseLevel :C)(CAT Prep)
(LEX with))
---> intones ----> NIL
<a> ------<
---> features ---> ((HAccent -) (CAT Det) (LEX a))
---> intones ----> NIL
<credit> -<
---> features ---> ((HAccent -)(CN Unstress)(CAT Adj)(LEX credit))
---> intones ----> ((L-L%))
<card> ---< ^^^^
---> features ---> ((CN Stress)(NAccent ++)(CAT Noun)(LEX card)(Focus ++))
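The effect of the `Statement' pattern on both WordStreams can be reproduced with a short sketch. The Python below is illustrative only: phrases are given as plain lists, features as dictionaries, the START action is ignored, and `realise_accents_sketch` is a hypothetical name rather than a CHATR function.

```python
# The `Statement' pattern from the text; START is left out of this sketch.
STATEMENT = {"HAccent": {"+": "H*", "++": "L+H*"}, "PHRASE": "H-", "TAIL": "L-L%"}

def realise_accents_sketch(phrases, pattern=STATEMENT):
    """phrases: list of phrases, each a list of (word, features) pairs."""
    result = []
    for p, phrase in enumerate(phrases):
        for w, (name, features) in enumerate(phrase):
            intones = []
            accent = pattern["HAccent"].get(features.get("HAccent"))
            if accent:
                intones.append(accent)
            if w == len(phrase) - 1:  # last word of its phrase
                is_last_phrase = p == len(phrases) - 1
                intones.append(pattern["TAIL"] if is_last_phrase else pattern["PHRASE"])
            result.append((name, intones))
    return result

text_phrases = [
    [("you", {"HAccent": "+"}), ("can", {"HAccent": "-"}), ("pay", {"HAccent": "+"})],
    [("for", {"HAccent": "-"}), ("the", {"HAccent": "-"}), ("hotel", {"HAccent": "+"})],
    [("with", {"HAccent": "-"}), ("a", {"HAccent": "-"}),
     ("credit", {"HAccent": "+"}), ("card", {"HAccent": "+"})],
]
text_result = realise_accents_sketch(text_phrases)
for name, intones in text_result:
    print(name, intones)
```

A word carrying only an `NAccent' feature, such as `hotel' in the HLP input, receives no pitch accent here because the pattern has no NAccent entry; it still receives the phrase tone.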
The `word' module is in file
$CHATR_ROOT/src/lex/word.c
word_module calls two functions, add_boundaries and
add_intonation, and two modules, lexicon_module and
reduce_module.
The add_intonation function is in file
$CHATR_ROOT/src/intonation/intonation.c
The `reduce' module is in file
$CHATR_ROOT/src/lex/reduce.c
The `word' module creates three streams: the SylStream, the PhoneStream and the IntoneStream.
Appropriate cells of each of these streams are linked to those of the WordStream.
For text or HLP input, the add_boundaries function has already
been called once by hlp_module. See section The add_boundaries Function, for a description.
lexicon_module calls three functions, lex_lookup,
add_syllables and add_phonemes. This module consults
the lexicon for each word of the WordStream. The lexicon contains all
the words CHATR can utter, with their decomposition into
syllables and phonemes. The SylStream and PhoneStreams are built
from this information. The SylStream associated with the current
example is
(<y uu> --------- lex_stress ------ (0)
<k @ n> -------- lex_stress ------ (0)
<p ei> --------- lex_stress ------ (1)
<f @> ---------- lex_stress ------ (0)
<dh @> --------- lex_stress ------ (0)
<h ou> --------- lex_stress ------ (0)
<t e l> -------- lex_stress ------ (1)
<w i th> ------- lex_stress ------ (0)
< @ > ---------- lex_stress ------ (0)
<k r e> -------- lex_stress ------ (1)
<d i t> -------- lex_stress ------ (0)
<k aa d>) ------ lex_stress ------ (1)
[(0) represents `unstressed', a (1) `stressed']
The add_intonation function calls the function
make_intonation_cell. These functions build the IntoneStream
from information in the intones field of the WordStream. Depending
on the input type, the cells of the intone fields have been filled in
different ways. This results in three quite different IntoneStreams.
For text input, the IntoneStream will be
(<you> <=====================> (<H*>
<can>
====================> <H*>
<pay> <
====================> <H->
<for>
<the>
====================> <H*>
<hotel> <
====================> <H->
<with>
<a>
<credit> <=====================> <H*>
====================> <H*>
<card>) <
====================> <L-L%>)
and for PhonoWord input
(<you> <=====================> (<H*>
<can>
<pay>
<for>
<the>
<hotel> <=====================> <H*>
<with>
<a>
<credit>
====================> <H*>
<card>) <
====================> <L-L%>)
finally, for HLP input
(<you> (
<can>
====================> <H*>
<pay> <
====================> <H->
<for>
<the>
<hotel> <=====================> <H->
<with>
<a>
<credit>
<card>) <=====================> <L-L%>)
The HLP input IntoneStream may appear rather sparse, especially when considering the length of code that produced it compared to the size of other modules. This is in fact due to the HLP_Patterns variable. In the one used for this example, there were no mappings for (NAccent ++) or (HAccent -) features. Had those been added, making the `statement' part of the variable
(Statement (START )
(HAccent (+ (H*))
(++ (L+H*))
(- (L*)))
(NAccent (++ (H+!H*)))
(PHRASE (H-))
(TAIL (L-L%)))
the text input IntoneStream would have been
(<you> <=====================> (<H*>
<can> <=====================> <L*>
====================> <H*>
<pay> <
====================> <H->
<for> <=====================> <L*>
<the> <=====================> <L*>
====================> <H*>
<hotel> <
====================> <H->
<with> <=====================> <L*>
<a> <=====================> <L*>
<credit> <=====================> <H*>
====================> <H*>
<card>) <
====================> <L-L%>)
and for HLP input
(<you> <=====================> (<H+!H*>
<can> <=====================> <L*>
====================> <H*>
<pay> <
====================> <H->
<for> <=====================> <L*>
<the> <=====================> <L*>
<hotel> <=====================> <H->
<with> <=====================> <L*>
<a> <=====================> <L*>
<credit> <=====================> <L*>
<card>) <=====================> <L-L%>)
So for little improvement in high intonation, the modified variable
has introduced a lot of low intonation `clutter'. With regard to
HLP input, although `hotel' and `card' are tagged (Focus ++) like
`you', there is no (H+!H*) Intone cell aligned with these words.
This is because CHATR only accepts one (Focus ++) marked word
for any one sentence; the following occurrences are ignored. This
part of the code could of course easily be changed in the
hlp_realise_accents function. For text input, the IntoneStream
is further affected; too many (H*) intones have been added (usually
on every noun, pronoun and verb). Important words are therefore
hidden among not so important ones.
With HLP Input, it is not arduous for the user to mark the important words (if actually found necessary) by adding a (Focus ++) label. If (Focus ++) matches with a (H+!H*) accent, these words will sound different.
For PhonoWord Input, the user is expected to supply all the accents. CHATR will not add new ones. While it may be quick to add accents like (H*) or (H+!H*) to important words (usually not of a great number), it rapidly becomes tedious to add accents like (L*) to unimportant words (usually numerous).
As far as the IntoneStream is concerned, HLP input has to be the best method. It automatically finds the accents for each word, and gives the user the capability to make changes--indicate words that need to be focused on, for instance. The only drawback is that an HLP input takes quite long to write.
reduce_module calls two functions, contract_word and
reduce_syls.
This module detects and performs contractions. For the grammatically challenged, this means it turns "would have" into "would've" or "he is" into "he's". A word must satisfy several criteria to be contracted: it must be in a list of contractable words (contained in the `contract_words' variable), such as have, has, are, am, would, etc.; both left and right boundaries must be zero; it must have no intone. If these criteria are met, the word will be removed and the phoneme it cross-references to in the `contract_words' variable added to the previous word. Contents of and links between streams are of course modified too.
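The contraction criteria can be expressed compactly. The sketch below is written in Python for illustration and is not the code in reduce.c: the entries of `CONTRACT_WORDS`, the phoneme symbols and the dictionary layout of a word are all hypothetical.

```python
# Hypothetical table: contractable word -> phoneme appended to the previous word.
CONTRACT_WORDS = {"have": "v", "is": "z"}

def can_contract(word):
    """Apply the three criteria given in the text."""
    return (word["name"] in CONTRACT_WORDS
            and word["left_boundary"] == 0
            and word["right_boundary"] == 0
            and not word["intones"])

def reduce_sketch(words):
    """Remove contractable words and extend the preceding word's phonemes."""
    reduced = []
    for word in words:
        if reduced and can_contract(word):
            previous = reduced[-1]
            previous["phones"] = previous["phones"] + [CONTRACT_WORDS[word["name"]]]
        else:
            reduced.append(dict(word))
    return reduced

words = [
    {"name": "would", "phones": ["w", "u", "d"], "left_boundary": 4,
     "right_boundary": 0, "intones": []},
    {"name": "have", "phones": ["h", "a", "v"], "left_boundary": 0,
     "right_boundary": 0, "intones": []},
]
reduced = reduce_sketch(words)
print(reduced)
```

If `have' carried an intone, or if either of its boundaries were non-zero, it would be kept as a separate word.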
The `phonology' module is in file
$CHATR_ROOT/src/phoneme/phonology.c
The module phonology_module calls three functions;
fill_phoneme, phone_to_segment and phrase_pause.
Depending on the pause prediction method selected, the last function
calls either pp_disctree or insert_phrase_pause. If
called, insert_phrase_pause further calls insert_pause.
The insert_phrase_pause and insert_pause functions are
in file
$CHATR_ROOT/src/intonation/phrase_int.c
This module affects two streams
The fill_phoneme function fills the features of the
PhoneStream that was previously created by the `word' module.
See section The word Module, for creation information.
The phone_to_segment function builds the SegStream.
The phrase_pause function inserts silence segments where
needed, according to the chosen pause prediction method. If
pp_disctree is selected, a silence segment is inserted after
every comma, colon, question mark or full stop (period), if
the following phrase contains at least one stressed syllable. If
insert_phrase_pause is selected, a silence segment is inserted
at every phrase break (PhraseLevel :C).
For new or basic users, the insert_phrase_pause method is
recommended as a start, since it utilizes work done by previous
modules.
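A minimal sketch of the insert_phrase_pause behaviour follows, written in Python for illustration. It assumes that the pause goes immediately before the word carrying `(PhraseLevel :C)' and that `#' denotes a silence segment; the function name and data layout are hypothetical.

```python
def insert_phrase_pauses(words):
    """words: list of (phonemes, features); '#' marks a silence segment."""
    segments = []
    for index, (phones, features) in enumerate(words):
        if index > 0 and ("PhraseLevel", ":C") in features:
            segments.append("#")  # pause at the phrase break
        segments.extend(phones)
    return segments

words = [
    (["p", "ei"], [("LEX", "pay")]),
    (["f", "@"], [("PhraseLevel", ":C"), ("LEX", "for")]),
    (["dh", "@"], [("LEX", "the")]),
]
segments = insert_phrase_pauses(words)
print(segments)  # ['p', 'ei', '#', 'f', '@', 'dh', '@']
```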
The `intone' module is in file
$CHATR_ROOT/src/intonation/intonation.c
intone_module calls the function tobi_intonation. This
function is in file
$CHATR_ROOT/src/intonation/ToBI.c
This module fills the IntoneStream. The method currently used is the
ToBI intonation method (H*, L*, H-,...). For each syllable of
the SylStream, tobi_intonation predicts pitch accents (H*,
L*,...), phrase accents (H-, L-,...) and boundary tones
(L%,...).
However, for text or HLP input it is recommended that this module is either bypassed or at least run with pitch accent prediction switched off. The reason for this is that the HLP Module (see section The hlp Module) has already supplied sufficient intonation information; more is not necessary and possibly counterproductive.
For PhonoWord input this module is useful. It supplies further information about phrase accents and boundaries. Using the current example, the WordStream becomes
(<you> -------- intones ------ ((H*))
<can>
<pay> -------- intones ------ ((H-))
<for> ^^^^
<the>
<hotel> ------ intones ------ ((H*)(H-))
<with> ^^^^
<a>
<credit>
<card>) ------ intones ------ ((H*)(L-L%))
and the IntoneStream
(<you> <=====================> (<H*>
<can>
<pay> <=====================> <H->
<for>
<the>
====================> <H*>
<hotel> <
====================> <H->
<with> ^^^^
<a>
<credit>
====================> <H*>
<card>) <
====================> <L-L%>)
Note that intone cells already present, such as the `<L-L%>' on `card', are not added again by this module.
The `duration' module is in file
$CHATR_ROOT/src/duration/duration.c
duration_module calls three functions, lr_dur,
dur_mark_all_segs and dur_mark_all_syls. The function
lr_dur calls two further functions, lrd_segs_durations
and lrd_pause_durations. The last function calls one more,
pause_duration.
The `duration' module determines the finite timing for the
constituents of an utterance. The lrd_segs_durations function
determines the duration of the segments. The function
lrd_pause_durations determines that of the pauses, in
accordance with the PhraseLevel type. The dur_mark_all_segs
function marks absolute starts for segments. The
dur_mark_all_syls function does the same for syllables.
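Marking absolute starts amounts to a running sum over the predicted durations: each item starts where the previous one ends. The Python sketch below shows the idea. It is not the CHATR implementation; the function name `mark_absolute_starts` is hypothetical, and the three durations are illustrative values chosen so that the result matches the first three PhonoWord segment starts listed for the current example.

```python
def mark_absolute_starts(durations_ms):
    """Return the absolute start time of each item, given its duration.

    durations_ms -- durations in milliseconds, in utterance order
    """
    starts = []
    elapsed = 0
    for duration in durations_ms:
        starts.append(elapsed)  # item begins where the previous one ended
        elapsed += duration
    return starts


if __name__ == "__main__":
    # Illustrative durations for the segments y, uu and k.
    print(mark_absolute_starts([66, 52, 119]))
```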
For the current example, the segments and their resulting absolute start timings (in ms) for each input method considered are
Segs   PhonoWord   Text    HLP
y              0      0      0
uu            66     64     61
k            118    117     89
@            237    235    208
n            276    274    247
p            317    315    288
ei           419    423    396
#            538    565    538
f            638    665    638
@            737    764    737
dh           790    817    790
@            837    864    837
h            893    920    893
ou           985   1012    979
t           1111   1138   1083
e           1205   1232   1169
l           1333   1360   1278
#           1421   1448   1361
w           1521   1548   1461
i
th
@           1664   1691   1604
k           1716   1743   1656
r
e
d           1926   1987   1866
i
t
k           2112   2198   2052
aa
d
[`#' indicates a pause]
Syllables and absolute start timings (again in ms) are
Syllables   PhonoWord   Text    HLP
(y uu)              0      0      0
(k @ n)           118    117     89
(p ei)            317    315    288
(f @)             638    665    638
(dh @)            790    817    790
(h ou)            893    920    893
(t e l)          1111   1138   1083
(w i dh)         1521   1548   1461
(@)              1664   1691   1604
(k r e)          1716   1743   1656
(d i t)          1926   1987   1866
(k aa d)         2112   2198   2052
Since prosody is different for each type of input, durations are correspondingly different.
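In the two listings above, each syllable starts at the same time as its first segment. The Python sketch below derives syllable starts from segment starts on that basis, using the PhonoWord timings of the first phrase. It is an illustration only: the function name `syllable_starts` and the data layout are assumptions, and pause segments (`#`) are left out because they belong to no syllable.

```python
def syllable_starts(segments, syllables):
    """Return the absolute start (ms) of each syllable.

    segments  -- list of (segment_name, start_ms) in utterance order
    syllables -- list of tuples of segment names, in the same order
    """
    starts = []
    position = 0
    for syllable in syllables:
        chunk = segments[position:position + len(syllable)]
        names = tuple(name for name, _ in chunk)
        if names != tuple(syllable):
            raise ValueError("segments do not match syllable %r" % (syllable,))
        # A syllable starts where its first segment starts.
        starts.append(chunk[0][1])
        position += len(syllable)
    return starts


if __name__ == "__main__":
    # PhonoWord segment starts for "you can pay", taken from the listing.
    segs = [("y", 0), ("uu", 66), ("k", 118), ("@", 237),
            ("n", 276), ("p", 317), ("ei", 419)]
    syls = [("y", "uu"), ("k", "@", "n"), ("p", "ei")]
    print(syllable_starts(segs, syls))
```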
The `int_target' module is in file
$CHATR_ROOT/src/intonation/intonation.c
The module int_target_module calls the function
tobi_make_targets which in turn calls
tobi_make_targets_lr. This function calls three further
functions, lr_predict, f0_normalize and
tobi_add_target_seg.
The tobi_make_targets and tobi_make_targets_lr
functions are in file
$CHATR_ROOT/src/intonation/ToBI.c
The int_target_module function determines the f0 value for
each syllable of the SylStream. The method presently used is linear
regression for ToBI intonation. Three f0 values (normalized) are
given for each syllable by the function tobi_make_targets_lr. The
first is for the first segment of the syllable, the second for the
nucleus segment (the vowel) and the third for the last segment. The
f0 values for the current example are
PhonoWord Text HLP
* (y uu) (syllable)
196.917068 192.78118 176.45158 (f0 before normalization)
y 142.290955 140.17704 131.83081 (1st segment, f0 normalized)
256.088196 252.19085 205.92985
uu 172.533966 170.54199 146.89747 (nucleus segment)
242.849289 239.84178 209.83558
uu 165.767410 164.23024 148.89373 (last segment)
* (k @ n)
204.067780 201.49116 173.36627
k 145.945755 144.62881 130.25387
228.281631 228.18071 215.93901
@ 158.321716 158.27014 152.01327
213.495285 231.47927 225.05279
n 150.764252 159.95608 156.67143
* (p ei)
192.711884 209.04147 201.94967
p 140.141632 148.48786 144.86317
181.688629 227.94963 229.74664
ei 134.507523 158.15203 159.07051
161.305099 191.31132 196.99272
ei 124.089272 139.42578 142.32962
* (f @)
169.542007 199.16011 201.15170
f 128.299255 143.43739 144.45532
151.233627 170.03631 171.95320
@ 118.941635 128.55189 129.53163
151.086487 161.10058 164.70037
@ 118.866425 123.98474 125.82463
* (dh @)
151.363708 164.91391 171.62640
dh 119.008118 125.93377 129.36460
156.121429 165.93020 169.92852
@ 121.439842 126.45321 128.49679
173.493652 187.45507 172.47857
@ 130.318985 137.45481 129.80015
* (h ou)
171.729584 190.19567 176.44268
h 129.417343 138.85557 131.82626
210.256027 230.74173 184.58163
ou 149.108643 159.57910 135.98617
218.643097 226.80609 178.81588
ou 153.395355 157.56755 133.03923
* (t e l)
223.215790 227.31828 178.08351
t 155.732513 157.82934 179.66101
e 165.714127 163.17337 133.47117
204.412720 193.96093 163.17901
l 146.122055 140.78002 125.04705
* (w i dh)
202.834274 196.86778 163.32327
w 145.315292 142.26574 125.12078
165.008423 156.45224 141.01254
i 125.982086 121.60892 113.71752
159.509415 153.40560 150.44239
dh 123.171478 120.05175 118.53722
* (@)
162.552399 155.83990 151.50260
@ 124.726784 121.29595 119.07910
155.529129 151.53080 150.36933
@ 121.137108 119.09352 118.49988
157.041901 172.01840 155.39329
@ 121.910309 129.56495 121.06768
* (k r e)
163.318283 177.07127 163.75167
k 125.118233 132.14753 125.33974
137.365234 183.52533 139.22203
e 111.853340 135.44627 112.80236
137.160309 185.15049 137.73902
e 111.748604 136.27691 112.04439
* (d i t)
134.786163 186.64566 133.66029
d 110.535149 137.04110 109.95970
121.638107 184.22201 119.69841
i 103.815033 135.80236 102.82363
126.368195 167.65847 105.95538
t 106.232635 127.33655 95.799423
* (k aa d)
145.771805 190.14331 125.73971
k 116.150032 138.82879 105.91140
119.275711 136.78218 70.974113
aa 102.607590 111.55533 77.920105
52.435909 65.044205 20.000900
d 68.445023 74.889259 51.867126
As with durations, f0 values are different for each type of input, since they have different prosodies.
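The before and after values in the listing above appear to be related by a single linear map. The Python sketch below fits a slope and intercept from two PhonoWord pairs of the `(y uu)` syllable and checks the result against the last pair of `(k aa d)`. This is an observation about the listed numbers only; it does not describe how f0_normalize is implemented, and the function name `fit_linear_map` is hypothetical.

```python
def fit_linear_map(pair_a, pair_b):
    """Fit y = slope * x + intercept through two (x, y) pairs."""
    (x1, y1), (x2, y2) = pair_a, pair_b
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# (f0 before normalization, f0 normalized) for the first two targets
# of syllable (y uu), PhonoWord column.
slope, intercept = fit_linear_map((196.917068, 142.290955),
                                  (256.088196, 172.533966))

# Apply the fitted map to the last raw PhonoWord value of (k aa d).
predicted = slope * 52.435909 + intercept
listed = 68.445023

if __name__ == "__main__":
    print("slope      %.6f" % slope)
    print("intercept  %.6f" % intercept)
    print("predicted  %.6f (listed %.6f)" % (predicted, listed))
```

The fitted slope is roughly 0.51, and the predicted value agrees with the listed one to well within rounding, which suggests the same scaling and offset are applied to every target.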
Now that values of duration and f0 have been determined, f0 contours can be built.