CHATR is a vast program containing a large number of parallel modules, any of which may or may not be selected by the user at any one time. For the newcomer to CHATR, it is difficult to establish where the system begins. With that in mind, the following sections describe one possible route through the software, module by module, for several different types of input.
The following diagram shows each module, referred to by its software name:
 ----------------     ----------     ----------
|PhonoWord Tagged|   |Plain Text|   |HLP Tagged|
|   Utterance    |   |  Input   |   |Utterance |
 ----------------     ----------     ----------
         |                 |              |
         |                 |              |
  PhonoWord_input      text_input     hlp_input
         |                 |              |
         |                 -----    -----
         |                     |    |
         |                   hlp_module
         |                       |
          ------------   ---------
                     |   |
                  word_module
                       |
                phonology_module
                       |
                 intone_module
                       |
                duration_module
                       |
               int_target_module
                       |
                      \|/
                   synthesis
By default, CHATR will eventually run any utterance through the same five modules. Before this, the input is passed either to a function alone or to a function followed by a module, depending on the input type. See section Design Philosophy for a definition of the difference between a function and a module.
The PhonoWord_input function is in file
$CHATR_ROOT/src/input/pw_input.c
This function creates two streams.
The PhonoWord_input function calls the build_phrase_tree function. This in turn calls the make_new_word and build_sub_phrases functions, which cycle as often as required. Each time the build_phrase_tree function encounters a word in the given utterance, the make_new_word function is called. This adds the new word (plus its intones and features, if present) to the WordStream. Thus for the PhonoWord input
(Utterance PhonoWord (:D () (:S () (:C () (you (H*)) (can) (pay)) (:C () (for) (the) (hotel (H*))) (:C () (with) (a) (credit) (card (H*) (L-L%)))))),
execution of the PhonoWord_input function creates the WordStream
(<you> -------- intones ------ ((H*))
 <can>
 <pay>
 <for>
 <the>
 <hotel> ------ intones ------ ((H*))
 <with>
 <a>
 <credit>
 <card>) ------ intones ------ ((H*)(L-L%))
The information contained in the intones field of each word cell is used later to build the IntoneStream.
The text_input function is in file
$CHATR_ROOT/src/input/hlp_input.c
This function creates two streams. It also converts the text input into an HLP input.
The text_input function calls two further functions: hlp_build_sphrase (cycling as often as required), which in turn calls the function hlp_make_word; and text_to_hlp, which in turn calls the functions text_read_sentence and text_build_phrased. These functions are in file
$CHATR_ROOT/src/text/text.c
The text_to_hlp function converts the text input into an HLP input. First the input text is read, sentence by sentence, by the text_read_sentence function. The text_build_phrased function then builds LEX word-cells for each sentence and adds an IFT type at the beginning. Thus for the text input
(Utterance Text "You can pay for the hotel with a credit card."),
execution of the text_to_hlp function creates the HLP input
(Utterance HLP (((CAT D)) (((CAT S) (IFT Statement)) (((LEX You))) (((LEX can))) (((LEX pay))) (((LEX for))) (((LEX the))) (((LEX hotel))) (((LEX with))) (((LEX a))) (((LEX credit))) (((LEX card))))))
The WordStream is created by the function hlp_make_word. Since plain text input provides no information besides the words themselves, the intones field of each word is set to nil. Similarly, each features field contains only a LEX word-cell. Thus for the same text input as above, execution of the hlp_make_word function creates the WordStream
          ---> intones ----> NIL
<you> ----<
          ---> features ---> ((LEX you))
          ---> intones ----> NIL
<can> ----<
          ---> features ---> ((LEX can))
          ---> intones ----> NIL
<pay> ----<
          ---> features ---> ((LEX pay))
          ---> intones ----> NIL
<for> ----<
          ---> features ---> ((LEX for))
          ---> intones ----> NIL
<the> ----<
          ---> features ---> ((LEX the))
          ---> intones ----> NIL
<hotel> --<
          ---> features ---> ((LEX hotel))
          ---> intones ----> NIL
<with> ---<
          ---> features ---> ((LEX with))
          ---> intones ----> NIL
<a> ------<
          ---> features ---> ((LEX a))
          ---> intones ----> NIL
<credit> -<
          ---> features ---> ((LEX credit))
          ---> intones ----> NIL
<card> ---<
          ---> features ---> ((LEX card))
Finally, the hlp_build_sphrase function builds the SphraseStream.
The hlp_input function is in file
$CHATR_ROOT/src/input/hlp_input.c
This function creates two streams.
The hlp_input function calls the hlp_build_sphrase function, which cycles as often as required. This in turn calls the hlp_make_word function. Since HLP input does not include intonation information, the intones field of each WordStream cell is set to nil. The features field is filled with all the syntactic and prosodic information of the relevant word. Thus for the HLP input
(Utterance HLP (((CAT S) (IFT Statement)) (((CAT NP) (LEX you) (Focus ++))) (((CAT VP)) (((CAT Aux) (LEX can))) (((CAT Verb) (LEX pay))) (((CAT PP)) (((CAT Prep) (LEX for))) (((CAT NP)) (((CAT Det) (LEX the))) (((CAT Noun) (LEX hotel) (Focus ++))))) (((CAT PP)) (((CAT Prep) (LEX with))) (((CAT NP)) (((CAT Det) (LEX a))) (((CAT Adj) (LEX credit))) (((CAT Noun) (LEX card) (Focus ++)))))))),
execution of the hlp_input function creates the WordStream
          ---> intones ----> NIL
<you> ----<
          ---> features ---> ((CAT NP)(LEX you)(Focus ++))
          ---> intones ----> NIL
<can> ----<
          ---> features ---> ((CAT Aux)(LEX can))
          ---> intones ----> NIL
<pay> ----<
          ---> features ---> ((CAT Verb)(LEX pay))
          ---> intones ----> NIL
<for> ----<
          ---> features ---> ((CAT Prep)(LEX for))
          ---> intones ----> NIL
<the> ----<
          ---> features ---> ((CAT Det)(LEX the))
          ---> intones ----> NIL
<hotel> --<
          ---> features ---> ((CAT Noun)(LEX hotel)(Focus ++))
          ---> intones ----> NIL
<with> ---<
          ---> features ---> ((CAT Prep)(LEX with))
          ---> intones ----> NIL
<a> ------<
          ---> features ---> ((CAT Det)(LEX a))
          ---> intones ----> NIL
<credit> -<
          ---> features ---> ((CAT Adj)(LEX credit))
          ---> intones ----> NIL
<card> ---<
          ---> features ---> ((CAT Noun)(LEX card)(Focus ++))
Finally, the hlp_build_sphrase function builds the SphraseStream.
The `hlp' module is in file
$CHATR_ROOT/src/input/hlp.c
hlp_module calls five functions and one module (hlp_phr_module) in the following order:
    hlp_apply_default_rules
    hlp_phr_module
    hlp_predict_pros_events
    hlp_rephrase
    add_boundaries
    hlp_realise_accents
Each of the above will now be described in detail.
The hlp_apply_default_rules function calls the hlp_traverse_add_defaults function, which further calls the hlp_apply_rule function. Both called functions cycle as often as necessary.
The function hlp_apply_default_rules traverses the HLP tree input (an HLP input can be seen as a tree) and tries to apply the user-defined rules. An example of such rules (contained in the HLP_Rules variable) is
(
  ( ((Focus +)) => ((NAccent +)) )
  ( ((Focus ++)) => ((NAccent ++)) )
  ( ((Contrastive +)) => ((NAccent ++)) )
  ( ((Focus -)) => ((NAccent -)) )
  ( ((CAT S)) => ((PhraseLevel :S)) )
)
Every element contained in each features field is examined. If an element matches an expression on the left side of an HLP_Rules entry, the expression on the right is added to the features field by the hlp_apply_rule function.
For Text input, the features fields contain nothing to which the rules can apply, so execution of this function does not change the WordStream.
For HLP input, the WordStream becomes
          ---> intones ----> NIL
<you> ----<
          ---> features ---> ((NAccent ++)(CAT NP)(LEX you)(Focus ++))
                               ^^^^^^^^^^
          ---> intones ----> NIL
<can> ----<
          ---> features ---> ((CAT Aux)(LEX can))
          ---> intones ----> NIL
<pay> ----<
          ---> features ---> ((CAT Verb)(LEX pay))
          ---> intones ----> NIL
<for> ----<
          ---> features ---> ((CAT Prep)(LEX for))
          ---> intones ----> NIL
<the> ----<
          ---> features ---> ((CAT Det)(LEX the))
          ---> intones ----> NIL
<hotel> --<
          ---> features ---> ((NAccent ++)(CAT Noun)(LEX hotel)(Focus ++))
                               ^^^^^^^^^^
          ---> intones ----> NIL
<with> ---<
          ---> features ---> ((CAT Prep)(LEX with))
          ---> intones ----> NIL
<a> ------<
          ---> features ---> ((CAT Det)(LEX a))
          ---> intones ----> NIL
<credit> -<
          ---> features ---> ((CAT Adj)(LEX credit))
          ---> intones ----> NIL
<card> ---<
          ---> features ---> ((NAccent ++)(CAT Noun)(LEX card)(Focus ++))
                               ^^^^^^^^^^
hlp_phr_module predicts phrasing using either the default or a user-selected method. Two methods are presently available. It will be assumed that the default DiscTree method is selected.
The module hlp_phr_module calls the disc_tree_phrase function, which in turn calls the function dt_decide.
The `break index' is a measure of how strongly a particular word is linked to the previous one. The DiscTree phrasing prediction method takes each word and determines its break index. Possible values are 1, 2, 3 or 4. A break index of 1 indicates that the two words are closely linked, such as `the' and `hotel' in the example currently being used. A break index of 4 means that the words are strongly dissociated; this typically (but not exclusively) occurs between the last word of one sentence and the first word of the next.
The dt_decide function returns the break index for each word. It looks at the types of the preceding and succeeding words and uses a decision tree to determine a value. Currently only the values 1 and 4 are used. Thus 4 does not only indicate the end of a sentence, but also marks pauses within sentences.
When a break index of 4 is returned, the disc_tree_phrase function adds `PhraseLevel :C' to the features field of the relevant word.
For Text input, execution of this module changes the WordStream to
          ---> intones ----> NIL
<you> ----<
          ---> features ---> ((LEX you))
          ---> intones ----> NIL
<can> ----<
          ---> features ---> ((LEX can))
          ---> intones ----> NIL
<pay> ----<
          ---> features ---> ((LEX pay))
          ---> intones ----> NIL
<for> ----<
          ---> features ---> ((PhraseLevel :C)(LEX for))
                               ^^^^^^^^^^^^^^
          ---> intones ----> NIL
<the> ----<
          ---> features ---> ((LEX the))
          ---> intones ----> NIL
<hotel> --<
          ---> features ---> ((LEX hotel))
          ---> intones ----> NIL
<with> ---<
          ---> features ---> ((PhraseLevel :C)(LEX with))
                               ^^^^^^^^^^^^^^
          ---> intones ----> NIL
<a> ------<
          ---> features ---> ((LEX a))
          ---> intones ----> NIL
<credit> -<
          ---> features ---> ((LEX credit))
          ---> intones ----> NIL
<card> ---<
          ---> features ---> ((LEX card))
For HLP input, the WordStream becomes
          ---> intones ----> NIL
<you> ----<
          ---> features ---> ((NAccent ++)(CAT NP)(LEX you)(Focus ++))
          ---> intones ----> NIL
<can> ----<
          ---> features ---> ((CAT Aux)(LEX can))
          ---> intones ----> NIL
<pay> ----<
          ---> features ---> ((CAT Verb)(LEX pay))
          ---> intones ----> NIL
<for> ----<
          ---> features ---> ((PhraseLevel :C)(CAT Prep)(LEX for))
                               ^^^^^^^^^^^^^^
          ---> intones ----> NIL
<the> ----<
          ---> features ---> ((CAT Det)(LEX the))
          ---> intones ----> NIL
<hotel> --<
          ---> features ---> ((NAccent ++)(CAT Noun)(LEX hotel)(Focus ++))
          ---> intones ----> NIL
<with> ---<
          ---> features ---> ((PhraseLevel :C)(CAT Prep)(LEX with))
                               ^^^^^^^^^^^^^^
          ---> intones ----> NIL
<a> ------<
          ---> features ---> ((CAT Det)(LEX a))
          ---> intones ----> NIL
<credit> -<
          ---> features ---> ((CAT Adj)(LEX credit))
          ---> intones ----> NIL
<card> ---<
          ---> features ---> ((NAccent ++)(CAT Noun)(LEX card)(Focus ++))
The hlp_predict_pros_events function calls hlp_phr_module. This module decides which prosodic prediction strategy to use and applies it. Three strategies are presently available. It will be assumed that the default Hirschberg strategy is selected.
hlp_phr_module causes hlp_predict_pros_events to call hlp_addacc_module. This module is in file
$CHATR_ROOT/src/hlp/hlp_addacc.c
The module hlp_addacc_module calls three functions: hlp_mark_aux, aa_complex_nominals and aa_assign_accents. Between them, these functions perform three actions, described in turn below.
Each time the hlp_mark_aux function finds a verb, the verb is tested to determine whether it may actually be an auxiliary.(5) If so, a `(CAT Aux)' is added to the features field and the `(CAT Verb)' (if present) is removed. In our present example the auxiliary verb `can' has already been correctly tagged (this is tough enough already without adding problems for effect!), so this function will not need to make any changes.
The aa_complex_nominals function calls two further functions, aa_cn_simple_assign and aa_cn_assign. Their purpose is to assign the correct stress to complex nominals. A complex nominal is a noun and adjective pair that forms a single concept, such as `credit card'. For each complex nominal, the aa_cn_assign function decides which word is to be stressed and which unstressed. The stressed word has a `(CN Stress)' added to its features field, and the unstressed one a `(CN Unstress)'.
The aa_assign_accents function calls aa_accent_assign, which calls the function aa_aaa. This in turn calls two further functions, hlp_closed_deaccented and hlp_closed_accented. Influenced by pre-existing features and those added since the start of processing, these functions decide the type of accent required (`(HAccent +)', `(HAccent -)', `(HAccent ++)' or `(HAccent c)') and add it to the features fields. If an `HAccent' or `NAccent' already exists in a features field, none is added. The IntoneStream will be built from these features later.
For Text input, execution of this module changes the WordStream to
          ---> intones ----> NIL
<you> ----<
          ---> features ---> ((HAccent +)(LEX you))
                               ^^^^^^^^^
          ---> intones ----> NIL
<can> ----<
          ---> features ---> ((HAccent -)(LEX can))
                               ^^^^^^^^^
          ---> intones ----> NIL
<pay> ----<
          ---> features ---> ((HAccent +)(LEX pay))
                               ^^^^^^^^^
          ---> intones ----> NIL
<for> ----<
          ---> features ---> ((HAccent -)(PhraseLevel :C)(LEX for))
                               ^^^^^^^^^
          ---> intones ----> NIL
<the> ----<
          ---> features ---> ((HAccent -)(LEX the))
                               ^^^^^^^^^
          ---> intones ----> NIL
<hotel> --<
          ---> features ---> ((HAccent +)(LEX hotel))
                               ^^^^^^^^^
          ---> intones ----> NIL
<with> ---<
          ---> features ---> ((HAccent -)(PhraseLevel :C)(LEX with))
                               ^^^^^^^^^
          ---> intones ----> NIL
<a> ------<
          ---> features ---> ((HAccent -)(LEX a))
                               ^^^^^^^^^
          ---> intones ----> NIL
<credit> -<
          ---> features ---> ((HAccent +)(LEX credit))
                               ^^^^^^^^^
          ---> intones ----> NIL
<card> ---<
          ---> features ---> ((HAccent +)(LEX card))
                               ^^^^^^^^^
For HLP input, the WordStream becomes
          ---> intones ----> NIL
<you> ----<
          ---> features ---> ((NAccent ++)(CAT NP)(LEX you)(Focus ++))
          ---> intones ----> NIL
<can> ----<
          ---> features ---> ((HAccent -)(CAT Aux)(LEX can))
                               ^^^^^^^^^
          ---> intones ----> NIL
<pay> ----<
          ---> features ---> ((HAccent +)(CAT Verb)(LEX pay))
                               ^^^^^^^^^
          ---> intones ----> NIL
<for> ----<
          ---> features ---> ((HAccent -)(PhraseLevel :C)(CAT Prep)(LEX for))
                               ^^^^^^^^^
          ---> intones ----> NIL
<the> ----<
          ---> features ---> ((HAccent -)(CAT Det)(LEX the))
                               ^^^^^^^^^
          ---> intones ----> NIL
<hotel> --<
          ---> features ---> ((NAccent ++)(CAT Noun)(LEX hotel)(Focus ++))
          ---> intones ----> NIL
<with> ---<
          ---> features ---> ((HAccent -)(PhraseLevel :C)(CAT Prep)(LEX with))
                               ^^^^^^^^^
          ---> intones ----> NIL
<a> ------<
          ---> features ---> ((HAccent -)(CAT Det)(LEX a))
                               ^^^^^^^^^
          ---> intones ----> NIL
<credit> -<
          ---> features ---> ((HAccent -)(CN Unstress)(CAT Adj)(LEX credit))
                               ^^^^^^^^^  ^^^^^^^^^^^
          ---> intones ----> NIL
<card> ---<
          ---> features ---> ((CN Stress)(NAccent ++)(CAT Noun)(LEX card)(Focus ++))
                               ^^^^^^^^^
Comparing the two WordStreams, it can be seen that the one generated from HLP input contains far more accurate features than the one generated from Text. This is a direct result of the richer information provided by HLP input.
The hlp_rephrase function calls the hlp_phrase_flatten function, which in turn calls the hlp_remove_empty_phrase function, which then calls the hlp_rebuild_phrase function. The last three functions cycle as often as required.
The hlp_rephrase function operates on the SphraseStream (`S' stands for `Syntax') and performs three tasks. The SphraseStream from the HLP input of the current example is
(((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
 (((NAccent ++) (CAT NP) (LEX you) (Focus ++)))
 (((CAT VP))
  (((HAccent -) (CAT Aux) (LEX can)))
  (((HAccent +) (CAT V) (LEX pay)))
  (((CAT PP))
   (((HAccent -) (PhraseLevel :C) (CAT Prep) (LEX for)))
   (((CAT NP))
    (((HAccent -) (CAT Det) (LEX the)))
    (((NAccent ++) (CAT Noun) (LEX hotel) (Focus ++)))))
  (((CAT PP))
   (((HAccent -) (PhraseLevel :C) (CAT Prep) (LEX with)))
   (((CAT NP))
    (((HAccent -) (CAT Det) (LEX a)))
    (((HAccent -) (CN Unstress) (CAT Adj) (LEX credit)))
    (((CN Stress) (NAccent ++) (CAT Noun) (LEX card) (Focus ++)))))))
The hlp_phrase_flatten function deletes the HLP phrase nodes (viz. `(CAT NP)', `(CAT VP)' or `(CAT PP)'), since they have served their purpose and are no longer useful. If the HLP input is viewed as a tree whose leaves are words, this function puts all the leaves at the same level. The `tree' becomes
(((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
 ((NAccent ++) (CAT NP) (LEX you) (Focus ++))
 ((HAccent -) (CAT Aux) (LEX can))
 ((HAccent +) (CAT V) (LEX pay))
 ((PhraseLevel :C))
 ((HAccent -) (CAT Prep) (LEX for))
 ((HAccent -) (CAT Det) (LEX the))
 ((NAccent ++) (CAT Noun) (LEX hotel) (Focus ++))
 ((PhraseLevel :C))
 ((HAccent -) (CAT Prep) (LEX with))
 ((HAccent -) (CAT Det) (LEX a))
 ((HAccent -) (CN Unstress) (CAT Adj) (LEX credit))
 ((CN Stress) (NAccent ++) (CAT Noun) (LEX card) (Focus ++)))
The hlp_remove_empty_phrase function cleans the SphraseStream by locating empty phrases and removing them. In the current example there are none, so nothing changes.
The hlp_rebuild_phrase function rebuilds the SphraseStream into tree form by extracting the `PhraseLevel' features and making nodes of them. For HLP input the SphraseStream becomes
((((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
  (((NAccent ++) (CAT NP) (LEX you) (Focus ++)))
  (((HAccent -) (CAT Aux) (LEX can)))
  (((HAccent +) (CAT V) (LEX pay)))
  (((PhraseLevel :C))
   (((HAccent -) (CAT Prep) (LEX for)))
   (((HAccent -) (CAT Det) (LEX the)))
   (((NAccent ++) (CAT Noun) (LEX hotel) (Focus ++))))
  (((PhraseLevel :C))
   (((HAccent -) (CAT Prep) (LEX with)))
   (((HAccent -) (CAT Det) (LEX a)))
   (((HAccent -) (CN Unstress) (CAT Adj) (LEX credit)))
   (((CN Stress) (NAccent ++) (CAT Noun) (LEX card) (Focus ++))))))
For Text input (already having a flat HLP tree), the SphraseStream changes to
((((PitchRange two) (Start 0.0) (PhraseLevel :S) (CAT S) (IFT Statement))
  (((HAccent +) (LEX you)))
  (((HAccent -) (LEX can)))
  (((HAccent +) (LEX pay)))
  (((PhraseLevel :C))
   (((HAccent -) (LEX for)))
   (((HAccent -) (LEX the)))
   (((HAccent +) (LEX hotel))))
  (((PhraseLevel :C))
   (((HAccent -) (LEX with)))
   (((HAccent -) (LEX a)))
   (((HAccent +) (LEX credit)))
   (((HAccent +) (LEX card))))))
The add_boundaries function is in file
$CHATR_ROOT/src/lex
This function calls two further functions, find_left_boundary and find_right_boundary.
The purpose of these functions is to locate and mark the left and right boundaries of each word. Remember that speech will eventually be formed by concatenating phonemes into words, with spaces (silences) between them. So not only is the position of each break noted; a value is also assigned which indicates the space to be allocated later between those words. The values are based on the break indices already determined by hlp_phr_module, adjusted as follows: a break index of 1 becomes a boundary value of 0, and a break index of 4 becomes a boundary value of 2. In case of conflict the highest value is chosen. The left boundary of the first word and the right boundary of the last word are set to 4.
The boundary values for the WordStream of the present example are
4 you 0 0 can 0 0 pay 2 2 for 0 0 the 0 0 hotel 2 2 with 0 0 a 0 0 credit 0 0 card 4
Boundary values are kept in the left_boundary and right_boundary fields of each word.
The hlp_realise_accents function calls the hlp_apply_patterns function, which in turn calls the hlp_apply_pattern function. This function cycles as often as necessary and calls the function hlp_apply_actions, which also cycles. Finally, the hlp_apply_actions function calls hlp_apply_simple_actions.
The hlp_realise_accents function applies the pattern rules stored in the HLP_Patterns variable. These rules take the form
(Statement (START ) (HAccent (+ (H*)) (++ (L+H*))) (PHRASE (H-)) (TAIL (L-L%)))
(YNQuestion (START ) (HAccent (+ (L*))) (TAIL (H-H%)))
(Question (START ) (HAccent (+ (L*))) (TAIL (L-L%)))
(* (START) (HAccent (+ (H*))) (PHRASE (H-)) (TAIL (H-L%)))
Some actions, like START, PHRASE or TAIL, are considered special because they concern phrases. These are applied by the hlp_apply_actions function. Others, like HAccent, are said to be simple because they concern words. They are applied by the function hlp_apply_simple_actions.
The current example is a `Statement' utterance type, so the part of the pattern rules that will be used is
(Statement (START ) (HAccent (+ (H*)) (++ (L+H*))) (PHRASE (H-)) (TAIL (L-L%)))
The hlp_realise_accents function is the first to affect the `intones' field of the WordStream. If a word has an `(HAccent +)' feature, an `(H*)' intone is added to its intones field. If it is the last word of a phrase, an `(H-)' intone is also added; the last word of the utterance receives the TAIL tone `(L-L%)' instead.
For Text input, execution of this module changes the WordStream to
          ---> intones ----> ((H*))
                               ^^
<you> ----<
          ---> features ---> ((HAccent +) (LEX you))
          ---> intones ----> NIL
<can> ----<
          ---> features ---> ((HAccent -) (LEX can))
          ---> intones ----> ((H*) (H-))
                               ^^   ^^
<pay> ----<
          ---> features ---> ((HAccent +) (LEX pay))
          ---> intones ----> NIL
<for> ----<
          ---> features ---> ((HAccent -) (PhraseLevel :C) (LEX for))
          ---> intones ----> NIL
<the> ----<
          ---> features ---> ((HAccent -) (LEX the))
          ---> intones ----> ((H*) (H-))
                               ^^   ^^
<hotel> --<
          ---> features ---> ((HAccent +) (LEX hotel))
          ---> intones ----> NIL
<with> ---<
          ---> features ---> ((HAccent -) (PhraseLevel :C) (LEX with))
          ---> intones ----> NIL
<a> ------<
          ---> features ---> ((HAccent -) (LEX a))
          ---> intones ----> ((H*))
                               ^^
<credit> -<
          ---> features ---> ((HAccent +) (LEX credit))
          ---> intones ----> ((H*) (L-L%))
                               ^^   ^^^^
<card> ---<
          ---> features ---> ((HAccent +) (LEX card))
For HLP input, the WordStream becomes
          ---> intones ----> NIL
<you> ----<
          ---> features ---> ((NAccent ++) (CAT NP) (LEX you) (Focus ++))
          ---> intones ----> NIL
<can> ----<
          ---> features ---> ((HAccent -) (CAT Aux) (LEX can))
          ---> intones ----> ((H*) (H-))
                               ^^   ^^
<pay> ----<
          ---> features ---> ((HAccent +) (CAT Verb) (LEX pay))
          ---> intones ----> NIL
<for> ----<
          ---> features ---> ((HAccent -) (PhraseLevel :C) (CAT Prep) (LEX for))
          ---> intones ----> NIL
<the> ----<
          ---> features ---> ((HAccent -) (CAT Det) (LEX the))
          ---> intones ----> ((H-))
                               ^^
<hotel> --<
          ---> features ---> ((NAccent ++) (CAT Noun) (LEX hotel) (Focus ++))
          ---> intones ----> NIL
<with> ---<
          ---> features ---> ((HAccent -) (PhraseLevel :C) (CAT Prep) (LEX with))
          ---> intones ----> NIL
<a> ------<
          ---> features ---> ((HAccent -) (CAT Det) (LEX a))
          ---> intones ----> NIL
<credit> -<
          ---> features ---> ((HAccent -) (CN Unstress) (CAT Adj) (LEX credit))
          ---> intones ----> ((L-L%))
                               ^^^^
<card> ---<
          ---> features ---> ((CN Stress) (NAccent ++) (CAT Noun) (LEX card) (Focus ++))
The `word' module is in file
$CHATR_ROOT/src/lex/word.c
word_module calls two functions, add_boundaries and add_intonation, and two modules, lexicon_module and reduce_module.
The add_intonation function is in file
$CHATR_ROOT/src/intonation/intonation.c
The `reduce' module is in file
$CHATR_ROOT/src/lex/reduce.c
This module creates three streams. Appropriate cells of each of these streams are linked to those of the WordStream.
For text or HLP input, the add_boundaries function has already been called once by hlp_module. See section The add_boundaries Function for a description.
lexicon_module calls three functions, lex_lookup, add_syllables and add_phonemes. This module consults the lexicon for each word of the WordStream. The lexicon contains all the words CHATR can utter, together with their decomposition into syllables and phonemes. The SylStream and PhoneStream are built from this information. The SylStream associated with the current example is
(<y uu> --------- lex_stress ------ (0)
 <k @ n> -------- lex_stress ------ (0)
 <p ei> --------- lex_stress ------ (1)
 <f @> ---------- lex_stress ------ (0)
 <dh @> --------- lex_stress ------ (0)
 <h ou> --------- lex_stress ------ (0)
 <t e l> -------- lex_stress ------ (1)
 <w i th> ------- lex_stress ------ (0)
 < @ > ---------- lex_stress ------ (0)
 <k r e> -------- lex_stress ------ (1)
 <d i t> -------- lex_stress ------ (0)
 <k aa d>) ------ lex_stress ------ (1)
[(0) represents `unstressed', (1) `stressed']
The add_intonation function calls the function make_intonation_cell. These functions build the IntoneStream from information in the intones field of the WordStream. Depending on the input type, the intones fields have been filled in different ways, resulting in three quite different IntoneStreams.
For text input, the IntoneStream will be
(<you>    <=====================> (<H*>
 <can>     ====================>   <H*>
 <pay>    <
           ====================>   <H->
 <for>
 <the>     ====================>   <H*>
 <hotel>  <
           ====================>   <H->
 <with>
 <a>
 <credit> <=====================>  <H*>
           ====================>   <H*>
 <card>)  <
           ====================>   <L-L%>)
and for PhonoWord input
(<you>    <=====================> (<H*>
 <can>
 <pay>
 <for>
 <the>
 <hotel>  <=====================>  <H*>
 <with>
 <a>
 <credit>  ====================>   <H*>
 <card>)  <
           ====================>   <L-L%>)
finally, for HLP input
(<you>                            (
 <can>     ====================>   <H*>
 <pay>    <
           ====================>   <H->
 <for>
 <the>
 <hotel>  <=====================>  <H->
 <with>
 <a>
 <credit>
 <card>)  <=====================>  <L-L%>)
The HLP input IntoneStream may appear rather sparse, especially considering how much code produced it compared with other modules. This is in fact due to the HLP_Patterns variable: the one used for this example contains no mappings for (NAccent ++) or (HAccent -) features. Had those been added, making the `Statement' part of the variable
(Statement (START ) (HAccent (+ (H*)) (++ (L+H*)) (- (L*))) (NAccent (++ (H+!H*))) (PHRASE (H-)) (TAIL (L-L%)))
the text input IntoneStream would have been
(<you>    <=====================> (<H*>
 <can>    <=====================>  <L*>
           ====================>   <H*>
 <pay>    <
           ====================>   <H->
 <for>    <=====================>  <L*>
 <the>    <=====================>  <L*>
           ====================>   <H*>
 <hotel>  <
           ====================>   <H->
 <with>   <=====================>  <L*>
 <a>      <=====================>  <L*>
 <credit> <=====================>  <H*>
           ====================>   <H*>
 <card>)  <
           ====================>   <L-L%>)
and for HLP input
(<you>    <=====================> (<H+!H*>
 <can>    <=====================>  <L*>
           ====================>   <H*>
 <pay>    <
           ====================>   <H->
 <for>    <=====================>  <L*>
 <the>    <=====================>  <L*>
 <hotel>  <=====================>  <H->
 <with>   <=====================>  <L*>
 <a>      <=====================>  <L*>
 <credit> <=====================>  <L*>
 <card>)  <=====================>  <L-L%>)
So, for little improvement in high intonation, the modified variable has introduced a lot of low-intonation `clutter'. With regard to HLP input, although `hotel' and `card' are tagged (Focus ++) like `you', there is no (H+!H*) intone cell aligned with these words. This is because CHATR accepts only one (Focus ++) marked word per sentence; subsequent occurrences are ignored. This part of the code could of course easily be changed in the hlp_realise_accents function. For text input, the IntoneStream is affected further: too many (H*) intones have been added (usually on every noun, pronoun and verb), so the important words are hidden among less important ones.
With HLP input, it is not arduous for the user to mark the important words (where this is found necessary) by adding a (Focus ++) label. If (Focus ++) is mapped to an (H+!H*) accent, these words will sound different.
For PhonoWord input, the user is expected to supply all the accents; CHATR will not add new ones. While it may be quick to add accents like (H*) or (H+!H*) to important words (usually few in number), it rapidly becomes tedious to add accents like (L*) to unimportant words (usually numerous).
As far as the IntoneStream is concerned, HLP input is clearly the best method. It automatically finds the accents for each word, and gives the user the ability to make changes, for instance to indicate words that need to be focused. The only drawback is that an HLP input is quite long to write.
reduce_module calls two functions, contract_word and reduce_syls.
This module detects and performs contractions. For the grammatically challenged, this means it turns "would have" into "would've" or "he is" into "he's". A word must satisfy several criteria to be contracted: it must be in a list of contractable words (contained in the `contract_words' variable), such as have, has, are, am, would, etc.; both its left and right boundaries must be zero; and it must have no intone. If these criteria are met, the word is removed and the phoneme it cross-references in the `contract_words' variable is added to the previous word. The contents of, and links between, the streams are of course modified too.
The `phonology' module is in file
$CHATR_ROOT/src/phoneme/phonology.c
The module phonology_module calls three functions: fill_phoneme, phone_to_segment and phrase_pause.
Depending on the pause prediction method selected, the last function calls either pp_disctree or insert_phrase_pause. If called, insert_phrase_pause further calls insert_pause.
The insert_phrase_pause and insert_pause functions are in file
$CHATR_ROOT/src/intonation/phrase_int.c
This module affects two streams.
The fill_phoneme function fills in the features of the PhoneStream previously created by the `word' module. See section The word Module for details of its creation.
The phone_to_segment function builds the SegStream.
The phrase_pause function inserts silence segments where needed, according to the chosen pause prediction method. If pp_disctree is selected, a silence segment is inserted after every comma, colon, question mark or full stop (period), provided the following phrase contains at least one stressed syllable. If insert_phrase_pause is selected, a silence segment is inserted at every phrase break (PhraseLevel :C).
For new or basic users, the insert_phrase_pause method is recommended as a starting point, since it makes use of work done by previous modules.
The `intone' module is in file
$CHATR_ROOT/src/intonation/intonation.c
intone_module calls the function tobi_intonation. This function is in file
$CHATR_ROOT/src/intonation/ToBI.c
This module fills the IntoneStream. The method currently used is the ToBI intonation method (H*, L*, H-, ...). For each syllable of the SylStream, tobi_intonation predicts pitch accents (H*, L*, ...), phrase accents (H-, L-, ...) and boundary tones (L%, ...).
However, for text or HLP input it is recommended that this module be either bypassed or at least run with pitch accent prediction switched off. The reason is that the HLP module (see section The hlp Module) has already supplied sufficient intonation information; more is unnecessary and possibly counterproductive.
For PhonoWord input this module is useful. It supplies further information about phrase accents and boundaries. Using the current example, the WordStream becomes
(<you> -------- intones ------ ((H*))
 <can>
 <pay> -------- intones ------ ((H-))
 <for>                          ^^^^
 <the>
 <hotel> ------ intones ------ ((H-))((H*))
 <with>                         ^^^^
 <a>
 <credit>
 <card>) ------ intones ------ ((H*)(L-L%))
and the IntoneStream
(<you>    <=====================> (<H*>
 <can>
 <pay>    <=====================>  <H->
 <for>
 <the>     ====================>   <H*>
 <hotel>  <
           ====================>   <H->
 <with>                            ^^^^
 <a>
 <credit>  ====================>   <H*>
 <card>)  <
           ====================>   <L-L%>)
Note that intone cells already present, such as the `<L-L%>' on `card', are kept; this module adds further cells alongside them.
The `duration' module is in file
$CHATR_ROOT/src/duration/duration.c
duration_module
calls three functions, lr_dur,
dur_mark_all_segs
and dur_mark_all_syls. The function
lr_dur
calls two further functions, lrd_segs_durations
and lrd_pause_durations. The latter in turn calls
pause_duration.
The `duration' module determines the timing of the
constituents of an utterance. The lrd_segs_durations
function
determines the durations of the segments, and
lrd_pause_durations
determines those of the pauses, in
accordance with the PhraseLevel type. The dur_mark_all_segs
function marks the absolute start time of each segment, and
dur_mark_all_syls
does the same for syllables.
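Marking absolute starts amounts to a running sum of durations, with each syllable starting at its first segment. The sketch below assumes durations in whole milliseconds and a list of first-segment indices for the syllables; the function names are hypothetical, and CHATR's own dur_mark_all_segs and dur_mark_all_syls operate on its internal streams instead.

```c
#include <stddef.h>

/* start_ms[i] = t0 + dur_ms[0] + ... + dur_ms[i-1] */
void mark_seg_starts(const int *dur_ms, size_t n, int t0, int *start_ms) {
    int t = t0;
    for (size_t i = 0; i < n; i++) {
        start_ms[i] = t;
        t += dur_ms[i];
    }
}

/* syl_first[j] is the index of the first segment of syllable j;
   a syllable starts where its first segment starts. */
void mark_syl_starts(const int *seg_start_ms, const size_t *syl_first,
                     size_t n_syls, int *syl_start_ms) {
    for (size_t j = 0; j < n_syls; j++)
        syl_start_ms[j] = seg_start_ms[syl_first[j]];
}
```

With durations of 66, 52, 119, 39, 41, 102 and 119 ms for y, uu, k, @, n, p and ei (the differences between consecutive PhonoWord start times in the table that follows), the segment starts come out as 0, 66, 118, 237, 276, 317 and 419 ms, and the syllables (y uu), (k @ n) and (p ei) start at 0, 118 and 317 ms.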
For the current example, the segments and their resulting absolute start times (in ms) for each input method considered are
Segs   PhonoWord   Text    HLP
y              0      0      0
uu            66     64     61
k            118    117     89
@            237    235    208
n            276    274    247
p            317    315    288
ei           419    423    396
#            538    565    538
f            638    665    638
@            737    764    737
dh           790    817    790
@            837    864    837
h            893    920    893
ou           985   1012    979
t           1111   1138   1083
e           1205   1232   1169
l           1333   1360   1278
#           1421   1448   1361
w           1521   1548   1461
i
th
@           1664   1691   1604
k           1716   1743   1656
r
e
d           1926   1987   1866
i
t
k           2112   2198   2052
aa
d
[`#' indicates a pause]
The syllables and their absolute start times (again in ms) are
Syllables  PhonoWord   Text    HLP
(y uu)             0      0      0
(k @ n)          118    117     89
(p ei)           317    315    288
(f @)            638    665    638
(dh @)           790    817    790
(h ou)           893    920    893
(t e l)         1111   1138   1083
(w i dh)        1521   1548   1461
(@)             1664   1691   1604
(k r e)         1716   1743   1656
(d i t)         1926   1987   1866
(k aa d)        2112   2198   2052
Since prosody is different for each type of input, durations are correspondingly different.
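Reading the segment table in reverse, durations can be recovered by differencing consecutive start times. Doing so shows, for example, that each `#' pause lasts 100 ms in all three columns (538 to 638, 565 to 665 and 538 to 638 for the first pause; 1421 to 1521, 1448 to 1548 and 1361 to 1461 for the second). The helper below is a hypothetical illustration, not a CHATR function; since some rows of the table have no values, it only applies to runs of adjacent rows that all have start times.

```c
#include <stddef.h>

/* dur_ms[i] = start_ms[i+1] - start_ms[i] for i < n-1.
   The duration of the last segment cannot be recovered from starts
   alone. Returns the number of durations written (n-1, or 0 if n < 2). */
size_t durations_from_starts(const int *start_ms, size_t n, int *dur_ms) {
    if (n < 2) return 0;
    for (size_t i = 0; i + 1 < n; i++)
        dur_ms[i] = start_ms[i + 1] - start_ms[i];
    return n - 1;
}
```

Applied to the Text column from y to f, it gives durations of 64, 53, 118, 39, 41, 108, 142 and 100 ms; the final 100 ms is the first pause.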
The `int_target' module is in file
$CHATR_ROOT/src/intonation/intonation.c
The module int_target_module
calls the function
tobi_make_targets, which in turn calls
tobi_make_targets_lr. This function calls three further
functions, lr_predict, f0_normalize
and
tobi_add_target_seg.
The tobi_make_targets
and tobi_make_targets_lr
functions are in file
$CHATR_ROOT/src/intonation/ToBI.c
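lr_predict, named in the call chain above, is not described further here. As an illustration only, a linear-regression predictor evaluates a weighted sum of input features plus an intercept. The LRModel structure, the feature encoding and the name lr_predict_sketch are assumptions, not CHATR's actual model or coefficients.

```c
#include <stddef.h>

/* Hypothetical linear-regression model:
   y = intercept + weights[0]*x[0] + ... + weights[n-1]*x[n-1] */
typedef struct {
    double intercept;
    const double *weights;
    size_t n_features;
} LRModel;

/* Evaluates the model on one feature vector (e.g. features describing a
   syllable and its ToBI labels; the encoding is an assumption). */
double lr_predict_sketch(const LRModel *m, const double *features) {
    double y = m->intercept;
    for (size_t k = 0; k < m->n_features; k++)
        y += m->weights[k] * features[k];
    return y;
}
```

With made-up coefficients (intercept 120, weights 40 and 10) and features 1.0 and 0.5, the prediction is 120 + 40 + 5 = 165.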
The int_target_module
function determines the f0 targets for
each syllable of the SylStream. The method presently used is linear
regression for ToBI intonation. Three (normalized) f0 values are
given for each syllable by the function tobi_make_targets_lr. The
first is for the first segment of the syllable, the second for the
nucleus segment (the vowel) and the third for the last segment. The
f0 values for the current example are
              PhonoWord       Text        HLP
* (y uu)     196.917068  192.78118  176.45158  (syllable; f0 before normalization)
  y          142.290955  140.17704  131.83081  (1st segment + f0 normalized)
             256.088196  252.19085  205.92985
  uu         172.533966  170.54199  146.89747  (nucleus segment)
             242.849289  239.84178  209.83558
  uu         165.767410  164.23024  148.89373  (last segment)
* (k @ n)    204.067780  201.49116  173.36627
  k          145.945755  144.62881  130.25387
             228.281631  228.18071  215.93901
  @          158.321716  158.27014  152.01327
             213.495285  231.47927  225.05279
  n          150.764252  159.95608  156.67143
* (p ei)     192.711884  209.04147  201.94967
  p          140.141632  148.48786  144.86317
             181.688629  227.94963  229.74664
  ei         134.507523  158.15203  159.07051
             161.305099  191.31132  196.99272
  ei         124.089272  139.42578  142.32962
* (f @)      169.542007  199.16011  201.15170
  f          128.299255  143.43739  144.45532
             151.233627  170.03631  171.95320
  @          118.941635  128.55189  129.53163
             151.086487  161.10058  164.70037
  @          118.866425  123.98474  125.82463
* (dh @)     151.363708  164.91391  171.62640
  dh         119.008118  125.93377  129.36460
             156.121429  165.93020  169.92852
  @          121.439842  126.45321  128.49679
             173.493652  187.45507  172.47857
  @          130.318985  137.45481  129.80015
* (h ou)     171.729584  190.19567  176.44268
  h          129.417343  138.85557  131.82626
             210.256027  230.74173  184.58163
  ou         149.108643  159.57910  135.98617
             218.643097  226.80609  178.81588
  ou         153.395355  157.56755  133.03923
* (t e l)    223.215790  227.31828  178.08351
  t          155.732513  157.82934  179.66101
  e          165.714127  163.17337  133.47117
             204.412720  193.96093  163.17901
  l          146.122055  140.78002  125.04705
* (w i dh)   202.834274  196.86778  163.32327
  w          145.315292  142.26574  125.12078
             165.008423  156.45224  141.01254
  i          125.982086  121.60892  113.71752
             159.509415  153.40560  150.44239
  dh         123.171478  120.05175  118.53722
* (@)        162.552399  155.83990  151.50260
  @          124.726784  121.29595  119.07910
             155.529129  151.53080  150.36933
  @          121.137108  119.09352  118.49988
             157.041901  172.01840  155.39329
  @          121.910309  129.56495  121.06768
* (k r e)    163.318283  177.07127  163.75167
  k          125.118233  132.14753  125.33974
             137.365234  183.52533  139.22203
  e          111.853340  135.44627  112.80236
             137.160309  185.15049  137.73902
  e          111.748604  136.27691  112.04439
* (d i t)    134.786163  186.64566  133.66029
  d          110.535149  137.04110  109.95970
             121.638107  184.22201  119.69841
  i          103.815033  135.80236  102.82363
             126.368195  167.65847  105.95538
  t          106.232635  127.33655  95.799423
* (k aa d)   145.771805  190.14331  125.73971
  k          116.150032  138.82879  105.91140
             119.275711  136.78218  70.974113
  aa         102.607590  111.55533  77.920105
              52.435909  65.044205  20.000900
  d           68.445023  74.889259  51.867126
As with durations, the f0 values differ for each type of input, since each input type produces a different prosody.
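The placement of the three targets (first segment, nucleus, last segment) can be sketched as follows. This sketch assumes the nucleus is the first vowel of the syllable and recognizes only the vowel symbols that occur in this example; Target, is_vowel and syllable_targets are hypothetical names, and the f0 values are taken as already predicted and normalized.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Vowel symbols occurring in this example only; a real phone set is larger. */
static bool is_vowel(const char *seg) {
    static const char *vowels[] = {"uu", "@", "ei", "ou", "e", "i", "aa"};
    for (size_t k = 0; k < sizeof vowels / sizeof vowels[0]; k++)
        if (strcmp(seg, vowels[k]) == 0) return true;
    return false;
}

typedef struct {
    size_t seg;  /* index of the segment within the syllable */
    double f0;   /* target value */
} Target;

/* Fills t[0..2] with targets on the first, nucleus and last segments of a
   syllable of n segments. Returns false if the syllable is empty or has
   no vowel. Assumption: the nucleus is the first vowel. */
bool syllable_targets(const char *const *segs, size_t n,
                      const double f0[3], Target t[3]) {
    size_t nucleus = n;
    for (size_t i = 0; i < n; i++)
        if (is_vowel(segs[i])) { nucleus = i; break; }
    if (n == 0 || nucleus == n) return false;
    t[0] = (Target){0, f0[0]};
    t[1] = (Target){nucleus, f0[1]};
    t[2] = (Target){n - 1, f0[2]};
    return true;
}
```

For (k r e) this places the targets on k, e and e, and for the single-segment syllable (@) all three fall on @, matching the segment labels in the f0 table.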
Now that values of duration and f0 have been determined, f0 contours can be built.