CHATR (Generic Speech Synthesis System)

Page for the NYT



               text: "It sounds just like me"

the way we say a sentence depends on small differences in meaning
    - in this example, the different focus (me vs just) can be seen
      from both the waveform and the fundamental frequency contour
      (the latter showing the rise and fall of the voice)


Example 1:

         focus on `me':

human speech: sound (16K aiff)
waveform and fundamental frequency (postscript file)
synthesis sources: sound (16K aiff)


Example 2:

         focus on `just' (early hump in the intonation contour):

human speech: sound (16K aiff)
waveform and fundamental frequency (postscript file)
synthesised version: sound (16K aiff)

Here is the debug trace from chatr for that sentence:
(it shows the waveform segments used to make the utterance,
 and the contexts of each sound)

the sounds are written in computer-readable phonetic notation
    `ih' is the sound /i/ as in `it'
    `aw' is the sound /ou/ as in `house'
the numbers are predicted and actual durations (in milliseconds)
and the lines starting /dept2/work22/ etc are the actual waveform files,
showing the start time and duration of the waveform segment that we use.
the waveform samples include more context than is used in the final synthesis


chatr> (Save UnitLabels+ '-)
; Unit Stream plus
;  (filename start duration num_units
;      (Seg_name source_dur target_dur)
;      ...)
;  ...
(Utterance Unit
 (
  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US015.wav" 16370 87 2
      ( ih 50 51) ;; #iht ; aet#ihtsax
      ( t 37 87) ;; ihts ; t#ihtsaxz
  )
        - from the words "It's a ... " (postscript file)

  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US065.wav" 31562 80 1
      ( s 80 82) ;; tsay ; ehstsayd#
  )
        - from the words "West Side" (postscript file)

  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US089.wav" 10916 106 1
      ( aw 106 62) ;; sawth ; ax#sawthaxm
  )
        - from the words "South America" (postscript file)

  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US031.wav" 31544 129 3
      ( n 56 23) ;; awnd ; npawndzoh
      ( d 30 27) ;; ndz ; pawndzohv
      ( z 42 15) ;; dzoh ; awndzohvm
  )
        - from the words "pounds of" (postscript file)

  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US020.wav" 14305 78 1
      ( jh 78 82) ;; zjhah ; #ihzjhahst
  )
        - from the words "it's just" (postscript file)

  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US125.wav" 5164 197 2
      ( ah 126 123) ;; jhahs ; ohmjhahstih
      ( s 71 71) ;; ahst ; mjhahstihs
  )
        - from the words "from just " (postscript file)

  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US074.wav" 5114 125 2
      ( t 76 60) ;; stl ; tihstlaen
      ( l 49 39) ;; tlae ; ihstlaend
  )
        - from the words "latest land " (postscript file)

  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US025.wav" 5612 60 1
      ( ay 61 90) ;; layk ; ngzlaykdhae
  )
        - from the words "things like that" (postscript file)

  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US046.wav" 37543 13
      ( k 135 173) ;; aykl ; eysayklzaa
  )
        - from the word "cycles" (postscript file)

  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US006.wav" 13510 70 1
      ( m 70 75) ;; kmow ; laykmowst
  )
        - from the words "like most" (postscript file)

  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US007.wav" 2513 327 1
      ( iy 327 310) ;; miy# ; tuwmiy#ays
  )
        - from the words "to me." (postscript file)
 ))

gives a sequence of small waveform sections:
    (postscript file) (sound)
and the final wave is produced by joining the parts of each:
    (postscript file) (sound)
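
for readers who want to experiment with a trace like the one above, here is
a minimal sketch in Python (not part of chatr) of how the unit list might be
read and turned into cut points. it rests on assumptions that this page does
not confirm: that the start field is in milliseconds like the durations, that
the recordings are sampled at 16 kHz, and that the parts are simply cut out
and placed end to end. the names Segment, Unit, units, sample_range, join and
the load callback are made up for the illustration, and the sample trace
holds only the first two entries, one segment per line.

```python
import re
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Segment(NamedTuple):
    name: str        # phone label, e.g. "ih"
    source_ms: int   # first number in the trace (Seg_name source_dur ...)
    target_ms: int   # second number in the trace (... target_dur)


class Unit(NamedTuple):
    filename: str
    start_ms: int     # ASSUMPTION: start is in milliseconds
    duration_ms: int
    segments: List[Segment]


# First two entries of the trace, with one segment per line.
TRACE = '''
(Utterance Unit
 (
  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US015.wav" 16370 87 2
      ( ih 50 51) ;; #iht ; aet#ihtsax
      ( t 37 87) ;; ihts ; t#ihtsaxz
  )
  ("/dept2/work22/pi/data/chatr_dbs/nes/wav/US065.wav" 31562 80 1
      ( s 80 82) ;; tsay ; ehstsayd#
  )
 ))
'''


def tokenize(text: str) -> List[str]:
    """Drop ';' comments, then split into parens, quoted strings and atoms."""
    text = re.sub(r';[^\n]*', '', text)
    return re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', text)


def read(tokens: List[str], pos: int = 0):
    """Read one expression at tokens[pos]; return (value, next position)."""
    tok = tokens[pos]
    if tok == '(':
        items = []
        pos += 1
        while tokens[pos] != ')':
            item, pos = read(tokens, pos)
            items.append(item)
        return items, pos + 1
    if tok.startswith('"'):
        return tok[1:-1], pos + 1
    return (int(tok) if tok.isdigit() else tok), pos + 1


def units(trace: str) -> List[Unit]:
    """Turn (Utterance Unit (...)) into a list of Unit records."""
    tree, _ = read(tokenize(trace))
    # tree is ['Utterance', 'Unit', [unit, unit, ...]];
    # each unit is [filename, start, duration, num_units, segment, ...]
    return [Unit(u[0], u[1], u[2], [Segment(*s) for s in u[4:]])
            for u in tree[2]]


def sample_range(unit: Unit, rate_hz: int = 16000) -> Tuple[int, int]:
    """First and one-past-last sample index (ASSUMPTION: 16 kHz audio)."""
    first = unit.start_ms * rate_hz // 1000
    return first, first + unit.duration_ms * rate_hz // 1000


def join(us: Sequence[Unit], load: Callable[[str], Sequence[int]],
         rate_hz: int = 16000) -> List[int]:
    """Cut each unit out of its source file and place the parts end to end.

    `load` is a hypothetical callback that returns the samples of a file.
    """
    out: List[int] = []
    for u in us:
        first, last = sample_range(u, rate_hz)
        out.extend(load(u.filename)[first:last])
    return out
```

calling units(TRACE) gives two records, and under the assumptions above
sample_range places the first part at samples 261920 to 263312 of US015.wav.
plain end-to-end joining is used here only to show the bookkeeping.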
special thanks to Andy & Kris


(C) Copyright ATR Interpreting Telecommunications Research Labs 1997