Although audio output is the main purpose of a speech synthesis system, seeing a graphical representation of an utterance can be useful during development or research. CHATR supports such displays. Rather than building a complete graphics facility directly into CHATR, software has been written to interface with existing graphics packages.
Note that color-hungry applications, such as some Internet browsers, may leave too few resources for these graphics packages to run efficiently, or at all. If this occurs, the simplest solution is to exit those applications while using the CHATR graphics.
XWAVES is the name of a waveform display package developed by Entropic. It has its own excellent `help' facility, so only the method of invocation will be described here.
XWAVES is started from within CHATR using the command
(Display Open mode)
where `mode' is one of
XWAVES
XWAVES+
XWAVES2
Initially two windows will open, a `Signal Display' control window and a `Miscellaneous Controls' panel. Broadly speaking, the former is used to determine what gets displayed while the latter determines how it is displayed. `Help' may be obtained by clicking on the `xwaves MANUAL' button in the `Signal Display' window. Note that no waveform or stream display will take place yet.
Once an utterance has been defined and synthesized, it may be displayed using the command
(Display utt1)
Several windows will now open, displaying various waveforms and streams of the specified utterance. The number and type are dependent on the display `mode' selected, as described earlier.
If no argument is given with the `Display' command, the last synthesized utterance will be displayed. If no utterance exists, a non-fatal error message to that effect is returned. If an utterance has been changed but not re-synthesized, a `No currently generated wave' error message is returned.
To finish with XWAVES, click on the `QUIT!' button in the `Signal Display' window. Future versions of CHATR may support use of the `Display Close' command, a function not currently implemented.
A library file that sets up appropriate paths, etc., may be loaded before using XWAVES. An example can be found at `$CHATR_ROOT/lib/data/xwaves.ch'. To use this facility, issue the command
(load_library "xwaves.ch")
This line may of course be added to your `.chatrrc' file.
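The steps described so far can be collected into a short session. The following is only a sketch: it assumes that `utt1' has already been defined and synthesized, that `XWAVES' is the desired display mode, and that CHATR accepts Lisp-style `;' comments (an assumption; remove the comment lines if it does not).

```lisp
; Sketch of an XWAVES session, using only commands described above.
; Assumes utt1 has already been defined and synthesized.

; Set up paths, etc. (may instead be placed in .chatrrc)
(load_library "xwaves.ch")

; Open the `Signal Display' and `Miscellaneous Controls' windows
(Display Open XWAVES)

; Display the named utterance
(Display utt1)

; With no argument, the last synthesized utterance is displayed
(Display)
```

To finish, click on the `QUIT!' button in the `Signal Display' window, since `Display Close' is not currently implemented.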
By default, the parts of an utterance that are displayed are determined by the choice of `Display Method'. However, an optional second argument to the `Display' command allows the user to select what is displayed. This argument may be a single atom, any one of
wave f0 segment word intone unit
or a list of these, for example
(Display utt1 (wave intone segment))
The `unit' argument will display marks for the selected units, together with the name of the database file from which each was selected.
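A few further forms of the second argument are sketched below. They use only the atoms listed above, and assume that `utt1' has been synthesized and that CHATR accepts Lisp-style `;' comments (an assumption).

```lisp
; Single atom: display only the F0 contour
(Display utt1 f0)

; List of atoms: waveform together with the Word Stream
(Display utt1 (wave word))

; Waveform with both Segment and Unit Streams, for comparing alignment
(Display utt1 (wave segment unit))
```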
As the selected units will in general not be the same length as the targets specified in the Segment Stream, the Unit Stream and Segment Stream will not line up with each other. If no signal processing is done on the waveform after unit selection (i.e. the `DUMB' concatenation method), the Unit Stream will line up with the waveform while the Segment Stream will not. If signal processing is done (e.g. `PS_PSOLA' or `NUUCEP'), the Segment Stream will (generally) line up with the waveform, while the Unit Stream will not. Be aware that all durations are recommendations and not absolutes, so boundaries may often not be exact.
XMG is a graphic display system developed in the Centre for Speech Technology Research at the University of Edinburgh. It is available free of charge at the time of writing; see the section Glossary of Terms and Acronyms for the URL. A `help' facility is included, so only the method of invocation will be described here.
XMG is started from within CHATR using the command
(Display Open XMG)
Two windows will now open, a `command' window and a display called `graph0'. Note that no waveform or stream display will yet take place.
Once an utterance has been defined and synthesized, it may be displayed using the command
(Display utt1)
The features displayed are the Wave, Target F0, Segment Stream, Word Stream and elapsed time.
In server mode, XMG is started using the command `xmg -server'.
To send additional commands to the display server, the command `Display Command' is used. This command does not evaluate any of its arguments. As an example, to start a new window issue the command
(Display Command new)
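Put together, an XMG session might look like the sketch below. It uses only the commands given above, and assumes that `utt1' has already been defined and synthesized and that CHATR accepts Lisp-style `;' comments (an assumption).

```lisp
; Open the XMG `command' window and the `graph0' display
(Display Open XMG)

; Show Wave, Target F0, Segment Stream, Word Stream and elapsed time
(Display utt1)

; Pass a command straight to the display server (arguments are
; not evaluated); here, start a new window
(Display Command new)
```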
An X-windows program called Inspect is included within CHATR, which enables users to examine the internal format of an utterance. It is not graphical as such but should not be dismissed lightly for that; for any utterance, all existing streams, their cells and cell contents are shown textually. By this means much information which would normally require searching of databases and opening of many files can be instantly displayed at the click of a window `button'.
With advanced versions of Inspect, cell contents are displayed superimposed on buttons. By merely clicking on these buttons, users may change CHATR-selected units to an alternative in the candidate list and have the utterance re-synthesized and played, either phoneme-by-phoneme or in its entirety.
Inspect has evolved through four versions during development. For research and support purposes, earlier versions of CHATR have several of the variations available. As a result, each is invoked by a slightly different command. In general, the latest versions of CHATR use solely the latest version of Inspect that was available at the time of release, started by the most basic command. This should become clearer in the rest of this chapter.
Features of each version and method of use will now be explained.
Inspect version 1 is fairly basic and is kept only for compatibility with previous versions of CHATR. It is not available in CHATR version 0.92 or above.
With CHATR version 0.91 or below, Inspect is called using the command
(Inspect utt1)
If no argument is given, the most recently synthesized utterance is displayed. If no synthesis has taken place, just the utterance will be displayed.
A command window named `xchatr' will now open containing `buttons' representing streams. Clicking on a particular button opens another window which displays that stream. If the stream is made up of a series of concatenated elements, each will be represented by another button which must be clicked to view details of that particular element.
A command line in the command window allows selection and loading of a different but already synthesized utterance.
To finish with Inspect version 1, click on the `Quit' button in the command window.
Inspect version 2 is not available in CHATR version 0.92 or later.
With CHATR version 0.91 or below, it is called using the command
(Inspect2 utt1)
If no argument is given, the most recently synthesized utterance is displayed.
A window named `Inspect2' will now open to display the WordStream, the PhonemeStream and the directory path of the database from which units were selected. All elements are superimposed on `buttons'. Clicking on an individual word gives a list of the phonemes making up that word, plus details of the chosen unit, index number, start/stop timings and selection/joint costs. Clicking on a phoneme gives a list of unit candidates with similar information for that particular phoneme. Clicking on the `wave file' button plays that portion of the original corpus recording (i.e. the .wav file), while clicking on the `start-end' button plays that particular phoneme. The `listen' button at the top of the window (the one next to the `next' button) plays the utterance.
Perhaps the most powerful function of all is available when a list of unit candidates is being shown. By clicking on a candidate phoneme button, that unit may be selected from the candidate list. Clicking the `cat' button causes the utterance to be re-synthesized with that new unit in place. The two `listen' buttons may be clicked to hear new and original versions. This whole process may be repeated indefinitely. The results can be saved by clicking the `save' button.
To finish with Inspect version 2, click on the `Quit' button in the top left-hand corner of the `Inspect2' window.
Inspect version 3 is only available in CHATR versions 0.91 and 0.92.
With CHATR version 0.91 it is called using the command
(Inspect3 utt1)
With CHATR version 0.92 it is called using the command
(Inspect utt1)
If no argument is given, the most recently synthesized utterance is displayed.
A window named `inspect.tcl' will now open which displays utterance phoneme, unit cost, joint cost, index, wave file directory path, start/length and F0, in vertical columns. A scroll-bar on the right allows viewing of the whole utterance. All elements of each stream are in the form of clickable buttons. Clicking on a phoneme button will cause another window to open which displays the unit candidate list, and other details as in the previous window. Clicking on the `wave file' button plays that portion of the original corpus recording, while clicking on the `start/length' button plays that particular phoneme.
To finish with Inspect version 3, click on the `quit' button in the top right-hand corner of the `inspect.tcl' window.
Inspect version 4 is only available in CHATR versions 0.93 and above.
It is called using the command
(Inspect utt1)
If no argument is given, the most recently synthesized utterance is displayed.
A window named `ch inspect', containing three sub-windows, will now open. The middle window displays utterance phoneme, unit cost, joint cost, wave-file name, index, start time, duration, pitch, and target duration, in vertical columns. A scroll-bar on the right allows viewing of the whole utterance. The phoneme, wave file name, start time, and duration columns are in the form of clickable buttons. Clicking on a `phoneme' button will cause the unit candidate list with the same details as in the middle window to appear in the lower window. Clicking on a `wave-file' button results in that portion of the original corpus recording (i.e. the .wav file) being played. Clicking on a `duration' button results in the associated phoneme being played. Clicking on a `start-time' button results in the opening of another window. This displays the contents of the .wav-file in the form of elapsed time, color and phonemes. The currently selected phoneme is highlighted.
The most powerful function is initiated by clicking on a phoneme from the candidate list in the lower window. That phoneme will then replace the CHATR-selected one in the middle window. A number in the first column of the middle window indicates which was selected. By clicking on the `cat units & play' button in the first window, the chosen phoneme(s) may be used in the original utterance. The `replay' and `say' buttons may be clicked to hear new and original versions respectively. Finally, clicking the `Save modified unit labels' button in the first window enables the new version of the utterance to be saved for future use.
To finish with Inspect version 4, click on the `quit' button in the top right-hand corner of the `ch inspect' window.
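Because the same command name starts different versions of Inspect depending on the CHATR release, the invocations described in this chapter are gathered below for reference. This is only a summary sketch; it assumes that `utt1' has been synthesized and that CHATR accepts Lisp-style `;' comments (an assumption).

```lisp
; CHATR version 0.91 or below
; Inspect version 1
(Inspect utt1)
; Inspect version 2
(Inspect2 utt1)

; CHATR version 0.91
; Inspect version 3
(Inspect3 utt1)

; CHATR version 0.92
; Inspect version 3
(Inspect utt1)

; CHATR version 0.93 and above
; Inspect version 4
(Inspect utt1)
```

In every case, omitting the argument displays the most recently synthesized utterance.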