how to make a CHATR database

record some speech (about an hour is probably enough)

align the speech and the text (at the phoneme level)

extract the prosodic features (loudness, timing, and intonation)

build an index (pointers from features to waveform segments)

train some weights for selection (to help find the best segments)

we've done it about 75 times already!
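
a rough sketch of the last two steps (the index and the weighted selection), in Python. this is an illustration only, not CHATR's actual code: the `Unit` record, the `build_index`, `cost`, and `select` functions, the toy feature values, and the hand-set weights are all assumptions made for the example. in a real database the weights would come out of the training step rather than being typed in.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One phoneme-sized segment of the recording (hypothetical layout)."""
    phoneme: str
    start: float     # seconds into the waveform
    end: float       # seconds into the waveform
    loudness: float  # e.g. normalized energy
    pitch: float     # e.g. mean F0 in Hz (intonation)

    @property
    def duration(self) -> float:
        # timing feature, derived from the alignment
        return self.end - self.start


def build_index(units):
    """Map each phoneme label to the segments that realize it."""
    index = defaultdict(list)
    for unit in units:
        index[unit.phoneme].append(unit)
    return index


def cost(unit, target, weights):
    """Weighted sum of absolute feature differences (lower is better)."""
    return sum(w * abs(getattr(unit, name) - target[name])
               for name, w in weights.items())


def select(index, phoneme, target, weights):
    """Return the stored segment closest to the target prosody, or None."""
    candidates = index.get(phoneme, [])
    if not candidates:
        return None
    return min(candidates, key=lambda u: cost(u, target, weights))


# toy data: two candidate segments for "a", one for "t"
units = [
    Unit("a", 0.10, 0.18, loudness=0.6, pitch=120.0),
    Unit("a", 1.40, 1.55, loudness=0.8, pitch=180.0),
    Unit("t", 0.18, 0.23, loudness=0.3, pitch=0.0),
]
index = build_index(units)

# placeholder weights; they also absorb the different feature scales
weights = {"loudness": 1.0, "duration": 10.0, "pitch": 0.01}
target = {"loudness": 0.7, "duration": 0.14, "pitch": 170.0}

best = select(index, "a", target, weights)
print(best.start, best.end)  # the waveform span to copy out
```

the index here is keyed by phoneme label only, and the selection looks at each segment on its own. a fuller version would presumably also score how well neighbouring segments join, which this sketch leaves out.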