TRAMES, 2007, 11(61/56), 2, 284–298


Modelling speech temporal structure for Estonian text-to-speech synthesis: feature selection



Meelis Mihkla


Institute of the Estonian Language, Tallinn


Abstract. The article discusses the principles of selecting features for modelling the temporal structure of Estonian speech for text-to-speech (TTS) synthesis, using different types of texts read aloud. Feature selection is known to depend on certain general factors that regulate the temporal structure of speech, as well as on some language-specific aspects. The durational model of Estonian stands out in that its input includes some foot-bound features (the quantity degree of the foot, the number of feet in the word). In addition to the traditional descriptors of sound context and hierarchical position, the prediction of Estonian segmental durations requires information on some morphological, syntactic and lexical features of the word, such as word form, part of sentence, and part of speech. In predicting pauses in the speech flow, the relevant features are the distance from the beginning of the sentence and from the previous pause, the length and quantity degree of the preceding foot, and the occurrence of a punctuation mark or conjunction. Although feature selection drew on expert opinion, statistical methods should be applied to test the optimal vector of argument features.


Keywords: feature selection, speech timing, segmental durations, pauses, text-to-speech synthesis, feature significance, statistical modelling
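
As an illustration only, the pause-prediction features listed in the abstract could be collected into an input vector along the following lines. This is a minimal sketch and not the author's implementation: the class name, the field names, the choice of syllables as the unit of distance and length, and the coding of the quantity degree as an integer from 1 to 3 are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PauseFeatures:
    """Hypothetical container for the pause-prediction features named in the abstract."""

    dist_from_sentence_start: int   # assumed unit: syllables
    dist_from_previous_pause: int   # assumed unit: syllables
    preceding_foot_length: int      # assumed unit: syllables
    preceding_foot_quantity: int    # quantity degree, assumed to be coded 1, 2 or 3
    punctuation_follows: bool       # a punctuation mark occurs at this position
    conjunction_follows: bool       # a conjunction occurs at this position

    def __post_init__(self) -> None:
        # Reject quantity degrees outside the assumed 1..3 coding.
        if self.preceding_foot_quantity not in (1, 2, 3):
            raise ValueError("quantity degree must be 1, 2 or 3")

    def to_vector(self) -> List[float]:
        """Return the features as a flat numeric vector, booleans as 0.0 or 1.0."""
        return [
            float(self.dist_from_sentence_start),
            float(self.dist_from_previous_pause),
            float(self.preceding_foot_length),
            float(self.preceding_foot_quantity),
            float(self.punctuation_follows),
            float(self.conjunction_follows),
        ]


# Example with made-up values: a candidate pause position 12 syllables into
# the sentence, 5 syllables after the previous pause, following a
# two-syllable foot in the third quantity degree, before a punctuation mark.
example = PauseFeatures(12, 5, 2, 3, True, False)
vector = example.to_vector()
```

A vector of this kind could then be passed to whatever statistical model is used to test the significance of each feature; the model itself is outside the scope of this sketch.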




