Title: |
US5230037: Phonetic Hidden Markov model speech synthesizer

|
Country: |
US United States of America

|
Inventor: |
Giustiniani, Massimo; Rome, Italy
Pierucci, Piero; Rome, Italy

|
Assignee: |
International Business Machines Corporation, Armonk, NY

|
Published / Filed: |
1993-07-20 / 1991-06-07

|
Application Number: |
US1991000716022

|
IPC Code: |
Advanced:
G01L 3/00;
G10L 13/02;
G10L 13/08;
G10L 15/14;
G10L 19/00;
Core:
G10L 13/00;
G10L 15/00;
IPC-7:
G10L 9/02;

|
U.S. Class: |
Current:
704/200;
Original:
395/002;

|
Field of Search: |
381/041-53
395/002

|
Priority Number: |

|
Abstract: |
A method and a system for synthesizing speech from unrestricted text, based on the principle of associating a written string of text with a sequence of speech features vectors that most probably model the corresponding speech utterance. The synthesizer is based on the interaction between two different Ergodic Hidden Markov Models: an acoustic model reflecting the constraints on the acoustic arrangement of speech, and a phonetic model interfacing phonemic transcription to the speech features representation.
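The interaction between the two models described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the three labels, all probabilities, the two-dimensional feature vectors, and the names `viterbi`, `STATES`, `START`, `TRANS`, `EMIT` and `CODEBOOK` are assumptions made up for illustration. A phoneme sequence is decoded into its most probable label sequence through a toy phonetic model (using Viterbi decoding as one possible optimality criterion), and each label is then mapped to a feature vector standing in for the acoustic model.

```python
import math


def viterbi(phonemes, states, start_p, trans_p, emit_p):
    """Return the most probable hidden label path for a phoneme sequence."""
    def log(p):
        # Log space avoids underflow; a zero probability becomes -inf.
        return math.log(p) if p > 0 else float("-inf")

    score = [{s: log(start_p[s]) + log(emit_p[s].get(phonemes[0], 0.0))
              for s in states}]
    back = [{}]
    for t in range(1, len(phonemes)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor of state s at time t.
            prev, best = max(
                ((r, score[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            score[t][s] = best + log(emit_p[s].get(phonemes[t], 0.0))
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(phonemes) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy phonetic model (assumed values): every state can reach every other
# state, which is what makes the model ergodic.
STATES = ["L0", "L1", "L2"]
START = {s: 1 / 3 for s in STATES}
TRANS = {r: {s: (0.4 if r == s else 0.3) for s in STATES} for r in STATES}
EMIT = {
    "L0": {"m": 0.8, "a": 0.1, "s": 0.1},
    "L1": {"m": 0.1, "a": 0.8, "s": 0.1},
    "L2": {"m": 0.1, "a": 0.1, "s": 0.8},
}

# Toy acoustic side (assumed values): one feature vector per label.
CODEBOOK = {"L0": [0.2, -0.5], "L1": [0.9, 0.1], "L2": [-0.3, 0.7]}

labels = viterbi(["m", "a", "s"], STATES, START, TRANS, EMIT)
features = [CODEBOOK[label] for label in labels]
print(labels)    # ['L0', 'L1', 'L2']
print(features)
```

In a real system the feature vectors would come from a trained acoustic model rather than a fixed lookup table, and they would then be converted to filter coefficients for synthesis.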

|
Attorney, Agent or Firm: |
Schechter, Marc D.

|
Primary / Asst. Examiners: |
Fleming, Michael R.; Doerrler, Michelle

|
Maintenance Status: |
E2 Expired

|
Designated Country: |
DE FR GB IT

|
Family: |
7 known family members

|
First Claim (of 10): |
We claim:
1. A method for generating synthesized speech wherein an acoustic ergodic hidden Markov model (AEHMM) reflecting constraints on the acoustic arrangement of speech is correlated to a phonetic ergodic hidden Markov model (PhEHMM), the method comprising the steps of
- a) building an AEHMM in which an observations sequence comprises speech features vectors extracted from frames in which the speech uttered during the training of said AEHMM is divided, and in which a hidden sequence comprises a sequence of sources that most probably emitted the speech utterance frames;
- b) initializing said AEHMM by a vector quantization clustering scheme having the same size as said AEHMM;
- c) training said AEHMM by the Forward-Backward algorithm and Baum-Welch re-estimation formulas;
- d) associating with each frame a label representing a most probable source;
- e) building a PhEHMM of the same size as said AEHMM in which an observations sequence comprises phoneme sequence obtained from a written text, and in which a hidden sequence comprises a sequence of labels;
- f) initializing a PhEHMM transition probability matrix by assigning to state transition probabilities the same values as the transition probabilities of the corresponding states of said AEHMM;
- g) initializing PhEHMM observation probability functions by:
  - (g.1) using a speech corpus aligned with a sequence of phonemes,
  - (g.2) generating for said speech corpus a sequence of most probable labels, using said AEHMM, and
  - (g.3) computing the observations probability function for each phoneme, counting the number of occurrences of the phoneme in a state divided by the total number of phonemes emitted by said state;
- h) training said PhEHMM by the Baum-Welch algorithm on a proper synthetic observations corpus;
- h.1) providing an input text of one or more words to be synthesized;
- i) determining for each word to be synthesized a phoneme sequence and through said PhEHMM a sequence of labels corresponding to the word to be synthesized by means of a proper optimality criterion;
- j) determining from the input text a set of additional parameters, as energy, prosody contours and voicing, by a prosodic processor;
- k) determining, for the sequence of labels corresponding to the word to be synthesized, a set of speech features vectors corresponding to the word to be synthesized through said AEHMM;
- l) transforming said speech features vectors corresponding to the word to be synthesized into a set of filter coefficients representing spectral information; and
- m) using said set of filter coefficients and said additional parameters in a synthesis filter to produce a synthetic speech output.
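Steps f) and g.3) of the claim describe how the phonetic model is initialized from the acoustic one. The following is a minimal sketch of that initialization under stated assumptions: the function name `init_phehmm`, the frame-level alignment of labels with phonemes, and the toy data are hypothetical and do not come from the patent. The claim does not say how a state with no aligned phonemes is handled; here it is simply left with an empty distribution.

```python
import copy
from collections import Counter


def init_phehmm(aehmm_trans, frame_labels, frame_phonemes):
    """Initialize PhEHMM parameters from a trained AEHMM and a labeled corpus.

    aehmm_trans    : square transition matrix of the AEHMM (list of rows)
    frame_labels   : most probable AEHMM state for each frame (step g.2)
    frame_phonemes : phoneme aligned with each frame (step g.1)
    """
    # Step f: PhEHMM transitions start as a copy of the AEHMM transitions.
    trans = copy.deepcopy(aehmm_trans)

    # Step g.3: count how often each phoneme occurs in each state ...
    counts = {state: Counter() for state in range(len(aehmm_trans))}
    for state, phoneme in zip(frame_labels, frame_phonemes):
        counts[state][phoneme] += 1

    # ... and divide by the total number of phonemes emitted by that state.
    emit = {}
    for state, counter in counts.items():
        total = sum(counter.values())
        emit[state] = ({ph: n / total for ph, n in counter.items()}
                       if total else {})
    return trans, emit


# Toy data (assumed): a 2-state AEHMM and six aligned frames.
aehmm_trans = [[0.7, 0.3], [0.4, 0.6]]
frame_labels = [0, 0, 0, 1, 1, 0]
frame_phonemes = ["a", "a", "m", "m", "m", "a"]

trans, emit = init_phehmm(aehmm_trans, frame_labels, frame_phonemes)
print(emit[0])  # {'a': 0.75, 'm': 0.25}
print(emit[1])  # {'m': 1.0}
```

These values are only starting points; per step h) the model is subsequently trained with the Baum-Welch algorithm.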

|
Forward References: |
25 U.S. patent(s) reference this one

|
Foreign References: |
None

|
Other Abstract Info: |
DERABS G92-133508

|
Other References: |
Falaschi, A. et al., "A Functional Based Phonetic Units Definition for Statistical Speech Recognizers", Eurospeech Proceedings, Paris, France, Sep. 1989, vol. 1, pp. 13-16.
Juang, B. H., "On the Hidden Markov Model and Dynamic Time Warping for Speech Recognition-A Unified View", AT&T Bell Lab. Tech. Journal, vol. 63, No. 7, Sep. 1984, pp. 1213-1243.
Cernuschi-Frias, B. et al., "On the Exact Maximum Likelihood Estimation of Gaussian Autoregressive Processes", IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. 36, No. 6, Jun. 1988, pp. 922-924.
Falaschi, A. et al., "A Finite States Markov Quantizer for Speech Coding", ICASSP Conference Proc., N.M., Jun. 1990, pp. 205-208.
Falaschi, A. et al., "A Hidden Markov Model Approach to Speech Synthesis", Eurospeech Proc., Paris, France, 1989, pp. 187-190.
