Title: US5230037: Phonetic Hidden Markov model speech synthesizer

Country: US United States of America

Pages: 17

Inventor: Giustiniani, Massimo; Rome, Italy
Pierucci, Piero; Rome, Italy

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 1993-07-20 / 1991-06-07

Application Number: US1991000716022

IPC Code: Advanced: G01L 3/00; G10L 13/02; G10L 13/08; G10L 15/14; G10L 19/00;
IPC-7: G10L 9/02;

ECLA Code: G10L15/14; T05K999/99;

U.S. Class: Current: 704/200;
Original: 395/002;

Field of Search: 381/041-53 395/002

Priority Number:
1990-10-16  EP1990000119789

Abstract: A method and a system for synthesizing speech from unrestricted text, based on the principle of associating with a written string of text the sequence of speech feature vectors that most probably models the corresponding speech utterance. The synthesizer is based on the interaction between two different Ergodic Hidden Markov Models: an acoustic model reflecting the phonotactical constraints and a phonetic model interfacing the phonemic transcription to the speech features representation.
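The principle stated in the abstract, picking the most probable hidden label sequence for a phoneme string and then reading off one feature vector per label, can be illustrated with a small sketch. This is not the patented implementation: the record only says a "proper optimality criterion" is used, so the choice of Viterbi decoding here is an assumption, and the names `viterbi`, `labels_to_features`, and the toy probability tables are hypothetical.

```python
import math


def viterbi(obs, pi, A, B):
    """Return the most probable hidden-state (label) sequence for obs.

    obs: list of observation indices (here: phoneme ids)
    pi:  initial state probabilities, pi[s]
    A:   transition probabilities, A[r][s] = P(next = s | current = r)
    B:   observation probabilities, B[s][o] = P(o | s)
    Assumption: Viterbi stands in for the unspecified optimality criterion.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(pi)
    score = [log(pi[s]) + log(B[s][obs[0]]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log(A[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(A[best][s]) + log(B[s][o]))
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def labels_to_features(labels, means):
    """Map each label to a feature vector (hypothetical: one mean per source)."""
    return [means[s] for s in labels]


# Toy model: 2 states (labels), 2 phonemes. All numbers are made up.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.9, 0.1], [0.1, 0.9]]
means = [[1.0, 0.0], [0.0, 1.0]]

labels = viterbi([0, 0, 1, 1], pi, A, B)
print(labels)  # [0, 0, 1, 1]
print(labels_to_features(labels, means))
```

Working in log space avoids numerical underflow on long phoneme strings, which is why the sketch adds log probabilities instead of multiplying raw ones.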

Attorney, Agent or Firm: Schechter, Marc D.

Primary / Asst. Examiners: Fleming, Michael R.; Doerrler, Michelle

Maintenance Status: E2 Expired

Designated Country: DE FR GB IT 

Family: 7 known family members

First Claim (1 of 10 claims):
We claim:     1. A method for generating synthesized speech wherein an acoustic ergodic hidden Markov model (AEHMM) reflecting constraints on the acoustic arrangement of speech is correlated to a phonetic ergodic hidden Markov model (PhEHMM), the method comprising the steps of
  • a) building an AEHMM in which an observations sequence comprises speech features vectors extracted from frames in which the speech uttered during the training of said AEHMM is divided, and in which a hidden sequence comprises a sequence of sources that most probably emitted the speech utterance frames;
  • b) initializing said AEHMM by a vector quantization clustering scheme having the same size as said AEHMM;
  • c) training said AEHMM by the Forward-Backward algorithm and Baum-Welch re-estimation formulas;
  • d) associating with each frame a label representing a most probable source;
  • e) building a PhEHMM of the same size as said AEHMM in which an observations sequence comprises phoneme sequence obtained from a written text, and in which a hidden sequence comprises a sequence of labels;
  • f) initializing a PhEHMM transition probability matrix by assigning to state transition probabilities the same values as the transition probabilities of the corresponding states of said AEHMM;
  • g) initializing PhEHMM observation probability functions by:
    • (g.1) using a speech corpus aligned with a sequence of phonemes,
    • (g.2) generating for said speech corpus a sequence of most probable labels, using said AEHMM, and
    • (g.3) computing the observations probability function for each phoneme, counting the number of occurrences of the phoneme in a state divided by the total number of phonemes emitted by said state;
  • h) training said PhEHMM by the Baum-Welch algorithm on a proper synthetic observations corpus;
    • h.1) providing an input text of one or more words to be synthesized;
  • i) determining for each word to be synthesized a phoneme sequence and through said PhEHMM a sequence of labels corresponding to the word to be synthesized by means of a proper optimality criterion;
  • j) determining from the input text a set of additional parameters, as energy, prosody contours and voicing, by a prosodic processor;
  • k) determining, for the sequence of labels corresponding to the word to be synthesized, a set of speech features vectors corresponding to the word to be synthesized through said AEHMM;
  • l) transforming said speech features vectors corresponding to the word to be synthesized into a set of filter coefficients representing spectral information; and
  • m) using said set of filter coefficients and said additional parameters in a synthesis filter to produce a synthetic speech output.
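Steps (f) and (g.3) of the claim describe two simple initializations: copying the AEHMM transition probabilities into the PhEHMM, and estimating each state's phoneme probabilities by relative frequency. A minimal sketch follows, assuming the aligned corpus is given as two equal-length lists of integer ids. The function names and the uniform fallback for states that never occur are assumptions for illustration and do not come from the patent text.

```python
from collections import Counter


def init_transitions(aehmm_transitions):
    """Step (f): start the PhEHMM from a copy of the AEHMM transition matrix."""
    return [row[:] for row in aehmm_transitions]


def init_observation_probs(labels, phonemes, n_states, n_phonemes):
    """Step (g.3): B[s][p] = (times phoneme p occurs in state s)
    / (total phonemes emitted by state s).

    labels:   most probable AEHMM label per aligned unit (step g.2)
    phonemes: phoneme id aligned with each label (step g.1)
    """
    if len(labels) != len(phonemes):
        raise ValueError("labels and phonemes must be aligned one-to-one")
    pair_counts = Counter(zip(labels, phonemes))
    state_counts = Counter(labels)
    B = []
    for s in range(n_states):
        total = state_counts[s]
        if total == 0:
            # Assumption: a state never seen in the corpus gets a uniform row.
            B.append([1.0 / n_phonemes] * n_phonemes)
        else:
            B.append([pair_counts[(s, p)] / total for p in range(n_phonemes)])
    return B


# Toy aligned corpus: 3 states, 2 phonemes. Values are made up.
labels = [0, 0, 1, 1, 1, 0]
phonemes = [0, 0, 1, 1, 0, 1]
B = init_observation_probs(labels, phonemes, n_states=3, n_phonemes=2)
for row in B:
    print(row)
```

Each row of `B` sums to 1, so the result can be used directly as a starting point for the Baum-Welch training named in step (h).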

Forward References: 29 U.S. patents reference this one

U.S. References (backward references: 3):

Patent | Pub. Date | Inventor | Assignee | Title
US4852180 | 1989-07 | Levinson | American Telephone and Telegraph Company, AT&T Bell Laboratories | Speech recognition by acoustic/phonetic system and technique
US4882759 | 1989-11 | Bahl et al. | International Business Machines Corporation | Synthesizing word baseforms used in speech recognition
US5033087 | 1991-07 | Bahl et al. | International Business Machines Corp. | Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system

Foreign References: None

Other Abstract Info: DERABS G92-133508

Other References:
  • Falaschi, A. et al., "A Functional Based Phonetic Units Definition for Statistical Speech Recognizers", Eurospeech Proceedings, Paris, France, Sep. 1989, vol. 1, pp. 13-16.
  • Juang, B. H., "On the Hidden Markov Model and Dynamic Time Warping for Speech Recognition-A Unified View", AT&T Bell Lab. Tech. Journal, vol. 63, No. 7, Sep. 1984, pp. 1213-1243. (31 pages)
  • Cernuschi-Frias, B. et al., "On the Exact Maximum Likelihood Estimation of Gaussian Autoregressive Processes", IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. 36, No. 6, Jun. 1988, pp. 922-924. (3 pages)
  • Falaschi, A. et al., "A Finite States Markov Quantizer for Speech Coding", ICASSP Conference Proc., N.M., Jun. 1990, pp. 205-208.
  • Falaschi, A. et al., "A Hidden Markov Model Approach to Speech Synthesis", Eurospeech Proc., Paris, France, 1989, pp. 187-190.
