Difference between revisions of “News-20150401”
Slides: http://www.gavo.t.u-tokyo.ac.jp/~mine/speech_structure2.pdf
Latest revision as of 11:40, 15 May 2015 (Friday)
Speech structure; human-inspired representation of speech acoustics
— What are "really" speaker-independent features? —
Time: 2015/03/30 10AM
Venue: FIT BLDG, 1-315
Speech signals convey various kinds of information, which are broadly grouped into two kinds: linguistic and extra-linguistic information. Many speech applications, however, focus on only a single aspect of speech. For example, speech recognizers try to extract only word identity from speech signals, and speaker recognizers extract only speaker identity. Here, irrelevant aspects are often either treated as hidden or latent variables, by applying probability theory to a large number of samples, or normalized to have quasi-standard values. In speech analysis, however, phase is usually removed rather than hidden or normalized, and so are pitch harmonics. The resulting speech spectrum still contains both linguistic information (word identity) and extra-linguistic information (speaker identity). Is there a good method to remove the extra-linguistic information from the spectrum? This talk explains our proposal of "really" speaker-independent, or speaker-invariant, features, called speech structure. Speaker variation can be modeled as a transformation of the feature space, and our speech structure model is based on the invariance of f-divergence under such transformations. The proposal was inspired by findings in classical studies of structural phonology and recent studies of developmental psychology. We show how we technically implemented these findings from phonology and psychology, and we also present some examples of applying speech structure to speech applications.
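As an illustration of the transform-invariance idea only (a minimal sketch, not the speakers' implementation), the following Python assumes that each speech event is modeled as a 2-D Gaussian, that a speaker difference is an invertible affine map x → Ax + b of the feature space, and that the distance between two events is the symmetrized Kullback-Leibler divergence, which is one f-divergence. All function names, distributions, and matrix values are hypothetical. The structure (the matrix of pairwise divergences) computed for the original and the transformed events should agree up to floating-point error, even though every individual mean and covariance has changed.

```python
import math

# Assumption (hypothetical, for illustration): a speech event is a 2-D Gaussian
# given as (mean, cov), with mean = (m0, m1) and cov = ((a, b), (c, d)).

def mat_mul(X, Y):
    """Product of two 2x2 matrices."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def transpose(X):
    return ((X[0][0], X[1][0]), (X[0][1], X[1][1]))

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv(X):
    """Inverse of an invertible 2x2 matrix."""
    d = det(X)
    return ((X[1][1] / d, -X[0][1] / d), (-X[1][0] / d, X[0][0] / d))

def mat_vec(X, v):
    return tuple(X[i][0] * v[0] + X[i][1] * v[1] for i in range(2))

def kl_gauss(p, q):
    """KL(p || q) for 2-D Gaussians p = (mu_p, S_p) and q = (mu_q, S_q)."""
    (mu_p, S_p), (mu_q, S_q) = p, q
    S_q_inv = inv(S_q)
    trace_term = sum(mat_mul(S_q_inv, S_p)[i][i] for i in range(2))
    diff = (mu_q[0] - mu_p[0], mu_q[1] - mu_p[1])
    w = mat_vec(S_q_inv, diff)
    maha = diff[0] * w[0] + diff[1] * w[1]
    return 0.5 * (trace_term + maha - 2 + math.log(det(S_q) / det(S_p)))

def transform(g, A, b):
    """Apply x -> A x + b to a Gaussian: mean -> A mu + b, cov -> A S A^T."""
    mu, S = g
    Am = mat_vec(A, mu)
    return ((Am[0] + b[0], Am[1] + b[1]), mat_mul(mat_mul(A, S), transpose(A)))

def structure(events):
    """Matrix of symmetrized KL divergences between every pair of events."""
    n = len(events)
    return [[kl_gauss(events[i], events[j]) + kl_gauss(events[j], events[i])
             for j in range(n)] for i in range(n)]

# Three hypothetical events for one speaker (made-up numbers).
speaker_a = [
    ((0.0, 0.0), ((1.0, 0.2), (0.2, 0.5))),
    ((2.0, 1.0), ((0.8, 0.0), (0.0, 0.9))),
    ((-1.0, 3.0), ((0.6, -0.1), (-0.1, 1.2))),
]
# Hypothetical speaker change modeled as an invertible affine map (det(A) = 1.25).
A = ((1.3, 0.4), (-0.2, 0.9))
b = (0.5, -1.0)
speaker_b = [transform(g, A, b) for g in speaker_a]

sa, sb = structure(speaker_a), structure(speaker_b)
max_diff = max(abs(x - y) for ra, rb in zip(sa, sb) for x, y in zip(ra, rb))
print(f"largest difference between the two structures: {max_diff:.2e}")
```

Pure Python 2×2 helpers keep the sketch dependency-free; with NumPy the same computation would typically use `numpy.linalg.inv` and `numpy.linalg.det`. Another f-divergence, such as the squared Hellinger distance, would show the same invariance under invertible maps.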
Nobuaki Minematsu received his Doctor of Engineering degree from the University of Tokyo in 1995 and is currently a full professor there. His interests in speech communication range widely from science to engineering. He has published more than 400 scientific and technical papers, including conference papers, on speech analysis, speech perception, speech disorders, speech recognition, speech synthesis, dialogue systems, language learning systems, and other topics. He received paper awards from RISP, JSAI, ICIST, and O-COCOSDA in 2005, 2007, 2011, and 2014, and an encouragement award from PSJ in 2014. He gave tutorial talks on CALL at APSIPA2011 and INTERSPEECH2012. Recently, he developed OJAD (Online Japanese Accent Dictionary), an online tutoring system for learners and teachers of Japanese, which has been introduced to many international institutes of Japanese-language education. On March 26 and 27, he will give three workshops at Beijing Foreign Studies University, Beijing Language and Culture University, and Beijing Normal University. He is a member of IEEE, ISCA, IPA, SLaTE, IEICE, ASJ, and other societies.