Difference between revisions of "Reading list from NCMMSC Speech group"
From cslt Wiki
Latest revision as of 04:07, 7 November 2015 (Saturday)
Paper | Referee | Area and notes | Link | |||||
---|---|---|---|---|---|---|---|---|
George E. Dahl, Dong Yu, Li Deng, and Alex Acero, Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, 2011, IEEE Trans on ASLP. Vol.20, No.1. | 贾磊 (Baidu) | Drove the adoption of DNNs in industrial-grade ASR | http://research.microsoft.com/pubs/144412/dbn4lvcsr-transaslp.pdf | |||||
Lawrence R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition | 谢磊 (Northwestern Polytechnical University) | HMM | http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf | |||||
End-to-End Text-Dependent Speaker Verification, Georg Heigold, Ignacio Moreno, Samy Bengio, Noam Shazeer | 肖雄 (Nanyang Technological University) | Uses a neural network to extract a fixed-length vector (similar to an i-vector) from utterances of varying length. | ||||||
Rapid Speaker Adaptation in Eigenvoice Space | 苏腾荣 (Huami) | Influenced all the later supervector-based methods | ||||||
G. Hinton, L. Deng, D. Yu et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 82-97, 2012. | 邹月娴 (Peking University, Shenzhen) | DNN acoustic models | ||||||
Speech recognition with weighted finite-state transducers | 苏腾荣 (Huami) | A staple of ASR | http://www.cslu.ogi.edu/~zak/cs506-lvr/mohri-wfst_asr.pdf | |||||
Speech Recognition Algorithms Using Weighted Finite-State Transducers, Takaaki Hori and Atsushi Nakamura, Synthesis Lectures on Speech and Audio Processing, January 2013, Vol. 9, No. 1, Pages 1-162 | 陶斐 (UTD) | ASR and WFST | ||||||
Biing-Hwang Juang, Wu Chou, and Chin-Hui Lee, Minimum classification error rate methods for speech recognition | 洪青阳 (Xiamen University) | Discriminative training, MCE | ||||||
Daniel Povey. Discriminative Training for Large Vocabulary Speech Recognition. | 杨嵩 (Chivox) | Discriminative training of acoustic models | ||||||
Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays, Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition | 徐海华 (Nanyang Technological University), 苏牧 (Unisound) | CTC | ||||||
Alex Graves, Supervised Sequence Labeling with Recurrent Neural Networks. PhD thesis. | 汤本来 (Nankai University), 李博 (Google) | LSTM, CTC | ||||||
Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition. Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays | 徐海华 (Nanyang Technological University) | CTC | http://arxiv.org/pdf/1507.06947.pdf | |||||
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling, by Brian Kingsbury, IBM Watson | 王广森 (I2R, Singapore) | |||||||
MJF Gales: Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech & Language, 1998, 12(2):75–98 | 钱彦旻 (Shanghai Jiao Tong University) | MLLR | ||||||
Woodland, P.C.: Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language 9(2), 171-185 | 钱彦旻 (Shanghai Jiao Tong University) | MLLR | ||||||
Tandem connectionist feature extraction for conventional HMM systems, Hermansky | 钱彦旻 (Shanghai Jiao Tong University) | Adaptation | ||||||
Subspace Gaussian mixture models for speech recognition. Povey, D. | 钱彦旻 (Shanghai Jiao Tong University) | Dan's SGMM | ||||||
A novel scheme for speaker recognition using a phonetically-aware deep neural network, Y Lei, N Scheffer, L Ferrer, M McLaren | 夏瑞 (Intel Lab) | |||||||
Campbell W M, Sturim D E, Reynolds D A. Support vector machines using GMM supervectors for speaker verification[J]. Signal Processing Letters, IEEE, 2006, 13(5): 308-311. | 龙艳花 (Shanghai Normal University) | SVM-based speaker (voiceprint) recognition | ||||||
Campbell W M, Sturim D E, Reynolds D A, et al. SVM based speaker verification using a GMM supervector kernel and NAP variability compensation[C]//Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on. IEEE, 2006, 1: I-I. | 龙艳花 (Shanghai Normal University) | SVM-based speaker (voiceprint) recognition | ||||||
Douglas A. Reynolds, Thomas F. Quatieri, and Robert B. Dunn, Speaker Verification Using Adapted Gaussian Mixture Models | 洪青阳 (Xiamen University) | Speaker recognition, GMM-UBM | ||||||
Najim Dehak, Patrick Kenny, Réda Dehak, Pierre Dumouchel, and Pierre Ouellet, Front-End Factor Analysis For Speaker Verification | 洪青阳 (Xiamen University) | Speaker recognition, i-vector | ||||||
Analysis of I-vector Length Normalization in Speaker Recognition Systems, Daniel Garcia-Romero and Carol Y. Espy-Wilson | 许敏强 (Alibaba) | length normalization + PLDA | ||||||
Within-Class Covariance Normalization for SVM-based Speaker Recognition, Andrew O. Hatch, Sachin Kajarekar, and Andreas Stolcke | 许敏强 (Alibaba) | Speaker recognition; the method in this paper applies not only to speaker tasks but also generalizes to image recognition, classification, and other areas, with clear gains | ||||||
Silke M Witt, Steve J Young, Phone-level pronunciation scoring and assessment for interactive language learning, 2000, Speech Communication | 黄浩 (Xinjiang University) | GOP and error detection | ||||||
S. M. Witt. Use of Speech Recognition in Computer-assisted Language Learning | 杨嵩 (Chivox) | Speech assessment | ||||||
Andrew J. Hunt, Alan W. Black, Unit selection in a concatenative speech synthesis system using a large speech database, ICASSP 1996. | 康永国 (Baidu) | Representative work on concatenative speech synthesis | ||||||
Zen H, Tokuda K, Black A W. Statistical parametric speech synthesis[J]. Speech Communication, 2009, 51(11): 1039-1064. | 凌振华 (USTC) | HMM-based statistical parametric speech synthesis | ||||||
Tokuda K, Nankaku Y, Toda T, et al. Speech synthesis based on hidden Markov models[J]. Proceedings of the IEEE, 2013, 101(5): 1234-1252. | 凌振华 (USTC) | HMM-based statistical parametric speech synthesis | ||||||
Zen, H., Senior, A., Schuster, M. 2013, Statistical parametric speech synthesis using deep neural networks | 吴君如 (East China Normal University), 康永国 (Baidu) | |||||||
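K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, T. Kitamura, Speech parameter generation algorithms for HMM-based speech synthesis, Proc. of ICASSP, pp.1315-1318, June 2000 | 康永国 (Baidu) | HMM-based statistical parametric speech synthesis | ||||||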
S. King, "A reading list of recent advances in speech synthesis", Proc. ICPhS2015. | 武执正 (University of Edinburgh), 杨鹏 (Baidu) | | https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS1043.pdf | |||||
Statistical parametric speech synthesis, Heiga Zen | 杨辰雨 (I2R, Singapore) | Acoustic modeling for speech synthesis | ||||||
ZH Ling: Deep Learning for Acoustic Modeling in Parametric Speech Generation. IEEE Signal Processing Magazine, 2015, 32(3):35-52 | 杨辰雨 (I2R, Singapore) | Acoustic modeling for speech synthesis | ||||||
Xu Yi. Separation of functional components of tone and intonation from observed F0 patterns. | 林怡亭 (Nuance), 李雅 (Institute of Automation, CAS) | |||||||
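Shriberg, E., Stolcke, A., Hakkani-Tür, D., & Tür, G. (2000). Prosody-based automatic segmentation of speech into sentences and topics. Speech Communication, 32(1), 127-154. | 陈磊 (ETS, speech assessment), 谢磊 (Northwestern Polytechnical University) | SRI's work on using prosodic information to segment speech into structural units; 430 citations on Google Scholar | ||||||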
ToBI: A standard for labeling English prosody | 杨辰雨 (I2R, Singapore) | Chinese and English prosody annotation | ||||||
Chinese prosody and prosodic labeling of spontaneous speech | 杨辰雨 (I2R, Singapore) | C-ToBI 3.0 | ||||||
Shrikanth S. Narayanan and Panayiotis Georgiou, Behavioral Signal Processing: Deriving Human Behavioral Informatics from Speech and Language (2013), in: Proceedings of IEEE, 101:5(1203 - 1233) | 李明 (Sun Yat-sen University) | A survey paper on the analysis of speech and multimodal behavioral signals; recommended for those working in affective computing and behavior analysis | ||||||
Levelt, W., Roelofs, A., 1999, A theory of lexical access in speech production. | 吴君如 (East China Normal University) | Language cognition; the most comprehensive summary, as of the late 1990s, of psycholinguistic empirical findings on the mental processes of human language production and of the proposed mechanisms; many computational models aim to reproduce the effects listed in this paper | ||||||
A Highly Robust Audio Fingerprinting System, Jaap Haitsma of Philips | 朱磊 (芋头科技) | audio fingerprint | ||||||
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013. | 陈谐 (Cambridge) | |||||||
Dzmitry Bahdanau, KyungHyun Cho, Yoshua Bengio, Neural Machine Translation By Jointly Learning To Align And Translate | 肖雄 (Nanyang Technological University), 徐海华 (Nanyang Technological University) | attention model for MT | http://arxiv.org/pdf/1409.0473.pdf | |||||
Book and Thesis | ||||||||
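Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Xuedong Huang (黄学东) | 何伟 (Communication University of China), 钱彦旻 (Shanghai Jiao Tong University) | |||||||
Speech and Language Processing (Chinese edition: 自然语言处理综论), Daniel Jurafsky | 汪淼淼 (Alibaba) | |||||||
Speech Enhancement: Theory and Practice, Philipos C. Loizou | 张学良 (Inner Mongolia University) | A book on speech enhancement | ||||||
Statistical Methods for Speech Recognition, Jelinek | 金琴 (Renmin University of China) | Classic textbook | ||||||
Hidden Markov Models for Speech Recognition (Edinburgh University Press 1990) | 穆向禹 (Baidu) | |||||||
Machine Learning Paradigms for Speech Recognition | 卢鲤 (Tencent) | Looks at speech recognition from a machine learning viewpoint, with a very clear framework | ||||||
实用语音识别基础 (Fundamentals of Practical Speech Recognition), National Defense Industry Press | 王晶 (Beijing Institute of Technology) | |||||||
Text-to-Speech Synthesis, Paul Taylor, University of Cambridge | 黄东延 (Singapore) | The book gives a detailed, in-depth explanation of how text-to-speech works | ||||||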
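A Course in Phonetics, Ladefoged | 冯卉 (Tianjin University) | Recommended by several members of the group | ||||||
A Course in Phonetics (7th Ed.). P. Ladefoged & K. Johnson (2015). Cengage Learning. | 顾文涛 (Nanjing Normal University) | A very good introductory textbook | ||||||
Acoustic and Auditory Phonetics (3rd Ed.). K. Johnson (2012). Wiley-Blackwell. | 顾文涛 (Nanjing Normal University) | |||||||
Articulatory Phonetics. B. Gick, I. Wilson, & D. Derrick (2013). Wiley-Blackwell. | 顾文涛 (Nanjing Normal University) | |||||||
实验语音学概要 (Outline of Experimental Phonetics), including the revised edition | 熊子瑜 (Institute of Linguistics), 时秀娟 (Tianjin Normal University) | |||||||
实验语音学基础教程 (A Basic Course in Experimental Phonetics), 孔江平 | 时秀娟 (Tianjin Normal University) | |||||||
Phonetics, Reetz & Jongman | 孙锐欣 (East China Normal University) | A Chinese edition is being translated by Prof. 李爱军 and others in China | ||||||
实验语音学概要 (Outline of Experimental Phonetics), 吴宗济 | 王磊 (音乐雷达) et al. | Speech synthesis, phonology | ||||||
Speech and Language Processing (Chinese edition: 自然语言处理综论), Daniel Jurafsky | ||||||||
Duda, Pattern Classification, 2nd edition (a Chinese edition is available) | 谢凌云 (Communication University of China) | Pattern recognition | ||||||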
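现代汉语音典 (A Sound Dictionary of Modern Chinese), 蔡莲红 and 孔江平 | 王愈 (SinoVoice) | |||||||
汉语语调实验研究 (Experimental Studies of Chinese Intonation), 2012, by 林茂灿 | 李爱军 (Institute of Linguistics, CASS) | |||||||
A study of Chinese intonation built on the AM (autosegmental-metrical) theory of English intonation | ||||||||
Prosodic Typology by Sun-Ah Jun; translated into Chinese as 韵律类型学 by Prof. 吕士楠 of the Institute of Acoustics, CAS | 郝玉峰 (Speechocean) | Multilingual prosody annotation | ||||||
Acoustic Phonetics by Kenneth N. Stevens | 解炎陆 (Beijing Language and Culture University) | Describes the characteristics of various speech sounds from an acoustic perspective; the original edition is too expensive, and hopefully it will be published in China. | ||||||
Ladefoged, 世界语音 (The Sounds of the World's Languages) | 时秀娟 (Tianjin Normal University) | | http://mp.weixin.qq.com/s?__biz=MzA3OTI3MjEzNg==&mid=400341406&idx=2&sn=484d61f4ab9dcfe7bb613bb8d119a161&scene=1&srcid=1104uQiKdYcy75BYTBJ9xA99#rd | |||||
Theory and Applications of Digital Speech Processing, Lawrence Rabiner | 党建武 (Tianjin University) | |||||||
T. F. Quatieri, Discrete-Time Speech Signal Processing (English edition) | 王晶 (Beijing Institute of Technology) | A classic textbook for speech signal processing courses | ||||||
Signals and Systems (Chinese edition: 信号与系统), Alan V. Oppenheim | 陈谐 (Cambridge) | |||||||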
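Microphone Arrays: Signal Processing Techniques and Applications (Digital Signal Processing) by Michael Brandstein, Darren Ward, Springer, 2001. | 李军锋 (Institute of Acoustics, CAS) | Speech signal processing | ||||||
Pattern Recognition and Machine Learning | 王东 (Tsinghua University) | A classic masterpiece in machine learning | ||||||
Machine Learning: A Probabilistic Perspective; Machine Learning: An Algorithmic Perspective | 卢鲤 (Tencent) | |||||||
Introduction to Statistical Pattern Recognition. Keinosuke Fukunaga | 朱璇 (Samsung Research Institute, Beijing) | Pattern recognition. This book describes feature spaces very clearly, explains deep ideas in simple terms, and suits beginners well. | ||||||
An introduction for support vector machine | 朱璇 (Samsung Research Institute, Beijing) | SVM | ||||||
步尚全, 基础泛函分析 (Basic Functional Analysis) | 邓侃 (思昂教育) | Functional analysis | ||||||
测度论与概率论基础 (Foundations of Measure Theory and Probability Theory), Peking University Press | 明怀平 (I2R, Singapore) | |||||||
Daniel Povey, "Discriminative Training for Large Vocabulary Speech Recognition," PhD thesis, Cambridge University Engineering Dept, 2003 | 俞凯 (Shanghai Jiao Tong University) | Discriminative training; PhD thesis | ||||||
语境相关的声学模型和搜索策略的研究 (Research on Context-Dependent Acoustic Models and Search Strategies), 高升, PhD thesis, Chinese Academy of Sciences, 2001 | 李宏言 (Alibaba) | A major early work on LVCSR in China | ||||||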
Tools | ||||||||
HTK book | ||||||||
Kaldi | ||||||||
Praat | ||||||||
Theano | ||||||||
CNTK | ||||||||
RNNLIB | ||||||||
Eesen | | CTC toolkit | https://github.com/yajiemiao/eesen | |||||
Video & online course | ||||||||
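Deep Learning Summer School, Montreal 2015 | | | http://videolectures.net/deeplearning2015_montreal/ | |||||
INTRODUCTION TO DIGITAL FILTERS | 王愈 (SinoVoice) | An online signal processing tutorial that explains the fundamentals of signal analysis and processing in accessible terms; its derivations, tied to commonly used MATLAB signal-processing library functions such as freqz, are concise and thorough | https://ccrma.stanford.edu/~jos/filters/ | |||||
九州语言网 (Jiuzhou Language Network) | 李爱军 (Institute of Linguistics, CASS) | |||||||
Those interested in the grammar and phonetics of Chinese dialects can visit 九州语言网, which is under construction at the Institute of Linguistics and led by 熊子瑜 | | | http://9zhou.phonetics.org.cn/ | |||||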