Reading list from NCMMSC Speech group

Paper	Referee	Area and notes	Link
George E. Dahl, Dong Yu, Li Deng, and Alex Acero, Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, 2011, IEEE Trans. on ASLP, Vol. 20, No. 1	贾磊 (Baidu)	Pushed DNNs into industrial-scale ASR	http://research.microsoft.com/pubs/144412/dbn4lvcsr-transaslp.pdf
Lawrence R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition	谢磊 (Northwestern Polytechnical University)	HMM	http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
Georg Heigold, Ignacio Moreno, Samy Bengio, Noam Shazeer, End-to-End Text-Dependent Speaker Verification	肖雄 (Nanyang Technological University)	Uses a neural network to extract a fixed-length vector from utterances of varying length, playing a role similar to an i-vector	http://arxiv.org/abs/1509.08062
Rapid Speaker Adaptation in Eigenvoice Space	苏腾荣 (Huami)	Influenced all of the later supervector-based methods	http://courses.cs.tamu.edu/rgutier/cpsc689_s07/kuhn2000speakerAdaptationEigenvoice.pdf
G. Hinton, L. Deng, D. Yu et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 82-97, 2012	邹月娴 (Peking University, Shenzhen)	DNN acoustic models
Speech recognition with weighted finite-state transducers	苏腾荣 (Huami)	Standard equipment for ASR	http://www.cslu.ogi.edu/~zak/cs506-lvr/mohri-wfst_asr.pdf
Takaaki Hori and Atsushi Nakamura, Speech Recognition Algorithms Using Weighted Finite-State Transducers, Synthesis Lectures on Speech and Audio Processing, January 2013, Vol. 9, No. 1, pp. 1-162	陶斐 (UTD)	ASR and WFST
Biing-Hwang Juang, Wu Chou, and Chin-Hui Lee, Minimum classification error rate methods for speech recognition	洪青阳 (Xiamen University)	Discriminative training, MCE
Daniel Povey, Discriminative Training for Large Vocabulary Speech Recognition	杨嵩 (Chivox)	Discriminative training of acoustic models
Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays, Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition	徐海华 (Nanyang Technological University), 苏牧 (Unisound)	CTC
Alex Graves, Supervised Sequence Labeling with Recurrent Neural Networks, PhD thesis	汤本来 (Nankai University), 李博 (Google)	LSTM, CTC
Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition, Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays	徐海华 (Nanyang Technological University)	CTC	http://arxiv.org/pdf/1507.06947.pdf
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling, Brian Kingsbury, IBM Watson	王广森 (I2R, Singapore)
M. J. F. Gales, Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech & Language, 1998, 12(2):75–98	钱彦旻 (Shanghai Jiao Tong University)	MLLR
C. J. Leggetter and P. C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Computer Speech and Language 9(2), 171-185	钱彦旻 (Shanghai Jiao Tong University)	MLLR
Tandem connectionist feature extraction for conventional HMM systems, Hermansky	钱彦旻 (Shanghai Jiao Tong University)	Adaptation
Povey, D., Subspace Gaussian mixture models for speech recognition	钱彦旻 (Shanghai Jiao Tong University)	Dan Povey's SGMM
Y. Lei, N. Scheffer, L. Ferrer, M. McLaren, A novel scheme for speaker recognition using a phonetically-aware deep neural network	夏瑞 (Intel Lab)
Campbell W M, Sturim D E, Reynolds D A. Support vector machines using GMM supervectors for speaker verification[J]. Signal Processing Letters, IEEE, 2006, 13(5): 308-311	龙艳花 (Shanghai Normal University)	SVM-based speaker recognition
Campbell W M, Sturim D E, Reynolds D A, et al. SVM based speaker verification using a GMM supervector kernel and NAP variability compensation[C]//ICASSP 2006 Proceedings, IEEE, 2006, 1: I-I	龙艳花 (Shanghai Normal University)	SVM-based speaker recognition
Douglas A. Reynolds, Thomas F. Quatieri, and Robert B. Dunn, Speaker Verification Using Adapted Gaussian Mixture Models	洪青阳 (Xiamen University)	Speaker recognition, GMM-UBM
Najim Dehak, Patrick Kenny, Réda Dehak, Pierre Dumouchel, and Pierre Ouellet, Front-End Factor Analysis For Speaker Verification	洪青阳 (Xiamen University)	Speaker recognition, i-vector
Daniel Garcia-Romero and Carol Y. Espy-Wilson, Analysis of I-vector Length Normalization in Speaker Recognition Systems	许敏强 (Alibaba)	Length normalization + PLDA
Andrew O. Hatch, Sachin Kajarekar, and Andreas Stolcke, Within-Class Covariance Normalization for SVM-based Speaker Recognition	许敏强 (Alibaba)	Speaker recognition; the method also generalizes to image recognition, classification, and other fields, with clear gains
Silke M Witt, Steve J Young, Phone-level pronunciation scoring and assessment for interactive language learning, 2000, Speech Communication	黄浩 (Xinjiang University)	GOP and error detection
S. M. Witt, Use of Speech Recognition in Computer-assisted Language Learning	杨嵩 (Chivox)	Pronunciation assessment
Andrew J. Hunt, Alan W. Black, Unit selection in a concatenative speech synthesis system using a large speech database, ICASSP 1996	康永国 (Baidu)	Representative work on concatenative speech synthesis
Zen H, Tokuda K, Black A W. Statistical parametric speech synthesis[J]. Speech Communication, 2009, 51(11): 1039-1064	凌振华 (USTC)	HMM-based statistical parametric speech synthesis
Tokuda K, Nankaku Y, Toda T, et al. Speech synthesis based on hidden Markov models[J]. Proceedings of the IEEE, 2013, 101(5): 1234-1252	凌振华 (USTC)	HMM-based statistical parametric speech synthesis
Zen, H., Senior, A., Schuster, M., 2013, Statistical parametric speech synthesis using deep neural networks	吴君如 (East China Normal University), 康永国 (Baidu)
K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, T. Kitamura, Speech parameter generation algorithms for HMM-based speech synthesis, Proc. of ICASSP, pp. 1315-1318, June 2000	康永国 (Baidu)	HMM-based statistical parametric speech synthesis
S. King, "A reading list of recent advances in speech synthesis", Proc. ICPhS 2015	武执正 (University of Edinburgh), 杨鹏 (Baidu)		https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS1043.pdf
Statistical parametric speech synthesis, Heiga Zen	杨辰雨 (I2R, Singapore)	Acoustic modeling for speech synthesis
Z. H. Ling, Deep Learning for Acoustic Modeling in Parametric Speech Generation, IEEE Signal Processing Magazine, 2015, 32(3):35-52	杨辰雨 (I2R, Singapore)	Acoustic modeling for speech synthesis
Xu Yi, Separation of functional components of tone and intonation from observed F0 patterns	林怡亭 (Nuance), 李雅 (Institute of Automation, CAS)
Shriberg, E., Stolcke, A., Hakkani-Tür, D., & Tür, G. (2000). Prosody-based automatic segmentation of speech into sentences and topics. Speech Communication, 32(1), 127-154	陈磊 (ETS, speech assessment), 谢磊 (Northwestern Polytechnical University)	SRI's work on structural segmentation of speech using prosody; 430 citations on Google Scholar
ToBI: A standard for labeling English prosody	杨辰雨 (I2R, Singapore)	Chinese and English prosody labeling
Chinese prosody and prosodic labeling of spontaneous speech	杨辰雨 (I2R, Singapore)	C-ToBI 3.0
Shrikanth S. Narayanan and Panayiotis Georgiou, Behavioral Signal Processing: Deriving Human Behavioral Informatics from Speech and Language (2013), Proceedings of the IEEE, 101(5):1203-1233	李明 (Sun Yat-sen University)	Survey of speech and multimodal behavioral signal analysis; recommended for people working in affective computing and behavior analysis
Levelt, W., Roelofs, A., 1999, A theory of lexical access in speech production	吴君如 (East China Normal University)	Language cognition. The most comprehensive summary, up to the late 1990s, of psycholinguistic empirical findings and proposed mechanisms for the mental process of human language production; many computational models aim to reproduce the effects listed in this paper
A Highly Robust Audio Fingerprinting System, Jaap Haitsma (Philips)	朱磊 (芋头科技)	Audio fingerprint
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.	陈谐 (Cambridge)
Dzmitry Bahdanau, KyungHyun Cho, Yoshua Bengio, Neural Machine Translation By Jointly Learning To Align And Translate	肖雄 (Nanyang Technological University), 徐海华 (Nanyang Technological University)	Attention model for MT	http://arxiv.org/pdf/1409.0473.pdf

Book and Thesis
Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, 黄学东	何伟 (Communication University of China), 钱彦旻 (Shanghai Jiao Tong University)
自然语言处理综论 (Speech and Language Processing), Daniel Jurafsky	汪淼淼 (Alibaba)
Speech Enhancement: Theory and Practice, Philipos C. Loizou	张学良 (Inner Mongolia University)	Book on speech enhancement
Statistical Methods for Speech Recognition, Jelinek	金琴 (Renmin University of China)	Classic textbook
Hidden Markov Models for Speech Recognition (Edinburgh University Press 1990)	穆向禹 (Baidu)
Machine Learning Paradigms for Speech Recognition	卢鲤 (Tencent)	Looks at speech recognition from a machine learning perspective; the framework is very clear
《实用语音识别基础》 (Practical Fundamentals of Speech Recognition), National Defense Industry Press	王晶 (Beijing Institute of Technology)
Text-to-Speech Synthesis, Paul Taylor, University of Cambridge	黄东延 (Singapore)	Gives a detailed, in-depth explanation of how text-to-speech works
A Course in Phonetics, Ladefoged	冯卉 (Tianjin University)	Recommended by several group members
A Course in Phonetics (7th Ed.). P. Ladefoged & K. Johnson (2015). Cengage Learning.	顾文涛 (Nanjing Normal University)	A very good introductory textbook
Acoustic and Auditory Phonetics (3rd Ed.). K. Johnson (2012). Wiley-Blackwell.	顾文涛 (Nanjing Normal University)
Articulatory Phonetics. B. Gick, I. Wilson, & D. Derrick (2013). Wiley-Blackwell.	顾文涛 (Nanjing Normal University)
《实验语音学概要》 (Outline of Experimental Phonetics) and its revised edition	熊子瑜 (Institute of Linguistics), 时秀娟 (Tianjin Normal University)
《实验语音学基础教程》 (Basic Course in Experimental Phonetics), 孔江平	时秀娟 (Tianjin Normal University)
Phonetics, Reetz & Jongman	孙锐欣 (East China Normal University)	李爱军 and colleagues in China are translating a Chinese edition
《实验语音学概要》, 吴宗济	王磊 (音乐雷达) et al.	Speech synthesis, phonology
自然语言处理综论 (Speech and Language Processing), Daniel Jurafsky
Duda, Pattern Classification, 2nd edition (Chinese edition available)	谢凌云 (Communication University of China)	Pattern recognition
《现代汉语音典》, 蔡莲红 and 孔江平	王愈 (SinoVoice)
《汉语语调实验研究》 (Experimental Studies of Chinese Intonation), 2012, by 林茂灿	李爱军 (Institute of Linguistics, CASS)	Research on Chinese intonation built on the AM theory of English intonation
Prosodic Typology, Sun-Ah Jun; translated into Chinese as 《韵律类型学》 by 吕士楠 of the Institute of Acoustics, CAS	郝玉峰 (Speechocean)	Multilingual prosody labeling
Acoustic Phonetics, Kenneth N. Stevens	解炎陆 (Beijing Language and Culture University)	Describes the characteristics of the various speech sounds from an acoustic perspective; the original edition is too expensive, and a domestic edition would be welcome
Ladefoged, 《世界语音》 (The Sounds of the World's Languages)	时秀娟 (Tianjin Normal University)		http://mp.weixin.qq.com/s?__biz=MzA3OTI3MjEzNg==&mid=400341406&idx=2&sn=484d61f4ab9dcfe7bb613bb8d119a161&scene=1&srcid=1104uQiKdYcy75BYTBJ9xA99#rd
Theory and Applications of Digital Speech Processing, Lawrence Rabiner	党建武 (Tianjin University)
T. F. Quatieri, Discrete-Time Speech Signal Processing (English edition)	王晶 (Beijing Institute of Technology)	Classic textbook for speech signal processing courses
《信号与系统》 (Signals and Systems), Alan V. Oppenheim	陈谐 (Cambridge)
Microphone Arrays: Signal Processing Techniques and Applications (Digital Signal Processing), Michael Brandstein, Darren Ward, Springer, 2001	李军锋 (Institute of Acoustics, CAS)	Speech signal processing
Pattern Recognition and Machine Learning	王东 (Tsinghua University)	A classic of the machine learning field
Machine Learning: A Probabilistic Perspective; Machine Learning: An Algorithmic Perspective	卢鲤 (Tencent)
Introduction to Statistical Pattern Recognition, Keinosuke Fukunaga	朱璇 (Samsung Beijing Research Institute)	Pattern recognition. The book explains feature spaces very clearly and accessibly and suits beginners well; recommended as a mathematical-foundations text
An Introduction to Support Vector Machines	朱璇 (Samsung Beijing Research Institute)	SVM
步尚全, 《基础泛函分析》 (Basic Functional Analysis)	邓侃 (思昂教育)	Functional analysis
《测度论与概率论基础》 (Foundations of Measure Theory and Probability Theory), Peking University Press	明怀平 (I2R, Singapore)
Daniel Povey, "Discriminative Training for Large Vocabulary Speech Recognition," PhD thesis, Cambridge University Engineering Dept, 2003	俞凯 (Shanghai Jiao Tong University)	Discriminative training, PhD thesis
语境相关的声学模型和搜索策略的研究 (Research on context-dependent acoustic models and search strategies), 高升, PhD thesis, Chinese Academy of Sciences, 2001	李宏言 (Alibaba)	A major early LVCSR work from China

Tools
HTK book
Kaldi
Praat
Theano
CNTK
RNNLIB
Eesen		CTC toolkit	https://github.com/yajiemiao/eesen

Video & online course
Deep Learning Summer School, Montreal 2015			http://videolectures.net/deeplearning2015_montreal/
Introduction to Digital Filters	王愈 (SinoVoice)	An online signal processing tutorial that explains the fundamentals of signal analysis and processing in an accessible way; its derivations are concise and thorough and draw on common Matlab signal-processing library functions such as freqz	https://ccrma.stanford.edu/~jos/filters/
九州语言网	李爱军 (Institute of Linguistics, CASS)	Those interested in the grammar and phonetics of Chinese dialects can visit 九州语言网, a site under construction at the Institute of Linguistics and led by 熊子瑜	http://9zhou.phonetics.org.cn/
