
==DNN architecture==
* [http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1446.pdf Ying Zhang et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks]
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/fb/LightRNN.pdf LightRNN from Microsoft]
* [https://arxiv.org/pdf/1512.03385v1.pdf Kaiming He et al. Deep Residual Learning for Image Recognition]
* [http://www.isca-speech.org/archive/Interspeech_2016/pdfs/0515.pdf Wei-Ning Hsu et al. Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition]
* [http://t.cn/RfZHxko MICRO 2016]
* [[媒体文件:Cambricon-X.pdf|Cambricon-X: An Accelerator for Sparse Neural Networks]]
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/2/26/REVISE_SATURATED_ACTIVATION_FUNCTIONS.pdf Revise Saturated Activation Functions]
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b9/Visualizing_and_Understanding_Genomic.pdf Visualizing and Understanding Genomic Sequences Using Deep Neural Networks]
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/4/43/On_the_Role_of_Nonlinear_Transformations_in_Deep_Neural_Network_Acoustic_Models.PDF On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models]
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/f6/Understanding_intermediate_layers_using_linear_classifier_probes.pdf Understanding Intermediate Layers Using Linear Classifier Probes]
==Speaker recognition==
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/1b/RedDots.rar# INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]
* INTERSPEECH 2016 Fri-O-3-2: Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge
==Speech Perception, Production And Acquisition==
[[1.1 Models of speech production]]
[[1.2 Physiology and neurophysiology of speech production]]
[[1.3 Neural basis of speech production]]
[[1.4 Coarticulation]]
[[1.5 Models of speech perception]]
[[1.6 Physiology and neurophysiology of speech perception]]
[[1.7 Neural basis of speech perception]]
[[1.8 Acoustic and articulatory cues in speech perception]]
[[1.9 Interaction speech production-speech perception]]
[[1.10 Multimodal speech perception]]
[[1.11 Cognition and brain studies on speech]]
[[1.12 Multilingual studies]]
[[1.13 L1 acquisition and bilingual acquisition]]
[[1.14 L2 acquisition by children and adults]]
[[1.15 Speech and hearing disorders]]
[[1.16 Singing voice: production and perception]]
[[1.17 Speech and other biosignals]]
[[1.18 Special Session: Intelligibility under the microscope]]
==Phonetics, Phonology, And Prosody==
[[2.1 Phonetics and phonology]]
[[2.2 Language descriptions]]
[[2.3 Linguistic systems]]
[[2.4 Discourse and dialog structures]]
[[2.5 Acoustic phonetics]]
[[2.6 Phonation, voice quality]]
[[2.7 Articulatory and acoustic features of prosody]]
[[2.8 Perception of prosody]]
[[2.9 Phonological processes and models]]
[[2.10 Laboratory phonology]]
[[2.11 Phonetic universals]]
[[2.12 Sound changes]]
[[2.13 Sociophonetics]]
[[2.14 Phonetics of L1-L2 interaction]]
==Analysis Of Paralinguistics In Speech And Language==
[[3.1 Analysis of speaker states]]
[[3.2 Analysis of speaker traits]]
[[3.3 Automatic analysis of speaker states and traits]]
[[3.4 Pathological speech and language]]
[[3.5 Non-verbal communication]]
[[3.6 Social and vocal signals]]
[[3.7 Sentiment analysis and opinion mining]]
[[3.8 Paralinguistics in singing]]
[[3.9 Perception of paralinguistic phenomena]]
[[3.10 Phonetic and linguistic aspects of paralinguistics]]
[[3.11 Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception & Sincerity]]
[[3.12 Special Session: Clinical and neuroscience-inspired vocal biomarkers of neurological and psychiatric disorders]]
==Speaker And Language Identification==
[[4.1 Language identification and verification]]
[[4.2 Dialect and accent recognition]]
[[4.3 Speaker verification and identification]]
[[4.4 Features for speaker and language recognition]]
[[4.5 Robustness to variable and degraded channels]]
[[4.6 Speaker confidence estimation]]
[[4.7 Speaker diarization]]
[[4.8 Higher-level knowledge in speaker and language recognition]]
[[4.9 Evaluation of speaker and language identification systems]]
[[4.10 Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]]
[[4.11 Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge]]
==Analysis Of Speech And Audio Signals==
[[5.1 Speech acoustics]]
[[5.2 Speech analysis and representation]]
[[5.3 Audio signal analysis and representation]]
[[5.4 Speech and audio segmentation and classification]]
[[5.5 Voice activity detection]]
[[5.6 Pitch and harmonic analysis]]
[[5.7 Source separation and computational auditory scene analysis]]
[[5.8 Speaker spatial localization]]
[[5.9 Voice separation]]
[[5.10 Music signal processing and understanding]]
[[5.11 Singing analysis]]
[[5.12 Special Session: Speech, audio, and language processing techniques applied to bird and animal vocalisations ]]
==Speech Coding And Enhancement==
[[6.1 Speech coding and transmission]]
[[6.2 Low-bit-rate speech coding]]
[[6.3 Perceptual audio coding of speech signals]]
[[6.4 Noise reduction for speech signals]]
[[6.5 Speech enhancement: single-channel]]
[[6.6 Speech enhancement: multi-channel]]
[[6.7 Speech intelligibility]]
[[6.8 Active noise control]]
[[6.9 Speech enhancement in hearing aids]]
[[6.10 Adaptive beamforming for speech enhancement]]
[[6.11 Dereverberation for speech signals]]
[[6.12 Echo cancelation for speech signals]]
[[6.13 Evaluation of speech transmission, coding and enhancement]]
==Speech Synthesis And Spoken Language Generation==
[[7.1 Grapheme-to-phoneme conversion for synthesis]]
[[7.2 Text processing for speech synthesis]]
[[7.3 Signal processing/statistical models for synthesis]]
[[7.4 Speech synthesis paradigms and methods]]
[[7.5 Articulatory speech synthesis]]
[[7.6 Segment-level and/or concatenative synthesis]]
[[7.7 Unit selection speech synthesis]]
[[7.8 Statistical parametric speech synthesis]]
[[7.9 Prosody modeling and generation]]
[[7.10 Expression, emotion and personality generation]]
[[7.11 Synthesis of singing voices]]
[[7.12 Voice modification, conversion and morphing]]
[[7.13 Concept-to-speech conversion]]
[[7.14 Cross-lingual and multilingual aspects in speech synthesis]]
[[7.15 Avatars and talking faces]]
[[7.16 Tools and data for speech synthesis]]
[[7.17 Evaluation of speech synthesis]]
[[7.18 Special Session: Singing Synthesis Challenge: Fill-In the Gap]]
[[7.19 Special Session: Voice Conversion Challenge 2016]]
==Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation==
[[8.1 Feature extraction and low-level feature modeling for ASR]]
[[8.2 Prosodic features and models]]
[[8.3 Robustness against noise, reverberation]]
[[8.4 Far field and microphone array speech recognition]]
[[8.5 Speaker normalization (e.g., VTLN)]]
[[8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)]]
[[8.7 Discriminative acoustic training methods for ASR]]
[[8.8 Acoustic model adaptation (speaker, bandwidth, emotion, accent)]]
[[8.9 Speaker adaptation, speaker adapted training methods]]
[[8.10 Pronunciation variants and modeling for speech recognition]]
[[8.11 Acoustic confidence measures]]
[[8.13 Cross-lingual and multilingual aspects, non-native accents]]
[[8.14 Acoustic modeling for conversational speech (dialog, interaction)]]
[[8.15 Evaluation of speech recognition]]
==Speech Recognition - Architecture, Search, And Linguistic Components==
[[9.1 Lexical modeling and access: units and models]]
[[9.2 Automatic lexicon learning]]
[[9.3 Supervised/unsupervised morphological models]]
[[9.4 Prosodic features and models for language modeling]]
[[9.5 Discriminative training methods for language modeling]]
[[9.6 Language model adaptation (domain, diachronic adaptation)]]
[[9.7 Language modeling for conversational speech (dialog, interaction)]]
[[9.8 Neural networks for language modeling]]
[[9.9 Search methods, decoding algorithms, lattices, multipass strategies]]
[[9.10 New computational strategies, data-structures for ASR]]
[[9.11 Computational resource constrained speech recognition]]
[[9.12 Confidence measures]]
[[9.13 Cross-lingual and multilingual components for speech recognition]]
[[9.14 Structured classification approaches]]
==Speech Recognition - Technologies And Systems For New Applications==
[[10.1 Multimodal systems]]
[[10.2 Applications in education and learning (incl. CALL, assessment of fluency)]]
[[10.3 Applications in medical practice (CIS, voice assessment, etc.)]]
[[10.4 Speech science in end-user applications]]
[[10.5 Rich transcription]]
[[10.6 Innovative products and services based on speech technologies]]
[[10.7 Sparse, template-based representations]]
[[10.8 New paradigms (e.g. artic. models, silent speech interfaces, topic models)]]
[[10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications]]
[[10.10 Special Session: Realism in robust speech processing ]]
[[10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing]]
[[10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education]]
==Spoken Language Processing - Dialog, Summarization, Understanding==
[[11.1 Spoken dialog systems]]
[[11.2 Multimodal human-machine interaction (conversat. agents, human-robot)]]
[[11.3 Analysis of verbal, co-verbal and nonverbal behavior]]
[[11.4 Interactive systems for speech/language training, therapy, communication aids]]
[[11.5 Stochastic modeling for dialog]]
[[11.6 Question-answering from speech]]
[[11.7 Spoken document summarization]]
[[11.8 Systems for spoken language understanding]]
[[11.9 Topic spotting and classification]]
[[11.10 Entity extraction from speech]]
[[11.11 Semantic analysis and classification]]
[[11.12 Conversation and interaction]]
[[11.13 Evaluation of speech and multimodal dialog systems]]
[[11.14 Evaluation of summarization and understanding]]
==Spoken Language Processing: Translation, Information Retrieval, Resources==
[[12.1 Spoken machine translation]]
[[12.2 Speech-to-speech translation systems]]
[[12.3 Transliteration]]
[[12.4 Voice search]]
[[12.5 Spoken term detection]]
[[12.6 Audio indexing]]
[[12.7 Spoken document retrieval]]
[[12.8 Systems for mining spoken data, search or retrieval of speech documents]]
[[12.9 Speech and multimodal resources and annotation]]
[[12.10 Metadata descriptions of speech, audio and text resources]]
[[12.11 Metadata for semantic or content markup]]
[[12.12 Metadata for ling./discourse structure (disfluencies, boundaries, speech acts)]]
[[12.13 Methodologies and tools for language resource construction and annotation]]
[[12.14 Automatic segmentation and labeling of resources]]
[[12.15 Multilingual resources]]
[[12.16 Evaluation and quality insurance of language resources]]
[[12.17 Evaluation of translation and information retrieval systems]]
[[12.18 Special Session: Open Data for Under-Resourced Languages]]
==Speech And Spoken-Language Based Multimodal Processing And Systems==
[[13.1 Multimodal Speech Recognition]]
[[13.2 Multimodal LVCSR Systems]]
[[13.3 Multimodal Speech Analysis]]
[[13.4 Multimodal Synthesis]]
[[13.5 Multimodal Language Analysis ]]
[[13.6 Multimodal and multimedia language trait recognition ]]
[[13.7 Multimodal paralinguistics ]]
[[13.8 Multimodal interactions, interfaces]]
[[13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines]]
==Learning Methods==
[[14.1 Supervised Learning]]
[[14.2 Unsupervised Learning]]
[[14.3 Reinforcement Learning]]
[[14.4 Learning Theory]]
[[14.5 Generative Models]]
[[14.6 Discriminative Models]]
[[14.7 Probabilistic Models]]
[[14.8 Bayesian Methods]]
[[14.9 Gaussian Processes]]
==Deep Learning==
[[15.1 Network Architecture]]
[[15.2 Autoencoder]]
[[15.3 Representation Learning]]
[[15.4 Optimization]]
[[15.5 Regularization]]
[[15.6 Sparsity]]
[[15.7 Transfer Learning]]
[[15.8 Sequence Learning]]
[[15.9 Online Learning]]
[[15.10 Tricks]]
*[[媒体文件:Note icassp16.pdf|Zhiyuan Tang 20160520 - ICASSP 2016 summary ]]
*[[媒体文件:Nn analysis.pdf  |Zhiyuan Tang 20160802 - Visualizing, Measuring and Understanding Neural Networks: A Brief Survey ]]
*[[媒体文件:Interspeech16 review.pdf|Zhiyuan Tang 20161122 - INTERSPEECH 2016 summary ]]

Latest revision as of 08:17, 1 December 2016 (Thursday)
