2016

From cslt Wiki
==DNN architecture==

* [http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1446.pdf Ying Zhang et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks]
* [[Media:OUTRAGEOUSLYLARGENEURALNETWORKSTHESPARSELY-GATEDMIXTURE-OF-EXPERTSLAYER.pdf|ICLR 2017: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer]]
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/fb/LightRNN.pdf LightRNN from Microsoft]
* [https://arxiv.org/pdf/1512.03385v1.pdf Kaiming He et al. Deep Residual Learning for Image Recognition]
* [http://www.isca-speech.org/archive/Interspeech_2016/pdfs/0515.pdf Wei-Ning Hsu et al. Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition]
* [http://t.cn/RfZHxko MICRO 2016]
* [[Media:Cambricon-X.pdf|Cambricon-X: An Accelerator for Sparse Neural Networks]]
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/2/26/REVISE_SATURATED_ACTIVATION_FUNCTIONS.pdf Revise Saturated Activation Functions]

==Visualization==

* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b9/Visualizing_and_Understanding_Genomic.pdf Visualizing and Understanding Genomic Sequences Using Deep Neural Networks]
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/4/43/On_the_Role_of_Nonlinear_Transformations_in_Deep_Neural_Network_Acoustic_Models.PDF On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models]
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/f6/Understanding_intermediate_layers_using_linear_classifier_probes.pdf Understanding Intermediate Layers Using Linear Classifier Probes]

==Speaker recognition==

* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/1b/RedDots.rar# INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]
* [http://192.168.0.51:8888/2016/interspeech2016/WELCOME.html# INTERSPEECH 2016 Fri-O-3-2: Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge]

=SPEECH PROCESS=

==Speech Perception, Production And Acquisition==

==Review==

[[1.1 Models of speech production]]

[[1.2 Physiology and neurophysiology of speech production]]

[[1.3 Neural basis of speech production]]

[[1.4 Coarticulation]]

[[1.5 Models of speech perception]]

[[1.6 Physiology and neurophysiology of speech perception]]

[[1.7 Neural basis of speech perception]]

[[1.8 Acoustic and articulatory cues in speech perception]]

[[1.9 Interaction speech production-speech perception]]

[[1.10 Multimodal speech perception]]

[[1.11 Cognition and brain studies on speech]]

[[1.12 Multilingual studies]]

[[1.13 L1 acquisition and bilingual acquisition]]

[[1.14 L2 acquisition by children and adults]]

[[1.15 Speech and hearing disorders]]

[[1.16 Singing voice: production and perception]]

[[1.17 Speech and other biosignals]]

[[1.18 Special Session: Intelligibility under the microscope]]

==Phonetics, Phonology, And Prosody==

[[2.1 Phonetics and phonology]]

[[2.2 Language descriptions]]

[[2.3 Linguistic systems]]

[[2.4 Discourse and dialog structures]]

[[2.5 Acoustic phonetics]]

[[2.6 Phonation, voice quality]]

[[2.7 Articulatory and acoustic features of prosody]]

[[2.8 Perception of prosody]]

[[2.9 Phonological processes and models]]

[[2.10 Laboratory phonology]]

[[2.11 Phonetic universals]]

[[2.12 Sound changes]]

[[2.13 Sociophonetics]]

[[2.14 Phonetics of L1-L2 interaction]]

==Analysis Of Paralinguistics In Speech And Language==

[[3.1 Analysis of speaker states]]

[[3.2 Analysis of speaker traits]]

[[3.3 Automatic analysis of speaker states and traits]]

[[3.4 Pathological speech and language]]

[[3.5 Non-verbal communication]]

[[3.6 Social and vocal signals]]

[[3.7 Sentiment analysis and opinion mining]]

[[3.8 Paralinguistics in singing]]

[[3.9 Perception of paralinguistic phenomena]]

[[3.10 Phonetic and linguistic aspects of paralinguistics]]

[[3.11 Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception & Sincerity]]

[[3.12 Special Session: Clinical and neuroscience-inspired vocal biomarkers of neurological and psychiatric disorders]]

==Speaker And Language Identification==

[[4.1 Language identification and verification]]

[[4.2 Dialect and accent recognition]]

[[4.3 Speaker verification and identification]]

[[4.4 Features for speaker and language recognition]]

[[4.5 Robustness to variable and degraded channels]]

[[4.6 Speaker confidence estimation]]

[[4.7 Speaker diarization]]

[[4.8 Higher-level knowledge in speaker and language recognition]]

[[4.9 Evaluation of speaker and language identification systems]]

[[4.10 Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]]

[[4.11 Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge]]

==Analysis Of Speech And Audio Signals==

[[5.1 Speech acoustics]]

[[5.2 Speech analysis and representation]]

[[5.3 Audio signal analysis and representation]]

[[5.4 Speech and audio segmentation and classification]]

[[5.5 Voice activity detection]]

[[5.6 Pitch and harmonic analysis]]

[[5.7 Source separation and computational auditory scene analysis]]

[[5.8 Speaker spatial localization]]

[[5.9 Voice separation]]

[[5.10 Music signal processing and understanding]]

[[5.11 Singing analysis]]

[[5.12 Special Session: Speech, audio, and language processing techniques applied to bird and animal vocalisations]]

==Speech Coding And Enhancement==

[[6.1 Speech coding and transmission]]

[[6.2 Low-bit-rate speech coding]]

[[6.3 Perceptual audio coding of speech signals]]

[[6.4 Noise reduction for speech signals]]

[[6.5 Speech enhancement: single-channel]]

[[6.6 Speech enhancement: multi-channel]]

[[6.7 Speech intelligibility]]

[[6.8 Active noise control]]

[[6.9 Speech enhancement in hearing aids]]

[[6.10 Adaptive beamforming for speech enhancement]]

[[6.11 Dereverberation for speech signals]]

[[6.12 Echo cancelation for speech signals]]

[[6.13 Evaluation of speech transmission, coding and enhancement]]

==Speech Synthesis And Spoken Language Generation==

[[7.1 Grapheme-to-phoneme conversion for synthesis]]

[[7.2 Text processing for speech synthesis]]

[[7.3 Signal processing/statistical models for synthesis]]

[[7.4 Speech synthesis paradigms and methods]]

[[7.5 Articulatory speech synthesis]]

[[7.6 Segment-level and/or concatenative synthesis]]

[[7.7 Unit selection speech synthesis]]

[[7.8 Statistical parametric speech synthesis]]

[[7.9 Prosody modeling and generation]]

[[7.10 Expression, emotion and personality generation]]

[[7.11 Synthesis of singing voices]]

[[7.12 Voice modification, conversion and morphing]]

[[7.13 Concept-to-speech conversion]]

[[7.14 Cross-lingual and multilingual aspects in speech synthesis]]

[[7.15 Avatars and talking faces]]

[[7.16 Tools and data for speech synthesis]]

[[7.17 Evaluation of speech synthesis]]

[[7.18 Special Session: Singing Synthesis Challenge: Fill-In the Gap]]

[[7.19 Special Session: Voice Conversion Challenge 2016]]

==Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation==

[[8.1 Feature extraction and low-level feature modeling for ASR]]

[[8.2 Prosodic features and models]]

[[8.3 Robustness against noise, reverberation]]

[[8.4 Far field and microphone array speech recognition]]

[[8.5 Speaker normalization (e.g., VTLN)]]

[[8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)]]

[[8.7 Discriminative acoustic training methods for ASR]]

[[8.8 Acoustic model adaptation (speaker, bandwidth, emotion, accent)]]

[[8.9 Speaker adaptation, speaker adapted training methods]]

[[8.10 Pronunciation variants and modeling for speech recognition]]

[[8.11 Acoustic confidence measures]]

[[8.13 Cross-lingual and multilingual aspects, non-native accents]]

[[8.14 Acoustic modeling for conversational speech (dialog, interaction)]]

[[8.15 Evaluation of speech recognition]]

==Speech Recognition - Architecture, Search, And Linguistic Components==

[[9.1 Lexical modeling and access: units and models]]

[[9.2 Automatic lexicon learning]]

[[9.3 Supervised/unsupervised morphological models]]

[[9.4 Prosodic features and models for language modeling]]

[[9.5 Discriminative training methods for language modeling]]

[[9.6 Language model adaptation (domain, diachronic adaptation)]]

[[9.7 Language modeling for conversational speech (dialog, interaction)]]

[[9.8 Neural networks for language modeling]]

[[9.9 Search methods, decoding algorithms, lattices, multipass strategies]]

[[9.10 New computational strategies, data-structures for ASR]]

[[9.11 Computational resource constrained speech recognition]]

[[9.12 Confidence measures]]

[[9.13 Cross-lingual and multilingual components for speech recognition]]

[[9.14 Structured classification approaches]]

==Speech Recognition - Technologies And Systems For New Applications==

[[10.1 Multimodal systems]]

[[10.2 Applications in education and learning (incl. CALL, assessment of fluency)]]

[[10.3 Applications in medical practice (CIS, voice assessment, etc.)]]

[[10.4 Speech science in end-user applications]]

[[10.5 Rich transcription]]

[[10.6 Innovative products and services based on speech technologies]]

[[10.7 Sparse, template-based representations]]

[[10.8 New paradigms (e.g. artic. models, silent speech interfaces, topic models)]]

[[10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications]]

[[10.10 Special Session: Realism in robust speech processing]]

[[10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing]]

[[10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education]]

==Spoken Language Processing - Dialog, Summarization, Understanding==

[[11.1 Spoken dialog systems]]

[[11.2 Multimodal human-machine interaction (conversat. agents, human-robot)]]

[[11.3 Analysis of verbal, co-verbal and nonverbal behavior]]

[[11.4 Interactive systems for speech/language training, therapy, communication aids]]

[[11.5 Stochastic modeling for dialog]]

[[11.6 Question-answering from speech]]

[[11.7 Spoken document summarization]]

[[11.8 Systems for spoken language understanding]]

[[11.9 Topic spotting and classification]]

[[11.10 Entity extraction from speech]]

[[11.11 Semantic analysis and classification]]

[[11.12 Conversation and interaction]]

[[11.13 Evaluation of speech and multimodal dialog systems]]

[[11.14 Evaluation of summarization and understanding]]

==Spoken Language Processing: Translation, Information Retrieval, Resources==

[[12.1 Spoken machine translation]]

[[12.2 Speech-to-speech translation systems]]

[[12.3 Transliteration]]

[[12.4 Voice search]]

[[12.5 Spoken term detection]]

[[12.6 Audio indexing]]

[[12.7 Spoken document retrieval]]

[[12.8 Systems for mining spoken data, search or retrieval of speech documents]]

[[12.9 Speech and multimodal resources and annotation]]

[[12.10 Metadata descriptions of speech, audio and text resources]]

[[12.11 Metadata for semantic or content markup]]

[[12.12 Metadata for ling./discourse structure (disfluencies, boundaries, speech acts)]]

[[12.13 Methodologies and tools for language resource construction and annotation]]

[[12.14 Automatic segmentation and labeling of resources]]

[[12.15 Multilingual resources]]

[[12.16 Evaluation and quality insurance of language resources]]

[[12.17 Evaluation of translation and information retrieval systems]]

[[12.18 Special Session: Open Data for Under-Resourced Languages]]

==Speech And Spoken-Language Based Multimodal Processing And Systems==

[[13.1 Multimodal Speech Recognition]]

[[13.2 Multimodal LVCSR Systems]]

[[13.3 Multimodal Speech Analysis]]

[[13.4 Multimodal Synthesis]]

[[13.5 Multimodal Language Analysis]]

[[13.6 Multimodal and multimedia language trait recognition]]

[[13.7 Multimodal paralinguistics]]

[[13.8 Multimodal interactions, interfaces]]

[[13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines]]

=MACHINE LEARNING=

==Learning Methods==

[[14.1 Supervised Learning]]

[[14.2 Unsupervised Learning]]

[[14.3 Reinforcement Learning]]

[[14.4 Learning Theory]]

[[14.5 Generative Models]]

[[14.6 Discriminative Models]]

[[14.7 Probabilistic Models]]

[[14.8 Bayesian Methods]]

[[14.9 Gaussian Processes]]

==Deep Learning==

[[15.1 Network Architecture]]

[[15.2 Autoencoder]]

[[15.3 Representation Learning]]

[[15.4 Optimization]]

[[15.5 Regularization]]

[[15.6 Sparsity]]

[[15.7 Transfer Learning]]

[[15.8 Sequence Learning]]

[[15.9 Online Learning]]

[[15.10 Tricks]]

=SPEAKER RECOGNITION=

[[16.1 Deep learning]]

[[16.2 Short utterances]]

[[16.3 Challenge]]

=Review=

* [[Media:Note icassp16.pdf|Zhiyuan Tang 20160520 - ICASSP 2016 summary]]
* [[Media:Nn analysis.pdf|Zhiyuan Tang 20160802 - Visualizing, Measuring and Understanding Neural Networks: A Brief Survey]]
* [[Media:Interspeech16 review.pdf|Zhiyuan Tang 20161122 - INTERSPEECH 2016 summary]]

Latest revision as of 08:17, 1 December 2016 (Thursday)
