Difference between revisions of "2016"

From cslt Wiki
DNN architecture
 
(8 intermediate revisions by 3 users not shown)
Line 1: Line 1:
DNN architecture:
+
==DNN architecture==
  
[http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1446.pdf Ying Zhang et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks]
+
* [http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1446.pdf Ying Zhang et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks]
 +
* [[媒体文件:OUTRAGEOUSLYLARGENEURALNETWORKSTHESPARSELY-GATEDMIXTURE-OF-EXPERTSLAYER.pdf|ICLR2017: OUTRAGEOUSLY LARGE NEURAL NETWORKS: THE SPARSELY-GATED MIXTURE-OF-EXPERTS LAYER]]
 +
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/fb/LightRNN.pdf LightRNN from Microsoft]
 +
* [https://arxiv.org/pdf/1512.03385v1.pdf Kaiming He et al. Deep Residual Learning for Image Recognition]
 +
* [http://www.isca-speech.org/archive/Interspeech_2016/pdfs/0515.pdf Wei-Ning Hsu et al. Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition]
 +
* [http://t.cn/RfZHxko MICRO 2016]
 +
* [[媒体文件:Cambricon-X.pdf| Cambricon-X: An Accelerator for Sparse Neural Networks]]
 +
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/2/26/REVISE_SATURATED_ACTIVATION_FUNCTIONS.pdf Revise Saturated Activation Functions]
  
[[媒体文件:OUTRAGEOUSLYLARGENEURALNETWORKSTHESPARSELY-GATEDMIXTURE-OF-EXPERTSLAYER.pdf|ICLR2017: OUTRAGEOUSLY LARGE NEURAL NETWORKS: THE SPARSELY-GATED MIXTURE-OF-EXPERTS LAYER]]
+
==Visualization==
  
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/fb/LightRNN.pdf LightRNN from Microsoft]
+
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b9/Visualizing_and_Understanding_Genomic.pdf Visualizing and Understanding Genomic Sequences Using Deep Neural Networks]
 +
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/4/43/On_the_Role_of_Nonlinear_Transformations_in_Deep_Neural_Network_Acoustic_Models.PDF On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models]
 +
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/f6/Understanding_intermediate_layers_using_linear_classifier_probes.pdf Understanding Intermediate Layers Using Linear Classifier Probes]
  
[https://arxiv.org/pdf/1512.03385v1.pdf Kaiming He et al. Deep Residual Learning for Image Recognition]
+
==Speaker recognition==
  
[http://www.isca-speech.org/archive/Interspeech_2016/pdfs/0515.pdf Wei-Ning Hsu et al. Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition]
+
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/1b/RedDots.rar# INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]
 +
* [http://192.168.0.51:8888/2016/interspeech2016/WELCOME.html# INTERSPEECH 2016 Fri-O-3-2: Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge]
  
  
Visualization
+
==Review==
 
+
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b9/Visualizing_and_Understanding_Genomic.pdf Visualizing and Understanding Genomic Sequences Using Deep Neural Networks]
+
 
+
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/4/43/On_the_Role_of_Nonlinear_Transformations_in_Deep_Neural_Network_Acoustic_Models.PDF On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models]
+
 
+
 
+
Speaker recognition:
+
 
+
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/1b/RedDots.rar# INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]
+
 
+
 
+
[[8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)]]
+
 
+
 
+
 
+
 
+
 
+
[[9.1 Lexical modeling and access: units and models]]
+
 
+
[[9.2 Automatic lexicon learning]]
+
 
+
[[9.3 Supervised/unsupervised morphological models]]
+
 
+
[[9.4 Prosodic features and models for language modeling]]
+
 
+
[[9.5 Discriminative training methods for language modeling]]
+
 
+
[[9.6 Language model adaptation (domain, diachronic adaptation)]]
+
 
+
[[9.7 Language modeling for conversational speech (dialog, interaction)]]
+
 
+
[[9.8 Neural networks for language modeling]]
+
 
+
[[9.9 Search methods, decoding algorithms, lattices, multipass strategies]]
+
 
+
[[9.10 New computational strategies, data-structures for ASR]]
+
 
+
[[9.11 Computational resource constrained speech recognition]]
+
 
+
[[9.12 Confidence measures]]
+
 
+
[[9.13 Cross-lingual and multilingual components for speech recognition]]
+
 
+
[[9.14 Structured classification approaches]]
+
 
+
 
+
 
+
==Speech Recognition - Technologies And Systems For New Applications==
+
 
+
 
+
 
+
[[10.1 Multimodal systems]]
+
 
+
[[10.2 Applications in education and learning (incl. CALL, assessment of fluency)]]
+
 
+
[[10.3 Applications in medical practice (CIS, voice assessment, etc.)]]
+
 
+
[[10.4 Speech science in end-user applications]]
+
 
+
[[10.5 Rich transcription]]
+
 
+
[[10.6 Innovative products and services based on speech technologies]]
+
 
+
[[10.7 Sparse, template-based representations]]
+
 
+
[[10.8 New paradigms (e.g. artic. models, silent speech interfaces, topic models)]]
+
 
+
[[10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications]]
+
 
+
[[10.10 Special Session: Realism in robust speech processing ]]
+
 
+
[[10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing]]
+
 
+
[[10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education]]
+
 
+
 
+
 
+
==Spoken Language Processing - Dialog, Summarization, Understanding==
+
 
+
 
+
 
+
[[11.1 Spoken dialog systems]]
+
 
+
[[11.2 Multimodal human-machine interaction (conversat. agents, human-robot)]]
+
 
+
[[11.3 Analysis of verbal, co-verbal and nonverbal behavior]]
+
 
+
[[11.4 Interactive systems for speech/language training, therapy, communication aids]]
+
 
+
[[11.5 Stochastic modeling for dialog]]
+
 
+
[[11.6 Question-answering from speech]]
+
 
+
[[11.7 Spoken document summarization]]
+
 
+
[[11.8 Systems for spoken language understanding]]
+
 
+
[[11.9 Topic spotting and classification]]
+
 
+
[[11.10 Entity extraction from speech]]
+
 
+
[[11.11 Semantic analysis and classification]]
+
 
+
[[11.12 Conversation and interaction]]
+
 
+
[[11.13 Evaluation of speech and multimodal dialog systems]]
+
 
+
[[11.14 Evaluation of summarization and understanding]]
+
 
+
 
+
 
+
==Spoken Language Processing: Translation, Information Retrieval, Resources==
+
 
+
 
+
 
+
[[12.1 Spoken machine translation]]
+
 
+
[[12.2 Speech-to-speech translation systems]]
+
 
+
[[12.3 Transliteration]]
+
 
+
[[12.4 Voice search]]
+
 
+
[[12.5 Spoken term detection]]
+
 
+
[[12.6 Audio indexing]]
+
 
+
[[12.7 Spoken document retrieval]]
+
 
+
[[12.8 Systems for mining spoken data, search or retrieval of speech documents]]
+
 
+
[[12.9 Speech and multimodal resources and annotation]]
+
 
+
[[12.10 Metadata descriptions of speech, audio and text resources]]
+
 
+
[[12.11 Metadata for semantic or content markup]]
+
 
+
[[12.12 Metadata for ling./discourse structure (disfluencies, boundaries, speech acts)]]
+
 
+
[[12.13 Methodologies and tools for language resource construction and annotation]]
+
 
+
[[12.14 Automatic segmentation and labeling of resources]]
+
 
+
[[12.15 Multilingual resources]]
+
 
+
[[12.16 Evaluation and quality insurance of language resources]]
+
 
+
[[12.17 Evaluation of translation and information retrieval systems]]
+
 
+
[[12.18 Special Session: Open Data for Under-Resourced Languages]]
+
 
+
 
+
 
+
==Speech And Spoken-Language Based Multimodal Processing And Systems==
+
 
+
 
+
 
+
[[13.1 Multimodal Speech Recognition]]
+
 
+
[[13.2 Multimodal LVCSR Systems]]
+
 
+
[[13.3 Multimodal Speech Analysis]]
+
 
+
[[13.4 Multimodal Synthesis]]
+
 
+
[[13.5 Multimodal Language Analysis ]]
+
 
+
[[13.6 Multimodal and multimedia language trait recognition ]]
+
 
+
[[13.7 Multimodal paralinguistics ]]
+
 
+
[[13.8 Multimodal interactions, interfaces]]
+
 
+
[[13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines]]
+
 
+
 
+
 
+
 
+
=MACHINE LEARNING=
+
 
+
 
+
==Learning Methods==
+
 
+
[[14.1 Supervised Learning]]
+
 
+
[[14.2 Unsupervised Learning]]
+
 
+
[[14.3 Reinforcement Learning]]
+
 
+
[[14.4 Learning Theory]]
+
 
+
 
+
[[14.5 Generative Models]]
+
 
+
[[14.6 Discriminative Models]]
+
 
+
 
+
[[14.7 Probabilistic Models]]
+
 
+
[[14.8 Bayesian Methods]]
+
 
+
[[14.9 Gaussian Processes]]
+
 
+
 
+
 
+
==Deep Learning==
+
 
+
[[15.1 Network Architecture]]
+
 
+
[[15.2 Autoencoder]]
+
 
+
[[15.3 Representation Learning]]
+
 
+
[[15.4 Optimization]]
+
 
+
[[15.5 Regularization]]
+
 
+
[[15.6 Sparsity]]
+
 
+
[[15.7 Transfer Learning]]
+
 
+
[[15.8 Sequence Learning]]
+
 
+
[[15.9 Online Learning]]
+
 
+
[[15.10 Tricks]]
+
 
+
[[15.12 visualization]]
+
 
+
=SPEAKER RECOGNITION=
+
 
+
[[16.1 Deep learning]]
+
 
+
[[16.2 Short utterances]]
+
 
+
[[16.3 Challenge]]
+
 
+
=Review=
+
  
 
*[[媒体文件:Note icassp16.pdf|Zhiyuan Tang 20160520 - ICASSP 2016 summary ]]
 
 
*[[媒体文件:Nn analysis.pdf  |Zhiyuan Tang 20160802 - Visualizing, Measuring and Understanding Neural Networks: A Brief Survey ]]
 
 +
*[[媒体文件:Interspeech16 review.pdf|Zhiyuan Tang 20161122 - INTERSPEECH 2016 summary ]]

Latest revision as of 08:17, 1 December 2016 (Thursday)

DNN architecture

Visualization

Speaker recognition


Review