Difference between revisions of "2016"

From cslt Wiki
Line 1: Line 1:
DNN architecture:
+
==DNN architecture==
  
 
[http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1446.pdf Ying Zhang et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks]
 
Line 12: Line 12:
  
  
Visualization
+
==Visualization==
  
 
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b9/Visualizing_and_Understanding_Genomic.pdf Visualizing and Understanding Genomic Sequences Using Deep Neural Networks]
 
Line 19: Line 19:
  
  
Speaker recognition:
+
==Speaker recognition==
  
 
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/1b/RedDots.rar# INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]
 
  
 +
[http://192.168.0.51:8888/2016/interspeech2016/WELCOME.html# INTERSPEECH 2016 Fri-O-3-2: Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge]
  
[[8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)]]
 
 
 
 
 
 
[[9.1 Lexical modeling and access: units and models]]
 
 
[[9.2 Automatic lexicon learning]]
 
 
[[9.3 Supervised/unsupervised morphological models]]
 
 
[[9.4 Prosodic features and models for language modeling]]
 
 
[[9.5 Discriminative training methods for language modeling]]
 
 
[[9.6 Language model adaptation (domain, diachronic adaptation)]]
 
 
[[9.7 Language modeling for conversational speech (dialog, interaction)]]
 
 
[[9.8 Neural networks for language modeling]]
 
 
[[9.9 Search methods, decoding algorithms, lattices, multipass strategies]]
 
 
[[9.10 New computational strategies, data-structures for ASR]]
 
 
[[9.11 Computational resource constrained speech recognition]]
 
 
[[9.12 Confidence measures]]
 
 
[[9.13 Cross-lingual and multilingual components for speech recognition]]
 
 
[[9.14 Structured classification approaches]]
 
 
 
 
==Speech Recognition - Technologies And Systems For New Applications==
 
 
 
 
[[10.1 Multimodal systems]]
 
 
[[10.2 Applications in education and learning (incl. CALL, assessment of fluency)]]
 
 
[[10.3 Applications in medical practice (CIS, voice assessment, etc.)]]
 
 
[[10.4 Speech science in end-user applications]]
 
 
[[10.5 Rich transcription]]
 
 
[[10.6 Innovative products and services based on speech technologies]]
 
 
[[10.7 Sparse, template-based representations]]
 
 
[[10.8 New paradigms (e.g. artic. models, silent speech interfaces, topic models)]]
 
 
[[10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications]]
 
 
[[10.10 Special Session: Realism in robust speech processing ]]
 
 
[[10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing]]
 
 
[[10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education]]
 
 
 
 
==Spoken Language Processing - Dialog, Summarization, Understanding==
 
 
 
 
[[11.1 Spoken dialog systems]]
 
 
[[11.2 Multimodal human-machine interaction (conversat. agents, human-robot)]]
 
 
[[11.3 Analysis of verbal, co-verbal and nonverbal behavior]]
 
 
[[11.4 Interactive systems for speech/language training, therapy, communication aids]]
 
 
[[11.5 Stochastic modeling for dialog]]
 
 
[[11.6 Question-answering from speech]]
 
 
[[11.7 Spoken document summarization]]
 
 
[[11.8 Systems for spoken language understanding]]
 
 
[[11.9 Topic spotting and classification]]
 
 
[[11.10 Entity extraction from speech]]
 
 
[[11.11 Semantic analysis and classification]]
 
 
[[11.12 Conversation and interaction]]
 
 
[[11.13 Evaluation of speech and multimodal dialog systems]]
 
 
[[11.14 Evaluation of summarization and understanding]]
 
 
 
 
==Spoken Language Processing: Translation, Information Retrieval, Resources==
 
 
 
 
[[12.1 Spoken machine translation]]
 
 
[[12.2 Speech-to-speech translation systems]]
 
 
[[12.3 Transliteration]]
 
 
[[12.4 Voice search]]
 
 
[[12.5 Spoken term detection]]
 
 
[[12.6 Audio indexing]]
 
 
[[12.7 Spoken document retrieval]]
 
 
[[12.8 Systems for mining spoken data, search or retrieval of speech documents]]
 
 
[[12.9 Speech and multimodal resources and annotation]]
 
 
[[12.10 Metadata descriptions of speech, audio and text resources]]
 
 
[[12.11 Metadata for semantic or content markup]]
 
 
[[12.12 Metadata for ling./discourse structure (disfluencies, boundaries, speech acts)]]
 
 
[[12.13 Methodologies and tools for language resource construction and annotation]]
 
 
[[12.14 Automatic segmentation and labeling of resources]]
 
 
[[12.15 Multilingual resources]]
 
 
[[12.16 Evaluation and quality insurance of language resources]]
 
 
[[12.17 Evaluation of translation and information retrieval systems]]
 
 
[[12.18 Special Session: Open Data for Under-Resourced Languages]]
 
 
 
 
==Speech And Spoken-Language Based Multimodal Processing And Systems==
 
 
 
 
[[13.1 Multimodal Speech Recognition]]
 
 
[[13.2 Multimodal LVCSR Systems]]
 
 
[[13.3 Multimodal Speech Analysis]]
 
 
[[13.4 Multimodal Synthesis]]
 
 
[[13.5 Multimodal Language Analysis ]]
 
 
[[13.6 Multimodal and multimedia language trait recognition ]]
 
 
[[13.7 Multimodal paralinguistics ]]
 
 
[[13.8 Multimodal interactions, interfaces]]
 
 
[[13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines]]
 
 
 
 
 
=MACHINE LEARNING=
 
 
 
==Learning Methods==
 
 
[[14.1 Supervised Learning]]
 
 
[[14.2 Unsupervised Learning]]
 
 
[[14.3 Reinforcement Learning]]
 
 
[[14.4 Learning Theory]]
 
 
 
[[14.5 Generative Models]]
 
 
[[14.6 Discriminative Models]]
 
 
 
[[14.7 Probabilistic Models]]
 
 
[[14.8 Bayesian Methods]]
 
 
[[14.9 Gaussian Processes]]
 
 
 
 
==Deep Learning==
 
 
[[15.1 Network Architecture]]
 
 
[[15.2 Autoencoder]]
 
 
[[15.3 Representation Learning]]
 
 
[[15.4 Optimization]]
 
 
[[15.5 Regularization]]
 
 
[[15.6 Sparsity]]
 
 
[[15.7 Transfer Learning]]
 
 
[[15.8 Sequence Learning]]
 
 
[[15.9 Online Learning]]
 
 
[[15.10 Tricks]]
 
 
[[15.12 visualization]]
 
 
=SPEAKER RECOGNITION=
 
 
[[16.1 Deep learning]]
 
 
[[16.2 Short utterances]]
 
 
[[16.3 Challenge]]
 
  
 
=Review=
 

Revision as of 01:04, 9 November 2016 (Wed)

DNN architecture

Ying Zhang et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

ICLR 2017: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

LightRNN from Microsoft

Kaiming He et al. Deep Residual Learning for Image Recognition

Wei-Ning Hsu et al. Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition


Visualization

Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models


Speaker recognition

INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances

INTERSPEECH 2016 Fri-O-3-2: Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge


Review