Difference between revisions of "2016"

From cslt Wiki
Line 19:

+ Speaker recognition:

+ [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/1b/RedDots.rar# INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]

Revision as of 01:03, 9 November 2016 (Wednesday)

DNN architecture:

Ying Zhang et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

ICLR 2017: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

LightRNN from Microsoft

Kaiming He et al. Deep Residual Learning for Image Recognition

Wei-Ning Hsu et al. Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition


Visualization:

Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models


Speaker recognition:

INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances


8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)



9.1 Lexical modeling and access: units and models

9.2 Automatic lexicon learning

9.3 Supervised/unsupervised morphological models

9.4 Prosodic features and models for language modeling

9.5 Discriminative training methods for language modeling

9.6 Language model adaptation (domain, diachronic adaptation)

9.7 Language modeling for conversational speech (dialog, interaction)

9.8 Neural networks for language modeling

9.9 Search methods, decoding algorithms, lattices, multipass strategies

9.10 New computational strategies, data-structures for ASR

9.11 Computational resource constrained speech recognition

9.12 Confidence measures

9.13 Cross-lingual and multilingual components for speech recognition

9.14 Structured classification approaches


Speech Recognition - Technologies And Systems For New Applications

10.1 Multimodal systems

10.2 Applications in education and learning (incl. CALL, assessment of fluency)

10.3 Applications in medical practice (CIS, voice assessment, etc.)

10.4 Speech science in end-user applications

10.5 Rich transcription

10.6 Innovative products and services based on speech technologies

10.7 Sparse, template-based representations

10.8 New paradigms (e.g. articulatory models, silent speech interfaces, topic models)

10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications

10.10 Special Session: Realism in robust speech processing

10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing

10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education


Spoken Language Processing - Dialog, Summarization, Understanding

11.1 Spoken dialog systems

11.2 Multimodal human-machine interaction (conversational agents, human-robot)

11.3 Analysis of verbal, co-verbal and nonverbal behavior

11.4 Interactive systems for speech/language training, therapy, communication aids

11.5 Stochastic modeling for dialog

11.6 Question-answering from speech

11.7 Spoken document summarization

11.8 Systems for spoken language understanding

11.9 Topic spotting and classification

11.10 Entity extraction from speech

11.11 Semantic analysis and classification

11.12 Conversation and interaction

11.13 Evaluation of speech and multimodal dialog systems

11.14 Evaluation of summarization and understanding


Spoken Language Processing: Translation, Information Retrieval, Resources

12.1 Spoken machine translation

12.2 Speech-to-speech translation systems

12.3 Transliteration

12.4 Voice search

12.5 Spoken term detection

12.6 Audio indexing

12.7 Spoken document retrieval

12.8 Systems for mining spoken data, search or retrieval of speech documents

12.9 Speech and multimodal resources and annotation

12.10 Metadata descriptions of speech, audio and text resources

12.11 Metadata for semantic or content markup

12.12 Metadata for linguistic/discourse structure (disfluencies, boundaries, speech acts)

12.13 Methodologies and tools for language resource construction and annotation

12.14 Automatic segmentation and labeling of resources

12.15 Multilingual resources

12.16 Evaluation and quality assurance of language resources

12.17 Evaluation of translation and information retrieval systems

12.18 Special Session: Open Data for Under-Resourced Languages


Speech And Spoken-Language Based Multimodal Processing And Systems

13.1 Multimodal Speech Recognition

13.2 Multimodal LVCSR Systems

13.3 Multimodal Speech Analysis

13.4 Multimodal Synthesis

13.5 Multimodal Language Analysis

13.6 Multimodal and multimedia language trait recognition

13.7 Multimodal paralinguistics

13.8 Multimodal interactions, interfaces

13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines



MACHINE LEARNING

Learning Methods

14.1 Supervised Learning

14.2 Unsupervised Learning

14.3 Reinforcement Learning

14.4 Learning Theory


14.5 Generative Models

14.6 Discriminative Models


14.7 Probabilistic Models

14.8 Bayesian Methods

14.9 Gaussian Processes


Deep Learning

15.1 Network Architecture

15.2 Autoencoder

15.3 Representation Learning

15.4 Optimization

15.5 Regularization

15.6 Sparsity

15.7 Transfer Learning

15.8 Sequence Learning

15.9 Online Learning

15.10 Tricks

15.12 Visualization

SPEAKER RECOGNITION

16.1 Deep learning

16.2 Short utterances

16.3 Challenge

Review