Difference between revisions of "2016"

From cslt Wiki
DNN architecture
 
(48 intermediate revisions by 4 users not shown)
Line 1: Line 1:
[1 SPEECH PERCEPTION, PRODUCTION AND ACQUISITION]
+
==DNN architecture==
  
1.1 Models of speech production
+
* [http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1446.pdf Ying Zhang et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks]
1.2 Physiology and neurophysiology of speech production
+
* [[媒体文件:OUTRAGEOUSLYLARGENEURALNETWORKSTHESPARSELY-GATEDMIXTURE-OF-EXPERTSLAYER.pdf|ICLR 2017: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer]]
1.3 Neural basis of speech production
+
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/fb/LightRNN.pdf LightRNN from Microsoft]
1.4 Coarticulation
+
* [https://arxiv.org/pdf/1512.03385v1.pdf Kaiming He et al. Deep Residual Learning for Image Recognition]
1.5 Models of speech perception
+
* [http://www.isca-speech.org/archive/Interspeech_2016/pdfs/0515.pdf Wei-Ning Hsu et al. Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition]
1.6 Physiology and neurophysiology of speech perception
+
* [http://t.cn/RfZHxko MICRO 2016]
1.7 Neural basis of speech perception
+
* [[媒体文件:Cambricon-X.pdf|Cambricon-X: An Accelerator for Sparse Neural Networks]]
1.8 Acoustic and articulatory cues in speech perception
+
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/2/26/REVISE_SATURATED_ACTIVATION_FUNCTIONS.pdf Revise Saturated Activation Functions]
1.9 Interaction speech production-speech perception
+
1.10 Multimodal speech perception
+
1.11 Cognition and brain studies on speech
+
1.12 Multilingual studies
+
1.13 L1 acquisition and bilingual acquisition
+
1.14 L2 acquisition by children and adults
+
1.15 Speech and hearing disorders
+
1.16 Singing voice: production and perception
+
1.17 Speech and other biosignals
+
1.18 Special Session: Intelligibility under the microscope
+
  
[2 PHONETICS, PHONOLOGY, AND PROSODY]
+
==Visualization==
  
2.1 Phonetics and phonology
+
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b9/Visualizing_and_Understanding_Genomic.pdf Visualizing and Understanding Genomic Sequences Using Deep Neural Networks]
2.2 Language descriptions
+
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/4/43/On_the_Role_of_Nonlinear_Transformations_in_Deep_Neural_Network_Acoustic_Models.PDF On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models]
2.3 Linguistic systems
+
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/f6/Understanding_intermediate_layers_using_linear_classifier_probes.pdf Understanding Intermediate Layers Using Linear Classifier Probes]
2.4 Discourse and dialog structures
+
2.5 Acoustic phonetics
+
2.6 Phonation, voice quality
+
2.7 Articulatory and acoustic features of prosody
+
2.8 Perception of prosody
+
2.9 Phonological processes and models
+
2.10 Laboratory phonology
+
2.11 Phonetic universals
+
2.12 Sound changes
+
2.13 Sociophonetics
+
2.14 Phonetics of L1-L2 interaction
+
  
[3 ANALYSIS OF PARALINGUISTICS IN SPEECH AND LANGUAGE]
+
==Speaker recognition==
  
3.1 Analysis of speaker states
+
* [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/1b/RedDots.rar# INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]
3.2 Analysis of speaker traits
+
* [http://192.168.0.51:8888/2016/interspeech2016/WELCOME.html# INTERSPEECH 2016 Fri-O-3-2: Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge]
3.3 Automatic analysis of speaker states and traits
+
3.4 Pathological speech and language
+
3.5 Non-verbal communication
+
3.6 Social and vocal signals
+
3.7 Sentiment analysis and opinion mining
+
3.8 Paralinguistics in singing
+
3.9 Perception of paralinguistic phenomena
+
3.10 Phonetic and linguistic aspects of paralinguistics
+
3.11 Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception & Sincerity
+
3.12 Special Session: Clinical and neuroscience-inspired vocal biomarkers of neurological and psychiatric disorders
+
  
[4 SPEAKER AND LANGUAGE IDENTIFICATION]
 
  
4.1 Language identification and verification
+
==Review==
4.2 Dialect and accent recognition
+
4.3 Speaker verification and identification
+
4.4 Features for speaker and language recognition
+
4.5 Robustness to variable and degraded channels
+
4.6 Speaker confidence estimation
+
4.7 Speaker diarization
+
4.8 Higher-level knowledge in speaker and language recognition
+
4.9 Evaluation of speaker and language identification systems
+
4.10 Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances
+
4.11 Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge
+
  
[5 ANALYSIS OF SPEECH AND AUDIO SIGNALS]
+
* [[媒体文件:Note icassp16.pdf|Zhiyuan Tang 20160520 - ICASSP 2016 summary]]
 
+
* [[媒体文件:Nn analysis.pdf|Zhiyuan Tang 20160802 - Visualizing, Measuring and Understanding Neural Networks: A Brief Survey]]
5.1 Speech acoustics
+
* [[媒体文件:Interspeech16 review.pdf|Zhiyuan Tang 20161122 - INTERSPEECH 2016 summary]]
5.2 Speech analysis and representation
+
5.3 Audio signal analysis and representation
+
5.4 Speech and audio segmentation and classification
+
5.5 Voice activity detection
+
5.6 Pitch and harmonic analysis
+
5.7 Source separation and computational auditory scene analysis
+
5.8 Speaker spatial localization
+
5.9 Voice separation
+
5.10 Music signal processing and understanding
+
5.11 Singing analysis
+
5.12 Special Session: Speech, audio, and language processing techniques applied to bird and animal vocalisations
+
 
+
[6 SPEECH CODING AND ENHANCEMENT]
+
 
+
6.1 Speech coding and transmission
+
6.2 Low-bit-rate speech coding
+
6.3 Perceptual audio coding of speech signals
+
6.4 Noise reduction for speech signals
+
6.5 Speech enhancement: single-channel
+
6.6 Speech enhancement: multi-channel
+
6.7 Speech intelligibility
+
6.8 Active noise control
+
6.9 Speech enhancement in hearing aids
+
6.10 Adaptive beamforming for speech enhancement
+
6.11 Dereverberation for speech signals
+
6.12 Echo cancelation for speech signals
+
6.13 Evaluation of speech transmission, coding and enhancement
+
 
+
[7 SPEECH SYNTHESIS AND SPOKEN LANGUAGE GENERATION]
+
 
+
7.1 Grapheme-to-phoneme conversion for synthesis
+
7.2 Text processing for speech synthesis
+
7.3 Signal processing/statistical models for synthesis
+
7.4 Speech synthesis paradigms and methods
+
7.5 Articulatory speech synthesis
+
7.6 Segment-level and/or concatenative synthesis
+
7.7 Unit selection speech synthesis
+
7.8 Statistical parametric speech synthesis
+
7.9 Prosody modeling and generation
+
7.10 Expression, emotion and personality generation
+
7.11 Synthesis of singing voices
+
7.12 Voice modification, conversion and morphing
+
7.13 Concept-to-speech conversion
+
7.14 Cross-lingual and multilingual aspects in speech synthesis
+
7.15 Avatars and talking faces
+
7.16 Tools and data for speech synthesis
+
7.17 Evaluation of speech synthesis
+
7.18 Special Session: Singing Synthesis Challenge: Fill-In the Gap
+
7.19 Special Session: Voice Conversion Challenge 2016
+
 
+
[8 SPEECH RECOGNITION: SIGNAL PROCESSING, ACOUSTIC MODELING, ROBUSTNESS, ADAPTATION]
+
 
+
8.1 Feature extraction and low-level feature modeling for ASR
+
8.2 Prosodic features and models
+
8.3 Robustness against noise, reverberation
+
8.4 Far field and microphone array speech recognition
+
8.5 Speaker normalization (e.g., VTLN)
+
8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)
+
8.7 Discriminative acoustic training methods for ASR
+
8.8 Acoustic model adaptation (speaker, bandwidth, emotion, accent)
+
8.9 Speaker adaptation, speaker adapted training methods
+
8.10 Pronunciation variants and modeling for speech recognition
+
8.11 Acoustic confidence measures
+
8.13 Cross-lingual and multilingual aspects, non-native accents
+
8.14 Acoustic modeling for conversational speech (dialog, interaction)
+
8.15 Evaluation of speech recognition
+
 
+
[9 SPEECH RECOGNITION - ARCHITECTURE, SEARCH, AND LINGUISTIC COMPONENTS]
+
 
+
9.1 Lexical modeling and access: units and models
+
9.2 Automatic lexicon learning
+
9.3 Supervised/unsupervised morphological models
+
9.4 Prosodic features and models for language modeling
+
9.5 Discriminative training methods for language modeling
+
9.6 Language model adaptation (domain, diachronic adaptation)
+
9.7 Language modeling for conversational speech (dialog, interaction)
+
9.8 Neural networks for language modeling
+
9.9 Search methods, decoding algorithms, lattices, multipass strategies
+
9.10 New computational strategies, data-structures for ASR
+
9.11 Computational resource constrained speech recognition
+
9.12 Confidence measures
+
9.13 Cross-lingual and multilingual components for speech recognition
+
9.14 Structured classification approaches
+
 
+
[10 SPEECH RECOGNITION - TECHNOLOGIES AND SYSTEMS FOR NEW APPLICATIONS]
+
 
+
10.1 Multimodal systems
+
10.2 Applications in education and learning (incl. CALL, assessment of fluency)
+
10.3 Applications in medical practice (CIS, voice assessment, etc.)
+
10.4 Speech science in end-user applications
+
10.5 Rich transcription
+
10.6 Innovative products and services based on speech technologies
+
10.7 Sparse, template-based representations
+
10.8 New paradigms (e.g. artic. models, silent speech interfaces, topic models)
+
10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications
+
10.10 Special Session: Realism in robust speech processing
+
10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing
+
10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education
+
 
+
[11 SPOKEN LANGUAGE PROCESSING - DIALOG, SUMMARIZATION, UNDERSTANDING]
+
 
+
11.1 Spoken dialog systems
+
11.2 Multimodal human-machine interaction (conversat. agents, human-robot)
+
11.3 Analysis of verbal, co-verbal and nonverbal behavior
+
11.4 Interactive systems for speech/language training, therapy, communication aids
+
11.5 Stochastic modeling for dialog
+
11.6 Question-answering from speech
+
11.7 Spoken document summarization
+
11.8 Systems for spoken language understanding
+
11.9 Topic spotting and classification
+
11.10 Entity extraction from speech
+
11.11 Semantic analysis and classification
+
11.12 Conversation and interaction
+
11.13 Evaluation of speech and multimodal dialog systems
+
11.14 Evaluation of summarization and understanding
+
 
+
[12 SPOKEN LANGUAGE PROCESSING: TRANSLATION, INFORMATION RETRIEVAL, RESOURCES]
+
 
+
12.1 Spoken machine translation
+
12.2 Speech-to-speech translation systems
+
12.3 Transliteration
+
12.4 Voice search
+
12.5 Spoken term detection
+
12.6 Audio indexing
+
12.7 Spoken document retrieval
+
12.8 Systems for mining spoken data, search or retrieval of speech documents
+
12.9 Speech and multimodal resources and annotation
+
12.10 Metadata descriptions of speech, audio and text resources
+
12.11 Metadata for semantic or content markup
+
12.12 Metadata for ling./discourse structure (disfluencies, boundaries, speech acts)
+
12.13 Methodologies and tools for language resource construction and annotation
+
12.14 Automatic segmentation and labeling of resources
+
12.15 Multilingual resources
+
12.16 Evaluation and quality assurance of language resources
+
12.17 Evaluation of translation and information retrieval systems
+
12.18 Special Session: Open Data for Under-Resourced Languages
+
 
+
[13 SPEECH AND SPOKEN-LANGUAGE BASED MULTIMODAL PROCESSING AND SYSTEMS]
+
 
+
13.1 Multimodal Speech Recognition
+
13.2 Multimodal LVCSR Systems
+
13.3 Multimodal Speech Analysis
+
13.4 Multimodal Synthesis
+
13.5 Multimodal Language Analysis
+
13.6 Multimodal and multimedia language trait recognition
+
13.7 Multimodal paralinguistics
+
13.8 Multimodal interactions, interfaces
+
13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines
+

Latest revision as of 08:17, 1 December 2016 (Thursday)

DNN architecture

Visualization

Speaker recognition


Review