(49 intermediate revisions by 4 users not shown) |
Line 1: |
Line 1: |
− | [1 SPEECH PERCEPTION, PRODUCTION AND ACQUISITION]
| + | ==DNN architecture== |
| | | |
− | [[1.1 Models of speech production]] | + | * [http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1446.pdf Ying Zhang et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks] |
− | 1.2 Physiology and neurophysiology of speech production
| + | * [[媒体文件:OUTRAGEOUSLYLARGENEURALNETWORKSTHESPARSELY-GATEDMIXTURE-OF-EXPERTSLAYER.pdf|ICLR2017: OUTRAGEOUSLY LARGE NEURAL NETWORKS: THE SPARSELY-GATED MIXTURE-OF-EXPERTS LAYER]] |
− | 1.3 Neural basis of speech production
| + | * [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/fb/LightRNN.pdf LightRNN from Microsoft] |
− | 1.4 Coarticulation
| + | * [https://arxiv.org/pdf/1512.03385v1.pdf Kaiming He et al. Deep Residual Learning for Image Recognition] |
− | 1.5 Models of speech perception
| + | * [http://www.isca-speech.org/archive/Interspeech_2016/pdfs/0515.pdf Wei-Ning Hsu et al. Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition] |
− | 1.6 Physiology and neurophysiology of speech perception
| + | * [http://t.cn/RfZHxko MICRO 2016 ] |
− | 1.7 Neural basis of speech perception
| + | * [[媒体文件:Cambricon-X.pdf| Cambricon-X: An Accelerator for Sparse Neural Networks]] |
− | 1.8 Acoustic and articulatory cues in speech perception
| + | * [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/2/26/REVISE_SATURATED_ACTIVATION_FUNCTIONS.pdf Revise Saturated Activation Functions] |
− | 1.9 Interaction speech production-speech perception
| + | |
− | 1.10 Multimodal speech perception
| + | |
− | 1.11 Cognition and brain studies on speech
| + | |
− | 1.12 Multilingual studies
| + | |
− | 1.13 L1 acquisition and bilingual acquisition
| + | |
− | 1.14 L2 acquisition by children and adults
| + | |
− | 1.15 Speech and hearing disorders
| + | |
− | 1.16 Singing voice: production and perception
| + | |
− | 1.17 Speech and other biosignals
| + | |
− | 1.18 Special Session: Intelligibility under the microscope
| + | |
| | | |
− | [2 PHONETICS, PHONOLOGY, AND PROSODY]
| + | ==Visualization== |
| | | |
− | 2.1 Phonetics and phonology
| + | * [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b9/Visualizing_and_Understanding_Genomic.pdf Visualizing and Understanding Genomic Sequences Using Deep Neural Networks] |
− | 2.2 Language descriptions
| + | * [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/4/43/On_the_Role_of_Nonlinear_Transformations_in_Deep_Neural_Network_Acoustic_Models.PDF On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models] |
− | 2.3 Linguistic systems
| + | * [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/f6/Understanding_intermediate_layers_using_linear_classifier_probes.pdf Understanding intermediate layers using linear classifier probes] |
− | 2.4 Discourse and dialog structures
| + | |
− | 2.5 Acoustic phonetics
| + | |
− | 2.6 Phonation, voice quality
| + | |
− | 2.7 Articulatory and acoustic features of prosody
| + | |
− | 2.8 Perception of prosody
| + | |
− | 2.9 Phonological processes and models
| + | |
− | 2.10 Laboratory phonology
| + | |
− | 2.11 Phonetic universals
| + | |
− | 2.12 Sound changes
| + | |
− | 2.13 Sociophonetics
| + | |
− | 2.14 Phonetics of L1-L2 interaction
| + | |
| | | |
− | [3 ANALYSIS OF PARALINGUISTICS IN SPEECH AND LANGUAGE]
| + | ==Speaker recognition== |
| | | |
− | 3.1 Analysis of speaker states
| + | * [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/1b/RedDots.rar# INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances] |
− | 3.2 Analysis of speaker traits
| + | * [http://192.168.0.51:8888/2016/interspeech2016/WELCOME.html# INTERSPEECH 2016 Fri-O-3-2: Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge] |
− | 3.3 Automatic analysis of speaker states and traits
| + | |
− | 3.4 Pathological speech and language
| + | |
− | 3.5 Non-verbal communication
| + | |
− | 3.6 Social and vocal signals
| + | |
− | 3.7 Sentiment analysis and opinion mining
| + | |
− | 3.8 Paralinguistics in singing
| + | |
− | 3.9 Perception of paralinguistic phenomena
| + | |
− | 3.10 Phonetic and linguistic aspects of paralinguistics | + | |
− | 3.11 Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception & Sincerity
| + | |
− | 3.12 Special Session: Clinical and neuroscience-inspired vocal biomarkers of neurological and psychiatric disorders
| + | |
| | | |
− | [4 SPEAKER AND LANGUAGE IDENTIFICATION]
| |
| | | |
− | 4.1 Language identification and verification
| + | ==Review== |
− | 4.2 Dialect and accent recognition
| + | |
− | 4.3 Speaker verification and identification
| + | |
− | 4.4 Features for speaker and language recognition
| + | |
− | 4.5 Robustness to variable and degraded channels
| + | |
− | 4.6 Speaker confidence estimation
| + | |
− | 4.7 Speaker diarization
| + | |
− | 4.8 Higher-level knowledge in speaker and language recognition
| + | |
− | 4.9 Evaluation of speaker and language identification systems
| + | |
− | 4.10 Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances
| + | |
− | 4.11 Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge
| + | |
| | | |
− | [5 ANALYSIS OF SPEECH AND AUDIO SIGNALS] | + | *[[媒体文件:Note icassp16.pdf|Zhiyuan Tang 20160520 - ICASSP 2016 summary ]] |
− | | + | *[[媒体文件:Nn analysis.pdf |Zhiyuan Tang 20160802 - Visualizing, Measuring and Understanding Neural Networks: A Brief Survey ]] |
− | 5.1 Speech acoustics
| + | *[[媒体文件:Interspeech16 review.pdf|Zhiyuan Tang 20161122 - INTERSPEECH 2016 summary ]] |
− | 5.2 Speech analysis and representation
| + | |
− | 5.3 Audio signal analysis and representation
| + | |
− | 5.4 Speech and audio segmentation and classification
| + | |
− | 5.5 Voice activity detection
| + | |
− | 5.6 Pitch and harmonic analysis
| + | |
− | 5.7 Source separation and computational auditory scene analysis
| + | |
− | 5.8 Speaker spatial localization
| + | |
− | 5.9 Voice separation
| + | |
− | 5.10 Music signal processing and understanding
| + | |
− | 5.11 Singing analysis
| + | |
− | 5.12 Special Session: Speech, audio, and language processing techniques applied to bird and animal vocalisations
| + | |
− | | + | |
− | [6 SPEECH CODING AND ENHANCEMENT] | + | |
− | | + | |
− | 6.1 Speech coding and transmission
| + | |
− | 6.2 Low-bit-rate speech coding
| + | |
− | 6.3 Perceptual audio coding of speech signals
| + | |
− | 6.4 Noise reduction for speech signals
| + | |
− | 6.5 Speech enhancement: single-channel
| + | |
− | 6.6 Speech enhancement: multi-channel
| + | |
− | 6.7 Speech intelligibility
| + | |
− | 6.8 Active noise control
| + | |
− | 6.9 Speech enhancement in hearing aids
| + | |
− | 6.10 Adaptive beamforming for speech enhancement
| + | |
− | 6.11 Dereverberation for speech signals
| + | |
− | 6.12 Echo cancelation for speech signals
| + | |
− | 6.13 Evaluation of speech transmission, coding and enhancement
| + | |
− | | + | |
− | [7 SPEECH SYNTHESIS AND SPOKEN LANGUAGE GENERATION]
| + | |
− | | + | |
− | 7.1 Grapheme-to-phoneme conversion for synthesis
| + | |
− | 7.2 Text processing for speech synthesis
| + | |
− | 7.3 Signal processing/statistical models for synthesis
| + | |
− | 7.4 Speech synthesis paradigms and methods
| + | |
− | 7.5 Articulatory speech synthesis
| + | |
− | 7.6 Segment-level and/or concatenative synthesis
| + | |
− | 7.7 Unit selection speech synthesis
| + | |
− | 7.8 Statistical parametric speech synthesis
| + | |
− | 7.9 Prosody modeling and generation
| + | |
− | 7.10 Expression, emotion and personality generation
| + | |
− | 7.11 Synthesis of singing voices
| + | |
− | 7.12 Voice modification, conversion and morphing
| + | |
− | 7.13 Concept-to-speech conversion
| + | |
− | 7.14 Cross-lingual and multilingual aspects in speech synthesis
| + | |
− | 7.15 Avatars and talking faces
| + | |
− | 7.16 Tools and data for speech synthesis
| + | |
− | 7.17 Evaluation of speech synthesis
| + | |
− | 7.18 Special Session: Singing Synthesis Challenge: Fill-In the Gap
| + | |
− | 7.19 Special Session: Voice Conversion Challenge 2016
| + | |
− | | + | |
− | [8 SPEECH RECOGNITION: SIGNAL PROCESSING, ACOUSTIC MODELING, ROBUSTNESS, ADAPTATION]
| + | |
− | | + | |
− | 8.1 Feature extraction and low-level feature modeling for ASR
| + | |
− | 8.2 Prosodic features and models
| + | |
− | 8.3 Robustness against noise, reverberation
| + | |
− | 8.4 Far field and microphone array speech recognition
| + | |
− | 8.5 Speaker normalization (e.g., VTLN)
| + | |
− | 8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)
| + | |
− | 8.7 Discriminative acoustic training methods for ASR
| + | |
− | 8.8 Acoustic model adaptation (speaker, bandwidth, emotion, accent)
| + | |
− | 8.9 Speaker adaptation, speaker adapted training methods
| + | |
− | 8.10 Pronunciation variants and modeling for speech recognition
| + | |
− | 8.11 Acoustic confidence measures
| + | |
− | 8.13 Cross-lingual and multilingual aspects, non-native accents
| + | |
− | 8.14 Acoustic modeling for conversational speech (dialog, interaction)
| + | |
− | 8.15 Evaluation of speech recognition
| + | |
− | | + | |
− | [9 SPEECH RECOGNITION - ARCHITECTURE, SEARCH, AND LINGUISTIC COMPONENTS] | + | |
− | | + | |
− | 9.1 Lexical modeling and access: units and models
| + | |
− | 9.2 Automatic lexicon learning
| + | |
− | 9.3 Supervised/unsupervised morphological models
| + | |
− | 9.4 Prosodic features and models for language modeling
| + | |
− | 9.5 Discriminative training methods for language modeling
| + | |
− | 9.6 Language model adaptation (domain, diachronic adaptation)
| + | |
− | 9.7 Language modeling for conversational speech (dialog, interaction)
| + | |
− | 9.8 Neural networks for language modeling
| + | |
− | 9.9 Search methods, decoding algorithms, lattices, multipass strategies
| + | |
− | 9.10 New computational strategies, data-structures for ASR
| + | |
− | 9.11 Computational resource constrained speech recognition
| + | |
− | 9.12 Confidence measures
| + | |
− | 9.13 Cross-lingual and multilingual components for speech recognition
| + | |
− | 9.14 Structured classification approaches
| + | |
− | | + | |
− | [10 SPEECH RECOGNITION - TECHNOLOGIES AND SYSTEMS FOR NEW APPLICATIONS]
| + | |
− | | + | |
− | 10.1 Multimodal systems
| + | |
− | 10.2 Applications in education and learning (incl. CALL, assessment of fluency)
| + | |
− | 10.3 Applications in medical practice (CIS, voice assessment, etc.)
| + | |
− | 10.4 Speech science in end-user applications
| + | |
− | 10.5 Rich transcription
| + | |
− | 10.6 Innovative products and services based on speech technologies
| + | |
− | 10.7 Sparse, template-based representations
| + | |
− | 10.8 New paradigms (e.g. artic. models, silent speech interfaces, topic models)
| + | |
− | 10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications
| + | |
− | 10.10 Special Session: Realism in robust speech processing
| + | |
− | 10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing
| + | |
− | 10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education
| + | |
− | | + | |
− | [11 SPOKEN LANGUAGE PROCESSING - DIALOG, SUMMARIZATION, UNDERSTANDING]
| + | |
− | | + | |
− | 11.1 Spoken dialog systems
| + | |
− | 11.2 Multimodal human-machine interaction (conversat. agents, human-robot)
| + | |
− | 11.3 Analysis of verbal, co-verbal and nonverbal behavior
| + | |
− | 11.4 Interactive systems for speech/language training, therapy, communication aids
| + | |
− | 11.5 Stochastic modeling for dialog
| + | |
− | 11.6 Question-answering from speech
| + | |
− | 11.7 Spoken document summarization
| + | |
− | 11.8 Systems for spoken language understanding
| + | |
− | 11.9 Topic spotting and classification
| + | |
− | 11.10 Entity extraction from speech
| + | |
− | 11.11 Semantic analysis and classification
| + | |
− | 11.12 Conversation and interaction
| + | |
− | 11.13 Evaluation of speech and multimodal dialog systems
| + | |
− | 11.14 Evaluation of summarization and understanding
| + | |
− | | + | |
− | [12 SPOKEN LANGUAGE PROCESSING: TRANSLATION, INFORMATION RETRIEVAL, RESOURCES] | + | |
− | | + | |
− | 12.1 Spoken machine translation
| + | |
− | 12.2 Speech-to-speech translation systems
| + | |
− | 12.3 Transliteration
| + | |
− | 12.4 Voice search
| + | |
− | 12.5 Spoken term detection
| + | |
− | 12.6 Audio indexing
| + | |
− | 12.7 Spoken document retrieval
| + | |
− | 12.8 Systems for mining spoken data, search or retrieval of speech documents
| + | |
− | 12.9 Speech and multimodal resources and annotation
| + | |
− | 12.10 Metadata descriptions of speech, audio and text resources
| + | |
− | 12.11 Metadata for semantic or content markup
| + | |
− | 12.12 Metadata for ling./discourse structure (disfluencies, boundaries, speech acts)
| + | |
− | 12.13 Methodologies and tools for language resource construction and annotation
| + | |
− | 12.14 Automatic segmentation and labeling of resources
| + | |
− | 12.15 Multilingual resources
| + | |
− | 12.16 Evaluation and quality insurance of language resources
| + | |
− | 12.17 Evaluation of translation and information retrieval systems
| + | |
− | 12.18 Special Session: Open Data for Under-Resourced Languages
| + | |
− | | + | |
− | [13 SPEECH AND SPOKEN-LANGUAGE BASED MULTIMODAL PROCESSING AND SYSTEMS]
| + | |
− | | + | |
− | 13.1 Multimodal Speech Recognition
| + | |
− | 13.2 Multimodal LVCSR Systems
| + | |
− | 13.3 Multimodal Speech Analysis
| + | |
− | 13.4 Multimodal Synthesis
| + | |
− | 13.5 Multimodal Language Analysis
| + | |
− | 13.6 Multimodal and multimedia language trait recognition
| + | |
− | 13.7 Multimodal paralinguistics
| + | |
− | 13.8 Multimodal interactions, interfaces
| + | |
− | 13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines
| + | |