Difference between revisions of “2016”
Line 1: | Line 1: | ||
− | [1 SPEECH PERCEPTION, PRODUCTION AND ACQUISITION] | + | [[1 SPEECH PERCEPTION, PRODUCTION AND ACQUISITION]] |
− | + | [[]] | |
− | 1.1 Models of speech production | + | [[1.1 Models of speech production]] |
− | 1.2 Physiology and neurophysiology of speech production | + | [[1.2 Physiology and neurophysiology of speech production]] |
− | 1.3 Neural basis of speech production | + | [[1.3 Neural basis of speech production]] |
− | 1.4 Coarticulation | + | [[1.4 Coarticulation]] |
− | 1.5 Models of speech perception | + | [[1.5 Models of speech perception]] |
− | 1.6 Physiology and neurophysiology of speech perception | + | [[1.6 Physiology and neurophysiology of speech perception]] |
− | 1.7 Neural basis of speech perception | + | [[1.7 Neural basis of speech perception]] |
− | 1.8 Acoustic and articulatory cues in speech perception | + | [[1.8 Acoustic and articulatory cues in speech perception]] |
− | 1.9 Interaction speech production-speech perception | + | [[1.9 Interaction speech production-speech perception]] |
− | 1.10 Multimodal speech perception | + | [[1.10 Multimodal speech perception]] |
− | 1.11 Cognition and brain studies on speech | + | [[1.11 Cognition and brain studies on speech]] |
− | 1.12 Multilingual studies | + | [[1.12 Multilingual studies]] |
− | 1.13 L1 acquisition and bilingual acquisition | + | [[1.13 L1 acquisition and bilingual acquisition]] |
− | 1.14 L2 acquisition by children and adults | + | [[1.14 L2 acquisition by children and adults]] |
− | 1.15 Speech and hearing disorders | + | [[1.15 Speech and hearing disorders]] |
− | 1.16 Singing voice: production and perception | + | [[1.16 Singing voice: production and perception]] |
− | 1.17 Speech and other biosignals | + | [[1.17 Speech and other biosignals]] |
− | 1.18 Special Session: Intelligibility under the microscope | + | [[1.18 Special Session: Intelligibility under the microscope]] |
− | + | [[]] | |
− | [2 PHONETICS, PHONOLOGY, AND PROSODY] | + | [[2 PHONETICS, PHONOLOGY, AND PROSODY]] |
− | + | [[]] | |
− | 2.1 Phonetics and phonology | + | [[2.1 Phonetics and phonology]] |
− | 2.2 Language descriptions | + | [[2.2 Language descriptions]] |
− | 2.3 Linguistic systems | + | [[2.3 Linguistic systems]] |
− | 2.4 Discourse and dialog structures | + | [[2.4 Discourse and dialog structures]] |
− | 2.5 Acoustic phonetics | + | [[2.5 Acoustic phonetics]] |
− | 2.6 Phonation, voice quality | + | [[2.6 Phonation, voice quality]] |
− | 2.7 Articulatory and acoustic features of prosody | + | [[2.7 Articulatory and acoustic features of prosody]] |
− | 2.8 Perception of prosody | + | [[2.8 Perception of prosody]] |
− | 2.9 Phonological processes and models | + | [[2.9 Phonological processes and models]] |
− | 2.10 Laboratory phonology | + | [[2.10 Laboratory phonology]] |
− | 2.11 Phonetic universals | + | [[2.11 Phonetic universals]] |
− | 2.12 Sound changes | + | [[2.12 Sound changes]] |
− | 2.13 Sociophonetics | + | [[2.13 Sociophonetics]] |
− | 2.14 Phonetics of L1-L2 interaction | + | [[2.14 Phonetics of L1-L2 interaction]] |
− | + | [[]] | |
− | [3 ANALYSIS OF PARALINGUISTICS IN SPEECH AND LANGUAGE] | + | [[3 ANALYSIS OF PARALINGUISTICS IN SPEECH AND LANGUAGE]] |
− | + | [[]] | |
− | 3.1 Analysis of speaker states | + | [[3.1 Analysis of speaker states]] |
− | 3.2 Analysis of speaker traits | + | [[3.2 Analysis of speaker traits]] |
− | 3.3 Automatic analysis of speaker states and traits | + | [[3.3 Automatic analysis of speaker states and traits]] |
− | 3.4 Pathological speech and language | + | [[3.4 Pathological speech and language]] |
− | 3.5 Non-verbal communication | + | [[3.5 Non-verbal communication]] |
− | 3.6 Social and vocal signals | + | [[3.6 Social and vocal signals]] |
− | 3.7 Sentiment analysis and opinion mining | + | [[3.7 Sentiment analysis and opinion mining]] |
− | 3.8 Paralinguistics in singing | + | [[3.8 Paralinguistics in singing]] |
− | 3.9 Perception of paralinguistic phenomena | + | [[3.9 Perception of paralinguistic phenomena]] |
− | 3.10 Phonetic and linguistic aspects of paralinguistics | + | [[3.10 Phonetic and linguistic aspects of paralinguistics]] |
− | 3.11 Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception & Sincerity | + | [[3.11 Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception & Sincerity]] |
− | 3.12 Special Session: Clinical and neuroscience-inspired vocal biomarkers of neurological and psychiatric disorders | + | [[3.12 Special Session: Clinical and neuroscience-inspired vocal biomarkers of neurological and psychiatric disorders]] |
− | + | [[]] | |
− | [4 SPEAKER AND LANGUAGE IDENTIFICATION] | + | [[4 SPEAKER AND LANGUAGE IDENTIFICATION]] |
− | + | [[]] | |
− | 4.1 Language identification and verification | + | [[4.1 Language identification and verification]] |
− | 4.2 Dialect and accent recognition | + | [[4.2 Dialect and accent recognition]] |
− | 4.3 Speaker verification and identification | + | [[4.3 Speaker verification and identification]] |
− | 4.4 Features for speaker and language recognition | + | [[4.4 Features for speaker and language recognition]] |
− | 4.5 Robustness to variable and degraded channels | + | [[4.5 Robustness to variable and degraded channels]] |
− | 4.6 Speaker confidence estimation | + | [[4.6 Speaker confidence estimation]] |
− | 4.7 Speaker diarization | + | [[4.7 Speaker diarization]] |
− | 4.8 Higher-level knowledge in speaker and language recognition | + | [[4.8 Higher-level knowledge in speaker and language recognition]] |
− | 4.9 Evaluation of speaker and language identification systems | + | [[4.9 Evaluation of speaker and language identification systems]] |
− | 4.10 Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances | + | [[4.10 Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]] |
− | 4.11 Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge | + | [[4.11 Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge]] |
− | + | [[]] | |
− | [5 ANALYSIS OF SPEECH AND AUDIO SIGNALS] | + | [[5 ANALYSIS OF SPEECH AND AUDIO SIGNALS]] |
− | + | [[]] | |
− | 5.1 Speech acoustics | + | [[5.1 Speech acoustics]] |
− | 5.2 Speech analysis and representation | + | [[5.2 Speech analysis and representation]] |
− | 5.3 Audio signal analysis and representation | + | [[5.3 Audio signal analysis and representation]] |
− | 5.4 Speech and audio segmentation and classification | + | [[5.4 Speech and audio segmentation and classification]] |
− | 5.5 Voice activity detection | + | [[5.5 Voice activity detection]] |
− | 5.6 Pitch and harmonic analysis | + | [[5.6 Pitch and harmonic analysis]] |
− | 5.7 Source separation and computational auditory scene analysis | + | [[5.7 Source separation and computational auditory scene analysis]] |
− | 5.8 Speaker spatial localization | + | [[5.8 Speaker spatial localization]] |
− | 5.9 Voice separation | + | [[5.9 Voice separation]] |
− | 5.10 Music signal processing and understanding | + | [[5.10 Music signal processing and understanding]] |
− | 5.11 Singing analysis | + | [[5.11 Singing analysis]] |
− | 5.12 Special Session: Speech, audio, and language processing techniques applied to bird and animal vocalisations | + | [[5.12 Special Session: Speech, audio, and language processing techniques applied to bird and animal vocalisations ]] |
− | + | [[]] | |
− | [6 SPEECH CODING AND ENHANCEMENT] | + | [[6 SPEECH CODING AND ENHANCEMENT]] |
− | + | [[]] | |
− | 6.1 Speech coding and transmission | + | [[6.1 Speech coding and transmission]] |
− | 6.2 Low-bit-rate speech coding | + | [[6.2 Low-bit-rate speech coding]] |
− | 6.3 Perceptual audio coding of speech signals | + | [[6.3 Perceptual audio coding of speech signals]] |
− | 6.4 Noise reduction for speech signals | + | [[6.4 Noise reduction for speech signals]] |
− | 6.5 Speech enhancement: single-channel | + | [[6.5 Speech enhancement: single-channel]] |
− | 6.6 Speech enhancement: multi-channel | + | [[6.6 Speech enhancement: multi-channel]] |
− | 6.7 Speech intelligibility | + | [[6.7 Speech intelligibility]] |
− | 6.8 Active noise control | + | [[6.8 Active noise control]] |
− | 6.9 Speech enhancement in hearing aids | + | [[6.9 Speech enhancement in hearing aids]] |
− | 6.10 Adaptive beamforming for speech enhancement | + | [[6.10 Adaptive beamforming for speech enhancement]] |
− | 6.11 Dereverberation for speech signals | + | [[6.11 Dereverberation for speech signals]] |
− | 6.12 Echo cancelation for speech signals | + | [[6.12 Echo cancelation for speech signals]] |
− | 6.13 Evaluation of speech transmission, coding and enhancement | + | [[6.13 Evaluation of speech transmission, coding and enhancement]] |
− | + | [[]] | |
− | [7 SPEECH SYNTHESIS AND SPOKEN LANGUAGE GENERATION] | + | [[7 SPEECH SYNTHESIS AND SPOKEN LANGUAGE GENERATION]] |
− | + | [[]] | |
− | 7.1 Grapheme-to-phoneme conversion for synthesis | + | [[7.1 Grapheme-to-phoneme conversion for synthesis]] |
− | 7.2 Text processing for speech synthesis | + | [[7.2 Text processing for speech synthesis]] |
− | 7.3 Signal processing/statistical models for synthesis | + | [[7.3 Signal processing/statistical models for synthesis]] |
− | 7.4 Speech synthesis paradigms and methods | + | [[7.4 Speech synthesis paradigms and methods]] |
− | 7.5 Articulatory speech synthesis | + | [[7.5 Articulatory speech synthesis]] |
− | 7.6 Segment-level and/or concatenative synthesis | + | [[7.6 Segment-level and/or concatenative synthesis]] |
− | 7.7 Unit selection speech synthesis | + | [[7.7 Unit selection speech synthesis]] |
− | 7.8 Statistical parametric speech synthesis | + | [[7.8 Statistical parametric speech synthesis]] |
− | 7.9 Prosody modeling and generation | + | [[7.9 Prosody modeling and generation]] |
− | 7.10 Expression, emotion and personality generation | + | [[7.10 Expression, emotion and personality generation]] |
− | 7.11 Synthesis of singing voices | + | [[7.11 Synthesis of singing voices]] |
− | 7.12 Voice modification, conversion and morphing | + | [[7.12 Voice modification, conversion and morphing]] |
− | 7.13 Concept-to-speech conversion | + | [[7.13 Concept-to-speech conversion]] |
− | 7.14 Cross-lingual and multilingual aspects in speech synthesis | + | [[7.14 Cross-lingual and multilingual aspects in speech synthesis]] |
− | 7.15 Avatars and talking faces | + | [[7.15 Avatars and talking faces]] |
− | 7.16 Tools and data for speech synthesis | + | [[7.16 Tools and data for speech synthesis]] |
− | 7.17 Evaluation of speech synthesis | + | [[7.17 Evaluation of speech synthesis]] |
− | 7.18 Special Session: Singing Synthesis Challenge: Fill-In the Gap | + | [[7.18 Special Session: Singing Synthesis Challenge: Fill-In the Gap]] |
− | 7.19 Special Session: Voice Conversion Challenge 2016 | + | [[7.19 Special Session: Voice Conversion Challenge 2016]] |
− | + | [[]] | |
− | [8 SPEECH RECOGNITION: SIGNAL PROCESSING, ACOUSTIC MODELING, ROBUSTNESS, ADAPTATION] | + | [[8 SPEECH RECOGNITION: SIGNAL PROCESSING, ACOUSTIC MODELING, ROBUSTNESS, ADAPTATION]] |
− | + | [[]] | |
− | 8.1 Feature extraction and low-level feature modeling for ASR | + | [[8.1 Feature extraction and low-level feature modeling for ASR]] |
− | 8.2 Prosodic features and models | + | [[8.2 Prosodic features and models]] |
− | 8.3 Robustness against noise, reverberation | + | [[8.3 Robustness against noise, reverberation]] |
− | 8.4 Far field and microphone array speech recognition | + | [[8.4 Far field and microphone array speech recognition]] |
− | 8.5 Speaker normalization (e.g., VTLN) | + | [[8.5 Speaker normalization (e.g., VTLN)]] |
− | 8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN) | + | [[8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)]] |
− | 8.7 Discriminative acoustic training methods for ASR | + | [[8.7 Discriminative acoustic training methods for ASR]] |
− | 8.8 Acoustic model adaptation (speaker, bandwidth, emotion, accent) | + | [[8.8 Acoustic model adaptation (speaker, bandwidth, emotion, accent)]] |
− | 8.9 Speaker adaptation, speaker adapted training methods | + | [[8.9 Speaker adaptation, speaker adapted training methods]] |
− | 8.10 Pronunciation variants and modeling for speech recognition | + | [[8.10 Pronunciation variants and modeling for speech recognition]] |
− | 8.11 Acoustic confidence measures | + | [[8.11 Acoustic confidence measures]] |
− | 8.13 Cross-lingual and multilingual aspects, non-native accents | + | [[8.13 Cross-lingual and multilingual aspects, non-native accents]] |
− | 8.14 Acoustic modeling for conversational speech (dialog, interaction) | + | [[8.14 Acoustic modeling for conversational speech (dialog, interaction)]] |
− | 8.15 Evaluation of speech recognition | + | [[8.15 Evaluation of speech recognition]] |
− | + | [[]] | |
− | [9 SPEECH RECOGNITION - ARCHITECTURE, SEARCH, AND LINGUISTIC COMPONENTS] | + | [[9 SPEECH RECOGNITION - ARCHITECTURE, SEARCH, AND LINGUISTIC COMPONENTS]] |
− | + | [[]] | |
− | 9.1 Lexical modeling and access: units and models | + | [[9.1 Lexical modeling and access: units and models]] |
− | 9.2 Automatic lexicon learning | + | [[9.2 Automatic lexicon learning]] |
− | 9.3 Supervised/unsupervised morphological models | + | [[9.3 Supervised/unsupervised morphological models]] |
− | 9.4 Prosodic features and models for language modeling | + | [[9.4 Prosodic features and models for language modeling]] |
− | 9.5 Discriminative training methods for language modeling | + | [[9.5 Discriminative training methods for language modeling]] |
− | 9.6 Language model adaptation (domain, diachronic adaptation) | + | [[9.6 Language model adaptation (domain, diachronic adaptation)]] |
− | 9.7 Language modeling for conversational speech (dialog, interaction) | + | [[9.7 Language modeling for conversational speech (dialog, interaction)]] |
− | 9.8 Neural networks for language modeling | + | [[9.8 Neural networks for language modeling]] |
− | 9.9 Search methods, decoding algorithms, lattices, multipass strategies | + | [[9.9 Search methods, decoding algorithms, lattices, multipass strategies]] |
− | 9.10 New computational strategies, data-structures for ASR | + | [[9.10 New computational strategies, data-structures for ASR]] |
− | 9.11 Computational resource constrained speech recognition | + | [[9.11 Computational resource constrained speech recognition]] |
− | 9.12 Confidence measures | + | [[9.12 Confidence measures]] |
− | 9.13 Cross-lingual and multilingual components for speech recognition | + | [[9.13 Cross-lingual and multilingual components for speech recognition]] |
− | 9.14 Structured classification approaches | + | [[9.14 Structured classification approaches]] |
− | + | [[]] | |
− | [10 SPEECH RECOGNITION - TECHNOLOGIES AND SYSTEMS FOR NEW APPLICATIONS] | + | [[10 SPEECH RECOGNITION - TECHNOLOGIES AND SYSTEMS FOR NEW APPLICATIONS]] |
− | + | [[]] | |
− | 10.1 Multimodal systems | + | [[10.1 Multimodal systems]] |
− | 10.2 Applications in education and learning (incl. CALL, assessment of fluency) | + | [[10.2 Applications in education and learning (incl. CALL, assessment of fluency)]] |
− | 10.3 Applications in medical practice (CIS, voice assessment, etc.) | + | [[10.3 Applications in medical practice (CIS, voice assessment, etc.)]] |
− | 10.4 Speech science in end-user applications | + | [[10.4 Speech science in end-user applications]] |
− | 10.5 Rich transcription | + | [[10.5 Rich transcription]] |
− | 10.6 Innovative products and services based on speech technologies | + | [[10.6 Innovative products and services based on speech technologies]] |
− | 10.7 Sparse, template-based representations | + | [[10.7 Sparse, template-based representations]] |
− | 10.8 New paradigms (e.g. artic. models, silent speech interfaces, topic models) | + | [[10.8 New paradigms (e.g. artic. models, silent speech interfaces, topic models)]] |
− | 10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications | + | [[10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications]] |
− | 10.10 Special Session: Realism in robust speech processing | + | [[10.10 Special Session: Realism in robust speech processing ]] |
− | 10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing | + | [[10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing]] |
− | 10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education | + | [[10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education]] |
− | + | [[]] | |
− | [11 SPOKEN LANGUAGE PROCESSING - DIALOG, SUMMARIZATION, UNDERSTANDING] | + | [[11 SPOKEN LANGUAGE PROCESSING - DIALOG, SUMMARIZATION, UNDERSTANDING]] |
− | + | [[]] | |
− | 11.1 Spoken dialog systems | + | [[11.1 Spoken dialog systems]] |
− | 11.2 Multimodal human-machine interaction (conversat. agents, human-robot) | + | [[11.2 Multimodal human-machine interaction (conversat. agents, human-robot)]] |
− | 11.3 Analysis of verbal, co-verbal and nonverbal behavior | + | [[11.3 Analysis of verbal, co-verbal and nonverbal behavior]] |
− | 11.4 Interactive systems for speech/language training, therapy, communication aids | + | [[11.4 Interactive systems for speech/language training, therapy, communication aids]] |
− | 11.5 Stochastic modeling for dialog | + | [[11.5 Stochastic modeling for dialog]] |
− | 11.6 Question-answering from speech | + | [[11.6 Question-answering from speech]] |
− | 11.7 Spoken document summarization | + | [[11.7 Spoken document summarization]] |
− | 11.8 Systems for spoken language understanding | + | [[11.8 Systems for spoken language understanding]] |
− | 11.9 Topic spotting and classification | + | [[11.9 Topic spotting and classification]] |
− | 11.10 Entity extraction from speech | + | [[11.10 Entity extraction from speech]] |
− | 11.11 Semantic analysis and classification | + | [[11.11 Semantic analysis and classification]] |
− | 11.12 Conversation and interaction | + | [[11.12 Conversation and interaction]] |
− | 11.13 Evaluation of speech and multimodal dialog systems | + | [[11.13 Evaluation of speech and multimodal dialog systems]] |
− | 11.14 Evaluation of summarization and understanding | + | [[11.14 Evaluation of summarization and understanding]] |
− | + | [[]] | |
− | [12 SPOKEN LANGUAGE PROCESSING: TRANSLATION, INFORMATION RETRIEVAL, RESOURCES] | + | [[12 SPOKEN LANGUAGE PROCESSING: TRANSLATION, INFORMATION RETRIEVAL, RESOURCES]] |
− | + | [[]] | |
− | 12.1 Spoken machine translation | + | [[12.1 Spoken machine translation]] |
− | 12.2 Speech-to-speech translation systems | + | [[12.2 Speech-to-speech translation systems]] |
− | 12.3 Transliteration | + | [[12.3 Transliteration]] |
− | 12.4 Voice search | + | [[12.4 Voice search]] |
− | 12.5 Spoken term detection | + | [[12.5 Spoken term detection]] |
− | 12.6 Audio indexing | + | [[12.6 Audio indexing]] |
− | 12.7 Spoken document retrieval | + | [[12.7 Spoken document retrieval]] |
− | 12.8 Systems for mining spoken data, search or retrieval of speech documents | + | [[12.8 Systems for mining spoken data, search or retrieval of speech documents]] |
− | 12.9 Speech and multimodal resources and annotation | + | [[12.9 Speech and multimodal resources and annotation]] |
− | 12.10 Metadata descriptions of speech, audio and text resources | + | [[12.10 Metadata descriptions of speech, audio and text resources]] |
− | 12.11 Metadata for semantic or content markup | + | [[12.11 Metadata for semantic or content markup]] |
− | 12.12 Metadata for ling./discourse structure (disfluencies, boundaries, speech acts) | + | [[12.12 Metadata for ling./discourse structure (disfluencies, boundaries, speech acts)]] |
− | 12.13 Methodologies and tools for language resource construction and annotation | + | [[12.13 Methodologies and tools for language resource construction and annotation]] |
− | 12.14 Automatic segmentation and labeling of resources | + | [[12.14 Automatic segmentation and labeling of resources]] |
− | 12.15 Multilingual resources | + | [[12.15 Multilingual resources]] |
− | 12.16 Evaluation and quality insurance of language resources | + | [[12.16 Evaluation and quality insurance of language resources]] |
− | 12.17 Evaluation of translation and information retrieval systems | + | [[12.17 Evaluation of translation and information retrieval systems]] |
− | 12.18 Special Session: Open Data for Under-Resourced Languages | + | [[12.18 Special Session: Open Data for Under-Resourced Languages]] |
− | + | [[]] | |
− | [13 SPEECH AND SPOKEN-LANGUAGE BASED MULTIMODAL PROCESSING AND SYSTEMS] | + | [[13 SPEECH AND SPOKEN-LANGUAGE BASED MULTIMODAL PROCESSING AND SYSTEMS]] |
− | + | [[]] | |
− | 13.1 Multimodal Speech Recognition | + | [[13.1 Multimodal Speech Recognition]] |
− | 13.2 Multimodal LVCSR Systems | + | [[13.2 Multimodal LVCSR Systems]] |
− | 13.3 Multimodal Speech Analysis | + | [[13.3 Multimodal Speech Analysis]] |
− | 13.4 Multimodal Synthesis | + | [[13.4 Multimodal Synthesis]] |
− | 13.5 Multimodal Language Analysis | + | [[13.5 Multimodal Language Analysis ]] |
− | 13.6 Multimodal and multimedia language trait recognition | + | [[13.6 Multimodal and multimedia language trait recognition ]] |
− | 13.7 Multimodal paralinguistics | + | [[13.7 Multimodal paralinguistics ]] |
− | 13.8 Multimodal interactions, interfaces | + | [[13.8 Multimodal interactions, interfaces]] |
− | 13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines | + | [[13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines]] |
+ | [[]] |
Revision as of 09:28, 16 February 2016 (Tuesday)
1 SPEECH PERCEPTION, PRODUCTION AND ACQUISITION

1.1 Models of speech production
1.2 Physiology and neurophysiology of speech production
1.3 Neural basis of speech production
1.4 Coarticulation
1.5 Models of speech perception
1.6 Physiology and neurophysiology of speech perception
1.7 Neural basis of speech perception
1.8 Acoustic and articulatory cues in speech perception
1.9 Interaction speech production-speech perception
1.10 Multimodal speech perception
1.11 Cognition and brain studies on speech
1.12 Multilingual studies
1.13 L1 acquisition and bilingual acquisition
1.14 L2 acquisition by children and adults
1.15 Speech and hearing disorders
1.16 Singing voice: production and perception
1.17 Speech and other biosignals
1.18 Special Session: Intelligibility under the microscope

2 PHONETICS, PHONOLOGY, AND PROSODY

2.1 Phonetics and phonology
2.2 Language descriptions
2.3 Linguistic systems
2.4 Discourse and dialog structures
2.5 Acoustic phonetics
2.6 Phonation, voice quality
2.7 Articulatory and acoustic features of prosody
2.8 Perception of prosody
2.9 Phonological processes and models
2.10 Laboratory phonology
2.11 Phonetic universals
2.12 Sound changes
2.13 Sociophonetics
2.14 Phonetics of L1-L2 interaction

3 ANALYSIS OF PARALINGUISTICS IN SPEECH AND LANGUAGE

3.1 Analysis of speaker states
3.2 Analysis of speaker traits
3.3 Automatic analysis of speaker states and traits
3.4 Pathological speech and language
3.5 Non-verbal communication
3.6 Social and vocal signals
3.7 Sentiment analysis and opinion mining
3.8 Paralinguistics in singing
3.9 Perception of paralinguistic phenomena
3.10 Phonetic and linguistic aspects of paralinguistics
3.11 Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception & Sincerity
3.12 Special Session: Clinical and neuroscience-inspired vocal biomarkers of neurological and psychiatric disorders

4 SPEAKER AND LANGUAGE IDENTIFICATION

4.1 Language identification and verification
4.2 Dialect and accent recognition
4.3 Speaker verification and identification
4.4 Features for speaker and language recognition
4.5 Robustness to variable and degraded channels
4.6 Speaker confidence estimation
4.7 Speaker diarization
4.8 Higher-level knowledge in speaker and language recognition
4.9 Evaluation of speaker and language identification systems
4.10 Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances
4.11 Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge

5 ANALYSIS OF SPEECH AND AUDIO SIGNALS

5.1 Speech acoustics
5.2 Speech analysis and representation
5.3 Audio signal analysis and representation
5.4 Speech and audio segmentation and classification
5.5 Voice activity detection
5.6 Pitch and harmonic analysis
5.7 Source separation and computational auditory scene analysis
5.8 Speaker spatial localization
5.9 Voice separation
5.10 Music signal processing and understanding
5.11 Singing analysis
5.12 Special Session: Speech, audio, and language processing techniques applied to bird and animal vocalisations

6 SPEECH CODING AND ENHANCEMENT

6.1 Speech coding and transmission
6.2 Low-bit-rate speech coding
6.3 Perceptual audio coding of speech signals
6.4 Noise reduction for speech signals
6.5 Speech enhancement: single-channel
6.6 Speech enhancement: multi-channel
6.7 Speech intelligibility
6.8 Active noise control
6.9 Speech enhancement in hearing aids
6.10 Adaptive beamforming for speech enhancement
6.11 Dereverberation for speech signals
6.12 Echo cancelation for speech signals
6.13 Evaluation of speech transmission, coding and enhancement

7 SPEECH SYNTHESIS AND SPOKEN LANGUAGE GENERATION

7.1 Grapheme-to-phoneme conversion for synthesis
7.2 Text processing for speech synthesis
7.3 Signal processing/statistical models for synthesis
7.4 Speech synthesis paradigms and methods
7.5 Articulatory speech synthesis
7.6 Segment-level and/or concatenative synthesis
7.7 Unit selection speech synthesis
7.8 Statistical parametric speech synthesis
7.9 Prosody modeling and generation
7.10 Expression, emotion and personality generation
7.11 Synthesis of singing voices
7.12 Voice modification, conversion and morphing
7.13 Concept-to-speech conversion
7.14 Cross-lingual and multilingual aspects in speech synthesis
7.15 Avatars and talking faces
7.16 Tools and data for speech synthesis
7.17 Evaluation of speech synthesis
7.18 Special Session: Singing Synthesis Challenge: Fill-In the Gap
7.19 Special Session: Voice Conversion Challenge 2016

8 SPEECH RECOGNITION: SIGNAL PROCESSING, ACOUSTIC MODELING, ROBUSTNESS, ADAPTATION

8.1 Feature extraction and low-level feature modeling for ASR
8.2 Prosodic features and models
8.3 Robustness against noise, reverberation
8.4 Far field and microphone array speech recognition
8.5 Speaker normalization (e.g., VTLN)
8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)
8.7 Discriminative acoustic training methods for ASR
8.8 Acoustic model adaptation (speaker, bandwidth, emotion, accent)
8.9 Speaker adaptation, speaker adapted training methods
8.10 Pronunciation variants and modeling for speech recognition
8.11 Acoustic confidence measures
8.13 Cross-lingual and multilingual aspects, non-native accents
8.14 Acoustic modeling for conversational speech (dialog, interaction)
8.15 Evaluation of speech recognition

9 SPEECH RECOGNITION - ARCHITECTURE, SEARCH, AND LINGUISTIC COMPONENTS

9.1 Lexical modeling and access: units and models
9.2 Automatic lexicon learning
9.3 Supervised/unsupervised morphological models
9.4 Prosodic features and models for language modeling
9.5 Discriminative training methods for language modeling
9.6 Language model adaptation (domain, diachronic adaptation)
9.7 Language modeling for conversational speech (dialog, interaction)
9.8 Neural networks for language modeling
9.9 Search methods, decoding algorithms, lattices, multipass strategies
9.10 New computational strategies, data-structures for ASR
9.11 Computational resource constrained speech recognition
9.12 Confidence measures
9.13 Cross-lingual and multilingual components for speech recognition
9.14 Structured classification approaches

10 SPEECH RECOGNITION - TECHNOLOGIES AND SYSTEMS FOR NEW APPLICATIONS

10.1 Multimodal systems
10.2 Applications in education and learning (incl. CALL, assessment of fluency)
10.3 Applications in medical practice (CIS, voice assessment, etc.)
10.4 Speech science in end-user applications
10.5 Rich transcription
10.6 Innovative products and services based on speech technologies
10.7 Sparse, template-based representations
10.8 New paradigms (e.g. artic. models, silent speech interfaces, topic models)
10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications
10.10 Special Session: Realism in robust speech processing
10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing
10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education

11 SPOKEN LANGUAGE PROCESSING - DIALOG, SUMMARIZATION, UNDERSTANDING

11.1 Spoken dialog systems
11.2 Multimodal human-machine interaction (conversat. agents, human-robot)
11.3 Analysis of verbal, co-verbal and nonverbal behavior
11.4 Interactive systems for speech/language training, therapy, communication aids
11.5 Stochastic modeling for dialog
11.6 Question-answering from speech
11.7 Spoken document summarization
11.8 Systems for spoken language understanding
11.9 Topic spotting and classification
11.10 Entity extraction from speech
11.11 Semantic analysis and classification
11.12 Conversation and interaction
11.13 Evaluation of speech and multimodal dialog systems
11.14 Evaluation of summarization and understanding

12 SPOKEN LANGUAGE PROCESSING: TRANSLATION, INFORMATION RETRIEVAL, RESOURCES

12.1 Spoken machine translation
12.2 Speech-to-speech translation systems
12.3 Transliteration
12.4 Voice search
12.5 Spoken term detection
12.6 Audio indexing
12.7 Spoken document retrieval
12.8 Systems for mining spoken data, search or retrieval of speech documents
12.9 Speech and multimodal resources and annotation
12.10 Metadata descriptions of speech, audio and text resources
12.11 Metadata for semantic or content markup
12.12 Metadata for ling./discourse structure (disfluencies, boundaries, speech acts)
12.13 Methodologies and tools for language resource construction and annotation
12.14 Automatic segmentation and labeling of resources
12.15 Multilingual resources
12.16 Evaluation and quality insurance of language resources
12.17 Evaluation of translation and information retrieval systems
12.18 Special Session: Open Data for Under-Resourced Languages

13 SPEECH AND SPOKEN-LANGUAGE BASED MULTIMODAL PROCESSING AND SYSTEMS

13.1 Multimodal Speech Recognition
13.2 Multimodal LVCSR Systems
13.3 Multimodal Speech Analysis
13.4 Multimodal Synthesis
13.5 Multimodal Language Analysis
13.6 Multimodal and multimedia language trait recognition
13.7 Multimodal paralinguistics
13.8 Multimodal interactions, interfaces
13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines