2016

From cslt Wiki

Revision as of 02:31, 3 November 2016 (Thursday) by Lilt

Contents

1 SPEECH PROCESS
2 MACHINE LEARNING
- 2.1 Learning Methods
- 2.2 Deep Learning
3 SPEAKER RECOGNITION
4 Review

SPEECH PROCESS

Speech Perception, Production And Acquisition

1.1 Models of speech production

1.2 Physiology and neurophysiology of speech production

1.3 Neural basis of speech production

1.4 Coarticulation

1.5 Models of speech perception

1.6 Physiology and neurophysiology of speech perception

1.7 Neural basis of speech perception

1.8 Acoustic and articulatory cues in speech perception

1.9 Interaction speech production-speech perception

1.10 Multimodal speech perception

1.11 Cognition and brain studies on speech

1.12 Multilingual studies

1.13 L1 acquisition and bilingual acquisition

1.14 L2 acquisition by children and adults

1.15 Speech and hearing disorders

1.16 Singing voice: production and perception

1.17 Speech and other biosignals

1.18 Special Session: Intelligibility under the microscope

Phonetics, Phonology, And Prosody

2.1 Phonetics and phonology

2.2 Language descriptions

2.3 Linguistic systems

2.4 Discourse and dialog structures

2.5 Acoustic phonetics

2.6 Phonation, voice quality

2.7 Articulatory and acoustic features of prosody

2.8 Perception of prosody

2.9 Phonological processes and models

2.10 Laboratory phonology

2.11 Phonetic universals

2.12 Sound changes

2.13 Sociophonetics

2.14 Phonetics of L1-L2 interaction

Analysis Of Paralinguistics In Speech And Language

3.1 Analysis of speaker states

3.2 Analysis of speaker traits

3.3 Automatic analysis of speaker states and traits

3.4 Pathological speech and language

3.5 Non-verbal communication

3.6 Social and vocal signals

3.7 Sentiment analysis and opinion mining

3.8 Paralinguistics in singing

3.9 Perception of paralinguistic phenomena

3.10 Phonetic and linguistic aspects of paralinguistics

3.11 Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception & Sincerity

3.12 Special Session: Clinical and neuroscience-inspired vocal biomarkers of neurological and psychiatric disorders

Speaker And Language Identification

4.1 Language identification and verification

4.2 Dialect and accent recognition

4.3 Speaker verification and identification

4.4 Features for speaker and language recognition

4.5 Robustness to variable and degraded channels

4.6 Speaker confidence estimation

4.7 Speaker diarization

4.8 Higher-level knowledge in speaker and language recognition

4.9 Evaluation of speaker and language identification systems

4.10 Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances

4.11 Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge

Analysis Of Speech And Audio Signals

5.1 Speech acoustics

5.2 Speech analysis and representation

5.3 Audio signal analysis and representation

5.4 Speech and audio segmentation and classification

5.5 Voice activity detection

5.6 Pitch and harmonic analysis

5.7 Source separation and computational auditory scene analysis

5.8 Speaker spatial localization

5.9 Voice separation

5.10 Music signal processing and understanding

5.11 Singing analysis

5.12 Special Session: Speech, audio, and language processing techniques applied to bird and animal vocalisations

Speech Coding And Enhancement

6.1 Speech coding and transmission

6.2 Low-bit-rate speech coding

6.3 Perceptual audio coding of speech signals

6.4 Noise reduction for speech signals

6.5 Speech enhancement: single-channel

6.6 Speech enhancement: multi-channel

6.7 Speech intelligibility

6.8 Active noise control

6.9 Speech enhancement in hearing aids

6.10 Adaptive beamforming for speech enhancement

6.11 Dereverberation for speech signals

6.12 Echo cancelation for speech signals

6.13 Evaluation of speech transmission, coding and enhancement

Speech Synthesis And Spoken Language Generation

7.1 Grapheme-to-phoneme conversion for synthesis

7.2 Text processing for speech synthesis

7.3 Signal processing/statistical models for synthesis

7.4 Speech synthesis paradigms and methods

7.5 Articulatory speech synthesis

7.6 Segment-level and/or concatenative synthesis

7.7 Unit selection speech synthesis

7.8 Statistical parametric speech synthesis

7.9 Prosody modeling and generation

7.10 Expression, emotion and personality generation

7.11 Synthesis of singing voices

7.12 Voice modification, conversion and morphing

7.13 Concept-to-speech conversion

7.14 Cross-lingual and multilingual aspects in speech synthesis

7.15 Avatars and talking faces

7.16 Tools and data for speech synthesis

7.17 Evaluation of speech synthesis

7.18 Special Session: Singing Synthesis Challenge: Fill-In the Gap

7.19 Special Session: Voice Conversion Challenge 2016

Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation

8.1 Feature extraction and low-level feature modeling for ASR

8.2 Prosodic features and models

8.3 Robustness against noise, reverberation

8.4 Far field and microphone array speech recognition

8.5 Speaker normalization (e.g., VTLN)

8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)

8.7 Discriminative acoustic training methods for ASR

8.8 Acoustic model adaptation (speaker, bandwidth, emotion, accent)

8.9 Speaker adaptation, speaker adapted training methods

8.10 Pronunciation variants and modeling for speech recognition

8.11 Acoustic confidence measures

8.13 Cross-lingual and multilingual aspects, non-native accents

8.14 Acoustic modeling for conversational speech (dialog, interaction)

8.15 Evaluation of speech recognition

Speech Recognition - Architecture, Search, And Linguistic Components

9.1 Lexical modeling and access: units and models

9.2 Automatic lexicon learning

9.3 Supervised/unsupervised morphological models

9.4 Prosodic features and models for language modeling

9.5 Discriminative training methods for language modeling

9.6 Language model adaptation (domain, diachronic adaptation)

9.7 Language modeling for conversational speech (dialog, interaction)

9.8 Neural networks for language modeling

9.9 Search methods, decoding algorithms, lattices, multipass strategies

9.10 New computational strategies, data-structures for ASR

9.11 Computational resource constrained speech recognition

9.12 Confidence measures

9.13 Cross-lingual and multilingual components for speech recognition

9.14 Structured classification approaches

Speech Recognition - Technologies And Systems For New Applications

10.1 Multimodal systems

10.2 Applications in education and learning (incl. CALL, assessment of fluency)

10.3 Applications in medical practice (CIS, voice assessment, etc.)

10.4 Speech science in end-user applications

10.5 Rich transcription

10.6 Innovative products and services based on speech technologies

10.7 Sparse, template-based representations

10.8 New paradigms (e.g. articulatory models, silent speech interfaces, topic models)

10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications

10.10 Special Session: Realism in robust speech processing

10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing

10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education

Spoken Language Processing - Dialog, Summarization, Understanding

11.1 Spoken dialog systems

11.2 Multimodal human-machine interaction (conversational agents, human-robot)

11.3 Analysis of verbal, co-verbal and nonverbal behavior

11.4 Interactive systems for speech/language training, therapy, communication aids

11.5 Stochastic modeling for dialog

11.6 Question-answering from speech

11.7 Spoken document summarization

11.8 Systems for spoken language understanding

11.9 Topic spotting and classification

11.10 Entity extraction from speech

11.11 Semantic analysis and classification

11.12 Conversation and interaction

11.13 Evaluation of speech and multimodal dialog systems

11.14 Evaluation of summarization and understanding

Spoken Language Processing: Translation, Information Retrieval, Resources

12.1 Spoken machine translation

12.2 Speech-to-speech translation systems

12.3 Transliteration

12.4 Voice search

12.5 Spoken term detection

12.6 Audio indexing

12.7 Spoken document retrieval

12.8 Systems for mining spoken data, search or retrieval of speech documents

12.9 Speech and multimodal resources and annotation

12.10 Metadata descriptions of speech, audio and text resources

12.11 Metadata for semantic or content markup

12.12 Metadata for linguistic/discourse structure (disfluencies, boundaries, speech acts)

12.13 Methodologies and tools for language resource construction and annotation

12.14 Automatic segmentation and labeling of resources

12.15 Multilingual resources

12.16 Evaluation and quality assurance of language resources

12.17 Evaluation of translation and information retrieval systems

12.18 Special Session: Open Data for Under-Resourced Languages

Speech And Spoken-Language Based Multimodal Processing And Systems

13.1 Multimodal Speech Recognition

13.2 Multimodal LVCSR Systems

13.3 Multimodal Speech Analysis

13.4 Multimodal Synthesis

13.5 Multimodal Language Analysis

13.6 Multimodal and multimedia language trait recognition

13.7 Multimodal paralinguistics

13.8 Multimodal interactions, interfaces

13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines

MACHINE LEARNING

Learning Methods

14.1 Supervised Learning

14.2 Unsupervised Learning

14.3 Reinforcement Learning

14.4 Learning Theory

14.5 Generative Models

14.6 Discriminative Models

14.7 Probabilistic Models

14.8 Bayesian Methods

14.9 Gaussian Processes

Deep Learning

15.1 Network Architecture

15.2 Autoencoder

15.3 Representation Learning

15.4 Optimization

15.5 Regularization

15.7 Transfer Learning

15.8 Sequence Learning

15.9 Online Learning

SPEAKER RECOGNITION

16.1 Deep learning

16.2 Short utterances

16.3 Challenge

Review

Retrieved from "http://index.cslt.org/mediawiki/index.php?title=2016&oldid=23516"