Line 1:

− DNN architecture:
+ ==DNN architecture==

[http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1446.pdf Ying Zhang et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks]

Line 12:

− Visualization
+ ==Visualization==

[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b9/Visualizing_and_Understanding_Genomic.pdf Visualizing and Understanding Genomic Sequences Using Deep Neural Networks]

Line 19:

− Speaker recognition:
+ ==Speaker recognition==

[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/1b/RedDots.rar# INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]

+ [http://192.168.0.51:8888/2016/interspeech2016/WELCOME.html# INTERSPEECH 2016 Fri-O-3-2: Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge]
− [[8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)]]

− [[9.1 Lexical modeling and access: units and models]]
− [[9.2 Automatic lexicon learning]]
− [[9.3 Supervised/unsupervised morphological models]]
− [[9.4 Prosodic features and models for language modeling]]
− [[9.5 Discriminative training methods for language modeling]]
− [[9.6 Language model adaptation (domain, diachronic adaptation)]]
− [[9.7 Language modeling for conversational speech (dialog, interaction)]]
− [[9.8 Neural networks for language modeling]]
− [[9.9 Search methods, decoding algorithms, lattices, multipass strategies]]
− [[9.10 New computational strategies, data-structures for ASR]]
− [[9.11 Computational resource constrained speech recognition]]
− [[9.12 Confidence measures]]
− [[9.13 Cross-lingual and multilingual components for speech recognition]]
− [[9.14 Structured classification approaches]]

− ==Speech Recognition - Technologies And Systems For New Applications==

− [[10.1 Multimodal systems]]
− [[10.2 Applications in education and learning (incl. CALL, assessment of fluency)]]
− [[10.3 Applications in medical practice (CIS, voice assessment, etc.)]]
− [[10.4 Speech science in end-user applications]]
− [[10.5 Rich transcription]]
− [[10.6 Innovative products and services based on speech technologies]]
− [[10.7 Sparse, template-based representations]]
− [[10.8 New paradigms (e.g. artic. models, silent speech interfaces, topic models)]]
− [[10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications]]
− [[10.10 Special Session: Realism in robust speech processing]]
− [[10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing]]
− [[10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education]]

− ==Spoken Language Processing - Dialog, Summarization, Understanding==

− [[11.1 Spoken dialog systems]]
− [[11.2 Multimodal human-machine interaction (conversat. agents, human-robot)]]
− [[11.3 Analysis of verbal, co-verbal and nonverbal behavior]]
− [[11.4 Interactive systems for speech/language training, therapy, communication aids]]
− [[11.5 Stochastic modeling for dialog]]
− [[11.6 Question-answering from speech]]
− [[11.7 Spoken document summarization]]
− [[11.8 Systems for spoken language understanding]]
− [[11.9 Topic spotting and classification]]
− [[11.10 Entity extraction from speech]]
− [[11.11 Semantic analysis and classification]]
− [[11.12 Conversation and interaction]]
− [[11.13 Evaluation of speech and multimodal dialog systems]]
− [[11.14 Evaluation of summarization and understanding]]

− ==Spoken Language Processing: Translation, Information Retrieval, Resources==

− [[12.1 Spoken machine translation]]
− [[12.2 Speech-to-speech translation systems]]
− [[12.3 Transliteration]]
− [[12.4 Voice search]]
− [[12.5 Spoken term detection]]
− [[12.6 Audio indexing]]
− [[12.7 Spoken document retrieval]]
− [[12.8 Systems for mining spoken data, search or retrieval of speech documents]]
− [[12.9 Speech and multimodal resources and annotation]]
− [[12.10 Metadata descriptions of speech, audio and text resources]]
− [[12.11 Metadata for semantic or content markup]]
− [[12.12 Metadata for ling./discourse structure (disfluencies, boundaries, speech acts)]]
− [[12.13 Methodologies and tools for language resource construction and annotation]]
− [[12.14 Automatic segmentation and labeling of resources]]
− [[12.15 Multilingual resources]]
− [[12.16 Evaluation and quality insurance of language resources]]
− [[12.17 Evaluation of translation and information retrieval systems]]
− [[12.18 Special Session: Open Data for Under-Resourced Languages]]

− ==Speech And Spoken-Language Based Multimodal Processing And Systems==

− [[13.1 Multimodal Speech Recognition]]
− [[13.2 Multimodal LVCSR Systems]]
− [[13.3 Multimodal Speech Analysis]]
− [[13.4 Multimodal Synthesis]]
− [[13.5 Multimodal Language Analysis]]
− [[13.6 Multimodal and multimedia language trait recognition]]
− [[13.7 Multimodal paralinguistics]]
− [[13.8 Multimodal interactions, interfaces]]
− [[13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines]]

− =MACHINE LEARNING=

− ==Learning Methods==

− [[14.1 Supervised Learning]]
− [[14.2 Unsupervised Learning]]
− [[14.3 Reinforcement Learning]]
− [[14.4 Learning Theory]]
− [[14.5 Generative Models]]
− [[14.6 Discriminative Models]]
− [[14.7 Probabilistic Models]]
− [[14.8 Bayesian Methods]]
− [[14.9 Gaussian Processes]]

− ==Deep Learning==

− [[15.1 Network Architecture]]
− [[15.2 Autoencoder]]
− [[15.3 Representation Learning]]
− [[15.4 Optimization]]
− [[15.5 Regularization]]
− [[15.6 Sparsity]]
− [[15.7 Transfer Learning]]
− [[15.8 Sequence Learning]]
− [[15.9 Online Learning]]
− [[15.10 Tricks]]
− [[15.12 visualization]]

− =SPEAKER RECOGNITION=

− [[16.1 Deep learning]]
− [[16.2 Short utterances]]
− [[16.3 Challenge]]

=Review=