Line 1:

− DNN architecture:
+ ==DNN architecture==

[http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1446.pdf Ying Zhang et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks]

Line 12:

− Visualization
+ ==Visualization==

[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b9/Visualizing_and_Understanding_Genomic.pdf Visualizing and Understanding Genomic Sequences Using Deep Neural Networks]

Line 19:

− Speaker recognition:
+ ==Speaker recognition==

[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/1b/RedDots.rar# INTERSPEECH 2016 Fri-O-2-2: Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances]

+ [http://192.168.0.51:8888/2016/interspeech2016/WELCOME.html# INTERSPEECH 2016 Fri-O-3-2: Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge]
− [[8.6 New types of neural network models and learning (e.g., new variants of DNN, CNN)]]

− [[9.1 Lexical modeling and access: units and models]]
− [[9.2 Automatic lexicon learning]]
− [[9.3 Supervised/unsupervised morphological models]]
− [[9.4 Prosodic features and models for language modeling]]
− [[9.5 Discriminative training methods for language modeling]]
− [[9.6 Language model adaptation (domain, diachronic adaptation)]]
− [[9.7 Language modeling for conversational speech (dialog, interaction)]]
− [[9.8 Neural networks for language modeling]]
− [[9.9 Search methods, decoding algorithms, lattices, multipass strategies]]
− [[9.10 New computational strategies, data-structures for ASR]]
− [[9.11 Computational resource constrained speech recognition]]
− [[9.12 Confidence measures]]
− [[9.13 Cross-lingual and multilingual components for speech recognition]]
− [[9.14 Structured classification approaches]]

− ==Speech Recognition - Technologies And Systems For New Applications==

− [[10.1 Multimodal systems]]
− [[10.2 Applications in education and learning (incl. CALL, assessment of fluency)]]
− [[10.3 Applications in medical practice (CIS, voice assessment, etc.)]]
− [[10.4 Speech science in end-user applications]]
− [[10.5 Rich transcription]]
− [[10.6 Innovative products and services based on speech technologies]]
− [[10.7 Sparse, template-based representations]]
− [[10.8 New paradigms (e.g. artic. models, silent speech interfaces, topic models)]]
− [[10.9 Special Session: Sub-Saharan African languages: from speech fundamentals to applications]]
− [[10.10 Special Session: Realism in robust speech processing]]
− [[10.11 Special Session: Sharing Research and Education Resources for Understanding Speech Processing]]
− [[10.12 Special Session: Speech and Language Technologies for Human-Machine Conversation-based Language Education]]

− ==Spoken Language Processing - Dialog, Summarization, Understanding==

− [[11.1 Spoken dialog systems]]
− [[11.2 Multimodal human-machine interaction (conversat. agents, human-robot)]]
− [[11.3 Analysis of verbal, co-verbal and nonverbal behavior]]
− [[11.4 Interactive systems for speech/language training, therapy, communication aids]]
− [[11.5 Stochastic modeling for dialog]]
− [[11.6 Question-answering from speech]]
− [[11.7 Spoken document summarization]]
− [[11.8 Systems for spoken language understanding]]
− [[11.9 Topic spotting and classification]]
− [[11.10 Entity extraction from speech]]
− [[11.11 Semantic analysis and classification]]
− [[11.12 Conversation and interaction]]
− [[11.13 Evaluation of speech and multimodal dialog systems]]
− [[11.14 Evaluation of summarization and understanding]]

− ==Spoken Language Processing: Translation, Information Retrieval, Resources==

− [[12.1 Spoken machine translation]]
− [[12.2 Speech-to-speech translation systems]]
− [[12.3 Transliteration]]
− [[12.4 Voice search]]
− [[12.5 Spoken term detection]]
− [[12.6 Audio indexing]]
− [[12.7 Spoken document retrieval]]
− [[12.8 Systems for mining spoken data, search or retrieval of speech documents]]
− [[12.9 Speech and multimodal resources and annotation]]
− [[12.10 Metadata descriptions of speech, audio and text resources]]
− [[12.11 Metadata for semantic or content markup]]
− [[12.12 Metadata for ling./discourse structure (disfluencies, boundaries, speech acts)]]
− [[12.13 Methodologies and tools for language resource construction and annotation]]
− [[12.14 Automatic segmentation and labeling of resources]]
− [[12.15 Multilingual resources]]
− [[12.16 Evaluation and quality insurance of language resources]]
− [[12.17 Evaluation of translation and information retrieval systems]]
− [[12.18 Special Session: Open Data for Under-Resourced Languages]]

− ==Speech And Spoken-Language Based Multimodal Processing And Systems==

− [[13.1 Multimodal Speech Recognition]]
− [[13.2 Multimodal LVCSR Systems]]
− [[13.3 Multimodal Speech Analysis]]
− [[13.4 Multimodal Synthesis]]
− [[13.5 Multimodal Language Analysis]]
− [[13.6 Multimodal and multimedia language trait recognition]]
− [[13.7 Multimodal paralinguistics]]
− [[13.8 Multimodal interactions, interfaces]]
− [[13.9 Special Session: Auditory-visual expressive speech and gesture in humans and machines]]

− =MACHINE LEARNING=

− ==Learning Methods==

− [[14.1 Supervised Learning]]
− [[14.2 Unsupervised Learning]]
− [[14.3 Reinforcement Learning]]
− [[14.4 Learning Theory]]
− [[14.5 Generative Models]]
− [[14.6 Discriminative Models]]
− [[14.7 Probabilistic Models]]
− [[14.8 Bayesian Methods]]
− [[14.9 Gaussian Processes]]

− ==Deep Learning==

− [[15.1 Network Architecture]]
− [[15.2 Autoencoder]]
− [[15.3 Representation Learning]]
− [[15.4 Optimization]]
− [[15.5 Regularization]]
− [[15.6 Sparsity]]
− [[15.7 Transfer Learning]]
− [[15.8 Sequence Learning]]
− [[15.9 Online Learning]]
− [[15.10 Tricks]]
− [[15.12 visualization]]

− =SPEAKER RECOGNITION=

− [[16.1 Deep learning]]
− [[16.2 Short utterances]]
− [[16.3 Challenge]]

=Review=