Difference between revisions of "Task List"


Revision as of 11:59, 19 October 2015 (Mon)

Task To Do

Speech Recognition

End-to-End speech recognition

  • Discriminative-Learning code implementation
  • Zhiyuan Tang
  • Zhiyuan Tang/Mengyuan Zhao/Zhiyong Zhang

Multi-task

  • Fusion of speech-recognition and speech-rate
  • Xiangyu Zeng
  • Self-informed neural network structure learning
  • Mengyuan Zhao

Integrate the class information to HCLG fst for speech recognition

Distant speech recognition

  • RNN-DAE: echo or reverberation
  • Xuewei Zhang/Zhiyuan Tang/Mengyuan Zhao/Zhiyong Zhang
  • Reverberation
  • Multi-microphones
  • (Lasso), Xuewei Zhang

Voice conversion

Sparse DNN

  • Zhiyuan Tang

Correlation based SENONE cluster

NN Multi-GPU parallel training

  • Multi-Machine
  • Sheng Su
  • Multi-GPU on one Machine
  • Sheng Su
  • nnet3 code test

Audio Embedding

  • Ke Ning

RNN training acceleration

Data selection

  • Zhiyong Zhang
  • Sub-modular data selection
  • Objective-function loss training self-adaptation

Decoder

  • Task-specific confidence output


Speaker Verification

binary code

  • Lantian Li

RNN-ivector

  • Lantian Li

DNN clustering

  • Lantian Li

Task DONE

Multi-Mode features based VAD

  • Shi Yin

DNN based Language identification and Speaker identification

  • Xuewei Zhang/Zhiyuan Tang

Neural network visualization

  • Mian Wang, DONE

Dark knowledge

  • Mengyuan Zhao, Xiangyu Zeng, Zhiyong Zhang, Chao Liu

Normal RNN speech recognition

  • Mengyuan Zhao

Momentum-like Hessian-Free acceleration

  • Nesterov/Adagrad/AdaDelta/Adam
  • Zhiyong Zhang/Xiangyu Zeng

Activation value normalization through time --Batch Normalization

  • Zhiyong Zhang

Mix-training Balance decision tree

  • Zhiyong Zhang

20-h Chinese data-set release

  • Xuewei Zhang

Unbounded activation function (Rectifier/Maxout/Pnorm) go-through searching method

  • nnet3 test -- Xuewei Zhang

Technical Report To Write

1, DNN-DAE based noise cancellation -- Xiangyu Zeng / Mengyuan Zhao / Zhiyong Zhang -- DONE
2, Speech Rate DNN speech recognition -- Shi Yin / Xiangyu Zeng -- DONE
3, CNN+fbank feature combination -- Mian Wang / Yiye Lin / Mengyuan Zhao / Shi Yin
4, Uyghur low-resource acoustic model enhancement -- Shi Yin / Mengyuan Zhao / Zhiyong Zhang -- DONE
5, Uyghur 20h database release -- Kaer / Shi Yin -- DONE
6, Dark-Knowledge Transfer -- Xiangyu Zeng / Mengyuan Zhao / Zhiyong Zhang

Paper to Write

Project

  • Xiaomi TV
  • Mengyuan Zhao/Zhiyong Zhang
  • TAG-lm & Domain-specific general lm
  • Chinese-English mix-training