Zhiyong Zhang

From cslt Wiki
Revision as of 11:07, 12 January 2015 (Monday) by Zhangzy


Task To Do

  • 1, RNN speech recognition (Tied-context-dependent-state and End-to-End)
  • 2, Real-environment noise cancellation (DNN-DAE/CNN-DAE/RNN-DAE: echo or reverberation)
  • 3, Integrate class information into the HCLG FST for speech recognition
  • 4, Multi-mode feature-based VAD
  • 5, DNN-based language identification and speaker identification
  • 6, Distant speech recognition
  • 7, Voice conversion
  • 8, Unbounded activation functions (Rectifier/Maxout/Pnorm): go-through search method
  • 9, Sparse DNN
  • 10, Neural network visualization


Technical Report To Write

  • 1, DNN-DAE based noise cancellation
  • 2, Speech Rate DNN speech recognition
  • 3, CNN+fbank feature combination
  • 4, Uyghur low-resource acoustic model enhancement
  • 5, Uyghur 20h database release
  • 6,

Papers To Read

  • 1, Learned-Norm pooling for deep feedforward and recurrent neural networks


Task schedules

Summary

   --------------------------------------------------------------------------------------------------------
    Priority | Task name                     |      Status          |     Notes
   --------------------------------------------------------------------------------------------------------
        1    | Bi-Softmax                    | ■■■□□□□□□□ | 1400h AM training and problem fixing
   --------------------------------------------------------------------------------------------------------
        2    | RNN+DAE                       | □□□□□□□□□□ |
   --------------------------------------------------------------------------------------------------------

Speech Recognition

Multi-lingual AM training

Bi-Softmax

  • Use two distinct softmax output layers, one for English data and one for Chinese data.
  • Tested on 100h-Ch + 100h-En; better performance was observed.
  • Now testing the source code on the 1400h_8k data, but the decoding results are strange. Further investigation is needed.
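
The Bi-Softmax idea above can be illustrated with a minimal sketch. It is not the actual training code: it assumes that the two softmax layers sit on top of a shared hidden layer, and the class name `BiSoftmaxNet`, the ReLU activation, the layer sizes, and the random weights are all hypothetical choices made only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, bias, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class BiSoftmaxNet:
    """Hypothetical network: one shared hidden layer, one softmax per language."""

    def __init__(self, n_in, n_hidden, n_states, seed=0):
        # n_states maps a language tag to its number of output states.
        rng = random.Random(seed)
        self.hidden = init_layer(rng, n_hidden, n_in)
        self.heads = {lang: init_layer(rng, n, n_hidden)
                      for lang, n in n_states.items()}

    def posteriors(self, frame, lang):
        # Shared part: the same hidden layer is used for both languages.
        w, b = self.hidden
        h = [max(0.0, a) for a in affine(w, b, frame)]  # ReLU (assumed)
        # Language-specific part: only the softmax of `lang` is evaluated.
        head_w, head_b = self.heads[lang]
        return softmax(affine(head_w, head_b, h))


net = BiSoftmaxNet(n_in=4, n_hidden=8, n_states={"en": 5, "ch": 3})
frame = [0.2, -0.1, 0.4, 0.0]
post_en = net.posteriors(frame, "en")  # distribution over English states
post_ch = net.posteriors(frame, "ch")  # distribution over Chinese states
```

In this sketch each frame is scored only by the softmax of its own language, so the two output layers are normalized independently while the hidden layer is shared by both.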