2014-09-22

From cslt Wiki

Resource Building

Leftover questions

  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
  • NN LM

AM development

Sparse DNN

  • Investigating layer-based DNN training

Noise training

  • First draft of the noisy training journal paper
  • Check abnormal behavior with large sigma (Yinshi, Liuchao)

Dropout & Rectification & Convolutive network

  • Dropout
      • No performance improvement found yet.
      • [1]
  • Rectification
      • Dropout NA problem was caused by large magnitude of weights
  • Convolutive network
      1. Test more configurations


Denoising & Farfield ASR

  • Lasso-based de-reverberation is done with the REVERBERATION toolkit
  • Start to compose the experiment section for the SL paper.

VAD

  • Noise model training done. Under testing.
  • Need to investigate the performance reduction in babble noise. Call Jia.


Speech rate training

  • Some interesting results were obtained on the WSJ database with the simple speech rate change algorithm

[2]

  • The ROS model seems superior to the normal one on faster speech
  • Need to check the distribution of ROS on WSJ
  • Suggestion: extract speech data with different ROS to construct a new test set
  • Suggestion: use the Tencent training data
  • Suggestion: remove silence when computing ROS
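The last suggestion can be illustrated with a minimal sketch. It assumes ROS is measured in phones per second over a phone-level forced alignment, and that silence segments carry labels such as "sil" or "sp"; both the definition and the label set are assumptions for illustration, not taken from the experiments above.

```python
# Hypothetical silence labels; the real label set depends on the aligner.
SILENCE_LABELS = frozenset({"sil", "sp"})


def rate_of_speech(alignment, silence_labels=SILENCE_LABELS):
    """Return phones per second, ignoring silence segments.

    alignment: iterable of (label, start_sec, end_sec) tuples.
    Returns 0.0 when the alignment contains no non-silence speech.
    """
    n_phones = 0
    speech_time = 0.0
    for label, start, end in alignment:
        if end < start:
            raise ValueError(f"segment {label!r} ends before it starts")
        if label in silence_labels:
            continue  # silence contributes neither phones nor duration
        n_phones += 1
        speech_time += end - start
    if speech_time <= 0.0:
        return 0.0
    return n_phones / speech_time


# Toy alignment: 0.5 s of leading and trailing silence around four phones.
example_alignment = [
    ("sil", 0.0, 0.5),
    ("hh", 0.5, 0.6),
    ("ah", 0.6, 0.75),
    ("l", 0.75, 0.85),
    ("ow", 0.85, 1.0),
    ("sil", 1.0, 1.5),
]
example_ros = rate_of_speech(example_alignment)
```

In the toy alignment, four phones span about 0.5 s of speech, so the ROS is about 8 phones per second. Dividing by the full 1.5 s utterance instead would make the same speech look much slower.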

Scoring

  • Pitch & rhythm done.
  • Harmonics on hold


Confidence

  • Basic confidence measure using lattice-based posterior + DNN posterior + ROS done
  • 23% detection error achieved with the balanced model
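One way to picture the combination of the three features is a logistic model over them, scored by a thresholded detection error. The sketch below is only an illustration under assumptions: the logistic form, the weights, the threshold, and the definition of detection error (fraction of words whose accept/reject decision disagrees with the correct/incorrect label) are all hypothetical and are not the model reported above.

```python
import math


def combine_confidence(lattice_post, dnn_post, ros, weights, bias):
    """Map three features to a confidence in (0, 1) with a logistic model.

    weights: (w_lattice, w_dnn, w_ros); the values are hypothetical and
    would normally be estimated on labelled data.
    """
    z = bias + weights[0] * lattice_post + weights[1] * dnn_post + weights[2] * ros
    # Numerically stable sigmoid: avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def detection_error(scores, labels, threshold=0.5):
    """Fraction of items where (score >= threshold) disagrees with the label.

    labels: 1 for a correctly recognized word, 0 for an error.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one item")
    errors = sum((s >= threshold) != bool(y) for s, y in zip(scores, labels))
    return errors / len(labels)


# Toy usage with made-up weights and features.
toy_weights = (2.0, 1.5, -0.1)
toy_score = combine_confidence(0.9, 0.8, 8.0, toy_weights, bias=-1.0)
toy_error = detection_error([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1])
```

Evaluating on a set with equal numbers of correct and incorrect words keeps the error rate from being dominated by the majority class, which is one possible reading of "balanced" here.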


LM development

Domain specific LM

  • Domain-specific counts dumped
  • N-gram generation is ongoing


NUM tag LM

  • HCLG union seems better than G union, when integrating grammar + LM (25->23)
  • Boost specific words like wifi if TAG model does not work for a particular word.


Word2Vector

W2V based doc classification

  • Initial results with variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
  • Non-linear inter-language transform: English-Spanish-Czech: wv model training done, transform model under investigation
  • Probably over-fitting with the MLP training
  • SSA-based local linear mapping still running
  • Knowledge vector started
  • document obtained from wiki
  • Character to word conversion
  • Design the transform model


RNN LM

  • Prepare WSJ database
  • Trained model 10000 x 4 + 320 + 10000
  • Better performance obtained (4.16->3.47)
  • Gigaword sampling for Chinese data

Speaker ID

  • Second model done

Emotion detection

  • delivered to Sinovoice

Translation

  • v3.0 demo released
  • still slow

QA

  • Huilan framework design done
  • Investigate better framework