2014-08-22

Resource Building

Leftover questions

  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
  • NN LM

AM development

Sparse DNN

  • WSJ sparse DNN yields no further improvement.

Noise training

  • The noisy training journal paper is almost done.

Dropout & Rectification & Convolutional network

  • Changing the learning rate to 0.001 allows the training process to start:
    1. check the dropout probability
    2. check the learning rate
    3. continue training
  • Rectification
  1. Rectification by itself failed with large weights.
  2. Adding an L1 penalty enables the training but gives very poor performance.
  3. Try capping the rectifier output at a maximum value (see the sketch after this list).
  • Convolutional network
  1. Test more configurations.
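
A minimal numpy sketch of the capped-rectifier idea in rectification item 3: the activation is clipped at an upper bound so it cannot grow without limit, and the L1 penalty from item 2 enters as a subgradient term on the weights. The function names, cap value, and penalty strength are illustrative assumptions, not the actual setup.

  import numpy as np

  def capped_relu(x, cap=6.0):
      # Rectifier with an upper bound: min(max(x, 0), cap). Capping keeps
      # activations from blowing up, the failure mode noted in item 1.
      return np.minimum(np.maximum(x, 0.0), cap)

  def l1_penalty_grad(w, lam=1e-4):
      # Subgradient of lam * sum(|w|); added to the weight gradient so
      # large weights are pushed toward zero during training.
      return lam * np.sign(w)

  # e.g. a forward pass through one rectified layer:
  x = np.random.randn(5, 10)          # 5 frames, 10 inputs
  W = np.random.randn(10, 8) * 0.1    # layer weights
  h = capped_relu(x @ W)              # activations bounded in [0, cap]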


Denoising & Far-field ASR

  • Lasso-based dereverberation obtained reasonable results (a sketch of the time-frequency lasso follows this list):
  1. Spectrum-based lasso outperforms Fbank-based lasso.
  2. Time-frequency lasso outperforms purely temporal lasso.
  3. Using 200 frames to estimate utterance-based lasso coefficients is feasible, with only marginal performance degradation.
  4. Lasso can solve the problem of dynamic reverberation.
  5. Static reverberation still needs investigation.
  6. The 1/3 paper has been checked into CVS.
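
A rough sketch of how per-utterance time-frequency lasso coefficients could be estimated, assuming parallel reverberant/clean magnitude spectrograms are available. scikit-learn's Lasso and the 200-frame estimation window from item 3 stand in for the actual implementation; all function names are hypothetical.

  import numpy as np
  from sklearn.linear_model import Lasso

  def fit_tf_lasso(reverb_spec, clean_spec, context=10, n_frames=200, alpha=0.01):
      # reverb_spec, clean_spec: (T, F) magnitude spectrograms of one
      # utterance. Each predictor row is a `context`-frame window of the
      # full reverberant spectrum, flattened -- i.e. a time-frequency
      # predictor (item 2) -- and only the first `n_frames` target
      # frames are used for estimation (item 3).
      T = min(reverb_spec.shape[0], n_frames + context)
      X = np.stack([reverb_spec[t - context:t].ravel()
                    for t in range(context, T)])
      # One sparse regressor per frequency bin of the clean spectrum.
      return [Lasso(alpha=alpha).fit(X, clean_spec[context:T, f])
              for f in range(reverb_spec.shape[1])]

  def dereverberate(models, reverb_spec, context=10):
      # Apply the per-bin regressors frame by frame; output is (T-context, F).
      X = np.stack([reverb_spec[t - context:t].ravel()
                    for t in range(context, reverb_spec.shape[0])])
      return np.stack([m.predict(X) for m in models], axis=1)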

VAD

  • Found some problems in Puqiang's speech data. Some files are labelled incorrectly.


Speech rate training

  • Append an additional dimension to the feature vector indicating the rate of speech (ROS); see the sketch below.
  • The ROS is computed as words per second.
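
A minimal sketch of the ROS feature, assuming a word-level transcription with a known utterance duration; the function names are illustrative.

  import numpy as np

  def rate_of_speech(n_words, duration_sec):
      # ROS as defined above: words per second.
      return n_words / duration_sec

  def append_ros(features, ros):
      # features: (T, D) frame-level features for one utterance. Returns
      # (T, D+1) with the utterance-level ROS replicated on every frame.
      col = np.full((features.shape[0], 1), ros)
      return np.hstack([features, col])

  # e.g. feats = append_ros(fbank, rate_of_speech(12, 4.8))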

Scoring

  • Refined the acoustic model with the AMIDA database; the problem was solved by involving both WSJ and AMIDA.

Confidence

  • Knowledge preparation done.
  • First experiment combines lattice-based confidence and DNN-based confidence (a sketch of one combination scheme follows this list).
  • A further step will add ROS.
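
One simple way to combine the two scores is linear interpolation; the weight w below is a hypothetical tuning parameter, not a reported value, and the function is a sketch rather than the actual combination used.

  def combine_confidence(lattice_conf, dnn_conf, w=0.5):
      # Per-word confidence as a convex combination of the lattice
      # posterior and the DNN-based score; w is tuned on a dev set.
      # ROS could later enter as a third term in the same way.
      return w * lattice_conf + (1.0 - w) * dnn_conf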


Embedded decoder

  • Chatting LM released (80k).
  • Training two smaller networks, 500x4+600 and 400x4+500: ongoing.
  • Build a new graph with the MPE3 AM and the chatting LM.

LM development

Domain specific LM

  • G determinization problem solved.
  • NUM tag LM (a sketch of the tagging step follows this list):
    27h JS test: 20.16 vs 20.19; 2h JS test: 17.48 vs 17.49.
  • Analysis of the tag LM's properties: (1) random NUM should obtain better performance; (2) other words are not seriously impacted.
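
The tagging step can be sketched as a simple text normalization that maps number tokens onto a single class tag before n-gram training, so all numbers share the same statistics; the tag name and regex below are assumptions, not the actual convention.

  import re

  NUM_RE = re.compile(r'^\d+(\.\d+)?$')

  def tag_numbers(tokens, tag='<NUM>'):
      # Map every numeric token onto the class tag so that all numbers
      # share the same n-gram statistics in the trained LM.
      return [tag if NUM_RE.match(t) else t for t in tokens]

  # tag_numbers('call 10086 now'.split())  ->  ['call', '<NUM>', 'now']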


Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained; performance is not as good as the conventional GMM (a sketch of the setup follows this list).
  • Interest group set up; reading scheduled every Thursday.
  • Non-linear inter-language transform (English-Spanish-Czech): word-vector model training done; the transform model is under investigation.
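
A rough sketch of the classification setup, assuming each document is represented by the mean of its word vectors and each class by a mixture over those vectors; scikit-learn's BayesianGaussianMixture stands in for the actual variational Bayesian GMM, and all function names are hypothetical.

  import numpy as np
  from sklearn.mixture import BayesianGaussianMixture

  def doc_vector(tokens, w2v):
      # Represent a document by the mean of its in-vocabulary word vectors.
      return np.mean([w2v[t] for t in tokens if t in w2v], axis=0)

  def train_class_models(vectors_by_class, n_components=4):
      # One variational Bayesian GMM per document class.
      return {c: BayesianGaussianMixture(n_components=n_components).fit(np.stack(v))
              for c, v in vectors_by_class.items()}

  def classify(vec, models):
      # Pick the class whose mixture assigns the highest log-likelihood.
      return max(models, key=lambda c: models[c].score(vec[None, :]))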


RNN LM

  • Obtained the new toolkit from Thomas.
  • The toolkit needs more investigation.


Speaker ID

  • Second model done


Translation

  • Training failed due to running out of memory.
  • Re-training the model with a limit on the number of iterations; now at the 8th iteration.