2014-09-05

Resource Building

Leftover questions

  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
  • NN LM

AM development

Sparse DNN

  • Investigating layer-based DNN training

Noise training

  • Noisy training journal paper almost done.

Drop out & Rectification & convolutive network

  • Drop out
      • No performance improvement found yet. [1]
  • Rectification
      • Dropout NA problem was caused by large magnitude of weights
  • Convolutive network
      1. Test more configurations


Denoising & Farfield ASR

  • Lasso-based dereverberation obtained reasonable results
  • Optimize the training parameters on the development set
  • Found a similar alpha for both near-field and far-field recordings. Needs more investigation.

VAD

  • Noise model training is stuck in a local minimum.
  • Some discrepancy between CSLT results & Puqiang results

[2]

  • Check whether the labels are really problematic
  • Check whether short-time spike noise is the major problem (can be solved by spike filtering)
  • Check whether low-energy babble noise causes the mismatch (can be solved by global energy detection)
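
A minimal sketch of the spike filtering idea mentioned above, assuming a sliding median over per-frame energies. The report does not say which filter is intended, so the median filter, the function name median_filter, and the window width are illustrative assumptions only.

```python
def median_filter(values, width=5):
    """Smooth a per-frame sequence with a sliding median (assumed filter type).

    A short spike that occupies fewer than half of the window is replaced
    by the surrounding level. Windows are truncated at the sequence edges.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        window = sorted(values[lo:hi])
        smoothed.append(window[len(window) // 2])
    return smoothed


# Hypothetical frame energies with a single-frame spike at index 2.
energies = [1.0, 1.0, 9.0, 1.0, 1.0]
print(median_filter(energies, width=3))
```

A wider window removes longer spikes but also delays or shortens genuine speech onsets, so the width would need tuning on the development data.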

Speech rate training

  • Some interesting results were obtained on the WSJ database with the simple speech rate change algorithm

[3]

  • The ROS (rate of speech) model seems superior to the normal one on faster speech
  • Need to check the distribution of ROS on WSJ
  • Suggestion: extract speech data with different ROS values and construct a new test set
  • Suggestion: use the Tencent training data
  • Suggestion: remove silence when computing ROS
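
The last suggestion can be sketched as follows, assuming ROS is measured as speech units (for example phones) per second and that a time alignment is available. The alignment format, the silence symbols, and the function name rate_of_speech are hypothetical and not taken from the actual setup.

```python
SILENCE_LABELS = {"sil", "sp"}  # hypothetical silence symbols


def rate_of_speech(alignment, silence_labels=SILENCE_LABELS):
    """Return units per second, counting only non-silence segments.

    alignment: list of (label, start_sec, end_sec) tuples.
    Silence segments contribute neither to the unit count nor to the duration.
    """
    unit_count = 0
    speech_time = 0.0
    for label, start, end in alignment:
        if label in silence_labels:
            continue
        unit_count += 1
        speech_time += end - start
    if speech_time <= 0.0:
        return 0.0
    return unit_count / speech_time


# Toy alignment: 0.5 s of speech surrounded by 1.5 s of silence.
toy_alignment = [
    ("sil", 0.0, 0.5),
    ("HH", 0.5, 0.75),
    ("AH", 0.75, 1.0),
    ("sil", 1.0, 2.0),
]
print(rate_of_speech(toy_alignment))
```

If silence were included, the same utterance would yield 2 units over 2.0 s, which understates how fast the speaker actually talks.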

Scoring

  • On hold

Confidence

  • Implement a tool for data labeling, correcting some errors.
  • Finished extraction of two features: DNN posterior + lattice posterior

LM development

Domain specific LM

  • G determinization problem solved.

NUM tag LM

  • The tag LM seems to work OK.

[4]


Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
  • Interest group set up; reading scheduled every Thursday
  • Non-linear inter-language transform (English-Spanish-Czech): wv model training done; transform model under investigation
  • Investigate more iterations to obtain a better model
  • Checking the discrepancy between the matlab nnet tool & sklearn.

RNN LM

  • Prepare WSJ database
  • Trained model 10000 x 4 + 320 + 10000
  • Started testing on n-best rescoring
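
For orientation, n-best rescoring can be sketched as below: each hypothesis keeps its first-pass score, a language model score is added with a weight, and the list is re-ranked. The scoring function toy_lm is a stand-in for the trained RNN LM, and all names and the weighting scheme are assumptions for illustration, not the actual test setup.

```python
def rescore_nbest(nbest, lm_logprob, lm_weight=1.0):
    """Re-rank an n-best list by first-pass score plus weighted LM score.

    nbest: list of (words, first_pass_logprob) pairs.
    lm_logprob: callable mapping a word list to a log probability.
    Returns the word sequences sorted from best to worst.
    """
    scored = []
    for words, first_pass in nbest:
        total = first_pass + lm_weight * lm_logprob(words)
        scored.append((total, words))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [words for _, words in scored]


def toy_lm(words):
    """Stand-in LM (hypothetical): a flat cost of 1.0 per word."""
    return -1.0 * len(words)


# Two hypotheses; the second has a worse first-pass score but fewer words.
toy_nbest = [
    (["a", "b", "c"], -10.0),
    (["a", "b"], -10.5),
]
print(rescore_nbest(toy_nbest, toy_lm))
```

In practice lm_weight is tuned on a development set, and the new LM score is often interpolated with the original n-gram score rather than replacing it.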


Speaker ID

  • Second model done

Emotion detection

  • delivered to Sinovoice

Translation

  • v2.0 demo ready

QA

  • Labeled 1000 utterances as the evaluation set
  • 35% 11-class accuracy
  • EA not done yet