2014-08-22

Resource Building

Leftover questions

  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
  • NN LM

AM development

Sparse DNN

  • WSJ sparse DNN yields no further improvement.

Noise training

  • The noisy training journal paper is almost done.

Dropout & Rectification & Convolutional network

  • Changing the learning rate to 0.001 allows the training process to start:
    1. check the dropout probability
    2. check the learning rate
    3. continue training
  • Rectification
  1. Rectification by itself failed with large weights.
  2. Adding an L1 penalty enables the training but gives very poor performance.
  3. Try capping the rectifier output at a maximum value (see the sketch after this list).
  • Convolutional network
  1. Test more configurations.
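
A minimal numpy sketch of the capped-rectifier idea in rectification item 3: the activation is clipped at an upper bound so it cannot grow without limit, and the L1 penalty from item 2 enters as a subgradient term on the weights. The function names, cap value, and penalty strength are illustrative assumptions, not the actual setup.

  import numpy as np

  def capped_relu(x, cap=6.0):
      # Rectifier with an upper bound: min(max(x, 0), cap). Capping keeps
      # activations from blowing up, the failure mode noted in item 1.
      return np.minimum(np.maximum(x, 0.0), cap)

  def l1_penalty_grad(w, lam=1e-4):
      # Subgradient of lam * sum(|w|); added to the weight gradient so
      # large weights are pushed toward zero during training.
      return lam * np.sign(w)

  # e.g. a forward pass through one rectified layer:
  x = np.random.randn(5, 10)          # 5 frames, 10 inputs
  W = np.random.randn(10, 8) * 0.1    # layer weights
  h = capped_relu(x @ W)              # activations bounded in [0, cap]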


Denoising & Far-field ASR

  • Lasso-based dereverberation obtained reasonable results (a sketch of the time-frequency lasso follows this list):
  1. Spectrum-based lasso outperforms Fbank-based lasso.
  2. Time-frequency lasso outperforms purely temporal lasso.
  3. Using 200 frames to estimate utterance-based lasso coefficients is feasible, with only marginal performance degradation.
  4. Lasso can solve the problem of dynamic reverberation.
  5. Static reverberation still needs investigation.
  6. The 1/3 paper has been checked into CVS.
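
A rough sketch of how per-utterance time-frequency lasso coefficients could be estimated, assuming parallel reverberant/clean magnitude spectrograms are available. scikit-learn's Lasso and the 200-frame estimation window from item 3 stand in for the actual implementation; all function names are hypothetical.

  import numpy as np
  from sklearn.linear_model import Lasso

  def fit_tf_lasso(reverb_spec, clean_spec, context=10, n_frames=200, alpha=0.01):
      # reverb_spec, clean_spec: (T, F) magnitude spectrograms of one
      # utterance. Each predictor row is a `context`-frame window of the
      # full reverberant spectrum, flattened -- i.e. a time-frequency
      # predictor (item 2) -- and only the first `n_frames` target
      # frames are used for estimation (item 3).
      T = min(reverb_spec.shape[0], n_frames + context)
      X = np.stack([reverb_spec[t - context:t].ravel()
                    for t in range(context, T)])
      # One sparse regressor per frequency bin of the clean spectrum.
      return [Lasso(alpha=alpha).fit(X, clean_spec[context:T, f])
              for f in range(reverb_spec.shape[1])]

  def dereverberate(models, reverb_spec, context=10):
      # Apply the per-bin regressors frame by frame; output is (T-context, F).
      X = np.stack([reverb_spec[t - context:t].ravel()
                    for t in range(context, reverb_spec.shape[0])])
      return np.stack([m.predict(X) for m in models], axis=1)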

VAD

  • Found some problems in Puqiang's speech data. Some files are labelled incorrectly.


Speech rate training

  • Append an additional dimension to the feature vector indicating the rate of speech (ROS); see the sketch below.
  • The ROS is computed as words per second.
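
A minimal sketch of the ROS feature, assuming a word-level transcription with a known utterance duration; the function names are illustrative.

  import numpy as np

  def rate_of_speech(n_words, duration_sec):
      # ROS as defined above: words per second.
      return n_words / duration_sec

  def append_ros(features, ros):
      # features: (T, D) frame-level features for one utterance. Returns
      # (T, D+1) with the utterance-level ROS replicated on every frame.
      col = np.full((features.shape[0], 1), ros)
      return np.hstack([features, col])

  # e.g. feats = append_ros(fbank, rate_of_speech(12, 4.8))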

Scoring

  • Refined the acoustic model with the AMIDA database; the problem was solved by involving both WSJ and AMIDA.

Confidence

  • Knowledge preparation done.
  • First experiment combines lattice-based confidence and DNN-based confidence (a sketch of one combination scheme follows this list).
  • A further step will add ROS.
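
One simple way to combine the two scores is linear interpolation; the weight w below is a hypothetical tuning parameter, not a reported value, and the function is a sketch rather than the actual combination used.

  def combine_confidence(lattice_conf, dnn_conf, w=0.5):
      # Per-word confidence as a convex combination of the lattice
      # posterior and the DNN-based score; w is tuned on a dev set.
      # ROS could later enter as a third term in the same way.
      return w * lattice_conf + (1.0 - w) * dnn_conf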


Embedded decoder

  • Chatting LM released (80k).
  • Training two smaller networks, 500x4+600 and 400x4+500: ongoing.
  • Build a new graph with the MPE3 AM and the chatting LM.

LM development

Domain specific LM

  • G determinization problem solved.
  • NUM tag LM (a sketch of the tagging step follows this list):
    27h JS test: 20.16 vs 20.19; 2h JS test: 17.48 vs 17.49.
  • Analysis of the tag LM's properties: (1) random NUM should obtain better performance; (2) other words are not seriously impacted.
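
The tagging step can be sketched as a simple text normalization that maps number tokens onto a single class tag before n-gram training, so all numbers share the same statistics; the tag name and regex below are assumptions, not the actual convention.

  import re

  NUM_RE = re.compile(r'^\d+(\.\d+)?$')

  def tag_numbers(tokens, tag='<NUM>'):
      # Map every numeric token onto the class tag so that all numbers
      # share the same n-gram statistics in the trained LM.
      return [tag if NUM_RE.match(t) else t for t in tokens]

  # tag_numbers('call 10086 now'.split())  ->  ['call', '<NUM>', 'now']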


Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained; performance is not as good as the conventional GMM (a sketch of the setup follows this list).
  • Interest group set up; reading scheduled every Thursday.
  • Non-linear inter-language transform (English-Spanish-Czech): word-vector model training done; the transform model is under investigation.
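
A rough sketch of the classification setup, assuming each document is represented by the mean of its word vectors and each class by a mixture over those vectors; scikit-learn's BayesianGaussianMixture stands in for the actual variational Bayesian GMM, and all function names are hypothetical.

  import numpy as np
  from sklearn.mixture import BayesianGaussianMixture

  def doc_vector(tokens, w2v):
      # Represent a document by the mean of its in-vocabulary word vectors.
      return np.mean([w2v[t] for t in tokens if t in w2v], axis=0)

  def train_class_models(vectors_by_class, n_components=4):
      # One variational Bayesian GMM per document class.
      return {c: BayesianGaussianMixture(n_components=n_components).fit(np.stack(v))
              for c, v in vectors_by_class.items()}

  def classify(vec, models):
      # Pick the class whose mixture assigns the highest log-likelihood.
      return max(models, key=lambda c: models[c].score(vec[None, :]))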


RNN LM

  • Obtained the new toolkit from Thomas.
  • The toolkit needs more investigation.


Speaker ID

  • Second model done


Translation

  • Training failed due to running out of memory.
  • Re-training the model with a limit on the number of iterations; now at the 8th iteration.