2014-08-22
Resource Building
Leftover questions
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
- NN LM
AM development
Sparse DNN
- WSJ sparse DNN does not obtain further improvement.
Noise training
- Noisy training journal paper almost done.
Drop out & Rectification & convolutive network
- After changing the learning rate to 0.001, the training process can be started:
- check the drop probability
- check learning rate
- continuous training
- Rectification
- Rectification itself fails with large weights.
- Adding an L1 penalty enables training, but performance is very poor.
- Try capping the maximum value of the rectifier (see the sketch after this list).
- Convolutive network
- Test more configurations
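As a reference for the rectifier experiments above, here is a minimal numpy sketch of a clipped rectifier combined with inverted dropout and an L1 subgradient; the cap value, drop probability, and layer sizes are illustrative assumptions, not the actual configuration used.

```python
import numpy as np

rng = np.random.default_rng(0)

def clipped_relu(x, cap=6.0):
    # Rectifier with an upper bound; capping is the proposed fix for
    # the large-weight failure noted above. cap=6.0 is an assumption.
    return np.minimum(np.maximum(x, 0.0), cap)

def dropout(h, p_drop=0.5, train=True):
    # Inverted dropout: zero units with probability p_drop and rescale
    # so the expected activation is unchanged at test time.
    if not train:
        return h
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

def l1_subgrad(W, lam=1e-5):
    # Subgradient of the L1 penalty lam * ||W||_1 that made the
    # rectifier training proceed at all.
    return lam * np.sign(W)

# Toy forward pass for one hidden layer, with the reduced learning rate.
W, b = rng.standard_normal((256, 40)) * 0.01, np.zeros(256)
x = rng.standard_normal(40)
h = dropout(clipped_relu(W @ x + b))
lr = 0.001
```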
Denoising & Farfield ASR
- Lasso-based dereverberation obtained reasonable results:
- spectrum-based lasso outperforms fbank-based lasso.
- temporal-frequency lasso outperforms purely temporal lasso.
- using 200 frames to estimate utterance-based lasso coefficients is feasible, with only marginal performance degradation (see the sketch below).
- lasso can solve the problem of dynamic reverberation.
- Static reverberation still needs investigation.
- The 1/3 paper has been checked into CVS.
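A minimal sketch of the utterance-based lasso idea, assuming a linear map from a temporal-frequency context of the reverberant spectrum to the clean spectrum, fitted on the first 200 frames of an utterance; the context sizes, alpha, and function names are illustrative assumptions, not the reported setup.

```python
import numpy as np
from sklearn.linear_model import Lasso

def build_context(spec, t_ctx=5, f_ctx=2):
    """Stack a temporal-frequency context window around each
    time-frequency bin of a (frames x freq_bins) spectrogram."""
    T, F = spec.shape
    pad = np.pad(spec, ((t_ctx, t_ctx), (f_ctx, f_ctx)), mode="edge")
    rows = []
    for t in range(T):
        for f in range(F):
            rows.append(pad[t:t + 2 * t_ctx + 1,
                            f:f + 2 * f_ctx + 1].ravel())
    return np.asarray(rows)

def fit_dereverb(reverb_spec, clean_spec, n_frames=200, alpha=0.01):
    # Estimate utterance-level lasso coefficients from the first
    # n_frames of a (reverberant, clean) training pair.
    X = build_context(reverb_spec[:n_frames])
    y = clean_spec[:n_frames].ravel()
    return Lasso(alpha=alpha, max_iter=5000).fit(X, y)

def apply_dereverb(model, reverb_spec):
    T, F = reverb_spec.shape
    return model.predict(build_context(reverb_spec)).reshape(T, F)
```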
VAD
- Found some problems in Puqiang's speech data. Some files are labelled incorrectly.
Speech rate training
- Append an additional dimension, indicating the rate of speech (ROS), to the feature vector (see the sketch below).
- The ROS is computed as words per second
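A minimal sketch of the ROS feature, assuming per-utterance word counts and durations are available; the function names are hypothetical.

```python
import numpy as np

def rate_of_speech(n_words, duration_sec):
    # ROS is defined as words per second, per the note above.
    return n_words / duration_sec

def append_ros(feats, n_words, duration_sec):
    """Append a constant ROS dimension to every frame of an
    utterance's (frames x dim) feature matrix."""
    ros = rate_of_speech(n_words, duration_sec)
    col = np.full((feats.shape[0], 1), ros)
    return np.hstack([feats, col])

# e.g. a 300-frame, 40-dim fbank utterance with 12 words in 3 seconds
feats = np.zeros((300, 40))
feats_ros = append_ros(feats, n_words=12, duration_sec=3.0)  # -> (300, 41)
```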
Scoring
- Refine the acoustic model with AMIDA database. problem solved by involving both wsj and AMIDA.
Confidence
- Knowledge prepared
- First experiment combining lattice-based confidence and DNN confidence (a sketch follows below).
- A further step will add ROS.
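A minimal sketch of the combination, assuming a simple linear interpolation of the two per-word scores; the weight, and the eventual use of ROS as an extra feature, are assumptions rather than the actual experimental setup.

```python
import numpy as np

def combine_confidence(lat_conf, dnn_conf, w=0.5):
    """Linearly interpolate per-word lattice posterior confidence and
    DNN-based confidence. The weight w is a placeholder."""
    return w * np.asarray(lat_conf) + (1.0 - w) * np.asarray(dnn_conf)

# A later step could treat (lat_conf, dnn_conf, ros) as features of a
# small classifier instead of using a fixed interpolation weight.
```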
Embedded decoder
- Chatting LM released (80k)
- Training two smaller networks (500x4+600 and 400x4+500): ongoing.
- Build a new graph with the MPE3 AM and the chatting LM.
LM development
Domain specific LM
- G determinization problem solved.
- NUM tag LM:
- 27h JS test: 20.16 vs 20.19
- 2h JS test: 17.48 vs 17.49
- Analysis of the tag LM's properties: (1) random NUM should obtain better performance; (2) other words are not seriously impacted. (A sketch of the tagging step follows below.)
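A minimal sketch of the NUM tagging step, assuming literal numbers are replaced by a single NUM class token in the LM training text so the LM shares statistics across all numeric words; the regex and tag name are illustrative, and the expansion back to words would happen in the decoding graph.

```python
import re

NUM_RE = re.compile(r"^\d+(\.\d+)?$")

def tag_numbers(tokens):
    # Replace literal numbers with the NUM class token.
    return ["NUM" if NUM_RE.match(t) else t for t in tokens]

print(tag_numbers("call me at 10086 after 8".split()))
# ['call', 'me', 'at', 'NUM', 'after', 'NUM']
```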
Word2Vector
W2V based doc classification
- Initial results with the variational Bayesian GMM obtained; performance is not as good as the conventional GMM (see the sketch after this list).
- Interest group set up; reading sessions scheduled every Thursday.
- Non-linear inter-language transform (English-Spanish-Czech): w2v model training done; the transform model is under investigation.
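A minimal sketch of the doc classification setup, assuming one mixture model per class fitted over that class's word vectors, with documents scored by mean per-word log-likelihood; the per-class scheme and component count are assumptions. Flipping the variational flag gives the conventional GMM baseline that currently performs better.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

def fit_class_models(class_to_wordvecs, n_components=8, variational=True):
    # Fit one mixture over the word vectors of each class.
    Mixture = BayesianGaussianMixture if variational else GaussianMixture
    return {c: Mixture(n_components=n_components).fit(np.asarray(vs))
            for c, vs in class_to_wordvecs.items()}

def classify(doc_wordvecs, models):
    # score(X) returns the mean log-likelihood of the document's
    # word vectors under a class model; pick the best class.
    X = np.asarray(doc_wordvecs)
    return max(models, key=lambda c: models[c].score(X))
```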
RNN LM
- New toolkit from Thomas obtained
- Need more investigation on the toolkit
Speaker ID
- Second model done
Translation
- Training failed due to running out of memory.
- Re-training the model with a limit on the number of iterations; it has reached the 8th iteration.