2014-08-29
Resource Building
Leftover questions
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
- NN LM
AM development
Sparse DNN
- WSJ sparse DNN does not obtain further improvement
Noise training
- Error found in the data setup. Re-run the tests with gamma=20, 30
- Re-run the tests with gamma=1, 0.1 (a noise-mixing sketch follows this list)
- Noisy training journal paper almost done.
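Noisy training, as described above, corrupts clean training data with noise. Below is a minimal sketch of the core mixing step only; the role of the gamma parameter is not specified here, and the SNR-based mixing and function names are illustrative assumptions rather than the actual recipe.

```python
# Hypothetical sketch of noisy-training data preparation: mix a noise signal
# into a clean utterance at a target SNR. The gamma parameter mentioned above
# is not defined here; this only illustrates the basic corruption step.
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Return the clean signal corrupted by noise at the requested SNR (dB)."""
    # Tile or truncate the noise so it matches the utterance length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that 10*log10(clean_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

# Example: corrupt a synthetic "utterance" with white noise at 10 dB SNR.
rng = np.random.default_rng(0)
utt = rng.standard_normal(16000)
noisy = mix_at_snr(utt, rng.standard_normal(8000), snr_db=10.0)
```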
Dropout & Rectification & convolutive network
- Changing the learning rate to 0.001, the training process can be started:
- Changed the dropout probability from 0.5 to 0.2. Frame accuracy improved, but the WER seems problematic.
- Experiments with learning rates 1 and 8: NA
- Rectification
- Rectification by itself failed due to large weights.
- Including an L1 penalty enables the training, but performance was very poor.
- Try setting a maximum (clipped) value for the rectifier output (see the sketch after this list)
- Convolutive network
- Test more configurations
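Regarding the clipped-rectifier item above, here is a minimal sketch of a rectifier with an upper bound, which keeps large weights from producing unbounded activations. The bound of 6.0 and the L1 gradient helper are illustrative assumptions, not the settings used in these experiments.

```python
# Sketch of a clipped rectifier ("bounded ReLU"): activations are cut off at an
# upper bound so that large weights cannot blow up the hidden outputs.
# The bound value 6.0 is an illustrative choice only.
import numpy as np

def clipped_relu(x, upper=6.0):
    return np.minimum(np.maximum(x, 0.0), upper)

# The L1 penalty mentioned above adds lam * sum(|W|) to the objective;
# its (sub)gradient is simply lam * sign(W).
def l1_penalty_grad(W, lam=1e-5):
    return lam * np.sign(W)

print(clipped_relu(np.array([-2.0, 0.5, 3.0, 10.0])))  # -> [0.  0.5  3.  6. ]
```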
Denoising & Farfield ASR
- Lasso-based dereverberation obtained reasonable results
- Found some suspicious problems with the frequency-dependent Lasso.
- Proposed full frequency Lasso & full frequency-temporal Lasso.
- Good performance was obtained with the frequency-dependent Lasso (a sketch follows this list):
- Near data: 10.79 -> 10.35 (lambda=0.05)
- Far data: 40.53 -> 35.65 (lambda=0.15)
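A minimal sketch of the frequency-dependent Lasso idea: in each frequency bin of the magnitude spectrogram, the late reverberation of the current frame is predicted from past frames with an L1-regularized filter and subtracted. The filter length D, lambda, and the per-bin formulation are illustrative assumptions; the full-frequency and frequency-temporal variants above differ from this.

```python
# Hypothetical per-bin Lasso dereverberation sketch: regress the current frame
# of one frequency bin on its D past frames (L1-regularized), treat the
# prediction as a rough estimate of late reverberation, and subtract it.
import numpy as np
from sklearn.linear_model import Lasso

def dereverb_bin(mag_bin, D=10, lam=0.15):
    """mag_bin: (T,) magnitude sequence of a single frequency bin."""
    T = len(mag_bin)
    # Design matrix of past frames: row t holds mag_bin[t-1 .. t-D].
    X = np.zeros((T - D, D))
    for k in range(D):
        X[:, k] = mag_bin[D - 1 - k : T - 1 - k]
    y = mag_bin[D:]
    model = Lasso(alpha=lam, positive=True).fit(X, y)
    late_reverb = model.predict(X)            # estimated contribution of past frames
    cleaned = np.maximum(y - late_reverb, 0.0)
    return np.concatenate([mag_bin[:D], cleaned])

# Example on a synthetic bin with a delayed, attenuated copy (crude reverb).
rng = np.random.default_rng(0)
clean = np.abs(rng.standard_normal(200))
reverb = clean + 0.5 * np.concatenate([np.zeros(3), clean[:-3]])
dereverbed = dereverb_bin(reverb, D=10, lam=0.15)
```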
VAD
- Some discrepancy between CSLT results & Puqiang results
- Check whether the labels are really problematic
- Check whether short-time spike noise is the major problem (can be solved by spike filtering)
- Check whether low-energy babble noise causes a mismatch (can be solved by global energy detection; a sketch follows this list)
- Test a model trained on noisy data
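For the spike-filtering and global-energy checks above, here is a minimal sketch of a frame-energy VAD with a median filter to suppress short spike decisions. The frame length, hop, threshold, and median window are illustrative assumptions, not the values used at CSLT or Puqiang.

```python
# Hypothetical energy-based VAD sketch: frame log-energies are thresholded
# relative to the loudest frame (global energy detection), and a short median
# filter removes isolated spike decisions (spike filtering).
import numpy as np
from scipy.signal import medfilt

def energy_vad(signal, frame_len=400, hop=160, thresh_db=-40.0, med_win=5):
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.array([
        10 * np.log10(np.mean(signal[i * hop : i * hop + frame_len] ** 2) + 1e-12)
        for i in range(n_frames)
    ])
    speech = energies > (energies.max() + thresh_db)     # global energy threshold
    return medfilt(speech.astype(float), med_win) > 0.5  # suppress short spikes

# Example: 1 s of low-level noise with a louder burst in the middle (16 kHz).
rng = np.random.default_rng(0)
sig = 0.01 * rng.standard_normal(16000)
sig[6000:10000] += rng.standard_normal(4000)
decisions = energy_vad(sig)
```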
Speech rate training
- Some interesting results with the simple speech rate change algorithm were obtained on the WSJ db
- The ROS model seems superior to the normal one on faster speech
- Need to check the distribution of ROS on WSJ
- Suggest extracting speech data of different ROS to construct a new test set
- Suggest using Tencent training data
- Suggest removing silence when computing ROS (see the sketch after this list)
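A minimal sketch of computing rate of speech (ROS) as phones per second of actual speech, i.e. with silence excluded, assuming a phone-level alignment is available. The silence label set and the 10 ms frame shift are assumptions for illustration.

```python
# Hypothetical ROS computation: phones per second over speech frames only,
# assuming (phone, n_frames) alignments and a 10 ms frame shift.
SIL_PHONES = {"sil", "sp", "spn"}   # assumed silence labels
FRAME_SHIFT = 0.01                  # seconds per frame

def rate_of_speech(alignment):
    """alignment: list of (phone, n_frames) pairs for one utterance."""
    speech_frames = sum(n for p, n in alignment if p not in SIL_PHONES)
    n_phones = sum(1 for p, _ in alignment if p not in SIL_PHONES)
    speech_sec = speech_frames * FRAME_SHIFT
    return n_phones / speech_sec if speech_sec > 0 else 0.0

print(rate_of_speech([("sil", 50), ("ah", 12), ("b", 8), ("sil", 30)]))  # 10 phones/s
```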
Scoring
- hold
Confidence
- Implement a tool for data labeling
- Finished extraction of two features: DNN posterior + lattice posterior
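As a minimal sketch of how the two extracted features might be combined into a confidence score, a simple logistic regression over per-word DNN posterior and lattice posterior is shown below; the toy data and the classifier choice are illustrative assumptions, not the actual setup.

```python
# Hypothetical confidence estimator: per-word [mean DNN posterior, lattice
# posterior] features with 1 = correct / 0 = error labels from the labeling tool.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.92, 0.88], [0.35, 0.41], [0.80, 0.75], [0.20, 0.30]])  # toy data
y = np.array([1, 0, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[0.70, 0.60]])[:, 1])  # confidence score for a new word
```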
LM development
Domain specific LM
- G determinization problem re-opened.
- NUM tag LM:
- 27h JS test: 20.16 vs 20.19; 2h JS test: 17.48 vs 17.49
- Analyze the properties of the tag LM: (1) random NUM should obtain better performance; (2) other words are not seriously impacted (a tagging sketch follows this list).
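A minimal sketch of the NUM-tag preprocessing: number tokens in the LM training text are replaced by a single class tag, so the LM models the tag while the in-class number distribution is handled separately. The regex and the tag name are illustrative assumptions.

```python
# Hypothetical NUM tagging for LM training text: map numeric tokens to <NUM>.
import re

NUM_RE = re.compile(r"^\d+([.,]\d+)?$")

def tag_numbers(sentence, tag="<NUM>"):
    return " ".join(tag if NUM_RE.match(tok) else tok for tok in sentence.split())

print(tag_numbers("the price is 128 yuan"))  # -> the price is <NUM> yuan
```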
Word2Vector
W2V based doc classification
- Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM (see the sketch after this list).
- Interest group set up; reading sessions scheduled every Thursday
- Non-linear inter-language transform: English-Spanish-Czech: word vector (w2v) model training done, transform model under investigation
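A minimal sketch of the comparison above: each document is represented by an averaged word vector, a conventional GMM and a variational Bayesian GMM are fit per class, and a test document is classified by class log-likelihood. Random vectors stand in for real word2vec document embeddings; the dimension and component counts are assumptions.

```python
# Hypothetical doc-classification comparison: GMM vs variational Bayesian GMM
# fit per class on (averaged) word-vector document representations.
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

rng = np.random.default_rng(0)
docs = {"class_a": rng.normal(0.0, 1.0, size=(50, 20)),   # 50 doc vectors, dim 20
        "class_b": rng.normal(0.5, 1.0, size=(50, 20))}

gmm = {c: GaussianMixture(n_components=2, random_state=0).fit(X) for c, X in docs.items()}
vb  = {c: BayesianGaussianMixture(n_components=2, random_state=0).fit(X) for c, X in docs.items()}

test_doc = rng.normal(0.0, 1.0, size=(1, 20))
print(max(gmm, key=lambda c: gmm[c].score(test_doc)))  # class chosen by conventional GMM
print(max(vb,  key=lambda c: vb[c].score(test_doc)))   # class chosen by variational Bayesian GMM
```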
RNN LM
- New toolkit from Thomas obtained.
- Prepare WSJ database, re-test RNN.
Speaker ID
- Second model done
Emotion detection
- Initial performance results obtained
Translation
- Failed due to an out-of-memory error
- Re-started the training due to some errors on the grid