2014-09-05
Resource Building
Leftover questions
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
- NN LM
AM development
Sparse DNN
- Investigating layer-based DNN training
Noise training
- Noisy training journal paper almost done.
Drop out & Rectification & convolutive network
- Drop out
- No performance improvement found yet.
- Rectification
- Dropout NaN problem was caused by large weight magnitudes (a weight-norm sketch follows this section)
- Convolutive network
- Test more configurations
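If the NaN problem indeed comes from weights growing too large, one standard remedy is a max-norm constraint on the weight rows; below is a minimal NumPy sketch (the function name and max_norm value are illustrative, not necessarily the fix used here).

 import numpy as np

 def max_norm_constraint(W, max_norm=4.0):
     # Rescale each row of the weight matrix so its L2 norm does
     # not exceed max_norm; typically applied after every update.
     norms = np.linalg.norm(W, axis=1, keepdims=True)
     scale = np.minimum(1.0, max_norm / (norms + 1e-12))
     return W * scale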
Denoising & Farfield ASR
- Lasso-based dereverberation obtained reasonable results (a sketch of the setup follows this list)
- Optimizing the training parameters on the development set
- Found a similar alpha for both near and far recordings; needs more investigation
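A minimal sketch of one way to cast Lasso-based dereverberation, as sparse multi-step linear prediction of the late reverberation (sklearn's Lasso; alpha plays the role of the regularization weight tuned on the development set; delay, order, and the function name are assumptions, not the actual recipe).

 import numpy as np
 from sklearn.linear_model import Lasso

 def lasso_dereverb(reverb, order=200, delay=30, alpha=0.01):
     # Predict each sample from a delayed window of past samples;
     # the sparse prediction approximates the late reverberation.
     n = len(reverb)
     X = np.zeros((n, order))
     for k in range(order):
         shift = delay + k
         X[shift:, k] = reverb[:n - shift]
     model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
     model.fit(X, reverb)
     late = X @ model.coef_          # estimated late reverberation
     return reverb - late            # dereverberated signal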
VAD
- Noise model training is stuck in a local minimum.
- Some discrepancy between CSLT results & Puqiang results
- Check whether the labels are really problematic
- Check whether short-time spike noise is the major problem (can be solved by spike filtering)
- Check whether low-energy babble noise causes the mismatch (can be solved by global energy detection); a sketch of both checks follows this list
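A minimal sketch combining the two proposed checks: a global energy threshold plus median filtering of the frame decisions to remove short spikes (function name and thresholds are illustrative).

 import numpy as np
 from scipy.signal import medfilt

 def vad_decisions(frame_energy, rel_thresh_db=30.0, spike_filt=5):
     # Global energy detection: keep frames within rel_thresh_db of
     # the utterance's peak log energy.
     log_e = 10.0 * np.log10(np.asarray(frame_energy) + 1e-10)
     speech = log_e > (log_e.max() - rel_thresh_db)
     # Spike filtering: a median filter removes isolated decisions
     # caused by short-time spike noise.
     return medfilt(speech.astype(float), spike_filt) > 0.5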
Speech rate training
- Some interesting results were obtained on the WSJ database with the simple speech-rate change algorithm
- The ROS model seems superior to the normal one on faster speech
- Need to check the distribution of ROS on WSJ
- Suggest extracting speech data of different ROS to construct a new test set
- Suggest using the Tencent training data
- Suggest removing silence when computing ROS (see the sketch after this list)
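A minimal sketch of ROS computed from a forced alignment with silence removed, as suggested above (the segment format and silence labels are assumptions).

 def rate_of_speech(segments, sil_labels=("sil", "sp")):
     # segments: list of (label, start_sec, end_sec) from an alignment.
     # ROS = phones per second of voiced (non-silence) audio.
     phones, voiced = 0, 0.0
     for label, start, end in segments:
         if label in sil_labels:
             continue  # drop silence before computing ROS
         phones += 1
         voiced += end - start
     return phones / voiced if voiced > 0 else 0.0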
Scoring
- On hold
Confidence
- Implemented a tool for data labeling and corrected some labeling errors
- Finished extracting two features: DNN posterior and lattice posterior (a sketch of combining them follows)
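A minimal sketch of how the two features could feed a confidence classifier, e.g. logistic regression over [DNN posterior, lattice posterior] per word (toy numbers; the actual classifier is still open).

 import numpy as np
 from sklearn.linear_model import LogisticRegression

 # One row per hypothesized word: [DNN posterior, lattice posterior];
 # labels: 1 = correct word, 0 = recognition error (toy data).
 X = np.array([[0.9, 0.8], [0.2, 0.4], [0.7, 0.9], [0.1, 0.2]])
 y = np.array([1, 0, 1, 0])

 clf = LogisticRegression().fit(X, y)
 confidence = clf.predict_proba(X)[:, 1]  # per-word confidence score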
LM development
Domain specific LM
- G determinization problem solved.
NUM tag LM:
- The tag LM seems to work OK (a preprocessing sketch follows)
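A minimal sketch of the usual tag-LM preprocessing: number tokens are replaced with a class tag before LM training, and the tag is expanded by a sub-LM at decoding time (regex and tag symbol are illustrative).

 import re

 NUM_RE = re.compile(r"\d+(?:\.\d+)?")

 def tag_numbers(line, tag="<NUM>"):
     # Replace each number with the class tag so the LM models the
     # class; members of the class get their own sub-LM.
     return NUM_RE.sub(tag, line)

 print(tag_numbers("call 10086 at 9.30"))  # -> call <NUM> at <NUM>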
Word2Vector
W2V based doc classification
- Initial results with the variational Bayesian GMM obtained; performance is not as good as the conventional GMM (a classification sketch follows this list)
- Interest group set up; reading scheduled every Thursday
- Non-linear inter-language transform (English-Spanish-Czech): word-vector model training done; the transform model is under investigation
- Investigating more iterations to obtain a better model
- Checking the discrepancy between the Matlab nnet tool & sklearn
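A minimal sketch of the classification setup, assuming documents are represented as averaged word vectors and each class is modeled by a variational Bayesian GMM (sklearn's BayesianGaussianMixture here; the actual tool and document representation may differ).

 import numpy as np
 from sklearn.mixture import BayesianGaussianMixture

 def doc_vector(words, w2v):
     # Document vector = mean of in-vocabulary word vectors.
     return np.mean([w2v[w] for w in words if w in w2v], axis=0)

 def train_class_models(docs_by_class, w2v, n_components=4):
     # One variational Bayesian GMM per class over document vectors.
     models = {}
     for label, docs in docs_by_class.items():
         X = np.stack([doc_vector(d, w2v) for d in docs])
         models[label] = BayesianGaussianMixture(
             n_components=n_components).fit(X)
     return models

 def classify(doc, w2v, models):
     # Pick the class whose model gives the highest log-likelihood.
     v = doc_vector(doc, w2v).reshape(1, -1)
     return max(models, key=lambda c: models[c].score(v))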
RNN LM
- Prepared the WSJ database
- Trained a 10000 x 4 + 320 + 10000 model
- Started testing n-best rescoring (see the sketch below)
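A minimal sketch of n-best rescoring, interpolating the decoder's LM score with the RNN LM log-probability (the hypothesis format and interpolation weight are assumptions).

 def rescore_nbest(nbest, rnnlm_logprob, lm_weight=0.5):
     # nbest: list of (words, acoustic_score, lm_score) tuples;
     # rnnlm_logprob: maps a word list to its RNN LM log-probability.
     def total(hyp):
         words, am, lm = hyp
         mixed = (1 - lm_weight) * lm + lm_weight * rnnlm_logprob(words)
         return am + mixed
     return max(nbest, key=total)  # best hypothesis after rescoring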
Speaker ID
- Second model done
Emotion detection
- Delivered to Sinovoice
Translation
- v2.0 demo ready
QA
- Labeled 1,000 utterances as the evaluation set
- 35% 11-class accuracy
- EA not done yet