2014-05-23
Resource Building
- Release management has been started
- Blaster 0.1 & vivian 0.0 system release
Leftover questions
- Asymmetric window: great improvement on the training set (WER 34% → 24%), but the gain disappears on the test set. Overfitting?
- Multi-GPU training: error encountered
- Multilingual training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
AM development
Sparse DNN
- GA-based block sparsity (++++)
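As a toy illustration of the idea (not the actual experiment): a genetic algorithm can evolve binary block masks over a trained weight matrix, scoring each mask by the weight energy it retains under a block-count budget. The matrix, block size, fitness function, and GA settings below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy "trained" weight matrix, partitioned into a 4x4 grid of 8x8 blocks
W = rng.standard_normal((32, 32))
B = 8
nb = 32 // B

def block_energy(mask):
    """Sum of squared weights kept by a binary block mask."""
    full = np.kron(mask, np.ones((B, B)))   # expand block mask to full size
    return np.sum((W * full) ** 2)

def fitness(mask, keep=8):
    # penalize masks that keep more than `keep` blocks (sparsity budget)
    return block_energy(mask) - 1e3 * max(0, mask.sum() - keep)

pop = [rng.integers(0, 2, (nb, nb)) for _ in range(30)]
for gen in range(60):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                       # elitist selection
    children = []
    for _ in range(20):
        a, b = rng.choice(10, 2, replace=False)
        cross = rng.integers(0, 2, (nb, nb))           # uniform crossover
        child = np.where(cross, parents[a], parents[b])
        flip = rng.random((nb, nb)) < 0.05             # bit-flip mutation
        child = np.abs(child - flip.astype(int))
        children.append(child)
    pop = parents + children

best = max(pop, key=fitness)
```

The penalty term makes any over-budget mask lose to any reasonable in-budget one, so the GA concentrates the budget on the highest-energy blocks.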
Noise training
- All experiments completed. Combining experiments.
GFbank
- WSJ clean condition done. Obtained the same performance as the time-domain implementation
- Should experiment with the Tencent training set.
Multilingual ASR
- Multilingual LM decoding
- Fixing the non-tag bug
English model
RESULTS (state-gauss = 10000 100000):

1. Shujutang 100h chi-eng 16k:
   LM/AM |  xEnt | mpe_1 | mpe_2 | mpe_3 | mpe_4
   ------|-------|-------|-------|-------|------
   wsj   | 23.86 | 20.95 | 20.90 | 20.84 | 20.81

2. Shujutang 100h chi-eng 8k:
   LM/AM |  xEnt | mpe_1 | mpe_2 | mpe_3 | mpe_4
   ------|-------|-------|-------|-------|------
   wsj   | 26.27 | 23.63 | 23.14 | 22.93 | 23.00

3. voxforge pure eng 16k:
   LM/AM |  xEnt | mpe_1 | mpe_2 | mpe_3 | mpe_4
   ------|-------|-------|-------|-------|------
   wsj   | 21.38 | 24.89 | 24.50 | 23.31 | 23.13

4. fisher pure eng 8k (not finished yet):
   LM/AM |  xEnt | mpe_1 | mpe_2 | mpe_3 | mpe_4
   ------|-------|-------|-------|-------|------
   wsj   | 40.65 |   -   |   -   |   -   |   -
- Need to experiment with the Gigabytes LM
- Need to check the AM settings and LM used in the Kaldi egs/fisher
Denoising & Farfield ASR
- Baseline: close-talk model decoding far-field speech: 92.65
- Will investigate DAE model
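A minimal sketch of the DAE idea, assuming a single tanh hidden layer trained with full-batch gradient descent to map noise-corrupted feature vectors back to clean ones; the data, dimensions, and learning rate are toy values, not the actual far-field setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: "clean" 40-dim feature frames and additive-noise corrupted versions
clean = rng.standard_normal((512, 40))
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

d, h = 40, 128
W1 = 0.1 * rng.standard_normal((d, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.standard_normal((h, d)); b2 = np.zeros(d)
lr = 0.01

def forward(x):
    z = np.tanh(x @ W1 + b1)   # encoder
    return z, z @ W2 + b2      # linear decoder

_, out0 = forward(noisy)
mse_init = np.mean((out0 - clean) ** 2)

for epoch in range(200):
    z, out = forward(noisy)
    err = out - clean                          # gradient of 0.5 * MSE
    gW2 = z.T @ err / len(noisy); gb2 = err.mean(0)
    dz = (err @ W2.T) * (1.0 - z ** 2)         # backprop through tanh
    gW1 = noisy.T @ dz / len(noisy); gb1 = dz.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse_final = np.mean((forward(noisy)[1] - clean) ** 2)
```

The training target is the clean frame while the input is its corrupted version, which is what distinguishes a DAE from a plain autoencoder.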
Kaiser Window
- Test on different numbers of Fbanks: no significant difference between 23/30/40 (both 8k/16k) [#223]
- Test on Kaiser & Povey windows: no significant difference for either 8k or 16k [#224, #225]
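For reference, the two window shapes can be generated and compared directly. This sketch uses NumPy's np.kaiser and reconstructs the Povey window from its usual definition in Kaldi (a Hann window raised to the power 0.85); the beta value and frame length here are assumptions.

```python
import numpy as np

def povey_window(n, p=0.85):
    """Povey window as used in Kaldi: a Hann window raised to the power p."""
    t = np.arange(n)
    return (0.5 - 0.5 * np.cos(2 * np.pi * t / (n - 1))) ** p

frame_len = 400                          # 25 ms frame at 16 kHz
kaiser = np.kaiser(frame_len, beta=8.0)  # beta is an assumed shape parameter
povey = povey_window(frame_len)

# windowed spectra of one toy frame
frame = np.random.randn(frame_len)
spec_kaiser = np.abs(np.fft.rfft(frame * kaiser))
spec_povey = np.abs(np.fft.rfft(frame * povey))
```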
VAD
- DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
- Need to test small scale network
- 600-800 network
- 100 X 4 + 2
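A minimal sketch of the energy-based baseline being compared against, assuming a per-frame log-energy threshold relative to the loudest frame; the frame size, hop, and threshold values are illustrative.

```python
import numpy as np

def energy_vad(samples, frame_len=400, hop=160, threshold_db=-30.0):
    """Frame-level energy VAD: mark a frame as speech when its log energy
    is within threshold_db of the loudest frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, hop)]
    energy = np.array([np.sum(f ** 2) + 1e-12 for f in frames])
    log_e = 10.0 * np.log10(energy)
    return log_e > (log_e.max() + threshold_db)

# toy signal: silence, then a loud burst, then silence (16 kHz samples)
sig = np.concatenate([0.001 * np.random.randn(8000),
                      np.random.randn(8000),
                      0.001 * np.random.randn(8000)])
decisions = energy_vad(sig)
```

A relative threshold like this fails at low SNR, which is consistent with the large gap reported above against the DNN-based detector.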
Scoring
- Fixing bug for the stream mode
LM development
Domain specific LM
- English lexicon done; building HCLG
- Re-building the LM with the new lexicon
- Tested on Dianxin dev set
NN LM
- Character-based NNLM (6700 chars, 7-gram); training on 500M data done.
- Inconsistent WER patterns were found on the Tencent test sets
- Probably need another test set for investigation
- Investigate MS RNN LM training
QA
FST-based matching
- Word-based FST matching takes 1-2 seconds with 1600 patterns; Huilan's implementation takes <1 second.
- THRAX toolkit for grammar to FST
- Investigate determinization of G embedding
- Refer to Kaldi new code
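An illustrative sketch of what a word-based pattern matcher does, using a plain trie as an unweighted stand-in for the FST; this is not Huilan's implementation or THRAX output, and the patterns are made up.

```python
def build_trie(patterns):
    """Compile word-sequence patterns into a trie (a deterministic
    acceptor, i.e. an unweighted word-based FST)."""
    trie = {}
    for pid, pat in enumerate(patterns):
        node = trie
        for word in pat.split():
            node = node.setdefault(word, {})
        node["<final>"] = pid   # final state carries the pattern id
    return trie

def match(trie, sentence):
    """Return (start_index, pattern_id) for every pattern occurrence."""
    words = sentence.split()
    hits = []
    for start in range(len(words)):
        node = trie
        for word in words[start:]:
            if word not in node:
                break
            node = node[word]
            if "<final>" in node:
                hits.append((start, node["<final>"]))
    return hits

patterns = ["turn on the light", "turn off", "what time is it"]
trie = build_trie(patterns)
hits = match(trie, "please turn on the light now")  # → [(1, 0)]
```

Shared prefixes ("turn on …" / "turn off") collapse into one branch, which is the same sharing a determinized pattern FST provides.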