Sinovoice-2014-12-10
DNN training
Environment setting
- Another three 3TB disks are ready for RAID-0.
- Another GPU machine was purchased.
Corpora
- Scripts for confidence generation are ready for auto transcription (see the sketch after this list)
- The 300h telephone speech data (Sinovoice recording) are done
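As an illustration of how per-word confidences can gate auto transcription, here is a minimal sketch (not the actual scripts): it assumes a hypothetical CTM-with-confidence input format and threshold, and keeps only utterances whose average word confidence clears the threshold.

```cpp
// Hypothetical confidence filter for auto transcription: keep utterances
// whose average per-word confidence exceeds a threshold. Assumes a
// CTM-with-confidence input, one word per line:
//   <utt-id> <channel> <start> <duration> <word> <confidence>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

int main(int argc, char** argv) {
  if (argc != 3) {
    std::cerr << "usage: conf_filter <ctm-with-conf> <threshold>\n";
    return 1;
  }
  std::ifstream in(argv[1]);
  const double threshold = std::stod(argv[2]);  // e.g. 0.9 (assumed)

  std::map<std::string, std::pair<double, int> > stats;  // utt -> (sum, count)
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream ss(line);
    std::string utt, chan, word;
    double start, dur, conf;
    if (ss >> utt >> chan >> start >> dur >> word >> conf) {
      stats[utt].first += conf;
      stats[utt].second += 1;
    }
  }
  // Print utterance ids whose average confidence passes the threshold;
  // these can then be fed back as automatically transcribed training data.
  for (std::map<std::string, std::pair<double, int> >::const_iterator it =
           stats.begin(); it != stats.end(); ++it) {
    double avg = it->second.first / it->second.second;
    if (avg >= threshold) std::cout << it->first << " " << avg << "\n";
  }
  return 0;
}
```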
470-hour 8k training
- 300h incremental training (IT) done
Model | CE | MPE1 | MPE2 | MPE3 | MPE4 |
---|---|---|---|---|---|
4k states | 23.27/22.85 | 21.35/18.87 | 21.18/18.76 | 21.07/18.54 | 20.93/18.32 |
8k states | 22.16/22.22 | 20.55/18.03 | 20.36/17.94 | 20.32/17.78 | 20.29/17.80 |
8k states + IT | - | 20.04/17.38 | 20.01/17.32 | 20.07/17.44 | 19.94/17.65 |
6000-hour 16k training
- Ran CE DNN training to iteration 5 (8400 states, 80000 pdfs)
- Test WER has dropped to 13.77% (see table below)
Configuration | WER (%) | RT |
---|---|---|
small LM, it 4, -5/-9 | 15.80 | - |
large LM, it 4, -5/-9 | 15.30 | - |
large LM, it 4, -6/-9 | 15.36 | - |
large LM, it 4, -7/-9 | 15.25 | - |
large LM, it 5, -5/-9 | 14.17 | - |
large LM, it 5, -5/-10 | 13.77 | - |
Adaptation
DNN Decoder
- Compared the CLG and HCLG decoders:
- The CLG decoder uses less memory in decoding
- The HCLG decoder is faster and more accurate than CLG, and more amenable to beam control ([http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=156 details])
- std::exp/std::log caused very slow computation on train203; solved the problem by replacing them with the C library exp() and log() (see the sketch below)
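Below is a minimal micro-benchmark sketch of the kind that can verify such a fix; the array size and values are arbitrary, and the size of the gap depends on the compiler and libm version on the machine (it was evidently large on train203).

```cpp
// Micro-benchmark sketch: time std::exp/std::log against the C library
// exp()/log() on identical data. Which pair is faster (and by how much)
// depends on the toolchain; on train203 the std:: versions were slow.
#include <math.h>   // guarantees ::exp and ::log in the global namespace
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const int N = 10000000;
  std::vector<double> x(N);
  for (int i = 0; i < N; ++i) x[i] = 0.5 + 0.001 * (i % 1000);

  double sink = 0.0;  // accumulate results so the loops are not optimized away

  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < N; ++i) sink += std::exp(-x[i]) + std::log(x[i]);
  auto t1 = std::chrono::steady_clock::now();
  for (int i = 0; i < N; ++i) sink += exp(-x[i]) + log(x[i]);
  auto t2 = std::chrono::steady_clock::now();

  std::printf("std::exp/std::log : %.3f s\n",
              std::chrono::duration<double>(t1 - t0).count());
  std::printf("::exp/::log       : %.3f s\n",
              std::chrono::duration<double>(t2 - t1).count());
  std::printf("checksum %.3f\n", sink);
  return 0;
}
```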