Sinovoice-2014-12-10
DNN training
Environment setting
- Another three 3T disks are ready for RAID-0.
- Another GPU machine was purchased; four 3T disks will be added to construct a RAID-0 array.
Corpora
- Scripts for confidence generation are ready for automatic transcription; a filtering sketch follows this list.
- 300 hours of telephone speech data (Sinovoice recordings) are done.
- 900 sentences of adaptation data are ready.
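As a rough illustration of how the confidence scores might be used to select automatically transcribed utterances, a minimal filtering sketch is given below. The file names, line format, and the 0.9 threshold are assumptions for illustration, not the actual scripts.

```python
# Hypothetical confidence-based filtering of auto-transcribed utterances.
# Assumed input format: one line per utterance, "<utt-id> <conf_1> ... <conf_N>".
def filter_by_confidence(score_file, out_file, threshold=0.9):
    """Keep utterances whose average word confidence is at or above the threshold."""
    kept = 0
    with open(score_file) as fin, open(out_file, "w") as fout:
        for line in fin:
            fields = line.split()
            if len(fields) < 2:
                continue
            utt_id, confs = fields[0], [float(c) for c in fields[1:]]
            if sum(confs) / len(confs) >= threshold:
                fout.write(utt_id + "\n")
                kept += 1
    return kept

if __name__ == "__main__":
    n = filter_by_confidence("confidence.scores", "kept_utts.list")
    print("kept %d utterances" % n)
```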
470-hour 8kHz training
- 300h incremental training (IT) done
Model          | CE          | MPE1        | MPE2        | MPE3        | MPE4
4k states      | 23.27/22.85 | 21.35/18.87 | 21.18/18.76 | 21.07/18.54 | 20.93/18.32
8k states      | 22.16/22.22 | 20.55/18.03 | 20.36/17.94 | 20.32/17.78 | 20.29/17.80
8k states + IT | -           | 20.04/17.38 | 20.01/17.32 | 20.07/17.44 | 19.94/17.65
6000-hour 16kHz training
- Ran CE DNN to iteration 5 (8400 states, 80000 pdf)
- Test WER is down to 13.77% (Sinovoice result: 11.78%); a reference sketch of the WER computation follows the table below.
Model                  | WER (%) | RT
small LM, it 4, -5/-9  | 15.80   | 1.18
large LM, it 4, -5/-9  | 15.30   | 1.50
large LM, it 4, -6/-9  | 15.36   | 1.30
large LM, it 4, -7/-9  | 15.25   | 1.30
large LM, it 5, -5/-9  | 14.17   | 1.10
large LM, it 5, -5/-10 | 13.77   | 1.29
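The WER figures above are standard word error rates (substitutions + deletions + insertions divided by the number of reference words). For reference only, a minimal edit-distance computation of this metric is sketched below; it is a generic illustration, not the scoring script actually used.

```python
def word_error_rate(ref, hyp):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard Levenshtein alignment over words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = minimum edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / float(len(r))

if __name__ == "__main__":
    # One substitution and one deletion against a 6-word reference -> 2/6.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```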
Adaptation
- Code is ready for direct adaptation, insertion adaptation, and KL-regularized adaptation; a sketch of the KL-regularized recipe follows this list.
- 50 sentences for adaptation, 834 sentences for testing
- WER reduced from 14.56% to 11.13%.
- Hidden-layer adaptation is better than input- and output-layer adaptation.
- Before-linear adaptation is better than after-linear adaptation.
- Results are here
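For the KL-regularized item above, the usual recipe interpolates the hard adaptation targets with the speaker-independent (SI) model's posteriors and then minimizes ordinary cross-entropy, so the adapted model is pulled back toward the SI model. The numpy sketch below illustrates only this target interpolation; the function names, shapes, and rho value are illustrative assumptions, not the actual adaptation code.

```python
import numpy as np

def kld_adaptation_targets(one_hot_targets, si_posteriors, rho=0.5):
    """Interpolate hard targets with SI-model posteriors (KL-regularized adaptation).
    rho controls how strongly the adapted model is kept close to the SI model."""
    return (1.0 - rho) * one_hot_targets + rho * si_posteriors

def cross_entropy(pred_posteriors, targets, eps=1e-10):
    """Frame-averaged cross-entropy of the adapted model against the soft targets."""
    return float(-np.mean(np.sum(targets * np.log(pred_posteriors + eps), axis=1)))

if __name__ == "__main__":
    # Toy example: 3 frames, 4 senone classes (shapes are illustrative only).
    hard = np.eye(4)[[0, 2, 1]]              # one-hot reference alignments
    si = np.full((3, 4), 0.25)               # SI-model posteriors (uniform here)
    pred = np.array([[0.7, 0.1, 0.1, 0.1],
                     [0.1, 0.1, 0.7, 0.1],
                     [0.2, 0.5, 0.2, 0.1]])  # adapted-model posteriors
    targets = kld_adaptation_targets(hard, si, rho=0.5)
    print("KL-regularized CE loss:", cross_entropy(pred, targets))
```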
DNN Decoder
- Comparison between the CLG and HCLG decoders:
- CLG decoder uses less memory in decoding
- HCLG is faster and more accurate than CLG, and more amenable to beam control; results are here.
- std::exp/std::log resulted in very slow computation on train203; solved by replacing them with the standard C exp() and log().