Sinovoice-2014-02-25
DNN training
Environment setting
- Two queues: 100.q dedicated to decoding, all.q dedicated to GMM training/MPE lattice generation (job-submission sketch below)
- disk203-disk210: distributed disks for parallel jobs
- /nfs/disk1: GPU tasks on train212; /nfs/disk2: GPU tasks on train215
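A minimal job-routing sketch, assuming Sun Grid Engine's qsub; the script names are hypothetical and it only illustrates targeting the two queues listed above.

```python
import subprocess

# Queue assignments from the list above:
# 100.q is dedicated to decoding, all.q to GMM training / MPE lattice generation.
QUEUE_FOR_TASK = {
    "decode": "100.q",
    "gmm_train": "all.q",
    "mpe_lattice": "all.q",
}

def submit(task_type, script, *args):
    """Submit a job script to the queue reserved for its task type."""
    queue = QUEUE_FOR_TASK[task_type]
    subprocess.run(["qsub", "-q", queue, "-cwd", script, *args], check=True)

# Hypothetical usage:
# submit("decode", "steps/decode_dnn.sh", "exp/dnn/final.mdl")
```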
Corpora
- PICC data (200h) are being labeled; ready in one week.
- 105h data from BJ mobile
- In total, 1121h (470 + 346 + 105 + 200) of telephone speech will be ready soon.
- 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recorded data (hour totals checked in the sketch below)
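A quick check of the hour totals quoted above; the dictionary keys are just labels for the four telephone batches and the three 16k sources.

```python
# Telephone (8k) batches, in hours
telephone_h = {"existing": 470, "second batch": 346, "BJ mobile": 105, "PICC": 200}
print(sum(telephone_h.values()))      # 1121

# 16k sources, in hours
sixteen_k_h = {"DataTang online": 978, "online mobile": 656, "recorded": 4300}
print(sum(sixteen_k_h.values()))      # 5934, i.e. roughly the "6000h" set
```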
Telephone model training
470 + 300h + BJ mobile 105h training
(1) 105h BJ mobile retraining without the NOISE phone: 33.97% WER
(2) 105h BJ mobile retraining with the NOISE phone in training, but decoding without it: 34.27% WER
(3) (2) + noise decoding (with the NOISE phone in the lexicon/LM): still under investigation
BJ mobile incremental training
(1) Original 470 + 300h model: 30.24% WER
(2) Incremental DT training with the 105h BJ data: 27.01% WER (relative reduction computed below)
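The gain from the incremental DT step, expressed as a relative WER reduction; a simple sketch over the two numbers above.

```python
def rel_wer_reduction(baseline, new):
    """Relative WER reduction (%) of the new model over the baseline."""
    return 100.0 * (baseline - new) / baseline

# Original 470 + 300h model vs. incremental DT with the 105h BJ data
print(f"{rel_wer_reduction(30.24, 27.01):.1f}% relative")   # ~10.7% relative
```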
6000 hour 16k training
Training progress
- Ran CE DNN training to iteration 11 (8400 states, 80000 pdfs)
- Test WER has come down to 12.46% (Sinovoice result: 10.46%); the per-iteration trend is sketched at the end of this subsection.
Model | WER (%) | RT
---|---|---
small LM, it 4, -5/-9 | 15.80 | 1.18
large LM, it 4, -5/-9 | 15.30 | 1.50
large LM, it 4, -6/-9 | 15.36 | 1.30
large LM, it 4, -7/-9 | 15.25 | 1.30
large LM, it 5, -5/-9 | 14.17 | 1.10
large LM, it 5, -5/-10 | 13.77 | 1.29
large LM, it 6, -5/-9 | 13.64 | -
large LM, it 6, -5/-10 | 13.25 | -
large LM, it 7, -5/-9 | 13.29 | -
large LM, it 7, -5/-10 | 12.87 | -
large LM, it 8, -5/-9 | 13.09 | -
large LM, it 8, -5/-10 | 12.69 | -
large LM, it 9, -5/-9 | 12.87 | -
large LM, it 9, -5/-10 | 12.55 | -
large LM, it 10, -5/-9 | 12.83 | -
large LM, it 10, -5/-10 | 12.48 | -
large LM, it 11, -5/-9 | 12.87 | -
large LM, it 11, -5/-10 | 12.46 | -
- Additional XEnt training with DNN alignments should be completed in 2 days
- DT training is still on the queue, waiting for lattice generation
- The first version of the DT model will be trained on the online data (1700h)
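A small sketch reading off the large-LM results above (taking the best result at each iteration) to show how the CE WER flattens over iterations; the numbers are copied from the table, nothing new is computed beyond the differences.

```python
# Best large-LM WER per CE iteration, copied from the table above
wer_by_iter = {4: 15.25, 5: 13.77, 6: 13.25, 7: 12.87,
               8: 12.69, 9: 12.55, 10: 12.48, 11: 12.46}

prev = None
for it, wer in sorted(wer_by_iter.items()):
    delta = "" if prev is None else f"  (-{prev - wer:.2f})"
    print(f"iter {it:2d}: {wer:.2f}%{delta}")
    prev = wer
```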
Training Analysis
- Shared-tree GMM model training completed; WER is similar to the non-shared model.
- Selected 100h of online data and trained two systems: (1) a di-syllable system, (2) a jt-phone system
Stage | di-syl | jt-ph
---|---|---
XEnt | 15.42% | 14.78%
MPE1 | 14.46% | 14.23%
MPE2 | 14.22% | 14.09%
MPE3 | 14.26% | 13.80%
MPE4 | 14.24% | 13.68%
Auto Transcription
- PICC development set decoding obtained 45% WER.
- PICC auto-transcription incremental DT training completed
Threshold | WER
---|---
org | 45.03%
0.9 | 41.89%
0.8 | 41.64%
- With the 0.8 threshold, the current training set contains 80k sentences, amounting to about 60h of data.
- Sampling 60h of labelled data to enrich the training (threshold filtering sketched below)
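A sketch of the confidence-threshold selection used for the auto-transcribed PICC data; the (utt_id, transcript, confidence) tuple format is an assumption, while the threshold values are the ones reported above.

```python
def select_auto_transcribed(utterances, threshold=0.8):
    """Keep auto-transcribed utterances whose confidence reaches the threshold.

    `utterances` is an iterable of (utt_id, transcript, confidence) tuples
    (format assumed here), e.g. derived from first-pass decoding posteriors.
    """
    return [(utt_id, text) for utt_id, text, conf in utterances if conf >= threshold]

# With threshold 0.8 the selected set is about 80k sentences (~60h), which gave
# 41.64% WER after incremental DT training, vs. 45.03% for the original model.
```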
DNN Decoder
- Online decoder
- Integration almost completed
- Initial CMN implementation finished
- The first step is to tune the prior probability of the global CMN, and then consider retraining with DT; a sketch of the idea follows below.
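A minimal sketch of online CMN smoothed toward a global mean, where the prior weight plays the role of the "prior probability of the global CMN" to be tuned; the exact formulation used in the decoder is an assumption here.

```python
import numpy as np

class OnlineCMN:
    """Online cepstral mean normalization with a global prior.

    The running mean is interpolated with a precomputed global mean;
    `prior_count` controls how strongly the global statistics dominate
    the estimate over the first frames of an utterance.
    """

    def __init__(self, global_mean, prior_count=100.0):
        self.global_mean = np.asarray(global_mean, dtype=float)
        self.prior_count = prior_count
        self.frame_sum = np.zeros_like(self.global_mean)
        self.num_frames = 0

    def normalize(self, frame):
        """Subtract the prior-smoothed running mean from one feature frame."""
        frame = np.asarray(frame, dtype=float)
        self.frame_sum += frame
        self.num_frames += 1
        mean = (self.prior_count * self.global_mean + self.frame_sum) \
               / (self.prior_count + self.num_frames)
        return frame - mean
```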