Sinovoice-2014-03-18
Environment setting
- Raid215 is a bit slow. Move some den-lattices and alignments to Raid212.
Corpora
- PICC data labeling (200h) done.
- Hubei telecom data labeling (108h) done.
- In total, 1229h of telephone speech is ready (470 + 346 + 105 BJ mobile + 200 PICC + 108 Hubei telecom).
- 16kHz 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recorded data.
- LM corpus done
Acoustic modeling
Telephone model training
1000h Training
- xEnt training completed; compiling lattices.
- Need to test the xEnt performance.
PICC dedicated training
- Need to collect financial-domain text data and retrain the LM.
- Need to clean up the word list and training text (a possible OOV check is sketched below).
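One way to start combing the word list against the LM training text is an OOV check: compute the OOV rate and list the most frequent OOV tokens, so they can either be added to the lexicon or cleaned from the text. A minimal sketch, assuming a one-word-per-line word list and whitespace-tokenized text; the file names are placeholders, not the actual data.

```python
# Hypothetical OOV check for the PICC word list and LM training text.
# File names and formats are assumptions, not the actual setup.
from collections import Counter

def oov_report(wordlist_path, text_path, top_n=20):
    """Print the OOV rate of the text and its most frequent OOV tokens."""
    with open(wordlist_path, encoding="utf-8") as f:
        vocab = {line.split()[0] for line in f if line.strip()}

    total, oov = 0, Counter()
    with open(text_path, encoding="utf-8") as f:
        for line in f:
            for token in line.split():
                total += 1
                if token not in vocab:
                    oov[token] += 1

    n_oov = sum(oov.values())
    print("OOV rate: %.2f%% (%d / %d tokens)" % (100.0 * n_oov / max(total, 1), n_oov, total))
    for token, count in oov.most_common(top_n):
        print("%s\t%d" % (token, count))

if __name__ == "__main__":
    # Placeholder file names for the PICC word list and training text.
    oov_report("picc_wordlist.txt", "picc_lm_text.txt")
```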
6000 hour 16k training
Training progress
- 6000h / CSLT phone set: alignment and den-lattice generation completed
- 6000h / jt phone set: alignment and den-lattice generation completed
Train Analysis
- The Qihang model used a subset of the 6k data:
- 2500 + 950h + tang500h* + 20131220, approximately 1700 + 2400 hours.
- GMM training on this subset achieved 22.47%; Xiaoming's result is 16.1%.
- The database also seems not very consistent.
- Xiaoming will try to reproduce the Qihang training using the big database.
- Test of the 1700h and 6000h models on the T test sets (error rates in %; a scoring sketch follows this list):
model/testcase | ditu  | due1  | entity1 | rec1 | shiji | zaixian1 | zaixian2 | kuaisu
---------------|-------|-------|---------|------|-------|----------|----------|-------
1700h_mpe      | 12.18 | 12.93 | 5.29    | 3.69 | 21.73 | 25.38    | 19.45    | 12.50
6000h_xEnt     | 11.13 | 10.12 | 4.64    | 2.80 | 17.67 | 27.45    | 23.23    | 10.98
- The 6000h model is generally better than the 1700h model on careful-reading and domain-specific recordings.
- 6000h MPE training with the jt phone set is in progress; better performance is expected.
- Suggest testing the 6000h model on the jidong test set.
- Suggest using online training data for online test sets.
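For reference, the percentages in the tables above and below are error rates; the sketch below shows the metric itself (edit distance over tokens, divided by the reference length). It only illustrates what the numbers measure and is not the project's actual scoring script.

```python
# Minimal word error rate (WER) sketch: illustration only, not the real scorer.
def wer(reference, hypothesis):
    """Return WER in percent: (substitutions + deletions + insertions) / N."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return 100.0 * dp[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    # One substitution out of four reference words -> 25.00
    print("%.2f" % wer("turn on the map", "turn off the map"))
```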
Hubei telecom
- Hubei telecom data (127h): retrieved 60k sentences with confidence threshold 0.9, amounting to 50% of the data (a selection sketch follows this list).
model      | wer_14 | wer_15
-----------|--------|-------
xEnt (org) |   -    | 29.05
MPE iter1  | 29.23  | 29.38
MPE iter2  | 29.05  | 29.11
MPE iter3  | 29.32  | 29.28
MPE iter4  | 29.29  | 29.28
- Retrieved 30k sentences with confidence threshold 0.95, amounting to 25% of the data, plus the original 770h data.
model      | wer_15
-----------|-------
xEnt (org) | 29.05
MPE iter1  | 29.36
- Incremental training with the Hubei telecom data based on the (470 + 300 + BJ mobile) model.
- MPE4 model training done: original model 27.30, Hubei model 25.42.
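The selection step above (keep decoded utterances whose confidence reaches a threshold and feed their transcripts back as semi-supervised training labels) could look like the sketch below. The input formats ("utt-id confidence" and "utt-id word1 word2 ...") and the file names are assumptions, not the actual pipeline used for the Hubei telecom data.

```python
# Hypothetical confidence-based data selection; formats and names are assumed.
def select_by_confidence(conf_path, hyp_path, out_path, threshold=0.9):
    """Write transcripts of utterances whose confidence >= threshold."""
    confident = set()
    with open(conf_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2 and float(parts[1]) >= threshold:
                confident.add(parts[0])

    kept = total = 0
    with open(hyp_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            parts = line.split(None, 1)
            if not parts:
                continue
            total += 1
            if parts[0] in confident:
                fout.write(line)
                kept += 1
    print("kept %d / %d utterances (%.1f%%)" % (kept, total, 100.0 * kept / max(total, 1)))

if __name__ == "__main__":
    # Placeholder file names; threshold 0.9 matches the first experiment above.
    select_by_confidence("hubei_conf.txt", "hubei_hyp.txt", "hubei_selected.txt", 0.9)
```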
Language modeling
- Training data ready
- Focus on PICC test set
DNN Decoder
Online decoder adaptation
- Finished alignment and den-lattice generation.
- First round of MPE training ongoing, 2 days per iteration (the overall flow is sketched below).
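For reference, the alignment, den-lattice, and MPE steps referred to throughout this report follow the usual Kaldi sequence-training flow. A rough sketch driven from Python, assuming the Kaldi nnet1 recipe scripts (steps/nnet/align.sh, steps/nnet/make_denlats.sh, steps/nnet/train_mpe.sh); the directory names, --nj, and --acwt values are assumptions and depend on the local setup, not the exact commands used here.

```python
# Rough sketch of the alignment -> den-lattice -> MPE sequence (assumed recipe).
import subprocess

def run(cmd):
    """Print and run one pipeline stage, aborting on failure."""
    print(" ".join(cmd))
    subprocess.check_call(cmd)

data, lang, dnn = "data/train", "data/lang", "exp/dnn_xent"
ali, denlats, mpe = "exp/dnn_xent_ali", "exp/dnn_xent_denlats", "exp/dnn_mpe"

# Forced alignment with the cross-entropy (xEnt) DNN.
run(["steps/nnet/align.sh", "--nj", "40", data, lang, dnn, ali])
# Denominator lattices for sequence-discriminative (MPE) training.
run(["steps/nnet/make_denlats.sh", "--nj", "40", "--acwt", "0.1", data, lang, dnn, denlats])
# MPE training; each iteration reportedly takes about 2 days on this setup.
run(["steps/nnet/train_mpe.sh", "--acwt", "0.1", data, lang, dnn, ali, denlats, mpe])
```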