Sinovoice-2014-03-11
Environment setting
- Raid212/Raid215/Disk212 done
Corpora
- PICC data labeling (200h) is done.
- A total of 1121h (470 + 346 + 105 BJ mobile + 200 PICC) of telephone speech is now ready.
- 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data.
- LM training text needs to be prepared within 2 days.
Acoustic modeling
Telephone model training
1000h Training
- Training recipe prepared
- Expect to finish in 7 days
PICC dedicated training
Baseline (470+300h): 45.03
+ PICC 188h incremental training (th=0.9): 41.89
+ PICC 188h incremental training (th=0.8): 41.64
+ PICC 188h labelled training: 34.78
+ PICC 188h labelled training + PICC text LM: 29.18
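To compare the steps more directly, the relative reduction of each configuration against the baseline can be computed as in the sketch below. It assumes the figures are error rates in percent (the notes do not name the metric for this table); the function name `relative_reduction` and the dictionary labels are illustrative only.

```python
# Figures copied from the PICC results above; assumed to be error rates in percent.
results = {
    "baseline (470+300h)": 45.03,
    "incremental, th=0.9": 41.89,
    "incremental, th=0.8": 41.64,
    "labelled": 34.78,
    "labelled + PICC text LM": 29.18,
}


def relative_reduction(baseline, value):
    """Relative reduction versus the baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


baseline = results["baseline (470+300h)"]
for name, value in results.items():
    rel = relative_reduction(baseline, value)
    print(f"{name:26s} {value:6.2f}  {rel:5.1f}% rel.")
```

Under that assumption, the labelled-training rows account for most of the gain, while the two incremental-training thresholds differ only slightly from each other.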
6000 hour 16k training
Training progress
- Ran DNN MPE to iteration 5.
- Recipe
- 100h MPE training
- 1700h MPE alignment/lattice
- 1700h MPE training
- 1 week to complete 3 MPE iterations
- MPE2 result: 1e-9: 10.67% (8.61%), 1e-10: 10.34% (8.27%)
- MPE3 result: 1e-9: 10.48% (8.43%), 1e-10: 10.12% (8.05%)
- MPE4 result: 1e-9: 10.34% (8.31%), 1e-10: 10.03% (7.97%)
- MPE5 result:
Training Analysis
- Shared-tree GMM model training completed; WER is similar to that of the non-shared model.
- Selected 100h of online data and trained two systems: (1) di-syllable system, (2) jt-phone system
        di-syl    jt-ph
GMM     -         20.86%
Xent    15.42%    14.78%
MPE1    14.46%    14.23%
MPE2    14.22%    14.09%
MPE3    14.26%    13.80%
MPE4    14.24%    13.68%
- HTK training on the same database
- HLDA: 18.22
- HLDA+MPE: 14.40
Hubei telecom
- Hubei telecom data (127h): retrieved 60k sentences with a confidence threshold of 0.9, amounting to 50%
            wer_14    wer_15
xEnt org    -         29.05
MPE iter1   29.23     29.38
MPE iter2   29.05     29.11
MPE iter3   29.32     29.28
MPE iter4   29.29     29.28
- Retrieved 30k sentences with a confidence threshold of 0.95, amounting to 25%, plus the original 770h data
            wer_14    wer_15
xEnt org    -         29.05
MPE iter1   -         29.36
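The confidence-based selection used in both Hubei experiments can be sketched as follows. This is only an illustration: the `(utt_id, confidence)` pair layout, the function `select_by_confidence`, the use of `>=` at the threshold, and the toy scores are all assumptions, not the actual pipeline or data.

```python
def select_by_confidence(utterances, threshold):
    """Keep utterances whose decoder confidence is at least `threshold`.

    `utterances` is a list of (utt_id, confidence) pairs (hypothetical layout).
    Returns the selected ids and the fraction of utterances retained.
    """
    selected = [utt_id for utt_id, conf in utterances if conf >= threshold]
    fraction = len(selected) / len(utterances) if utterances else 0.0
    return selected, fraction


# Toy scores for illustration only; not taken from the Hubei data.
toy = [("utt1", 0.97), ("utt2", 0.91), ("utt3", 0.85), ("utt4", 0.99)]
for th in (0.9, 0.95):
    ids, frac = select_by_confidence(toy, th)
    print(f"threshold={th}: kept {len(ids)} of {len(toy)} ({frac:.0%})")
```

Raising the threshold trades data volume for transcript reliability, which matches the drop from 50% to 25% retained when moving from 0.9 to 0.95.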
Language modeling
- Need to transfer the training text
DNN Decoder
Online decoder
- CMN code delivered. Integration is done
- CMN pipe code delivered. Model adaptation is ongoing
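For readers unfamiliar with the term, CMN (cepstral mean normalization) subtracts a per-dimension mean from the feature frames. The sketch below contrasts a whole-utterance version with a sliding-window version that needs no future frames, which is the usual requirement in an online decoder. It is a generic illustration and not the delivered code; the function names, the `window` parameter, and its default of 300 frames are assumptions.

```python
def cmn_batch(frames):
    """Per-utterance CMN: subtract the mean of each cepstral dimension."""
    if not frames:
        return []
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


def cmn_online(frames, window=300):
    """Streaming CMN: subtract the mean of the most recent `window` frames
    (including the current one), so no future frames are needed."""
    out = []
    history = []   # frames currently inside the window
    sums = None    # running per-dimension sum over `history`
    for f in frames:
        if sums is None:
            sums = [0.0] * len(f)
        history.append(f)
        sums = [s + x for s, x in zip(sums, f)]
        if len(history) > window:
            old = history.pop(0)
            sums = [s - x for s, x in zip(sums, old)]
        n = len(history)
        out.append([x - s / n for x, s in zip(f, sums)])
    return out


# Toy 2-dimensional frames for illustration only.
frames = [[1.0, 2.0], [3.0, 6.0]]
print(cmn_batch(frames))
print(cmn_online(frames, window=300))
```

The streaming variant differs from the batch one at the start of an utterance, where the running mean is based on few frames; this mismatch is one reason a model may need adapting to online-normalized features.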