Sinovoice-2014-04-01

Environment setting

  • Create cluster accounts for Wufei and Xiaoxi.

Corpora

  • Labeling of the new Beijing Mobile data is done (109h).
  • Next: label the corrupted speech data from Huawei (97h).
  • Transcriptions for 300h of text will be ready in April.
  • In total, 1338h of telephone speech is now ready (470h + 346h + 105h BJ Mobile + 200h PICC + 108h HBTc + 109h new BJ Mobile).
  • 16kHz "6000h" data: 978h online data from DataTang + 656h online mobile data + 4300h recorded data (5934h in total).
  • LM corpus preparation done.
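As a quick sanity check on the corpus totals above, the hour counts can be summed directly. This is a minimal Python sketch; the dictionary keys are just the labels used in these notes (the first two telephone sets are unnamed in the source, so they are keyed by their size).

```python
# Telephone-speech corpora listed above, in hours (labels as in the notes).
telephone_hours = {
    "470h set": 470,
    "346h set": 346,
    "BJ Mobile": 105,
    "PICC": 200,
    "HBTc": 108,
    "new BJ Mobile": 109,
}

# 16kHz corpora that make up the "6000h" training set, in hours.
wideband_hours = {
    "DataTang online": 978,
    "online mobile": 656,
    "recorded": 4300,
}

telephone_total = sum(telephone_hours.values())
wideband_total = sum(wideband_hours.values())

print(f"Telephone total: {telephone_total}h")  # Telephone total: 1338h
print(f"16kHz total: {wideband_total}h")  # 16kHz total: 5934h
```

The 16kHz set therefore comes to slightly under 6000h; "6000h" is a rounded label.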

Acoustic modeling

Telephone model training

1000h Training

  • Baseline: 8k states, 470h+300h data, MPE4: 20.29
  • Jietong phone set, 200h seed, 10k states training:
      • An error was found in the training.
      • Xent has run 5 iterations.
  • CSLT phone set, 8k states training:
      • MPE1: 20.60
      • MPE2: 20.37
      • MPE3: 20.37

6000h 16kHz training

Training progress

  • 6000h, CSLT phone set: alignment and denlattice generation completed
      • Xent: 12.83
      • MPE1: 9.21
  • 6000h, Jietong (jt) phone set: alignment and denlattice generation completed
      • denlattice generation is being re-run
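To express the 6000h CSLT-phone-set gain from Xent to MPE1 in relative terms, the following minimal Python sketch computes the relative error-rate reduction. It assumes the reported figures are error rates on the same test set; the helper name `relative_reduction` is hypothetical and not part of the training recipe.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative error-rate reduction in percent: 100 * (before - after) / before."""
    if before <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (before - after) / before


# Xent and MPE1 results for the 6000h model with the CSLT phone set.
xent, mpe1 = 12.83, 9.21
print(f"Xent -> MPE1 relative reduction: {relative_reduction(xent, mpe1):.1f}%")
```

With these numbers the first MPE iteration removes roughly 28% of the Xent errors, which is a much larger step than the MPE gains seen in the 1000h telephone training.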


Training Analysis

  • The Qihang model used a subset of the 6000h data:
      • 2500h + 950h + tang500h* + 20131220, approximately 1700h + 2400h
  • GMM training on this subset achieved 22.47%, whereas Xiaoming's result is 16.1%.
  • The database still seems to be inconsistent.
  • Xiaoming has started a job to reproduce the Qihang training on this subset.

Multilingual Training

  • Prepare Chinglish data
  • Prepare shared DNN structure for multilingual training

Language modeling

  • Training data is ready.
  • Xiaoxi and Wufei will work together to become familiar with the training and testing process.
  • Initial optimization will start with the telecom data.

DNN Decoder

Online decoder adaptation

  • Incremental training finished (stream mode)
  • Online decoder completed
  • Measure what proportion of decoding time is spent in the DNN forward pass versus graph search.
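One way to measure this split is to wrap each decoding stage in a timer and accumulate wall-clock time per stage. The Python sketch below shows the idea; `dnn_forward` and `graph_search` are hypothetical placeholders, not the real decoder functions, and the input chunks are fake features. In practice, the two `with` blocks would wrap the decoder's actual DNN forward call and its search step.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named decoding stage."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add the elapsed time even if the stage raised an exception.
            self.totals[name] += time.perf_counter() - start

    def proportions(self):
        """Return each stage's share of the total measured time."""
        grand_total = sum(self.totals.values())
        if grand_total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: t / grand_total for name, t in self.totals.items()}


# Hypothetical stand-ins for the real decoder stages.
def dnn_forward(frames):
    return [sum(f) for f in frames]  # placeholder "acoustic scores" per frame


def graph_search(scores):
    return max(scores)  # placeholder "best path" score


timer = StageTimer()
# 100 fake chunks of 10 frames x 40-dimensional features (stream-mode style input).
chunks = [[[0.1] * 40 for _ in range(10)] for _ in range(100)]
for chunk in chunks:
    with timer.stage("dnn_forward"):
        scores = dnn_forward(chunk)
    with timer.stage("graph_search"):
        graph_search(scores)

for name, share in timer.proportions().items():
    print(f"{name}: {share:.1%}")
```

Per-stage accumulation keeps the measurement close to the code being profiled. A sampling profiler would give a finer breakdown but would make the two top-level shares harder to read off directly.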