Sinovoice-2014-04-08

Environment setting

  • Sinovoice internal server deployment. The current configuration involves Gitlab + Trac.
  • Some access-permission problems remain; other tools need to be investigated.

Corpora

  • 300h of Guangxi telecom transcription is being prepared; 150h will be ready before April 18.
  • In total, 1338h of telephone speech is now ready (470h + 346h + 105h BJ mobile + 200h PICC + 108h HBTc + 109h new BJ mobile).
  • 16k 6000h data: 978h of online data from DataTang + 656h of online mobile data + 4300h of recorded data.
  • Xiaona will prepare the noise database, starting from telephone speech (see the mixing sketch below).
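
As a rough illustration of how such a noise database might be applied, the sketch below mixes a noise recording into telephone speech at a target SNR. It assumes numpy; the function name and SNR handling are illustrative, not the actual preparation plan.

    import numpy as np

    def mix_at_snr(speech, noise, snr_db):
        """Mix noise into speech at a target SNR (dB).

        Both inputs are float arrays at the same sample rate
        (e.g. 8 kHz for telephone speech).
        """
        # Tile or crop the noise to the speech length.
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)[:len(speech)]

        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2) + 1e-12
        # Scale so that p_speech / (scale^2 * p_noise) = 10^(snr_db / 10).
        scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
        return speech + scale * noise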

Acoustic modeling

Telephone model training

1000h Training

  • Baseline: 8k states, 470h+300h, MPE4: 20.29
  • Jietong phone set, 200-hour seed, 10k states training:
      • Xent iteration 5: 23.26
      • Xent iteration 16: 22.90


  • CSLT phone set, 8k states training:
      • MPE1: 20.60
      • MPE2: 20.37
      • MPE3: 20.37
      • MPE4: 20.37

6000-hour 16k training

Training progress

  • 6000h/CSLT phone set training:
      • Xent: 12.83
      • MPE1: 9.21
      • MPE2: 9.13
  • 6000h/JT phone set training:
      • Lattice generation done; the MPE run will take 4 days.


Training Analysis

  • The Qihang model used a subset of the 6k data:
      • 2500 + 950H + tang500h* + 20131220, approximately 1700 + 2400 hours.
  • GMM training on this subset achieved 22.47%; Xiaoming's result is 16.1%.
  • It seems the database is still not very consistent.
  • Xiaoming kicked off a job to reproduce the Qihang training on this subset.

Multilingual Training

  • Prepare Chinglish (mixed Chinese-English) data; Wang Dong will provide the information.
  • Prepare a shared DNN structure for multilingual training (see the sketch below).
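
A minimal numpy sketch of one common shared-structure design: hidden layers shared across languages, with a separate softmax output layer per language (per phone set / state inventory). All dimensions, the ReLU nonlinearity, and the language names are illustrative assumptions, not the agreed design.

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    class SharedDNN:
        """Hidden layers shared across languages; one output layer
        per language, sized to that language's senone inventory."""

        def __init__(self, in_dim, hid_dim, out_dims, n_hidden=4):
            dims = [in_dim] + [hid_dim] * n_hidden
            self.shared = [(0.01 * rng.standard_normal((a, b)), np.zeros(b))
                           for a, b in zip(dims[:-1], dims[1:])]
            self.heads = {lang: (0.01 * rng.standard_normal((hid_dim, d)), np.zeros(d))
                          for lang, d in out_dims.items()}

        def forward(self, x, lang):
            for w, b in self.shared:
                x = np.maximum(x @ w + b, 0.0)   # shared ReLU hidden layers
            w, b = self.heads[lang]
            return softmax(x @ w + b)            # language-specific posteriors

    # Hypothetical sizes: 10k Chinese states and 6k English states share hidden layers.
    net = SharedDNN(in_dim=440, hid_dim=1200, out_dims={"zh": 10000, "en": 6000})
    posteriors = net.forward(rng.standard_normal((8, 440)), lang="zh")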

Noise-robust features

  • GFbank features can be propagated to Sinovoice.
  • Liuchao will prepare fast-computation code (an extraction sketch follows below).
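
For context, here is a minimal sketch of gammatone-filterbank (GFbank) feature extraction via direct FIR convolution. The fast-computation version would presumably use recursive approximations instead; all parameters below (24 channels, 25 ms / 10 ms framing at 8 kHz) are illustrative assumptions.

    import numpy as np

    def erb(f):
        # Equivalent rectangular bandwidth (Glasberg & Moore).
        return 24.7 * (4.37 * f / 1000.0 + 1.0)

    def gammatone_fir(fc, fs, n_taps=512, order=4, b=1.019):
        # Impulse response of a 4th-order gammatone filter centred at fc.
        t = np.arange(n_taps) / fs
        g = (t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
             * np.cos(2 * np.pi * fc * t))
        return g / np.sqrt(np.sum(g ** 2))   # unit-energy normalisation

    def gfbank(signal, fs=8000, n_chan=24, fmin=100.0,
               frame_len=200, frame_shift=80):
        """Per-frame log energies of a gammatone filterbank
        (25 ms window / 10 ms shift at 8 kHz)."""
        # Centre frequencies equally spaced on the ERB-rate scale.
        lo = 21.4 * np.log10(4.37e-3 * fmin + 1.0)
        hi = 21.4 * np.log10(4.37e-3 * (fs / 2 - 100.0) + 1.0)
        fcs = (10 ** (np.linspace(lo, hi, n_chan) / 21.4) - 1.0) / 4.37e-3

        feats = []
        for fc in fcs:
            y = np.convolve(signal, gammatone_fir(fc, fs), mode="same")
            n_frames = 1 + (len(y) - frame_len) // frame_shift
            idx = (np.arange(frame_len)[None, :]
                   + frame_shift * np.arange(n_frames)[:, None])
            feats.append(np.log(np.sum(y[idx] ** 2, axis=1) + 1e-10))
        return np.stack(feats, axis=1)       # shape: (n_frames, n_chan)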

Language modeling

  • The training process was delivered.
  • Some encoding problems remain.
  • Training-text retrieval based on topic models, starting from initial keywords (see the sketch below).
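
The notes do not specify the retrieval pipeline. One plausible shape for it, sketched below, fits an LDA topic model over the candidate texts and ranks them by topic similarity to the seed keywords. It assumes gensim and pre-tokenized input; the function name and parameters are hypothetical.

    from gensim import corpora, models, matutils

    def retrieve_by_topic(tokenized_docs, seed_keywords, num_topics=50, top_n=1000):
        """Rank candidate training texts by topic similarity to seed keywords."""
        dictionary = corpora.Dictionary(tokenized_docs)
        corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
        lda = models.LdaModel(corpus, num_topics=num_topics,
                              id2word=dictionary, passes=5)

        # Topic mixture of the seed keywords, treated as a tiny query document.
        query = lda[dictionary.doc2bow(seed_keywords)]
        scores = [matutils.cossim(query, lda[bow]) for bow in corpus]

        # Indices of the top_n most on-topic candidate texts.
        return sorted(range(len(scores)), key=lambda i: -scores[i])[:top_n]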


DNN Decoder

Decoder optimization

  • Tested the computation cost of each step:
      • beam 9/5000: net forward accounts for 65%
      • beam 13/7000: net forward accounts for 28%
  • Liuchao will verify these proportions against the CSLT engine.
  • Zhiyong & Liuchao will deliver the frame-skipping approach (see the sketch below).
  • Investigate BigLM retrieval optimization.
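
On frame skipping: the usual idea is to evaluate the net on every k-th frame only and reuse that posterior for the frames in between, cutting the dominant net-forward cost by roughly a factor of k. A minimal sketch, assuming a batched net_forward function; not the actual decoder implementation:

    import numpy as np

    def forward_with_frame_skipping(net_forward, feats, skip=2):
        """Evaluate the DNN on every `skip`-th frame, then copy each
        posterior to the skipped frames that follow it. Neighbouring
        frames have strongly correlated posteriors, so the accuracy
        loss is small while net-forward cost drops by ~1/skip."""
        sampled = net_forward(feats[::skip])   # (ceil(T / skip), n_states)
        return np.repeat(sampled, skip, axis=0)[:len(feats)]

At beam 9/5000, where net forward is 65% of total cost, skip=2 would save roughly a third of overall decoding time.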