Sinovoice-2014-03-25

 

Latest revision as of 06:45, 25 March 2014

Environment setting

  • Raid215 is a bit slow. Move some den-lattices and alignments to Raid212.

Corpora

  • Labeling Beijing Mobile data.
  • Next, the corrupted audio will be labeled.
  • In total, 1229 h (470 + 346 + 105 BJ Mobile + 200 PICC + 108 HBTc) of telephone speech is now ready (totals checked in the note below).
  • 16k 6000 h data: 978 h of online data from DataTang + 656 h of online mobile data + 4300 h of recorded data.
  • LM corpus preparation done.
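A quick check of the totals quoted above (not from the original notes; the component labels below are mine): the telephone components sum to exactly 1229 h, while the 16k components sum to 5934 h, so the "6000h" figure looks rounded.

<pre>
# Arithmetic check of the corpus totals quoted above (hours).
telephone_8k = {"set_470": 470, "set_346": 346, "BJ_mobile": 105, "PICC": 200, "HBTc": 108}
data_16k = {"DataTang_online": 978, "online_mobile": 656, "recording": 4300}

print(sum(telephone_8k.values()))  # 1229 -> matches the quoted 1229 h exactly
print(sum(data_16k.values()))      # 5934 -> the "6000h" figure is a round number
</pre>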

Acoustic modeling

Telephone model training

1000h Training

  • Baseline: 8k states, 470+300, MPE4, 20.29
  • Jietong phone set, 200-hour seed, 10k states training (best iterations compared in the sketch after this list):
      • MPE1: 21.91
      • MPE2: 21.71
      • MPE3: 21.68
      • MPE4: 21.86
  • CSLT phone set, 8k states training:
      • MPE1: 20.60
      • MPE2: 20.37
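To make the 1000h results easier to scan, the sketch below just reorganizes the error rates reported above and picks the best MPE iteration per setup; the numbers are copied from the list, nothing new is measured.

<pre>
# Side-by-side view of the 1000h training results listed above (error rate, %).
baseline = 20.29  # 8k states, 470+300, MPE4

results = {
    "Jietong phone, 10k states": {"MPE1": 21.91, "MPE2": 21.71, "MPE3": 21.68, "MPE4": 21.86},
    "CSLT phone, 8k states":     {"MPE1": 20.60, "MPE2": 20.37},
}

for setup, iters in results.items():
    best_iter, best_err = min(iters.items(), key=lambda kv: kv[1])
    print(f"{setup}: best {best_iter} = {best_err:.2f} "
          f"(baseline {baseline:.2f}, diff {best_err - baseline:+.2f})")
</pre>

On the numbers reported so far, the CSLT phone set (MPE2, 20.37) is essentially on par with the baseline, while the Jietong phone set still lags by more than one point; note the CSLT run has only reached MPE2.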

PICC dedicated training

  • Need to collect financial text data and retrain the LM.
  • Need to comb through the word list and the training text.


6000 hour 16k training

Training progress

  • 6000h / CSLT phone set: alignment and den-lattice generation completed.
  • 6000h / jt phone set: alignment and den-lattice generation completed.
  • MPE training has been kicked off.


Train Analysis

  • The Qihang model used a subset of the 6k data: 2500 + 950H + tang500h* + 20131220, approximately 1700 + 2400 hours.
  • GMM training using this subset achieved 22.47%; Xiaoming's result is 16.1%.
  • It seems the database is still not very consistent.
  • Xiaoming will try to reproduce the Qihang training using this subset.
  • Tested the 1700h model and the 6000h model on the T test sets (per-set differences are summarized in the sketch below):

 model / test set | ditu  | due1  | entity1 | rec1 | shiji | zaixian1 | zaixian2 | kuaisu
 -----------------+-------+-------+---------+------+-------+----------+----------+-------
 1700h_mpe        | 12.18 | 12.93 |  5.29   | 3.69 | 21.73 |  25.38   |  19.45   | 12.50
 6000h_xEnt       | 11.13 | 10.12 |  4.64   | 2.80 | 17.67 |  27.45   |  23.23   | 10.98

      • The 6000h model is generally better than the 1700h model for careful-reading or domain-specific recordings.
      • The 6000h model with MPE and the jt phone set is still training; better performance is expected.
      • This indicates that we should prepare domain-specific AMs (not only 8k/16k); the online tests prefer online training data.
  • Tested the 6000h model on the jidong data and obtained a 2% absolute improvement.
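For readability, the sketch below computes the per-test-set differences between the two rows of the table above (positive means the 6000h_xEnt model is better); it is plain arithmetic on the reported numbers, nothing new.

<pre>
# Per-test-set comparison of the two models in the table above (lower is better).
m1700 = {"ditu": 12.18, "due1": 12.93, "entity1": 5.29, "rec1": 3.69,
         "shiji": 21.73, "zaixian1": 25.38, "zaixian2": 19.45, "kuaisu": 12.50}
m6000 = {"ditu": 11.13, "due1": 10.12, "entity1": 4.64, "rec1": 2.80,
         "shiji": 17.67, "zaixian1": 27.45, "zaixian2": 23.23, "kuaisu": 10.98}

for name in m1700:
    delta = m1700[name] - m6000[name]   # > 0: 6000h_xEnt is better on this set
    print(f"{name:9s} {delta:+.2f}")

wins = sum(m1700[n] > m6000[n] for n in m1700)
print(f"6000h_xEnt better on {wins} of {len(m1700)} sets")
</pre>

The two sets where the 1700h model stays ahead are the online ones (zaixian1, zaixian2), which matches the observation above that the online tests prefer online training data.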

Language modeling

  • Training data ready.
  • First focus on the PICC test set and try to improve the PPL (defined in the note below).
  • Xiaoxi from CSLT may be involved after some patent writing.
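For reference, PPL here is read as the usual per-word perplexity of the LM on a held-out text; a minimal sketch of that computation (the function and the example probabilities are illustrative, not from the notes):

<pre>
import math

def perplexity(log_probs):
    # Per-word perplexity from natural-log probabilities log P(w_i | history).
    return math.exp(-sum(log_probs) / len(log_probs))

# Toy example: a 4-word test text with per-word probabilities.
print(perplexity([math.log(p) for p in (0.1, 0.05, 0.2, 0.01)]))  # ~17.8
</pre>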

DNN Decoder

Online decoder adaptation

  • Incremental training finished (stream mode).
  • 8k sentence test (the WER bracket fields are unpacked in the note at the end):
 non-stream baseline  MPE5: %WER 9.91 [ 4734 / 47753, 235 ins, 509 del, 3990 sub ]
 stream:              MPE1: %WER 9.66 [ 4612 / 47753, 252 ins, 490 del, 3870 sub ]
 stream:              MPE2: %WER 9.48 [ 4529 / 47753, 251 ins, 477 del, 3801 sub ]
 stream:              MPE3: %WER 9.43 [ 4502 / 47753, 230 ins, 484 del, 3788 sub ]
 stream:              MPE4: %WER 9.39 [ 4482 / 47753, 236 ins, 475 del, 3771 sub ]
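For anyone unpacking the WER lines above: the bracket reads as [ total errors / reference word count, insertions, deletions, substitutions ], and %WER = 100 * (ins + del + sub) / reference count. A quick check against the stream MPE4 line:

<pre>
# Recompute the stream MPE4 line above from its error counts.
ins, dele, sub = 236, 475, 3771
ref_words = 47753

errors = ins + dele + sub
print(errors)                                  # 4482, as reported
print(round(100.0 * errors / ref_words, 2))    # 9.39, matching "%WER 9.39"
</pre>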