Sinovoice-2014-03-25
=Environment setting=

=Corpora=
* Labeling Beijing Mobile data.
* Next, the corrupted audio will be labeled.
* A total of 1229h of telephone speech is now ready (470 + 346 + 105h BJ Mobile + 200h PICC + 108h HBTc; the totals are checked in the sketch below).
* 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data.
* LM corpus preparation done.
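A minimal sanity check on the hour totals quoted above, assuming the per-source figures are exact; the labels for the unnamed 470h and 346h sets are placeholders, not names from the notes:

<pre>
# Check that the quoted per-source hours add up to the stated totals.
telephone_hours = {
    "set-A (470h)": 470,   # unnamed in the notes; label is a placeholder
    "set-B (346h)": 346,   # unnamed in the notes; label is a placeholder
    "BJ Mobile": 105,
    "PICC": 200,
    "HBTc": 108,
}
assert sum(telephone_hours.values()) == 1229  # matches the stated 1229h

# 16k data: DataTang online + online mobile + recording.
print(978 + 656 + 4300)  # 5934h, so "6000h" reads as a round figure
</pre>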
=Acoustic modeling=

==Telephone model training==

===1000h Training===
* Baseline: 8k states, 470+300h, MPE4: 20.29
* Jietong phone set, 200 hour seed, 10k states training (compared against the baseline in the sketch below):
:* MPE1: 21.91
:* MPE2: 21.71
:* MPE3: 21.68
:* MPE4: 21.86
* CSLT phone set, 8k states training:
:* MPE1: 20.60
:* MPE2: 20.37
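A minimal sketch comparing the iterations listed above; the WER values are copied from the lists, and the variable names are illustrative only:

<pre>
# WER (%) per MPE iteration, copied from the lists above.
jietong_10k = {"MPE1": 21.91, "MPE2": 21.71, "MPE3": 21.68, "MPE4": 21.86}
cslt_8k = {"MPE1": 20.60, "MPE2": 20.37}
baseline = 20.29  # 8k states, 470+300h, MPE4

for name, runs in [("Jietong 10k-state", jietong_10k), ("CSLT 8k-state", cslt_8k)]:
    best_iter, best_wer = min(runs.items(), key=lambda kv: kv[1])
    print(f"{name}: best {best_iter} at {best_wer}%, "
          f"{best_wer - baseline:+.2f} vs. baseline")
# Jietong peaks at MPE3 and degrades at MPE4; neither setup beats the
# 20.29 baseline yet, CSLT MPE2 coming closest at +0.08.
</pre>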
===PICC dedicated training===
* Need to collect financial text data and retrain the LM.
* Need to comb through the word list and training text.
==6000 hour 16k training==

===Training progress===
* 6000h / CSLT phone set: alignment and den-lattice generation completed.
* 6000h / JT phone set: alignment and den-lattice generation completed.
* MPE training has been kicked off.
===Training Analysis===
* The Qihang model used a subset of the 6k data:
:* 2500 + 950h + tang500h* + 20131220, approximately 1700 + 2400 hours.
:* GMM training using this subset achieved 22.47%; Xiaoming's result is 16.1%.
:* The database still seems not very consistent.
:* Xiaoming will try to reproduce the Qihang training using this subset.
:* Testing the 6000h model on the Jidong data obtained a 2% absolute improvement.
=Language modeling=
* Training data is ready.
* Xiaoxi from CSLT may get involved after finishing some patent writing.
=DNN Decoder=

==Online decoder adaptation==
* Incremental training finished (stream mode).
* 8k-sentence test; a check on the WER arithmetic follows the results.
<pre>
non-stream baseline MPE5: %WER 9.91 [ 4734 / 47753, 235 ins, 509 del, 3990 sub ]
stream: MPE1: %WER 9.66 [ 4612 / 47753, 252 ins, 490 del, 3870 sub ]
stream: MPE2: %WER 9.48 [ 4529 / 47753, 251 ins, 477 del, 3801 sub ]
stream: MPE3: %WER 9.43 [ 4502 / 47753, 230 ins, 484 del, 3788 sub ]
stream: MPE4: %WER 9.39 [ 4482 / 47753, 236 ins, 475 del, 3771 sub ]
</pre>
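These lines appear to follow the usual Kaldi scoring format, where %WER = 100 × (ins + del + sub) / N. A minimal sketch re-deriving each percentage from the counts in the block above (values copied verbatim):

<pre>
# Re-derive %WER = 100 * (ins + del + sub) / N for each line above.
N = 47753  # reference word count, shared by all tests
results = {
    "non-stream MPE5": (235, 509, 3990),
    "stream MPE1": (252, 490, 3870),
    "stream MPE2": (251, 477, 3801),
    "stream MPE3": (230, 484, 3788),
    "stream MPE4": (236, 475, 3771),
}
for name, (ins, dels, subs) in results.items():
    errors = ins + dels + subs
    print(f"{name}: %WER {100 * errors / N:.2f} ({errors} / {N})")
# Reproduces 9.91 / 9.66 / 9.48 / 9.43 / 9.39; streaming MPE4 is
# 0.52 absolute better than the non-stream MPE5 baseline.
</pre>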