Sinovoice-2014-03-25
=Environment setting=

=Corpora=
* Labeling Beijing Mobile data.
* Next, the corrupted audio will be labeled.
* A total of 1229h of telephone speech is now ready (470 + 346 + 105h BJ Mobile + 200h PICC + 108h HBTc; the totals are checked in the sketch below).
* 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data.
* LM corpus preparation done.
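A minimal sanity check on the hour totals quoted above, assuming the per-source figures are exact; the labels for the unnamed 470h and 346h sets are placeholders, not names from the notes:

<pre>
# Check that the quoted per-source hours add up to the stated totals.
telephone_hours = {
    "set-A (470h)": 470,   # unnamed in the notes; label is a placeholder
    "set-B (346h)": 346,   # unnamed in the notes; label is a placeholder
    "BJ Mobile": 105,
    "PICC": 200,
    "HBTc": 108,
}
assert sum(telephone_hours.values()) == 1229  # matches the stated 1229h

# 16k data: DataTang online + online mobile + recording.
print(978 + 656 + 4300)  # 5934h, so "6000h" reads as a round figure
</pre>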
=Acoustic modeling=

==Telephone model training==

===1000h Training===
* Baseline: 8k states, 470+300h, MPE4: 20.29
* Jietong phone set, 200 hour seed, 10k states training (compared against the baseline in the sketch below):
:* MPE1: 21.91
:* MPE2: 21.71
:* MPE3: 21.68
:* MPE4: 21.86
* CSLT phone set, 8k states training:
:* MPE1: 20.60
:* MPE2: 20.37
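A minimal sketch comparing the iterations listed above; the WER values are copied from the lists, and the variable names are illustrative only:

<pre>
# WER (%) per MPE iteration, copied from the lists above.
jietong_10k = {"MPE1": 21.91, "MPE2": 21.71, "MPE3": 21.68, "MPE4": 21.86}
cslt_8k = {"MPE1": 20.60, "MPE2": 20.37}
baseline = 20.29  # 8k states, 470+300h, MPE4

for name, runs in [("Jietong 10k-state", jietong_10k), ("CSLT 8k-state", cslt_8k)]:
    best_iter, best_wer = min(runs.items(), key=lambda kv: kv[1])
    print(f"{name}: best {best_iter} at {best_wer}%, "
          f"{best_wer - baseline:+.2f} vs. baseline")
# Jietong peaks at MPE3 and degrades at MPE4; neither setup beats the
# 20.29 baseline yet, CSLT MPE2 coming closest at +0.08.
</pre>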
===PICC dedicated training===
* Need to collect financial text data and retrain the LM.
* Need to comb through the word list and training text.
==6000 hour 16k training==

===Training progress===
* 6000h / CSLT phone set: alignment and den-lattice generation completed.
* 6000h / JT phone set: alignment and den-lattice generation completed.
* MPE training has been kicked off.
===Training Analysis===
* The Qihang model used a subset of the 6k data:
:* 2500 + 950h + tang500h* + 20131220, approximately 1700 + 2400 hours.
:* GMM training using this subset achieved 22.47%; Xiaoming's result is 16.1%.
:* The database still seems not very consistent.
:* Xiaoming will try to reproduce the Qihang training using this subset.
:* Testing the 6000h model on the Jidong data obtained a 2% absolute improvement.
=Language modeling=
* Training data is ready.
* Xiaoxi from CSLT may get involved after finishing some patent writing.
=DNN Decoder=

==Online decoder adaptation==
* Incremental training finished (stream mode).
* 8k-sentence test; a check on the WER arithmetic follows the results.
<pre>
non-stream baseline MPE5: %WER 9.91 [ 4734 / 47753, 235 ins, 509 del, 3990 sub ]
stream: MPE1: %WER 9.66 [ 4612 / 47753, 252 ins, 490 del, 3870 sub ]
stream: MPE2: %WER 9.48 [ 4529 / 47753, 251 ins, 477 del, 3801 sub ]
stream: MPE3: %WER 9.43 [ 4502 / 47753, 230 ins, 484 del, 3788 sub ]
stream: MPE4: %WER 9.39 [ 4482 / 47753, 236 ins, 475 del, 3771 sub ]
</pre>
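These lines appear to follow the usual Kaldi scoring format, where %WER = 100 × (ins + del + sub) / N. A minimal sketch re-deriving each percentage from the counts in the block above (values copied verbatim):

<pre>
# Re-derive %WER = 100 * (ins + del + sub) / N for each line above.
N = 47753  # reference word count, shared by all tests
results = {
    "non-stream MPE5": (235, 509, 3990),
    "stream MPE1": (252, 490, 3870),
    "stream MPE2": (251, 477, 3801),
    "stream MPE3": (230, 484, 3788),
    "stream MPE4": (236, 475, 3771),
}
for name, (ins, dels, subs) in results.items():
    errors = ins + dels + subs
    print(f"{name}: %WER {100 * errors / N:.2f} ({errors} / {N})")
# Reproduces 9.91 / 9.66 / 9.48 / 9.43 / 9.39; streaming MPE4 is
# 0.52 absolute better than the non-stream MPE5 baseline.
</pre>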