Difference between revisions of "Sinovoice-2014-04-08"
From cslt Wiki
Latest revision as of 04:34, 9 April 2014 (Wednesday)
Environment setting
- Sinovoice internal server deployment. The current configuration involves Gitlab + Trac.
- Some access permission problems remain; other tools need to be investigated.
Corpora
- 300h Guangxi telecom text transcription in preparation; 150h due before April 18th.
- A total of 1338h (470 + 346 + 105 BJ mobile + 200 PICC + 108h HBTc + 109h New BJ mobile) of telephone speech is now ready.
- 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data.
- Xiaona will prepare noise database. Start from telephone speech.
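The corpus totals above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch: the dictionary keys are illustrative labels (the first two sets are not named in the notes), and the hour figures are copied from the bullets.

```python
# Hours per telephone corpus, copied from the breakdown in the notes.
# The keys are illustrative labels, not official corpus names.
telephone_hours = {
    "set_470h": 470,
    "set_346h": 346,
    "BJ mobile": 105,
    "PICC": 200,
    "HBTc": 108,
    "New BJ mobile": 109,
}
telephone_total = sum(telephone_hours.values())

# 16k data: DataTang online + online mobile + recording data.
wideband_total = 978 + 656 + 4300

print("telephone speech:", telephone_total, "h")
print("16k data:", wideband_total, "h")
```

The telephone figure matches the stated 1338h; the 16k components add up to slightly under the nominal 6000h.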
Acoustic modeling
Telephone model training
1000h Training
- Baseline: 8k states, 470+300 MPE4, 20.29
- Jietong phone, 200 hour seed, 10k states training:
- Xent 5 iteration: 23.26
- Xent 16 iteration: 22.90
- CSLT phone, 8k states training
- MPE1: 20.60
- MPE2: 20.37
- MPE3: 20.37
- MPE4: 20.37
6000 hour 16k training
Training progress
- 6000h/CSLT phone set training
- Xent: 12.83
- MPE1: 9.21
- MPE2: 9.13
- 6000h/jt phone set training
- lattice done, MPE run 4 days
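To put the CSLT phone set numbers in perspective, the relative gain of MPE over Xent can be computed as below. This is a small sketch that assumes the reported figures are error rates in percent (the notes do not name the metric); `relative_reduction` is a helper introduced here for illustration.

```python
def relative_reduction(baseline, improved):
    """Relative reduction of `improved` over `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline

# Figures from the 6000h/CSLT phone set training above.
xent = 12.83
for name, rate in [("MPE1", 9.21), ("MPE2", 9.13)]:
    print(f"{name}: {relative_reduction(xent, rate):.1f}% relative reduction")
```

Under that assumption, both MPE iterations give roughly a 28% relative reduction over the Xent model, with MPE2 adding little beyond MPE1.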
Training Analysis
- The Qihang model used a subset of the 6k data
- 2500+950H+tang500h*+20131220, approximately 1700+2400 hours
- GMM training using this subset achieved 22.47%. Xiaoming's result is 16.1%.
- The database still seems not very consistent
- Xiaoming kicked off the job to reproduce the Qihang training using this subset
Multilanguage Training
- Prepare Chinglish data: Wang Dong will provide info.
- Prepare shared DNN structure for multilingual training
Noise robust feature
- GFbank can be propagated to Sinovoice
- Liuchao will prepare fast computing code
Language modeling
- Training process was delivered.
- Some problems in encoding.
- Training text retrieval based on topic models (initial key words)
DNN Decoder
Decoder optimization
- Test computation cost of each step
- beam 9/5000: netforward 65%
- beam 13/7000: netforward 28%
- Liuchao will verify these proportions with the CSLT engine.
- Zhiyong & Liuchao will deliver the frame-skipping approach.
- Investigate BigLM retrieval optimization.
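The netforward proportions above give a rough bound on what frame skipping could buy. The sketch below applies Amdahl's law under two assumptions that are not in the notes: that skipping reduces only the netforward cost, and that it does so by a hypothetical factor `SKIP_FACTOR`. The actual approach to be delivered may behave differently.

```python
def overall_speedup(netforward_share, netforward_speedup):
    """Amdahl's law: only the netforward part of the decoder gets faster."""
    rest = 1.0 - netforward_share
    return 1.0 / (rest + netforward_share / netforward_speedup)

# Hypothetical: evaluate the network on every second frame only.
SKIP_FACTOR = 2.0

# Netforward share of total decoding cost, from the measurements above.
for beam, share in [("9/5000", 0.65), ("13/7000", 0.28)]:
    print(f"beam {beam}: {overall_speedup(share, SKIP_FACTOR):.2f}x overall")
```

Under these assumptions the narrow-beam setting benefits much more (about 1.5x) than the wide-beam setting (about 1.2x), because search dominates the cost at wider beams.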