Environment setting

Deployment on the Sinovoice internal server. The current configuration uses GitLab + Trac.
Some access-permission problems remain; other tools need to be investigated.

Corpora

Text transcription of 300h of Guangxi telecom speech is being prepared; 150h is due before April 18.
A total of 1338h of telephone speech is now ready (470h + 346h + 105h BJ mobile + 200h PICC + 108h HBTc + 109h new BJ mobile).
16k data, about 6000h: 978h of online data from DataTang + 656h of online mobile data + 4300h of recorded data.
Xiaona will prepare a noise database, starting with telephone speech.
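
As a quick sanity check on the corpus totals above, the per-source hours can be summed in a few lines of Python. This is only a bookkeeping sketch: the dictionary keys are informal labels chosen here (the first two telephone sets are unnamed in the notes), and the function name is hypothetical.

```python
# Hours per source, copied from the notes; the keys are informal labels.
telephone_hours = {
    "set A (unnamed)": 470,
    "set B (unnamed)": 346,
    "BJ mobile": 105,
    "PICC": 200,
    "HBTc": 108,
    "New BJ mobile": 109,
}
wideband_16k_hours = {
    "DataTang online": 978,
    "online mobile": 656,
    "recordings": 4300,
}


def total_hours(parts):
    """Return the summed duration (in hours) of all sources in `parts`."""
    return sum(parts.values())


print("telephone:", total_hours(telephone_hours))   # 1338
print("16k:", total_hours(wideband_16k_hours))      # 5934
```

The telephone figure matches the stated 1338h, and the 16k sources add up to 5934h, which the notes round to 6000h.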

Acoustic modeling

Telephone model training

1000h Training

Baseline: 8k states, 470+300 MPE4, 20.29
Jietong phone, 200 hour seed, 10k states training:

Xent, 5 iterations: 23.26
Xent, 16 iterations: 22.90

CSLT phone, 8k states training

MPE1: 20.60
MPE2: 20.37
MPE3: 20.37
MPE4: 20.37

6000 hour 16k training

Training progress

6000h/CSLT phone set training

Xent: 12.83
MPE1: 9.21
MPE2: 9.13
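
To put the Xent and MPE figures above on a common footing, the relative reduction can be computed as below. This is a minimal sketch that assumes the numbers are error rates in percent (the notes do not name the metric); the function name is hypothetical.

```python
def relative_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent of `before`."""
    return (before - after) / before * 100.0


# 6000h / CSLT phone set results, as listed in the notes.
xent, mpe1, mpe2 = 12.83, 9.21, 9.13

print(f"Xent -> MPE1: {relative_reduction(xent, mpe1):.1f}% relative")
print(f"Xent -> MPE2: {relative_reduction(xent, mpe2):.1f}% relative")
print(f"MPE1 -> MPE2: {relative_reduction(mpe1, mpe2):.1f}% relative")
```

Under that assumption, most of the gain comes from the first MPE iteration, and the second adds under one percent relative.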

6000h/jt phone set training

Lattice generation done; MPE run: 4 days

Train Analysis

The Qihang model used a subset of the 6000h data:

2500+950H+tang500h*+20131220, approximately 1700+2400 hours

GMM training using this subset achieved 22.47%. Xiaoming's result is 16.1%.

The database still does not seem to be fully consistent.
Xiaoming kicked off the job to reproduce the Qihang training using this subset

Multilanguage Training

Prepare Chinglish data; Wang Dong will provide the information.
Prepare shared DNN structure for multilingual training

Noise robust feature

GFbank can be propagated to Sinovoice
Liuchao will prepare fast computation code

Language modeling

Training process was delivered.
Some encoding problems remain.
Training text retrieval based on topic models (initial key words)

DNN Decoder

Decoder optimization

Test computation cost of each step

beam 9/5000: netforward 65%
beam 13/7000: netforward 28%

Liuchao will verify these proportions with the CSLT engine.
Zhiyong & Liuchao will deliver the frame-skipping approach.
Investigate BigLM retrieval optimization.
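
The netforward proportions above bound how much a frame-skipping approach could speed up decoding overall. The sketch below applies Amdahl's law under two assumptions that are not in the notes: that frame skipping cuts the netforward cost by a fixed factor (2 is used purely for illustration), and that the rest of the decoding cost is unchanged. The function name is hypothetical.

```python
def overall_speedup(netforward_fraction, netforward_speedup):
    """Amdahl's law: whole-decoder speedup when only the netforward
    share of the runtime is accelerated by `netforward_speedup`."""
    remaining = 1.0 - netforward_fraction
    return 1.0 / (remaining + netforward_fraction / netforward_speedup)


# Netforward share of decoding time at the two beam settings in the notes.
settings = {"beam 9/5000": 0.65, "beam 13/7000": 0.28}

# Assumed: skipping every other frame halves the netforward cost.
ASSUMED_FACTOR = 2.0

for name, fraction in settings.items():
    print(f"{name}: {overall_speedup(fraction, ASSUMED_FACTOR):.2f}x overall")
```

With these assumptions the narrow beam gains roughly 1.5x overall while the wide beam gains under 1.2x, since at wider beams the search rather than the network forward pass dominates the runtime.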

Sinovoice-2014-04-08

