Sinovoice-2014-04-22


h1. Environment setting

  • Sinovoice internal server deployment: the usage standard draft has been released.
  • Email notification is problematic; need to obtain an SMTP server.
  • Will train a Redmine administrator for Sinovoice.

h1. Corpora

  • 300h Guangxi telecom text transcription is in progress; 180h completed.
  • In total 1338h of telephone speech is now ready (470 + 346 + 105 BJ mobile + 200 PICC + 108 HBTc + 109 new BJ mobile).
  • 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data.
  • Standard established for LM-speech-text labeling (speech data transcription for LM enhancement)
  • Xiaona is preparing a noise database, extracting noise segments from the original wav files (see the sketch below).
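
A minimal sketch of the extraction step, assuming the noise segments are taken to be the low-energy frames of mono 16-bit wav files; file names and the energy threshold are illustrative, not the actual procedure:

<pre>
# Assumption: noise = low-energy frames of a mono 16-bit PCM wav file.
import wave

import numpy as np


def extract_noise(wav_path, out_path, frame_len=400, threshold_db=-40.0):
    """Copy the low-energy frames of a wav file to a separate noise file."""
    with wave.open(wav_path, "rb") as w:
        params = w.getparams()
        pcm = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

    n = len(pcm) // frame_len
    frames = pcm[: n * frame_len].reshape(n, frame_len).astype(np.float64)
    # Frame energy in dB relative to int16 full scale.
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) / 32768.0 ** 2 + 1e-12)

    noise = frames[energy_db < threshold_db].astype(np.int16).ravel()
    with wave.open(out_path, "wb") as w:
        w.setparams(params)
        w.writeframes(noise.tobytes())


extract_noise("orig_0001.wav", "noise_0001.wav")  # hypothetical file names
</pre>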

h1. Acoustic modeling

h2. Telephone model training

h3. 1000h Training

  • Baseline: 8k states, 470+300 MPE4, 20.29
  • Jietong phone set, 200-hour seed, 10k states training:
      • XEnt, 16 iterations: 22.90
      • MPE1: 20.89
      • MPE2: 20.68
      • MPE3: 20.61
      • MPE4: 20.56
  • CSLT phone set, 8k states training:
      • MPE1: 20.60
      • MPE2: 20.37
      • MPE3: 20.37
      • MPE4: 20.37
  • Found a problem in data processing: some data were cut off incorrectly (a sanity check is sketched below). Retraining the model.
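
A minimal sanity check for the cut-off problem, assuming Kaldi-style segments and wav.scp files with plain paths (no command pipes); file names are illustrative:

<pre>
# Assumption: "segments" has lines "utt-id wav-id start end" and
# "wav.scp" has lines "wav-id path"; both names are hypothetical.
import wave


def wav_duration(path):
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())


def check_segments(segments_path, wav_scp_path):
    durations = {}
    with open(wav_scp_path) as f:
        for line in f:
            wav_id, path = line.split(None, 1)
            durations[wav_id] = wav_duration(path.strip())

    with open(segments_path) as f:
        for line in f:
            utt, wav_id, start, end = line.split()
            start, end = float(start), float(end)
            # Flag segments that run past the audio or are empty.
            if end > durations[wav_id] or start >= end:
                print("bad segment:", utt, start, end, durations[wav_id])


check_segments("segments", "wav.scp")
</pre>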


h2. 6000 hour 16k training

h3. Training progress

  • Baseline: 1700h, MPE5, JT phone set: 9.91
  • 6000h, CSLT phone set training:
      • XEnt: 12.83
      • MPE1: 9.21
      • MPE2: 9.13
      • MPE3: 9.10
  • 6000h, JT phone set training:
      • MPE1: 10.63

h3. Training Analysis

  • The Qihang model used a subset of the 6k data:
  • 2500 + 950h + tang500h* + 20131220, approximately 1700 + 2400 hours
  • GMM training on this subset achieved 22.47%, while Xiaoming's result is 16.1%.
  • It seems the database is still not very consistent.
  • Xiaoming kicked off a job to reproduce the Qihang training with this subset.

h3. Multilingual Training

  • Preparing Chinglish data: will first select 100h to train a baseline model.
  • The AMIDA database is downloading.
  • Preparing the shared DNN structure for multilingual training (see the sketch after this list).
  • The baseline Chinese-English system is done.
  • Need to tune the hidden-layer sizes and introduce more sharing into the structure.
  • Need to investigate knowledge-based phone sharing.
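
A minimal sketch of the shared structure, assuming shared hidden layers with one softmax output layer per language; the PyTorch framing and all layer sizes are illustrative, not the actual configuration:

<pre>
# Assumption: all languages share the hidden stack; each language has
# its own output layer over its senone set. Sizes are made up.
import torch
import torch.nn as nn


class SharedDNN(nn.Module):
    def __init__(self, feat_dim, hidden_dim, n_hidden, states_per_lang):
        super().__init__()
        layers, d = [], feat_dim
        for _ in range(n_hidden):
            layers += [nn.Linear(d, hidden_dim), nn.Sigmoid()]
            d = hidden_dim
        self.shared = nn.Sequential(*layers)      # shared across languages
        self.heads = nn.ModuleDict(               # language-specific outputs
            {lang: nn.Linear(hidden_dim, n) for lang, n in states_per_lang.items()})

    def forward(self, x, lang):
        return self.heads[lang](self.shared(x))


net = SharedDNN(feat_dim=440, hidden_dim=1200, n_hidden=4,
                states_per_lang={"zh": 10000, "en": 6000})
frames = torch.randn(8, 440)   # a batch of spliced feature vectors
logits = net(frames, "zh")     # Chinese senone scores
</pre>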

h3. Noise robust feature

  • GFbank can be propagated to Sinovoice (a feature-extraction sketch follows this list).
  • 1700h JT phone set, MPE3: Fbank 10.48 vs. GFbank 10.23.
  • Preparing to train on the 1000h telephone speech.
  • Liuchao will prepare fast-computation code.
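
A minimal sketch of GFbank-style feature extraction, assuming it differs from Fbank only in replacing the mel triangles with gammatone power responses on an ERB-spaced axis (constants follow the standard Glasberg-Moore ERB fit); the filter count and FFT size are illustrative:

<pre>
import numpy as np


def erb(f):
    # Equivalent rectangular bandwidth at frequency f (Hz).
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def erb_rate(f):
    return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)


def erb_rate_inv(e):
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def gfbank_weights(n_filt=40, n_fft=512, sr=8000, fmin=100.0):
    # Center frequencies equally spaced on the ERB-rate scale.
    centers = erb_rate_inv(np.linspace(erb_rate(fmin), erb_rate(sr / 2.0), n_filt))
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    b = 1.019 * erb(centers)[:, None]
    # Power response of a 4th-order gammatone filter around each center.
    return (1.0 + ((freqs[None, :] - centers[:, None]) / b) ** 2) ** -4


def gfbank(frames, weights, n_fft=512):
    # frames: (T, frame_len) windowed frames -> (T, n_filt) log energies.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power @ weights.T + 1e-10)
</pre>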


h1. Language modeling

h2. Domain specific atom-LM construction

h3. Some potential problems

  • The domain definition is unclear.
  • Using the same development set (the 8k transcriptions) for every domain is not very appropriate.
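
The development set's role can be made explicit: assuming the atom LMs are combined by linear interpolation, the mixture weights are tuned on the dev set by EM. A minimal sketch, where the per-word probability streams are assumed to have already been produced by scoring the dev text with each atom LM:

<pre>
# Assumption: p[m][i] is atom LM m's probability for dev word i.
import numpy as np


def em_weights(p, n_iter=20):
    """p: (n_models, n_dev_words) array of per-word probabilities."""
    m = p.shape[0]
    w = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        post = w[:, None] * p                  # E-step: model posteriors
        post /= post.sum(axis=0, keepdims=True)
        w = post.mean(axis=1)                  # M-step: re-estimate weights
    return w


# Toy example: three atom LMs scored on a 5-word dev stream.
p = np.array([[0.01, 0.20, 0.05, 0.10, 0.02],
              [0.02, 0.10, 0.10, 0.20, 0.01],
              [0.05, 0.05, 0.20, 0.05, 0.10]])
print(em_weights(p))
</pre>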

h3. Text data filtering

  • A telecom-specific word list is ready. Will work with Xiaona to prepare a new version of the lexicon.
  • A comparison of document classification performance (per-domain and overall) was done by LiuRong; a reproduction sketch follows the table:

            Finance  IT     Health  Sports  Travel  Education  Recruiting  Culture  Military  Overall
vsm         0.92     0.906  0.921   0.983   0.954   0.916      0.953       0.996    0.9339    0.94
lda(50)     0.84     0.39   0.79    0.85    0.60    0.368      0.61        0.31     0.86      0.62
w2v(50)     0.69     0.77   0.67    0.59    0.70    0.62       0.74        0.79     0.88      0.73
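
A minimal sketch of how such a comparison can be reproduced with a linear classifier; load_corpus is a hypothetical loader returning texts and domain labels, and the w2v variant (averaged 50-dim word vectors) is omitted for brevity:

<pre>
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

docs, labels = load_corpus()   # hypothetical: texts + domain labels

# VSM: tf-idf vectors fed directly to the classifier.
X_vsm = TfidfVectorizer(max_features=50000).fit_transform(docs)
print("vsm:", cross_val_score(LogisticRegression(max_iter=1000),
                              X_vsm, labels).mean())

# LDA(50): 50-topic posteriors as a dense 50-dim feature.
counts = CountVectorizer(max_features=50000).fit_transform(docs)
X_lda = LatentDirichletAllocation(n_components=50).fit_transform(counts)
print("lda(50):", cross_val_score(LogisticRegression(max_iter=1000),
                                  X_lda, labels).mean())
</pre>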


h1. DNN Decoder

h2. Decoder optimization

  • Measured the computation cost of each step (a measurement sketch follows this list):
      • beam 9/5000: net forward takes 65% of decoding time
      • beam 13/7000: net forward takes 28%
  • With a wider beam the search step dominates, so the net-forward share shrinks.
  • This has been verified by Liuchao with the CSLT engine.
  • The acceleration code was checked into Git, with a small modification to heap management.
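
A minimal sketch of the measurement itself, assuming the decoder exposes separate net-forward and search steps; the decoder interface is a hypothetical stand-in, not the actual engine API:

<pre>
import time


def profile(decoder, utterances, beam, max_active):
    """Time DNN forward vs. beam search across a set of utterances."""
    t_net = t_search = 0.0
    for feats in utterances:
        for frame in feats:
            t0 = time.perf_counter()
            post = decoder.net_forward(frame)            # DNN posteriors
            t1 = time.perf_counter()
            decoder.search_step(post, beam, max_active)  # token passing
            t_net += t1 - t0
            t_search += time.perf_counter() - t1
    total = t_net + t_search
    print("beam %g/%d: netforward %.0f%%"
          % (beam, max_active, 100.0 * t_net / total))
</pre>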

h2. Frame-skipping

  • Zhiyong & Liuchao will deliver the frame-skipping approach (one variant is sketched below).
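
A minimal sketch of one frame-skipping variant, assuming the net runs on every second frame and its posteriors are reused for the skipped frames; the delivered approach may differ (skip rate, interpolation):

<pre>
import numpy as np


def skipped_posteriors(net_forward, feats, skip=2):
    """feats: (T, dim). Evaluate the net every `skip` frames, copy between."""
    T = feats.shape[0]
    post = [None] * T
    for t in range(0, T, skip):
        p = net_forward(feats[t])
        for k in range(t, min(t + skip, T)):
            post[k] = p                 # reuse posterior for skipped frames
    return np.stack(post)
</pre>

The DNN cost drops roughly by the skip factor, at the price of coarser posterior timing in the search.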

h2. BigLM optimization

  • Investigate BigLM retrieval optimization (one candidate is sketched below).
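
A minimal sketch of one candidate optimization: memoizing big-LM lookups, since decoding queries the same (state, word) pairs repeatedly. The BigLM interface is a hypothetical stand-in, and LM states must be hashable:

<pre>
from functools import lru_cache


class CachedBigLM:
    """Wrap a big LM with a lookup cache for on-the-fly rescoring."""

    def __init__(self, biglm):
        self.biglm = biglm
        self._score = lru_cache(maxsize=1 << 20)(self._score_uncached)

    def _score_uncached(self, state, word):
        # (log-prob, next-state) from the underlying big LM.
        return self.biglm.score(state, word)

    def score(self, state, word):
        return self._score(state, word)
</pre>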