1 Environment setting
2 Corpora
3 Acoustic modeling
- 3.1 Telephone model training
  - 3.1.1 1000h Training
- 3.2 6000 hour 16k training
4 Language modeling
- 4.1 Training recipe transfer
- 4.2 Domain specific atom-LM construction
  - 4.2.1 Some potential problems
  - 4.2.2 Text data filtering
5 DNN Decoder

Environment setting

Sinovoice internal server deployment: a better approach is now in use, based on GitLab + Redmine.
Part of the Kaldi code has been delivered. Some fixes are still waiting to be checked in.
Email notification is problematic. We need to obtain an SMTP server.

Corpora

300h of Guangxi telecom text transcription prepared, with 150h due before April 18.
A total of 1338h of telephone speech is now ready (470h + 346h + 105h BJ mobile + 200h PICC + 108h HBTc + 109h New BJ mobile).
16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data.
Standard established for LM-speech-text labeling (speech data transcription for LM enhancement)
Xiaona will prepare a noise database, starting from telephone speech.
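
The hour counts above can be cross-checked with a few lines of Python. The subset labels are taken from the notes; the two unlabeled telephone subsets get placeholder names, which are an assumption made only for this sketch.

```python
# Telephone (8k) subsets in hours, as listed in the notes.
# "set_470" and "set_346" are placeholder names for the two unlabeled subsets.
telephone_hours = {
    "set_470": 470,
    "set_346": 346,
    "BJ mobile": 105,
    "PICC": 200,
    "HBTc": 108,
    "New BJ mobile": 109,
}

# 16k subsets in hours, as listed in the notes.
wideband_hours = {
    "DataTang online": 978,
    "online mobile": 656,
    "recording": 4300,
}

telephone_total = sum(telephone_hours.values())
wideband_total = sum(wideband_hours.values())

print(telephone_total)  # 1338
print(wideband_total)   # 5934
```

The 16k subsets sum to 5934h, which the notes round to 6000h.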

Acoustic modeling

Telephone model training

1000h Training

Baseline: 8k states, 470+300 MPE4, 20.29
Jietong phone, 200 hour seed, 10k states training:

Xent, 16 iterations: 22.90
MPE1: 20.89

CSLT phone, 8k states training

MPE1: 20.60
MPE2: 20.37
MPE3: 20.37
MPE4: 20.37

6000 hour 16k training

Training progress

6000h/CSLT phone set training

Xent: 12.83
MPE1: 9.21
MPE2: 9.13

6000h/jt phone set training

Training has entered MPE1.

Train Analysis

The Qihang model used a subset of the 6k data

2500+950H+tang500h*+20131220, approximately 1700+2400 hours

GMM training using this subset achieved 22.47%. Xiaoming's result is 16.1%.

It seems the database is still not very consistent.
Xiaoming kicked off the job to reproduce the Qihang training using this subset

Multilanguage Training

Prepare Chinglish data: contacted a vendor about 1000 hours of mobile recordings. Will check how much we need.
AMIDA database is downloading
Build a baseline system
Prepare shared DNN structure for multilingual training
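
One common way to structure a multilingual DNN is to share the hidden layers across languages and give each language its own output layer. The notes do not specify the structure planned here, so the sketch below is only an illustration under that assumption. Layer sizes, language tags, and all function names are hypothetical, and training is not shown.

```python
import math
import random

def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def softmax(v):
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (toy initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
feat_dim, hidden_dim = 4, 8        # toy sizes, for illustration only
num_states = {"zh": 5, "en": 3}    # hypothetical per-language output sizes

# One hidden layer shared by all languages, one output layer per language.
shared = init_layer(hidden_dim, feat_dim, rng)
heads = {lang: init_layer(n, hidden_dim, rng) for lang, n in num_states.items()}

def forward(frame, lang):
    """Shared hidden layer followed by the output layer of `lang`."""
    hidden = sigmoid(dense(frame, *shared))
    return softmax(dense(hidden, *heads[lang]))

frame = [0.2, -0.1, 0.4, 0.0]
post_zh = forward(frame, "zh")
post_en = forward(frame, "en")
print(len(post_zh), len(post_en))  # 5 3
```

In this layout, data from every language updates the shared layer during training, while each output layer only sees its own language.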

Noise robust feature

GFbank can be propagated to Sinovoice

Let Mengyuan prepare the experiments

Liuchao will prepare fast computing code

Language modeling

Training recipe transfer

Training process was delivered.
Problems in encoding were solved.
Initial CSLT LM buildup completed.

Domain specific atom-LM construction

Some potential problems

Unclear domain definition
Using the same development set (8k transcription) is not very appropriate

Text data filtering

Prepare word list
VSM-based topic segmentation was delivered to Sinovoice, but the tool is highly inefficient.
An enhanced toolkit was delivered.
A telecom-specific word list is ready, along with several stop words.
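
As an illustration of word-list-based filtering in a vector space model, the sketch below scores each sentence by the cosine similarity between its term-frequency vector and a domain word list, after dropping stop words. It is not the delivered toolkit: the word list, stop words, threshold, and function names are made up for the example, and Chinese text would need word segmentation before this step.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def filter_sentences(sentences, domain_words, stop_words, threshold):
    """Keep sentences whose similarity to the domain word list reaches `threshold`."""
    domain_vec = Counter(domain_words)
    kept = []
    for sent in sentences:
        tokens = [t for t in sent.split() if t not in stop_words]
        if cosine(Counter(tokens), domain_vec) >= threshold:
            kept.append(sent)
    return kept

# Hypothetical telecom word list and stop words.
domain_words = ["bill", "data", "plan", "roaming"]
stop_words = {"the", "a", "my", "is", "for"}
sentences = [
    "my data plan bill is high",
    "the weather is nice today",
    "roaming plan for data",
]
kept = filter_sentences(sentences, domain_words, stop_words, threshold=0.5)
print(kept)  # ['my data plan bill is high', 'roaming plan for data']
```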

DNN Decoder

Decoder optimization

Test computation cost of each step

beam 9/5000: netforward 65%
beam 13/7000: netforward 28%

Sinovoice's changes to Kaldi have been delivered and are ready to check in
Need to verify the speed of the CSLT engine
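
The two beam measurements above can be related with a little arithmetic. The sketch assumes that the absolute cost of the network forward pass is the same in both settings, so only the remaining decoding cost changes with the beam. The notes do not state this, and the function names are hypothetical.

```python
def total_cost_ratio(share_narrow: float, share_wide: float) -> float:
    """Total decoding cost, wide beam over narrow beam, assuming the
    network forward pass has the same absolute cost in both settings."""
    return share_narrow / share_wide

def other_cost_ratio(share_narrow: float, share_wide: float) -> float:
    """Non-netforward cost, wide beam over narrow beam, same assumption."""
    narrow_other = (1.0 - share_narrow) / share_narrow  # in units of netforward cost
    wide_other = (1.0 - share_wide) / share_wide
    return wide_other / narrow_other

# netforward shares from the notes: 65% at beam 9/5000, 28% at beam 13/7000.
total_ratio = round(total_cost_ratio(0.65, 0.28), 2)
other_ratio = round(other_cost_ratio(0.65, 0.28), 2)

print(total_ratio)  # 2.32
print(other_ratio)  # 4.78
```

Under that assumption, the wider beam costs about 2.3 times as much overall, and the cost outside the forward pass grows by a factor of about 4.8.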

Frame-skipping

Zhiyong & Liuchao will deliver the frame-skipping approach.
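
The notes do not describe the frame-skipping approach to be delivered. One simple variant, shown here purely as an assumption, evaluates the network on every `skip`-th frame and reuses that output for the frames in between. The function names and the toy network are hypothetical.

```python
def skip_frames_forward(frames, net_forward, skip=2):
    """Run `net_forward` on every `skip`-th frame and reuse its output
    for the frames in between."""
    outputs = []
    last = None
    for i, frame in enumerate(frames):
        if i % skip == 0:
            last = net_forward(frame)
        outputs.append(last)
    return outputs

calls = []

def toy_net(frame):
    """Stand-in for a DNN forward pass; records each call."""
    calls.append(frame)
    return frame * 10

out = skip_frames_forward([1, 2, 3, 4, 5], toy_net, skip=2)
print(out)         # [10, 10, 30, 30, 50]
print(len(calls))  # 3
```

With `skip=2` the forward pass runs on roughly half of the frames, trading some accuracy for speed.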

BigLM optimization

Investigate BigLM retrieval optimization.

Sinovoice-2014-04-15
