Sinovoice-2014-01-20



DNN training

Environment setting

Account re-arrangement on the SGE cluster is done. Root is no longer to be used for work.
Increased the NFS server to 40 processes, in the hope of improving disk read throughput.
Agreed to withdraw root/sudo privileges.
Agreed to create a RAID-0 array from another three 3 TB disks.

Corpora

Changed the data labeling strategy: gender and noise length will not be labeled for the next several corpora.
Automatic labeling

Xiaoming will work with Zhiyong to work out how to generate transcriptions that retain confidence scores.
The first step is to measure the raw accuracy on the domain-dependent test, and then decide whether automatic labeling is appropriate.

Xiao Na will prepare 300h telephone speech data (Sinovoice recording). This will be used to improve the 8k model.
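
The automatic labeling idea above can be sketched as a confidence filter over recognizer output. This is only an illustration: the Hypothesis record, the per-word confidence scores in [0, 1], the averaging rule and the 0.9 threshold are all assumptions, not something decided in the meeting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """Hypothetical record for one automatically transcribed utterance."""
    utt_id: str
    words: List[str]
    confidences: List[float]  # assumed: one score per word, in [0, 1]


def utterance_confidence(hyp: Hypothesis) -> float:
    """Average word confidence; 0.0 for an empty hypothesis."""
    if not hyp.confidences:
        return 0.0
    return sum(hyp.confidences) / len(hyp.confidences)


def select_for_training(hyps: List[Hypothesis], threshold: float = 0.9) -> List[Hypothesis]:
    """Keep only utterances whose average confidence reaches the threshold."""
    return [h for h in hyps if utterance_confidence(h) >= threshold]
```

The threshold would have to be tuned against the raw accuracy measured on the domain-dependent test.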

470 hour 8k training

MPE training done

Model	CE	MPE1	MPE2	MPE3	MPE4
4k states	23.27/22.85	21.35/18.87	21.18/18.76	21.07/18.54	20.93/18.32
8k states	22.16/22.22	20.55/18.03	20.36/17.94	20.32/17.78	20.29/17.80
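
To read the table, it may help to compute the relative reduction from CE to the last MPE iteration. The notes do not say what the two numbers in each cell are; the sketch below assumes they are error rates in percent on two test sets, with lower being better.

```python
# CE and MPE4 columns copied from the table above.
# Assumption: each pair is (error rate on test set 1, error rate on test set 2).
RESULTS = {
    "4k states": {"CE": (23.27, 22.85), "MPE4": (20.93, 18.32)},
    "8k states": {"CE": (22.16, 22.22), "MPE4": (20.29, 17.80)},
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction in percent, going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


def summarize(results):
    """Map each model to its per-test-set relative reduction from CE to MPE4."""
    return {
        model: tuple(
            relative_reduction(ce, mpe)
            for ce, mpe in zip(rows["CE"], rows["MPE4"])
        )
        for model, rows in results.items()
    }


if __name__ == "__main__":
    for model, reductions in summarize(RESULTS).items():
        print(model, ["%.2f%%" % r for r in reductions])
```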

6000 hour 16k training

Feature extraction done. Several problems in the data were solved: (1) short wave files, (2) unmatched file lengths, (3) unmatched sample rates.
Training has reached tri4b, with a rapid increase in the number of states/pdfs.
DNN training will be started on Tuesday.
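
The three data problems found during feature extraction could be screened for with a check like the one below. It is a sketch using Python's standard wave module, not the script actually used. The function name, the 0.5 second minimum and the reading of "unmatched file length" as a header frame count that disagrees with the audio data actually present are assumptions.

```python
import wave


def inspect_wav(path, expected_rate=16000, min_seconds=0.5):
    """Return a list of problem labels for one WAV file (empty if none found)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        declared_frames = wav.getnframes()  # frame count taken from the header
        frame_bytes = wav.getsampwidth() * wav.getnchannels()
        data = wav.readframes(declared_frames)
    actual_frames = len(data) // frame_bytes  # frames actually readable

    if rate != expected_rate:
        problems.append("unmatched sample rate")
    if actual_frames != declared_frames:
        problems.append("unmatched file length")
    if actual_frames / rate < min_seconds:
        problems.append("short wave")
    return problems
```

Running it over the file list before feature extraction would let the bad files be fixed or excluded in one pass.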

DNN Decoder

Sinovoice decoder: some errors in FST building. Many triphones were lost after composition with C. Possibly a problem in cdgen?
Kaldi decoder:

A minor difference between the CLG and HCLG results was found. Debugging is in progress.
The CLG real-time factor (RT) is comparable to that of HCLG, roughly 0.3-0.4 on CSLT grid-2.
Additional optimization of pdf pre-computing will be investigated.
Code will be delivered today.
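
For reference, the RT figure above is read here as the real-time factor, i.e. total decoding time divided by total audio duration. A minimal sketch follows; the function name and the input format are assumptions made for illustration.

```python
def real_time_factor(timings):
    """Aggregate real-time factor over a test set.

    timings: iterable of (decode_seconds, audio_seconds) pairs, one per utterance.
    A value below 1.0 means decoding runs faster than real time.
    """
    total_decode = 0.0
    total_audio = 0.0
    for decode_seconds, audio_seconds in timings:
        total_decode += decode_seconds
        total_audio += audio_seconds
    if total_audio <= 0.0:
        raise ValueError("total audio duration must be positive")
    return total_decode / total_audio
```

Summing over the whole test set before dividing keeps short utterances from dominating the figure, which matters when comparing the CLG and HCLG decoders on the same data.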
