Sinovoice-2014-12-10
DNN training
Environment setting
- Another three 3TB disks are ready for RAID-0.
- Another GPU machine was purchased.
Corpora
- Scripts for confidence generation are ready for auto transcription (see the sketch after this list)
- The 300h telephone speech data (Sinovoice recording) are done
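As an illustration of how per-word confidences can gate auto transcription, here is a minimal sketch (not the actual scripts): it assumes a hypothetical CTM-with-confidence input format and threshold, and keeps only utterances whose average word confidence clears the threshold.

```cpp
// Hypothetical confidence filter for auto transcription: keep utterances
// whose average per-word confidence exceeds a threshold. Assumes a
// CTM-with-confidence input, one word per line:
//   <utt-id> <channel> <start> <duration> <word> <confidence>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

int main(int argc, char** argv) {
  if (argc != 3) {
    std::cerr << "usage: conf_filter <ctm-with-conf> <threshold>\n";
    return 1;
  }
  std::ifstream in(argv[1]);
  const double threshold = std::stod(argv[2]);  // e.g. 0.9 (assumed)

  std::map<std::string, std::pair<double, int> > stats;  // utt -> (sum, count)
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream ss(line);
    std::string utt, chan, word;
    double start, dur, conf;
    if (ss >> utt >> chan >> start >> dur >> word >> conf) {
      stats[utt].first += conf;
      stats[utt].second += 1;
    }
  }
  // Print utterance ids whose average confidence passes the threshold;
  // these can then be fed back as automatically transcribed training data.
  for (std::map<std::string, std::pair<double, int> >::const_iterator it =
           stats.begin(); it != stats.end(); ++it) {
    double avg = it->second.first / it->second.second;
    if (avg >= threshold) std::cout << it->first << " " << avg << "\n";
  }
  return 0;
}
```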
470-hour 8k training
- 300h incremental training (IT) done
Model | CE | MPE1 | MPE2 | MPE3 | MPE4 |
---|---|---|---|---|---|
4k states | 23.27/22.85 | 21.35/18.87 | 21.18/18.76 | 21.07/18.54 | 20.93/18.32 |
8k states | 22.16/22.22 | 20.55/18.03 | 20.36/17.94 | 20.32/17.78 | 20.29/17.80 |
8k states + IT | - | 20.04/17.38 | 20.01/17.32 | 20.07/17.44 | 19.94/17.65 |
6000-hour 16k training
- Ran CE DNN training to iteration 5 (8400 states, 80000 pdfs)
- Test WER has dropped to 13.77% (see table below)
Configuration | WER (%) | RT |
---|---|---|
small LM, it 4, -5/-9 | 15.80 | - |
large LM, it 4, -5/-9 | 15.30 | - |
large LM, it 4, -6/-9 | 15.36 | - |
large LM, it 4, -7/-9 | 15.25 | - |
large LM, it 5, -5/-9 | 14.17 | - |
large LM, it 5, -5/-10 | 13.77 | - |
Adaptation
DNN Decoder
- Compared the CLG and HCLG decoders:
- The CLG decoder uses less memory in decoding
- The HCLG decoder is faster and more accurate than CLG, and more amenable to beam control ([http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=156 details])
- std::exp/std::log caused very slow computation on train203; solved the problem by replacing them with the C library exp() and log() (see the sketch below)
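Below is a minimal micro-benchmark sketch of the kind that can verify such a fix; the array size and values are arbitrary, and the size of the gap depends on the compiler and libm version on the machine (it was evidently large on train203).

```cpp
// Micro-benchmark sketch: time std::exp/std::log against the C library
// exp()/log() on identical data. Which pair is faster (and by how much)
// depends on the toolchain; on train203 the std:: versions were slow.
#include <math.h>   // guarantees ::exp and ::log in the global namespace
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const int N = 10000000;
  std::vector<double> x(N);
  for (int i = 0; i < N; ++i) x[i] = 0.5 + 0.001 * (i % 1000);

  double sink = 0.0;  // accumulate results so the loops are not optimized away

  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < N; ++i) sink += std::exp(-x[i]) + std::log(x[i]);
  auto t1 = std::chrono::steady_clock::now();
  for (int i = 0; i < N; ++i) sink += exp(-x[i]) + log(x[i]);
  auto t2 = std::chrono::steady_clock::now();

  std::printf("std::exp/std::log : %.3f s\n",
              std::chrono::duration<double>(t1 - t0).count());
  std::printf("::exp/::log       : %.3f s\n",
              std::chrono::duration<double>(t2 - t1).count());
  std::printf("checksum %.3f\n", sink);
  return 0;
}
```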