Latest revision as of 14:00, 11 February 2014 (Tuesday)

DNN training

Environment setting

Another three 3 TB disks are ready for RAID-0.
Another GPU machine was purchased. Four 3 TB disks are to be added to it to build RAID-0.

Corpora

Scripts for confidence generation are ready for automatic transcription.
300 h of telephone speech data (Sinovoice recording) is done.
Adaptation data (900 sentences) is ready.

470 hour 8k training

300h incremental training (IT) done

Model	CE	MPE1	MPE2	MPE3	MPE4
4k states	23.27/22.85	21.35/18.87	21.18/18.76	21.07/18.54	20.93/18.32
8k states	22.16/22.22	20.55/18.03	20.36/17.94	20.32/17.78	20.29/17.80
8k states + IT	-	20.04/17.38	20.01/17.32	20.07/17.44	19.94/17.65

6000 hour 16k training

Ran CE DNN to iteration 5 (8400 states, 80000 pdf)
Test WER is down to 13.77% (Sinovoice result: 11.78%).

Model	WER	RT
small LM, it 4, -5/-9	15.80	1.18
large LM, it 4, -5/-9	15.30	1.50
large LM, it 4, -6/-9	15.36	1.30
large LM, it 4, -7/-9	15.25	1.30
large LM, it 5, -5/-9	14.17	1.10
large LM, it 5, -5/-10	13.77	1.29
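
The WER/RT trade-off in the table above can be made explicit with a few lines of Python. This is only a sketch: the numbers are copied from the table, RT is assumed to be the real-time factor, and the names `RESULTS`, `relative_wer_reduction` and `best_under_rt` are hypothetical helpers, not part of any toolkit used here.

```python
# Decoding results copied from the table above: (configuration, WER %, RT).
RESULTS = [
    ("small LM, it 4, -5/-9", 15.80, 1.18),
    ("large LM, it 4, -5/-9", 15.30, 1.50),
    ("large LM, it 4, -6/-9", 15.36, 1.30),
    ("large LM, it 4, -7/-9", 15.25, 1.30),
    ("large LM, it 5, -5/-9", 14.17, 1.10),
    ("large LM, it 5, -5/-10", 13.77, 1.29),
]


def relative_wer_reduction(baseline, improved):
    """Relative WER reduction in percent (20.0 -> 18.0 gives 10.0)."""
    return 100.0 * (baseline - improved) / baseline


def best_under_rt(results, max_rt):
    """Lowest-WER configuration whose RT does not exceed max_rt, or None."""
    candidates = [r for r in results if r[2] <= max_rt]
    return min(candidates, key=lambda r: r[1]) if candidates else None


if __name__ == "__main__":
    # Best configuration that stays under an (arbitrary) RT budget of 1.2.
    print(best_under_rt(RESULTS, 1.2))
    # Gain from iteration 4 to iteration 5 with the large LM at -5/-9.
    print(round(relative_wer_reduction(15.30, 14.17), 2))
```

The RT budget of 1.2 is an arbitrary illustration; any other threshold can be passed in the same way.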

Adaptation

Code is ready for direct adaptation, insertion adaptation and KL-regularized adaptation.
50 sentences for adaptation, 834 sentences for testing.
WER is reduced from 14.56% to 11.13%.
Hidden-layer adaptation is better than input and output adaptation.
Before-linear adaptation is better than after-linear adaptation.
Results are here: http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=158
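
The page does not say how the KL-regularized adaptation is formulated. One common formulation, assumed here, folds the KL term into the training targets: the one-hot alignment label is interpolated with the posterior of the unadapted (speaker-independent) model, and ordinary cross-entropy is then trained against that soft target. The sketch below shows only this target computation for a single frame; `kl_regularized_targets`, `cross_entropy`, the weight `rho` and the example posteriors are all illustrative assumptions, not the code referred to above.

```python
import math


def kl_regularized_targets(label, si_posterior, rho):
    """Interpolate the one-hot label with the unadapted model's posterior.

    label        -- index of the aligned state for this frame
    si_posterior -- posterior from the unadapted model (sums to 1)
    rho          -- regularization weight in [0, 1]; 0 means plain
                    cross-entropy, 1 means stay at the unadapted model
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    return [
        (1.0 - rho) * (1.0 if k == label else 0.0) + rho * p
        for k, p in enumerate(si_posterior)
    ]


def cross_entropy(targets, posterior, eps=1e-12):
    """Cross-entropy of a posterior against (possibly soft) targets."""
    return -sum(t * math.log(max(q, eps)) for t, q in zip(targets, posterior))


if __name__ == "__main__":
    si = [0.7, 0.2, 0.1]        # made-up unadapted posterior
    adapted = [0.5, 0.4, 0.1]   # made-up adapted posterior
    targets = kl_regularized_targets(label=1, si_posterior=si, rho=0.5)
    print(targets, cross_entropy(targets, adapted))
```

With only 50 adaptation sentences, a larger `rho` keeps the adapted model closer to the original one; the value actually used is not stated on this page.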

DNN Decoder

Comparison between CLG and HCLG decoder

CLG decoder uses less memory in decoding
HCLG is faster and more accurate than CLG, and more amenable to beam control (details: http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=156).

Faster decoder

std::exp/std::log were very slow on train203. The problem was solved by replacing them with the standard exp() and log().
The RT of the latest decoder on train203 is 0.25.
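
Assuming RT here means the real-time factor (decoding time divided by audio duration), a figure such as 0.25 can be measured as sketched below. `real_time_factor` and `measure_rtf` are hypothetical helpers, and `decode_fn` stands for whatever call runs the decoder on a test set.

```python
import time


def real_time_factor(decode_seconds, audio_seconds):
    """Real-time factor: time spent decoding per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


def measure_rtf(decode_fn, audio_seconds):
    """Time one call of decode_fn and return its real-time factor."""
    start = time.perf_counter()
    decode_fn()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


if __name__ == "__main__":
    # 900 s of decoding for 3600 s of audio corresponds to RT = 0.25.
    print(real_time_factor(900.0, 3600.0))
```

A value below 1 means decoding runs faster than real time on that machine.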

Online decoder

Chao will focus on interface change and CMN adaptation.

@@ Line 3: / Line 3: @@
 ==Environment setting==
-* Another 3 3T disks are ready for RADI-0.
+* Another 3 3T disks are ready for RAID-0.
-* Another GPU machine was purchased.
+* Another GPU machine was purchased. Add 4 3T disks to construct RAID-0.
 ==Corpora==
-* Scripts for confidence generation is ready for auto transcription
+* Scripts for confidence generation is ready for auto transcription.
-* 300h telephone speech data (Sinovoice recording) were done
+* 300h telephone speech data (Sinovoice recording) were done.
+* Adaptation data 900 sentences ready.
 ==470 hour 8k training==
@@ Line 29: / Line 29: @@
 * Ran CE DNN to iteration 5 (8400 states, 80000 pdf)
-* Testing results go down to 13% WER.
+* Testing results go down to 13.77% WER (Sinovoice results: 11.78).
 {| class="wikitable"
 ! Model !! WER !! RT
 |-
-|small LM, it 4,  -5/-9  ||15.80 || -
+|small LM, it 4,  -5/-9  ||15.80 || 1.18
 |-
-|large LM, it 4, -5/-9   || 15.30 || -
+|large LM, it 4, -5/-9   || 15.30 || 1.50
 |-
-|large LM,  it 4, -6/-9   || 15.36 || -
+|large LM,  it 4, -6/-9   || 15.36 || 1.30
 |-
-|large LM, it 4, -7/-9    || 15.25 || -
+|large LM, it 4, -7/-9    || 15.25 || 1.30
 |-
-|large LM, it 5, -5/-9    || 14.17 || -
+|large LM, it 5, -5/-9    || 14.17 || 1.10
 |-
-|large LM, it 5,  -5/-10 || 13.77 || -
+|large LM, it 5,  -5/-10 || 13.77 || 1.29
 |-
 |}
 ==Adaptation==
+* Code ready for direct adaptation, insertion adaptation and KL-regularized adaptatoin
+* 50 sentences for adaptation, 834 sentences for testing
+* WER  from 14.56 to 11.13
+* Hidden layer adaptation is better than input and output adaptation
+* Before Linear adaptation is better than after-linear adaptation
+* Results are [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=158 here]
 =DNN Decoder=
@@ Line 54: / Line 61: @@
 * Comparison between CLG and HCLG decoder
 :* CLG decoder uses less memory in decoding
-:* HCLG is faster and more accurate than HCLG, and more amiable to beam control [here http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=156]
+:* HCLG is faster and more accurate than CLG, and more amiable to beam control [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=156 here]
+* Faster decoder
 :* std::exp/std::log result in very slow computation in train203. Solved the problem by replacing to standard exp() and log().
+:* The RT of the latest decoder on train203 is 0.25
+* Online decoder
+:* Chao will focus on interface change and CMN adaptation.

Difference between revisions of “Sinovoice-2014-12-10”
