<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://index.cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Sinovoice-2014-02-17</id>
	<title>Sinovoice-2014-02-17 - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Sinovoice-2014-02-17"/>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=Sinovoice-2014-02-17&amp;action=history"/>
		<updated>2026-04-15T03:50:12Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php?title=Sinovoice-2014-02-17&amp;diff=9180&amp;oldid=prev</id>
	<title>06:59, 17 February 2014 Cslt</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=Sinovoice-2014-02-17&amp;diff=9180&amp;oldid=prev"/>
				<updated>2014-02-17T06:59:41Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;tr style='vertical-align: top;'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;Revision as of 06:59, 17 February 2014&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 66:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 66:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Cross-entropy regularization with P=0.3 works reasonably well.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Cross-entropy regularization with P=0.3 works reasonably well.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Results are [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&amp;amp;cvssid=158 here]&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Results are [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&amp;amp;cvssid=158 here]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;==Auto Transcription==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* PICC development set decoding obtained 45% WER.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* PICC training-set decoding is done (200h); confidence scores have been generated.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Setting the confidence threshold to 0.9 reduces the training data from 230k sentences to 40k.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Do discriminative training with the filtered 40k sentences and test on the development set&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=DNN Decoder=&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=DNN Decoder=&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Cslt</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php?title=Sinovoice-2014-02-17&amp;diff=9179&amp;oldid=prev</id>
		<title>Cslt: Created page with "=DNN training=  ==Environment setting==  * 2nd GPU machine is ready. 3T * 4 RAID-0 is fast enough.   * The new machine has been added into the SGE env.  ==Corpora== * B..."</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=Sinovoice-2014-02-17&amp;diff=9179&amp;oldid=prev"/>
				<updated>2014-02-17T06:51:09Z</updated>
		
		<summary type="html">&lt;p&gt;以内容“=DNN training=  ==Environment setting==  * 2nd GPU machine is ready. 3T * 4 RAID-0 is fast enough.   * The new machine has been added into the SGE env.  ==Corpora== * B...”创建新页面&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;=DNN training=&lt;br /&gt;
&lt;br /&gt;
==Environment setting==&lt;br /&gt;
&lt;br /&gt;
* The 2nd GPU machine is ready; its 3T * 4 RAID-0 storage is fast enough.&lt;br /&gt;
* The new machine has been added to the SGE environment.&lt;br /&gt;
&lt;br /&gt;
==Corpora==&lt;br /&gt;
* The 120h of Beijing Mobile speech data are ready.&lt;br /&gt;
* PICC data (200h) are being labeled and will be ready in two weeks.&lt;br /&gt;
* In total, 1100h of telephone speech will be available soon.&lt;br /&gt;
&lt;br /&gt;
==470 hour 8k training==&lt;br /&gt;
&lt;br /&gt;
* 470 + 300h + Beijing mobile 120h training&lt;br /&gt;
:* Re-train all the models, including GMM and DNN, with the noise model involved.&lt;br /&gt;
:* Train the noise model by treating noise as a special phone.&lt;br /&gt;
:* Noise should be handled explicitly when constructing the lexicon FST (L).&lt;br /&gt;
:* At 7.2h per iteration, the xEnt training should finish in about one week.&lt;br /&gt;
:* Run incremental DT training on the CSLT cluster, by mapping noise to the silence phone.&lt;br /&gt;
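The noise-handling steps above can be sketched as follows. This is a minimal illustration, not the recipe's actual code; the symbols "[NOISE]", "NSN", and "SIL" are hypothetical stand-ins for whatever identifiers the lexicon really uses.

```python
# Sketch: give noise its own word and phone in the lexicon (used when
# building L), then fold the noise phone back into silence for the
# incremental DT pass. All symbol names here are illustrative.

def add_noise_entry(lexicon):
    """Append a noise word that maps to a dedicated noise phone."""
    lexicon = dict(lexicon)
    lexicon["[NOISE]"] = ["NSN"]  # NSN: a special phone with its own HMM
    return lexicon

def phone_map_for_dt(phones):
    """For DT training, map the noise phone to the silence phone."""
    return {p: ("SIL" if p == "NSN" else p) for p in phones}

lex = add_noise_entry({"hello": ["h", "e", "l", "o"]})
pmap = phone_map_for_dt(["SIL", "NSN", "h"])
```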
&lt;br /&gt;
&lt;br /&gt;
==6000 hour 16k training==&lt;br /&gt;
&lt;br /&gt;
* Ran CE DNN training to iteration 8 (8400 states, 80000 pdfs).&lt;br /&gt;
* The test WER has come down to 12.69% (Sinovoice's result: 10.70%).&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Model !! WER (%) !! RT&lt;br /&gt;
|-&lt;br /&gt;
|small LM, it 4,  -5/-9  ||15.80 || 1.18&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 4, -5/-9   || 15.30 || 1.50&lt;br /&gt;
|-&lt;br /&gt;
|large LM,  it 4, -6/-9   || 15.36 || 1.30&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 4, -7/-9    || 15.25 || 1.30&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 5, -5/-9    || 14.17 || 1.10&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 5,  -5/-10 || 13.77 || 1.29&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 6, -5/-9    || 13.64 || -&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 6,  -5/-10 || 13.25 || -&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 7, -5/-9    || 13.29 || -&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 7,  -5/-10 || 12.87 || -&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 8, -5/-9    || 13.09 || -&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 8,  -5/-10 || 12.69 || -&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
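For reference, the WER figures in the table are presumably computed as word-level edit distance against the reference transcription; a minimal self-contained sketch (the function name is illustrative):

```python
def wer(ref, hyp):
    """Word error rate: (sub + ins + del) / len(ref), via edit distance."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(r)][len(h)] / len(r)
```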
&lt;br /&gt;
* A new round of training, with shared trees for tone variations, has been kicked off and has reached the DNN training stage again.&lt;br /&gt;
* Need to test the new GMM model and compare it with Xiaoming's original settings.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Adaptation==&lt;br /&gt;
&lt;br /&gt;
* Adaptation with 10, 20, and 30 sentences was conducted.&lt;br /&gt;
* 30 sentences reach reasonable performance (WER from 14.6% to 11.2%).&lt;br /&gt;
* Hidden-layer adaptation is better than input- or output-layer adaptation.&lt;br /&gt;
* Cross-entropy regularization with P=0.3 works reasonably well.&lt;br /&gt;
* Results are [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&amp;amp;cvssid=158 here]&lt;br /&gt;
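Assuming "P=0.3" denotes the interpolation weight of the cross-entropy regularizer (i.e. the adaptation targets interpolate the hard alignment label with the speaker-independent model's posteriors, as in KL-regularized adaptation), a minimal sketch of the target construction:

```python
# Sketch of cross-entropy-regularized adaptation targets, assuming
# P=0.3 is the interpolation weight between the hard alignment label
# and the speaker-independent (SI) model's posteriors.
def regularized_targets(hard_label, si_posteriors, p=0.3):
    """Return (1 - p) * one_hot(hard_label) + p * si_posteriors."""
    n = len(si_posteriors)
    one_hot = [1.0 if k == hard_label else 0.0 for k in range(n)]
    return [(1.0 - p) * one_hot[k] + p * si_posteriors[k] for k in range(n)]

# Toy example over 3 tied states, hard label = state 1.
targets = regularized_targets(1, [0.2, 0.5, 0.3])
```

With valid SI posteriors the interpolated targets still sum to one, so they can be used directly as soft labels in the cross-entropy objective.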
&lt;br /&gt;
=DNN Decoder=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Faster decoder&lt;br /&gt;
:* The new RT is reported [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&amp;amp;cvssid=160 here]&lt;br /&gt;
:* The RT of the latest decoder on train203 is 0.144 (HCLG) and 0.148 (CLG).&lt;br /&gt;
&lt;br /&gt;
* Online decoder&lt;br /&gt;
:* Interface design completed&lt;br /&gt;
:* The CMN strategy is clear: (1) train a global CMN model first; (2) apply the model directly in decoding; (3) the DNN model may need slight adaptation with the resulting features.&lt;/div&gt;</summary>
		<author><name>Cslt</name></author>	</entry>

	</feed>