Sinovoice-2014-12-10


=DNN training=

==Environment setting==

* Another three 3T disks are ready for RAID-0.
* Another GPU machine was purchased; four 3T disks will be added to construct RAID-0.

==Corpora==

* Scripts for confidence generation are ready for automatic transcription.
* 300 hours of telephone speech data (Sinovoice recording) are done.
* Adaptation data: 900 sentences are ready.

==470 hour 8k training==

* 300h incremental training (IT) done.

{| class="wikitable"
! Model !! CE !! MPE1 !! MPE2 !! MPE3 !! MPE4
|-
| 4k states || 23.27/22.85 || 21.35/18.87 || 21.18/18.76 || 21.07/18.54 || 20.93/18.32
|-
| 8k states || 22.16/22.22 || 20.55/18.03 || 20.36/17.94 || 20.32/17.78 || 20.29/17.80
|-
| 8k states + IT || - || 20.04/17.38 || 20.01/17.32 || 20.07/17.44 || 19.94/17.65
|}

==6000 hour 16k training==

* Ran CE DNN to iteration 5 (8400 states, 80000 pdfs).
* Testing results go down to 13.77% WER (Sinovoice result: 11.78%); see the note after the table for the WER/RT definitions.

{| class="wikitable"
! Model !! WER !! RT
|-
| small LM, it 4, -5/-9 || 15.80 || 1.18
|-
| large LM, it 4, -5/-9 || 15.30 || 1.50
|-
| large LM, it 4, -6/-9 || 15.36 || 1.30
|-
| large LM, it 4, -7/-9 || 15.25 || 1.30
|-
| large LM, it 5, -5/-9 || 14.17 || 1.10
|-
| large LM, it 5, -5/-10 || 13.77 || 1.29
|}
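For reference, WER and RT in these tables are the standard word error rate and real-time factor:

<math>\mathrm{WER} = \frac{S + D + I}{N} \times 100\%</math>

where S, D and I are the numbers of substituted, deleted and inserted words with respect to the reference transcript and N is the number of reference words; RT is decoding time divided by audio duration, so RT < 1 means faster than real time.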

Adaptation

  • Code ready for direct adaptation, insertion adaptation and KL-regularized adaptatoin
  • 50 sentences for adaptation, 834 sentences for testing
  • WER from 14.56 to 11.13
  • Hidden layer adaptation is better than input and output adaptation
  • Before Linear adaptation is better than after-linear adaptation
  • Results are here
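A minimal sketch of the KL-regularized objective, assuming the usual KL-divergence regularized DNN adaptation formulation (the interpolation weight <math>\rho</math> is illustrative, not a value from the experiments above). The adaptation targets are smoothed with the outputs of the unadapted speaker-independent (SI) model,

<math>\hat{p}(s \mid x_t) = (1-\rho)\,\tilde{p}(s \mid x_t) + \rho\, p^{\mathrm{SI}}(s \mid x_t),</math>

and the adapted network <math>p(s \mid x_t)</math> is trained with cross entropy against <math>\hat{p}</math>. Up to a constant this equals <math>(1-\rho)</math> times the ordinary cross-entropy loss plus <math>\rho \sum_t \mathrm{KL}\!\left(p^{\mathrm{SI}}(\cdot \mid x_t) \,\|\, p(\cdot \mid x_t)\right)</math>, which keeps the adapted model close to the SI model when only a small amount of adaptation data (e.g. 50 sentences) is available; <math>\tilde{p}(s \mid x_t)</math> is the (typically one-hot) target from the adaptation-data alignment.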

=DNN Decoder=

* Comparison between the CLG and HCLG decoders:
:* The CLG decoder uses less memory in decoding.
:* The HCLG decoder is faster and more accurate than CLG, and more amenable to beam control [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=156 here].
* Faster decoder:
:* std::exp/std::log resulted in very slow computation on train203; the problem was solved by replacing them with the standard C exp() and log() (see the sketch after this list).
:* The RT of the latest decoder on train203 is 0.25.
* Online decoder:
:* Chao will focus on the interface change and CMN adaptation.
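A minimal sketch of the exp/log replacement described above; the function and variable names are illustrative and not taken from the actual decoder code:

<pre>
#include <math.h>   // plain C exp()/log(), as used in the fix above

// Illustrative example: numerically stable log(exp(a) + exp(b)), a typical
// hot spot in forward-backward / lattice posterior computation.
double LogAdd(double a, double b) {
  if (a < b) { double t = a; a = b; b = t; }   // ensure a >= b
  // before the fix: return a + std::log(1.0 + std::exp(b - a));
  return a + log(1.0 + exp(b - a));            // after: C library exp()/log()
}
</pre>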