Difference between revisions of "Sinovoice-2014-01-20"

From cslt Wiki
(Created page with "=Project management= * Xiaoming and Xiao Na were added into the mail list * Potential Huawei conference-transcribing project was discussed =DNN training= ==Environme...")
 
Line 1: Line 1:
=Project management=
 
 
* Xiaoming and Xiao Na were added to the mailing list
 
* Potential Huawei conference-transcribing project was discussed
 
 
 
=DNN training=
 
=DNN training=
  
 
==Environment setting==
 
==Environment setting==
  
* New disk space (3T) was created and mounted at /nfs/disk1
+
* Cluster accounts rearrangement
* Jobs with 100 threads work fine on the cluster
+
* Withdraw root/sudo privilege
 +
* Changed the NFS server to 40 processes, hoping to increase the disk reading speed
 +
* Create a RAID-0 with 3 or 4 3T disks
  
 
==Corpora==
 
==Corpora==
* 60 hours of data were cut this week
+
* Change the data labeling strategy: do not label gender and the length of noise in the rest of the corpora.
* Just send out to vendors for labeling
+
* Automatic labeling
* Waiting for out-source platform construction
+
:* Xiaoming will work with Zhiyong to work out how to generate transcriptions with embedded confidence scores.
* We assume 60 hours of data per week in future
+
:* The first step is to investigate the raw accuracy on the domain-dependent test, and then judge the quality of automatic labeling
  
 
==470 hour 8k training==
 
==470 hour 8k training==
  
* CE training done
+
* MPE training done
* MPE training partially done
+
  
 
{| class="wikitable"
 
{| class="wikitable"
 
! Model !! CE !! MPE1!! MPE2 !! MPE3 !! MPE4
 
! Model !! CE !! MPE1!! MPE2 !! MPE3 !! MPE4
 
|-
 
|-
|4k states||23.27/22.85 || 21.35/18.87 || 21.18/18.76 || 21.07/18.54
+
|4k states ||23.27/22.85 || 21.35/18.87 || 21.18/18.76 || 21.07/18.54 || 20.93/18.32
 
|-
 
|-
|8k states ||22.16/22.22 || - ||20.36/17.94 || - ||
+
|8k states ||22.16/22.22 || 20.55/18.03 ||20.36/17.94 || 20.32/17.78 || 20.29/17.80
 
|-
 
|-
 
|}
 
|}
Line 33: Line 29:
 
==6000 hour 16k training==
 
==6000 hour 16k training==
  
* Audio files ready. Files with incorrect sampling rates were removed
+
* Feature extraction done; three problems in the data were solved: (1) short waveforms, (2) unmatched file lengths, (3) unmatched sample rates
* Lexicon and LM were ready
+
* Training has reached tri4b, with a quick increase in the number of states/pdfs
* Making MFCC features
+
* DNN training can start on Tuesday
* Initial model (6 iterations etc) can be delivered before the spring holiday
+
  
 
=DNN Decoder=
 
=DNN Decoder=
* The initial trial of a DNN decoder based on the Sinovoice code failed, largely due to the FST compiler
+
 
* Change the strategy to an integrated approach: use the Sinovoice system to control connections, and use Kaldi as the base of the ASR engine
+
* Sinovoice decoder: some errors in FST building. Many triphones are lost after graph building. Problems in cdgen?
* Xiaoming will do some investigation on the Sinovoice FST compiler, while Liu Chao will focus on the Kaldi-based decoder
+
* Kaldi decoder:  
 +
:* A minor difference between the CLG and HCLG results was found; debugging is under way.
 +
:* CLG RT is comparable to the HCLG RT, 0.3-0.4 on CSLT grid-2.
 +
:* Additional optimization on pdf-pre-computing will be investigated.
 +
:* Code will be delivered today.

Revision as of 07:43, 20 January 2014 (Monday)

DNN training

Environment setting

  • Cluster accounts rearrangement
  • Withdraw root/sudo privilege
  • Changed the NFS server to 40 processes, hoping to increase the disk reading speed
  • Create a RAID-0 with 3 or 4 3T disks

Corpora

  • Change the data labeling strategy: do not label gender and the length of noise in the rest of the corpora.
  • Automatic labeling
      • Xiaoming will work with Zhiyong to work out how to generate transcriptions with embedded confidence scores.
      • The first step is to investigate the raw accuracy on the domain-dependent test, and then judge the quality of automatic labeling
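One way the confidence scores mentioned above might be used is to keep only automatically labeled utterances whose average word confidence is high enough. The following is a minimal sketch under stated assumptions: the data layout (utterance id mapped to a list of word/confidence pairs), the function name `select_confident`, and the 0.9 threshold are all hypothetical and are not taken from the notes.

```python
from statistics import mean


def select_confident(utterances, threshold=0.9):
    """Return ids of utterances whose mean word confidence is >= threshold.

    `utterances` maps an utterance id to a list of (word, confidence) pairs.
    This layout is an assumption made for illustration only.
    """
    kept = []
    for utt_id, words in utterances.items():
        if not words:
            # An empty transcription carries no evidence, so it is skipped.
            continue
        avg = mean(conf for _, conf in words)
        if avg >= threshold:
            kept.append(utt_id)
    return kept


# Hypothetical demo data, not real decoder output.
demo = {
    "utt1": [("hello", 0.98), ("world", 0.95)],
    "utt2": [("noisy", 0.40), ("speech", 0.70)],
    "utt3": [],
}

if __name__ == "__main__":
    print(select_confident(demo))
```

The threshold would have to be tuned against the raw accuracy measured on the domain-dependent test.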

470 hour 8k training

  • MPE training done
Model      CE           MPE1         MPE2         MPE3         MPE4
4k states  23.27/22.85  21.35/18.87  21.18/18.76  21.07/18.54  20.93/18.32
8k states  22.16/22.22  20.55/18.03  20.36/17.94  20.32/17.78  20.29/17.80
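To compare the CE and MPE4 columns, the relative change of each figure can be computed directly. The sketch below assumes that every cell holds two error rates in percent (one per test set, lower is better); the notes do not state the metric or the test sets, so this reading is an assumption, and the helper names are hypothetical.

```python
def parse_pair(cell):
    """Split a cell such as '23.27/22.85' into two floats."""
    first, second = cell.split("/")
    return float(first), float(second)


def relative_reduction(before, after):
    """Relative reduction in percent when going from `before` to `after`."""
    return 100.0 * (before - after) / before


# CE and MPE4 cells copied from the table above.
results = {
    "4k states": ("23.27/22.85", "20.93/18.32"),
    "8k states": ("22.16/22.22", "20.29/17.80"),
}


def summarize(table):
    """Map each model to its pair of relative reductions (CE -> MPE4)."""
    summary = {}
    for model, (ce_cell, mpe_cell) in table.items():
        ce = parse_pair(ce_cell)
        mpe = parse_pair(mpe_cell)
        summary[model] = tuple(
            relative_reduction(b, a) for b, a in zip(ce, mpe)
        )
    return summary


if __name__ == "__main__":
    for model, (first, second) in summarize(results).items():
        print(f"{model}: {first:.2f}% / {second:.2f}%")
```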

6000 hour 16k training

  • Feature extraction done; three problems in the data were solved: (1) short waveforms, (2) unmatched file lengths, (3) unmatched sample rates
  • Training has reached tri4b, with a quick increase in the number of states/pdfs
  • DNN training can start on Tuesday
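Two of the data problems listed above (too-short audio and unexpected sample rates) can be screened for before feature extraction. The sketch below uses only Python's standard `wave` module and is not the procedure actually used by the team; the function names, the 0.5 second minimum, and the demo files are assumptions for illustration. The "unmatched file length" problem is not covered, because the notes do not say what the length is matched against.

```python
import os
import tempfile
import wave


def check_wav(path, expected_rate=16000, min_seconds=0.5):
    """Return a list of problems found in a WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        n_frames = wav.getnframes()
    if rate != expected_rate:
        problems.append(f"sample rate {rate} Hz, expected {expected_rate} Hz")
    duration = n_frames / rate
    if duration < min_seconds:
        problems.append(f"duration {duration:.3f} s is below {min_seconds} s")
    return problems


def write_silence(path, rate, seconds):
    """Write a mono 16-bit WAV file of silence (demo helper only)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))


def demo():
    """Create three small files in a temp directory and check each one."""
    report = {}
    with tempfile.TemporaryDirectory() as tmp:
        cases = {
            "ok.wav": (16000, 1.0),
            "bad_rate.wav": (8000, 1.0),
            "short.wav": (16000, 0.1),
        }
        for name, (rate, seconds) in cases.items():
            path = os.path.join(tmp, name)
            write_silence(path, rate, seconds)
            report[name] = check_wav(path)
    return report


if __name__ == "__main__":
    for name, problems in demo().items():
        print(name, problems or "ok")
```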

DNN Decoder

  • Sinovoice decoder: some errors in FST building. Many triphones are lost after graph building. Problems in cdgen?
  • Kaldi decoder:
      • A minor difference between the CLG and HCLG results was found; debugging is under way.
      • CLG RT is comparable to the HCLG RT, 0.3-0.4 on CSLT grid-2.
      • Additional optimization on pdf-pre-computing will be investigated.
      • Code will be delivered today.
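The RT figures above are presumably real-time factors, that is, decoding time divided by audio duration, so that 0.3-0.4 would mean decoding takes roughly a third of the audio length. A minimal sketch of how such a figure could be measured follows; `decode_fn` stands in for any decoder call and is hypothetical, as are the function names.

```python
import time


def real_time_factor(decode_seconds, audio_seconds):
    """Real-time factor: processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


def timed_rtf(decode_fn, audio, audio_seconds):
    """Run `decode_fn(audio)` and return (result, real-time factor).

    `decode_fn` is a placeholder for a decoder call, not a Kaldi API.
    """
    start = time.perf_counter()
    result = decode_fn(audio)
    elapsed = time.perf_counter() - start
    return result, real_time_factor(elapsed, audio_seconds)


if __name__ == "__main__":
    # 35 s of decoding for 100 s of audio gives a factor of 0.35.
    print(real_time_factor(35.0, 100.0))
```

A value below 1.0 means the decoder runs faster than real time on the machine where it was measured.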