Sinovoice-2014-02-25


DNN training

Environment setting

  • Two queues: 100.q dedicated to decoding, all.q dedicated to GMM training/MPE lattice generation (queue routing is sketched after this list)
  • disk203-disk210: distributed disks, for parallel jobs
  • /nfs/disk1: train212 GPU task; /nfs/disk2: train215 GPU task
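
The queue split above can be illustrated with a minimal job-routing sketch, assuming an SGE-style scheduler (the helper name, example script names, and the log path are illustrative, not the cluster's actual layout):

<pre>
import subprocess

# Decoding jobs go to the dedicated 100.q queue; GMM training and
# MPE lattice-generation jobs go to all.q.
QUEUES = {"decode": "100.q", "train": "all.q"}

def submit(job_script, kind, log_dir="/tmp/logs"):
    """Submit job_script to the queue reserved for this kind of work.
    log_dir is an example path, not the cluster's real layout."""
    cmd = ["qsub", "-q", QUEUES[kind], "-o", log_dir, "-e", log_dir, job_script]
    subprocess.run(cmd, check=True)

# submit("decode_large_lm.sh", "decode")    # runs on 100.q
# submit("make_mpe_lattices.sh", "train")   # runs on all.q
</pre>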

Corpora

  • PICC data (200h) are being labelled and will be ready in one week.
  • 105h data from BJ mobile
  • In total, 1121h (470 + 346 + 105 + 200) of telephone speech will be ready soon (see the arithmetic check after this list).
  • 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data
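
As a quick sanity check, the hour counts listed above reproduce the quoted totals:

<pre>
# Telephone speech listed above: 470h + 346h + 105h + 200h (PICC)
telephone_hours = 470 + 346 + 105 + 200
print(telephone_hours)   # 1121

# 16k data: DataTang online + online mobile + recording data
wideband_hours = 978 + 656 + 4300
print(wideband_hours)    # 5934, i.e. roughly the quoted 6000h
</pre>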

Telephone model training

470h + 300h + BJ mobile 105h training

(1) 105h BJ mobile re-training without NOISE: 33.97% WER

(2) 105h BJ mobile re-training with the NOISE phone in training, but decoding without NOISE: 34.27% WER

(3) (2) + noise decoding (with the noise phone in the lexicon/LM), still under investigation

BJ mobile incremental training

(1) Original 470h + 300h model: 30.24% WER

(2) Incremental DT training with 105h BJ data: 27.01% WER

6000 hour 16k training

Training progress

  • Ran CE DNN to iteration 11 (8400 states, 80000 pdf)
  • Test WER has come down to 12.46% (Sinovoice result: 10.46%); full results are in the table below, and the WER/RT columns are explained in the sketch after it.
Model                       WER(%)   RT
small LM, it 4,  -5/-9      15.80    1.18
large LM, it 4,  -5/-9      15.30    1.50
large LM, it 4,  -6/-9      15.36    1.30
large LM, it 4,  -7/-9      15.25    1.30
large LM, it 5,  -5/-9      14.17    1.10
large LM, it 5,  -5/-10     13.77    1.29
large LM, it 6,  -5/-9      13.64    1.12
large LM, it 6,  -5/-10     13.25    1.33
large LM, it 7,  -5/-9      13.29    1.12
large LM, it 7,  -5/-10     12.87    1.17
large LM, it 8,  -5/-9      13.09    -
large LM, it 8,  -5/-10     12.69    -
large LM, it 9,  -5/-9      12.87    -
large LM, it 9,  -5/-10     12.55    -
large LM, it 10, -5/-9      12.83    1.51
large LM, it 10, -5/-10     12.48    1.65
large LM, it 11, -5/-9      12.87    1.61
large LM, it 11, -5/-10     12.46    1.28
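
For reference, WER in the table is the word-level edit distance between the reference and the hypothesis divided by the number of reference words, and RT is decoding time divided by audio duration. A minimal sketch (function names are illustrative; these are not the actual scoring scripts):

<pre>
def wer(ref_words, hyp_words):
    """Word error rate = (substitutions + deletions + insertions) / len(ref),
    via the usual Levenshtein alignment over words."""
    n, m = len(ref_words), len(hyp_words)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # deleting all reference words
    for j in range(m + 1):
        d[0][j] = j                      # inserting all hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[n][m] / n

def real_time_factor(decode_seconds, audio_seconds):
    """RT < 1.0 means decoding runs faster than real time."""
    return decode_seconds / audio_seconds

# e.g. the "large LM, it 11, -5/-10" row corresponds to
# wer(ref, hyp) ~= 0.1246 over the test set at RT ~= 1.28.
</pre>
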
  • Additional xEnt training with DNN alignment should be completed in 2 days.
  • DT training is still in the queue, waiting for lattice generation.
  • The first version of the DT model will be trained with online data (1700h).

Training Analysis

  • Shared-tree GMM model training completed; WER% is similar to the non-shared model.
  • Selected 100h of online data and trained two systems: (1) a di-syllable system and (2) a jt-phone system; WER% by MPE iteration is compared below.
        di-syl      jt-ph
Xent    15.42%      14.78%       
MPE1    14.46%      14.23%
MPE2    14.22%      14.09%
MPE3    14.26%      13.80%
MPE4    14.24%      13.68%

Auto Transcription

  • PICC development set decoding obtained 45% WER.
  • PICC auto-transcription incremental DT training completed; WER by confidence threshold:
Threshold        WER
org (baseline)   45.03%
0.9              41.89%
0.8              41.64%
  • With the 0.8 threshold, the selected training data involve 80k sentences, amounting to about 60h of data (the selection step is sketched below).
  • Sampling 60h of labelled data to enrich the training.
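
The threshold-based selection above can be sketched as follows; the line format (utterance id, confidence, duration in seconds, words) is an assumption for illustration, not the decoder's actual output format:

<pre>
def select_auto_trans(lines, threshold=0.8):
    """Keep auto-transcribed utterances whose confidence passes the threshold."""
    kept, total_seconds = [], 0.0
    for line in lines:
        utt_id, conf, dur, *words = line.split()
        if float(conf) >= threshold:
            kept.append((utt_id, words))
            total_seconds += float(dur)
    return kept, total_seconds / 3600.0   # selected utterances, total hours

# With threshold 0.8, the report above keeps ~80k sentences (~60h of data).
</pre>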

DNN Decoder

  • Online decoder:
  • Integration almost completed.
  • Initial CMN implementation finished.
  • The first step is to tune the prior probability of the global CMN, and then consider re-training with DT (a sketch of CMN with a global prior follows this list).
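
A minimal sketch of the last two bullets: online CMN where the per-utterance mean is smoothed toward a global mean with a tunable prior weight (variable names and the exact smoothing rule are assumptions; the decoder's actual implementation may differ):

<pre>
import numpy as np

def online_cmn(frames, global_mean, prior_count=100.0):
    """Cepstral mean normalization for an online decoder.

    frames:      (T, D) array of cepstral feature frames
    global_mean: (D,) mean estimated offline over training data
    prior_count: weight of the global mean; larger values trust the global
                 prior longer before the utterance's own statistics take over
                 (the knob referred to as the prior above)
    """
    normalized = np.empty_like(frames, dtype=float)
    running_sum = np.zeros(frames.shape[1])
    for t, frame in enumerate(frames):
        running_sum += frame
        # Interpolate the global prior with the frames seen so far.
        mean = (prior_count * global_mean + running_sum) / (prior_count + t + 1)
        normalized[t] = frame - mean
    return normalized
</pre>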