Difference between revisions of "Sinovoice-2014-03-04"

From cslt Wiki

Revision as of 07:23, 4 March 2014 (Tuesday)

DNN training

Environment setting

  • Cluster power-off scheduled for 3/06 to construct RAID-0 on train212
  • The new RAID-0 uses three new disks on train212
  • Change NFS names: disk1 -> /nfs/disk212, the RAID disks -> /nfs/raid212, disk2 -> /nfs/raid215

Corpora

  • PICC data (200h) are being labeled, ready in one week.
  • 105h data from BJ mobile
  • 127h Hubei telecom
  • In total, 1121h (470 + 346 + 105 + 200) of telephone speech will be ready soon.
  • 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data
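
The hour counts above can be cross-checked with a few lines of Python. This is only a sanity check on the figures quoted in the list; the variable names are made up for illustration. It shows that the telephone total matches the stated 1121h, and that the "6000h" figure for the 16k data is a round number for 5934h.

```python
# Corpus sizes in hours, copied from the list above.
telephone_hours = [470, 346, 105, 200]
wideband_hours = {
    "DataTang online": 978,
    "online mobile": 656,
    "recording": 4300,
}

telephone_total = sum(telephone_hours)
wideband_total = sum(wideband_hours.values())

print(telephone_total)  # 1121
print(wideband_total)   # 5934
```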

Telephone model training

470 + 300h + BJ mobile 105h training

Training condition    NO NOISE   NOISE in LM   opt noise   NOISE LM + opt noise
No noise              30.61%     -             -           -
noise phone added     31.88%     30.76%        31.27%      31.07%


BJ mobile incremental training

(1) Original 470 + 300 model: 30.24% WER

MPE2     MPE3     MPE3+iLM   MPE4+iLM
27.01%   26.72%   25.09%     24.53%


6000 hour 16k training

Training progress

  • Ran CE DNN to iteration 11 (8400 states, 80000 pdf)
  • Test WER has dropped to 12.49% (Sinovoice result: 10.49%).
Model                      WER     RT
small LM, it 4, -5/-9      15.80   1.18
large LM, it 4, -5/-9      15.30   1.50
large LM, it 4, -6/-9      15.36   1.30
large LM, it 4, -7/-9      15.25   1.30
large LM, it 5, -5/-9      14.17   1.10
large LM, it 5, -5/-10     13.77   1.29
large LM, it 6, -5/-9      13.64   1.12
large LM, it 6, -5/-10     13.25   1.33
large LM, it 7, -5/-9      13.29   1.12
large LM, it 7, -5/-10     12.87   1.17
large LM, it 8, -5/-9      13.09   -
large LM, it 8, -5/-10     12.69   -
large LM, it 9, -5/-9      12.87   -
large LM, it 9, -5/-10     12.55   -
large LM, it 10, -5/-9     12.83   1.51
large LM, it 10, -5/-10    12.48   1.65
large LM, it 11, -5/-9     12.87   1.61
large LM, it 11, -5/-10    12.46   1.28
large LM, it 12, -5/-9     12.91   1.61
large LM, it 12, -5/-10    12.49   1.28
  • xEnt training is done
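
To see where the CE training levels off, the large-LM rows from iteration 5 onward can be scanned for the best WER under each decoding setting. The sketch below only re-reads the numbers in the table above; the helper name `best_by_setting` is hypothetical.

```python
# (iteration, setting, WER%) for the large-LM rows, iterations 5 to 12,
# copied from the table above.
rows = [
    (5, "-5/-9", 14.17), (5, "-5/-10", 13.77),
    (6, "-5/-9", 13.64), (6, "-5/-10", 13.25),
    (7, "-5/-9", 13.29), (7, "-5/-10", 12.87),
    (8, "-5/-9", 13.09), (8, "-5/-10", 12.69),
    (9, "-5/-9", 12.87), (9, "-5/-10", 12.55),
    (10, "-5/-9", 12.83), (10, "-5/-10", 12.48),
    (11, "-5/-9", 12.87), (11, "-5/-10", 12.46),
    (12, "-5/-9", 12.91), (12, "-5/-10", 12.49),
]


def best_by_setting(rows):
    """Return {setting: (iteration, wer)} for the lowest WER per setting.

    Ties keep the earliest iteration.
    """
    best = {}
    for iteration, setting, wer in rows:
        if setting not in best or wer < best[setting][1]:
            best[setting] = (iteration, wer)
    return best


best = best_by_setting(rows)
for setting, (iteration, wer) in best.items():
    print(setting, "best at iteration", iteration, "WER", wer)
```

On these numbers the minimum is reached at iteration 10 for -5/-9 and iteration 11 for -5/-10, with iteration 12 slightly worse in both cases.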

Training Analysis

  • Shared tree GMM model training completed; WER is similar to the non-shared model.
  • Selected 100h online data, trained two systems: (1) di-syllable system (2) jt-phone system
        di-syl      jt-ph
Xent    15.42%      14.78%       
MPE1    14.46%      14.23%
MPE2    14.22%      14.09%
MPE3    14.26%      13.80%
MPE4    14.24%      13.68%
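
The gap between the two systems is easier to compare as a relative WER reduction. The snippet below is a small illustration using the table values; `relative_reduction` is a hypothetical helper, not part of any toolkit.

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction of new_wer over baseline_wer, in percent."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# (stage, di-syl WER%, jt-ph WER%), copied from the table above.
results = [
    ("Xent", 15.42, 14.78),
    ("MPE1", 14.46, 14.23),
    ("MPE2", 14.22, 14.09),
    ("MPE3", 14.26, 13.80),
    ("MPE4", 14.24, 13.68),
]

for stage, di_syl, jt_ph in results:
    print(f"{stage}: {relative_reduction(di_syl, jt_ph):.2f}% relative")
```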

Hybrid training

  • Recipe
  • 100h MPE training
  • 1700h MPE alignment/lattice
  • 1700h MPE training
  • 1 week to complete 3 MPE iterations
  • MPE2 result: 1e-9: 10.67% (8.61%), 1e-10: 10.34% (8.27%)


Auto Transcription

PICC

  • PICC development set decoding obtained 45% WER.
  • PICC auto-trans incremental DT training completed
Threshold  WER
org:     45.03%
0.9:     41.89%
0.8:     41.64%
  • The current training data selected at threshold 0.8 comprise 80k sentences, about 60h of data.
  • Sampling 60h labelled data to enrich the training
  • Prepare to compare the unsupervised incremental training and supervised training

Hubei telecom

  • Hubei telecom data (127h): retrieved 60k sentences with confidence threshold 0.9, amounting to 50%

            wer_14   wer_15
xEnt org    -        29.05
MPE iter1   29.23    29.38
MPE iter2   29.05    29.11
MPE iter3   29.32    29.28
MPE iter4   29.29    29.28

  • Retrieved 30k sentences with confidence threshold 0.95, amounting to 25%, plus the original 770h data

            wer_14   wer_15
xEnt org    -        29.05
MPE iter1   -        29.36
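
The selection step used for both PICC and Hubei telecom (keep automatically transcribed sentences whose confidence passes a threshold) can be sketched as follows. This is a minimal illustration, not the actual pipeline: the tuple layout, the utterance IDs, the confidence values, and the use of `>=` rather than `>` are all assumptions.

```python
def select_by_confidence(utterances, threshold):
    """Keep (utt_id, hypothesis) pairs whose confidence is at least threshold.

    utterances: iterable of (utt_id, hypothesis, confidence) tuples.
    """
    return [(utt_id, hyp) for utt_id, hyp, conf in utterances if conf >= threshold]


# Hypothetical decoder output: (utterance id, hypothesis, sentence confidence).
decoded = [
    ("utt001", "hyp a", 0.97),
    ("utt002", "hyp b", 0.91),
    ("utt003", "hyp c", 0.62),
    ("utt004", "hyp d", 0.95),
]

for threshold in (0.9, 0.95):
    kept = select_by_confidence(decoded, threshold)
    print(threshold, len(kept), f"{len(kept) / len(decoded):.0%} retained")
```

Raising the threshold trades data quantity for transcript reliability, which is the trade-off the 0.8 / 0.9 / 0.95 experiments above are probing.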


DNN Decoder

Online decoder

  • Various CMN implementation test
  • 200ms/500ms frame block adaptation
  • 10ms frame block adaptation: totally wrong
prior weight   -1      1       5       10      20      50      100
200ms          28.29   37.53   35.50   34.08   32.90   32.30   32.77
500ms          28.29   31.28   30.83   30.22   29.50   29.32   29.36
  • CMN code delivery
  • Online model adaptation
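
For readers unfamiliar with block-wise CMN, the sketch below shows one way a prior weight can enter the mean estimate for the 200ms/500ms frame block settings listed in this section. The notes do not say how the decoder combines the prior with the block statistics, so the pseudo-count interpolation here is an assumption, as are the function name, the 10 ms frame shift (20 frames per 200 ms block), and the toy data. The meaning of the "-1" setting is not modeled.

```python
def block_cmn(frames, block_size, prior_mean, prior_weight):
    """Subtract a per-block mean, smoothed toward prior_mean, from each frame.

    frames: list of feature vectors (lists of floats).
    block_size: frames per adaptation block, e.g. 20 for 200 ms
        at an assumed 10 ms frame shift.
    prior_mean: mean vector estimated beforehand (assumed available).
    prior_weight: pseudo-count given to prior_mean; must be >= 0 here.
    """
    dim = len(prior_mean)
    normalized = []
    for start in range(0, len(frames), block_size):
        block = frames[start:start + block_size]
        n = len(block)
        # Assumed estimate: prior counted as prior_weight extra frames.
        mean = [
            (prior_weight * prior_mean[d] + sum(f[d] for f in block))
            / (prior_weight + n)
            for d in range(dim)
        ]
        normalized.extend([f[d] - mean[d] for d in range(dim)] for f in block)
    return normalized


# Toy one-dimensional features, two blocks of two frames each.
toy = [[1.0], [3.0], [5.0], [7.0]]
print(block_cmn(toy, block_size=2, prior_mean=[0.0], prior_weight=0))
print(block_cmn(toy, block_size=2, prior_mean=[0.0], prior_weight=2))
```

Under this assumed formula, a larger prior weight pulls the block mean toward the prior, so short blocks have less influence on the estimate.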