2014-06-27

Resource Building

Leftover questions

  • Asymmetric window: great improvement on the training set (WER 34% → 24%), but the improvement is lost on the test set. A sketch of one common asymmetric-window construction follows this list.
  • Multi-GPU training: error encountered.
  • Multilingual training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
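
The notes give no details of the window shape; as a rough illustration only, one common asymmetric construction splices the rising half of a long Hann window to the falling half of a short one, shifting the peak toward the recent edge of the frame (all sizes below are assumptions):

    import numpy as np

    def asymmetric_window(n_left, n_right):
        # Rising half of a long Hann window followed by the falling half
        # of a short one; the peak sits near the right (recent) edge.
        left = np.hanning(2 * n_left)[:n_left]
        right = np.hanning(2 * n_right)[n_right:]
        return np.concatenate([left, right])

    win = asymmetric_window(200, 56)   # 256-sample frame, peak shifted right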

AM development

Sparse DNN

  • GA-based block sparsity (++++++++)


Noise training

  • Paper writing ongoing

GFbank

  • Running Sinovoice 8k 1400 + 100 mixture training.
  • FBank/GFbank, stream/non-stream MPE completed:
                                 Huawei 3rd batch   BJ mobile   8k English data
FBank non-stream (MPE4)               20.44%          22.28%        24.36%
FBank stream (MPE1)                   20.17%          22.50%        21.63%
GFbank stream (MPE4)                  20.69%          22.84%        24.45%
GFbank non-stream (MPE)                 -               -             -
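
For reference, a minimal sketch of gammatone-filterbank (GFbank-style) feature extraction; the 4th-order filters, ERB-rate spacing, frequency range, and frame sizes below are common defaults, not necessarily the configuration used in these experiments:

    import numpy as np

    def erb(f):
        # Equivalent rectangular bandwidth (Glasberg & Moore).
        return 24.7 * (4.37 * f / 1000.0 + 1.0)

    def erb_space(f_lo, f_hi, n):
        # n center frequencies equally spaced on the ERB-rate scale.
        e = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
        pts = np.linspace(e(f_lo), e(f_hi), n)
        return (10.0 ** (pts / 21.4) - 1.0) * 1000.0 / 4.37

    def gammatone_fir(fc, fs, n_taps=512, order=4):
        # Truncated gammatone impulse response used as FIR taps.
        t = np.arange(n_taps) / fs
        g = (t ** (order - 1) * np.exp(-2.0 * np.pi * 1.019 * erb(fc) * t)
             * np.cos(2.0 * np.pi * fc * t))
        return g / np.linalg.norm(g)

    def gfbank(sig, fs=8000, n_filters=40, frame_len=200, hop=80):
        # Log frame energies of the gammatone channel outputs,
        # framed the same way as FBank features.
        feats = []
        for fc in erb_space(100.0, fs / 2.0 - 100.0, n_filters):
            y = np.convolve(sig, gammatone_fir(fc, fs), mode="same") ** 2
            n_frames = 1 + (len(y) - frame_len) // hop
            e = [y[i * hop:i * hop + frame_len].sum() for i in range(n_frames)]
            feats.append(np.log(np.maximum(e, 1e-10)))
        return np.array(feats).T   # (frames, n_filters)

    feats = gfbank(np.random.randn(8000))   # 1 s of noise at 8 kHz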

Multilingual ASR

                                 HW 30h (HW TR LM excluded)   HW 30h (HW TR LM included)
FBank non-stream (MPE4)                    22.23                       21.38
FBank stream (monolang)                    21.64                       20.72

Denoising & Farfield ASR

  • Correlation-based alignment is done. This is necessary since the recording devices may introduce artificial delays (a minimal sketch follows this list).
  • How about the output CMVN test?
  • Deliver the recordings to /nfs/disk/perm/data/corpora/reverberant
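
The alignment code itself is not part of the notes; a minimal sketch of the cross-correlation idea, estimating the sample-level delay between a reference channel and a device recording:

    import numpy as np

    def align_to_ref(ref, rec):
        # Take the lag that maximizes the cross-correlation as the
        # device-induced delay, then shift the recording to match.
        corr = np.correlate(rec, ref, mode="full")
        lag = int(np.argmax(corr)) - (len(ref) - 1)   # >0: rec lags ref
        if lag >= 0:
            return rec[lag:]
        return np.concatenate([np.zeros(-lag), rec])

For long recordings, an FFT-based correlation (e.g. scipy.signal.fftconvolve with one input reversed) is much faster than this direct form.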

Original model:

xEnt model (WER%):
               middle-field    far-field
    dev93       74.79          96.68
    eval92      63.42          94.75

MPE model:


MPE adaptation (WER%):

               middle-field    far-field
    dev93       63.71          94.84
    eval92      52.67          90.45

VAD

  • DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74).
  • 100 × n (n ≤ 3) hidden units with 2 output units seems sufficient for VAD; a minimal sketch of such a network follows this list.
  • Report forms
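
A minimal sketch of a network with the shape described above (100 × n sigmoid hidden layers, 2 softmax outputs); the 40-dim input and the initialization are assumptions, and training is omitted:

    import numpy as np

    rng = np.random.default_rng(0)

    def init_mlp(d_in, n_hidden_layers=2, width=100, d_out=2):
        # Layer sizes matching the 100 x n (n <= 3) net noted above.
        dims = [d_in] + [width] * n_hidden_layers + [d_out]
        return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
                for a, b in zip(dims[:-1], dims[1:])]

    def forward(params, x):
        # Sigmoid hidden layers, softmax over {speech, non-speech}.
        for W, b in params[:-1]:
            x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
        W, b = params[-1]
        z = x @ W + b
        ez = np.exp(z - z.max(axis=-1, keepdims=True))
        return ez / ez.sum(axis=-1, keepdims=True)

    params = init_mlp(d_in=40)               # e.g. 40-dim FBank frames
    post = forward(params, rng.normal(size=(10, 40)))
    is_speech = post[:, 0] > 0.5             # assuming unit 0 = speech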

Scoring

  • Refine the model with the AMIDA database; a local minimum was observed.
  • i-vector-based speaker detection seems fine, reaching 96% with 100 speakers.
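
The scoring backend is not stated; a minimal cosine-scoring sketch over pre-extracted i-vectors (any length normalization or PLDA used in the actual system is omitted):

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def detect_speaker(test_ivec, enrolled):
        # Pick the enrolled speaker whose i-vector scores highest
        # against the test utterance's i-vector.
        scores = {spk: cosine(test_ivec, vec) for spk, vec in enrolled.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]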


Embedded decoder


AM: 600x4+800 xent9 model: 

bigLM 1e-9
--------------------------------------------------------------------
voc size  |       150k       20k      10k     5k 
-------------------------------------------------------------------- 
graph (MB)|                  9.1     7.2     5.5
--------------------------------------------------------------------
Acc       |       15.96
--------------------------------------------------------------------
RT:       |     
--------------------------------------------------------------------

bigLM 1e-7 
--------------------------------------------------------------------
voc size  |       150k       20k      10k     5k 
-------------------------------------------------------------------- 
graph (MB)|       111       78      61      44
--------------------------------------------------------------------
Acc       |       19.94     23.35   25.92   29.35
--------------------------------------------------------------------
RT:       |       1.69      1.06    1.07    0.98
--------------------------------------------------------------------

HCLG 1e-6 
--------------------------------------------------------------------
voc size  |       150k       20k      10k     5k 
-------------------------------------------------------------------- 
graph (MB)|       98        49      34      24
--------------------------------------------------------------------
Acc       |       22.49      25.51    27.71   30.71
--------------------------------------------------------------------
RT:       |       0.89       0.70     0.68    0.64
--------------------------------------------------------------------

HCLG 1e-5
--------------------------------------------------------------------
voc size  |       150k       20k      10k     5k 
-------------------------------------------------------------------- 
graph (MB)|       21        6.9     5.5     4.1
--------------------------------------------------------------------
Acc       |       26.60     29.14     31.02   33.37
--------------------------------------------------------------------
RT:       |       0.68       0.61     0.58    0.56
--------------------------------------------------------------------
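
The RT rows above are presumably real-time factors, i.e. decoding time divided by audio duration; values below 1.0 are faster than real time:

    def real_time_factor(decode_sec, audio_sec):
        # e.g. 34 s of decoding for 50 s of audio gives RT = 0.68
        return decode_sec / audio_sec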


LM development

Domain specific LM

  • Baidu Zhidao + Weibo extraction done with various thresholds (a selection sketch follows this list).
  • The extracted text seems to improve results to some extent, but the major gain appears to come from pre-processing.
  • Check the proportion of tags in the HW 30h data!
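
The extraction method is not detailed here; as a hedged illustration of one common threshold-based scheme (not necessarily the one used), keep candidate sentences whose per-word negative log-likelihood under an in-domain LM falls below a threshold:

    import math
    from collections import Counter

    def train_unigram(sentences):
        # Tiny add-one-smoothed in-domain unigram LM, for illustration only.
        counts = Counter(w for s in sentences for w in s.split())
        total = sum(counts.values())
        v = len(counts) + 1
        return lambda w: (counts[w] + 1) / (total + v)

    def select(candidates, in_domain, threshold=7.0):
        # Keep sentences the in-domain LM finds likely enough; sweeping
        # the threshold yields extractions of different sizes.
        p = train_unigram(in_domain)
        keep = []
        for s in candidates:
            words = s.split()
            if words:
                nll = -sum(math.log(p(w)) for w in words) / len(words)
                if nll < threshold:
                    keep.append(s)
        return keep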

Word2Vector

W2V based doc classification

  • Full-covariance Gaussian-based document vectors
  • Represent each document with a Gaussian distribution over the word vectors it contains.
  • Use k-NN to conduct classification (a minimal sketch follows this subsection).

                 mean Euclidean distance   KL distance   diagonal KL   baseline (NB with mean)

Acc (50dim)              81.84                79.65           -                69.7
  • SVM-based classification

                       mean Euclidean distance   KL distance   diagonal KL   LDA

2-class Acc (50dim)              -                     -             -         -
8-class Acc (50dim)              -                     -             -         -
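
A minimal sketch of the Gaussian document representation and KL-based k-NN described above; the covariance regularization and k are illustrative assumptions:

    import numpy as np

    def doc_gauss(word_vecs):
        # Represent a document by the mean and full covariance of the
        # word vectors it contains (a small ridge keeps the covariance
        # positive definite).
        X = np.asarray(word_vecs)
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-3 * np.eye(X.shape[1])
        return mu, cov

    def kl_gauss(p, q):
        # KL(N_p || N_q) between full-covariance Gaussians.
        mu0, s0 = p
        mu1, s1 = q
        d = len(mu0)
        s1_inv = np.linalg.inv(s1)
        diff = mu1 - mu0
        _, logdet0 = np.linalg.slogdet(s0)
        _, logdet1 = np.linalg.slogdet(s1)
        return 0.5 * (np.trace(s1_inv @ s0) + diff @ s1_inv @ diff
                      - d + logdet1 - logdet0)

    def knn_classify(test_doc, train_docs, labels, k=5):
        # k-NN under the (asymmetric) KL distance.
        dists = [kl_gauss(test_doc, g) for g in train_docs]
        top = np.argsort(dists)[:k]
        votes = [labels[i] for i in top]
        return max(set(votes), key=votes.count)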

Semantic word tree

  • Version v2.0 released (filtered with query logs)
  • Please deliver to /nfs/disk/perm/data/corpora/semanticTree (Xingchao)
  • Version v3.0 ongoing: further refinement with the Baidu Baike hierarchy


NN LM

  • Character-based NNLM (6,700 characters, 7-gram); training on 500M data done. A data-preparation sketch follows this list.
  • Inconsistent WER patterns were found on the Tencent test sets; probably another test set is needed for investigation.
  • Investigate MS RNN LM training
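
A minimal data-preparation sketch for the 7-gram character NNLM above: each training example is a 6-character context with the following character as the prediction target (the vocabulary handling is illustrative):

    import numpy as np

    def char_ngram_examples(text, order=7):
        # Map characters to indices, then slice out (order-1)-character
        # contexts paired with the next character.
        vocab = {c: i for i, c in enumerate(sorted(set(text)))}
        ids = [vocab[c] for c in text]
        ctx = order - 1
        X = np.array([ids[i:i + ctx] for i in range(len(ids) - ctx)])
        y = np.array(ids[ctx:])
        return X, y, vocab

    X, y, vocab = char_ngram_examples("今天天气不错。" * 100)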