2014-06-20


Resource Building

  • Release management sorting-out (combing) done.

Leftover questions

  • Asymmetric window: large improvement on the training set (WER 34% → 24%), but the improvement is lost on the test set.
  • Multi-GPU training: error encountered.
  • Multilingual training.
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training.

AM development

Sparse DNN

  • GA-based block sparsity (+++++++)
  • Paper revision done.

Noise training

  • Paper writing will start this week.

GFbank

  • Moving on to Sinovoice 8k 1400 + 100 mixture training.
  • GFbank: 14 xEnt iterations completed:
                               Huawei disanpi   BJ mobile   8k English data
   FBank non-stream (MPE4)     20.44%           22.28%      24.36%
   GFbank stream (MPE4)        -                -           -
   GFbank non-stream (MPE)     -                -           -

Multilingual ASR

                               HW 30h (HW TR LM not involved)   HW 30h (HW TR LM involved)
   FBank non-stream (MPE4)     22.23                            21.38
   FBank stream (monolang)     21.64                            20.72
   GFbank stream (MPE4)        -                                -
   GFbank non-stream (MPE)     -                                -

Denoising & Far-field ASR

  • Replay may introduce a time delay; this should be solvable by cross-correlation detection (see the sketch after the results below).
  • A single-layer network with more hidden units failed.
  • The problem appears to reside in the large magnitude of the output data.
  • New recordings (one almost at the mic, one far-field at 2 meters):

Original model:

xEnt model (WER %):

              middle-field    far-field
   dev93       74.79          96.68
   eval92      63.42          94.75

MPE model:


MPE adaptation (WER %):

              middle-field    far-field
   dev93       63.71          94.84
   eval92      52.67          90.45
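
As an illustration of the cross-correlation idea above, a minimal numpy sketch (not the actual setup; the signal names, lengths, and the 160-sample delay are hypothetical) that estimates the replay delay from the cross-correlation peak:

   import numpy as np

   def estimate_delay(reference, recorded, sample_rate):
       """Estimate the lag of `recorded` relative to `reference`
       via the peak of their cross-correlation."""
       corr = np.correlate(recorded, reference, mode="full")
       # Output index k corresponds to lag k - (len(reference) - 1).
       lag = int(np.argmax(corr)) - (len(reference) - 1)
       return lag, lag / sample_rate

   # Hypothetical usage: `recorded` is `reference` delayed by 160 samples.
   fs = 16000
   reference = np.random.randn(fs)
   recorded = np.concatenate([np.zeros(160), reference])[:fs]
   lag, seconds = estimate_delay(reference, recorded, fs)
   print(lag, seconds)  # ~160 samples, ~0.01 s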


VAD

  • DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74).
  • 100 × n (n ≤ 3) hidden units with 2 output units seem sufficient for VAD (see the sketch below).
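
A minimal numpy sketch of the topology noted above, reading "100 × n" as n hidden layers of 100 units each (n = 2 here); the feature dimension and the random weights are purely illustrative:

   import numpy as np

   def sigmoid(x):
       return 1.0 / (1.0 + np.exp(-x))

   def softmax(x):
       e = np.exp(x - x.max(axis=-1, keepdims=True))
       return e / e.sum(axis=-1, keepdims=True)

   # Topology from the note: n <= 3 hidden layers of 100 units each,
   # 2 output units (speech / non-speech). Sizes and weights assumed.
   feat_dim, hidden, n_layers, n_out = 40, 100, 2, 2
   rng = np.random.default_rng(0)
   dims = [feat_dim] + [hidden] * n_layers + [n_out]
   weights = [rng.standard_normal((i, o)) * 0.1 for i, o in zip(dims, dims[1:])]
   biases = [np.zeros(o) for o in dims[1:]]

   def vad_forward(frames):
       """frames: (T, feat_dim) features -> (T, 2) speech posteriors."""
       h = frames
       for W, b in zip(weights[:-1], biases[:-1]):
           h = sigmoid(h @ W + b)
       return softmax(h @ weights[-1] + biases[-1])

   posteriors = vad_forward(rng.standard_normal((10, feat_dim)))
   is_speech = posteriors[:, 1] > 0.5  # frame-level decision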



Scoring

  • Collect more data with human scoring to train discriminative models


Embedded decoder

FSA size:

   threshold   1e-5    1e-6    1e-7        1e-8   1e-9
   5k          480k    5.5M    44M         -      1.1G
   10k         731k    7M      61M
   20k         1.2M    8.8M    78M(301M)


600 × 4 + 800 AM, beam 9:
        150k       20k     10k      5k 
WER     15.96       -       -       -
RT       X         0.94     -       -

LM development

Domain specific LM

  • Baidu Zhidao + Weibo extraction done with various thresholds.
  • The extracted text seems to improve results to some extent, but the major gain appears to come from pre-processing.
  • Check the proportion of tags in the HW 30h data.

Word2Vector

W2V based doc classification

  • Full-Gaussian-based doc vector (see the sketch after the results table):
  • represent each doc by a Gaussian distribution over the word vectors it contains;
  • use k-NN to conduct classification.
                   mean Euclidean distance   KL distance   baseline (NB with mean)
   Acc (50 dim)    81.84                     79.65         69.7
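
A minimal numpy sketch of the method (the KL-distance column above). The symmetrized KL and the function names are assumptions; `word_vectors` stands in for a per-document matrix of word embeddings:

   import numpy as np
   from collections import Counter

   def doc_gaussian(word_vectors, eps=1e-3):
       """Represent a doc by the Gaussian (mean, covariance) of its
       word vectors; `eps` regularizes a possibly singular covariance."""
       mu = word_vectors.mean(axis=0)
       sigma = np.cov(word_vectors, rowvar=False)
       return mu, sigma + eps * np.eye(len(mu))

   def kl_gaussian(p, q):
       """KL(N_p || N_q) for full-covariance Gaussians."""
       mu0, s0 = p
       mu1, s1 = q
       d = len(mu0)
       diff = mu1 - mu0
       _, logdet0 = np.linalg.slogdet(s0)
       _, logdet1 = np.linalg.slogdet(s1)
       return 0.5 * (np.trace(np.linalg.solve(s1, s0))
                     + diff @ np.linalg.solve(s1, diff)
                     - d + logdet1 - logdet0)

   def knn_classify(test_doc, train_docs, train_labels, k=5):
       """k-NN over symmetrized KL distances between doc Gaussians."""
       dists = [kl_gaussian(test_doc, g) + kl_gaussian(g, test_doc)
                for g in train_docs]
       nearest = np.argsort(dists)[:k]
       return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]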


Semantic word tree

  • First version based on pattern matching done.
  • Filter with query log
  • Further refinement with Baidu Baike hierarchy


NN LM

  • Character-based NNLM (6700 characters, 7-gram); training on 500M of data done (see the sketch below).
  • Inconsistent WER patterns were found on the Tencent test sets;
  • probably need another test set for investigation.
  • Investigate MS RNN LM training.
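
A minimal numpy sketch of a feedforward 7-gram character NNLM of the kind described above (Bengio-style). Only the 6700-character vocabulary and the 7-gram context come from the note; the embedding and hidden sizes and the random weights are assumptions:

   import numpy as np

   # 6700 characters, 7-gram: 6 history characters predict the next.
   vocab, context, emb, hidden = 6700, 6, 100, 500  # emb/hidden assumed
   rng = np.random.default_rng(0)
   E  = rng.standard_normal((vocab, emb)) * 0.01        # char embeddings
   W1 = rng.standard_normal((context * emb, hidden)) * 0.01
   W2 = rng.standard_normal((hidden, vocab)) * 0.01

   def softmax(x):
       e = np.exp(x - x.max())
       return e / e.sum()

   def next_char_probs(history):
       """history: 6 character ids -> distribution over all 6700 chars."""
       x = E[history].reshape(-1)   # concatenate the context embeddings
       h = np.tanh(x @ W1)
       return softmax(h @ W2)

   p = next_char_probs([5, 17, 42, 9, 300, 6699])
   assert abs(p.sum() - 1.0) < 1e-6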