2014-06-20

Resource Building

Release management review done.

Leftover questions

Asymmetric window: Great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set.
Multi GPU training: Error encountered
Multilanguage training
Investigating LOUDS FST.
CLG embedded decoder plus online compiler.
DNN-GMM co-training

AM development

Sparse DNN

GA-based block sparsity (+++++++)
Paper revision done.

Noise training

Paper writing will start this week.

GFbank

Running Sinovoice 8k 1400 + 100 mixture training.
FBank/GFbank, stream/non-stream MPE completed:

                            Huawei disanpi    BJ mobile    8k English data
FBank non-stream (MPE4)     20.44%            22.28%       24.36%
FBank stream (MPE1)         20.17%            22.50%       21.63%
GFbank stream (MPE4)        20.69%            22.84%       24.45%
GFbank non-stream (MPE)     -                 -            -

Multilingual ASR

                            HW 30h (HW TR LM not involved)    HW 30h (HW TR LM involved)
FBank non-stream (MPE4)     22.23                             21.38
FBank stream (monolang)     21.64                             20.72
GFbank stream (MPE4)        -                                 -
GFbank non-stream (MPE)     -                                 -

Denoising & Farfield ASR

Replay may cause a time delay. This should be solved by cross-correlation detection.
Single-layer network with more hidden units failed.
The problem appears to lie in the large magnitude of the output data.
New recordings (one near the mic and one far-field at 2 meters)

Original model:

xEnt model:
               middle-field    far-field
    dev93       74.79          96.68
    eval92      63.42          94.75

MPE model:


MPE adaptation: 

               middle-field    far-field
    dev93       63.71          94.84
    eval92      52.67          90.45
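
The cross-correlation detection proposed above for the replay delay could look roughly like the sketch below. This is only an illustration: the function name `estimate_delay`, the `max_lag` search range, and the mean removal are assumptions, not the procedure actually used for the recordings.

```python
import numpy as np

def estimate_delay(reference, recorded, max_lag):
    """Return the lag (in samples) at which `recorded` best matches `reference`.

    A positive result means `recorded` is delayed relative to `reference`.
    Hypothetical helper; `max_lag` bounds the search range in samples.
    """
    reference = np.asarray(reference, dtype=float)
    recorded = np.asarray(recorded, dtype=float)
    # Remove the mean so a DC offset does not dominate the correlation.
    reference = reference - reference.mean()
    recorded = recorded - recorded.mean()
    # Full cross-correlation; the first element corresponds to
    # lag -(len(reference) - 1), the last to lag len(recorded) - 1.
    xcorr = np.correlate(recorded, reference, mode="full")
    lags = np.arange(-(len(reference) - 1), len(recorded))
    keep = np.abs(lags) <= max_lag
    return int(lags[keep][np.argmax(xcorr[keep])])
```

Once the lag is known, the replayed signal can be shifted by that many samples before pairing it with the source. `np.correlate` is quadratic in signal length, so for long recordings an FFT-based correlation on short segments would probably be preferable.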

VAD

DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
100 X n (n<=3) hidden units with 2 output units seem sufficient for VAD
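
For reference, a network of the size noted above (n hidden layers of 100 units, 2 output units) is small enough to write out directly. The sketch below only shows the topology and a forward pass; the 40-dimensional input, the sigmoid activation, and the random untrained weights are assumptions, and the names `init_vad_mlp` and `vad_posteriors` are hypothetical.

```python
import numpy as np

def init_vad_mlp(input_dim, n_hidden_layers, hidden_units=100, seed=0):
    """Random (untrained) weights for input -> n x 100 hidden -> 2 outputs."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim] + [hidden_units] * n_hidden_layers + [2]
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def vad_posteriors(params, frames):
    """Per-frame posteriors over the two output classes (rows sum to 1)."""
    h = np.asarray(frames, dtype=float)
    for weights, bias in params[:-1]:
        # Sigmoid hidden layer (assumed activation).
        h = 1.0 / (1.0 + np.exp(-(h @ weights + bias)))
    weights, bias = params[-1]
    logits = h @ weights + bias
    # Numerically stable softmax over the 2 output units.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)
```

With `n_hidden_layers=3` and a 40-dimensional input this has 24,502 parameters.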

Scoring

Collect more data with human scoring to train discriminative models

Embedded decoder

FSA size:
threshold    1e-5    1e-6    1e-7         1e-8    1e-9
5k           480k    5.5M    44M          -       1.1G
10k          731k    7M      61M
20k          1.2M    8.8M    78M(301M)

600 X 4+800 AM, beam 9:
        150k     20k     10k     5k
WER     15.96    -       -       -
RT      X        0.94    -       -

LM development

Domain specific LM

Baidu Zhidao + Weibo extraction done with various thresholds.
The extracted text appears to help to some extent, but the major change seems to come from pre-processing.

Check the proportion of tags in the HW 30h data

Word2Vector

W2V based doc classification

Full Gaussian based doc vector

Represent each doc with a Gaussian distribution over the word vectors it contains.
Use k-NN to conduct classification.

               mean Euclidean distance    KL distance    baseline (NB with mean)
Acc (50dim)    81.84                      79.65          69.7
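
A minimal sketch of the full-Gaussian doc vector with KL-based k-NN, under stated assumptions: the ridge term on the covariance, the KL direction (query to training doc), and k=5 are illustrative choices, and `fit_gaussian`, `kl_gaussian` and `knn_classify` are hypothetical names, not the code behind the numbers above.

```python
import numpy as np
from collections import Counter

def fit_gaussian(word_vectors, ridge=1e-3):
    """Fit a full-covariance Gaussian to the word vectors of one doc."""
    x = np.asarray(word_vectors, dtype=float)
    mean = x.mean(axis=0)
    centered = x - mean
    # The ridge keeps the covariance invertible for short docs (assumption).
    cov = centered.T @ centered / len(x) + ridge * np.eye(x.shape[1])
    return mean, cov

def kl_gaussian(p, q):
    """KL(p || q) between two Gaussians given as (mean, cov) pairs."""
    mu_p, cov_p = p
    mu_q, cov_q = q
    dim = len(mu_p)
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(cov_q_inv @ cov_p) + diff @ cov_q_inv @ diff
                  - dim + logdet_q - logdet_p)

def knn_classify(query, train_docs, train_labels, k=5):
    """Majority vote over the k training docs closest to `query` in KL."""
    distances = [kl_gaussian(query, doc) for doc in train_docs]
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

The mean Euclidean distance variant in the table would presumably replace `kl_gaussian` with the distance between the two means and ignore the covariances.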

Semantic word tree

First version based on pattern match done
Filter with query log
Further refinement with Baidu Baike hierarchy

NN LM

Character-based NNLM (6700 chars, 7-gram), 500M data training done.

Inconsistent WER patterns were found on the Tencent test sets.
Probably need to use another test set for the investigation.

Investigate MS RNN LM training

