2014-06-27

Resource Building

Leftover questions

  • Asymmetric window: great improvement on the training set (WER 34% to 24%); however, the improvement is lost on the test set.
  • Multi-GPU training: error encountered.
  • Multilingual training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training

AM development

Sparse DNN

  • GA-based block sparsity (++++++++); a minimal sketch follows.
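
The note gives no implementation detail, so here is a minimal sketch of the general idea: a genetic algorithm searching for a binary block-sparsity mask over a weight matrix. The magnitude-based fitness is a stand-in assumption; the real objective would be held-out accuracy of the masked DNN.

<pre>
import numpy as np

def fitness(mask, W, block):
    # Proxy objective (an assumption): energy of the weights kept by the mask.
    # The real fitness would be held-out accuracy of the masked DNN.
    expanded = np.kron(mask, np.ones(block))          # block mask -> element mask
    return float(np.sum((W * expanded) ** 2))

def ga_block_sparsity(W, block=(8, 8), keep=0.25, pop=30, gens=50, pmut=0.02, seed=0):
    """Evolve a binary block mask that keeps ~`keep` of the weight blocks."""
    rng = np.random.default_rng(seed)
    gh, gw = W.shape[0] // block[0], W.shape[1] // block[1]
    n_blocks, n_keep = gh * gw, max(1, int(keep * gh * gw))

    def random_mask():
        m = np.zeros(n_blocks)
        m[rng.choice(n_blocks, n_keep, replace=False)] = 1
        return m.reshape(gh, gw)

    def repair(m):
        # Restore exactly n_keep active blocks after crossover/mutation.
        flat = m.flatten()
        on, off = np.flatnonzero(flat == 1), np.flatnonzero(flat == 0)
        if len(on) > n_keep:
            flat[rng.choice(on, len(on) - n_keep, replace=False)] = 0
        elif len(on) < n_keep:
            flat[rng.choice(off, n_keep - len(on), replace=False)] = 1
        return flat.reshape(gh, gw)

    population = [random_mask() for _ in range(pop)]
    for _ in range(gens):
        scores = [fitness(m, W, block) for m in population]
        order = np.argsort(scores)[::-1]
        parents = [population[i] for i in order[: pop // 2]]   # truncation selection
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.choice(len(parents), 2, replace=False)
            cross = rng.random((gh, gw)) < 0.5                 # uniform crossover
            child = np.where(cross, parents[a], parents[b])
            flip = rng.random((gh, gw)) < pmut                 # bit-flip mutation
            children.append(repair(np.where(flip, 1 - child, child)))
        population = parents + children
    return max(population, key=lambda m: fitness(m, W, block))

W = np.random.randn(64, 64)
mask = ga_block_sparsity(W)
print("blocks kept:", int(mask.sum()), "of", (64 // 8) ** 2)
</pre>

With this toy fitness the problem could be solved greedily; a GA only pays off when the fitness is a black box such as recognition accuracy.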


Noise training

  • Paper writing ongoing

GFbank

  • Running Sinovoice 8k 1400 + 100 mixture training.
  • FBank/GFbank, stream/non-stream MPE completed (a feature-extraction sketch follows the table):
                                   Huawei disanpi     BJ mobile   8k English data       
FBank non-stream (MPE4)             20.44%              22.28%      24.36%
FBank stream (MPE4)                 19.46%              22.00%      21.19%
GFbank stream    (MPE4)             20.69%              22.84%      24.45%
GFbank non-stream (MPE)             -                     -           -
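
For reference, FBank features are log Mel filterbank energies; GFbank replaces the Mel filters with a Gammatone filterbank. Below is a minimal numpy sketch of FBank extraction; frame sizes, filter count, and sample rate are illustrative assumptions, not the group's configuration.

<pre>
import numpy as np

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def fbank(signal, sr=8000, n_filt=24, frame_len=0.025, frame_shift=0.010, n_fft=256):
    """Log Mel filterbank (FBank) features; all parameters are illustrative."""
    flen, fshift = int(sr * frame_len), int(sr * frame_shift)
    n_frames = 1 + (len(signal) - flen) // fshift
    idx = np.arange(flen)[None, :] + fshift * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular filters spaced evenly on the Mel scale.
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filt + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(n_filt):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return np.log(np.maximum(power @ fb.T, 1e-10))

feats = fbank(np.random.randn(8000))    # 1 s of fake 8 kHz audio
print(feats.shape)                      # (n_frames, n_filt)
</pre>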

Multilingual ASR

                                   HW 27h (HW TR LM not involved)     HW 27h (HW TR LM involved)
FBank stream (monolang)             21.64                                   20.72
FBank non-stream (MPE4)             22.23                                   21.38
FBank stream (MPE4)                 21.99                                     -

Denoising & Far-field ASR

  • Correlation-based alignment is done. This is necessary since the recording devices may introduce artificial delays (see the sketch after this list).
  • How about the output CMVN test?
  • Deliver the recordings to /nfs/disk/perm/data/corpora/reverberant
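
The alignment code itself is not in the notes; below is a minimal cross-correlation sketch for estimating the inter-channel delay (numpy; the maximum search lag is an assumed parameter).

<pre>
import numpy as np

def estimate_delay(ref, sig, max_lag=8000):
    """Estimate the lag (in samples) of `sig` relative to `ref`
    by locating the peak of their cross-correlation."""
    lags = np.arange(-max_lag, max_lag + 1)
    xc = np.correlate(sig, ref, mode="full")   # length len(sig)+len(ref)-1
    center = len(ref) - 1                      # index of zero lag
    window = xc[center - max_lag : center + max_lag + 1]
    return lags[np.argmax(window)]

# Sanity check: delay a signal by 123 samples and recover the lag.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
sig = np.concatenate([np.zeros(123), ref])[:16000]
print(estimate_delay(ref, sig))   # 123 -> shift sig by -123 before comparing channels
</pre>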

Original model:

xEnt model:
               middle-field    far-field
    dev93       74.79          96.68
    eval92      63.42          94.75

MPE model:


MPE adaptation: 

               middle-field    far-field
    dev93       63.71          94.84
    eval92      52.67          90.45

VAD

  • DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
  • 100 x n (n <= 3) hidden units with 2 output units seem sufficient for VAD (a topology sketch follows this list)
  • Report form: http://cslt.riit.tsinghua.edu.cn/mediawiki/images/2/27/Dnn_vad_VS_energy_vad_20140619.xlsx
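
A minimal PyTorch sketch of the topology in the second bullet (100 x n hidden units, 2 output units); the 40-dim input features and sigmoid activations are assumptions.

<pre>
import torch
import torch.nn as nn

def make_vad(feat_dim=40, n_hidden_layers=2):
    """DNN VAD: n (<= 3) hidden layers of 100 units, 2 output units
    (speech / non-speech), as in the note above."""
    layers, dim = [], feat_dim
    for _ in range(n_hidden_layers):
        layers += [nn.Linear(dim, 100), nn.Sigmoid()]
        dim = 100
    layers.append(nn.Linear(dim, 2))            # logits for {non-speech, speech}
    return nn.Sequential(*layers)

vad = make_vad()
frames = torch.randn(8, 40)                     # batch of 8 feature frames
probs = torch.softmax(vad(frames), dim=1)       # per-frame speech probability
print(probs[:, 1])
</pre>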

Scoring

  • Refine the model with the AMIDA database. Local minimum observed.
  • i-vector-based speaker detection seems fine, reaching 96% with 100 speakers (a cosine-scoring sketch follows)
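
The backend is not specified in the notes; a common choice for i-vector speaker detection is cosine scoring against an averaged enrollment i-vector, sketched below (numpy; the enrollment averaging and dimensions are assumptions).

<pre>
import numpy as np

def cosine_score(enroll_ivecs, test_ivec):
    """Score a test i-vector against a speaker model built by
    averaging that speaker's enrollment i-vectors."""
    model = np.mean(enroll_ivecs, axis=0)
    model /= np.linalg.norm(model)
    test = test_ivec / np.linalg.norm(test_ivec)
    return float(model @ test)      # in [-1, 1]; higher = same speaker

rng = np.random.default_rng(0)
spk = rng.standard_normal(400)                        # "true" speaker direction
enroll = spk + 0.3 * rng.standard_normal((5, 400))    # 5 enrollment i-vectors
same = spk + 0.3 * rng.standard_normal(400)
other = rng.standard_normal(400)
print(cosine_score(enroll, same), cosine_score(enroll, other))
</pre>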


Embedded decoder


AM: 600x4+800 xent9 model: 



pruning threshold: 1e-5, no biglm
------------------------------------------------------------------------------------------
  voc size   |    150k   |   80k    |     40k     |     20k    |    10k     |      5k    |
------------------------------------------------------------------------------------------
      WER    |    26.60  |   27.16  |    28.11    |    29.14   |   31.02    |    33.37   |
------------------------------------------------------------------------------------------
       RT    |    0.68   |   0.66   |    0.61     |    0.61    |    0.58    |    0.56    |
------------------------------------------------------------------------------------------
 graph size  |     21M   |    14M   |    9.1M     |    6.9M    |    5.5M    |    4.1M    |
------------------------------------------------------------------------------------------

YINSHI: 2014-Jun-24, Wednesday, 10:07:00


pruning threshold: 1e-6, no biglm
------------------------------------------------------------------------------------------
  voc size   |    150k   |   80k    |     40k     |     20k    |    10k     |      5k    |
------------------------------------------------------------------------------------------
      WER    |    22.49  |   23.05  |    24.15    |    25.51   |   27.71    |    30.71   |
------------------------------------------------------------------------------------------
       RT    |    0.89   |   0.84   |    0.76     |    0.70    |    0.68    |    0.64    |
------------------------------------------------------------------------------------------
 graph size  |     98M   |    86M   |     67M     |    49M     |    34M     |     24M    |
------------------------------------------------------------------------------------------

YINSHI: 2014-Jun-27, Saturday, 00:52:35


pruning threshold: 1e-6.5, biglm
------------------------------------------------------------------------------------------
  voc size   |    150k   |   80k    |     40k     |     20k    |    10k     |      5k    |
------------------------------------------------------------------------------------------
      WER    |    21.12  |   21.75  |    22.92    |    24.39   |   26.89    |    30.01   |
------------------------------------------------------------------------------------------
       RT    |    1.45   |   1.25   |    1.16     |    1.11    |    1.02    |    0.94    |
------------------------------------------------------------------------------------------
 graph size  |     38M   |    35M   |     30M     |    25M     |    20M     |     15M    |
------------------------------------------------------------------------------------------

YINSHI: 2014-Jun-27, Saturday, 00:58:27


pruning threshold: 1e-5.5, no biglm
------------------------------------------------------------------------------------------
  voc size   |    150k   |   80k    |     40k     |     20k    |    10k     |      5k    |
------------------------------------------------------------------------------------------
      WER    |    24.46  |   25.05  |    26.05    |    27.11   |   29.36    |    32.01   |
------------------------------------------------------------------------------------------
       RT    |    0.71   |   0.69   |    0.66     |    0.63    |    0.60    |    0.58    |
------------------------------------------------------------------------------------------
 graph size  |     39M   |    32M   |     25M     |    19M     |    14M     |    9.2M    |
------------------------------------------------------------------------------------------


LM development

Domain-specific LM

  • Baidu Zhidao + Weibo extraction done with various thresholds
  • Looks like the extracted text can improve performance to some extent, but the major gain seems to come from pre-processing.
  • Check the proportion of tags in the HW 30h data


Word2Vector

W2V based doc classification

  • Full-Gaussian-based doc vectors
  • Represent each doc with a Gaussian distribution over the word vectors it contains.
  • Use k-NN to conduct classification (a distance sketch follows the tables below)
               mean Euclidean dist.   KL distance   diagonal KL   baseline (NB with mean)

 Acc (50dim)         81.84               79.65           -               69.7
  • SVM-based classification:


                       mean Euclidean dist.   KL distance   diagonal KL        LDA

2-class Acc (50dim)          95.57                 -              -           95.80
8-class Acc (50dim)          88.79                 -              -             -
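
A sketch reconstructing the scheme from the bullets above: fit a full Gaussian to each document's word vectors, compare documents with the closed-form KL divergence (symmetrised here), and classify with k-NN. Everything below is an illustrative reconstruction, not the group's code.

<pre>
import numpy as np
from collections import Counter

def doc_gaussian(word_vecs, eps=1e-3):
    """Represent a document as a Gaussian over its word vectors."""
    mu = word_vecs.mean(axis=0)
    sigma = np.cov(word_vecs, rowvar=False) + eps * np.eye(word_vecs.shape[1])
    return mu, sigma

def kl_gauss(p, q):
    """Closed-form KL(N_p || N_q) between two full Gaussians."""
    mu0, s0 = p
    mu1, s1 = q
    d = len(mu0)
    s1_inv = np.linalg.inv(s1)
    diff = mu1 - mu0
    logdet = np.linalg.slogdet(s1)[1] - np.linalg.slogdet(s0)[1]
    return 0.5 * (np.trace(s1_inv @ s0) + diff @ s1_inv @ diff - d + logdet)

def knn_classify(train, labels, test_doc, k=5):
    """k-NN with symmetrised KL as the distance."""
    dists = [kl_gauss(test_doc, g) + kl_gauss(g, test_doc) for g in train]
    nearest = np.argsort(dists)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

rng = np.random.default_rng(0)
docs = [doc_gaussian(c + 0.5 * rng.standard_normal((80, 50)))
        for c in (0.0, 1.0) for _ in range(10)]      # 50-dim vectors, 2 classes
labels = [0] * 10 + [1] * 10
query = doc_gaussian(1.0 + 0.5 * rng.standard_normal((80, 50)))
print(knn_classify(docs, labels, query))             # expect class 1
</pre>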

Semantic word tree

  • Version v2.0 released (filter with query log)
  • Please deliver to /nfs/disk/perm/data/corpora/semanticTree (Xingchao)
  • Version v3.0 ongoing: further refinement with the Baidu Baike hierarchy


NN LM

  • Character-based NNLM (6700 chars, 7-gram); training on 500M data done (a model sketch follows this list).
  • Inconsistent patterns in WER were found on the Tencent test sets;
  • probably need to use another test set for investigation.
  • Investigate MS RNN LM training
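
A minimal sketch of a feed-forward character NNLM of the shape mentioned in the first bullet (7-gram: six characters of context predicting the next, over a 6700-character vocabulary); the embedding and hidden sizes are assumptions.

<pre>
import torch
import torch.nn as nn

class CharNNLM(nn.Module):
    """Feed-forward 7-gram NNLM: embed 6 context chars, one hidden
    layer, softmax over the character vocabulary (6700 in the note)."""
    def __init__(self, vocab=6700, context=6, emb=64, hidden=512):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.net = nn.Sequential(
            nn.Linear(context * emb, hidden), nn.Tanh(),
            nn.Linear(hidden, vocab))

    def forward(self, ctx):                  # ctx: (batch, 6) char ids
        e = self.emb(ctx).flatten(1)         # (batch, 6*emb)
        return self.net(e)                   # logits over the next char

model = CharNNLM()
ctx = torch.randint(0, 6700, (32, 6))
loss = nn.functional.cross_entropy(model(ctx), torch.randint(0, 6700, (32,)))
loss.backward()                              # one illustrative training step
print(float(loss))
</pre>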