2014-06-06
From cslt Wiki
Latest revision as of 07:02, 6 June 2014

==Resource Building==
* Release management has been started
 
==Leftover questions==
* Asymmetric window: great improvement on the training set (WER 34% to 24%); however, the improvement is lost on the test set (see the splicing sketch after this list)
* Multi-GPU training: error encountered
* Multilanguage training
* Investigating LOUDS FST
* CLG embedded decoder plus online compiler
* DNN-GMM co-training
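For reference, the sketch below shows what input splicing with an asymmetric context window means here. It is a minimal illustration only: the offsets (10 past frames, 2 future frames) and the 40-dim features are assumptions, not the configuration actually tested.

<pre>
import numpy as np

def splice_frames(feats, left=10, right=2):
    """Splice each frame with `left` past and `right` future frames.

    feats: (T, D) array of per-frame features.
    Returns (T, (left + 1 + right) * D). An asymmetric window
    (left > right) trades future context for lower latency.
    """
    T, D = feats.shape
    # Pad by repeating the edge frames so every frame has full context.
    padded = np.concatenate([np.repeat(feats[:1], left, axis=0),
                             feats,
                             np.repeat(feats[-1:], right, axis=0)])
    return np.stack([padded[t:t + left + 1 + right].reshape(-1)
                     for t in range(T)])

# Example: 100 frames of 40-dim Fbank features.
x = np.random.randn(100, 40)
print(splice_frames(x).shape)   # (100, 13 * 40) = (100, 520)
</pre>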
==AM development==

===Sparse DNN===
* GA-based block sparsity (++++++); a toy sketch of the mask search follows
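The report does not spell out the GA formulation, so the following is only a toy sketch of a genetic algorithm searching for a block-sparsity mask over one weight matrix. The block size, population settings, and the stand-in fitness are all assumptions for illustration; a real run would score each mask by the masked network's validation accuracy.

<pre>
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))   # a weight matrix to sparsify
B = 32                                # block size: the mask is 8 x 8 blocks

def fitness(mask):
    """Stand-in fitness: keep large-magnitude blocks, reward sparsity."""
    full = np.kron(mask, np.ones((B, B)))    # expand block mask to W's shape
    kept = np.abs(W * full).sum() / np.abs(W).sum()
    return kept + 0.5 * (1.0 - mask.mean())

def evolve(pop_size=30, gens=50, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, 8, 8))
    for _ in range(gens):
        scores = np.array([fitness(m) for m in pop])
        top = pop[np.argsort(scores)[-pop_size // 2:]]   # selection
        kids = []
        for _ in range(pop_size - len(top)):
            a, b = top[rng.integers(len(top), size=2)]
            cut = rng.integers(1, 8)
            child = np.vstack([a[:cut], b[cut:]])        # crossover
            flip = rng.random(child.shape) < p_mut       # mutation
            kids.append(np.where(flip, 1 - child, child))
        pop = np.concatenate([top, np.array(kids)])
    return max(pop, key=fitness)

best = evolve()
print("kept blocks:", int(best.sum()), "of 64")
</pre>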
  
 
===Noise training===
:* Paper writing will start this week
  
 
===GFbank===
* Running the Sinovoice 8k 1400 + 100 mixture training.
* GFbank 14 xEnt iterations completed (a feature-extraction sketch follows the table):

                                     Huawei disanpi    BJ mobile    8k English data
 FBank non-stream (17 iterations)        22.01%          26.63%            -
 GFbank stream (14 iterations)           22.47%          27.52%            -
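For context, GFbank here denotes gammatone-filterbank log energies used in place of the mel Fbank. The sketch below is a minimal version under common assumptions (ERB-spaced center frequencies, a 4th-order gammatone magnitude response); none of the parameters are claimed to match the actual front end.

<pre>
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore), in Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gfbank_weights(n_filters=40, n_fft=512, sr=8000, fmin=64.0):
    """Gammatone filterbank weights on FFT bins: the 4th-order gammatone
    magnitude response |H(f)| = (1 + ((f - fc) / b)^2)^(-2), with centers
    spaced uniformly on the ERB-rate scale."""
    freqs = np.linspace(0, sr / 2, n_fft // 2 + 1)
    erb_rate = lambda f: 21.4 * np.log10(1 + 0.00437 * f)
    inv_erb_rate = lambda e: (10 ** (e / 21.4) - 1) / 0.00437
    centers = inv_erb_rate(np.linspace(erb_rate(fmin),
                                       erb_rate(sr / 2 - 100), n_filters))
    b = 1.019 * erb(centers)
    return (1.0 + ((freqs[None, :] - centers[:, None]) / b[:, None]) ** 2) ** -2

def gfbank(frames, weights):
    """Log gammatone-filterbank energies from windowed frames (N, n_fft)."""
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power @ weights.T + 1e-10)

w = gfbank_weights()
frames = np.random.randn(100, 512) * np.hamming(512)  # stand-in for speech
print(gfbank(frames, w).shape)                         # (100, 40)
</pre>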
  
 
===Multilingual ASR===

                                     Huawei disanpi    BJ mobile    8k English data
 FBank non-stream                          -               -               -

* Multilingual LM decoding
* TAG-based decoding is still problematic: decoding goes into the subgraph, but the decoding results are incorrect
* Investigate with a free-loop grammar (see the sketch after this list)
* Non-tag tests should be conducted on both Baidu & microblog data
* Should test the 8k shujutang data on the mixture model
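A free-loop grammar is a single-state word loop that accepts any word sequence of any length. Below is a minimal sketch that writes such a grammar G in OpenFst text format; the word list and the uniform costs are illustrative, not the actual vocabulary.

<pre>
import math

def write_word_loop_fst(words, path="G.txt"):
    """Write a single-state 'free loop' grammar in OpenFst text format.

    Every word is a self-loop on state 0 with uniform cost -log(1/V)
    (tropical semiring), so any word sequence is accepted.
    """
    cost = math.log(len(words))
    with open(path, "w") as f:
        for w in words:
            # src  dst  ilabel  olabel  weight
            f.write(f"0 0 {w} {w} {cost:.4f}\n")
        f.write("0 0.0\n")   # state 0 is also the final state

write_word_loop_fst(["<spoken_noise>", "yi", "er", "san", "si"])
</pre>

The text FST would then be compiled against the decoder's word symbol table (e.g., with OpenFst's fstcompile) and composed into the decoding graph.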
===English model===
<pre>
(state-gauss = 10000 100000, various LM, beam 13)

1. Shujutang 100h chi-eng 16k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
  wsj    |  23.86  |  20.95  |  20.90  |  20.84  |  20.81  |
  cmu    |  22.22  |    -    |    -    |    -    |  18.83  |
  giga   |  21.77  |    -    |    -    |    -    |  18.61  |
  armid  |  20.45  |    -    |    -    |    -    |    -    |


2. Shujutang 100h chi-eng 8k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
  wsj    |  26.27  |  23.63  |  23.14  |  22.93  |  23.00  |
  cmu    |  24.11  |    -    |    -    |    -    |  20.36  |
  giga   |  23.11  |    -    |    -    |    -    |  20.11  |
  armid  |    -    |    -    |    -    |    -    |    -    |


3. voxforge pure eng 16k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
  wsj    |  21.38  |  24.89  |  24.50  |  23.31  |  23.13  |
  cmu    |  24.00  |    -    |    -    |    -    |  21.33  |
  giga   |  18.75  |    -    |    -    |    -    |  22.45  |
  armid  |    -    |    -    |    -    |    -    |    -    |


4. fisher pure eng 8k (not finished yet):

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
  wsj    |  40.65  |  36.16  |  35.94  |  35.88  |  35.80  |
  cmu    |  35.07  |    -    |    -    |    -    |  31.16  |
  giga   |  41.18  |    -    |    -    |    -    |  36.23  |
  armid  |    -    |    -    |    -    |    -    |    -    |
</pre>
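All numbers above are WER (%). For reference, WER is the word-level edit distance between hypothesis and reference divided by the reference length; a minimal implementation, not tied to any toolkit:

<pre>
def wer(ref, hyp):
    """Word error rate: (sub + del + ins) / len(ref), via edit distance."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat", "the bat sat down"))  # 2 edits / 3 words = 0.667
</pre>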
  
  
 
===Denoising & Farfield ASR===
* Add artificial reverberation with various energy decays & time delays; plot decay vs. WER and delay vs. WER (a generation sketch follows this list)
* Use more training data for adaptation
* Record the waveform with a single speaker & a near-field microphone and test again
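One simple way to synthesize the reverberant data described above: build a toy room impulse response with a controllable energy decay (T60) and direct-path delay, then convolve it with clean speech. All parameter values here are illustrative.

<pre>
import numpy as np

def synthetic_rir(sr=8000, t60=0.5, delay_ms=20.0, length_s=0.8):
    """Toy room impulse response: a delayed direct path plus an
    exponentially decaying noise tail. The decay rate 6.9 / t60 makes
    the tail's energy drop by 60 dB after t60 seconds."""
    n = int(sr * length_s)
    tail = np.random.randn(n) * np.exp(-6.9 * np.arange(n) / (t60 * sr))
    d = int(sr * delay_ms / 1000.0)
    return np.concatenate([np.zeros(d), [1.0], tail])

def reverberate(speech, rir):
    """Convolve clean speech with the RIR, trimmed to the original length."""
    return np.convolve(speech, rir)[: len(speech)]

clean = np.random.randn(8000)        # stand-in for 1 s of 8 kHz speech
for t60 in (0.2, 0.5, 1.0):          # sweep the energy decay for the WER plot
    wet = reverberate(clean, synthetic_rir(t60=t60))
</pre>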
  
 
===VAD===
* DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
* Need to test small-scale networks (+); a sketch of the smaller topology follows this list
:* 600-800 network test
:* 100 X 4 + 2 network training
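Reading "100 X 4 + 2" as four 100-unit hidden layers with a 2-class (speech / non-speech) softmax output (an assumption about the notation, as is the 40-dim input), a minimal forward pass looks like this:

<pre>
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def init_mlp(dims):
    """Random weights for a fully connected net with the given layer sizes."""
    return [(rng.standard_normal((i, o)) * np.sqrt(1.0 / i), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    """Sigmoid hidden layers, softmax over {non-speech, speech} on top."""
    for W, b in layers[:-1]:
        x = sigmoid(x @ W + b)
    W, b = layers[-1]
    logits = x @ W + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# "100 X 4 + 2": 4 hidden layers of 100 units, 2 outputs; 40-dim input assumed.
net = init_mlp([40, 100, 100, 100, 100, 2])
frames = rng.standard_normal((16, 40))    # a batch of feature frames
speech_prob = forward(net, frames)[:, 1]  # P(speech) per frame
print(speech_prob.shape)                  # (16,)
</pre>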
  
 
===Scoring===
* Collect more data with human scoring to train discriminative models
  
  
 
===Embedded decoder===
1200 X 4 + 10k AM:
* Prepare to deliver Android compiler options (.mk)
* Interface design should be completed in one day
* Prepare HCLG for the 20k LM; decoding in progress

 LM size    150k     20k      10k      5k
 WER        42.23    43.45    44.54    46.07
 RT         1h31     48m      44m      43m
  
 
==LM development==

===Domain specific LM===
* Retrieve both Baidu & microblog data
* Need to check into GitLab (+)
 
==Word2Vector==

* Design the web spider
* Design the semantically related word tree (a toy pattern-matching sketch follows this list)
:* First version, based on pattern matching, done
:* Filter with query logs
:* Further refinement with the Baidu Baike hierarchy
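As a toy illustration of the pattern-matching step, the sketch below collects parent-to-child ("is-a") edges from raw text. The patterns are English stand-ins; the real system presumably mines Chinese templates and Baidu Baike pages.

<pre>
import re
from collections import defaultdict

# Toy is-a patterns; illustrative stand-ins only.
HYPONYM = re.compile(r"(\w+) is a kind of (\w+)")    # X is a kind of Y
EXAMPLE = re.compile(r"(\w+), such as (\w+)")        # Y, such as X

def build_tree(corpus):
    """Collect parent -> children edges by pattern matching over raw text."""
    tree = defaultdict(set)
    for line in corpus:
        for m in HYPONYM.finditer(line):
            tree[m.group(2)].add(m.group(1))   # parent Y gets child X
        for m in EXAMPLE.finditer(line):
            tree[m.group(1)].add(m.group(2))   # parent Y gets child X
    return tree

corpus = ["sparrow is a kind of bird", "birds, such as sparrows"]
print(dict(build_tree(corpus)))  # {'bird': {'sparrow'}, 'birds': {'sparrows'}}
</pre>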
 
  
 
===NN LM===
* Character-based NNLM (6700 chars, 7-gram): training on 500M data is done (a feedforward NNLM sketch follows this list)
* Inconsistent WER patterns were found on the Tencent test sets
:* probably need to use another test set for the investigation
* Investigate MS RNN LM training
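A 7-gram character NNLM predicts the next character from the previous six. Below is a minimal Bengio-style feedforward sketch; the 6700-character vocabulary comes from the bullet above, while the embedding and hidden sizes are illustrative.

<pre>
import numpy as np

rng = np.random.default_rng(0)
V, E, H, CTX = 6700, 64, 256, 6   # vocab, embedding, hidden, 7-gram context

C = rng.standard_normal((V, E)) * 0.01      # character embedding table
W1 = rng.standard_normal((CTX * E, H)) * 0.01
b1 = np.zeros(H)
W2 = rng.standard_normal((H, V)) * 0.01
b2 = np.zeros(V)

def next_char_probs(history):
    """P(next char | previous 6 chars); history is a list of 6 char ids."""
    x = C[history].reshape(-1)              # concatenate the 6 embeddings
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

hist = [17, 4, 902, 33, 128, 7]             # six character ids
p = next_char_probs(hist)
print(p.shape, float(p.sum()))              # (6700,) 1.0
</pre>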
