2014-06-06
From cslt Wiki
Latest revision as of 07:02, 6 June 2014

==Resource Building==
* Release management has been started
 
==Leftover questions==
* Asymmetric window: great improvement on the training set (WER 34% to 24%); however, the improvement is lost on the test set (see the splicing sketch after this list)
* Multi-GPU training: error encountered
* Multilanguage training
* Investigating LOUDS FST
* CLG embedded decoder plus online compiler
* DNN-GMM co-training
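For reference, the sketch below shows what input splicing with an asymmetric context window means here. It is a minimal illustration only: the offsets (10 past frames, 2 future frames) and the 40-dim features are assumptions, not the configuration actually tested.

<pre>
import numpy as np

def splice_frames(feats, left=10, right=2):
    """Splice each frame with `left` past and `right` future frames.

    feats: (T, D) array of per-frame features.
    Returns (T, (left + 1 + right) * D). An asymmetric window
    (left > right) trades future context for lower latency.
    """
    T, D = feats.shape
    # Pad by repeating the edge frames so every frame has full context.
    padded = np.concatenate([np.repeat(feats[:1], left, axis=0),
                             feats,
                             np.repeat(feats[-1:], right, axis=0)])
    return np.stack([padded[t:t + left + 1 + right].reshape(-1)
                     for t in range(T)])

# Example: 100 frames of 40-dim Fbank features.
x = np.random.randn(100, 40)
print(splice_frames(x).shape)   # (100, 13 * 40) = (100, 520)
</pre>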
==AM development==

===Sparse DNN===
* GA-based block sparsity (++++++); a toy sketch of the mask search follows
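The report does not spell out the GA formulation, so the following is only a toy sketch of a genetic algorithm searching for a block-sparsity mask over one weight matrix. The block size, population settings, and the stand-in fitness are all assumptions for illustration; a real run would score each mask by the masked network's validation accuracy.

<pre>
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))   # a weight matrix to sparsify
B = 32                                # block size: the mask is 8 x 8 blocks

def fitness(mask):
    """Stand-in fitness: keep large-magnitude blocks, reward sparsity."""
    full = np.kron(mask, np.ones((B, B)))    # expand block mask to W's shape
    kept = np.abs(W * full).sum() / np.abs(W).sum()
    return kept + 0.5 * (1.0 - mask.mean())

def evolve(pop_size=30, gens=50, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, 8, 8))
    for _ in range(gens):
        scores = np.array([fitness(m) for m in pop])
        top = pop[np.argsort(scores)[-pop_size // 2:]]   # selection
        kids = []
        for _ in range(pop_size - len(top)):
            a, b = top[rng.integers(len(top), size=2)]
            cut = rng.integers(1, 8)
            child = np.vstack([a[:cut], b[cut:]])        # crossover
            flip = rng.random(child.shape) < p_mut       # mutation
            kids.append(np.where(flip, 1 - child, child))
        pop = np.concatenate([top, np.array(kids)])
    return max(pop, key=fitness)

best = evolve()
print("kept blocks:", int(best.sum()), "of 64")
</pre>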
  
 
===Noise training===
:* Paper writing will start this week
  
 
===GFbank===
* Running the Sinovoice 8k 1400 + 100 mixture training.
* GFbank 14 xEnt iterations completed (a feature-extraction sketch follows the table):

                                     Huawei disanpi    BJ mobile    8k English data
 FBank non-stream (17 iterations)        22.01%          26.63%            -
 GFbank stream (14 iterations)           22.47%          27.52%            -
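For context, GFbank here denotes gammatone-filterbank log energies used in place of the mel Fbank. The sketch below is a minimal version under common assumptions (ERB-spaced center frequencies, a 4th-order gammatone magnitude response); none of the parameters are claimed to match the actual front end.

<pre>
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore), in Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gfbank_weights(n_filters=40, n_fft=512, sr=8000, fmin=64.0):
    """Gammatone filterbank weights on FFT bins: the 4th-order gammatone
    magnitude response |H(f)| = (1 + ((f - fc) / b)^2)^(-2), with centers
    spaced uniformly on the ERB-rate scale."""
    freqs = np.linspace(0, sr / 2, n_fft // 2 + 1)
    erb_rate = lambda f: 21.4 * np.log10(1 + 0.00437 * f)
    inv_erb_rate = lambda e: (10 ** (e / 21.4) - 1) / 0.00437
    centers = inv_erb_rate(np.linspace(erb_rate(fmin),
                                       erb_rate(sr / 2 - 100), n_filters))
    b = 1.019 * erb(centers)
    return (1.0 + ((freqs[None, :] - centers[:, None]) / b[:, None]) ** 2) ** -2

def gfbank(frames, weights):
    """Log gammatone-filterbank energies from windowed frames (N, n_fft)."""
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power @ weights.T + 1e-10)

w = gfbank_weights()
frames = np.random.randn(100, 512) * np.hamming(512)  # stand-in for speech
print(gfbank(frames, w).shape)                         # (100, 40)
</pre>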
  
 
===Multilingual ASR===

                                     Huawei disanpi    BJ mobile    8k English data
 FBank non-stream                          -               -               -

* Multilingual LM decoding
* TAG-based decoding is still problematic: decoding goes into the subgraph, but the decoding results are incorrect
* Investigate with a free-loop grammar (see the sketch after this list)
* Non-tag tests should be conducted on both Baidu & microblog data
* Should test the 8k shujutang data on the mixture model
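A free-loop grammar is a single-state word loop that accepts any word sequence of any length. Below is a minimal sketch that writes such a grammar G in OpenFst text format; the word list and the uniform costs are illustrative, not the actual vocabulary.

<pre>
import math

def write_word_loop_fst(words, path="G.txt"):
    """Write a single-state 'free loop' grammar in OpenFst text format.

    Every word is a self-loop on state 0 with uniform cost -log(1/V)
    (tropical semiring), so any word sequence is accepted.
    """
    cost = math.log(len(words))
    with open(path, "w") as f:
        for w in words:
            # src  dst  ilabel  olabel  weight
            f.write(f"0 0 {w} {w} {cost:.4f}\n")
        f.write("0 0.0\n")   # state 0 is also the final state

write_word_loop_fst(["<spoken_noise>", "yi", "er", "san", "si"])
</pre>

The text FST would then be compiled against the decoder's word symbol table (e.g., with OpenFst's fstcompile) and composed into the decoding graph.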
===English model===
<pre>
(state-gauss = 10000 100000, various LM, beam 13)

1. Shujutang 100h chi-eng 16k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
  wsj    |  23.86  |  20.95  |  20.90  |  20.84  |  20.81  |
  cmu    |  22.22  |    -    |    -    |    -    |  18.83  |
  giga   |  21.77  |    -    |    -    |    -    |  18.61  |
  armid  |  20.45  |    -    |    -    |    -    |    -    |


2. Shujutang 100h chi-eng 8k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
  wsj    |  26.27  |  23.63  |  23.14  |  22.93  |  23.00  |
  cmu    |  24.11  |    -    |    -    |    -    |  20.36  |
  giga   |  23.11  |    -    |    -    |    -    |  20.11  |
  armid  |    -    |    -    |    -    |    -    |    -    |


3. voxforge pure eng 16k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
  wsj    |  21.38  |  24.89  |  24.50  |  23.31  |  23.13  |
  cmu    |  24.00  |    -    |    -    |    -    |  21.33  |
  giga   |  18.75  |    -    |    -    |    -    |  22.45  |
  armid  |    -    |    -    |    -    |    -    |    -    |


4. fisher pure eng 8k (not finished yet):

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
  wsj    |  40.65  |  36.16  |  35.94  |  35.88  |  35.80  |
  cmu    |  35.07  |    -    |    -    |    -    |  31.16  |
  giga   |  41.18  |    -    |    -    |    -    |  36.23  |
  armid  |    -    |    -    |    -    |    -    |    -    |
</pre>
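All numbers above are WER (%). For reference, WER is the word-level edit distance between hypothesis and reference divided by the reference length; a minimal implementation, not tied to any toolkit:

<pre>
def wer(ref, hyp):
    """Word error rate: (sub + del + ins) / len(ref), via edit distance."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat", "the bat sat down"))  # 2 edits / 3 words = 0.667
</pre>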
  
  
 
===Denoising & Farfield ASR===
* Add artificial reverberation with various energy decays & time delays; plot decay vs. WER and delay vs. WER (a generation sketch follows this list)
* Use more training data for adaptation
* Record the waveform with a single speaker & a near-field microphone and test again
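One simple way to synthesize the reverberant data described above: build a toy room impulse response with a controllable energy decay (T60) and direct-path delay, then convolve it with clean speech. All parameter values here are illustrative.

<pre>
import numpy as np

def synthetic_rir(sr=8000, t60=0.5, delay_ms=20.0, length_s=0.8):
    """Toy room impulse response: a delayed direct path plus an
    exponentially decaying noise tail. The decay rate 6.9 / t60 makes
    the tail's energy drop by 60 dB after t60 seconds."""
    n = int(sr * length_s)
    tail = np.random.randn(n) * np.exp(-6.9 * np.arange(n) / (t60 * sr))
    d = int(sr * delay_ms / 1000.0)
    return np.concatenate([np.zeros(d), [1.0], tail])

def reverberate(speech, rir):
    """Convolve clean speech with the RIR, trimmed to the original length."""
    return np.convolve(speech, rir)[: len(speech)]

clean = np.random.randn(8000)        # stand-in for 1 s of 8 kHz speech
for t60 in (0.2, 0.5, 1.0):          # sweep the energy decay for the WER plot
    wet = reverberate(clean, synthetic_rir(t60=t60))
</pre>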
  
 
===VAD===
* DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
* Need to test small-scale networks (+); a sketch of the smaller topology follows this list
:* 600-800 network test
:* 100 X 4 + 2 network training
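Reading "100 X 4 + 2" as four 100-unit hidden layers with a 2-class (speech / non-speech) softmax output (an assumption about the notation, as is the 40-dim input), a minimal forward pass looks like this:

<pre>
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def init_mlp(dims):
    """Random weights for a fully connected net with the given layer sizes."""
    return [(rng.standard_normal((i, o)) * np.sqrt(1.0 / i), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    """Sigmoid hidden layers, softmax over {non-speech, speech} on top."""
    for W, b in layers[:-1]:
        x = sigmoid(x @ W + b)
    W, b = layers[-1]
    logits = x @ W + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# "100 X 4 + 2": 4 hidden layers of 100 units, 2 outputs; 40-dim input assumed.
net = init_mlp([40, 100, 100, 100, 100, 2])
frames = rng.standard_normal((16, 40))    # a batch of feature frames
speech_prob = forward(net, frames)[:, 1]  # P(speech) per frame
print(speech_prob.shape)                  # (16,)
</pre>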
  
 
===Scoring===
* Collect more data with human scoring to train discriminative models
  
  
 
===Embedded decoder===
1200 X 4 + 10k AM:
* Prepare to deliver Android compiler options (.mk)
* Interface design should be completed in one day
* Prepare HCLG for the 20k LM; decoding in progress

 LM size    150k     20k      10k      5k
 WER        42.23    43.45    44.54    46.07
 RT         1h31     48m      44m      43m
  
 
==LM development==

===Domain specific LM===
* Retrieve both Baidu & microblog data
* Need to check into GitLab (+)
 
==Word2Vector==

* Design the web spider
* Design the semantically related word tree (a toy pattern-matching sketch follows this list)
:* First version, based on pattern matching, done
:* Filter with query logs
:* Further refinement with the Baidu Baike hierarchy
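As a toy illustration of the pattern-matching step, the sketch below collects parent-to-child ("is-a") edges from raw text. The patterns are English stand-ins; the real system presumably mines Chinese templates and Baidu Baike pages.

<pre>
import re
from collections import defaultdict

# Toy is-a patterns; illustrative stand-ins only.
HYPONYM = re.compile(r"(\w+) is a kind of (\w+)")    # X is a kind of Y
EXAMPLE = re.compile(r"(\w+), such as (\w+)")        # Y, such as X

def build_tree(corpus):
    """Collect parent -> children edges by pattern matching over raw text."""
    tree = defaultdict(set)
    for line in corpus:
        for m in HYPONYM.finditer(line):
            tree[m.group(2)].add(m.group(1))   # parent Y gets child X
        for m in EXAMPLE.finditer(line):
            tree[m.group(1)].add(m.group(2))   # parent Y gets child X
    return tree

corpus = ["sparrow is a kind of bird", "birds, such as sparrows"]
print(dict(build_tree(corpus)))  # {'bird': {'sparrow'}, 'birds': {'sparrows'}}
</pre>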
 
  
 
===NN LM===
* Character-based NNLM (6700 chars, 7-gram): training on 500M data is done (a feedforward NNLM sketch follows this list)
* Inconsistent WER patterns were found on the Tencent test sets
:* probably need to use another test set for the investigation
* Investigate MS RNN LM training
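A 7-gram character NNLM predicts the next character from the previous six. Below is a minimal Bengio-style feedforward sketch; the 6700-character vocabulary comes from the bullet above, while the embedding and hidden sizes are illustrative.

<pre>
import numpy as np

rng = np.random.default_rng(0)
V, E, H, CTX = 6700, 64, 256, 6   # vocab, embedding, hidden, 7-gram context

C = rng.standard_normal((V, E)) * 0.01      # character embedding table
W1 = rng.standard_normal((CTX * E, H)) * 0.01
b1 = np.zeros(H)
W2 = rng.standard_normal((H, V)) * 0.01
b2 = np.zeros(V)

def next_char_probs(history):
    """P(next char | previous 6 chars); history is a list of 6 char ids."""
    x = C[history].reshape(-1)              # concatenate the 6 embeddings
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

hist = [17, 4, 902, 33, 128, 7]             # six character ids
p = next_char_probs(hist)
print(p.shape, float(p.sum()))              # (6700,) 1.0
</pre>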
