Difference between revisions of "2013-09-13"

Latest revision as of 04:06, 13 September 2013 (Fri)

Data sharing

LM count files still undelivered!

DNN progress

Sparse DNN

Cut 50% of the weights, then started sticky training with a learning rate of 0.0025. Pruning then continued until only 1.6% of the weights were left.

Performance chart

The test results show little gain on noisy data.
1/8 sparsity shows no evident performance reduction, which matches our earlier observation and is consistent with the results reported by MS.
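
The pruning schedule above can be illustrated with a small sketch. The notes do not say how weights are selected for cutting, so this assumes magnitude-based pruning (the smallest absolute values are removed first) and assumes every round cuts half of the remaining weights. The function names are hypothetical, and the retraining step between rounds is omitted. Under those assumptions, three rounds give 1/8 density and six rounds give 1/64 ≈ 1.56%, which is consistent with the reported 1.6%.

```python
import random

def prune_smallest(weights, mask, fraction):
    """Return a new mask with `fraction` of the active weights removed,
    choosing the active weights with the smallest magnitude (assumed criterion)."""
    active = [i for i, keep in enumerate(mask) if keep]
    n_cut = int(len(active) * fraction)
    # Sort active indices by absolute weight value, smallest first.
    active.sort(key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in active[:n_cut]:
        new_mask[i] = False
    return new_mask

def apply_mask(weights, mask):
    """Set pruned weights to zero and leave the others unchanged."""
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(6400)]  # toy weight vector
mask = [True] * len(weights)
densities = []
for _ in range(6):
    mask = prune_smallest(weights, mask, 0.5)
    weights = apply_mask(weights, mask)
    # Retraining with the mask held fixed would go here (omitted).
    densities.append(sum(mask) / len(mask))
```

Here `densities` records the fraction of weights still active after each round, so the third entry corresponds to the 1/8 sparsity setting and the last entry to the final model.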

FBank features

CMN has a similar impact on MFCC and FBank. Since MFCC involves a summation over various random channels, the mean and covariance of its dimensions are less random. This has two possible effects: first, the dimensions are relatively stable, so CMVN does not contribute much; on the other hand, the estimates of mean and variance are more accurate, so CMVN leads to more reliable results. As a result, the improvement from CMVN for MFCC and FBank is unpredictable and depends on the data set.

Performance chart
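
For reference, the normalization being discussed can be sketched as follows. This is a minimal illustration that assumes statistics are computed per utterance over all frames (the notes do not state the window), with a hypothetical function name and made-up feature values. With `norm_var=False` it performs CMN only.

```python
import math

def cmvn(frames, norm_var=True, eps=1e-10):
    """Normalize each feature dimension to zero mean (CMN) and,
    if norm_var is set, to unit variance as well (CMVN)."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    out = [[f[d] - means[d] for d in range(dim)] for f in frames]
    if norm_var:
        # Population standard deviation per dimension, floored to avoid division by zero.
        stds = [math.sqrt(sum(o[d] ** 2 for o in out) / n) for d in range(dim)]
        out = [[o[d] / max(stds[d], eps) for d in range(dim)] for o in out]
    return out

# Three frames of a made-up 2-dimensional feature.
frames = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
normalized = cmvn(frames)
```

The accuracy of `means` and `stds` depends on how stable each dimension is over the utterance, which is the trade-off the paragraph above describes.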

Varied the FBank dimension while keeping the LDA output dimension at 100. FB30 seems the best.

Performance chart

With FBank 40, tested various LDA output dimensions. The results show that LDA is still helpful and that dimension 200 is sufficient.

Performance chart

We need to investigate a non-linear discriminative approach that is simple but loses less information.
We can also test a simple same-dimension DCT. If the performance is still worse than FB, that confirms the problem is due to noisy channel accumulation.
We need to investigate Gammatone filter banks (GFB). The idea is the same as for FB: keep as much information as possible. It may also be possible to combine FB and GFB for better performance.
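
The proposed same-dimension DCT test can be sketched as below, assuming an orthonormal DCT-II applied to a log filter-bank vector with all coefficients kept (standard MFCC extraction normally truncates them). The input vector is made up and the function name is hypothetical. Because an orthonormal transform is invertible and preserves energy, no information is discarded in this setup; each output coefficient is simply a weighted sum over all input channels, so any remaining gap to FB could not be blamed on truncation.

```python
import math

def dct2_orthonormal(x):
    """Orthonormal DCT-II of a vector, returning as many coefficients as inputs."""
    n = len(x)
    out = []
    for k in range(n):
        # Each coefficient accumulates all n input channels.
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

# Made-up 40-dimensional log filter-bank vector for one frame.
log_fbank = [float(i % 7) - 3.0 for i in range(40)]
cepstra = dct2_orthonormal(log_fbank)
```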

Tencent exps

N/A

DNN Confidence estimation

Lattice-based confidence shows better performance with DNN than before.
Accumulated DNN confidence is done. The confidence values are much more reasonable.
Prepare MLP/DNN-based confidence integration.

Noisy training

We trained a model with a random-noise approach, which samples half of the training data and adds 15 dB white noise to it. We hope this random-noise training will improve performance on noisy data while keeping the discriminative power of the model on clean speech.

Performance chart

The results are largely consistent with our expectation: performance on noisy data is greatly improved, while performance on clean speech is not hurt much.

We look forward to noisy training that introduces noise randomly online during training.

Car-noise training: the results show that car noise has limited impact.

Performance chart
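
The random-noise data preparation described at the start of this section can be sketched as follows. The details are assumptions rather than the procedure actually used: the noise is taken to be Gaussian white noise, the SNR is measured over the whole utterance, and the function names and toy signals are hypothetical.

```python
import math
import random

def add_white_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled so that the utterance-level SNR equals snr_db."""
    sig_power = sum(s * s for s in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR(dB) = 10 * log10(signal power / noise power)
    target_noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(signal, noise)]

def corrupt_half(utterances, snr_db=15.0, seed=0):
    """Pick half of the utterances at random and add noise to them; copy the rest."""
    rng = random.Random(seed)
    noisy_ids = set(rng.sample(range(len(utterances)), len(utterances) // 2))
    mixed = [add_white_noise(u, snr_db, rng) if i in noisy_ids else list(u)
             for i, u in enumerate(utterances)]
    return mixed, noisy_ids

# Toy data: four sine waves standing in for training utterances.
utterances = [[math.sin(0.01 * (k + 1) * t) for t in range(2000)] for k in range(4)]
mixed, noisy_ids = corrupt_half(utterances)
```

The online variant mentioned above would presumably draw the noisy subset (and possibly the SNR) anew during training rather than fixing it once in advance.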

DNN confidence

The non-acoustic lattice-based confidence is done. The phone-based accumulated confidence is done. Chart

It looks like the acoustic information does not contribute much to the lattice-based confidence, which means that we need a better way to combine the acoustic and linguistic sources, e.g., with a model such as an MLP.

@@ Line 7: / Line 7: @@
 === Sparse DNN ===
-* Cutting 50% of the weights, and then start to run into sticky with learning rate 0.0025.
+* Cutting 50% of the weights, and then start to run into sticky with learning rate 0.0025. Continuous pruning until 1.6% weights left.
+[http://192.168.0.50:3000/series/?q=&action=view&series=75%2C70%2C69%2C66%2C65%2C64%2C63%2C62%2C61%2C60%2C59&chart_type=bar performance chart]
-[http://192.168.0.50:3000/series/?q=&action=view&series[]=75&series[]=70&series[]=69&series[]=66&series[]=65&series[]=64&series[]=63&series[]=62&series[]=61&series[]=60&series[]=59 Performance chart]
+* The test results show not much gains with noise.
+* 1/8 sparsity shows no evident performance reduction, as we observed before and is consistent the results reported by MS.
+=== FBank features ===
+* CMN shows similar impact to MFCC & FBank. Since MFCC involves summary of various random channels, the mean and covariance of the dimensions are less random. This leads to two possible impacts: first, the dimensions are relatively stable therefore CMVN does not contribute much; on other hand, estimation of mean and variance is more accurate so CMVN leads to more reliable results. This means CMVN leads to unpredictable performance improvement for MFCC & Fbank, depending on the data set.
-* The comparison shows very similar performance.
+[http://192.168.0.50:3000/series/?q=&action=view&series=53%2C51%2C45%2C44%2C33%2C32%2C31%2C29&chart_type=bar Performance chart]
-* Cut more weights based on up-to-now sparse model. Lead to iterative sparsity.
-* Test on noisy data with the sparse.
-=== FBank features ===
+* Choose various Fbank dimension, keep LDA output dimension as 100. FB30 seems the best.
+[http://192.168.0.50:3000/series/?q=&action=view&series=36%2C34%2C29&chart_type=bar Performance chart]
-Test on 100 hour data, structure 100_1200_1200_1200_1200_3580. Test on clean & 15db noiy speech.
+* Choose FBank 40, test various LDA output dimension. The results show LDA is still helpful, and dimension 200 is sufficient.
-{|class="wikitable"
+[http://192.168.0.50:3000/series/?q=&action=view&series=56%2C54%2C43%2C36&chart_type=bar Performance chart]
-!set !! MFCC !! GFCC !! FB    || MFCC + 15db  || GFCC + 15db || FB + 15db
-|-
-|map     ||23.75 ||22.95 || 20.88 || 65.24  || 62.99 || 62.20
-|-
-|2044    ||21.47 ||20.93 || 19.69  || 48.93 || 46.34 ||45.75
-|-
-|notetp3 ||13.17 ||15.43  || 12.79  || 55.91 ||52.46 ||54.56
-|-
-|record1900 ||8.10|| 7.32  || 7.38  || 25.43  ||26.62 || 23.97
-|-
-|general ||34.41||31.57 || 31.88  || 70.95   || 66.04 || 65.93
-|-
-|online1 ||33.02||31.83  || 31.54  || 50.40  || 46.61 || 48.06
-|-
-|online2 ||25.99||25.20  || 24.89  || 48.45  || 44.49 ||45.83
-|-
-|speedup ||23.52||22.97  ||21.54  || 64.78 || 60.38 ||61.52
-|-
-|}
-* FB feature is much better than both MFCC and GFCC. Probably due to the less information lost without DCT.
+* We need to investigate non-linear discriminative approach which is simple but leads to less information lost.
-* In noisy environment, GFCC obtains comparable or better performance compared to FB.
-* We need to investigate how many FBs are the most appropriate.
-* Inspired by the assumption of information lost with DCT, we need to test how another transform,  LDA,  leads to the similar information lost. We need to investigate which is the suitable dimension number for the LDA. We need to investigate non-linear discriminative approach which is simple but leads to less information lost.
-* Another assumption for the better performance with FB is that the FB is more suitable for CMN. DCT accumulates a number of noisy channels and thus exhibits more uncertain. This in turn can hardly be normalized by CMN. We need to test the performance of FB and MFCC when no CMN is introduced.
 * We can also test a simple 'the same dimension DCT'. If the performance is still worse than FB, we confirm that the problem is due to noisy channel accumulation.
 * Need to investigate Gammatone filter banks. The same idea as FB, that we want to keep the information as much as possible. And it is possible to combine FB and GFB to pursue a better performance.
@@ Line 59: / Line 39: @@
 * Accumulated DNN confidence is done. The confidence values are much more reasonable.
 * Prepare MLP/DNN-based confidence integration.
 ==Noisy training ==
-Reading the table in the last section, we observe very disapointting performance reduction with noise. And we did not see too much advantage for FB and GFCC. We examine how if we introduce the noise in training. In this experiment, 15db noise are introduced in all the training data (100 hours), and the test utterances are in various noise level. Just give the performance on the test set online1. More performance is here:
+* We trained model with a random noise approach, which samples half of the training data and add 15db white noise. We hope this rand-noise learning will improve the performance of data in noise while keeping the discriminative power of the model in clean speech.
+[http://192.168.0.50:3000/series/?q=&action=view&series=76%2C76.0%2C76.1%2C76.2%2C76.3%2C74%2C73%2C72%2C71%2C45&chart_type=bar performance chat]
+* The results are largely consistent with our expectation, that the performance on noisy data were greatly improved, while the performance on clean speech is not hurted much.
+* We are looking forward to the noisy training which introduces some noises randomly online in training.
-http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=118
+* Car noise training. It shows limited impact of car noise.
-{|class='wikitable'
+[http://192.168.0.50:3000/series/?q=&action=view&series=78%2C78.0%2C78.1%2C78.2%2C78.3%2C45&chart_type=bar Performance chart]
-!    SNR  !!        MFCC     !!  GFCC
-|-
-|clean      || 45.63         || 38.12
-|-
-|20db       || 32.41         || 30.54
-|-
-|15db(matched training)      || 35.05       ||  32.80
-|-
-|10db       || 41.06         ||38.53
-|-
-|}
-* It is interesting to see that two factors are important in the noisy training: (1) speech should be clean (2) speech should match the training condition. The best performance is from 20db, which is not very clean and not very mismatch. This is interesting.
+==DNN confidence==
-* We are looking forward to the noisy training which introduces some noises randomly in training.
-==Stream decoding==
+* The non-acoustic lattice-based confidence is done. The phone-based accumulated confidence is done. [http://192.168.0.50:3000/series/chart?data=/json/conf_det.json Chart]
-* The interface for server-side is done. For embedded-side is on development.
+* It looks like the acoustic information does not contribute much to the lattice based confidence. Which means that we need a better way to combine the acoustic and the linguistic sources with models e.g., MLP.
-* Fixed a bug which prompts intermediate results when short pause encountered.
-* Fixed a CMN bug for the last segment.
