2013-08-02

Latest revision as of 07:54, 2 August 2013 (Friday)

Data sharing

  • LM count files still undelivered!

DNN progress

Experiments

  • Discriminative DNN

Use sequential MPE/MMI/bMMI (boost factor 0.1) training, with the DNN-based alignments and denominator lattices. 100-hour training set; network structure: 100 + 4 × 800 + 2100. Results by test set:

TASK         cross-entropy(orig)  MPE(it1)  MPE(it2)  MPE(it3)  MMI(it1)  MMI(it2)  MMI(it3)  bMMI(it1)  bMMI(it2)
map          22.98                23.91     23.26     22.84     22.30     21.92     21.64     21.99      21.82
2044         21.94                25.92     24.47     24.10     21.30     21.13     21.11     21.50      22.06
notetp3      14.73                21.64     18.83     19.16     14.68     14.57     14.25     14.52      15.06
record1900    8.45                 8.93      7.60      8.46      6.64      6.27      6.07      6.76       6.20
general      34.0                 35.29     33.72     33.62     33.80     33.85     33.68     33.27      33.25
online1      34.16                31.70     31.45     31.33     32.70     32.39     32.27     32.51      32.05
online2      27.10                24.56     24.42     24.37     25.18     24.90     24.76     25.02      24.70
speedup      24.1                 22.93     21.86     21.60     21.94     22.00     22.26     21.92      21.35
  • MMI seems less aggressive than MPE: the former provides consistent performance gains, while the latter behaves differently on different test sets. bMMI seems more robust than MPE but less robust than MMI; more investigation could be done with different boost factors. These observations might be explained by the discrepancy between the training and test data: discriminative training is more suitable for test sets that are consistent with the training condition.
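
For reference (this is the standard formulation, not something specific to the experiments above), the boosted-MMI objective with boost factor b (0.1 here) and acoustic scale \kappa can be written as

    F_{\mathrm{bMMI}}(\theta) = \sum_u \log \frac{ p_\theta(X_u \mid H_{w_u})^{\kappa}\, P(w_u) }{ \sum_w p_\theta(X_u \mid H_w)^{\kappa}\, P(w)\, e^{-b\, A(w, w_u)} }

where A(w, w_u) is the (phone- or state-level) accuracy of hypothesis w against the reference w_u. With b = 0 this reduces to plain MMI; a larger b gives more weight to competing hypotheses that contain more errors.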


Tencent exps

GPU & CPU merge

  1. Hold

Confidence estimation

DNN confidence

  • We are interested in estimating confidence directly from the DNN output. Such confidence is naturally a 'posterior' and does not rely on decoding graphs, so it is simple to generalize, e.g., when examining which output is best among multiple decoders (a minimal sketch is given after the to-do list below).
  • Silence is now removed in the confidence computation.
  • Score distribution testing is in progress.


  • To be done:
  1. CI-phone posterior-based (instead of state posterior-based), full-path (instead of best-path) confidence estimation.
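
A minimal sketch of the idea: average the per-frame DNN posteriors along a decoded path, skipping silence frames, and optionally sum state posteriors into CI-phone posteriors. Everything here (the function name, the state_to_phone map, the silence labels) is an illustrative assumption, not the project's actual code.

    import numpy as np

    def dnn_confidence(posteriors, path_states, state_to_phone=None,
                       silence_phones=frozenset({"sil", "sp"})):
        """posteriors: (T, S) per-frame DNN state posteriors;
        path_states: length-T state indices on the (best or full) path;
        state_to_phone: optional map state -> CI phone; if given, the score
        uses CI-phone posteriors (summed over that phone's states)."""
        scores = []
        for t, s in enumerate(path_states):
            phone = state_to_phone[s] if state_to_phone else None
            if phone in silence_phones:
                continue                      # drop silence frames from the score
            if state_to_phone:
                # CI-phone posterior: sum over all states mapping to this phone
                mask = np.array([state_to_phone[i] == phone
                                 for i in range(posteriors.shape[1])])
                scores.append(posteriors[t, mask].sum())
            else:
                scores.append(posteriors[t, s])
        return float(np.mean(scores)) if scores else 0.0

The choice between state and CI-phone posteriors, and between the best path and the full path, is exactly the open question in the to-do item above.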

Multi-graph decoding based on DNN confidence

  • Code done. The current implementation supports serial multi-graph decoding; a simple test validated the change.
  • General multi-graph decoding relies on a more flexible framework in which each graph runs in a separate process and a central controller collects the results and selects among them based on DNN confidence (a rough sketch follows).
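
A rough sketch of that central-controller framework, assuming each per-graph decoder is a callable that returns (hypothesis, DNN confidence); this is only an illustration of the idea, not the project's implementation, and all names are hypothetical.

    from multiprocessing import Pool

    def _decode_one(args):
        name, decode_fn, wav = args
        hyp, conf = decode_fn(wav)        # per-graph decoder: (hypothesis, dnn_confidence)
        return name, hyp, conf

    def multi_graph_decode(decoders, wav):
        """decoders: {graph_name: decode_fn}. Run one decoder per graph in a
        separate process and keep the hypothesis with the highest DNN confidence."""
        with Pool(processes=len(decoders)) as pool:
            results = pool.map(_decode_one,
                               [(name, fn, wav) for name, fn in decoders.items()])
        return max(results, key=lambda r: r[2])

In a real deployment the per-graph decoders would be long-running processes fed utterance by utterance rather than a pool spawned per call; the selection rule (highest DNN confidence) is the piece the confidence work above is meant to provide.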


Embedded progress

  • Test done on the car-1000 test set:

    1. 100_800_800_800_800_2108:
      %WER 1.61 [ 188 / 11710, 45 ins, 54 del, 89 sub ]
      %SER 2.24 [ 66 / 2953 ]
      Scored 2953 sentences, 0 not present in hyp.
    2. 100_800_800_800_800_3620:
      %WER 1.66 [ 194 / 11710, 45 ins, 56 del, 93 sub ]
      %SER 2.40 [ 71 / 2953 ]
      Scored 2953 sentences, 0 not present in hyp.
    3. 100_600_600_600_600_1264:
      %WER 1.61 [ 189 / 11710, 44 ins, 48 del, 97 sub ]
      %SER 2.47 [ 73 / 2953 ]
      Scored 2953 sentences, 0 not present in hyp.
  • It looks like the simple and fast 600X1264 net is good enough for grammar-based tasks (a rough cost comparison is sketched after this list).
  • A simple test shows RT 1.5 on the ARM board and RT 0.5 on a popular mobile phone (bought in 2012).
  • Trying to build a .so so that the binary can be loaded on Android. There are some errors when compiling ATLAS; about 2 days are needed to solve them.
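
To make "simple and fast" concrete, a back-of-the-envelope count of weights (≈ multiply-adds per frame for a fully-connected net) for the three topologies tested above; the layer sizes are read off the names in the results, and the snippet is only illustrative arithmetic.

    def weight_count(layers):
        # number of weights = multiply-adds per frame for a fully-connected DNN
        return sum(a * b for a, b in zip(layers, layers[1:]))

    nets = {
        "100_800x4_2108": [100, 800, 800, 800, 800, 2108],
        "100_800x4_3620": [100, 800, 800, 800, 800, 3620],
        "100_600x4_1264": [100, 600, 600, 600, 600, 1264],
    }
    for name, layers in nets.items():
        print(name, weight_count(layers))
    # ~3.69M, ~4.90M and ~1.90M weights respectively: the 600x1264 net needs
    # roughly half the computation of the 800-unit nets at essentially the same WER.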


  • To be done
  1. Shrink the NN structure (from 4 hidden layers to 2) and test the performance.
  2. The Kaldi decoder is costly when the graph is large; the indexing of the FST structure needs to be improved.
  3. Integrate the DNN FE with the pocket-sphinx decoder.