2013-07-26 (latest revision as of 02:57, 29 July 2013)

Data sharing

  • LM count files still undelivered!

DNN progress

Experiments

  • Discriminative DNN

Use sequential MPE (with DNN-based alignments and denominator lattices). 100-hour training set; network structure: 100 + 4 × 800 + 2100:

TASK         cross-entropy (original)   MPE (it1)   MPE (it2)   MPE (it3)   MPE (it4)
map          22.98                      23.91       23.26       22.84       -
2044         21.94                      25.92       24.47       24.10       -
notetp3      14.73                      21.64       18.83       19.16       -
record1900   8.45                       8.93        7.60        8.46        -
general      34.0                       35.29       33.72       33.62       -
online1      34.16                      31.70       31.45       31.33       -
online2      27.10                      24.56       24.42       24.37       -
speedup      24.1                       22.93       21.86       21.60       -
  • Conclusions: The MPE criterion appears to provide gains on most test sets, particularly the online ones. For sets with read speech and good acoustic conditions, MPE does not help much and can even hurt. This can be attributed to the training set, which is more 'online' in style. MMI/bMMI results will be delivered soon; those criteria seem 'gentler', adjusting the weights less aggressively.
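To make the comparison concrete, here is a small Python sketch (purely illustrative; it assumes the figures above are error rates in %) that computes the relative change from the cross-entropy baseline to the best MPE iteration for each test set:

<pre>
# Relative change from the cross-entropy baseline to the best MPE iteration.
# Figures are copied from the table above; assumed to be error rates (%).
results = {
    "map":        (22.98, 22.84),
    "2044":       (21.94, 24.10),
    "notetp3":    (14.73, 18.83),
    "record1900": (8.45,  7.60),
    "general":    (34.0,  33.62),
    "online1":    (34.16, 31.33),
    "online2":    (27.10, 24.37),
    "speedup":    (24.1,  21.60),
}

for task, (baseline, best_mpe) in results.items():
    rel = (best_mpe - baseline) / baseline * 100.0
    print(f"{task:12s} {rel:+6.1f}%")   # negative = MPE helps, positive = MPE hurts
</pre>

E.g., online1 improves by roughly 8% relative, while notetp3 degrades by roughly 28% relative, which matches the read-speech observation above.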


  • Sparse DNN on the ARM board
1200-1200-1200-3536          1200-1200-1200-3536-sparse0.3 (sparsity 1/5)
original atlas:  RT 2.3      RT 2.3
atlas sparse:    RT 54       RT 14
NIST smatmat:    RT 27.3     RT 5.98

800-800-800-2108             800-800-800-2108-sparse0.3 (sparsity 2/5)
original atlas:  RT 1.3      RT 1.1
NIST smatmat:    RT 11.9     RT 5.5

600-600-600-1500
original atlas:  RT 0.9
NIST smatmat:    RT 6.5



  • To be done:
  1. Try the SuiteSparse lib (a rough sparse-vs-dense timing sketch is given below)
  2. Test accuracy on a large data set
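For the sparsity comparison, a minimal desktop sketch using scipy as a stand-in (the ARM benchmark itself uses the atlas and NIST smatmat libraries; the layer size below follows the 1200-1200-1200-3536 setting, and "sparsity 1/5" is read as roughly 20% of the weights being non-zero):

<pre>
import time
import numpy as np
import scipy.sparse as sp

# Hypothetical 1200x1200 hidden layer with ~20% non-zero weights (one reading of "sparsity 1/5").
rng = np.random.default_rng(0)
dense_w = rng.standard_normal((1200, 1200)).astype(np.float32)
mask = rng.random((1200, 1200)) < 0.2           # keep ~20% of the weights
sparse_w = sp.csr_matrix(dense_w * mask)        # CSR format for fast matrix-vector products

x = rng.standard_normal((1200, 1)).astype(np.float32)

def avg_time(f, n=1000):
    t0 = time.perf_counter()
    for _ in range(n):
        f()
    return (time.perf_counter() - t0) / n

print("dense  matvec: %.2e s" % avg_time(lambda: dense_w @ x))
print("sparse matvec: %.2e s" % avg_time(lambda: sparse_w @ x))
</pre>

Whether the sparse product actually wins depends on the kernel, which is exactly what the atlas vs. NIST smatmat comparison above shows.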

Tencent exps

GPU & CPU merge

  1. Hold

Confidence estimation

DNN confidence

  • We are interested in estimating confidence directly from the DNN output. This confidence is naturally a posterior and does not rely on decoding graphs, so it generalizes easily, e.g., when examining which output is best among multiple decoders.
  • The first design uses the best path from decoding/alignment and computes the confidence directly from the state-posterior matrix (a minimal sketch is given after the to-do list below). The code is finished and initial testing looks OK.
  • To be done:
  1. Large-scale test.
  2. CI-phone posterior-based (instead of state posterior-based), full-path (instead of best-path) confidence estimation.
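A minimal sketch of one plausible realization of the best-path, state-posterior confidence (the helper name and the use of the average log posterior are assumptions for illustration, not the actual implementation):

<pre>
import numpy as np

def best_path_confidence(posteriors, best_path):
    """Confidence from the DNN state-posterior matrix along the best path.

    posteriors: (num_frames, num_states) DNN output posteriors.
    best_path:  one state id per frame, from decoding or alignment.
    Returns the average per-frame log posterior (higher = more confident).
    """
    frame_post = posteriors[np.arange(len(best_path)), best_path]
    return float(np.mean(np.log(frame_post + 1e-10)))

# Illustrative use: 3 frames, 4 states.
post = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.2, 0.6, 0.1, 0.1],
                 [0.1, 0.1, 0.1, 0.7]])
print(best_path_confidence(post, [0, 1, 3]))
</pre>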

Multi-graph decoding based on DNN confidence

  • Because the DNN-based confidence is independent of the decoding graph, it is simple to compare or integrate results from different graphs. For example, decoding can be performed on both a general big graph and a user-specific graph, and the result with the higher confidence is selected (a selection sketch is given after this list).
  • Coding is finished.
  • To be done: debugging and testing next week.
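A minimal sketch of the selection step, assuming each graph's decoder returns a hypothesis together with the DNN-based confidence described above (names and values are illustrative):

<pre>
def select_best_hypothesis(results):
    """Pick the result with the highest DNN-based confidence.

    results: list of (graph_name, hypothesis, confidence) tuples, e.g. one from
    the general big graph and one from a user-specific graph. Because the
    confidence does not depend on the graph, the scores are directly comparable.
    """
    return max(results, key=lambda r: r[2])

# Illustrative use with two decoding graphs.
results = [
    ("general_graph",       "call mom",       -1.8),
    ("user_specific_graph", "call tom chang", -0.9),
]
print(select_best_hypothesis(results))
</pre>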

Embedded progress

  • The DNN FE now runs at 0.7 real-time (RT), so it can be employed in simple grammar tasks (how the RT factor is computed is noted after the to-do list below).
  • To be done
  1. Implement a DNN-based demo.
  2. Shrink the NN structure (from 4 hidden layers to 2) and test the performance.
  3. The Kaldi decoder's graph search is costly when the graph is large; the indexing of the FST structure needs to be improved.
  4. Integrate the DNN FE with the pocket-sphinx decoder.
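For reference, the RT (real-time) factor quoted above is processing time divided by audio duration; a trivial sketch:

<pre>
def rt_factor(processing_seconds, audio_seconds):
    """Real-time factor: < 1.0 means faster than real time."""
    return processing_seconds / audio_seconds

# E.g., 7 s of processing for 10 s of audio gives the reported 0.7 RT.
print(rt_factor(7.0, 10.0))   # 0.7
</pre>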