2013-08-10

Data sharing

Running 1200-3620 NN, graph generation is done. DT training should be done in 3 days.

The distribution graph has been obtained. The performance seems poor.
A possible reason is that decoding is LM-based, while the confidence is purely acoustic. As a result, (1) errors at the linguistic level are not necessarily errors at the acoustic level, and (2) the search automatically chooses almost-correct phones/states.
The conclusion is that DNN confidence is best suited to grammar-based applications, or at least to applications where LM information is not very important.
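To illustrate why such a confidence carries no LM information, here is a minimal sketch of one common way to score a segment from DNN outputs: the geometric mean of the posterior of the decoder-aligned state over the segment's frames. This is an assumption for illustration, not necessarily the measure used in these experiments; the function name `segment_confidence`, the state ids, and the posterior values are all hypothetical.

```python
import math


def segment_confidence(frame_posteriors, aligned_states):
    """Geometric mean of the aligned state's posterior over a segment.

    frame_posteriors: list of dicts, one per frame, mapping state id -> posterior.
    aligned_states:   the state id the decoder chose for each frame.
    Only acoustic posteriors enter the score; no LM term is involved.
    """
    if len(frame_posteriors) != len(aligned_states) or not aligned_states:
        raise ValueError("need one aligned state per frame and at least one frame")
    floor = 1e-10  # hypothetical floor to avoid log(0)
    log_sum = sum(
        math.log(max(post.get(state, 0.0), floor))
        for post, state in zip(frame_posteriors, aligned_states)
    )
    return math.exp(log_sum / len(aligned_states))


# Made-up posteriors for a three-frame segment.
posteriors = [
    {"a1": 0.9, "b1": 0.1},
    {"a1": 0.6, "b1": 0.4},
    {"a2": 0.8, "b1": 0.2},
]
conf = segment_confidence(posteriors, ["a1", "a1", "a2"])
print(f"confidence = {conf:.3f}")
```

Because the score depends only on the states along the decoded path, a word that is wrong for linguistic reasons can still align to acoustically plausible states and receive a high value.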

GFCC computation is very slow: 100 hours of speech costs 16 hours of CPU time, a real-time factor (RT) of around 0.2. This is intolerable.
GFCC-based DNN training on 100 hours of speech data is done. Need to test the noise-robust performance in 2 days.
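The real-time factor quoted for GFCC extraction is simply processing time divided by audio duration. A small sketch of that arithmetic, using only the two figures from this report (the function name `real_time_factor` is hypothetical):

```python
def real_time_factor(cpu_hours: float, audio_hours: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_hours <= 0:
        raise ValueError("audio_hours must be positive")
    return cpu_hours / audio_hours


# Figures from the report: 16 CPU hours for 100 hours of speech.
rtf = real_time_factor(cpu_hours=16.0, audio_hours=100.0)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 0.16"
```

The exact value is 0.16, which the report rounds to about 0.2.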

The code is done, and simple testing is completed.
Problem 1: CMN initialization is not perfect. Need to train a better initial CMN model.
Problem 2: the balance in posterior-based silence detection still needs tuning.
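The two problems interact, which the following rough sketch tries to show. It assumes an online CMN that starts from an initial mean and updates its running mean only on frames judged to be speech by a silence posterior. This is not the code referred to above; `online_cmn`, `prior_weight`, and `silence_threshold` are hypothetical names and values.

```python
def online_cmn(frames, initial_mean, prior_weight=100.0,
               silence_post=None, silence_threshold=0.5):
    """Subtract a running cepstral mean that starts from initial_mean.

    frames:            list of feature vectors (lists of floats).
    initial_mean:      prior mean, counted as prior_weight pseudo-frames.
    silence_post:      optional per-frame silence posterior in [0, 1].
    silence_threshold: frames at or above this are treated as silence
                       and do not update the running mean.
    """
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")
    total = [m * prior_weight for m in initial_mean]
    count = prior_weight
    normalized = []
    for t, frame in enumerate(frames):
        is_speech = silence_post is None or silence_post[t] < silence_threshold
        if is_speech:
            # Only speech frames contribute to the mean estimate.
            total = [s + x for s, x in zip(total, frame)]
            count += 1.0
        mean = [s / count for s in total]
        normalized.append([x - m for x, m in zip(frame, mean)])
    return normalized


frames = [[1.0, 2.0], [3.0, 4.0]]
all_speech = online_cmn(frames, initial_mean=[0.0, 0.0], prior_weight=1.0)
gated = online_cmn(frames, initial_mean=[0.0, 0.0], prior_weight=1.0,
                   silence_post=[0.1, 0.9])
```

In this sketch a poor `initial_mean` dominates the early frames until enough speech has been seen, and the choice of `silence_threshold` decides how many frames are allowed to correct it.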

G.fst integration is done. The initial test passed. It looks like zero probability works better for the NUM class.
HCLG integration is done. A bug was fixed, and the initial test passed.
Online integration takes 1 minute. Need to optimize.
Need thorough testing with the Tencent test suite.
Need to tune the subgraph feeding probability.