Difference between revisions of "2013-08-10"

From cslt Wiki
Latest revision as of 05:17, 20 August 2013 (Tuesday)

Data sharing

  • LM count files still undelivered!

DNN progress

Discriminative DNN

  • Running 1200-3620 NN, graph generation is done. DT training should be done in 3 days.

Sparse DNN

  • Iterative sparse sticky training runs. More sparsity is expected.

Tencent exps

  • online support
  • garbage model training
  • VAD optimization

DNN Confidence estimation

  • Distribution graph is obtained. The performance seems poor.
  • A possible reason is that the decoding is LM-based, while the confidence is purely acoustic. So (1) errors in the linguistic layer are not really errors in the acoustic layer, and (2) the search will automatically choose the almost-correct phones/states.
  • The conclusion is that the DNN confidence is most suitable for grammar-based applications, or at least for applications where the LM information is not very strong.
  • To be done:
  1. CI phone confidence, ongoing
  2. No-tone confidence, ongoing
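
The notes do not say how the DNN confidence is computed. As an illustration only, one common purely acoustic measure is the geometric mean of the DNN posteriors of the states on the decoded path. The sketch below assumes that definition; the function name, the input layout, and the floor value are hypothetical. Because nothing in it depends on the LM, a word that is wrong only at the linguistic level can still score high, which matches the explanation above.

```python
import math
from typing import Sequence


def segment_confidence(
    posteriors: Sequence[Sequence[float]],
    aligned_states: Sequence[int],
    floor: float = 1e-10,
) -> float:
    """Geometric mean of the posteriors of the aligned states.

    posteriors[t][s] is the (assumed) DNN posterior of state s at frame t;
    aligned_states[t] is the state chosen by the decoder at frame t.
    """
    if not posteriors or len(posteriors) != len(aligned_states):
        raise ValueError("need one aligned state per non-empty frame list")
    # Average in the log domain to avoid underflow on long segments.
    log_sum = sum(
        math.log(max(frame[state], floor))
        for frame, state in zip(posteriors, aligned_states)
    )
    return math.exp(log_sum / len(aligned_states))


# Two frames, two states; the decoder picked state 0, then state 1.
conf = segment_confidence([[0.9, 0.1], [0.4, 0.6]], [0, 1])
print(f"confidence = {conf:.3f}")
```

The same function could be applied to CI-phone or no-tone posteriors by first summing the state posteriors that map to each merged class, which is an assumption about how those variants would be built.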

GFCC DNN

  • GFCC computation is very slow. 100 hours of speech costs 16 hours of CPU time, so RT is around 0.2. This is intolerable.
  • GFCC-based DNN training on 100 hours of speech data is done. Need to test the noise robustness within 2 days.
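
For reference, the RT figure quoted above follows from the usual definition of real-time factor as processing time divided by audio duration. A minimal sketch, with a hypothetical function name:

```python
def real_time_factor(processing_hours: float, audio_hours: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_hours <= 0:
        raise ValueError("audio_hours must be positive")
    return processing_hours / audio_hours


# Figures from the note above: 16 CPU hours for 100 hours of speech.
rt = real_time_factor(processing_hours=16.0, audio_hours=100.0)
print(f"RT = {rt:.2f}")
```

With these numbers the feature extraction alone comes out at 0.16, which the notes round to about 0.2.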

Stream decoding

  • The code is done. Simple testing is completed.
  • Problem 1: CMN initialization is not perfect. Need to train a better initial CMN model.
  • Problem 2: balance for posterior-based silence detection.
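
Problem 1 arises because a streaming decoder has no utterance-level mean available when the first frames arrive. One common workaround, sketched here as an assumption rather than as the implementation used in this project, is to seed the running mean with a prior mean estimated offline and let incoming frames gradually override it. The class name, `prior_mean`, and `prior_weight` are hypothetical.

```python
from typing import List, Sequence


class OnlineCMN:
    """Running cepstral mean normalization seeded with a prior mean."""

    def __init__(self, prior_mean: Sequence[float], prior_weight: float = 100.0):
        if prior_weight < 0:
            raise ValueError("prior_weight must be non-negative")
        # The prior behaves like `prior_weight` pseudo-frames already seen.
        self.total = [m * prior_weight for m in prior_mean]
        self.count = prior_weight

    def apply(self, frame: Sequence[float]) -> List[float]:
        """Update the running mean with `frame`, then subtract it."""
        self.total = [t + x for t, x in zip(self.total, frame)]
        self.count += 1.0
        return [x - t / self.count for x, t in zip(frame, self.total)]


# Toy example: a 2-dimensional feature with a weak prior.
cmn = OnlineCMN(prior_mean=[1.0, 2.0], prior_weight=1.0)
normalized = cmn.apply([3.0, 2.0])
print(normalized)
```

A larger `prior_weight` trusts the initial model for longer, while a smaller one adapts faster to the current speaker and channel, so the quality of the prior mean matters most for short utterances.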

Subgraph integration

  • G.fst integration is done. Initial test passed. Looks like the zero-probability is better for the NUM class.
  • HCLG integration is done. A bug was fixed and the initial test passed.
  • Online integration cost is 1 minute. Need to optimize.
  • Need thorough testing with the Tencent test suite.
  • Need to tune the subgraph feeding probability.

Embedded progress

  • GFCC-based engine test. Just started.
  • Obtain performance curves: RT, memory size, and package size vs. vocabulary size.
  • A new demo has been released for 4600 song names. Download here: http://cslt.riit.tsinghua.edu.cn/csltdemo/public/release/easr/easr.v1.0.song.apk