2014-03-21

Latest revision as of 07:20, 21 March 2014

Resource Building

  • The current text resources have been re-arranged and listed

Leftover questions

  • Asymmetric window: great improvement on the training set (WER from 34% to 24%), but the improvement is lost on the test set. Overfitting? (see the window sketch after this list)
  • Multi-GPU training: error encountered
  • Multilingual training
  • Investigating LOUDS FST
  • CLG embedded decoder plus online compiler
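As a reference for the asymmetric-window item above, a minimal numpy sketch of one common construction: the rising half of a long Hanning window joined to the falling half of a short one. The window lengths are illustrative placeholders, not the configuration used in our experiments.

  import numpy as np

  def asymmetric_window(rise_len=400, fall_len=80):
      """Asymmetric analysis window: slow Hanning rise, fast Hanning fall.
      Lengths are illustrative placeholders."""
      rise = np.hanning(2 * rise_len)[:rise_len]   # first (rising) half
      fall = np.hanning(2 * fall_len)[fall_len:]   # second (falling) half
      return np.concatenate([rise, fall])

  win = asymmetric_window()
  frame = np.random.randn(win.size)   # one speech frame (placeholder)
  windowed = frame * win              # applied before the FFT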

AM development

Sparse DNN

  • GA-based block sparsity (see the sketch below)
      • code ready, testing on pure matrix multiplication
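For reference, a minimal sketch of the block-sparsity idea under test: a binary block mask zeroes whole sub-blocks of a weight matrix, and the masked matrix is then used in a plain matrix multiplication. The GA search for a good mask is not shown; the random mask, sizes, and names here are all illustrative.

  import numpy as np

  def apply_block_mask(W, mask, bs):
      """Zero out whole bs x bs blocks of W according to a binary block mask."""
      W = W.copy()
      for i in range(mask.shape[0]):
          for j in range(mask.shape[1]):
              if mask[i, j] == 0:
                  W[i*bs:(i+1)*bs, j*bs:(j+1)*bs] = 0.0
      return W

  bs = 32                                           # block size (illustrative)
  W = np.random.randn(512, 512).astype(np.float32)  # a DNN weight matrix
  # A GA would evolve this mask; here it is simply random at 50% density.
  mask = (np.random.rand(512 // bs, 512 // bs) < 0.5).astype(np.int8)
  W_sparse = apply_block_mask(W, mask, bs)

  x = np.random.randn(512, 128).astype(np.float32)
  y = W_sparse @ x   # the "pure matrix multiplication" being timed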

GMM/DNN co-training

  • Co-training using Tencent data (see the sketch below)
      • GMM modeling is slightly better when using the DNN alignments
      • performance is worse when using the re-trained GMMs
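A toy sketch of the co-training loop described above: frame-level alignments taken from the DNN are used to re-estimate the GMM. Everything here is a stand-in; the real pipeline uses forced alignment and EM re-estimation (Kaldi-style steps), and only Gaussian means are updated in this toy.

  import numpy as np

  def dnn_align(dnn, feats):
      """Pick the most likely state per frame from (toy) DNN scores."""
      return np.argmax(feats @ dnn, axis=1)

  def gmm_update(feats, align, n_states):
      """Re-estimate one Gaussian mean per state from the DNN alignment."""
      return np.stack([feats[align == s].mean(axis=0) if np.any(align == s)
                       else np.zeros(feats.shape[1]) for s in range(n_states)])

  n_states, dim = 4, 10
  feats = np.random.randn(1000, dim)          # placeholder acoustic features
  dnn = np.random.randn(dim, n_states)        # placeholder "DNN" (a linear map)
  align = dnn_align(dnn, feats)               # step 1: DNN alignment
  means = gmm_update(feats, align, n_states)  # step 2: re-estimate the GMM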

Noise training

  • Train on the WSJ database, corrupting the data with various noise types (see the injection sketch below)
  • Single noise injection
      • White noise training: http://cslt.riit.tsinghua.edu.cn/mediawiki/images/7/7e/White-eps-converted-to.pdf
      • Cafe noise training: http://cslt.riit.tsinghua.edu.cn/mediawiki/images/e/ec/Cafe-eps-converted-to.pdf
      • Car noise training: http://cslt.riit.tsinghua.edu.cn/mediawiki/images/3/39/Car-eps-converted-to.pdf
  • Multi noise injection
      • White + cafe noise training: http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/fc/White_cafe_clean-eps-converted-to.pdf
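For reference, a minimal numpy sketch of noise injection at a target SNR, the operation used to corrupt the WSJ data (the signals and the SNR value are placeholders; for multi-noise training a noise type is drawn per utterance):

  import numpy as np

  def inject_noise(speech, noise, snr_db):
      """Mix noise into speech at a given signal-to-noise ratio (dB)."""
      reps = int(np.ceil(len(speech) / len(noise)))
      noise = np.tile(noise, reps)[:len(speech)]   # match the speech length
      p_speech = np.mean(speech ** 2)
      p_noise = np.mean(noise ** 2)
      # scale the noise so that 10*log10(p_speech / p_scaled) equals snr_db
      scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
      return speech + scale * noise

  speech = np.random.randn(16000)   # placeholder for one WSJ utterance
  white = np.random.randn(16000)    # white noise; cafe/car noise comes from recordings
  noisy = inject_noise(speech, white, snr_db=10)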


AMR compression re-training

  • 1700h AMR training ongoing

GFbank

  • gfbank is better than gfcc
  • gfbank is better than fbank
  • gfbank + fbank seems to outperform the others (see the sketch below)
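The gfbank + fbank setup is plain frame-level feature concatenation; a minimal sketch (all dimensions are placeholders):

  import numpy as np

  frames = 500
  fbank = np.random.randn(frames, 40)    # placeholder Mel filterbank features
  gfbank = np.random.randn(frames, 40)   # placeholder Gammatone filterbank features
  feats = np.concatenate([fbank, gfbank], axis=1)   # 80-dim combined DNN input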

Word to Vector

  • Data preparation
      • Prepared 7 categories, 500+ articles in total
      • Prepared Sogou 9-class text, 9×2000 articles in total
      • Obtained the Fudan 11-class text data, for testing only
  • Improving word vectors with multiple senses
      • Almost impossible with the toolkit
      • Could pre-train vectors and then do clustering
  • Word-vector-based keyword extraction (see the sketch after this list)
      • Decided to use the Sogou data for extraction
      • Evaluate the keywords on the classification task
  • Word-vector-based classification
      • Decided to use the Sogou data for extraction
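A minimal sketch of the kind of word-vector-based keyword extraction planned here: rank the words of a document by cosine similarity to the document's mean vector. The embedding table is a random placeholder for real word2vec vectors.

  import numpy as np

  def keywords(doc_words, embed, topk=5):
      """Rank words by cosine similarity to the document centroid."""
      vecs = np.stack([embed[w] for w in doc_words if w in embed])
      centroid = vecs.mean(axis=0)
      def cos(v):
          return v @ centroid / (np.linalg.norm(v) * np.linalg.norm(centroid))
      scored = {w: cos(embed[w]) for w in set(doc_words) if w in embed}
      return sorted(scored, key=scored.get, reverse=True)[:topk]

  # placeholder embeddings; the real ones come from word2vec training
  embed = {w: np.random.randn(100) for w in ["stock", "market", "rises", "today"]}
  print(keywords(["stock", "market", "rises", "today", "stock"], embed, topk=2))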


LM development

NN LM

  • Character-based NNLM (6700 chars, 7-gram): training on 500M data done (see the sketch below)
  • Boundary-involved char NNLM: training done
      • Test ongoing
  • Investigate MS RNN LM training
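For reference, a minimal numpy sketch of the forward pass of a feed-forward character 7-gram NNLM (six previous characters predict the seventh). The vocabulary and layer sizes below are small placeholders, not the 6700-char configuration.

  import numpy as np

  V, d, h, ctx = 100, 50, 200, 6       # placeholder sizes (real system: 6700 chars)
  C = np.random.randn(V, d) * 0.01     # character embedding table
  W1 = np.random.randn(ctx * d, h) * 0.01
  W2 = np.random.randn(h, V) * 0.01

  def nnlm_probs(context_ids):
      """P(next char | previous 6 chars): embed, hidden tanh, softmax."""
      x = C[context_ids].reshape(-1)        # concatenate the six embeddings
      hid = np.tanh(x @ W1)                 # hidden layer
      logits = hid @ W2
      e = np.exp(logits - logits.max())     # stable softmax over the vocabulary
      return e / e.sum()

  p = nnlm_probs(np.array([3, 17, 42, 8, 99, 5]))   # a 6-character history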


Pronunciation scoring

  • G-score done on the 16k English model
  • The distribution of frames over phone/frame posterior scores seems highly discriminative
  • The distribution of distances between the test utterance and the reference utterance also seems highly discriminative (see the sketch after this list)
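The posterior-based scores above are in the spirit of the classic GOP measure; a minimal sketch, assuming frame-level phone posteriors and a frame-level phone alignment are available (all arrays below are random placeholders):

  import numpy as np

  def phone_score(posteriors, align, phone_id):
      """Average log posterior of a phone over its aligned frames
      (a GOP-style score; higher means closer to the reference model)."""
      frames = posteriors[align == phone_id]
      if frames.size == 0:
          return float("-inf")   # phone absent from the alignment
      return np.log(frames[:, phone_id] + 1e-10).mean()

  n_frames, n_phones = 200, 40
  post = np.random.dirichlet(np.ones(n_phones), size=n_frames)  # placeholder posteriors
  align = np.random.randint(0, n_phones, size=n_frames)         # placeholder alignment
  s = phone_score(post, align, phone_id=7)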

QA

FST-based matching

  • Code done; a simple test is done (see the matching sketch below)
  • Ready for a large-scale test
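A minimal sketch of the matching idea, using a trie as a simple stand-in for the real FST: question patterns are compiled once, then input word sequences are matched against them. The pattern and all names are illustrative.

  def build_trie(patterns):
      """Compile word-sequence patterns into a trie (an acceptor stand-in)."""
      root = {}
      for pat in patterns:
          node = root
          for w in pat.split():
              node = node.setdefault(w, {})
          node["<final>"] = True
      return root

  def matches(trie, words):
      """Accept a word sequence iff it traces a complete pattern."""
      node = trie
      for w in words:
          if w not in node:
              return False
          node = node[w]
      return node.get("<final>", False)

  trie = build_trie(["what is the weather in <city>"])   # illustrative pattern
  print(matches(trie, "what is the weather in <city>".split()))   # True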


Speech QA

  • Class LM QA (see the weighting sketch after this list)
      • We now find that a smaller weight on the class FST gives better performance
      • It is still very difficult to retrieve words that cannot be found by the original FST
      • Test negative weights
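A minimal sketch of the weighting under test: when a class word is expanded through the class FST, its cost (a negative log-probability) is scaled by a weight w before being added to the LM path cost, so a smaller or negative w makes class words cheaper to hypothesize. All numbers are illustrative.

  import math

  def combined_cost(lm_cost, class_cost, w):
      """Path cost for a class-word expansion; costs are -log probabilities."""
      return lm_cost + w * class_cost

  lm_cost = -math.log(1e-4)      # cost of entering the class from the LM
  class_cost = -math.log(0.01)   # cost of the word inside the class FST
  for w in (1.0, 0.5, -0.5):     # smaller / negative weights being tested
      print(w, combined_cost(lm_cost, class_cost, w))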