2014-03-21

Latest revision as of 07:20, 21 March 2014

Resource Building

  • The current text resources have been re-arranged and listed

Leftover questions

  • Asymmetric window: great improvement on the training set (WER from 34% to 24%), but the improvement is lost on the test set. Overfitting? (see the window sketch after this list)
  • Multi-GPU training: error encountered
  • Multilingual training
  • Investigating LOUDS FST
  • CLG embedded decoder plus online compiler
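As a reference for the asymmetric-window item above, a minimal numpy sketch of one common construction: the rising half of a long Hanning window joined to the falling half of a short one. The window lengths are illustrative placeholders, not the configuration used in our experiments.

  import numpy as np

  def asymmetric_window(rise_len=400, fall_len=80):
      """Asymmetric analysis window: slow Hanning rise, fast Hanning fall.
      Lengths are illustrative placeholders."""
      rise = np.hanning(2 * rise_len)[:rise_len]   # first (rising) half
      fall = np.hanning(2 * fall_len)[fall_len:]   # second (falling) half
      return np.concatenate([rise, fall])

  win = asymmetric_window()
  frame = np.random.randn(win.size)   # one speech frame (placeholder)
  windowed = frame * win              # applied before the FFT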

AM development

Sparse DNN

  • GA-based block sparsity (see the sketch below)
      • code ready, testing on pure matrix multiplication
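For reference, a minimal sketch of the block-sparsity idea under test: a binary block mask zeroes whole sub-blocks of a weight matrix, and the masked matrix is then used in a plain matrix multiplication. The GA search for a good mask is not shown; the random mask, sizes, and names here are all illustrative.

  import numpy as np

  def apply_block_mask(W, mask, bs):
      """Zero out whole bs x bs blocks of W according to a binary block mask."""
      W = W.copy()
      for i in range(mask.shape[0]):
          for j in range(mask.shape[1]):
              if mask[i, j] == 0:
                  W[i*bs:(i+1)*bs, j*bs:(j+1)*bs] = 0.0
      return W

  bs = 32                                           # block size (illustrative)
  W = np.random.randn(512, 512).astype(np.float32)  # a DNN weight matrix
  # A GA would evolve this mask; here it is simply random at 50% density.
  mask = (np.random.rand(512 // bs, 512 // bs) < 0.5).astype(np.int8)
  W_sparse = apply_block_mask(W, mask, bs)

  x = np.random.randn(512, 128).astype(np.float32)
  y = W_sparse @ x   # the "pure matrix multiplication" being timed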

GMM/DNN co-training

  • Co-training using Tencent data (see the sketch below)
      • GMM modeling is slightly better when using the DNN alignments
      • performance is worse when using the re-trained GMMs
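A toy sketch of the co-training loop described above: frame-level alignments taken from the DNN are used to re-estimate the GMM. Everything here is a stand-in; the real pipeline uses forced alignment and EM re-estimation (Kaldi-style steps), and only Gaussian means are updated in this toy.

  import numpy as np

  def dnn_align(dnn, feats):
      """Pick the most likely state per frame from (toy) DNN scores."""
      return np.argmax(feats @ dnn, axis=1)

  def gmm_update(feats, align, n_states):
      """Re-estimate one Gaussian mean per state from the DNN alignment."""
      return np.stack([feats[align == s].mean(axis=0) if np.any(align == s)
                       else np.zeros(feats.shape[1]) for s in range(n_states)])

  n_states, dim = 4, 10
  feats = np.random.randn(1000, dim)          # placeholder acoustic features
  dnn = np.random.randn(dim, n_states)        # placeholder "DNN" (a linear map)
  align = dnn_align(dnn, feats)               # step 1: DNN alignment
  means = gmm_update(feats, align, n_states)  # step 2: re-estimate the GMM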

Noise training

  • Train on the WSJ database, corrupting the data with various noise types (see the injection sketch below)
  • Single noise injection
      • White noise training: http://cslt.riit.tsinghua.edu.cn/mediawiki/images/7/7e/White-eps-converted-to.pdf
      • Cafe noise training: http://cslt.riit.tsinghua.edu.cn/mediawiki/images/e/ec/Cafe-eps-converted-to.pdf
      • Car noise training: http://cslt.riit.tsinghua.edu.cn/mediawiki/images/3/39/Car-eps-converted-to.pdf
  • Multi noise injection
      • White + cafe noise training: http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/fc/White_cafe_clean-eps-converted-to.pdf
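For reference, a minimal numpy sketch of noise injection at a target SNR, the operation used to corrupt the WSJ data (the signals and the SNR value are placeholders; for multi-noise training a noise type is drawn per utterance):

  import numpy as np

  def inject_noise(speech, noise, snr_db):
      """Mix noise into speech at a given signal-to-noise ratio (dB)."""
      reps = int(np.ceil(len(speech) / len(noise)))
      noise = np.tile(noise, reps)[:len(speech)]   # match the speech length
      p_speech = np.mean(speech ** 2)
      p_noise = np.mean(noise ** 2)
      # scale the noise so that 10*log10(p_speech / p_scaled) equals snr_db
      scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
      return speech + scale * noise

  speech = np.random.randn(16000)   # placeholder for one WSJ utterance
  white = np.random.randn(16000)    # white noise; cafe/car noise comes from recordings
  noisy = inject_noise(speech, white, snr_db=10)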


AMR compression re-training

  • 1700h AMR training ongoing

GFbank

  • gfbank is better than gfcc
  • gfbank is better than fbank
  • gfbank + fbank seems to outperform the others (see the sketch below)
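The gfbank + fbank setup is plain frame-level feature concatenation; a minimal sketch (all dimensions are placeholders):

  import numpy as np

  frames = 500
  fbank = np.random.randn(frames, 40)    # placeholder Mel filterbank features
  gfbank = np.random.randn(frames, 40)   # placeholder Gammatone filterbank features
  feats = np.concatenate([fbank, gfbank], axis=1)   # 80-dim combined DNN input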

Word to Vector

  • Data preparation
      • Prepared 7 categories, 500+ articles in total
      • Prepared Sogou 9-class text, 9×2000 articles in total
      • Obtained the Fudan 11-class text data, for testing only
  • Improving word vectors with multiple senses
      • Almost impossible with the toolkit
      • Could pre-train vectors and then do clustering
  • Word-vector-based keyword extraction (see the sketch after this list)
      • Decided to use the Sogou data for extraction
      • Evaluate the keywords on the classification task
  • Word-vector-based classification
      • Decided to use the Sogou data for extraction
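A minimal sketch of the kind of word-vector-based keyword extraction planned here: rank the words of a document by cosine similarity to the document's mean vector. The embedding table is a random placeholder for real word2vec vectors.

  import numpy as np

  def keywords(doc_words, embed, topk=5):
      """Rank words by cosine similarity to the document centroid."""
      vecs = np.stack([embed[w] for w in doc_words if w in embed])
      centroid = vecs.mean(axis=0)
      def cos(v):
          return v @ centroid / (np.linalg.norm(v) * np.linalg.norm(centroid))
      scored = {w: cos(embed[w]) for w in set(doc_words) if w in embed}
      return sorted(scored, key=scored.get, reverse=True)[:topk]

  # placeholder embeddings; the real ones come from word2vec training
  embed = {w: np.random.randn(100) for w in ["stock", "market", "rises", "today"]}
  print(keywords(["stock", "market", "rises", "today", "stock"], embed, topk=2))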


LM development

NN LM

  • Character-based NNLM (6700 chars, 7-gram): training on 500M data done (see the sketch below)
  • Boundary-involved char NNLM: training done
      • Test ongoing
  • Investigate MS RNN LM training
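For reference, a minimal numpy sketch of the forward pass of a feed-forward character 7-gram NNLM (six previous characters predict the seventh). The vocabulary and layer sizes below are small placeholders, not the 6700-char configuration.

  import numpy as np

  V, d, h, ctx = 100, 50, 200, 6       # placeholder sizes (real system: 6700 chars)
  C = np.random.randn(V, d) * 0.01     # character embedding table
  W1 = np.random.randn(ctx * d, h) * 0.01
  W2 = np.random.randn(h, V) * 0.01

  def nnlm_probs(context_ids):
      """P(next char | previous 6 chars): embed, hidden tanh, softmax."""
      x = C[context_ids].reshape(-1)        # concatenate the six embeddings
      hid = np.tanh(x @ W1)                 # hidden layer
      logits = hid @ W2
      e = np.exp(logits - logits.max())     # stable softmax over the vocabulary
      return e / e.sum()

  p = nnlm_probs(np.array([3, 17, 42, 8, 99, 5]))   # a 6-character history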


Pronunciation scoring

  • G-score done on the 16k English model
  • The distribution of frames over phone/frame posterior scores seems highly discriminative
  • The distribution of distances between the test utterance and the reference utterance also seems highly discriminative (see the sketch after this list)
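The posterior-based scores above are in the spirit of the classic GOP measure; a minimal sketch, assuming frame-level phone posteriors and a frame-level phone alignment are available (all arrays below are random placeholders):

  import numpy as np

  def phone_score(posteriors, align, phone_id):
      """Average log posterior of a phone over its aligned frames
      (a GOP-style score; higher means closer to the reference model)."""
      frames = posteriors[align == phone_id]
      if frames.size == 0:
          return float("-inf")   # phone absent from the alignment
      return np.log(frames[:, phone_id] + 1e-10).mean()

  n_frames, n_phones = 200, 40
  post = np.random.dirichlet(np.ones(n_phones), size=n_frames)  # placeholder posteriors
  align = np.random.randint(0, n_phones, size=n_frames)         # placeholder alignment
  s = phone_score(post, align, phone_id=7)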

QA

FST-based matching

  • Code done; a simple test is done (see the matching sketch below)
  • Ready for a large-scale test
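A minimal sketch of the matching idea, using a trie as a simple stand-in for the real FST: question patterns are compiled once, then input word sequences are matched against them. The pattern and all names are illustrative.

  def build_trie(patterns):
      """Compile word-sequence patterns into a trie (an acceptor stand-in)."""
      root = {}
      for pat in patterns:
          node = root
          for w in pat.split():
              node = node.setdefault(w, {})
          node["<final>"] = True
      return root

  def matches(trie, words):
      """Accept a word sequence iff it traces a complete pattern."""
      node = trie
      for w in words:
          if w not in node:
              return False
          node = node[w]
      return node.get("<final>", False)

  trie = build_trie(["what is the weather in <city>"])   # illustrative pattern
  print(matches(trie, "what is the weather in <city>".split()))   # True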


Speech QA

  • Class LM QA (see the weighting sketch after this list)
      • We now find that a smaller weight on the class FST gives better performance
      • It is still very difficult to retrieve words that cannot be found by the original FST
      • Test negative weights
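A minimal sketch of the weighting under test: when a class word is expanded through the class FST, its cost (a negative log-probability) is scaled by a weight w before being added to the LM path cost, so a smaller or negative w makes class words cheaper to hypothesize. All numbers are illustrative.

  import math

  def combined_cost(lm_cost, class_cost, w):
      """Path cost for a class-word expansion; costs are -log probabilities."""
      return lm_cost + w * class_cost

  lm_cost = -math.log(1e-4)      # cost of entering the class from the LM
  class_cost = -math.log(0.01)   # cost of the word inside the class FST
  for w in (1.0, 0.5, -0.5):     # smaller / negative weights being tested
      print(w, combined_cost(lm_cost, class_cost, w))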