2014-03-21

 
==Resource Building==

* Current text resource has been re-arranged and listed

== Leftover questions ==

* Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting?
* Multi-GPU training: error encountered
* Multilanguage training
* Investigating LOUDS FST.
* CLG embedded decoder plus online compiler.
  
 
== AM development ==
 
=== Sparse DNN ===

* Optimal Brain Damage (OBD); a pruning sketch follows this list.
# GA-based block sparsity
# Code ready, testing on pure matrix multiplication
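
As a reference for the OBD item, a minimal sketch of saliency-based pruning on one layer's weight matrix, assuming a diagonal Hessian estimate is available (the sparsity level and toy sizes are illustrative only):

<pre>
import numpy as np

def obd_prune(W, H_diag, sparsity=0.5):
    """Zero the weights with the smallest OBD saliency s = 0.5 * h_ii * w_i^2.

    W        -- weight matrix of one DNN layer
    H_diag   -- diagonal Hessian estimate, same shape as W
    sparsity -- fraction of weights to remove (illustrative value)
    """
    saliency = 0.5 * H_diag * W ** 2
    k = int(sparsity * W.size)
    drop = np.unravel_index(np.argsort(saliency, axis=None)[:k], W.shape)
    W_pruned = W.copy()
    W_pruned[drop] = 0.0
    return W_pruned

# toy check with a random 4x4 layer and a stand-in Hessian estimate
W = np.random.randn(4, 4)
H = np.abs(np.random.randn(4, 4))
print(obd_prune(W, H))
</pre>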
 
 
=== Efficient DNN training ===
 
 
# Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting? A splicing sketch follows below.
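
A minimal sketch of what an asymmetric context window means at the feature level; the 10-left/5-right window and 40-dim features are assumed values for illustration, not the configuration used in the experiment above:

<pre>
import numpy as np

def splice(feats, left=10, right=5):
    """Concatenate an asymmetric context window around each frame.

    feats -- (num_frames, dim) feature matrix (e.g. fbank)
    Returns (num_frames, (left + 1 + right) * dim).
    """
    T, D = feats.shape
    # pad by repeating the edge frames
    padded = np.vstack([np.repeat(feats[:1], left, axis=0),
                        feats,
                        np.repeat(feats[-1:], right, axis=0)])
    return np.hstack([padded[t:t + T] for t in range(left + 1 + right)])

# 100 frames of 40-dim features -> 100 x 640 spliced DNN input
x = np.random.randn(100, 40)
print(splice(x).shape)
</pre>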
 
 
===Multi GPU training===
 
* Error encountered
 
  
 
===GMM - DNN co-training===

* Initial DNN test done
* Co-training using Tencent data (a sketch of the loop follows the results table below)
:* tri4b -> DNN (org)
:* DNN alignment -> tri4b
:* tri4b alignment -> DNN (re-train)
 
<pre>
model/testcase            |  test_dev93 (cv)  |  test_eval92
--------------------------------------------------------------
8400-80000 (org)          |      7.41         |      4.13
--------------------------------------------------------------
re-train (keep state #)   |      7.20         |      4.24
--------------------------------------------------------------
re-train (free state #)   |      7.29         |      4.31
</pre>
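
The bullets above describe an alternating loop between the tri4b GMM and the DNN. A schematic sketch of that control flow; the align/train helpers are hypothetical placeholders for the actual Kaldi recipe steps:

<pre>
def cotrain(data, lang, tri4b, align, train_gmm, train_dnn, num_rounds=1):
    """Alternate GMM/DNN re-training from each other's alignments.

    align, train_gmm and train_dnn are hypothetical wrappers around the
    actual recipe steps (e.g. steps/align_fmllr.sh, steps/train_sat.sh,
    steps/nnet/train.sh); only the control flow is sketched here.
    """
    # tri4b -> DNN (org): train the first DNN on the GMM alignments
    ali = align(tri4b, data, lang)
    dnn = train_dnn(data, lang, ali)

    for _ in range(num_rounds):
        # DNN alignment -> tri4b: re-train the GMM on the DNN alignments
        ali = align(dnn, data, lang)
        tri4b = train_gmm(data, lang, ali)

        # tri4b alignment -> DNN (re-train)
        ali = align(tri4b, data, lang)
        dnn = train_dnn(data, lang, ali)

    return tri4b, dnn
</pre>
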
=== Multilanguage training ===

# Pure Chinese training reached 4.9%
# Chinese + English reduced to 7.9%
# The English phone set should discriminate word-beginning and word-ending phones
# Should set up a multilingual network structure that shares the low layers but separates the languages at the high layers (a sketch follows below)
 
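A minimal PyTorch sketch of the structure proposed in item 4: shared low layers with one language-specific head per language. Layer sizes, the sigmoid nonlinearity and the output (senone) counts are assumptions for illustration:

<pre>
import torch
import torch.nn as nn

class MultilingualDNN(nn.Module):
    """Shared low layers with one language-specific head per language."""

    def __init__(self, feat_dim=440, hidden=1024, num_shared=4, out_dims=None):
        super().__init__()
        # output (senone) counts per language are illustrative only
        out_dims = out_dims or {"zh": 8400, "en": 4000}
        shared = []
        dim = feat_dim
        for _ in range(num_shared):
            shared += [nn.Linear(dim, hidden), nn.Sigmoid()]
            dim = hidden
        self.shared = nn.Sequential(*shared)
        # high layers and softmax output are separate per language
        self.heads = nn.ModuleDict({
            lang: nn.Sequential(nn.Linear(hidden, hidden), nn.Sigmoid(),
                                nn.Linear(hidden, n_out))
            for lang, n_out in out_dims.items()
        })

    def forward(self, x, lang):
        return self.heads[lang](self.shared(x))

net = MultilingualDNN()
print(net(torch.randn(8, 440), lang="zh").shape)   # torch.Size([8, 8400])
</pre>
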
===Noise training===

* Train with the WSJ database by corrupting the data with various noise types (a noise-injection sketch follows this list)
:* Almost all training conditions are completed
:* Single noise injection
:* Multi noise injection
:* Interesting results with multi-conditional training (white + cafe) tested on park/station noise

[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/7/7e/White-eps-converted-to.pdf White noise training]
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/e/ec/Cafe-eps-converted-to.pdf Cafe noise training]
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/3/39/Car-eps-converted-to.pdf car noise training]
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/fc/White_cafe_clean-eps-converted-to.pdf white+cafe noise training]
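
A minimal sketch of the corruption step: mix a noise waveform into a clean WSJ utterance at a target SNR. The file names, the SNR value and the use of the soundfile package are placeholders/assumptions:

<pre>
import numpy as np
import soundfile as sf   # assumption: soundfile is available for wav I/O

def add_noise(clean, noise, snr_db):
    """Mix noise into a clean signal at the requested SNR (in dB)."""
    if len(noise) < len(clean):                  # loop the noise if it is too short
        noise = np.tile(noise, len(clean) // len(noise) + 1)
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

clean, sr = sf.read("wsj_utt.wav")       # placeholder paths
noise, _ = sf.read("white.wav")
sf.write("wsj_utt_white_10dB.wav", add_noise(clean, noise, snr_db=10), sr)
</pre>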
  
===AMR compression re-training===

* WeChat uses the AMR compression method, which requires adapting our AM
* Test AMR & non-AMR models
  
<pre>
model              wav    amr
xent baseline      4.47
wav_mpe baseline   4.20   36.77

amr_mpe_lr_1e-5    6.27    8.95
amr_mpe_lr_1e-4    7.58    8.68

amr_xEnt_lr_1e-5   6.89    7.99
amr_xEnt_lr_1e-4   6.61    7.28
amr_xEnt_lr_0.08   5.72    6.20
</pre>
 
  
* Prepare to do adaptation on the 1700h data (an AMR round-trip sketch follows below)
* 1700h AMR training ongoing
* Prepare to do the mixing xEnt test
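
One way to simulate the WeChat channel for the adaptation data is an AMR encode/decode round trip; a sketch using ffmpeg, assuming a build with the libopencore_amrnb codec (paths and bitrate are placeholders):

<pre>
import subprocess

def amr_roundtrip(wav_in, wav_out, bitrate="12.2k"):
    """Encode a wav to AMR-NB (8 kHz) and decode it back to wav.

    Requires an ffmpeg build with libopencore_amrnb (assumption).
    """
    amr = wav_out + ".amr"
    subprocess.run(["ffmpeg", "-y", "-i", wav_in,
                    "-ar", "8000", "-ac", "1",
                    "-c:a", "libopencore_amrnb", "-b:a", bitrate, amr],
                   check=True)
    subprocess.run(["ffmpeg", "-y", "-i", amr, "-ar", "16000", wav_out],
                   check=True)

amr_roundtrip("clean_utt.wav", "amr_utt.wav")   # placeholder file names
</pre>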
  
 
===GFbank===
 
* gfbank is better than gfcc
* gfbank is better than fbank
* gfbank + fbank seems to outperform the others
* Finished the first round of gfbank training & test
* The same GMM model (MFCC features) was used to get the alignment
* Training fbank & gfbank based on the MFCC alignment
* Clean training and noisy test
 
 
<pre>
feature     clean   5dB     10dB    15dB    20dB    25dB
gfbank      4.22    73.03   39.20   16.41   8.36    5.60
gfbank_80   4.36    74.41   42.94   18.13   8.59    5.85
fbank_zmy   3.97    74.78   44.57   18.80   8.54    5.30
</pre>
 
 
* gfbank + fbank 80-dim training/test
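
For reference, a minimal gfbank-style recipe: pass the signal through ERB-spaced gammatone filters and take per-frame log energies. This is a generic sketch for illustration; the channel count, frame sizes and filter definition are assumptions, not the exact front end used above:

<pre>
import numpy as np

def erb_centre_freqs(fmin, fmax, n):
    """n centre frequencies equally spaced on the ERB-rate scale."""
    erb = lambda f: 21.4 * np.log10(1 + 0.00437 * f)
    inv = lambda e: (10 ** (e / 21.4) - 1) / 0.00437
    return inv(np.linspace(erb(fmin), erb(fmax), n))

def gfbank(sig, sr=16000, n_chan=40, frame=0.025, hop=0.010):
    """Log gammatone filter-bank energies, one row per frame."""
    t = np.arange(0, 0.064, 1.0 / sr)                    # 64 ms impulse responses
    flen, fhop = int(frame * sr), int(hop * sr)
    n_frames = 1 + (len(sig) - flen) // fhop
    feats = np.empty((n_frames, n_chan))
    for c, fc in enumerate(erb_centre_freqs(50.0, 0.45 * sr, n_chan)):
        b = 1.019 * 24.7 * (4.37 * fc / 1000 + 1)        # ERB bandwidth of the channel
        ir = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        band = np.convolve(sig, ir)[:len(sig)]           # 4th-order gammatone output
        for i in range(n_frames):
            e = np.sum(band[i * fhop:i * fhop + flen] ** 2)
            feats[i, c] = np.log(e + 1e-10)
    return feats

print(gfbank(np.random.randn(16000)).shape)              # (98, 40) for 1 s of audio
</pre>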
 
 
 
===Engine optimization===
 
 
* Investigating LOUDS FST.
 
  
  
==Word to Vector==

* Data preparation
:* Prepared 7 categories with 500+ articles in total
:* Prepared the Sogou 9-class text, 9*2000 articles in total
:* Obtained the Fudan 11-class text data, for testing only
* Improve word vectors with multiple senses
:* Almost impossible with the toolkit
:* Could pre-train the vectors and then do clustering
* Word-vector-based keyword extraction (a sketch follows this list)
:* Decided to use the Sogou data to do the extraction
:* Evaluate the keywords in the classification task
* Word-vector-based classification
:* Decided to use the Sogou data to do the extraction
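
One simple way to realize the word-vector-based keyword extraction above is to rank a document's words by cosine similarity to the document's mean vector; a sketch under that assumption (the vector table and segmented input are placeholders):

<pre>
import numpy as np

def extract_keywords(tokens, word_vec, topk=10):
    """Rank document words by similarity to the document centroid.

    tokens   -- list of (already segmented) words in one article
    word_vec -- dict: word -> numpy vector (e.g. from word2vec training)
    """
    vecs = {w: word_vec[w] for w in set(tokens) if w in word_vec}
    if not vecs:
        return []
    centroid = np.mean(list(vecs.values()), axis=0)
    cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    scored = sorted(vecs, key=lambda w: cos(vecs[w], centroid), reverse=True)
    return scored[:topk]

# toy usage with random 100-dim vectors
vocab = ["歌手", "歌曲", "新闻", "体育", "经济"]
wv = {w: np.random.randn(100) for w in vocab}
print(extract_keywords(["歌手", "歌曲", "歌手", "经济"], wv, topk=2))
</pre>
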
==LM development==

===NN LM===

* Character-based NNLM (6700 chars, 7-gram), 500M data training done (a sketch of the model follows below)
* Boundary-involved char NNLM training done
* Test ongoing
* Investigate MS RNN LM training
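
A minimal PyTorch sketch of a feed-forward 7-gram character NNLM of the kind described above (6 history characters predicting the next one over a 6700-character vocabulary; embedding and hidden sizes are assumptions):

<pre>
import torch
import torch.nn as nn

class CharNNLM(nn.Module):
    """7-gram feed-forward NNLM: 6 context chars -> next-char distribution."""

    def __init__(self, vocab=6700, context=6, emb=100, hidden=512):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.hidden = nn.Sequential(nn.Linear(context * emb, hidden), nn.Tanh())
        self.out = nn.Linear(hidden, vocab)

    def forward(self, ctx):                       # ctx: (batch, 6) char ids
        e = self.emb(ctx).flatten(1)              # (batch, 6 * emb)
        return self.out(self.hidden(e))           # (batch, vocab) logits

model = CharNNLM()
ctx = torch.randint(0, 6700, (32, 6))             # a toy batch of 6-char histories
target = torch.randint(0, 6700, (32,))
loss = nn.functional.cross_entropy(model(ctx), target)
loss.backward()
</pre>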
 
===3T Sogou LM===

* 3T + Tencent LM combination:
:* Combine the 3T vocabulary (110k) and the Tencent 80k vocabulary
:* Re-segmentation
:* Compute PPL with the 3T and Tencent LMs
:* Compute the best mixing weights (a grid-search sketch follows this list)
:* The estimated mixing weight is wrong ...
:* If we mix the two with equal weights (0.5/0.5), performance is better than either individual LM
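
The mixing-weight estimate can be sanity-checked by a simple grid search: interpolate the per-word probabilities from the two LMs for each candidate weight and keep the one with the lowest held-out perplexity. A sketch (the probability arrays are placeholders for per-word probabilities of a held-out set under each LM):

<pre>
import numpy as np

def best_mix_weight(p_3t, p_tencent, grid=np.linspace(0.0, 1.0, 21)):
    """Grid-search the interpolation weight that minimizes held-out PPL.

    p_3t, p_tencent -- per-word probabilities of the same held-out text
                       under the 3T LM and the Tencent LM.
    """
    best = None
    for lam in grid:
        mix = lam * p_3t + (1 - lam) * p_tencent
        ppl = np.exp(-np.mean(np.log(mix + 1e-20)))
        if best is None or ppl < best[1]:
            best = (lam, ppl)
    return best

# toy check with random "probabilities"
p1 = np.random.uniform(1e-4, 1e-2, 1000)
p2 = np.random.uniform(1e-4, 1e-2, 1000)
lam, ppl = best_mix_weight(p1, p2)
print("best weight %.2f, PPL %.1f" % (lam, ppl))
</pre>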
 
 
 
*3T + QA model combination
 
 
==QA Matching==
 
 
* FST-based matching
:* Investigating why OpenFst union does not lead to a determinizable graph
:* Test the pattern label
* TF/IDF weighting (a matching sketch follows this list)
:* Code is done; TF/IDF weighting can be used right now.
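
A minimal sketch of TF/IDF-weighted question matching: weight each word by tf*idf over the stored question set and match a query by cosine similarity (the toy question set is a placeholder):

<pre>
import math
from collections import Counter

def build_idf(questions):
    """questions: list of token lists from the QA set."""
    df = Counter(w for q in questions for w in set(q))
    n = len(questions)
    return {w: math.log(n / df[w]) for w in df}

def vectorize(tokens, idf):
    """Sparse tf*idf vector as a dict word -> weight."""
    return {w: tf * idf.get(w, 0.0) for w, tf in Counter(tokens).items()}

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb + 1e-12)

# toy usage: match a user query against two stored questions
qs = [["周杰伦", "的", "歌"], ["今天", "天气", "怎么样"]]
idf = build_idf(qs)
vecs = [vectorize(q, idf) for q in qs]
query = vectorize(["放", "周杰伦", "的", "歌"], idf)
print(max(range(len(qs)), key=lambda i: cosine(query, vecs[i])))   # -> 0
</pre>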
 
  
 
==Embedded development==

* English scoring looks fine
* CLG embedded decoder is almost done; the online compiler is in progress.

==QA==

===FST-based matching===

:* Code done; simple test done
:* Ready for large-scale test
  
  
 
==Speech QA==

* N-best with entity LM was analyzed
:* WER vs QA accuracy analysis is done
:* The figure shows that WER and QA accuracy are positively related
:* Adding song names and singer names improves performance in most cases
:* There are indeed some exceptions in the figure: (a) higher WER does not necessarily reduce QA accuracy; (b) adding entity names does not always improve QA
:* The results are at [[媒体文件:Music_QA_wer.pdf]]
 
 
 
 
*Class LM QA
:* Use the QA LM as the baseline
:* Tag singer names and song names
:* Build the tag LM (a corpus-tagging sketch follows the results below)
:* Use graph integration to resolve the tags
:* Adjust the in-tag weight
:* A smaller weight produces more entity recognition
:* Check whether the recognized songs/singers are correct or wrong
:* Found that a smaller weight on the class FST gives better performance
:* It is now very difficult to retrieve words that cannot be found by the original FST
:* Test negative weights

<pre>
1. non-merge
   Baseline:
              qa-singer-song
   songs           41
   singers         23

2. HCLG-merge
   Weight means the multiplier on the sub-graph entry.
   (1) LM: 1e-5
   weight    0.00000001  0.0001  0.001  0.01    1   10
   songs         20        20      21    19     9    4
   singers       13        13      13    13     2    2
</pre>
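
For the tag LM above, entity names in the training text are replaced by class tags before LM training, and at decoding time the tags are expanded by the entity sub-graphs, whose entry weight is the multiplier reported in the table. A minimal sketch of the tagging step (the entity lists and tag symbols are placeholders):

<pre>
# placeholder entity lists; in practice these come from the music database
singers = {"周杰伦", "刘德华"}
songs = {"青花瓷", "忘情水"}

def tag_tokens(tokens):
    """Replace entity words with class tags for tag-LM training."""
    out = []
    for w in tokens:
        if w in singers:
            out.append("<SINGER>")
        elif w in songs:
            out.append("<SONG>")
        else:
            out.append(w)
    return out

# "播放 周杰伦 的 青花瓷" -> "播放 <SINGER> 的 <SONG>"
print(" ".join(tag_tokens("播放 周杰伦 的 青花瓷".split())))
</pre>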