2014-03-21

 
==Resource Building==

* Current text resource has been re-arranged and listed

== Leftover questions ==

* Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting?
* Multi-GPU training: error encountered
* Multilanguage training
* Investigating LOUDS FST.
* CLG embedded decoder plus online compiler.
  
 
== AM development ==
 
=== Sparse DNN ===

* Optimal Brain Damage (OBD); a pruning sketch follows this list.
# GA-based block sparsity
# Code ready, testing on pure matrix multiplication
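
As a reference for the OBD item, a minimal sketch of saliency-based pruning on one layer's weight matrix, assuming a diagonal Hessian estimate is available (the sparsity level and toy sizes are illustrative only):

<pre>
import numpy as np

def obd_prune(W, H_diag, sparsity=0.5):
    """Zero the weights with the smallest OBD saliency s = 0.5 * h_ii * w_i^2.

    W        -- weight matrix of one DNN layer
    H_diag   -- diagonal Hessian estimate, same shape as W
    sparsity -- fraction of weights to remove (illustrative value)
    """
    saliency = 0.5 * H_diag * W ** 2
    k = int(sparsity * W.size)
    drop = np.unravel_index(np.argsort(saliency, axis=None)[:k], W.shape)
    W_pruned = W.copy()
    W_pruned[drop] = 0.0
    return W_pruned

# toy check with a random 4x4 layer and a stand-in Hessian estimate
W = np.random.randn(4, 4)
H = np.abs(np.random.randn(4, 4))
print(obd_prune(W, H))
</pre>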
 
 
=== Efficient DNN training ===
 
 
# Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting? A splicing sketch follows below.
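
A minimal sketch of what an asymmetric context window means at the feature level; the 10-left/5-right window and 40-dim features are assumed values for illustration, not the configuration used in the experiment above:

<pre>
import numpy as np

def splice(feats, left=10, right=5):
    """Concatenate an asymmetric context window around each frame.

    feats -- (num_frames, dim) feature matrix (e.g. fbank)
    Returns (num_frames, (left + 1 + right) * dim).
    """
    T, D = feats.shape
    # pad by repeating the edge frames
    padded = np.vstack([np.repeat(feats[:1], left, axis=0),
                        feats,
                        np.repeat(feats[-1:], right, axis=0)])
    return np.hstack([padded[t:t + T] for t in range(left + 1 + right)])

# 100 frames of 40-dim features -> 100 x 640 spliced DNN input
x = np.random.randn(100, 40)
print(splice(x).shape)
</pre>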
 
 
===Multi GPU training===
 
* Error encountered
 
  
 
===GMM - DNN co-training===

* Initial DNN test done
* Co-training using Tencent data (a sketch of the loop follows the results table below)
:* tri4b -> DNN (org)
:* DNN alignment -> tri4b
:* tri4b alignment -> DNN (re-train)
 
<pre>
model/testcase            |  test_dev93 (cv)  |  test_eval92
--------------------------------------------------------------
8400-80000 (org)          |      7.41         |      4.13
--------------------------------------------------------------
re-train (keep state #)   |      7.20         |      4.24
--------------------------------------------------------------
re-train (free state #)   |      7.29         |      4.31
</pre>
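
The bullets above describe an alternating loop between the tri4b GMM and the DNN. A schematic sketch of that control flow; the align/train helpers are hypothetical placeholders for the actual Kaldi recipe steps:

<pre>
def cotrain(data, lang, tri4b, align, train_gmm, train_dnn, num_rounds=1):
    """Alternate GMM/DNN re-training from each other's alignments.

    align, train_gmm and train_dnn are hypothetical wrappers around the
    actual recipe steps (e.g. steps/align_fmllr.sh, steps/train_sat.sh,
    steps/nnet/train.sh); only the control flow is sketched here.
    """
    # tri4b -> DNN (org): train the first DNN on the GMM alignments
    ali = align(tri4b, data, lang)
    dnn = train_dnn(data, lang, ali)

    for _ in range(num_rounds):
        # DNN alignment -> tri4b: re-train the GMM on the DNN alignments
        ali = align(dnn, data, lang)
        tri4b = train_gmm(data, lang, ali)

        # tri4b alignment -> DNN (re-train)
        ali = align(tri4b, data, lang)
        dnn = train_dnn(data, lang, ali)

    return tri4b, dnn
</pre>
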
=== Multilanguage training ===

# Pure Chinese training reached 4.9%
# Chinese + English reduced to 7.9%
# The English phone set should discriminate word-beginning and word-ending phones
# Should set up a multilingual network structure that shares the low layers but separates the languages at the high layers (a sketch follows below)
 
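A minimal PyTorch sketch of the structure proposed in item 4: shared low layers with one language-specific head per language. Layer sizes, the sigmoid nonlinearity and the output (senone) counts are assumptions for illustration:

<pre>
import torch
import torch.nn as nn

class MultilingualDNN(nn.Module):
    """Shared low layers with one language-specific head per language."""

    def __init__(self, feat_dim=440, hidden=1024, num_shared=4, out_dims=None):
        super().__init__()
        # output (senone) counts per language are illustrative only
        out_dims = out_dims or {"zh": 8400, "en": 4000}
        shared = []
        dim = feat_dim
        for _ in range(num_shared):
            shared += [nn.Linear(dim, hidden), nn.Sigmoid()]
            dim = hidden
        self.shared = nn.Sequential(*shared)
        # high layers and softmax output are separate per language
        self.heads = nn.ModuleDict({
            lang: nn.Sequential(nn.Linear(hidden, hidden), nn.Sigmoid(),
                                nn.Linear(hidden, n_out))
            for lang, n_out in out_dims.items()
        })

    def forward(self, x, lang):
        return self.heads[lang](self.shared(x))

net = MultilingualDNN()
print(net(torch.randn(8, 440), lang="zh").shape)   # torch.Size([8, 8400])
</pre>
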
===Noise training===

* Train with the WSJ database by corrupting the data with various noise types (a noise-injection sketch follows this list)
:* Almost all training conditions are completed
:* Single noise injection
:* Multi noise injection
:* Interesting results with multi-conditional training (white + cafe) tested on park/station noise

[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/7/7e/White-eps-converted-to.pdf White noise training]
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/e/ec/Cafe-eps-converted-to.pdf Cafe noise training]
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/3/39/Car-eps-converted-to.pdf car noise training]
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/fc/White_cafe_clean-eps-converted-to.pdf white+cafe noise training]
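
A minimal sketch of the corruption step: mix a noise waveform into a clean WSJ utterance at a target SNR. The file names, the SNR value and the use of the soundfile package are placeholders/assumptions:

<pre>
import numpy as np
import soundfile as sf   # assumption: soundfile is available for wav I/O

def add_noise(clean, noise, snr_db):
    """Mix noise into a clean signal at the requested SNR (in dB)."""
    if len(noise) < len(clean):                  # loop the noise if it is too short
        noise = np.tile(noise, len(clean) // len(noise) + 1)
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

clean, sr = sf.read("wsj_utt.wav")       # placeholder paths
noise, _ = sf.read("white.wav")
sf.write("wsj_utt_white_10dB.wav", add_noise(clean, noise, snr_db=10), sr)
</pre>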
  
===AMR compression re-training===

* WeChat uses the AMR compression method, which requires adapting our AM
* Test AMR & non-AMR models
  
<pre>
model              wav    amr
xent baseline      4.47
wav_mpe baseline   4.20   36.77

amr_mpe_lr_1e-5    6.27    8.95
amr_mpe_lr_1e-4    7.58    8.68

amr_xEnt_lr_1e-5   6.89    7.99
amr_xEnt_lr_1e-4   6.61    7.28
amr_xEnt_lr_0.08   5.72    6.20
</pre>
 
  
* Prepare to do adaptation on the 1700h data (an AMR round-trip sketch follows below)
* 1700h AMR training ongoing
* Prepare to do the mixing xEnt test
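
One way to simulate the WeChat channel for the adaptation data is an AMR encode/decode round trip; a sketch using ffmpeg, assuming a build with the libopencore_amrnb codec (paths and bitrate are placeholders):

<pre>
import subprocess

def amr_roundtrip(wav_in, wav_out, bitrate="12.2k"):
    """Encode a wav to AMR-NB (8 kHz) and decode it back to wav.

    Requires an ffmpeg build with libopencore_amrnb (assumption).
    """
    amr = wav_out + ".amr"
    subprocess.run(["ffmpeg", "-y", "-i", wav_in,
                    "-ar", "8000", "-ac", "1",
                    "-c:a", "libopencore_amrnb", "-b:a", bitrate, amr],
                   check=True)
    subprocess.run(["ffmpeg", "-y", "-i", amr, "-ar", "16000", wav_out],
                   check=True)

amr_roundtrip("clean_utt.wav", "amr_utt.wav")   # placeholder file names
</pre>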
  
 
===GFbank===
 
* gfbank is better than gfcc
* gfbank is better than fbank
* gfbank + fbank seems to outperform the others
* Finished the first round of gfbank training & test
* The same GMM model (MFCC features) was used to get the alignment
* Training fbank & gfbank based on the MFCC alignment
* Clean training and noisy test
 
 
<pre>
feature     clean   5dB     10dB    15dB    20dB    25dB
gfbank      4.22    73.03   39.20   16.41   8.36    5.60
gfbank_80   4.36    74.41   42.94   18.13   8.59    5.85
fbank_zmy   3.97    74.78   44.57   18.80   8.54    5.30
</pre>
 
 
* gfbank + fbank 80-dim training/test
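
For reference, a minimal gfbank-style recipe: pass the signal through ERB-spaced gammatone filters and take per-frame log energies. This is a generic sketch for illustration; the channel count, frame sizes and filter definition are assumptions, not the exact front end used above:

<pre>
import numpy as np

def erb_centre_freqs(fmin, fmax, n):
    """n centre frequencies equally spaced on the ERB-rate scale."""
    erb = lambda f: 21.4 * np.log10(1 + 0.00437 * f)
    inv = lambda e: (10 ** (e / 21.4) - 1) / 0.00437
    return inv(np.linspace(erb(fmin), erb(fmax), n))

def gfbank(sig, sr=16000, n_chan=40, frame=0.025, hop=0.010):
    """Log gammatone filter-bank energies, one row per frame."""
    t = np.arange(0, 0.064, 1.0 / sr)                    # 64 ms impulse responses
    flen, fhop = int(frame * sr), int(hop * sr)
    n_frames = 1 + (len(sig) - flen) // fhop
    feats = np.empty((n_frames, n_chan))
    for c, fc in enumerate(erb_centre_freqs(50.0, 0.45 * sr, n_chan)):
        b = 1.019 * 24.7 * (4.37 * fc / 1000 + 1)        # ERB bandwidth of the channel
        ir = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        band = np.convolve(sig, ir)[:len(sig)]           # 4th-order gammatone output
        for i in range(n_frames):
            e = np.sum(band[i * fhop:i * fhop + flen] ** 2)
            feats[i, c] = np.log(e + 1e-10)
    return feats

print(gfbank(np.random.randn(16000)).shape)              # (98, 40) for 1 s of audio
</pre>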
 
 
 
===Engine optimization===
 
 
* Investigating LOUDS FST.
 
  
  
==Word to Vector==

* Data preparation
:* Prepared 7 categories with 500+ articles in total
:* Prepared the Sogou 9-class text, 9*2000 articles in total
:* Obtained the Fudan 11-class text data, for testing only
* Improve word vectors with multiple senses
:* Almost impossible with the toolkit
:* Could pre-train the vectors and then do clustering
* Word-vector-based keyword extraction (a sketch follows this list)
:* Decided to use the Sogou data to do the extraction
:* Evaluate the keywords in the classification task
* Word-vector-based classification
:* Decided to use the Sogou data to do the extraction
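
One simple way to realize the word-vector-based keyword extraction above is to rank a document's words by cosine similarity to the document's mean vector; a sketch under that assumption (the vector table and segmented input are placeholders):

<pre>
import numpy as np

def extract_keywords(tokens, word_vec, topk=10):
    """Rank document words by similarity to the document centroid.

    tokens   -- list of (already segmented) words in one article
    word_vec -- dict: word -> numpy vector (e.g. from word2vec training)
    """
    vecs = {w: word_vec[w] for w in set(tokens) if w in word_vec}
    if not vecs:
        return []
    centroid = np.mean(list(vecs.values()), axis=0)
    cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    scored = sorted(vecs, key=lambda w: cos(vecs[w], centroid), reverse=True)
    return scored[:topk]

# toy usage with random 100-dim vectors
vocab = ["歌手", "歌曲", "新闻", "体育", "经济"]
wv = {w: np.random.randn(100) for w in vocab}
print(extract_keywords(["歌手", "歌曲", "歌手", "经济"], wv, topk=2))
</pre>
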
==LM development==

===NN LM===

* Character-based NNLM (6700 chars, 7-gram), 500M data training done (a sketch of the model follows below)
* Boundary-involved char NNLM training done
* Test ongoing
* Investigate MS RNN LM training
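
A minimal PyTorch sketch of a feed-forward 7-gram character NNLM of the kind described above (6 history characters predicting the next one over a 6700-character vocabulary; embedding and hidden sizes are assumptions):

<pre>
import torch
import torch.nn as nn

class CharNNLM(nn.Module):
    """7-gram feed-forward NNLM: 6 context chars -> next-char distribution."""

    def __init__(self, vocab=6700, context=6, emb=100, hidden=512):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.hidden = nn.Sequential(nn.Linear(context * emb, hidden), nn.Tanh())
        self.out = nn.Linear(hidden, vocab)

    def forward(self, ctx):                       # ctx: (batch, 6) char ids
        e = self.emb(ctx).flatten(1)              # (batch, 6 * emb)
        return self.out(self.hidden(e))           # (batch, vocab) logits

model = CharNNLM()
ctx = torch.randint(0, 6700, (32, 6))             # a toy batch of 6-char histories
target = torch.randint(0, 6700, (32,))
loss = nn.functional.cross_entropy(model(ctx), target)
loss.backward()
</pre>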
 
===3T Sogou LM===

* 3T + Tencent LM combination:
:* Combine the 3T vocabulary (110k) and the Tencent 80k vocabulary
:* Re-segmentation
:* Compute PPL with the 3T and Tencent LMs
:* Compute the best mixing weights (a grid-search sketch follows this list)
:* The estimated mixing weight is wrong ...
:* If we mix the two with equal weights (0.5/0.5), performance is better than either individual LM
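
The mixing-weight estimate can be sanity-checked by a simple grid search: interpolate the per-word probabilities from the two LMs for each candidate weight and keep the one with the lowest held-out perplexity. A sketch (the probability arrays are placeholders for per-word probabilities of a held-out set under each LM):

<pre>
import numpy as np

def best_mix_weight(p_3t, p_tencent, grid=np.linspace(0.0, 1.0, 21)):
    """Grid-search the interpolation weight that minimizes held-out PPL.

    p_3t, p_tencent -- per-word probabilities of the same held-out text
                       under the 3T LM and the Tencent LM.
    """
    best = None
    for lam in grid:
        mix = lam * p_3t + (1 - lam) * p_tencent
        ppl = np.exp(-np.mean(np.log(mix + 1e-20)))
        if best is None or ppl < best[1]:
            best = (lam, ppl)
    return best

# toy check with random "probabilities"
p1 = np.random.uniform(1e-4, 1e-2, 1000)
p2 = np.random.uniform(1e-4, 1e-2, 1000)
lam, ppl = best_mix_weight(p1, p2)
print("best weight %.2f, PPL %.1f" % (lam, ppl))
</pre>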
 
 
 
*3T + QA model combination
 
 
==QA Matching==
 
 
* FST-based matching
:* Investigating why OpenFst union does not lead to a determinizable graph
:* Test the pattern label
* TF/IDF weighting (a matching sketch follows this list)
:* Code is done; TF/IDF weighting can be used right now.
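
A minimal sketch of TF/IDF-weighted question matching: weight each word by tf*idf over the stored question set and match a query by cosine similarity (the toy question set is a placeholder):

<pre>
import math
from collections import Counter

def build_idf(questions):
    """questions: list of token lists from the QA set."""
    df = Counter(w for q in questions for w in set(q))
    n = len(questions)
    return {w: math.log(n / df[w]) for w in df}

def vectorize(tokens, idf):
    """Sparse tf*idf vector as a dict word -> weight."""
    return {w: tf * idf.get(w, 0.0) for w, tf in Counter(tokens).items()}

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb + 1e-12)

# toy usage: match a user query against two stored questions
qs = [["周杰伦", "的", "歌"], ["今天", "天气", "怎么样"]]
idf = build_idf(qs)
vecs = [vectorize(q, idf) for q in qs]
query = vectorize(["放", "周杰伦", "的", "歌"], idf)
print(max(range(len(qs)), key=lambda i: cosine(query, vecs[i])))   # -> 0
</pre>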
 
  
 
==Embedded development==

* English scoring looks fine
* CLG embedded decoder is almost done; the online compiler is in progress.

==QA==

===FST-based matching===

:* Code done; simple test done
:* Ready for large-scale test
  
  
 
==Speech QA==

* N-best with entity LM was analyzed
:* WER vs QA accuracy analysis is done
:* The figure shows that WER and QA accuracy are positively related
:* Adding song names and singer names improves performance in most cases
:* There are indeed some exceptions in the figure: (a) higher WER does not necessarily reduce QA accuracy; (b) adding entity names does not always improve QA
:* The results are at [[媒体文件:Music_QA_wer.pdf]]
 
 
 
 
*Class LM QA
:* Use the QA LM as the baseline
:* Tag singer names and song names
:* Build the tag LM (a corpus-tagging sketch follows the results below)
:* Use graph integration to resolve the tags
:* Adjust the in-tag weight
:* A smaller weight produces more entity recognition
:* Check whether the recognized songs/singers are correct or wrong
:* Found that a smaller weight on the class FST gives better performance
:* It is now very difficult to retrieve words that cannot be found by the original FST
:* Test negative weights

<pre>
1. non-merge
   Baseline:
              qa-singer-song
   songs           41
   singers         23

2. HCLG-merge
   Weight means the multiplier on the sub-graph entry.
   (1) LM: 1e-5
   weight    0.00000001  0.0001  0.001  0.01    1   10
   songs         20        20      21    19     9    4
   singers       13        13      13    13     2    2
</pre>
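
For the tag LM above, entity names in the training text are replaced by class tags before LM training, and at decoding time the tags are expanded by the entity sub-graphs, whose entry weight is the multiplier reported in the table. A minimal sketch of the tagging step (the entity lists and tag symbols are placeholders):

<pre>
# placeholder entity lists; in practice these come from the music database
singers = {"周杰伦", "刘德华"}
songs = {"青花瓷", "忘情水"}

def tag_tokens(tokens):
    """Replace entity words with class tags for tag-LM training."""
    out = []
    for w in tokens:
        if w in singers:
            out.append("<SINGER>")
        elif w in songs:
            out.append("<SONG>")
        else:
            out.append(w)
    return out

# "播放 周杰伦 的 青花瓷" -> "播放 <SINGER> 的 <SONG>"
print(" ".join(tag_tokens("播放 周杰伦 的 青花瓷".split())))
</pre>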