2013-10-11


Data sharing

  • LM count files still undelivered!

DNN progress

Sparse DNN

  • Optimal Brain Damage (OBD): code is ready; looking for testing. A rough sketch of the OBD pruning rule is given below.
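The OBD criterion ranks each weight by the saliency 0.5 * h_ii * w_i^2, with h_ii the diagonal Hessian of the loss, and removes the least salient weights. A minimal sketch, assuming numpy and a squared-gradient proxy for the Hessian diagonal; the function and variable names are illustrative, not taken from the actual code:

    import numpy as np

    def prune_by_saliency(weights, hess_diag, prune_frac=0.5):
        # OBD saliency: s_i = 0.5 * h_ii * w_i^2; zero out the smallest fraction.
        saliency = 0.5 * hess_diag * weights ** 2
        k = int(prune_frac * weights.size)
        mask = np.ones_like(weights, dtype=bool)
        mask[np.argsort(saliency)[:k]] = False
        return weights * mask, mask

    # Example: prune 50% of a flattened layer, using squared gradients as the Hessian proxy.
    w = np.random.randn(1024 * 2048)
    g = np.random.randn(w.size)
    w_sparse, mask = prune_by_saliency(w, g ** 2, prune_frac=0.5)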

Tencent exps

N/A


Noisy training

  • Dirichlet-based random noise corruption is done. Performance shows a significant improvement on the noisy test sets (a rough sketch of the corruption scheme follows this list).
  • The impact on clean speech varies; some test cases (e.g., online1 and rec1900) even obtained better performance than with normal training.
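A minimal sketch of one corruption step, assuming per-utterance noise-type weights drawn from a Dirichlet distribution and a random target SNR; the noise types, concentration parameter, and SNR range are assumptions, not taken from this report:

    import numpy as np

    def corrupt(clean, noises, alpha=0.5, snr_db_range=(5.0, 20.0), rng=np.random):
        # Sample noise-type mixing weights from a Dirichlet distribution.
        w = rng.dirichlet([alpha] * len(noises))
        noise = sum(wi * ni[:len(clean)] for wi, ni in zip(w, noises))
        # Scale the mixed noise to hit a randomly chosen target SNR.
        snr_db = rng.uniform(*snr_db_range)
        p_clean = np.mean(clean ** 2) + 1e-12
        p_noise = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
        return clean + scale * noise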


Continuous LM

1. Clean-up of the SogouT 3T data keeps running. Initial results with 7G of training text, in terms of PPL (the PPL computation is sketched below):

  • SogouQ test: 292
  • Tencent online1: 578
  • Tencent online2: 475


This indicates that the SogouQ text differs significantly from the Tencent online1 and online2 sets, due to the domain mismatch.
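For reference, the PPL numbers above follow the usual definition PPL = exp(-(1/N) * sum_i log P(w_i | history_i)). A minimal sketch, with lm_prob standing in for whichever model (n-gram or CSLM) supplies the word probability:

    import math

    def perplexity(sentences, lm_prob):
        # sentences: list of token lists; lm_prob(word, history) -> P(word | history).
        log_sum, n_words = 0.0, 0
        for sent in sentences:
            for i, w in enumerate(sent):
                log_sum += math.log(lm_prob(w, sent[:i]))
                n_words += 1
        return math.exp(-log_sum / n_words)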

2. NN LM

Split the most frequent 10k words into 10 subsets of 1024 words each and model each subset with its own network (the short-list/n-gram combination is sketched after the table).

  • Training data: QA 500M text
  • Test data: Tencent online2
  • Dev data: Tencent online1
  short_list        cslm_ppl  cslm_sum(%)  n-gram_sum(%)  all_ppl  coverage(%)
  0-1023               12.12        39.70          60.30   122.54        58.86
  1024-2047             1.75         6.56          93.44   118.92        11.35
  2048-3071             1.41         3.75          96.25   117.16         6.41
  3072-4095             1.23         2.17          97.83   116.24         4.27
  4096-5119             1.26         2.24          97.76   116.13         3.10
  5120-6143             1.18         1.69          98.31   116.82         2.38
  6144-7167             1.15         1.22          98.78   117.19         1.85
  7168-8191             1.13         1.13          98.87   117.34         1.50
  8192-9217             1.07         0.58          99.42   116.06         1.23
  9218-10241            1.06         0.44          99.56   115.86         1.03
  n-gram baseline          -            -         100.00   402               -
  note: coverage   -- the proportion of the short-list word frequency in the training data
        cslm_sum   -- the percentage of test words predicted by the CSLM
        n-gram_sum -- the percentage of test words predicted by the n-gram model
        cslm_ppl   -- the short-list PPL calculated by the CSLM
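A hedged sketch of the combination behind the all_ppl column: words inside a network's 1024-word short list are scored by that CSLM, everything else falls back to the n-gram model. The exact normalization used in the experiments may differ; the function names are illustrative:

    import math

    def combined_perplexity(sentences, shortlist, cslm_prob, ngram_prob):
        # shortlist : set of words handled by one CSLM network.
        # cslm_prob(word, history) / ngram_prob(word, history) -> probabilities.
        log_sum, n_words, n_cslm = 0.0, 0, 0
        for sent in sentences:
            for i, w in enumerate(sent):
                h = sent[:i]
                if w in shortlist:          # contributes to the cslm_sum column
                    p = cslm_prob(w, h)
                    n_cslm += 1
                else:                       # contributes to the n-gram_sum column
                    p = ngram_prob(w, h)
                log_sum += math.log(p)
                n_words += 1
        return math.exp(-log_sum / n_words), 100.0 * n_cslm / n_words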


3. CSLM-to-n-gram conversion failed (with threshold=1e-5) because of the large number of n-grams expanded from the network, so the expansion approach is not suitable. This is reasonable, since the network is a highly compact representation; the blow-up is illustrated below.
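A rough illustration of the blow-up, assuming the conversion enumerates contexts and dumps every word whose network probability exceeds the threshold (the names here are illustrative): with a 10k-word short list and threshold 1e-5, almost every word survives in every context, so the n-gram count grows roughly as (#contexts) x (#words).

    def expand_to_ngrams(contexts, vocab, cslm_prob, threshold=1e-5):
        # contexts: iterable of word tuples; cslm_prob(word, context) -> probability.
        for h in contexts:
            for w in vocab:
                p = cslm_prob(w, h)
                if p >= threshold:          # at 1e-5, nearly every word passes
                    yield h + (w,), p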

4. Keep working on lattice rescoring with the multiple CSLM networks; a rough n-best sketch of the idea is below.
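A rough sketch of the rescoring idea on n-best lists pulled from the lattice; full lattice-level rescoring additionally has to expand states by LM history, and the score names and LM weight here are assumptions:

    def rescore_nbest(nbest, cslm_logprob, lm_weight=12.0):
        # nbest: list of (words, acoustic_score, old_lm_score) tuples.
        # cslm_logprob(words) -> total CSLM log-probability of the word sequence.
        best_score, best_words = float("-inf"), None
        for words, am_score, _ in nbest:
            score = am_score + lm_weight * cslm_logprob(words)
            if score > best_score:
                best_score, best_words = score, words
        return best_words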