2013-10-11


Data sharing

  • LM count files still undelivered!

DNN progress

Sparse DNN

  • Optimal Brain Damage (OBD): code is ready; awaiting testing (a sketch of the criterion follows).
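For reference, a minimal sketch of the OBD pruning criterion (Python; all names are hypothetical and a diagonal-Hessian approximation is assumed): each weight's saliency is estimated as 0.5 * h_ii * w_i^2, and the least salient weights are zeroed.

  import numpy as np

  def obd_prune(weights, hessian_diag, sparsity):
      """Zero out the lowest-saliency weights (OBD, diagonal-Hessian approx.).

      weights      -- flat array of network weights
      hessian_diag -- diagonal of the loss Hessian w.r.t. each weight
      sparsity     -- fraction of weights to remove (0..1)
      """
      # OBD saliency: estimated loss increase when deleting weight i.
      saliency = 0.5 * hessian_diag * weights ** 2
      n_prune = int(sparsity * weights.size)
      prune_idx = np.argsort(saliency)[:n_prune]  # least salient first
      pruned = weights.copy()
      pruned[prune_idx] = 0.0
      return pruned

  # Hypothetical usage: prune 50% of a random layer.
  w = np.random.randn(1000)
  h = np.abs(np.random.randn(1000))  # Hessian diagonal, non-negative at a minimum
  w_sparse = obd_prune(w, h, 0.5)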

Tencent exps

N/A


Noisy training

  • Dirichlet-noise random corruption is done. Performance shows significant improvement on noisy test data.
  • The impact on clean speech varies; some test cases even outperform normal training, e.g., online1 and rec1900.
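
A minimal sketch of what such a corruption step could look like (Python; the function and its parameters are assumptions, not the actual recipe): per-utterance mixing weights over several noise types are drawn from a Dirichlet distribution, and the composite noise is scaled to a target SNR.

  import numpy as np

  def corrupt_with_dirichlet_noise(clean, noises, alpha=1.0, snr_db=10.0):
      """Mix several noise signals into a clean utterance with
      Dirichlet-sampled weights (hypothetical sketch).

      clean  -- 1-D waveform array
      noises -- list of 1-D noise arrays, each at least as long as clean
      """
      # Sample per-utterance mixing weights over the noise types.
      weights = np.random.dirichlet(alpha * np.ones(len(noises)))
      mixed = sum(w * n[:len(clean)] for w, n in zip(weights, noises))
      # Scale the composite noise to the target SNR.
      clean_pow = np.mean(clean ** 2)
      noise_pow = np.mean(mixed ** 2) + 1e-12
      scale = np.sqrt(clean_pow / (noise_pow * 10 ** (snr_db / 10.0)))
      return clean + scale * mixed

  # Hypothetical usage with two noise types:
  # noisy = corrupt_with_dirichlet_noise(clean, [white_noise, babble_noise])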


Continuous LM

1. SogouT 3T data clean-up is still running. Initial results with 7G of training text, in terms of PPL:

  • SogouQ test: 292
  • Tencent online1: 578
  • Tencent online2: 475


These results indicate that the SogouQ text differs significantly from the Tencent online1 and online2 sets, due to the domain mismatch.
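
For reference, the figures above are per-word perplexities; a minimal sketch of the computation (Python; names hypothetical):

  def perplexity(log10_probs):
      """Per-word perplexity from base-10 log probabilities of a test set."""
      return 10 ** (-sum(log10_probs) / len(log10_probs))

  # A mean log10 probability of about -2.47 per word corresponds to
  # PPL ~ 295, close to the SogouQ figure above.
  print(perplexity([-2.47] * 1000))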

2. NN LM

Split the most frequent 10k words into 10 subsets of 1024 words each, with each subset modeled by a separate network (a scoring sketch follows the results table below).

  • Training data: QA 500M text
  • Test data: Tencent online2
  • Dev data: Tencent online1
  short_list   cslm_ppl  cslm_sum  n-gram_sum  all_ppl  coverage
  0-1023       12.12     39.70%    60.30%      122.54   58.86%
  1024-2047     1.75      6.56%    93.44%      118.92   11.35%
  2048-3071     1.41      3.75%    96.25%      117.16    6.41%
  3072-4095     1.23      2.17%    97.83%      116.24    4.27%
  4096-5119     1.26      2.24%    97.76%      116.13    3.10%
  5120-6143     1.18      1.69%    98.31%      116.82    2.38%
  6144-7167     1.15      1.22%    98.78%      117.19    1.85%
  7168-8191     1.13      1.13%    98.87%      117.34    1.50%
  8192-9217     1.07      0.58%    99.42%      116.06    1.23%
  9218-10241    1.06      0.44%    99.56%      115.86    1.03%
  n-gram baseline:                 100%        402

  Notes: coverage   -- proportion of short-list word frequency in the training data
         cslm_sum   -- percentage of words predicted by the CSLM
         n-gram_sum -- percentage of words predicted by the n-gram
         cslm_ppl   -- short-list PPL computed by the CSLM
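
Per the notes, the all_ppl column combines the two models: a test word inside the current short-list is scored by the CSLM, any other word by the n-gram. A minimal sketch of this split scoring (Python; all names hypothetical, with cslm_logp/ngram_logp assumed to be log10-probability functions):

  def mixed_ppl(events, shortlist, cslm_logp, ngram_logp):
      """PPL when short-list words are scored by the CSLM and the rest
      by the n-gram (hypothetical sketch).

      events -- list of (history, next_word) pairs from the test text
      """
      total, n_cslm = 0.0, 0
      for hist, word in events:
          if word in shortlist:
              total += cslm_logp(hist, word)
              n_cslm += 1
          else:
              total += ngram_logp(hist, word)
      ppl = 10 ** (-total / len(events))
      cslm_share = 100.0 * n_cslm / len(events)  # the cslm_sum column
      return ppl, cslm_share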


3. Converting the CSLM to an n-gram model failed (with threshold=1e-5), due to the huge number of n-grams expanded from the network, so the expansion approach is not practical. This is reasonable, since the network representation is highly compact.

4. Lattice rescoring with multiple CSLM networks is ongoing; a sketch of the scoring step is given below.
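
A minimal sketch of the rescoring step on an n-best list, with the CSLM networks and the baseline n-gram linearly interpolated (Python; all names and the equal interpolation weights are assumptions, not the actual setup):

  import math

  def rescore_nbest(nbest, cslm_models, ngram_logp, lm_weight=0.5):
      """Re-rank n-best hypotheses with interpolated LM scores
      (hypothetical sketch).

      nbest       -- list of (acoustic_score, [words]) pairs
      cslm_models -- list of log10-prob functions, one per CSLM network
      ngram_logp  -- baseline n-gram log10-prob function
      """
      def lm_score(words):
          total = 0.0
          for i in range(1, len(words)):
              hist, w = tuple(words[:i]), words[i]
              # Equal-weight linear interpolation of all models.
              probs = [10 ** m(hist, w) for m in cslm_models]
              probs.append(10 ** ngram_logp(hist, w))
              total += math.log10(sum(probs) / len(probs))
          return total

      return max(nbest, key=lambda h: h[0] + lm_weight * lm_score(h[1]))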