2013-05-10

Latest revision as of 01:11, 14 May 2013 (Tue)

Data sharing

  • LM count files still undelivered!

DNN progress

Experiments

  • setups for input layer (the dimension arithmetic is sketched after this list)
s1: mfcc(13), splice +-5[143]
s2: mfcc(13), splice +-5(143), LDA[143]
s3: mfcc(13), delta(39), splice +-5(429), LDA[143]
s4: mfcc(13), delta(39), splice +-5(429), LDA[300]
  • setups for alignment
tri1: triphone training, feature input: mfcc(13), delta[39]. #pdfs 1651, #gaussians 10028
tri2: LDA/MLLT training, feature input: mfcc(13), delta(39), splice +-4(351), LDA[40]. #pdfs 3536, #gaussians 39995
  • other notes
about 100 hours of training data
88k LM, biglm decoding (1e-5 / 1e-9)
gpu-based nnet training, in-1200-1200-1200-1200-out
  • results
Test Set  fMMI   s1/tri1  s2/tri1  s3/tri1  s4/tri1  s2/tri2  s4/tri2  cpu-based (like s4/tri1)
map       28.58  25.38    24.47    26.16    26.20    22.85    24.27    26.45
2044      24.79  23.58    22.82    23.84    24.13    21.45    22.76    24.66
notetp3   21.64  16.08    14.89    15.92    15.97    14.89    14.79    16.14
1900       8.19   8.55     8.43     8.66     8.90     7.30     7.91     8.23
general   39.63  36.18    34.79    35.88    35.90    33.06    33.79    38.02
online1   35.19  34.68    33.90    33.45    33.38    32.93    32.43    33.00
online2   28.30  27.27    26.61    26.26    26.36    25.94    25.69    26.63
speedup   28.45  24.97    24.40    24.55    25.42    23.04    23.67    27.17
  • conclusion
  1. the GPU approach is comparable with the CPU approach (see s4/tri1 & the cpu-based results); the former works slightly better in most cases.
  2. the fine training, i.e. alignments from tri2, leads to significantly better performance than the rough training with tri1 (see s2/tri1 vs s2/tri2 & s4/tri1 vs s4/tri2)
  3. the delta features do not help and actually harm the performance (see s2/tri1 vs s3/tri1 & s4/tri1)
  4. the identical-dimension LDA helps the performance (see s1/tri1 vs s2/tri1)

The best system is s2/tri2: no delta features, with an identical-dimension linear LDA.

  • to be done
  1. experiment with the Tencent features
  2. migrate the bottleneck structure to the 100k task with s2/tri2.
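
The input dimensions above follow from simple splicing arithmetic: 13 MFCCs spliced over +-5 frames give 13 * 11 = 143, and 39-dimensional delta features spliced the same way give 39 * 11 = 429, which the s4 LDA reduces to 300. A minimal numpy sketch of this pipeline (illustrative only: the features and LDA matrices here are random placeholders, not the Kaldi-trained transforms):

 import numpy as np

 def splice(feats, left=5, right=5):
     """Concatenate each frame with its left/right context frames (edges clamped).
     (num_frames, dim) -> (num_frames, dim * (left + right + 1))."""
     n = len(feats)
     padded = np.vstack([np.repeat(feats[:1], left, axis=0),
                         feats,
                         np.repeat(feats[-1:], right, axis=0)])
     return np.hstack([padded[i:i + n] for i in range(left + right + 1)])

 # s2: 13-dim MFCC, splice +-5 -> 143 dims, then an identical-dimension (143x143) LDA.
 mfcc = np.random.randn(200, 13)                 # placeholder features, 200 frames
 s2_in = splice(mfcc)                            # (200, 143)
 s2_out = s2_in @ np.random.randn(143, 143).T    # placeholder LDA, stays 143-dim

 # s4: add deltas first (13*3 = 39 dims), splice +-5 -> 429 dims, reduce to 300 with LDA.
 mfcc_delta = np.random.randn(200, 39)           # placeholder mfcc + delta + delta-delta
 s4_in = splice(mfcc_delta)                      # (200, 429)
 s4_out = s4_in @ np.random.randn(300, 429).T    # placeholder LDA[300]

 print(s2_out.shape, s4_out.shape)               # (200, 143) (200, 300)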


Tencent exps

Experimental setup: 3399 output states, +-5 frame splicing; all experiments are tested on the 8k sw3 set.

  1. Effect of HLDA on the NN: PLP: 51.7, PLP-HLDA: 56.8. For PLP features, HLDA does not help NN training.
  2. FBank feature experiments: FBank: 63.5, FBank-DCT-41: 57.2, FBank-DCT-28: 51.6. FBank features need a DCT transform, and the dimension needs to be tuned appropriately.
  3. Comparison of PLP and FBank features: PLP with an 11-frame splice: 51.7, FBank-DCT-15 with a 41-frame splice: 51.6. With similar feature dimensions, the two perform comparably.
  4. Momentum experiment: a momentum term is added to the SGD parameter update. Under the same convergence criterion, the run with momentum converges in fewer iterations but gives worse performance.
  5. Minibatch size experiment: 128: 51.6, 1024: 51.7; the larger minibatch size performs slightly better.
  • Conclusions
  1. The conventional recipe seems the best: PLP without HLDA. Referring to the CSLT results, the identical-dimension LDA should work, while it should not be used for dimension reduction.
  2. Some linear transform, DCT/LDA, should contribute, because the prior knowledge it introduces helps avoid network overtraining.
  3. The PLP & FBank comparison seems strange: we should not force the two features to the same dimension, but compare each at its best-performing configuration. The 41-frame splice is a bit too long. Compared with the CSLT result, fbank-dct (which is the same as mfcc) with 12 coefficients and an 11-frame splice is good enough. The large dimension of PLP-11 comes from the delta features, which have been verified to be harmful in the CSLT experiments. (The fbank-dct computation is sketched after this list.)
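
To make the fbank-dct point concrete: applying a DCT to log filterbank energies and keeping the low-order coefficients is essentially the MFCC computation, which is why fbank-dct is described as the same as mfcc. A minimal sketch (the filterbank values here are random placeholders, not real speech features):

 import numpy as np
 from scipy.fftpack import dct

 # Placeholder log mel-filterbank features: 200 frames x 40 filterbank channels.
 log_fbank = np.random.randn(200, 40)

 # Type-II DCT along the filterbank axis, keeping the first 13 coefficients.
 # This decorrelation + truncation step turns log filterbank energies into
 # MFCC-like features ("FBank-DCT"); changing the cut-off changes the feature
 # dimension, as presumably in the FBank-DCT-41 / FBank-DCT-28 runs above.
 n_ceps = 13
 mfcc_like = dct(log_fbank, type=2, axis=1, norm='ortho')[:, :n_ceps]

 print(mfcc_like.shape)   # (200, 13)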


GPU & CPU merge

  1. The current plan is to migrate the GPU code to the CPU; coding will start this week.


L-1 sparse initial training

  • experiments on L1 and L2 penalty
  1. LM: 4-gigabyte LM, 1e-5 small LM
  2. AM: 100-hour tri4b_nn


Test Set  penalty=0  1e-06  1e-05  2.5e-05  5e-05  7.5e-05  1e-04
map       61.06      61.18  61.31  61.72    62.89  62.84    62.48
2044      49.53      49.58  49.84  49.94    50.71  51.08    51.08
notetp3   43.44      43.44  43.55  44.58    45.22  44.95    45.76
1900      38.50      38.54  39.00  39.21    40.33  40.53    40.60
general   61.24      61.22  61.69  61.84    62.71  62.83    62.93
online1   58.02      58.05  58.31  58.23    58.83  59.06    59.46
online2   53.62      53.70  54.03  53.93    54.65  54.94    55.51
speedup   57.51      57.49  57.93  58.31    59.75  60.08    59.52
  • Conclusions
  1. The L1 and L2 penalties do not work in the current nnet-GPU code.
  2. Will check the L1 code and change the penalty scheme (the intended penalty update is sketched below).
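
For reference, a minimal numpy sketch of how L1 and L2 penalty terms are typically folded into a plain SGD weight update (this is not the actual nnet-GPU code; the weights, gradient, and hyperparameter values are placeholders):

 import numpy as np

 def sgd_step(W, grad, lr=0.008, l1=1e-5, l2=0.0):
     """One SGD update with optional L1 and L2 penalties.
     The L2 term adds l2 * W to the gradient (weight decay); the L1 term adds
     l1 * sign(W), which pushes small weights toward exactly zero (sparsity)."""
     penalized_grad = grad + l2 * W + l1 * np.sign(W)
     return W - lr * penalized_grad

 # Placeholder weights and gradient for one 1200 x 1200 hidden layer.
 W = 0.01 * np.random.randn(1200, 1200)
 grad = 0.001 * np.random.randn(1200, 1200)
 W = sgd_step(W, grad, l1=2.5e-5)   # e.g. one of the penalty weights in the table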


Kaldi/HTK merge

  • HTK2Kaldi: on hold.
  • Kaldi2HTK: stuck; various sp models have been tried, but they do not help.
  • Needs thorough debugging this week.


Embedded progress

  • Status:
  1. the first embedded demo is done; 1000 words take 3.2 MB of memory.
  2. the accuracy test is not yet finished
  3. training an acoustic model for Sphinx: the toolkit runs well; we need to prepare data and start parallel training.
  • To be done
  1. finish AM training
  2. run offline test