2013-05-10
Data sharing
- LM count files still undelivered!
DNN progress
Experiments
- setups for input layer (numbers in parentheses are feature dimensions after each stage; bracketed numbers are the final network input dimension; a front-end sketch follows these notes)
- s1: mfcc(13), splice +-5[143]
- s2: mfcc(13), splice +-5(143), LDA[143]
- s3: mfcc(13), delta(39), splice +-5(429), LDA[143]
- s4: mfcc(13), delta(39), splice +-5(429), LDA[300]
- setups for alignment
- tri1: triphone training, feature input: mfcc(13), delta[39]. #pdfs 1651, #gaussians 10028
- tri2: LDA/MLLT training, feature input: mfcc(13), delta(39), splice +-4(351), LDA[40]. #pdfs 3536, #gaussians 39995
- other notes
- about 100 hours of training data
- 88k LM, biglm decoding (1e-5 / 1e-9)
- gpu-based nnet training, in-1200-1200-1200-1200-out
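A minimal numpy sketch of the s2 front-end (splice +-5, then an identical-dimension LDA). The random MFCC matrix and the QR-based transform are placeholders for the actual Kaldi features and LDA estimate:

```python
import numpy as np

def splice(feats, left=5, right=5):
    """Concatenate each frame with its +-N neighbours (edges repeated),
    e.g. 13-dim MFCC with +-5 context -> 13 * 11 = 143 dims."""
    T, _ = feats.shape
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    return np.hstack([padded[t:t + T] for t in range(left + right + 1)])

# toy stand-in for 13-dim MFCC frames (real features come from Kaldi)
mfcc = np.random.randn(200, 13)
spliced = splice(mfcc)               # (200, 143): the s1/s2 input size

# identical-dimension "LDA": a square 143x143 transform as in s2;
# a random orthonormal matrix stands in for the LDA estimated from
# class-labelled data
lda, _ = np.linalg.qr(np.random.randn(143, 143))
net_input = spliced @ lda.T          # still 143 dims, decorrelated
print(net_input.shape)               # (200, 143)
```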
- results (error rate in %; lower is better)
Test Set | fMMI | s1/tri1 | s2/tri1 | s3/tri1 | s4/tri1 | s2/tri2 | s4/tri2 | cpu-based (like s4/tri1) |
---|---|---|---|---|---|---|---|---|
map | 28.58 | 25.38 | 24.47 | 26.16 | 26.20 | 22.85 | 24.27 | 26.45 |
2044 | 24.79 | 23.58 | 22.82 | 23.84 | 24.13 | 21.45 | 22.76 | 24.66 |
notetp3 | 21.64 | 16.08 | 14.89 | 15.92 | 15.97 | 14.89 | 14.79 | 16.14 |
1900 | 8.19 | 8.55 | 8.43 | 8.66 | 8.90 | 7.30 | 7.91 | 8.23 |
general | 39.63 | 36.18 | 34.79 | 35.88 | 35.90 | 33.06 | 33.79 | 38.02 |
online1 | 35.19 | 34.68 | 33.90 | 33.45 | 33.38 | 32.93 | 32.43 | 33.00 |
online2 | 28.30 | 27.27 | 26.61 | 26.26 | 26.36 | 25.94 | 25.69 | 26.63 |
speedup | 28.45 | 24.97 | 24.40 | 24.55 | 25.42 | 23.04 | 23.67 | 27.17 |
- conclusion
- the GPU approach is comparable with the CPU approach (see the s4/tri1 and CPU columns); the former works slightly better in most cases.
- training on the fine alignments (tri2) leads to significantly better performance than on the rough alignments (tri1) (see s2/tri1 vs s2/tri2 and s4/tri1 vs s4/tri2).
- the delta features do not help and actually harm the performance (see s2/tri1 vs s3/tri1 and s4/tri1).
- the identical-dimension LDA helps the performance (see s1/tri1 vs s2/tri1).
- the best system is s2/tri2: no delta features, plus an identical-dimension LDA.
- to be done
- experiment with Tencent feature
- migrate the bottleneck structure to the 100k task with s2/tri2 (a topology sketch follows).
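The notes do not spell out the bottleneck structure; the usual meaning is a narrow hidden layer whose activations serve as features for a second stage. A hedged sketch of such a topology: the 143-dim s2 input and the 3536 tri2 pdfs come from the notes above, while the 40-dim bottleneck width and the random weights are purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# illustrative topology: 143 (s2 input) -> 1200 -> 1200
# -> 40 (bottleneck; width is a guess) -> 1200 -> 3536 (tri2 pdfs)
dims = [143, 1200, 1200, 40, 1200, 3536]
weights = [np.random.randn(i, o) * 0.01 for i, o in zip(dims[:-1], dims[1:])]

def forward(x, bottleneck_layer=3):
    """Run the net; also return the bottleneck activations, which
    would serve as features for a second-stage system."""
    bn = None
    for i, w in enumerate(weights):
        x = x @ w
        if i < len(weights) - 1:
            x = sigmoid(x)           # hidden layers
        if i + 1 == bottleneck_layer:
            bn = x                   # 40-dim bottleneck activations
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True), bn  # softmax over pdfs

probs, bn_feats = forward(np.random.randn(1, 143))
print(bn_feats.shape)                # (1, 40)
```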
Tencent exps
Experimental setup: 3399 output states, +-5 frame splicing; all experiments were tested on the 8k sw3 set.
- Effect of HLDA on the NN. PLP: 51.7, PLP-HLDA: 56.8. For PLP features, HLDA brings no benefit to NN training.
- FBank feature experiments. FBank: 63.5, FBank-DCT-41: 57.2, FBank-DCT-28: 51.6. FBank needs a DCT transform, and the output dimensionality needs appropriate tuning (see the DCT sketch after the conclusions).
- PLP vs. FBank. PLP with an 11-frame splice: 51.7, FBank-DCT-15 with a 41-frame splice: 51.6. With similar feature dimensionality, the two perform comparably.
- Momentum experiment: a momentum term was added to the SGD parameter update (sketched below). Under the same convergence criterion, the run with momentum converges in fewer iterations but performs worse.
- Minibatch size experiment. 128: 51.6, 1024: 51.7; the difference is marginal (the smaller minibatch is very slightly better).
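The momentum term above is presumably the standard heavy-ball update; a minimal sketch on a toy quadratic loss (the learning rate, momentum value, and loss are illustrative, not the Tencent training code):

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """Heavy-ball update: the velocity accumulates past gradients,
    so steps grow along consistently descending directions."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# toy quadratic loss 0.5 * ||w||^2, whose gradient is simply w
w = np.ones(10)
v = np.zeros_like(w)
for step in range(100):
    w, v = sgd_momentum_step(w, grad=w, velocity=v)
print(np.linalg.norm(w))   # shrinks faster than plain SGD with the same lr
```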
- Conclusions
- The conventional recipe seems the best: PLP without HLDA. Referring to the CSLT results, an identical-dimension LDA should work, but LDA should not be used for dimension reduction.
- Some linear transform (DCT/LDA) should contribute, since it injects prior knowledge that helps avoid overtraining the network.
- The PLP vs. FBank comparison seems strange: the two should not be forced to the same dimensionality; instead, the best configuration of each should be compared. The 41-frame splice is a bit too long; judging from the CSLT results, fbank-dct (which is essentially mfcc) with 12 coefficients and an 11-frame splice is good enough. The extra dimensions of PLP-11 come from the delta features, which the CSLT experiments showed to be harmful.
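FBank-DCT-N above presumably denotes a DCT-II across the filterbank channels with the leading N coefficients kept, the same step that turns log filterbank outputs into MFCCs; a numpy/scipy sketch with a stand-in 41-channel filterbank:

```python
import numpy as np
from scipy.fftpack import dct

def fbank_to_dct(fbank, keep):
    """DCT-II along the filterbank axis, keeping the first `keep`
    coefficients; this decorrelates the highly correlated mel
    channels before they reach the network."""
    return dct(fbank, type=2, axis=1, norm="ortho")[:, :keep]

# toy stand-in: 200 frames of a 41-channel log mel filterbank
fbank = np.log(np.random.rand(200, 41) + 1e-6)
print(fbank_to_dct(fbank, keep=28).shape)   # (200, 28), cf. FBank-DCT-28
print(fbank_to_dct(fbank, keep=15).shape)   # (200, 15), cf. FBank-DCT-15
```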
GPU & CPU merge
- the current plan is to migrate the GPU code to the CPU side; coding starts this week.
L1 sparse initial training
- experiments on L1 and L2 penalty
- LM: 4 GB LM, with the 1e-5 small LM
- AM: 100-hour tri4b_nn
Test Set | penalty 0 | 1e-6 | 1e-5 | 2.5e-5 | 5e-5 | 7.5e-5 | 1e-4 |
---|---|---|---|---|---|---|---|
map | 61.06 | 61.18 | 61.31 | 61.72 | 62.89 | 62.84 | 62.48 |
2044 | 49.53 | 49.58 | 49.84 | 49.94 | 50.71 | 51.08 | 51.08 |
notetp3 | 43.44 | 43.44 | 43.55 | 44.58 | 45.22 | 44.95 | 45.76 |
1900 | 38.50 | 38.54 | 39.00 | 39.21 | 40.33 | 40.53 | 40.60 |
general | 61.24 | 61.22 | 61.69 | 61.84 | 62.71 | 62.83 | 62.93 |
online1 | 58.02 | 58.05 | 58.31 | 58.23 | 58.83 | 59.06 | 59.46 |
online2 | 53.62 | 53.70 | 54.03 | 53.93 | 54.65 | 54.94 | 55.51 |
speedup | 57.51 | 57.49 | 57.93 | 58.31 | 59.75 | 60.08 | 59.52 |
- Conclusions
- Neither the L1 nor the L2 penalty works in the current nnet-GPU code.
- will check the L1 code and revise the penalty scheme (a reference update rule is sketched below).
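For reference when revising the penalty scheme: the standard penalized update adds an l2 * w weight-decay term and an l1 * sign(w) sparsity term to the gradient. A minimal sketch follows; the function name and constants are illustrative, not the nnet-GPU code:

```python
import numpy as np

def penalized_grad(w, grad, l1=0.0, l2=0.0):
    """Gradient of loss + l1*||w||_1 + (l2/2)*||w||^2:
    L2 shrinks weights smoothly, L1 drives them toward exact zero."""
    return grad + l2 * w + l1 * np.sign(w)

# toy check: with zero loss gradient, the penalties alone pull
# the weights toward zero, the L1 term clamping small ones there
w = np.random.randn(5)
for _ in range(1000):
    w -= 0.01 * penalized_grad(w, grad=np.zeros_like(w), l1=1e-4, l2=1e-2)
print(w)
```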
Kaldi/HTK merge
- HTK2Kaldi: on hold.
- Kaldi2HTK: stuck; various sp models have been tried, but none helps.
- Needs thorough debugging this week.
Embedded progress
- Status:
- first embedded demo done; a 1000-word vocabulary takes 3.2 MB of memory.
- accuracy test not yet finished
- training an acoustic model for Sphinx: the toolkit runs well; data preparation and parallel training are next.
- To be done
- finish AM training
- run offline test