Difference between revisions of "2013-05-17"

From cslt Wiki
(Created new page with content "== Data sharing == * LM count files still undelivered! == DNN progress == === Experiments === * setups for input layer : s1: mfcc(13), splice +-5[143] : s2: mfcc(13),...")
 
 
== DNN progress ==

=== Experiments ===
* setups for mfcc/plp (a rough numpy sketch of these splice/delta/LDA dimensionalities follows the results table)
: s1: mfcc(13), splice +-5[143]
: s2: mfcc(13), splice +-5(143), LDA[143]
: s3: mfcc(13), delta(39), splice +-5(429), LDA[143]
: s4: mfcc(13), delta(39), splice +-5(429), LDA[300]
* setups for alignment
: tri1: triphone training, feature input: mfcc(13), delta[39]. #pdfs 1651, #gaussians 10028
: tri2: LDA/MLLT training, feature input: mfcc(13), delta(39), splice +-4(351), LDA[40]. #pdfs 3536, #gaussians 39995
* other notes
: about 100 hours of training data
: 88k LM, biglm decoding (1e-5 / 1e-9)
: GPU-based nnet training, topology in-1200-1200-1200-1200-out
* results
 
{| class="wikitable"
! Test Set !! fMMI !! s1/tri1 !! s2/tri1 !! s3/tri1 !! s4/tri1 !! s2/tri2 !! s4/tri2 !! cpu-based (like s4/tri1)
 
|-
|}
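
To make the s1-s4 front-ends above concrete, here is a minimal numpy sketch of how the splice/delta/LDA dimensionalities come about. The delta and LDA computations are placeholders (random matrices stand in for the LDA transforms, which in the real setup are estimated from the tri1/tri2 alignments); only the dimensions are meant to match.

<pre>
# Rough sketch of the s1-s4 front-ends (dimensions only).  The "LDA" matrices
# are random placeholders; the real transforms are estimated by Kaldi.
import numpy as np

def add_deltas(feats):
    """Append first/second-order time differences: 13 -> 39 dims."""
    d1 = np.gradient(feats, axis=0)
    d2 = np.gradient(d1, axis=0)
    return np.hstack([feats, d1, d2])

def splice(feats, context=5):
    """Stack each frame with its +-context neighbours (edge frames repeated)."""
    T = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[t:t + T] for t in range(2 * context + 1)])

mfcc = np.random.randn(100, 13)                               # placeholder MFCC frames

s1 = splice(mfcc, 5)                                          # 13 * 11       -> 143
s2 = splice(mfcc, 5) @ np.random.randn(143, 143)              # 143, LDA      -> 143
s3 = splice(add_deltas(mfcc), 5) @ np.random.randn(429, 143)  # 39 * 11 = 429 -> 143
s4 = splice(add_deltas(mfcc), 5) @ np.random.randn(429, 300)  # 429, LDA      -> 300
# each of these feeds the in-1200-1200-1200-1200-out nets noted above
print(s1.shape, s2.shape, s3.shape, s4.shape)                 # (100,143) ... (100,300)
</pre>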
 
* conclusion
# the GPU approach is comparable with the CPU approach (see s4/tri1 & CPU results). The former works slightly better in most cases.
  
 
== GPU & CPU merge ==

# just started
 
=== L-1 sparse initial training ===
 
* experiments on L1 and L2 penalty
# LM: 4G gigabyte LM, 1e-5 small LM
# AM: 100hour tri4b_nn
{| class="wikitable"
! Test Set !! 0 !! 1.00E-06 !! 1.00E-05 !! 2.50E-05 !! 5.00E-05 !! 7.50E-05 !! 1.00E-04
|-
| map || 61.06 || 61.18 || 61.31 || 61.72 || 62.89 || 62.84 || 62.48
|-
| 2044 || 49.53 || 49.58 || 49.84 || 49.94 || 50.71 || 51.08 || 51.08
|-
| notetp3 || 43.44 || 43.44 || 43.55 || 44.58 || 45.22 || 44.95 || 45.76
|-
| 1900 || 38.50 || 38.54 || 39.00 || 39.21 || 40.33 || 40.53 || 40.60
|-
| general || 61.24 || 61.22 || 61.69 || 61.84 || 62.71 || 62.83 || 62.93
|-
| online1 || 58.02 || 58.05 || 58.31 || 58.23 || 58.83 || 59.06 || 59.46
|-
| online2 || 53.62 || 53.70 || 54.03 || 53.93 || 54.65 || 54.94 || 55.51
|-
| speedup || 57.51 || 57.49 || 57.93 || 58.31 || 59.75 || 60.08 || 59.52
|}
 
* Conclusions
# L1 and L2 penalty do not work in the current nnet-GPU code.
# will check the L1 code and change the penalty scheme.
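
As a reference for checking the penalty code, here is a minimal sketch of how the two penalties are commonly applied in plain SGD; this is an assumption about the intended scheme, not the actual nnet-GPU implementation. L2 acts as weight decay on every update, while L1 is applied as a soft-thresholding (proximal) step, which is what actually drives weights to exact zeros.

<pre>
# Sketch of L1/L2 penalties in a plain SGD update (assumed scheme, not the
# actual nnet-GPU code).  Sizes and gradients are placeholders.
import numpy as np

def sgd_step(W, grad, lr, l1=0.0, l2=0.0):
    """One update of weight matrix W; `grad` is the data-term gradient."""
    W = W - lr * (grad + l2 * W)                 # L2 penalty = weight decay
    if l1 > 0.0:
        # proximal (soft-threshold) step: shrinks every weight and sets the
        # small ones exactly to zero, which is what produces sparsity
        W = np.sign(W) * np.maximum(np.abs(W) - lr * l1, 0.0)
    return W

W = 0.01 * np.random.randn(1200, 1200)
for _ in range(20):
    grad = 0.01 * np.random.randn(*W.shape)      # placeholder gradient
    W = sgd_step(W, grad, lr=0.1, l1=1e-4, l2=0.0)
print("fraction of exact zeros:", (W == 0.0).mean())
</pre>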
 
  
  
  
 
== Kaldi/HTK merge ==

* HTK2Kaldi: hold.
* Kaldi2HTK: pdf error problem.
* workaround: use the BN feature to train HTK models, i.e., without Kaldi training.
  
  
 
== Embedded progress ==

* Status:
:# first embedded demo done; 1000 words take about 3.2 MB of memory.
:# accuracy test finished
:# training acoustic model for sphinx: the an4 training process is done, while the test seems problematic.
* To be done
:# finish 400hour AM training
:# run offline test
+

Revision as of 04:52, 17 May 2013

Data sharing

  • LM count files still undelivered!

DNN progress

Experiments

  • setups for mfcc/plp
{| class="wikitable"
! Test Set !! fMMI !! s1/tri1 !! s2/tri1 !! s3/tri1 !! s4/tri1 !! s2/tri2 !! s4/tri2 !! cpu-based (like s4/tri1)
|-
| map || 28.58 || 25.38 || 24.47 || 26.16 || 26.20 || 22.85 || 24.27 || 26.45
|-
| 2044 || 24.79 || 23.58 || 22.82 || 23.84 || 24.13 || 21.45 || 22.76 || 24.66
|-
| notetp3 || 21.64 || 16.08 || 14.89 || 15.92 || 15.97 || 14.89 || 14.79 || 16.14
|-
| 1900 || 8.19 || 8.55 || 8.43 || 8.66 || 8.90 || 7.30 || 7.91 || 8.23
|-
| general || 39.63 || 36.18 || 34.79 || 35.88 || 35.90 || 33.06 || 33.79 || 38.02
|-
| online1 || 35.19 || 34.68 || 33.90 || 33.45 || 33.38 || 32.93 || 32.43 || 33.00
|-
| online2 || 28.30 || 27.27 || 26.61 || 26.26 || 26.36 || 25.94 || 25.69 || 26.63
|-
| speedup || 28.45 || 24.97 || 24.40 || 24.55 || 25.42 || 23.04 || 23.67 || 27.17
|}
  • conclusion
  1. the GPU approach is comparable with the CPU approach (see s4/tri1 & CPU results). The former works slightly better in most cases.
  2. the fine training leads to significantly better performance than the rough training (see s2/tri1 vs s2/tri2 & s4/tri1 vs s4/tri2)
  3. the delta features do not help, and actually harm the performance (see s2/tri1 vs s3/tri1 & s4/tri1)
  4. the identical-dimension LDA helps the performance (see s1/tri1 vs s2/tri1)

the best system is s2/tri2: no delta features, with an identical-dimension (linear) LDA.

  • to be done
  1. experiment with the Tencent features
  2. migrate the bottleneck structure to the 100k task with s2/tri2.


Tencent exps

  1. Manually zeroed out the smaller weights in the NN weight matrices W; with roughly the largest 30% of the weights kept, no obvious degradation in system performance was observed (a pruning sketch follows this list).
  2. Modified the Kaldi GPU interface to match the HTK model structure and HTK alignments; this was verified to be correct, and larger-scale training on 1000 hours of data has started. The network uses a +-5 frame context, 4 hidden layers with 2048 nodes each, and 15000 output states. Alignments come from an MPE model; the features are PLP without any transform.
  3. The decoder still works on the CLG structure; the acoustic-model scoring interface was modified to plug in the DNN model and verified to be correct, and efficiency optimization has started.
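
A minimal sketch of this kind of magnitude-based pruning as described in item 1; the ~30% keep ratio is the only detail taken from the report, the layer size and the threshold computation are illustrative.

<pre>
# Magnitude pruning: keep only the largest ~30% of weights (by absolute
# value) in a matrix and zero the rest.  Sizes are placeholders.
import numpy as np

def prune_by_magnitude(W, keep_ratio=0.3):
    threshold = np.percentile(np.abs(W), 100.0 * (1.0 - keep_ratio))
    mask = np.abs(W) >= threshold
    return W * mask, mask

W = np.random.randn(2048, 2048)                  # e.g. one 2048x2048 hidden layer
W_pruned, mask = prune_by_magnitude(W)
print("fraction of weights kept:", mask.mean())  # ~0.30
</pre>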

Experiments to be done:

  1. Compare different learning-rate scheduling strategies: exponential decay and the newbob scheme (a sketch follows this list).
  2. Evaluate the effect of different features on the large data set.
  3. Skip the softmax in the last layer and reduce the dimensionality to obtain BN (bottleneck) features, similar to the IBM BN approach.
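
For item 1, a rough sketch of the two schedules to be compared, assuming a held-out accuracy is monitored after each epoch; the newbob thresholds and the decay factor below are commonly used illustrative values, not values fixed by this plan.

<pre>
# Two learning-rate schedules (illustrative parameter values).

def exp_decay_lr(lr0, epoch, decay=0.8):
    """Exponential decay: multiply the rate by `decay` every epoch."""
    return lr0 * decay ** epoch

def newbob_step(lr, prev_acc, cur_acc, halving, start_halving=0.5, end_halving=0.1):
    """newbob-style control on held-out accuracy (in %).

    Keep the rate fixed until the epoch-to-epoch improvement drops below
    `start_halving`; then halve it every epoch, and stop once the
    improvement falls below `end_halving`.  Returns (new_lr, halving, stop).
    """
    improvement = cur_acc - prev_acc
    if not halving:
        if improvement < start_halving:
            return lr * 0.5, True, False
        return lr, False, False
    if improvement < end_halving:
        return lr, True, True                    # stop training
    return lr * 0.5, True, False

# example: held-out frame accuracies after successive epochs
accs = [40.0, 45.0, 48.0, 48.3, 48.4, 48.45]
lr, halving = 0.008, False
for prev, cur in zip(accs, accs[1:]):
    lr, halving, stop = newbob_step(lr, prev, cur, halving)
    print(cur, lr, stop)
    if stop:
        break
</pre>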


GPU & CPU merge

  1. just started


Kaldi/HTK merge

  • HTK2Kaldi: hold.
  • Kaldi2HTK: pdf error problem.
  • workaround: use the BN feature to train HTK models, i.e., without Kaldi training (a sketch follows).
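
A rough sketch of what the BN-feature workaround amounts to, with placeholder layer sizes: forward each input frame through the trained hidden layers and take the activations of the narrow, pre-softmax bottleneck layer as the feature vector for HTK training.

<pre>
# Bottleneck-feature extraction sketch (layer sizes are placeholders).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dims = [143, 1200, 1200, 42]          # last entry = assumed bottleneck size
weights = [0.01 * np.random.randn(a, b) for a, b in zip(dims[:-1], dims[1:])]

def bn_features(frames):
    h = frames
    for W in weights[:-1]:
        h = sigmoid(h @ W)            # ordinary sigmoid hidden layers
    return h @ weights[-1]            # bottleneck layer: linear, no softmax

feats = bn_features(np.random.randn(100, 143))
print(feats.shape)                    # (100, 42): BN features, one row per frame
</pre>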


Embedded progress

  • Status:
  1. first embedded demo done; 1000 words take about 3.2 MB of memory.
  2. accuracy test finished
  3. training acoustic model for sphinx. The an4 training process is done, while the test seems problematic.
  • To be done
  1. finish 400hour AM training