Difference between revisions of "2013-05-17"

Revision as of 06:23, 17 May 2013 (Fri)

Test Set	fMMI	s1/tri1	s2/tri1	s3/tri1	s4/tri1	s2/tri2	s4/tri2	cpu-based (like s4/tri1)	plp-s2/tri2
map	28.58	25.38	24.47	26.16	26.20	22.85	24.27	26.45	23.86
2044	24.79	23.58	22.82	23.84	24.13	21.45	22.76	24.66	22.68
notetp3	21.64	16.08	14.89	15.92	15.97	14.89	14.79	16.14	16.46
1900	8.19	8.55	8.43	8.66	8.90	7.30	7.91	8.23	7.68
general	39.63	36.18	34.79	35.88	35.90	33.06	33.79	38.02	34.12
online1	35.19	34.68	33.90	33.45	33.38	32.93	32.43	33.00	33.60
online2	28.30	27.27	26.61	26.26	26.36	25.94	25.69	26.63	26.20
speedup	28.45	24.97	24.40	24.55	25.42	23.04	23.67	27.17	23.62

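Manually set the smaller weights in the NN weight matrices W to zero. With only about 30% of the weights (the largest ones) retained, system performance showed no obvious degradation (how much in number?).
Modified the Kaldi GPU interface to follow the HTK model structure and the HTK alignment structure. It has been verified to work, and larger-scale training (1000 hours) has started. Network structure: input context of 5 frames on each side, 4 hidden layers with 2048 nodes each, and 15000 output states. Alignments come from the MPE model; the features are PLP, with no transform applied.
The decoder still uses the CLG structure. The acoustic-model computation interface was modified to plug in the DNN model. It has been verified to work, and efficiency optimization has started.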
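
The weight pruning mentioned in the first item above can be illustrated with a minimal sketch. It assumes plain magnitude-based pruning with a single global threshold per matrix. The function name `prune_by_magnitude`, the `keep_fraction` parameter, and the toy matrix are hypothetical, and this is not the actual code used in the experiment.

```python
def prune_by_magnitude(weights, keep_fraction=0.3):
    """Zero all but the largest-magnitude entries of a 2-D weight matrix.

    weights: list of rows (lists of floats).
    keep_fraction: share of entries to keep (0.3 mirrors the ~30% in the notes).
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    # Sort all magnitudes in descending order to locate the cut-off.
    flat = sorted((abs(w) for row in weights for w in row), reverse=True)
    n_keep = int(round(keep_fraction * len(flat)))
    if n_keep == 0:
        return [[0.0 for _ in row] for row in weights]
    threshold = flat[n_keep - 1]  # smallest magnitude that is still kept
    # Entries tied with the threshold are all kept, so slightly more than
    # n_keep values may survive when magnitudes repeat.
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


# Toy 2x5 matrix (made-up values): 10 entries, so 3 are kept at 30%.
W = [[0.9, -0.05, 0.3, 0.01, -0.7],
     [0.02, 0.6, -0.1, 0.04, 0.2]]
pruned = prune_by_magnitude(W, keep_fraction=0.3)
print(pruned)
```

A global per-matrix threshold is only one possible choice; a per-row or per-layer threshold would keep the same overall sparsity but distribute the surviving weights differently.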

Experiments to be done:

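Compare different learning-rate schedules: exponential decay and newbob.
Verify the effect of different features on large-scale data.
Run an experiment that skips the softmax at the last layer and reduces the dimensionality to obtain BN features, similar to the IBM BN approach (this is the linear dim-reduction, it might be worse than the BN...)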
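
The two learning-rate schedules named in the first item above can be sketched as follows. This is a simplified illustration, not the planned experimental setup: the newbob variant shown keeps the rate fixed until the relative improvement of a held-out loss falls below one threshold, then halves it every epoch until the improvement falls below a second threshold. The class name, the threshold defaults, the initial rate, and the loss values are all assumptions made for the example.

```python
def exponential_decay(initial_lr, decay_rate, epoch):
    """Learning rate after `epoch` epochs: initial_lr * decay_rate ** epoch."""
    return initial_lr * decay_rate ** epoch


class NewbobScheduler:
    """Simplified newbob-style schedule driven by a held-out loss."""

    def __init__(self, initial_lr, start_halving_impr=0.01,
                 end_halving_impr=0.001, halving_factor=0.5):
        self.lr = initial_lr
        self.start_halving_impr = start_halving_impr  # assumed default
        self.end_halving_impr = end_halving_impr      # assumed default
        self.halving_factor = halving_factor
        self.halving = False
        self.finished = False
        self.prev_loss = None

    def step(self, cv_loss):
        """Record one epoch's held-out loss; return the rate for the next epoch."""
        if self.prev_loss is not None:
            rel_impr = (self.prev_loss - cv_loss) / self.prev_loss
            if self.halving and rel_impr < self.end_halving_impr:
                self.finished = True   # improvement too small: stop training
            elif rel_impr < self.start_halving_impr:
                self.halving = True    # improvement slowed: start halving
        # Track the best held-out loss seen so far.
        if self.prev_loss is None or cv_loss < self.prev_loss:
            self.prev_loss = cv_loss
        if self.halving and not self.finished:
            self.lr *= self.halving_factor
        return self.lr


# Made-up held-out losses, one per epoch.
sched = NewbobScheduler(initial_lr=0.008)
rates = [sched.step(loss) for loss in [2.0, 1.5, 1.49, 1.48, 1.4799]]
print(rates, sched.finished)
print([exponential_decay(0.008, 0.5, e) for e in range(4)])
```

The exponential schedule needs no held-out data but requires choosing the decay rate in advance, whereas the newbob-style schedule adapts to the observed improvement at the cost of one held-out evaluation per epoch.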

Kaldi Monophone: 30.91%  HDecode: 41.40%

first embedded demo done, 1000 words take 3.2M memory.
accuracy test finished. The test data involves 3 speakers recorded in a car with Chongqing dialect, 1000 address names.
training acoustic model for sphinx. The an4 training process is done, while the test seems problematic.

#utt	ERR	RT
806	23.33	0.07
887	13.64	0.08
876	17.58	0.07

@@ Line 55: / Line 55: @@
 *Status:
 :# first embedded demo done, 1000 words take 3.2M memory.
-:# accuracy test finished. The test data involves 3 speakers recorded in a car, 1000 address names.
+:# accuracy test finished. The test data involves 3 speakers recorded in a car with Chongqing dialect, 1000 address names.
 :# training acoustic model for sphinx. The an4 training process is done, while the test seems problematic.
@@ Line 70: / Line 70: @@
 *To be done
-:# finish 400hour AM training
+:# finish the large scale AM training