Difference between revisions of "2013-05-17"
From cslt Wiki
Latest revision as of 06:24, 17 May 2013 (Friday)
Data sharing
- LM count files still undelivered!
DNN progress
Experiments
- setups for mfcc/plp
Test Set | fMMI | s1/tri1 | s2/tri1 | s3/tri1 | s4/tri1 | s2/tri2 | s4/tri2 | cpu-based (like s4/tri1) | plp-s2/tri2 |
---|---|---|---|---|---|---|---|---|---|
map | 28.58 | 25.38 | 24.47 | 26.16 | 26.20 | 22.85 | 24.27 | 26.45 | 23.86 |
2044 | 24.79 | 23.58 | 22.82 | 23.84 | 24.13 | 21.45 | 22.76 | 24.66 | 22.68 |
notetp3 | 21.64 | 16.08 | 14.89 | 15.92 | 15.97 | 14.89 | 14.79 | 16.14 | 16.46 |
1900 | 8.19 | 8.55 | 8.43 | 8.66 | 8.90 | 7.30 | 7.91 | 8.23 | 7.68 |
general | 39.63 | 36.18 | 34.79 | 35.88 | 35.90 | 33.06 | 33.79 | 38.02 | 34.12 |
online1 | 35.19 | 34.68 | 33.90 | 33.45 | 33.38 | 32.93 | 32.43 | 33.00 | 33.60 |
online2 | 28.30 | 27.27 | 26.61 | 26.26 | 26.36 | 25.94 | 25.69 | 26.63 | 26.20 |
speedup | 28.45 | 24.97 | 24.40 | 24.55 | 25.42 | 23.04 | 23.67 | 27.17 | 23.62 |
Tencent exps
- Manually zeroed the smaller weights in the NN's W matrices: with only the largest ~30% of the weights retained, system performance showed no obvious degradation (how much in number? it would be interesting to re-train the net after pruning the weights).
- Modified Kaldi's GPU interface to follow the HTK model structure and HTK alignments; verified with no problems. Large-scale training (1000 hours) has started: the input is spliced with 5 frames of context on each side, 4 hidden layers of 2048 nodes each, 15000 output states. Alignments come from an MPE model; features are plain PLP with no transform applied.
- The decoder still works on the CLG structure; the acoustic-model scoring interface was modified to plug in the DNN model and verified with no problems; efficiency optimization has started.
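The pruning step described above (zero the small weights, keep roughly the largest 30%) can be sketched with NumPy. The threshold-by-magnitude scheme and the layer size are taken from the description; the function name is illustrative:

```python
import numpy as np

def prune_by_magnitude(W, keep_ratio=0.30):
    """Zero all but the largest-magnitude fraction of weights in W."""
    flat = np.abs(W).ravel()
    k = max(1, int(len(flat) * keep_ratio))
    # threshold = magnitude of the k-th largest weight
    threshold = np.partition(flat, -k)[-k]
    mask = np.abs(W) >= threshold
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((2048, 2048))   # one hidden layer, size as in the setup above
Wp, mask = prune_by_magnitude(W, 0.30)
print(mask.mean())  # fraction of weights kept, ~0.30
```

Re-training after applying the mask (and keeping the mask fixed) would answer the question raised above about recovering the lost accuracy.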
Experiments to do:
- Compare learning-rate schedules: exponential decay vs. newbob.
- Evaluate the effect of different features on large data sets.
- Skip the softmax at the last layer and reduce dimensionality to obtain BN features, similar to IBM's BN approach (if this is a linear dim-reduction, it might be worse than the BN...).
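For reference, the two schedules mentioned above can be sketched as follows. The thresholds and halving factor are common newbob defaults, assumed for illustration, not the values used in these experiments:

```python
def exponential_decay(lr0, decay, epoch):
    """Exponential-decay schedule: lr = lr0 * decay^epoch."""
    return lr0 * decay ** epoch

class NewBob:
    """Classic newbob: hold the LR until the validation improvement drops
    below ramp_threshold, then halve it every epoch; stop when improvement
    falls below stop_threshold while ramping."""
    def __init__(self, lr0, ramp_threshold=0.005, stop_threshold=0.001, factor=0.5):
        self.lr = lr0
        self.ramping = False
        self.ramp_threshold = ramp_threshold
        self.stop_threshold = stop_threshold
        self.factor = factor
        self.prev_acc = None

    def step(self, val_acc):
        """Update LR from validation accuracy; return (lr, stop_flag)."""
        if self.prev_acc is not None:
            improvement = val_acc - self.prev_acc
            if self.ramping:
                if improvement < self.stop_threshold:
                    return self.lr, True   # converged: stop training
                self.lr *= self.factor
            elif improvement < self.ramp_threshold:
                self.ramping = True        # enter the halving phase
                self.lr *= self.factor
        self.prev_acc = val_acc
        return self.lr, False
```

Exponential decay is smoother but needs the decay constant tuned per task; newbob adapts to the validation curve automatically, which is why it is a common baseline for DNN acoustic-model training.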
GPU & CPU merge
- just started
Kaldi/HTK merge
- HTK2Kaldi: hold.
- Kaldi2HTK: pdf error problem.
  Kaldi monophone: 30.91%; HDecode: 41.40%.
- Workaround: use the BN feature to train HTK models, bypassing Kaldi training.
Embedded progress
- Status:
- first embedded demo done; a 1000-word vocabulary takes 3.2 MB of memory.
- accuracy test finished. The test data comprises 1000 address names read by 3 speakers in Chongqing dialect, recorded in a car.
- training acoustic model for sphinx. The an4 training process is done, while the test seems problematic.
Test Set | #utt | ERR | RT |
---|---|---|---|
 | 806 | 23.33 | 0.07 |
 | 887 | 13.64 | 0.08 |
 | 876 | 17.58 | 0.07 |
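The RT column is presumably the real-time factor (decoding time divided by audio duration), so 0.07 means decoding runs about 14× faster than real time. A minimal sketch of the computation (names and numbers are illustrative):

```python
def real_time_factor(decode_seconds, audio_seconds):
    """Real-time factor: processing time per second of audio."""
    return decode_seconds / audio_seconds

# e.g. 70 s of CPU time to decode 1000 s of audio
print(real_time_factor(70.0, 1000.0))  # 0.07
```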
- To be done
- finish the large-scale AM training.