2013-05-17
From cslt Wiki
Revision as of 04:52, 17 May 2013
Data sharing
- LM count files still undelivered!
DNN progress
Experiments
- setups for mfcc/plp
| Test Set | fMMI  | s1/tri1 | s2/tri1 | s3/tri1 | s4/tri1 | s2/tri2 | s4/tri2 | cpu-based (like s4/tri1) |
|----------|-------|---------|---------|---------|---------|---------|---------|--------------------------|
| map      | 28.58 | 25.38   | 24.47   | 26.16   | 26.20   | 22.85   | 24.27   | 26.45                    |
| 2044     | 24.79 | 23.58   | 22.82   | 23.84   | 24.13   | 21.45   | 22.76   | 24.66                    |
| notetp3  | 21.64 | 16.08   | 14.89   | 15.92   | 15.97   | 14.89   | 14.79   | 16.14                    |
| 1900     | 8.19  | 8.55    | 8.43    | 8.66    | 8.90    | 7.30    | 7.91    | 8.23                     |
| general  | 39.63 | 36.18   | 34.79   | 35.88   | 35.90   | 33.06   | 33.79   | 38.02                    |
| online1  | 35.19 | 34.68   | 33.90   | 33.45   | 33.38   | 32.93   | 32.43   | 33.00                    |
| online2  | 28.30 | 27.27   | 26.61   | 26.26   | 26.36   | 25.94   | 25.69   | 26.63                    |
| speedup  | 28.45 | 24.97   | 24.40   | 24.55   | 25.42   | 23.04   | 23.67   | 27.17                    |
- conclusion
- the GPU approach is comparable with the CPU approach (see s4/tri1 & CPU results). The former works slightly better in most cases.
- the fine training leads to significantly better performance than the rough training (see s2/tri1 vs s2/tri2 & s4/tri1 vs s4/tri2)
- the delta features do not help, and actually harm the performance (see s2/tri1 vs s3/tri1 & s4/tri1)
- the identical-dimension LDA helps the performance (see s1/tri1 vs s2/tri1)
The best system is s2/tri2: no delta features, with a linear (identical-dimension) LDA applied.
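As a sketch of what the identical-dimension LDA amounts to (a full-rank, decorrelating linear transform estimated from class-labeled frames, with no dimensionality reduction), the following NumPy code is a minimal illustration; the function name and toy inputs are ours, not taken from the experiments above.

```python
import numpy as np

def identical_dim_lda(X, y):
    """Estimate a full-rank (identical-dimension) LDA transform.

    X: (n, d) feature vectors (e.g. spliced MFCCs);
    y: (n,) integer class labels (e.g. tied-state ids).
    Returns a (d, d) matrix A; apply as A @ x.  Keeping all d rows
    makes this a decorrelating linear transform, not a reduction.
    """
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Whiten with Sw, then diagonalize Sb in the whitened space.
    w_vals, w_vecs = np.linalg.eigh(Sw)
    Wh = w_vecs @ np.diag(1.0 / np.sqrt(w_vals)) @ w_vecs.T
    b_vals, b_vecs = np.linalg.eigh(Wh @ Sb @ Wh)
    order = np.argsort(b_vals)[::-1]  # most discriminative first
    return b_vecs[:, order].T @ Wh
```

In the transformed space the within-class scatter becomes the identity, which is why such a transform can help even without reducing the dimension.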
- to be done
- experiment with the Tencent features
- migrate the bottleneck structure to the 100k task with s2/tri2.
Tencent exps
- Manually zeroed the smaller weights of the NN weight matrices W; with only the largest ~30% of the weights retained, no noticeable degradation in system performance was observed.
- Modified Kaldi's GPU interface to follow the HTK model structure and HTK alignments; verified it works, and large-scale training (1000 hours) has started. Network: input spliced with ±5 frames of context, 4 hidden layers of 2048 nodes each, 15000 output states. Alignments come from the MPE model; features are PLP, with no transform applied.
- The decoder still runs on the CLG structure; the acoustic-score interface was modified to plug in the DNN model and verified to work; efficiency optimization has started.
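The weight-zeroing experiment above is plain magnitude pruning: keep the largest weights by absolute value and zero the rest. A minimal NumPy sketch (our own illustration; the `prune_by_magnitude` helper is hypothetical, not part of the toolkit used here):

```python
import numpy as np

def prune_by_magnitude(W, keep_ratio=0.3):
    """Zero all but the largest-magnitude entries of W.

    keep_ratio=0.3 mirrors the report above: keeping ~30% of the
    weights caused no noticeable loss in system performance.
    Returns the pruned matrix and the boolean keep-mask.
    """
    flat = np.abs(W).ravel()
    k = max(1, int(len(flat) * keep_ratio))
    # Threshold = magnitude of the k-th largest weight.
    threshold = np.partition(flat, -k)[-k]
    mask = np.abs(W) >= threshold
    return W * mask, mask

# Toy example on a random 4x5 weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 5))
Wp, mask = prune_by_magnitude(W, keep_ratio=0.3)
print(mask.mean())  # → 0.3 (6 of 20 weights kept)
```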
Experiments to do:
- Compare learning-rate schedules: exponential decay vs. newbob.
- Evaluate how different features perform on large data.
- Skip the softmax at the final layer and reduce the dimensionality to obtain BN (bottleneck) features, similar to IBM's BN approach.
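For reference, the two learning-rate schedules mentioned above can be sketched as follows. This is our own illustration: the class and parameter names are ours, and the halving/stopping thresholds are typical defaults, not values reported here.

```python
def exponential_decay(lr0, decay, epoch):
    """Exponential schedule: shrink the rate by a fixed factor per epoch."""
    return lr0 * decay ** epoch

class Newbob:
    """newbob: hold the learning rate until held-out improvement
    stalls, then halve it every epoch; stop once the improvement
    during halving drops below a second threshold."""

    def __init__(self, lr, halving_impr=0.5, stop_impr=0.1):
        self.lr = lr
        self.halving_impr = halving_impr  # start halving below this gain
        self.stop_impr = stop_impr        # stop training below this gain
        self.halving = False
        self.prev = None

    def step(self, held_out_acc):
        """Feed this epoch's held-out accuracy; returns (lr, stop)."""
        if self.prev is not None:
            impr = held_out_acc - self.prev
            if self.halving and impr < self.stop_impr:
                return self.lr, True
            if self.halving or impr < self.halving_impr:
                self.halving = True
                self.lr *= 0.5
        self.prev = held_out_acc
        return self.lr, False
```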
GPU & CPU merge
- just started
Kaldi/HTK merge
- HTK2Kaldi: hold.
- Kaldi2HTK: pdf error problem.
- workaround: use the BN features to train the HTK models, so no Kaldi training is needed.
Embedded progress
- Status:
- first embedded demo done; a 1000-word vocabulary takes 3.2 MB of memory.
- accuracy test finished
- training an acoustic model for sphinx: the an4 training process is done, but the test seems problematic.
- To be done
- finish the 400-hour AM training