2013-04-26
From cslt Wiki
Latest revision as of 07:14, 26 April 2013 (Friday)
Data sharing
- LM count files are still being transferred.
DNN progress
400-hour DNN training
Test set | Tencent baseline | bMMI | fMMI | BN (with fMMI) | Hybrid (DNN/HMM) |
---|---|---|---|---|---|
1900 | 8.4 | 7.65 | 7.35 | 6.57 | 7.27 |
2044 | 22.4 | 24.44 | 24.03 | 21.77 | 20.24 |
online1 | 35.6 | 34.66 | 34.33 | 31.44 | 30.53 |
online2 | 29.6 | 27.23 | 26.80 | 24.10 | 23.89 |
map | 24.5 | 27.54 | 27.69 | 23.79 | 22.46 |
notepad | 16 | 19.81 | 21.75 | 15.81 | 12.74 |
general | 36 | 38.52 | 38.90 | 33.61 | 31.55 |
speedup | 26.8 | 27.88 | 26.81 | 22.82 | 22.00 |
- Notes:
- Tencent baseline: 700 h of online data + 700 h of 863 data, HLDA+MPE, 88k lexicon.
- Our results: 400-hour AM, 88k LM, ML+bMMI training.
- CSLT DNN structure: 300*[1200*1200*1200*40*1200]*4850.
- CSLT feature: MFCC+delta MFCC.
- To be done:
- Compare with the traditional structure 300*[1200*1200*1200*1200*1200]*4850.
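The layer-size strings above can be read as a fully connected network running from a 300-dimensional input, through the bracketed hidden layers, to 4850 output units. Under that reading (an assumption; the report does not define the notation), the short sketch below counts weights plus biases, which makes the saving from the 40-unit bottleneck layer concrete. The function names are hypothetical.

```python
def parse_structure(spec: str) -> list[int]:
    """Turn '300*[1200*40*1200]*4850' into [300, 1200, 40, 1200, 4850].

    Brackets are treated as purely visual grouping of the hidden layers (an assumption).
    """
    cleaned = spec.replace("[", "").replace("]", "")
    return [int(tok) for tok in cleaned.split("*")]


def count_params(sizes: list[int]) -> int:
    """Count weights plus biases of a fully connected feed-forward net with these layer sizes."""
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total


if __name__ == "__main__":
    bottleneck = parse_structure("300*[1200*1200*1200*40*1200]*4850")
    traditional = parse_structure("300*[1200*1200*1200*1200*1200]*4850")
    print(count_params(bottleneck))   # 9165690
    print(count_params(traditional))  # 11950850
```

Under this reading, the 40-unit layer removes about 2.8M parameters (roughly 23%) compared with the traditional structure. The 1200 x 4850 output layer alone accounts for most of the total in both cases.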
Tencent test result
- AM: 70h training data
- LM: 88k LM
- Test case: general
- param=700k
Feature | GMM | GMM-bMMI | DNN | DNN-MMI | DNN structure |
---|---|---|---|---|---|
PLP(-5,+5) [Eryu] | 47 | 38.4 | 26.5 | 23.8 | 300*[1200*1200*1200*1200]*1700 |
PLP+LDA+MLLT(-5,+5) [Jingbo] | 47 | - | 34 | - | 300*[1007*1007*1007*1007]*3xxx |
- To be done:
- CSLT: reproduce phone-clustered NN (Eryu's results)
- CSLT: investigate performance with different numbers of epochs.
- Tencent: feature comparison.
- Tencent: compare FBank with PLP, with and without LDA.
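The (-5,+5) in the feature column denotes a context of 5 frames on each side, so each network input is 11 spliced frames. A minimal pure-Python sketch of frame splicing follows; the function name and the edge-padding rule (repeat the first/last frame) are assumptions, not taken from either team's toolkit.

```python
def splice_frames(frames: list[list[float]], left: int = 5, right: int = 5) -> list[list[float]]:
    """Concatenate each frame with its `left` preceding and `right` following frames.

    Out-of-range indices are clamped to the first/last frame (edge replication),
    an assumed padding convention.
    """
    num = len(frames)
    spliced = []
    for t in range(num):
        window = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), num - 1)  # clamp at utterance edges
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


if __name__ == "__main__":
    # Three 1-dimensional frames with a (-1,+1) context.
    print(splice_frames([[1.0], [2.0], [3.0]], left=1, right=1))
    # [[1.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 3.0]]
```

With (-5,+5), the input dimension is 11 times the per-frame dimension; for example, a 13-dimensional feature vector would become 143-dimensional.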
GPU & CPU merge
- Investigate the possibility of merging the GPU and CPU code.
- Decision: the CUDA code will be merged into the CPU code.
L-1 sparse initial training
- Initial trial
- L-1 = 1e-5, applied from the 6th iteration: training converged after 3 more iterations. Performance is generally worse than with L-1 = 0, except on one test suite.
- L-1 = 1e-6: the results were unchanged, suggesting that 1e-6 is too small to be effective.
- L-1 = 1e-4, applied from the first iteration: training crashed. Needs further investigation.
- To be done
- Investigate other L-1 values, training from scratch.
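The report does not say which L-1 update rule was used. One common choice is an SGD step followed by soft-thresholding (a proximal-gradient update), and the sketch below assumes that rule; the function names are hypothetical. It illustrates why a coefficient as small as 1e-6 can leave the model effectively unchanged: the per-step threshold is lr x L-1, which is tiny unless the learning rate is large.

```python
def l1_sgd_step(weights: list[float], grads: list[float], lr: float, l1: float) -> list[float]:
    """One SGD step followed by L1 soft-thresholding (proximal-gradient update).

    Weights whose magnitude is below lr * l1 after the gradient step become exactly zero,
    which is what produces sparsity.
    """
    threshold = lr * l1
    updated = []
    for w, g in zip(weights, grads):
        w = w - lr * g  # plain gradient step on the unregularized loss
        if w > threshold:
            updated.append(w - threshold)
        elif w < -threshold:
            updated.append(w + threshold)
        else:
            updated.append(0.0)  # shrunk to exactly zero
    return updated


def sparsity(weights: list[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    w = l1_sgd_step([0.5, 0.0005, -0.002], [0.0, 0.0, 0.0], lr=1.0, l1=0.001)
    print(w, sparsity(w))  # the middle weight is zeroed; the others shrink by 0.001
```

Applying the penalty only after several unregularized iterations, as in the 1e-5 trial, corresponds to calling this update with l1 = 0 for the first iterations.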
Kaldi/HTK merge
- HTK2Kaldi: on hold.
- Kaldi2HTK: implementation done. Fixed a bug in which gConst was computed incorrectly. The current HDecode result is 14.9%, versus 11% for the Tencent model and 7% for the Kaldi decoder.
- The gap is possibly caused by the SP (short pause) model, given the complicated structure of silence in Kaldi.
- To be done
- Try other possible SP models, e.g., duplicate the silence model and add a jump arc from the start to the end.
- Try borrowing the SP model from the HTK model.
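For reference, HTK defines gConst for a diagonal-covariance Gaussian as log((2π)^n |Σ|) = n log 2π + Σ log σ_i², and evaluates the log-likelihood as -0.5 (gConst + Σ (x_i - μ_i)² / σ_i²). The sketch below computes it during conversion; the function names are hypothetical. We do not know which mistake the fixed bug involved, but two easy ones are using inverse variances where variances are expected (Kaldi's DiagGmm stores inverse variances) and copying Kaldi's per-component gconsts, which also fold in the mixture weight and mean terms.

```python
import math


def htk_gconst(variances: list[float]) -> float:
    """gConst of a diagonal Gaussian as defined by HTK: log((2*pi)^n * |Sigma|)."""
    n = len(variances)
    return n * math.log(2.0 * math.pi) + sum(math.log(v) for v in variances)


def diag_gauss_loglike(x: list[float], mean: list[float], variances: list[float], gconst: float) -> float:
    """Log-density of x under a diagonal Gaussian, using a precomputed gConst."""
    mahalanobis = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))
    return -0.5 * (gconst + mahalanobis)


def variances_from_inverse(inv_vars: list[float]) -> list[float]:
    """Convert inverse variances (the form stored by Kaldi's DiagGmm) back to variances."""
    return [1.0 / iv for iv in inv_vars]


if __name__ == "__main__":
    var = variances_from_inverse([0.5, 2.0])  # -> [2.0, 0.5]
    g = htk_gconst(var)                       # log terms of 2.0 and 0.5 cancel
    print(g, diag_gauss_loglike([0.0, 0.0], [0.0, 0.0], var, g))
```

A quick sanity check for a converted model is that exp(diag_gauss_loglike(...)) matches the Gaussian density computed directly from the Kaldi parameters.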
Embedded progress
- Status:
- QA LM training: done.
- PocketSphinx migration done, using the PocketSphinx default Chinese model. After porting to the smartphone, tests show that decoding with an LM of 90k words is very slow (RT = 7.0, i.e., seven times slower than real time).
- To be done
- Replace the LM with a JSGF grammar of 1,000 words and finish the initial test.
- Need to train a new AM.
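A 1,000-word JSGF grammar could be generated from a word list rather than written by hand. The sketch below assumes the simplest design, a single public rule matching any one word or phrase from the list; the report does not specify the grammar structure. The grammar and rule names are hypothetical, and every token must already be in the PocketSphinx pronunciation dictionary and contain no JSGF special characters.

```python
def build_jsgf(words: list[str], grammar_name: str = "commands", rule_name: str = "command") -> str:
    """Build a minimal JSGF grammar whose public rule matches exactly one listed word/phrase."""
    # Drop blanks and duplicates while keeping the original order.
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    if not unique:
        raise ValueError("need at least one word or phrase")
    alternatives = "\n    | ".join(unique)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


if __name__ == "__main__":
    # Hypothetical vocabulary; a real one would hold the ~1,000 target words.
    print(build_jsgf(["call", "open map", "call"]))
```

The resulting file could then be passed to PocketSphinx in place of the LM (for example via its `-jsgf` option). A richer grammar with slots would constrain the search further but needs hand-designed rules.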