2013-04-19
Data sharing
- AM/lexicon/LM are shared.
- LM count files are still being transferred.
DNN progress
400 hour DNN training
Test Set | Tencent Baseline | bMMI  | fMMI  | BN    | Hybrid
1900     | 8.4              | 7.65  | 7.35  | 6.57  | -
2044     | 22.4             | 24.44 | 24.03 | 21.77 | -
online1  | 35.6             | 34.66 | 34.33 | 31.44 | -
online2  | 29.6             | 27.23 | 26.80 | 24.10 | -
map      | 24.5             | 27.54 | 27.69 | 23.79 | -
notepad  | 16               | 19.81 | 21.75 | 15.81 | -
general  | 36               | 38.52 | 38.90 | 33.61 | -
speedup  | 26.8             | 27.88 | 26.81 | 22.82 | -
- Tencent baseline: 700h online data + 700h 863 data, HLDA+MPE, 88k lexicon.
- Our results: 400-hour AM, 88k LM, ML+bMMI training.
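
For reference, a minimal sketch (assuming the table figures are error rates in percent, and using the column labels from the table above) that turns the table into relative changes against the Tencent baseline:

```python
# Relative change of each system vs. the Tencent baseline, assuming the
# table figures are error rates in percent (positive = better than baseline).
# Column assignment follows the table above; the Hybrid column has no figures yet.
results = {
    # test set: (Tencent baseline, bMMI, fMMI, BN)
    "1900":    (8.4,  7.65,  7.35,  6.57),
    "2044":    (22.4, 24.44, 24.03, 21.77),
    "online1": (35.6, 34.66, 34.33, 31.44),
    "online2": (29.6, 27.23, 26.80, 24.10),
    "map":     (24.5, 27.54, 27.69, 23.79),
    "notepad": (16.0, 19.81, 21.75, 15.81),
    "general": (36.0, 38.52, 38.90, 33.61),
    "speedup": (26.8, 27.88, 26.81, 22.82),
}

for test_set, (baseline, bmmi, fmmi, bn) in results.items():
    rels = {name: 100.0 * (baseline - err) / baseline
            for name, err in (("bMMI", bmmi), ("fMMI", fmmi), ("BN", bn))}
    print(test_set, {k: f"{v:+.1f}%" for k, v in rels.items()})
```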
Tencent test result
- AM: 70h training data (2 days, 15 machines, 10 threads)
- LM: 88k LM
- Test case: general
- gmm-bmmi: 38.7%
- dnn-1: 28% (11-frame window, phone-based tree)
- dnn-2: 34% (9-frame window, state-based tree)
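
The frame windows above refer to splicing neighbouring acoustic frames into one DNN input vector; a minimal numpy sketch of the idea (illustrative only, with made-up feature dimensions, not the actual training code):

```python
import numpy as np

def splice_frames(feats, context=5):
    """Stack each frame with +/-context neighbours, so an 11-frame window
    (context=5) turns a (T, D) feature matrix into (T, 11*D).
    Edge frames are padded by repeating the first/last frame."""
    T, D = feats.shape
    padded = np.concatenate([np.repeat(feats[:1], context, axis=0),
                             feats,
                             np.repeat(feats[-1:], context, axis=0)])
    windows = [padded[i:i + T] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)

# Example: 100 frames of 40-dim features.
feats = np.random.randn(100, 40)
print(splice_frames(feats, context=5).shape)   # 11-frame window -> (100, 440)
print(splice_frames(feats, context=4).shape)   # 9-frame window  -> (100, 360)
```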
GPU & CPU merge
- Investigate the possibility of merging the GPU and CPU code, and try to find an easier way (1 week); a rough dispatch sketch follows below.
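
One way to keep a single code base is to hide the matrix kernels behind a device-agnostic interface and select the backend at run time; a rough Python sketch of that dispatch pattern (illustrative only, using numpy/cupy as stand-ins, not the project's actual CPU/GPU code):

```python
import numpy as np

try:
    import cupy as cp          # optional GPU backend; cupy mirrors the numpy API
    _HAVE_GPU = True
except ImportError:
    _HAVE_GPU = False

def backend(use_gpu):
    """Pick one array module; code above this call stays device-agnostic."""
    return cp if (use_gpu and _HAVE_GPU) else np

def affine_forward(x, W, b, use_gpu=False):
    """y = xW + b written once, executed on whichever backend is selected."""
    xp = backend(use_gpu)
    x, W, b = (xp.asarray(a) for a in (x, W, b))
    return xp.dot(x, W) + b

# Usage: the caller never branches on the device.
x, W, b = np.ones((4, 3)), np.ones((3, 2)), np.zeros(2)
print(affine_forward(x, W, b, use_gpu=False))
```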
L-1 sparse initial training
- Starting to investigate.
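
One common way to obtain sparse weights is an L1 penalty applied as a soft-thresholding (proximal) step after each gradient update; a toy numpy sketch of that idea (illustrative only, with an exaggerated penalty, not the actual recipe):

```python
import numpy as np

def l1_prox_step(W, grad, lr=0.01, l1=0.1):
    """One SGD step followed by the L1 proximal (soft-threshold) step.
    Entries whose magnitude drops below lr*l1 become exactly zero,
    which is what makes the weight matrix sparse."""
    W = W - lr * grad                                        # plain gradient step
    return np.sign(W) * np.maximum(np.abs(W) - lr * l1, 0.0)  # shrink toward zero

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(512, 440))                   # toy "hidden layer" weights
for _ in range(20):
    grad = rng.normal(scale=0.01, size=W.shape)                # stand-in for a real gradient
    W = l1_prox_step(W, grad)
print("fraction of exact zeros:", float(np.mean(W == 0.0)))
```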
Kaldi/HTK merge
- HTK2Kaldi: the tool shipped with Kaldi does not work.
- Kaldi2HTK: implementation done; testing still pending.
Embedded progress
- Large performance (speed) degradation on the embedded platform (roughly 1/60).
- Planning for sparse DNN.
- QA LM training still fails; Mengyuan needs to work more on this.