2013-08-23
From cslt Wiki
Data sharing
- LM count files still undelivered!
DNN progress
Discriminative DNN
- Running the 1200-3620 NN. Graph generation is done; training is still running, but very slowly.
Sparse DNN
- Iterative sparse sticky training is running.
Tencent exps
DNN Confidence estimation
- Tested on a high-WER test set. The distribution curve is still bizarre: for both correct and incorrect words, there is a high peak around zero.
- Accumulated DNN confidence is under development.
- Generate lattice-based confidence
- Prepare MLP-based confidence integration
GFCC DNN
- GFCC computation is very slow: 100 hours of speech cost 16 hours of CPU time, i.e. a real-time factor of around 0.2. This is intolerable.
- 100-hour GFCC-based DNN, Tencent test results:
No noise added:

1. MFCC 100_1200_1200_1200_1200_3580
map: %WER 23.75 [ 3474 / 14628, 134 ins, 373 del, 2967 sub ]
2044: %WER 21.47 [ 4991 / 23241, 304 ins, 664 del, 4023 sub ]
notetp3: %WER 13.17 [ 244 / 1853, 10 ins, 26 del, 208 sub ]
record1900: %WER 8.10 [ 963 / 11888, 217 ins, 299 del, 447 sub ]
general: %WER 34.41 [ 12943 / 37619, 779 ins, 785 del, 11379 sub ]
online1: %WER 33.02 [ 9388 / 28433, 522 ins, 1465 del, 7401 sub ]
online2: %WER 25.99 [ 15363 / 59101, 873 ins, 2408 del, 12082 sub ]
speedup: %WER 23.52 [ 1236 / 5255, 72 ins, 213 del, 951 sub ]

2. GFCC 100_1200_1200_1200_1200_3625
map: %WER 22.95 [ 3357 / 14628, 109 ins, 471 del, 2777 sub ]
2044: %WER 20.93 [ 4865 / 23241, 387 ins, 748 del, 3730 sub ]
notetp3: %WER 15.43 [ 286 / 1853, 41 ins, 26 del, 219 sub ]
record1900: %WER 7.32 [ 870 / 11888, 107 ins, 266 del, 497 sub ]
general: %WER 31.57 [ 11878 / 37619, 587 ins, 861 del, 10430 sub ]
online1: %WER 31.83 [ 9049 / 28433, 519 ins, 1506 del, 7024 sub ]
online2: %WER 25.20 [ 14894 / 59101, 839 ins, 2434 del, 11621 sub ]
speedup: %WER 22.97 [ 1207 / 5255, 73 ins, 221 del, 913 sub ]

White noise added to the test data (noise level: about 15 dB):

1. MFCC 100_1200_1200_1200_1200_3580
map: %WER 65.24 [ 9544 / 14628, 48 ins, 2841 del, 6655 sub ]
2044: %WER 48.93 [ 11372 / 23241, 176 ins, 2803 del, 8393 sub ]
notetp3: %WER 55.91 [ 1036 / 1853, 9 ins, 476 del, 551 sub ]
record1900: %WER 25.43 [ 3023 / 11888, 27 ins, 1387 del, 1609 sub ]
general: %WER 70.05 [ 26352 / 37619, 141 ins, 5336 del, 20875 sub ]
online1: %WER 50.40 [ 14329 / 28433, 431 ins, 3827 del, 10071 sub ]
online2: %WER 48.45 [ 28632 / 59101, 664 ins, 7930 del, 20038 sub ]
speedup: %WER 64.78 [ 3404 / 5255, 13 ins, 1084 del, 2307 sub ]

2. GFCC 100_1200_1200_1200_1200_3625
map: %WER 62.99 [ 9214 / 14628, 63 ins, 3113 del, 6038 sub ]
2044: %WER 46.34 [ 10769 / 23241, 251 ins, 2897 del, 7621 sub ]
notetp3: %WER 52.46 [ 972 / 1853, 18 ins, 545 del, 409 sub ]
record1900: %WER 26.62 [ 3164 / 11888, 133 ins, 1181 del, 1850 sub ]
general: %WER 66.04 [ 24843 / 37619, 404 ins, 5277 del, 19162 sub ]
online1: %WER 46.61 [ 13254 / 28433, 466 ins, 3725 del, 9063 sub ]
online2: %WER 44.49 [ 26292 / 59101, 813 ins, 7552 del, 17927 sub ]
speedup: %WER 60.38 [ 3173 / 5255, 25 ins, 1061 del, 2087 sub ]
- GFCC is generally better than MFCC, particularly with noise. The exceptions are notetp3 without noise and record1900 with noise, where MFCC is better.
- The impact of noise is severe. De-noising algorithms are needed.
- Try noise-robust training
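The scoring lines above can be checked and compared with a few lines of Python. This is a minimal sketch, not the scoring tool that produced the results: the helper names `parse_wer` and `relative_reduction` are hypothetical, and the only assumption about the format is what the lines themselves show, namely that the error count equals insertions plus deletions plus substitutions and that WER is that count divided by the number of reference words.

```python
import re

# Matches scoring lines of the form shown in the results, e.g.
# "%WER 70.05 [ 26352 / 37619, 141 ins, 5336 del, 20875 sub ]"
WER_LINE = re.compile(
    r"%WER\s+([\d.]+)\s+\[\s*(\d+)\s*/\s*(\d+),\s*"
    r"(\d+)\s+ins,\s*(\d+)\s+del,\s*(\d+)\s+sub\s*\]"
)


def parse_wer(line):
    """Parse one scoring line and recompute the WER from its counts."""
    match = WER_LINE.search(line)
    if match is None:
        raise ValueError("not a %WER line: " + line)
    reported = float(match.group(1))
    errors, words, ins, dels, subs = (int(g) for g in match.groups()[1:])
    if ins + dels + subs != errors:
        raise ValueError("ins + del + sub does not match the error count")
    return {"reported": reported, "wer": 100.0 * errors / words}


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction of new_wer over baseline_wer, in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# The "general" test set with white noise, copied from the results above.
mfcc = parse_wer("general: %WER 70.05 [ 26352 / 37619, 141 ins, 5336 del, 20875 sub ]")
gfcc = parse_wer("general: %WER 66.04 [ 24843 / 37619, 404 ins, 5277 del, 19162 sub ]")
rel = relative_reduction(mfcc["wer"], gfcc["wer"])
print("relative WER reduction on 'general' with noise: %.1f%%" % rel)
```

On this test set the absolute gain of about 4 points corresponds to a relative reduction of roughly 5.7%, which puts the "GFCC is better" conclusion in proportion to the much larger degradation caused by the noise itself.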
Stream decoding
- The server-side interface is done. The embedded-side interface is under development.
To do:
- global CMN initialization.
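One possible reading of the to-do item is sketched below: in stream decoding the per-utterance cepstral mean is unknown at the start, so the running mean is seeded with a global mean estimated offline. This is an illustrative assumption, not the engine's actual design. The class name `OnlineCMN` and the parameter `prior_frames` (how many frames the global mean counts for) are hypothetical.

```python
class OnlineCMN:
    """Streaming cepstral mean normalization seeded with a global mean.

    Assumption: the global mean acts like `prior_frames` virtual frames,
    so early frames are normalized mostly by the global statistics and
    the estimate drifts toward the utterance mean as frames arrive.
    """

    def __init__(self, global_mean, prior_frames=100.0):
        self.count = float(prior_frames)
        # Running sum of feature vectors, pre-loaded with the global mean.
        self.total = [m * self.count for m in global_mean]

    def normalize(self, frame):
        """Update the running mean with one frame and return it mean-subtracted."""
        if len(frame) != len(self.total):
            raise ValueError("feature dimension mismatch")
        self.count += 1.0
        self.total = [t + x for t, x in zip(self.total, frame)]
        mean = [t / self.count for t in self.total]
        return [x - m for x, m in zip(frame, mean)]


# Toy 2-dimensional features; real cepstral vectors would be longer.
cmn = OnlineCMN(global_mean=[1.0, 2.0], prior_frames=3)
first = cmn.normalize([1.0, 2.0])
second = cmn.normalize([5.0, 2.0])
```

A larger `prior_frames` makes the first frames of a stream more stable at the cost of slower adaptation to the current speaker and channel.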
Subgraph integration
- Compression of the subgraph HCLG is done. Integration takes around 1-2 seconds.
- G.fst integration has a problem: after composing G and L, determinization halts.
Embedded progress
- Testing of the GFCC-based engine has just started.