2013-08-23

Data sharing

LM count files still undelivered!

DNN progress

Discriminative DNN

Running the 1200-3620 NN: graph generation is done, but training is still running very slowly.

Sparse DNN

Iterative sparse sticky training runs.

Tencent exps

DNN Confidence estimation

Tested on a high-WER test set. The distribution curve is still bizarre: for both correct and incorrect words, there is a high peak around zero.
Accumulated DNN confidence is under development.
Generate lattice-based confidence
Prepare MLP-based confidence integration

GFCC DNN

GFCC computation is very slow: 100 hours of speech costs 16 hours of CPU time, so the real-time factor is around 0.2 (16/100 = 0.16). This is intolerable.
100-hour GFCC-based DNN, Tencent test results:


No noise-added:

1,MFCC 100_1200_1200_1200_1200_3580
       map: %WER 23.75 [ 3474 / 14628, 134 ins, 373 del, 2967 sub ]
       2044: %WER 21.47 [ 4991 / 23241, 304 ins, 664 del, 4023 sub ]
       notetp3: %WER 13.17 [ 244 / 1853, 10 ins, 26 del, 208 sub ]
       record1900: %WER 8.10 [ 963 / 11888, 217 ins, 299 del, 447 sub ]
       general: %WER 34.41 [ 12943 / 37619, 779 ins, 785 del, 11379 sub ]
       online1: %WER 33.02 [ 9388 / 28433, 522 ins, 1465 del, 7401 sub ]
       online2: %WER 25.99 [ 15363 / 59101, 873 ins, 2408 del, 12082 sub ]
       speedup: %WER 23.52 [ 1236 / 5255, 72 ins, 213 del, 951 sub ]
       ----
2,GFCC 100_1200_1200_1200_1200_3625
       map: %WER 22.95 [ 3357 / 14628, 109 ins, 471 del, 2777 sub ]
       2044: %WER 20.93 [ 4865 / 23241, 387 ins, 748 del, 3730 sub ]
       notetp3: %WER 15.43 [ 286 / 1853, 41 ins, 26 del, 219 sub ]
       record1900: %WER 7.32 [ 870 / 11888, 107 ins, 266 del, 497 sub ]
       general: %WER 31.57 [ 11878 / 37619, 587 ins, 861 del, 10430 sub ]
       online1: %WER 31.83 [ 9049 / 28433, 519 ins, 1506 del, 7024 sub ]
       online2: %WER 25.20 [ 14894 / 59101, 839 ins, 2434 del, 11621 sub ]
       speedup: %WER 22.97 [ 1207 / 5255, 73 ins, 221 del, 913 sub ]
       ----

White noise added into the test data:

1, noise level: about 15 dB
  1) MFCC 100_1200_1200_1200_1200_3580
    map: %WER 65.24 [ 9544 / 14628, 48 ins, 2841 del, 6655 sub ]
    2044: %WER 48.93 [ 11372 / 23241, 176 ins, 2803 del, 8393 sub ]
    notetp3: %WER 55.91 [ 1036 / 1853, 9 ins, 476 del, 551 sub ]
    record1900: %WER 25.43 [ 3023 / 11888, 27 ins, 1387 del, 1609 sub ]
    general: %WER 70.05 [ 26352 / 37619, 141 ins, 5336 del, 20875 sub ]
    online1: %WER 50.40 [ 14329 / 28433, 431 ins, 3827 del, 10071 sub ]
    online2: %WER 48.45 [ 28632 / 59101, 664 ins, 7930 del, 20038 sub ]
    speedup: %WER 64.78 [ 3404 / 5255, 13 ins, 1084 del, 2307 sub ]
    ----
  2) GFCC 100_1200_1200_1200_1200_3625
    map: %WER 62.99 [ 9214 / 14628, 63 ins, 3113 del, 6038 sub ]
    2044: %WER 46.34 [ 10769 / 23241, 251 ins, 2897 del, 7621 sub ]
    notetp3: %WER 52.46 [ 972 / 1853, 18 ins, 545 del, 409 sub ]
    record1900: %WER 26.62 [ 3164 / 11888, 133 ins, 1181 del, 1850 sub ]
    general: %WER 66.04 [ 24843 / 37619, 404 ins, 5277 del, 19162 sub ]
    online1: %WER 46.61 [ 13254 / 28433, 466 ins, 3725 del, 9063 sub ]
    online2: %WER 44.49 [ 26292 / 59101, 813 ins, 7552 del, 17927 sub ]
    speedup: %WER 60.38 [ 3173 / 5255, 25 ins, 1061 del, 2087 sub ]
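
How the white noise was mixed into the test data is not recorded in these notes. As a minimal sketch, assuming Gaussian white noise scaled to a target signal-to-noise ratio, the noisy condition could be produced like this. The function names, the synthetic tone, and the fixed seed are all hypothetical.

```python
import math
import random


def add_white_noise(samples, snr_db, seed=0):
    """Return a copy of `samples` with Gaussian white noise at about `snr_db` dB SNR."""
    if not samples:
        raise ValueError("samples must not be empty")
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def measured_snr_db(clean, noisy):
    """Measure the SNR actually obtained, from the clean and noisy signals."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical stand-in for real speech: one second of a 440 Hz tone at 16 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_white_noise(clean, snr_db=15.0)
print(round(measured_snr_db(clean, noisy), 1))
```

Measuring the SNR after mixing is a cheap sanity check that the "about 15 dB" condition was really hit.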

GFCC is generally better than MFCC, particularly with noise (exceptions: notetp3 without noise and record1900 with noise).
Noise impact is very high; de-noising algorithms are needed.
Try noise-robust training.
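
The MFCC/GFCC comparison can be checked per test set from the error counts in the result lines. The sketch below parses a few of the no-noise lines copied from above and computes the relative WER change. The helper names are hypothetical, and the regular expression assumes the exact `%WER` line layout shown in these notes.

```python
import re

# Matches lines such as: "map: %WER 23.75 [ 3474 / 14628, 134 ins, ..."
WER_LINE = re.compile(r"(\S+): %WER ([\d.]+) \[ (\d+) / (\d+),")


def parse_wer(block):
    """Map test-set name -> (error count, reference word count)."""
    result = {}
    for line in block.strip().splitlines():
        m = WER_LINE.search(line)
        if m:
            result[m.group(1)] = (int(m.group(3)), int(m.group(4)))
    return result


def relative_change(baseline, candidate):
    """Relative WER change in percent per test set (negative = candidate is better)."""
    out = {}
    for name, (base_err, base_words) in baseline.items():
        cand_err, cand_words = candidate[name]
        if base_words != cand_words:
            raise ValueError("reference word counts differ for " + name)
        # Same reference, so the WER ratio equals the error-count ratio.
        out[name] = 100.0 * (cand_err - base_err) / base_err
    return out


mfcc = parse_wer("""
map: %WER 23.75 [ 3474 / 14628, 134 ins, 373 del, 2967 sub ]
notetp3: %WER 13.17 [ 244 / 1853, 10 ins, 26 del, 208 sub ]
general: %WER 34.41 [ 12943 / 37619, 779 ins, 785 del, 11379 sub ]
""")
gfcc = parse_wer("""
map: %WER 22.95 [ 3357 / 14628, 109 ins, 471 del, 2777 sub ]
notetp3: %WER 15.43 [ 286 / 1853, 41 ins, 26 del, 219 sub ]
general: %WER 31.57 [ 11878 / 37619, 587 ins, 861 del, 10430 sub ]
""")

rel = relative_change(mfcc, gfcc)
for name in sorted(rel):
    print(name, round(rel[name], 2))
```

A positive value marks a test set where GFCC is worse than MFCC, which makes exceptions to the general trend easy to spot.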

Stream decoding

The server-side interface is done. The embedded-side interface is under development.

To do:

global CMN initialization.
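
The notes do not say how the global CMN initialization will work. One common approach, sketched here purely as an assumption, is to seed the running mean of a streaming cepstral mean normalizer with a global mean estimated offline, so that the first frames of an utterance are normalized sensibly before enough local statistics have accumulated. The class name `OnlineCMN` and the `prior_frames` weight are hypothetical.

```python
class OnlineCMN:
    """Streaming cepstral mean normalization seeded with a global mean (sketch)."""

    def __init__(self, global_mean, prior_frames=100):
        # Treat the global mean as if it had been observed for `prior_frames` frames.
        self.total = [m * prior_frames for m in global_mean]
        self.count = float(prior_frames)

    def normalize(self, frame):
        """Update the running mean with `frame`, then subtract it from the frame."""
        self.count += 1.0
        self.total = [s + x for s, x in zip(self.total, frame)]
        mean = [s / self.count for s in self.total]
        return [x - m for x, m in zip(frame, mean)]


# Hypothetical 2-dimensional features and global mean.
cmn = OnlineCMN(global_mean=[1.0, 2.0], prior_frames=100)
first = cmn.normalize([2.0, 2.0])
print(first)
```

With `prior_frames=0` the first frame would be normalized by itself and collapse to zeros, which is the start-up problem a global initialization is meant to avoid.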

Subgraph integration

Compressing the subgraph HCLG is done. Integration takes around 1-2 seconds.
G.fst integration has run into a problem: after G+L, determinization halts.

Embedded progress

The GFCC-based engine test has just started.
