Difference between revisions of "2013-12-13"
From cslt Wiki
Latest revision as of 09:54, 16 December 2013 (Mon)
AM development
Sparse DNN
- Optimal Brain Damage (OBD).
- Online OBD is on hold.
- Investigation of OBD + L1 norm has started.
- Efficient computing
- Using MKL and CSR storage does not help much for sparse matrix computation: at 20% sparsity, the computation costs twice the original time.
- Matrix splitting can improve computing performance for sparse matrices: using BSR (block sparse row) at 1/6 sparsity, the time cost matches the original dense computation.
- We can re-arrange the matrix structure to form zero blocks with some smart approaches, leading to better computing speed.
- There is only a minor difference between MKL computing and direct computing, which means computing accuracy does not impact ASR performance very much. This lends some justification to extremely sparse matrix construction.
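As a rough illustration of the block-sparse idea, here is a minimal SciPy sketch comparing CSR and BSR storage for a block-structured matrix. The layer sizes, the 8x8 block size, and the 1/6 block density are illustrative assumptions, not the report's actual configuration:

```python
import numpy as np
from scipy.sparse import bsr_matrix, csr_matrix

rng = np.random.default_rng(0)

# Hypothetical weight matrix in which ~1/6 of the 8x8 blocks are nonzero.
n, block = 512, 8
mask = rng.random((n // block, n // block)) < 1.0 / 6.0
dense = rng.standard_normal((n, n)) * np.kron(mask, np.ones((block, block)))

x = rng.standard_normal((n, 128))  # a batch of input vectors

# CSR stores individual nonzeros; BSR stores whole dense 8x8 blocks,
# which vectorizes better on block-structured sparsity.
w_csr = csr_matrix(dense)
w_bsr = bsr_matrix(dense, blocksize=(block, block))

y_dense = dense @ x
assert np.allclose(w_csr @ x, y_dense)
assert np.allclose(w_bsr @ x, y_dense)
```

All three storage formats give the same product; the point of BSR is that its inner kernels operate on contiguous 8x8 blocks rather than scattered scalars.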
Efficient DNN training
- Momentum-based training: NN accuracy decreased with a larger momentum (e.g., 0.2), but ASR performance increased.
- Asymmetric window (left 20 frames, right 5): NN accuracy increased by 7%.
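A minimal sketch of the two training tricks above, assuming plain momentum SGD and edge-padded frame splicing. The learning rate, feature dimension, and padding scheme are illustrative assumptions; only momentum 0.2 and the 20/5 window come from the notes:

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.1, mu=0.2):
    """One momentum-SGD update; mu=0.2 matches the value quoted above."""
    v = mu * v - lr * grad
    return w + v, v

def splice(frames, left=20, right=5):
    """Stack an asymmetric context window (20 left, 5 right) per frame.

    frames: (T, D) feature matrix. Edges are padded by repeating the
    first/last frame (a common convention; the report does not specify
    its padding scheme).
    """
    T, D = frames.shape
    padded = np.concatenate([np.repeat(frames[:1], left, axis=0),
                             frames,
                             np.repeat(frames[-1:], right, axis=0)])
    return np.stack([padded[t:t + left + 1 + right].reshape(-1)
                     for t in range(T)])

feats = np.random.randn(100, 40)
spliced = splice(feats)
assert spliced.shape == (100, 26 * 40)  # 20 + 1 + 5 = 26 frames per input
```

With the asymmetric window each input vector covers far more left context than right, which keeps latency low while still exploiting history.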
Engine optimization
- Investigating LOUDS FST.
LM development
NN LM
- A bigger CSLM with a 10240-word output layer. Performance is better than 10 separately trained networks merged together.
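For scale, a single softmax over a 10240-word output shortlist (as opposed to merging 10 separately trained nets) can be sketched as below; the hidden size of 256 is an illustrative assumption:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical CSLM-style output layer: one joint softmax over all
# 10240 shortlist words (dimensions are illustrative).
hidden = np.random.randn(1, 256)
W_out = np.random.randn(256, 10240) * 0.01
probs = softmax(hidden @ W_out)

assert probs.shape == (1, 10240)
assert np.isclose(probs.sum(), 1.0)
```

Training one joint output layer lets all words compete in a single normalized distribution, whereas merging separately trained sub-networks only approximates that normalization.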
Embedded development
- Embedded stream mode is in progress.
Speech QA
- Class-based QA LM using data from Q db is done.
- Extracted some music-related documents from Baidu know-how.
- Text-based QA: 121/199 correct (with answers); 58 with no answer (24 with no attribute in the db, 27 with no record); 20 with incorrect answers (5 had no answer in the db and so got incorrect answers from the web, 8 had no record and so got incorrect answers from the web, 3 db errors).
- Speech-based QA: WER=8.70%, SEE=32.0%. Almost all English queries are wrong; removing English, SEE=27.1%.
- SP-QA accuracy is 45.14% over all inputs (18*199).
- Will try to recover some ASR errors using QA, e.g., pronunciation correction.