2014-06-20
Resource Building
- Release management combing done.
Leftover questions
- Asymmetric window: great improvement on the training set (WER 34% to 24%); however, the improvement is lost on the test set (a window sketch follows this list).
- Multi-GPU training: error encountered.
- Multilingual training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
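The asymmetric-window question above concerns the analysis window used in feature extraction. Below is a minimal numpy sketch of one plausible asymmetric window, built from a slow Hamming-style rise and a faster decay; the 300/100 split, the 400-sample frame, and the whole construction are illustrative assumptions, not the configuration the experiments actually used.

<pre>
import numpy as np

def asymmetric_window(n_rise, n_fall):
    # Hypothetical asymmetric analysis window: a slow Hamming-style
    # rise over n_rise samples followed by a faster fall over n_fall
    # samples, so recent samples get relatively more weight than in
    # a symmetric window.
    rise = np.hamming(2 * n_rise)[:n_rise]   # rising half
    fall = np.hamming(2 * n_fall)[n_fall:]   # falling half
    return np.concatenate([rise, fall])

# Example: a 400-sample frame (25 ms at 16 kHz) with a 300/100 split.
win = asymmetric_window(300, 100)
frame = np.random.randn(400) * win           # apply before FFT / filterbank
</pre>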
AM development
Sparse DNN
- GA-based block sparsity (+++++++)
- Paper revision done.
Noise training
- Paper writing will start this week.
GFbank
- Now running Sinovoice 8k 1400 + 100 mixture training.
- GFbank: 14 xEnt iterations completed:
                          Huawei 3rd batch   BJ mobile 8k   English data
FBank non-stream (MPE4)   20.44%             22.28%         24.36%
GFbank stream (MPE4)      -                  -              -
GFbank non-stream (MPE)   -                  -              -
Multilingual ASR
                          HW 30h (HW TR LM not involved)   HW 30h (HW TR LM involved)
FBank non-stream (MPE4)   22.23                            21.38
FBank stream (monolang)   21.64                            20.72
GFbank stream (MPE4)      -                                -
GFbank non-stream (MPE)   -                                -
Denoising & Farfield ASR
- Replay may cause a time delay; this should be solved by cross-correlation detection (see the sketch after this list).
- Single-layer network with more hidden units: failed.
- The problem seems to reside in the large magnitude of the output data.
- New recordings (one very near the mic and one far-field at 2 meters).
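A minimal numpy sketch of the cross-correlation detection idea: estimate the replay delay from the peak of the cross-correlation between the reference signal and the replayed recording. The signal lengths and the 480-sample delay are made-up example values.

<pre>
import numpy as np

def estimate_delay(ref, replay):
    # Estimate the offset (in samples) of `replay` relative to `ref`
    # from the peak of their full cross-correlation; lag 0 sits at
    # index len(ref) - 1 of the correlation output.
    xcorr = np.correlate(replay, ref, mode="full")
    return int(np.argmax(np.abs(xcorr))) - (len(ref) - 1)

# Example: a signal replayed with a 480-sample delay (60 ms at 8 kHz).
ref = np.random.randn(8000)
replay = np.concatenate([np.zeros(480), ref])[:8000]
print(estimate_delay(ref, replay))  # -> 480
</pre>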
Original model:

xEnt model:
         middle-field   far-field
dev93    74.79          96.68
eval92   63.42          94.75

MPE model, MPE adaptation:
         middle-field   far-field
dev93    63.71          94.84
eval92   52.67          90.45
VAD
- DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74).
- 100 x n (n <= 3) hidden units with 2 output units seem sufficient for VAD (see the forward-pass sketch below).
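A minimal numpy sketch of the small VAD network described above: up to three 100-unit sigmoid hidden layers and a 2-unit softmax output (non-speech vs. speech). The 40-dim input and the random weights are placeholders; the actual features and trained parameters are not specified in the note.

<pre>
import numpy as np

def vad_dnn_forward(x, weights, biases):
    # Forward pass of a tiny VAD DNN: sigmoid hidden layers followed
    # by a 2-unit softmax giving [P(non-speech), P(speech)].
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))   # sigmoid hidden layer
    logits = h @ weights[-1] + biases[-1]
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Example: 2 hidden layers of 100 units on a hypothetical 40-dim frame.
dims = [40, 100, 100, 2]
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(dims, dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]
print(vad_dnn_forward(rng.normal(size=40), weights, biases))
</pre>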
Scoring
- Collect more data with human scoring to train discriminative models
Embedded decoder
FSA size:

threshold   1e-5   1e-6   1e-7          1e-8   1e-9
5k          480k   5.5M   44M           -      1.1G
10k         731k   7M     61M
20k         1.2M   8.8M   78M (301M)
600 x 4 + 800 AM, beam 9:

       150k    20k    10k   5k
WER    15.96   -      -     -
RT     x       0.94   -     -
LM development
Domain specific LM
- Baidu Zhidao + Weibo extraction done with various thresholds.
- The extracted text seems to improve results to some extent, but the major gain seems to come from pre-processing.
- Check the proportion of tags in the HW 30h data.
Word2Vector
W2V based doc classification
- Full Gaussian based doc vector:
- Represent each doc with a Gaussian distribution over the word vectors it contains.
- Use k-NN to conduct classification (a sketch follows the results table below).
              mean Euclidean distance   KL distance   baseline (NB with mean)
Acc (50dim)   81.84                     79.65         69.7
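A sketch of the full-Gaussian doc representation and the k-NN classifier, under two assumptions that go beyond the note: covariances are taken as diagonal, and "KL distance" is read as a symmetrized KL divergence between the two document Gaussians.

<pre>
import numpy as np

def doc_gaussian(word_vecs):
    # Represent a doc by the mean and (diagonal) variance of the
    # word vectors it contains; a small floor keeps variances positive.
    return word_vecs.mean(axis=0), word_vecs.var(axis=0) + 1e-6

def sym_kl(g1, g2):
    # Symmetrized KL divergence between two diagonal Gaussians.
    (m1, v1), (m2, v2) = g1, g2
    kl12 = 0.5 * np.sum(v1 / v2 + (m2 - m1) ** 2 / v2 - 1 + np.log(v2 / v1))
    kl21 = 0.5 * np.sum(v2 / v1 + (m1 - m2) ** 2 / v1 - 1 + np.log(v1 / v2))
    return kl12 + kl21

def knn_classify(query, train_gaussians, labels, k=5):
    # k-NN over docs using the Gaussian "KL distance" above.
    dists = [sym_kl(query, g) for g in train_gaussians]
    votes = [labels[i] for i in np.argsort(dists)[:k]]
    return max(set(votes), key=votes.count)
</pre>

Replacing sym_kl with the Euclidean distance between the two document means recovers the "mean Euclidean distance" column of the table.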
Semantic word tree
- First version based on pattern matching done
- Filter with query log
- Further refinement with Baidu Baike hierarchy
NN LM
- Character-based NNLM (6700 chars, 7-gram): training on 500M data done (a forward-pass sketch follows this list).
- Inconsistent WER patterns were found on the Tencent test sets.
- Probably need to use another test set for investigation.
- Investigate MS RNN LM training.
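A minimal numpy sketch of a Bengio-style feed-forward NNLM matching the 7-gram character setup above: embed the six previous characters, pass them through one tanh hidden layer, and take a softmax over the 6700-character vocabulary. The embedding and hidden-layer sizes are assumptions.

<pre>
import numpy as np

def nnlm_forward(context_ids, E, W1, b1, W2, b2):
    # Feed-forward NNLM: concatenate the embeddings of the 6 context
    # characters, one tanh hidden layer, softmax over the vocabulary.
    h_in = np.concatenate([E[c] for c in context_ids])
    h = np.tanh(h_in @ W1 + b1)
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max())
    return p / p.sum()                      # P(next char | 6 previous)

# Hypothetical sizes: 6700-char vocab, 100-dim embeddings, 500 hidden units.
V, D, H, N = 6700, 100, 500, 6
rng = np.random.default_rng(0)
E = rng.normal(scale=0.01, size=(V, D))
W1 = rng.normal(scale=0.01, size=(N * D, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.01, size=(H, V));     b2 = np.zeros(V)
probs = nnlm_forward(rng.integers(0, V, size=N), E, W1, b1, W2, b2)
</pre>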