2014-06-27
From cslt Wiki
Latest revision as of 05:53, 27 June 2014 (Fri)
Resource Building
Leftover questions
- Asymmetric window: great improvement on the training set (WER 34% to 24%); however, the improvement is lost on the test set.
- Multi-GPU training: error encountered
- Multilanguage training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
AM development
Sparse DNN
- GA-based block sparsity (++++++++)
Noise training
- Paper writing ongoing
GFbank
- Moving on to the Sinovoice 8k 1400 + 100 mixture training.
- FBank/GFbank, stream/non-stream MPE completed:
                         Huawei disanpi  BJ mobile  8k English data
FBank non-stream (MPE4)  20.44%          22.28%     24.36%
FBank stream (MPE4)      19.46%          22.00%     21.19%
GFbank stream (MPE4)     20.69%          22.84%     24.45%
GFbank non-stream (MPE)  -               -          -
Multilingual ASR
                          HW 27h (HW TR LM not involved)  HW 27h (HW TR LM involved)
FBank stream (monolang)   21.64                           20.72
FBank non-stream (MPE4)   22.23                           21.38
FBank stream (MPE4)       21.99                           -
Denoising & Farfield ASR
- Correlation-based alignment is done; this is necessary since the recording devices may introduce an artificial delay.
- How about the output CMVN test?
- deliver the recording to /nfs/disk/perm/data/corpora/reverberant
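The correlation-based alignment step above can be sketched as follows (a minimal sketch: `estimate_delay` is a hypothetical helper, and the integer-lag search window is an assumed parameter, not the report's exact implementation):

```python
import numpy as np

def estimate_delay(ref, sig, max_lag):
    """Estimate the integer delay (in samples) of `sig` relative to `ref`
    as the cross-correlation peak within +/- max_lag samples.
    Hypothetical helper, not the report's exact pipeline."""
    n = min(len(ref), len(sig))
    ref, sig = np.asarray(ref[:n]), np.asarray(sig[:n])
    xc = np.correlate(sig, ref, mode="full")   # length 2n-1
    lags = np.arange(-(n - 1), n)              # lag of sig w.r.t. ref
    mask = np.abs(lags) <= max_lag             # restrict the search window
    return int(lags[mask][np.argmax(xc[mask])])
```

A positive return value means the second channel lags the reference and should be shifted back by that many samples before further processing.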
Original model:
xEnt model:
         middle-field  far-field
dev93    74.79         96.68
eval92   63.42         94.75

MPE model (MPE adaptation):
         middle-field  far-field
dev93    63.71         94.84
eval92   52.67         90.45
VAD
- DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
- 100 X n (n<=3) hidden units with 2 output units seem sufficient for VAD
- report form
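For reference, an energy-based baseline of the kind the DNN VAD is compared against can be sketched roughly as below (frame length, hop, and the -35 dB relative threshold are illustrative assumptions, not the values used in the experiment):

```python
import numpy as np

def energy_vad(samples, frame_len=256, hop=128, threshold_db=-35.0):
    """Minimal energy-based VAD: mark a frame as speech when its
    log-energy exceeds a fixed threshold relative to the peak frame.
    Illustrative parameter values only."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, hop)]
    log_e = np.array([10 * np.log10(np.sum(f ** 2) + 1e-12) for f in frames])
    return log_e > (log_e.max() + threshold_db)   # boolean speech mask
```

The DNN-based alternative replaces the fixed energy threshold with a small network (here, 100 x n hidden units, 2 output units) scoring speech vs. non-speech per frame, which is where the 45.74 to 7.49 improvement comes from.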
Scoring
- refine the model with AMIDA database. Local minimum observed.
- i-vector-based speaker detection seems fine, reaching 96% with 100 speakers
Embedded decoder
AM: 600x4+800 xent9 model:

pruning threshold: 1e-5, Nobiglm
            |  150k |   80k |   40k |   20k |   10k |    5k |
------------+-------+-------+-------+-------+-------+-------+
wer         | 26.60 | 27.16 | 28.11 | 29.14 | 31.02 | 33.37 |
RT          |  0.68 |  0.66 |  0.61 |  0.61 |  0.58 |  0.56 |
graph size  |   21M |   14M |  9.1M |  6.9M |  5.5M |  4.1M |
YINSHI:2014-Jun-24,Wednesday,10:7:0

pruning threshold: 1e-6, Nobiglm
            |  150k |   80k |   40k |   20k |   10k |    5k |
------------+-------+-------+-------+-------+-------+-------+
wer         | 22.49 | 23.05 | 24.15 | 25.51 | 27.71 | 30.71 |
RT          |  0.89 |  0.84 |  0.76 |  0.70 |  0.68 |  0.64 |
graph size  |   98M |   86M |   67M |   49M |   34M |   24M |
YINSHI:2014-Jun-27,Saturday,0:52:35

pruning threshold: 1e-6.5, biglm
            |  150k |   80k |   40k |   20k |   10k |    5k |
------------+-------+-------+-------+-------+-------+-------+
wer         | 21.12 | 21.75 | 22.92 | 24.39 | 26.89 | 30.01 |
RT          |  1.45 |  1.25 |  1.16 |  1.11 |  1.02 |  0.94 |
graph size  |   38M |   35M |   30M |   25M |   20M |   15M |
YINSHI:2014-Jun-27,Saturday,0:58:27

pruning threshold: 1e-5.5, Nobiglm
            |  150k |   80k |   40k |   20k |   10k |    5k |
------------+-------+-------+-------+-------+-------+-------+
wer         | 24.46 | 25.05 | 26.05 | 27.11 | 29.36 | 32.01 |
RT          |  0.71 |  0.69 |  0.66 |  0.63 |  0.60 |  0.58 |
graph size  |   39M |   32M |   25M |   19M |   14M |  9.2M |
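One way to use these measurements: each vocabulary/pruning setting is an operating point trading WER against real-time factor and graph size, so an embedded configuration can be chosen by filtering on a resource budget. A small sketch (the numbers are copied from the 1e-6 Nobiglm run; the selection helper itself is hypothetical):

```python
# Measured operating points copied from the "1e-6, Nobiglm" table:
# (vocabulary, WER %, real-time factor, graph size in MB)
points = [
    ("150k", 22.49, 0.89, 98),
    ("80k",  23.05, 0.84, 86),
    ("40k",  24.15, 0.76, 67),
    ("20k",  25.51, 0.70, 49),
    ("10k",  27.71, 0.68, 34),
    ("5k",   30.71, 0.64, 24),
]

def best_within_budget(points, max_rt, max_mb):
    """Pick the lowest-WER configuration whose real-time factor and
    graph size fit the device budget (hypothetical selection helper)."""
    feasible = [p for p in points if p[2] <= max_rt and p[3] <= max_mb]
    return min(feasible, key=lambda p: p[1]) if feasible else None
```

For example, a device allowing RT up to 0.75 and a 50 MB graph would land on the 20k-vocabulary configuration at 25.51% WER.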
LM development
Domain specific LM
- Baiduzhidao + Weibo extraction done with various thresholds
- It looks like the extracted text improves results to some extent, but the major change seems to come from pre-processing.
- Check the proportion of tags in the HW 30h data
Word2Vector
W2V based doc classification
- Full Gaussian based doc vector
- represent each doc with a Gaussian distribution over the word vectors it contains.
- use k-NN to conduct classification
              mean Eur distance  KL distance  diagonal KL  baseline (NB with mean)
Acc (50dim)   81.84              79.65        -            69.7
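The full-Gaussian doc-vector scheme can be sketched as follows for the diagonal-covariance case (`doc_gaussian`, the variance floor, and the symmetrised-KL k-NN are illustrative choices; the report does not give the exact recipe):

```python
import numpy as np

def doc_gaussian(word_vecs):
    """Fit a diagonal Gaussian to a doc's word vectors (illustrative sketch)."""
    mu = word_vecs.mean(axis=0)
    var = word_vecs.var(axis=0) + 1e-6   # floor keeps the KL finite
    return mu, var

def kl_diag(p, q):
    """KL(p || q) between diagonal Gaussians p = (mu, var), q = (mu, var)."""
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * np.sum(np.log(var_q / var_p)
                        + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def knn_classify(doc, labeled, k=3):
    """Classify a doc Gaussian by k-NN under symmetrised KL distance."""
    ranked = sorted(labeled,
                    key=lambda lg: kl_diag(doc, lg[0]) + kl_diag(lg[0], doc))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)
```

The "mean Eur distance" column in the table corresponds to comparing only the `mu` parts with Euclidean distance instead of the KL measure.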
- SVM-based classification
                     mean Eur distance  KL distance  diagonal KL  LDA
2-class Acc (50dim)  95.57              -            -            95.80
8-class Acc (50dim)  88.79              -            -            -
Semantic word tree
- Version v2.0 released (filter with query log)
- Please deliver to /nfs/disk/perm/data/corpora/semanticTree (Xingchao)
- Version v3.0 ongoing: further refinement with the Baidu Baike hierarchy
NN LM
- Character-based NNLM (6700 chars, 7gram), 500M data training done.
- Inconsistent WER patterns were found on the Tencent test sets
- probably need another test set for further investigation.
- Investigate MS RNN LM training