2014-06-27
Resource Building
Leftover questions
- Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set.
- Multi-GPU training: error encountered
- Multilanguage training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
AM development
Sparse DNN
- GA-based block sparsity (++++++++)
Noise training
- Paper writing ongoing
GFbank
- Running on the Sinovoice 8k 1400 + 100 mixture training.
- FBank/GFbank, stream/non-stream MPE completed:
                          Huawei disanpi   BJ mobile 8k   English data
FBank non-stream (MPE4)   20.44%           22.28%         24.36%
FBank stream (MPE4)       19.46%           22.00%         21.19%
GFbank stream (MPE4)      20.69%           22.84%         24.45%
GFbank non-stream (MPE)   -                -              -
Multilingual ASR
                          HW 27h (HW TR LM not involved)   HW 27h (HW TR LM involved)
FBank stream (monolang)   21.64                            20.72
FBank non-stream (MPE4)   22.23                            21.38
FBank stream (MPE4)       21.99                            -
Denoising & Farfield ASR
- Correlation-based alignment is done; this is necessary since the recording devices may introduce an artificial delay (a minimal sketch follows this list).
- What about the output CMVN test?
- Deliver the recordings to /nfs/disk/perm/data/corpora/reverberant
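A minimal sketch of correlation-based delay estimation between two recordings of the same signal (the function names, max-lag window and handling of the shift are illustrative assumptions, not the actual script used here):

    import numpy as np

    def estimate_delay(ref, rec, max_lag=8000):
        # Delay (in samples) of `rec` relative to `ref`, taken at the peak
        # of their cross-correlation, restricted to +/- max_lag samples.
        corr = np.correlate(rec, ref, mode="full")
        lags = np.arange(-(len(ref) - 1), len(rec))
        keep = np.abs(lags) <= max_lag
        return int(lags[keep][np.argmax(corr[keep])])

    def align_to_ref(ref, rec, max_lag=8000):
        # Shift `rec` so it lines up with `ref`: trim if it starts late,
        # zero-pad at the front if it starts early.
        d = estimate_delay(ref, rec, max_lag)
        if d > 0:
            return rec[d:]
        return np.concatenate([np.zeros(-d, dtype=rec.dtype), rec])

For long recordings an FFT-based correlation would be faster; the direct form above is just the clearest statement of the idea.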
Original model:
xEnt model:
         middle-field   far-field
dev93    74.79          96.68
eval92   63.42          94.75
MPE model:
MPE adaptation:
         middle-field   far-field
dev93    63.71          94.84
eval92   52.67          90.45
VAD
- DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
- 100 x n (n <= 3) hidden units with 2 output units seem sufficient for VAD (sketch below)
- report form
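For reference, a minimal sketch of a VAD network of the size mentioned above, i.e. up to three hidden layers of 100 units and 2 outputs (the feature dimension, random initialization and 0.5 decision threshold are illustrative assumptions, not the trained system):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    class TinyVadDnn:
        # 100 x n (n <= 3) hidden units, 2 output units (non-speech / speech).
        def __init__(self, feat_dim=40, n_hidden_layers=2, hidden=100, seed=0):
            rng = np.random.default_rng(seed)
            dims = [feat_dim] + [hidden] * n_hidden_layers + [2]
            self.weights = [rng.normal(0, 0.1, (i, o)) for i, o in zip(dims[:-1], dims[1:])]
            self.biases = [np.zeros(o) for o in dims[1:]]

        def posteriors(self, frames):
            # frames: (T, feat_dim) acoustic features -> (T, 2) posteriors.
            h = frames
            for w, b in zip(self.weights[:-1], self.biases[:-1]):
                h = sigmoid(h @ w + b)
            return softmax(h @ self.weights[-1] + self.biases[-1])

    # Frame-level decision: speech if P(speech) exceeds a threshold.
    vad = TinyVadDnn()
    frames = np.random.randn(100, 40)              # placeholder features
    speech_mask = vad.posteriors(frames)[:, 1] > 0.5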
Scoring
- Refine the model with the AMIDA database; a local minimum was observed.
- i-vector-based speaker detection seems fine, reaching 96% with 100 speakers (sketch below)
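A minimal sketch of what the i-vector based detection reduces to at scoring time, assuming i-vectors have already been produced by an existing extractor (cosine scoring is a common choice here, not necessarily the exact backend used):

    import numpy as np

    def cosine_score(ivec_a, ivec_b):
        # Cosine similarity between two (length-normalized) i-vectors.
        a = ivec_a / np.linalg.norm(ivec_a)
        b = ivec_b / np.linalg.norm(ivec_b)
        return float(a @ b)

    def identify_speaker(test_ivec, enrolled):
        # `enrolled` maps speaker id -> enrollment i-vector; return the
        # best-scoring speaker and the full score table.
        scores = {spk: cosine_score(test_ivec, vec) for spk, vec in enrolled.items()}
        return max(scores, key=scores.get), scores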
Embedded decoder
AM: 600x4+800 xent9 model:
pruning threshold: 1e-5, Nobiglm
------------------------------------------------------------------------------------------
| 150k | 80k | 40k | 20k | 10k | 5k |
------------------------------------------------------------------------------------------
wer | 26.60 | 27.16 | 28.11 | 29.14 | 31.02 | 33.37 |
------------------------------------------------------------------------------------------
RT | 0.68 | 0.66 | 0.61 | 0.61 | 0.58 | 0.56 |
------------------------------------------------------------------------------------------
graph size | 21M | 14M | 9.1M | 6.9M | 5.5M | 4.1M |
------------------------------------------------------------------------------------------
YINSHI:2014-Jun-24,Wednesday,10:7:0
pruning threshold: 1e-6, Nobiglm
------------------------------------------------------------------------------------------
| 150k | 80k | 40k | 20k | 10k | 5k |
------------------------------------------------------------------------------------------
wer | 22.49 | 23.05 | 24.15 | 25.51 | 27.71 | 30.71 |
------------------------------------------------------------------------------------------
RT | 0.89 | 0.84 | 0.76 | 0.70 | 0.68 | 0.64 |
------------------------------------------------------------------------------------------
graph size | 98M | 86M | 67M | 49M | 34M | 24M |
------------------------------------------------------------------------------------------
YINSHI:2014-Jun-27,Saturday,0:52:35
pruning threshold: 1e-6.5, biglm
------------------------------------------------------------------------------------------
| 150k | 80k | 40k | 20k | 10k | 5k |
------------------------------------------------------------------------------------------
wer | 21.12 | 21.75 | 22.92 | 24.39 | 26.89 | 30.01 |
------------------------------------------------------------------------------------------
RT | 1.45 | 1.25 | 1.16 | 1.11 | 1.02 | 0.94 |
------------------------------------------------------------------------------------------
graph size | 38M | 35M | 30M | 25M | 20M | 15M |
------------------------------------------------------------------------------------------
YINSHI:2014-Jun-27,Saturday,0:58:27
pruning threshold: 1e-5.5, Nobiglm
------------------------------------------------------------------------------------------
| 150k | 80k | 40k | 20k | 10k | 5k |
------------------------------------------------------------------------------------------
wer | 24.46 | 25.05 | 26.05 | 27.11 | 29.36 | 32.01 |
------------------------------------------------------------------------------------------
RT | 0.71 | 0.69 | 0.66 | 0.63 | 0.60 | 0.58 |
------------------------------------------------------------------------------------------
graph size | 39M | 32M | 25M | 19M | 14M | 9.2M |
------------------------------------------------------------------------------------------
LM development
Domain specific LM
- Baiduzhidao + Weibo extraction done with various thresholds
- Looks like the extracted text can improve performance to some extent, but the major gain seems to come from pre-processing.
- Check the proportion of tags in the HW 30h data
Word2Vector
W2V based doc classification
- Full-Gaussian-based doc vector
- Represent each doc with a Gaussian distribution over the word vectors it contains.
- Use k-NN to conduct classification (a minimal sketch follows after the tables below).
              mean Eur distance   KL distance   diagonal KL   baseline (NB with mean)
Acc (50dim)   81.84               79.65         -             69.7
- SVM-based classification
                      mean Eur distance   KL distance   diagonal KL   LDA
2-class Acc (50dim)   95.57               -             -             95.80
8-class Acc (50dim)   88.79               -             -             -
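As a sketch of the full-Gaussian doc vector and k-NN classification described above (the covariance regularization, symmetrized KL divergence and k=5 are assumptions for illustration):

    import numpy as np
    from collections import Counter

    def doc_gaussian(word_vecs, ridge=1e-3):
        # Represent a document by the mean and (regularized) full covariance
        # of the word vectors it contains. word_vecs: (n_words, dim).
        mu = word_vecs.mean(axis=0)
        cov = np.cov(word_vecs, rowvar=False) + ridge * np.eye(word_vecs.shape[1])
        return mu, cov

    def kl_gaussian(p, q):
        # KL(p || q) between two Gaussians given as (mean, covariance).
        mu0, s0 = p
        mu1, s1 = q
        d = len(mu0)
        s1_inv = np.linalg.inv(s1)
        diff = mu1 - mu0
        _, logdet0 = np.linalg.slogdet(s0)
        _, logdet1 = np.linalg.slogdet(s1)
        return 0.5 * (np.trace(s1_inv @ s0) + diff @ s1_inv @ diff - d
                      + logdet1 - logdet0)

    def knn_classify(test_doc, train_docs, train_labels, k=5):
        # k-NN over the symmetrized KL divergence between document Gaussians.
        dists = [kl_gaussian(test_doc, g) + kl_gaussian(g, test_doc) for g in train_docs]
        top = np.argsort(dists)[:k]
        return Counter(train_labels[i] for i in top).most_common(1)[0][0]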
Semantic word tree
- Version v2.0 released (filtered with query logs)
- Please deliver to /nfs/disk/perm/data/corpora/semanticTree (Xingchao)
- Version v3.0 ongoing: further refinement with the Baidu Baike hierarchy
NN LM
- Character-based NNLM (6700 chars, 7-gram); training on 500M of data done (sketch at the end of this section).
- Inconsistent WER patterns were found on the Tencent test sets
- Probably need to use another test set for the investigation.
- Investigate MS RNN LM training
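For reference, a minimal sketch of the forward pass of a 7-gram character NNLM as described above: the six previous characters are embedded, concatenated, passed through one hidden layer and a softmax over the 6700-character vocabulary (embedding and hidden sizes are illustrative assumptions, not the actual setup):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    class CharFFNnlm:
        # Feed-forward n-gram NNLM: embed the (order-1) previous characters,
        # concatenate, one hidden layer, softmax over the character vocabulary.
        def __init__(self, vocab_size=6700, order=7, embed_dim=100, hidden=200, seed=0):
            rng = np.random.default_rng(seed)
            self.order = order
            self.emb = rng.normal(0, 0.1, (vocab_size, embed_dim))
            ctx_dim = (order - 1) * embed_dim
            self.w1 = rng.normal(0, 0.1, (ctx_dim, hidden))
            self.b1 = np.zeros(hidden)
            self.w2 = rng.normal(0, 0.1, (hidden, vocab_size))
            self.b2 = np.zeros(vocab_size)

        def next_char_probs(self, context_ids):
            # context_ids: ids of the (order-1) previous characters.
            x = self.emb[context_ids].reshape(-1)      # concatenated embeddings
            h = np.tanh(x @ self.w1 + self.b1)
            return softmax(h @ self.w2 + self.b2)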