Difference between revisions of "2014-06-06"
From cslt Wiki
(Created page with "==Resoruce Building== * Release management has been started == Leftover questions== * Asymmetric window: Great improvement on training set(WER 34% to 24%), however the...")
Line 3: | Line 3: | ||
== Leftover questions== | == Leftover questions== | ||
− | * Asymmetric window: Great improvement on training set(WER 34% to 24%), however the improvement is lost on test. | + | * Asymmetric window: Great improvement on training set(WER 34% to 24%), however the improvement is lost on test. |
* Multi GPU training: Error encountered | * Multi GPU training: Error encountered | ||
* Multilanguage training | * Multilanguage training | ||
Line 13: | Line 13: | ||
=== Sparse DNN === | === Sparse DNN === | ||
− | * GA-based block sparsity (+++++) | + | * GA-based block sparsity (++++++) |
===Noise training=== | ===Noise training=== | ||
− | |||
:* Paper writing will be started this week | :* Paper writing will be started this week | ||
===GFbank=== | ===GFbank=== | ||
− | + | * Running into Sinovoice 8k 1400 + 100 mixture training. | |
− | + | * GFbank 14 xEnt iteration completed: | |
− | * Running into Sinovoice 8k 1400 + 100 mixture training. | + | Huawei disanpi BJ mobile 8k English data |
− | + | FBank non-stream (17 iteration) 22.01% 26.63% - | |
+ | GFbank stream (14 iteration) 22.47%; 27.52% - | ||
===Multilingual ASR=== | ===Multilingual ASR=== | ||
+ | |||
+ | Huawei disanpi BJ mobile 8k English data | ||
+ | FBank non-stream - - - | ||
* Multilingual LM decoding | * Multilingual LM decoding | ||
− | * | + | * TAG-based decoding still problematic. Decoding goes into subgraph, however the decoding results are incorrect. |
− | + | * Investigate with free-loop grammar. | |
− | + | * Non-tag test should be conducted on both Baidu & microblog data
− | + | * Should test the 8k shujutang data on the mixture model. | |
===Denoising & Farfield ASR=== | ===Denoising & Farfield ASR=== | ||
− | * | + | * Add artificial reverberation with various energy decays and time delays. Plot decay vs. WER and delay vs. WER. |
− | * | + | * Use more training data to do adaptation. |
+ | * Record the wave with a single speaker & near-field microphone and do test again. | ||
===VAD=== | ===VAD=== | ||
Line 88: | Line 48: | ||
* DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74) | * DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74) | ||
* Need to test small scale network (+) | * Need to test small scale network (+) | ||
− | :* 600-800 network | + | :* 600-800 network test |
− | :* 100 X 4 + 2 | + | :* 100 X 4 + 2 network training |
===Scoring=== | ===Scoring=== | ||
− | * | + | * Collect more data with human scoring to train discriminative models |
===Embedded decoder=== | ===Embedded decoder=== | ||
− | + | 1200 X 4 + 10k AM: | |
+ | 150k 20k 10k 5k | ||
+ | WER 42.23 43.45 44.54 46.07 | ||
+ | RT 1h31 48m 44m 43m | ||
==LM development== | ==LM development== | ||
Line 110: | Line 69: | ||
* Retrieve both Baidu & microblog | * Retrieve both Baidu & microblog | ||
− | + | * Need to check into gitLab(+). | |
− | * Need to check into gitLab. | + | |
+ | |||
+ | ==Word2Vector== | ||
+ | |||
+ | * Design network spider | ||
+ | * Design semantic related word tree | ||
+ | :* First version based on pattern match done | ||
+ | :* Filter with query log | ||
+ | :* Further refinement with Baidu Baike hierarchy | ||
+ | |||
===NN LM=== | ===NN LM=== |
Latest revision as of 07:02, 6 June 2014 (Friday)
Resource Building
- Release management has been started
Leftover questions
- Asymmetric window: large improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set.
- Multi GPU training: Error encountered
- Multilanguage training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
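WER figures such as the 34% and 24% quoted above are conventionally defined as the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name `word_error_rate` is illustrative and this is not the group's actual scoring tool.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: edit distance over whitespace-separated words,
    normalized by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Scoring the training and test sets with the same function makes it easy to see when a gain on training data, like the one reported for the asymmetric window, does not carry over to test data.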
AM development
Sparse DNN
- GA-based block sparsity (++++++)
Noise training
- Paper writing will be started this week
GFbank
- Running into Sinovoice 8k 1400 + 100 mixture training.
- GFbank 14 xEnt iteration completed:
                                   Huawei disanpi   BJ mobile   8k English data
FBank non-stream (17 iterations)   22.01%           26.63%      -
GFbank stream (14 iterations)      22.47%           27.52%      -
Multilingual ASR
                   Huawei disanpi   BJ mobile   8k English data
FBank non-stream   -                -           -
- Multilingual LM decoding
- TAG-based decoding is still problematic: decoding enters the subgraph, but the decoding results are incorrect.
- Investigate with free-loop grammar.
- Non-tag test should be conducted on both Baidu & microblog data
- Should test the 8k shujutang data on the mixture model.
Denoising & Farfield ASR
- Add artificial reverberation with various energy decays and time delays. Plot decay vs. WER and delay vs. WER.
- Use more training data for adaptation.
- Record the waveform with a single speaker and a near-field microphone, then test again.
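The artificial reverberation experiment above could be set up roughly as follows. This is a minimal sketch, assuming a synthetic impulse response made of a direct path plus a noise tail that starts after a time delay and decays exponentially. The names `make_rir` and `convolve` and the parameters `delay_s`, `decay_time_s`, and `tail_gain` are hypothetical and do not come from the group's tools.

```python
import math
import random

def make_rir(sample_rate, delay_s, decay_time_s, length_s=0.5, tail_gain=0.1, seed=0):
    """Hypothetical synthetic room impulse response.

    A unit direct path at sample 0 is followed, after delay_s seconds, by
    Gaussian noise whose amplitude envelope drops by 60 dB over decay_time_s
    seconds.
    """
    rng = random.Random(seed)
    n = int(length_s * sample_rate)
    start = max(int(delay_s * sample_rate), 1)
    rir = [0.0] * n
    rir[0] = 1.0
    # A 60 dB drop is a factor of 1000 in amplitude.
    rate = math.log(1000.0) / (decay_time_s * sample_rate)
    for i in range(start, n):
        rir[i] = tail_gain * rng.gauss(0.0, 1.0) * math.exp(-rate * (i - start))
    return rir

def convolve(signal, rir):
    """Direct-form convolution of a clean signal with an impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out
```

One way to obtain the two plots is to sweep `decay_time_s` with `delay_s` held fixed, and then the reverse, decoding each reverberated test set and recording its WER. The direct-form convolution is quadratic in length, so an FFT-based implementation would be preferable for real utterances.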
VAD
- DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
- Need to test small scale network (+)
- 600-800 network test
- 100 X 4 + 2 network training
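For reference, an energy-based VAD of the kind used as the baseline above can be sketched as a per-frame log-energy threshold. This is a generic illustration, not the group's implementation: the frame length, hop, and `margin_db` threshold are assumed values, and the function names are hypothetical.

```python
import math

def frame_energies_db(samples, frame_len, hop):
    """Log energy (dB) of each full frame of the signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # A small floor avoids log(0) on digital silence.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies

def energy_vad(samples, frame_len=200, hop=80, margin_db=30.0):
    """Mark a frame as speech if its energy is within margin_db of the
    loudest frame. The defaults correspond to 25 ms frames with a 10 ms
    hop at an 8 kHz sampling rate."""
    energies = frame_energies_db(samples, frame_len, hop)
    if not energies:
        return []
    threshold = max(energies) - margin_db
    return [e > threshold for e in energies]
```

A detector like this depends only on loudness, so it tends to mislabel noise as speech, which is one plausible reason a DNN-based VAD performs much better.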
Scoring
- Collect more data with human scoring to train discriminative models
Embedded decoder
1200 X 4 + 10k AM:
       150k    20k     10k     5k
WER    42.23   43.45   44.54   46.07
RT     1h31    48m     44m     43m
LM development
Domain specific LM
- Retrieve both Baidu & microblog
- Need to check into GitLab (+).
Word2Vector
- Design network spider
- Design semantic related word tree
- First version based on pattern match done
- Filter with query log
- Further refinement with Baidu Baike hierarchy
NN LM
- Character-based NNLM (6700 chars, 7-gram), 500M data training done.
- Inconsistent WER patterns were found on the Tencent test sets
- Probably need another test set for the investigation.
- Investigate MS RNN LM training