Difference between revisions of "2013-12-27"

From cslt Wiki

Latest revision as of 02:31, 27 December 2013

AM development

Sparse DNN

  • Optimal Brain Damage (OBD)
  1. Online OBD is on hold.
  2. Started investigating OBD + L1 norm (a pruning sketch follows this list).
  • Efficient computing
  1. Rearranging the matrix structure to compose zero blocks with some smart approaches, leading to better computing speed.
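
A minimal illustration of OBD-style pruning (a sketch only, assuming a plain NumPy weight matrix and a diagonal-Hessian approximation; function and variable names are illustrative, not the actual training code):

<pre>
import numpy as np

def obd_prune(W, H_diag, sparsity=0.5):
    """Zero out the weights with the lowest OBD saliency.

    W        : weight matrix of one DNN layer
    H_diag   : diagonal Hessian approximation, same shape as W
    sparsity : fraction of weights to remove
    """
    # OBD saliency: estimated loss increase when deleting w_ij,
    # s_ij = 0.5 * h_ij * w_ij^2  (LeCun et al., Optimal Brain Damage)
    saliency = 0.5 * H_diag * W ** 2
    threshold = np.percentile(saliency, sparsity * 100.0)
    mask = saliency > threshold
    return W * mask, mask

# Toy usage: prune half of a random 4x4 layer.  An L1 penalty during
# training would push more weights toward zero before pruning.
W = np.random.randn(4, 4)
H = np.abs(np.random.randn(4, 4))   # stand-in for the true diagonal Hessian
W_pruned, mask = obd_prune(W, H, sparsity=0.5)
</pre>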


Efficient DNN training

  1. Momentum-based training. m=0.2 performs best on WER, but results are not very consistent: on 1900, m=0.2 is best, while on online1 and online2, m=0 is best.
  2. Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test sets. Overfitting?
  3. Frame-skipping. Skipping 1 frame speeds up decoding consistently while largely retaining accuracy; skipping more frames leads to unacceptable performance degradation. (A sketch of the momentum update and frame-skipping follows the table below.)
                      mom0.05  mom0.1  mom0.3  mom0.4  mom0.5  mom0.6  mom0.8  fs_1  fs_2  fs_3
            -------------------------------------------------------------------------------------
            avg_time | 4500     4175    3380    3460    3448    3521    4212    3149   2692  2716
                RT   | 1.52     1.44    1.12    1.14    1.14    1.16    1.38    1.04   0.90  0.92
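
For reference, a minimal sketch of the two techniques tuned above, momentum SGD and frame-skipping (plain NumPy; dnn_forward and the parameter values are illustrative assumptions, not the actual recipe):

<pre>
import numpy as np

def momentum_step(w, velocity, grad, lr=0.01, m=0.2):
    """One SGD update with momentum coefficient m (m=0 is plain SGD)."""
    velocity = m * velocity - lr * grad   # decaying accumulation of past gradients
    return w + velocity, velocity

def skip_frames_decode(features, dnn_forward, k=1):
    """Frame-skipping: run the DNN only on every (k+1)-th frame and reuse
    that frame's posteriors for the k skipped frames."""
    posteriors, last = [], None
    for t, x in enumerate(features):
        if t % (k + 1) == 0:
            last = dnn_forward(x)
        posteriors.append(last)
    return posteriors

# Toy usage
w, v = np.zeros(3), np.zeros(3)
for _ in range(5):
    w, v = momentum_step(w, v, grad=np.array([1.0, -2.0, 0.5]), lr=0.1, m=0.2)

frames = [np.random.randn(40) for _ in range(10)]
posts = skip_frames_decode(frames, dnn_forward=lambda x: x.sum(), k=1)
</pre>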

Optimal phoneset

  • Experimented with 3 phone sets: Tencent, CSLT, PQ.
  • The CSLT and PQ sets are similar (initial-final based), with a minor difference on Ri. The Tencent set is phone-based.
  • Tested with the same NN structure.
  • CSLT and PQ obtain similar performance and are better than the Tencent set in most test cases.
  • On online1 and online2, the Tencent set is a little better.
  • We therefore prefer a phoneset based on initials-finals.


 

CSLT:

map 8: %WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]

2044 7: %WER 22.63 [ 5259 / 23241, 396 ins, 615 del, 4248 sub ]

notetp3 7: %WER 15.76 [ 292 / 1853, 19 ins, 30 del, 243 sub ]

record1900 11: %WER 5.98 [ 711 / 11888, 37 ins, 270 del, 404 sub ]

general 7: %WER 36.21 [ 13622 / 37619, 543 ins, 1085 del, 11994 sub ]

online1 12: %WER 37.73 [ 10729 / 28433, 634 ins, 2229 del, 7866 sub ]

online2 13: %WER 28.95 [ 17112 / 59101, 1113 ins, 3015 del, 12984 sub ]

speedup 9: %WER 25.71 [ 1351 / 5255, 49 ins, 276 del, 1026 sub ]

 

PQ:

map 9: %WER 24.25 [ 3547 / 14628, 115 ins, 428 del, 3004 sub ]

2044 8: %WER 22.80 [ 5300 / 23241, 425 ins, 665 del, 4210 sub ]

notetp3 9: %WER 16.73 [ 310 / 1853, 34 ins, 28 del, 248 sub ]

record1900 11: %WER 5.88 [ 699 / 11888, 54 ins, 257 del, 388 sub ]

general 8: %WER 36.80 [ 13844 / 37619, 636 ins, 1102 del, 12106 sub ]

online1 14: %WER 37.77 [ 10739 / 28433, 592 ins, 2401 del, 7746 sub ]

online2 13: %WER 28.65 [ 16932 / 59101, 1136 ins, 2965 del, 12831 sub ]

speedup 9: %WER 26.32 [ 1383 / 5255, 66 ins, 273 del, 1044 sub ]

 

Tencent:

map 8: %WER 25.83 [ 3778 / 14628, 157 ins, 486 del, 3135 sub ]

2044 8: %WER 24.51 [ 5697 / 23241, 502 ins, 765 del, 4430 sub ]

notetp3 10: %WER 19.86 [ 368 / 1853, 36 ins, 45 del, 287 sub ]

record1900 12: %WER 7.96 [ 946 / 11888, 50 ins, 378 del, 518 sub ]

general 7: %WER 37.94 [ 14274 / 37619, 537 ins, 1270 del, 12467 sub ]

online1 12: %WER 36.36 [ 10337 / 28433, 495 ins, 2082 del, 7760 sub ]

online2 13: %WER 28.46 [ 16822 / 59101, 893 ins, 2940 del, 12989 sub ]

speedup 10: %WER 28.53 [ 1499 / 5255, 62 ins, 349 del, 1088 sub ]
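
The bracketed numbers above give total errors over reference words plus the insertion/deletion/substitution breakdown; a quick check of how %WER is derived from them (pure Python, values taken from the CSLT "map" line):

<pre>
def wer(ins, dele, sub, ref_words):
    """Word error rate: (insertions + deletions + substitutions) / reference words."""
    return 100.0 * (ins + dele + sub) / ref_words

# CSLT "map" line: %WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]
assert 131 + 436 + 3201 == 3768
print(round(wer(131, 436, 3201, 14628), 2))   # -> 25.76
</pre>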

Engine optimization

  • Investigating LOUDS FSTs. In progress (an encoding sketch follows).
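
As background, a minimal sketch of the LOUDS idea (level-order unary degree sequence) on a plain tree with naive rank/select; this only illustrates the encoding, not the engine code, and a real implementation would use succinct rank/select structures over the FST:

<pre>
# LOUDS: serialize a tree in breadth-first order, writing each node's
# degree in unary (d ones followed by a zero), prefixed by "10" for a
# virtual super-root.  A node is identified by the index of its '1' bit.
from collections import deque

def louds_encode(children, root):
    """children: dict node -> list of child nodes."""
    bits, queue = [1, 0], deque([root])    # "10" = virtual super-root
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        bits.extend([1] * len(kids) + [0])
        queue.extend(kids)
    return bits

# Naive O(n) rank/select, for illustration only.
def rank1(bits, pos):  return sum(bits[: pos + 1])
def rank0(bits, pos):  return (pos + 1) - rank1(bits, pos)
def select0(bits, i):  return [p for p, b in enumerate(bits) if b == 0][i - 1]
def select1(bits, i):  return [p for p, b in enumerate(bits) if b == 1][i - 1]

def first_child(bits, x):
    pos = select0(bits, rank1(bits, x)) + 1
    return pos if pos < len(bits) and bits[pos] == 1 else None

def next_sibling(bits, x):
    return x + 1 if x + 1 < len(bits) and bits[x + 1] == 1 else None

def parent(bits, x):
    return select1(bits, rank0(bits, x)) if rank0(bits, x) > 0 else None

# Example tree: r -> (a, b), a -> (c), b -> (d, e)
bits = louds_encode({"r": ["a", "b"], "a": ["c"], "b": ["d", "e"]}, "r")
assert bits == [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0]
assert first_child(bits, 0) == 2    # root's first child ("a")
assert next_sibling(bits, 2) == 3   # "a" -> "b"
assert parent(bits, 5) == 2         # "c"'s parent is "a"
</pre>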


LM development

NN LM

  • Collecting a bigger lexicon: 40k words related to music, 56k words from an official dictionary.
  • Working on an NN LM based on word2vec (see the sketch below).
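
A minimal sketch of the direction this points to: a feed-forward NN LM whose projection layer is initialized from word2vec-style vectors (PyTorch; sizes, names and the pretrained_vectors argument are illustrative assumptions, not the actual setup):

<pre>
import torch
import torch.nn as nn

class FeedForwardLM(nn.Module):
    """N-gram style NN LM: embed (project) the history words, then
    predict the next word through one hidden layer."""
    def __init__(self, vocab_size, emb_dim=256, context=4, hidden=512,
                 pretrained_vectors=None):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        if pretrained_vectors is not None:
            # e.g. word2vec vectors of shape (vocab_size, emb_dim)
            self.emb.weight.data.copy_(torch.as_tensor(pretrained_vectors))
        self.hidden = nn.Linear(context * emb_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, history_ids):            # (batch, context) word ids
        e = self.emb(history_ids).flatten(1)   # concatenate the projections
        return self.out(torch.tanh(self.hidden(e)))   # logits over the next word

# Toy usage with an illustrative 100k-word vocabulary
model = FeedForwardLM(vocab_size=100_000)
logits = model(torch.randint(0, 100_000, (8, 4)))   # 8 four-word histories
</pre>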

Embedded development

  • A narrow and deep small-scale NN has been trained. Investigating some bugs.
  • Embedded stream mode in progress.
  • On-the-fly grammar compiler (a composition sketch follows this list):
      • LG compile is fine.
      • CLG compile is fine.
      • HCLG compile is slow.
      • Working on speed-up methods.
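
A toy sketch of the composition step behind LG/CLG/HCLG (pure Python; it ignores weights, epsilons and final states, and the dict-based transducer format is an assumption for illustration only):

<pre>
def compose(A, B, start_a, start_b):
    """Product construction: match A's output labels with B's input labels.
    Transducers are dicts: state -> list of (in_label, out_label, next_state)."""
    start = (start_a, start_b)
    arcs, visited, stack = {}, {start}, [start]
    while stack:
        sa, sb = stack.pop()
        out = []
        for (ia, oa, na) in A.get(sa, []):
            for (ib, ob, nb) in B.get(sb, []):
                if oa == ib:                  # output of A must feed input of B
                    nxt = (na, nb)
                    out.append((ia, ob, nxt))
                    if nxt not in visited:
                        visited.add(nxt)
                        stack.append(nxt)
        arcs[(sa, sb)] = out
    return arcs, start

# Tiny example: A maps a->x, b->y; B maps x->1, y->2; A∘B maps a->1, b->2.
# Each composition stage (L∘G, C∘LG, H∘CLG) multiplies the reachable state
# space, which is why the final H∘CLG step is the slow one.
A = {0: [("a", "x", 0), ("b", "y", 0)]}
B = {0: [("x", "1", 0), ("y", "2", 0)]}
AB, ab_start = compose(A, B, 0, 0)
</pre>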

Speech QA

  • Use N-best hypotheses to expand matching in QA. Better performance was obtained (a matching sketch follows this list):
      • 1-best matches 96/121
      • 10-best matches 102/121
  • Use N-best hypotheses to recover errors in entity checking. In progress.
  • Use Pinyin to recover errors in entity checking. Future work.
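
A minimal sketch of the N-best expansion: accept a question as matched if any of the top-N recognition hypotheses hits the QA index (pure Python; names and the toy data are illustrative):

<pre>
def match_with_nbest(nbest_hypotheses, qa_index, n=10):
    """Return the answer for the first of the top-n hypotheses that hits
    the QA index, instead of trusting only the 1-best ASR output."""
    for hyp in nbest_hypotheses[:n]:
        if hyp in qa_index:          # exact match against known questions
            return qa_index[hyp]
    return None

# Toy usage: the 1-best is misrecognized, the 2nd hypothesis matches.
qa_index = {"北京今天天气怎么样": "answer text"}
nbest = ["北京金天天气怎么样", "北京今天天气怎么样", "背景今天天气怎么样"]
print(match_with_nbest(nbest, qa_index))
</pre>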