Sinovoice-2014-04-15

Environment setting

  • Sinovoice internal server deployment: a better approach now is to use Gitlab + Redmine.
  • Delivered part of the Kaldi code; some fixes are still waiting to be checked in.
  • Email notification is problematic; we need to obtain an SMTP server (see the sketch below).
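
For the notification item, here is a minimal sketch of wiring notification mail to an SMTP server, assuming Python's standard smtplib; the host, port, and addresses are placeholders, not a real configuration:

  import smtplib
  from email.mime.text import MIMEText

  # Placeholder host and addresses until a real SMTP server is obtained;
  # only the smtplib usage pattern is illustrated here.
  SMTP_HOST = "smtp.example.com"
  SMTP_PORT = 25

  def send_notification(subject, body, sender, recipients):
      msg = MIMEText(body, "plain", "utf-8")
      msg["Subject"] = subject
      msg["From"] = sender
      msg["To"] = ", ".join(recipients)
      with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
          server.sendmail(sender, recipients, msg.as_string())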

Corpora

  • 300h of Guangxi telecom transcription prepared; 150h expected before April 18.
  • In total, 1338h of telephone speech is now ready (470 + 346 + 105 BJ mobile + 200 PICC + 108 HBTc + 109 new BJ mobile).
  • 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recorded data.
  • Standard established for LM-speech-text labeling (speech data transcription for LM enhancement).
  • Xiaona will prepare a noise database, starting from telephone speech (see the mixing sketch below).
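
As a sketch of the noise-database item: mixing a noise recording into clean telephone speech at a chosen SNR could look like the following numpy recipe; the function and its parameters are illustrative, not the actual pipeline:

  import numpy as np

  def mix_at_snr(speech, noise, snr_db):
      """Scale `noise` so the mixture reaches the requested SNR, then add it.
      Both inputs are 1-D float arrays at the same sample rate."""
      # Loop the noise if it is shorter than the speech.
      reps = int(np.ceil(len(speech) / len(noise)))
      noise = np.tile(noise, reps)[: len(speech)]
      p_speech = np.mean(speech ** 2)
      p_noise = np.mean(noise ** 2)
      # Choose gain g with p_speech / (g^2 * p_noise) = 10^(snr_db / 10).
      gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
      return speech + gain * noise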

Acoustic modeling

Telephone model training

1000h Training

  • Baseline: 8k states, 470+300 MPE4, 20.29
  • Jietong phone set, 200-hour seed, 10k states training:
      • Xent, 16 iterations: 22.90
      • MPE1: 20.89
  • CSLT phone set, 8k states training:
      • MPE1: 20.60
      • MPE2: 20.37
      • MPE3: 20.37
      • MPE4: 20.37

6000 hour 16k training

Training progress

  • 6000h / CSLT phone set training:
      • Xent: 12.83
      • MPE1: 9.21
      • MPE2: 9.13
  • 6000h / Jietong phone set training:
      • Now running MPE1.


Train Analysis

  • The Qihang model used a subset of the 6k-hour data.
  • Subset: 2500 + 950h + tang500h* + 20131220, approximately 1700 + 2400 hours.
  • GMM training on this subset achieved 22.47%, versus Xiaoming's earlier result of 16.1%.
  • The database therefore still seems not very consistent.
  • Xiaoming kicked off a job to reproduce the Qihang training on this subset.

Multilanguage Training

  • Prepare Chinglish data: contacted a vendor about 1000 hours of mobile recordings; will check how much we need.
  • Downloading the AMIDA database.
  • Build a baseline system.
  • Prepare a shared DNN structure for multilingual training (see the sketch below).
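
For the shared-structure item, a common arrangement shares the hidden layers across languages while each language keeps its own output layer. A minimal PyTorch sketch; the layer sizes, state counts, and language names are illustrative assumptions, not the real configuration:

  import torch.nn as nn

  class SharedDNN(nn.Module):
      """Hidden layers shared by all languages, one output layer per language."""
      def __init__(self, feat_dim=440, hidden=1024, n_layers=4,
                   states_per_lang=None):
          super().__init__()
          states_per_lang = states_per_lang or {"cn": 10000, "en": 6000}
          layers, prev = [], feat_dim
          for _ in range(n_layers):
              layers += [nn.Linear(prev, hidden), nn.Sigmoid()]
              prev = hidden
          self.shared = nn.Sequential(*layers)  # updated by data of every language
          self.heads = nn.ModuleDict({lang: nn.Linear(hidden, n)
                                      for lang, n in states_per_lang.items()})

      def forward(self, x, lang):
          # Only the chosen language's output layer is used for this batch.
          return self.heads[lang](self.shared(x))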

Noise robust feature

  • GFbank features can be propagated to Sinovoice (see the gammatone sketch after this list):
      • Mengyuan will prepare the experiments.
  • Liuchao will prepare fast computation code.
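
For reference, GFbank-style features are built on a gammatone filterbank. Below is a minimal sketch of the standard gammatone impulse response with Glasberg & Moore ERB bandwidths; the exact GFbank recipe used here is not spelled out in these notes:

  import numpy as np

  def gammatone_ir(fc, fs, order=4, duration=0.05):
      """Impulse response of a gammatone filter centred at fc Hz, sampled at fs Hz."""
      t = np.arange(int(duration * fs)) / fs
      erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # ERB bandwidth at fc
      b = 1.019 * erb                           # standard gammatone factor
      return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

  # A GFbank front end would filter speech with a bank of such filters at
  # ERB-spaced centre frequencies, then take per-frame (log) energies.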

Language modeling

Training recipe transfer

  • Training process was delivered.
  • Problems in encoding were solved.
  • Initial CSLT LM buildup completed.

Domain specific atom-LM construction

Some potential problems

  • Unclear domain definition
  • Using the same development set (8k transcription) for every domain is not very appropriate

Text data filtering

  • Prepare a word list.
  • VSM-based topic segmentation was delivered to Sinovoice, but the tool is highly inefficient (a segmentation sketch follows this list).
  • An enhanced toolkit was delivered.
  • A telecom-specific word list is ready, and a set of stop words is ready.
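
The VSM-based segmentation above is essentially a TextTiling-style procedure: represent adjacent windows of text as term-frequency vectors and hypothesize a topic boundary where their cosine similarity dips. A rough sketch; the window size and threshold are illustrative, and the delivered toolkit's internals may differ:

  import numpy as np
  from collections import Counter

  def cosine(a, b):
      keys = sorted(set(a) | set(b))
      va = np.array([a.get(k, 0) for k in keys], dtype=float)
      vb = np.array([b.get(k, 0) for k in keys], dtype=float)
      denom = np.linalg.norm(va) * np.linalg.norm(vb)
      return va.dot(vb) / denom if denom else 0.0

  def segment(sentences, stop_words, window=5, threshold=0.1):
      """Mark a boundary before sentence i when the similarity between the
      term-frequency vectors of the two adjacent windows drops below threshold.
      `sentences` are pre-tokenised word lists."""
      boundaries = []
      for i in range(window, len(sentences) - window):
          left = Counter(w for s in sentences[i - window:i] for w in s
                         if w not in stop_words)
          right = Counter(w for s in sentences[i:i + window] for w in s
                          if w not in stop_words)
          if cosine(left, right) < threshold:
              boundaries.append(i)
      return boundaries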

DNN Decoder

Decoder optimization

  • Tested the computation cost of each step:
      • beam 9/5000: net forward takes 65% of decoding time
      • beam 13/7000: net forward takes 28%
  • Sinovoice changes to Kaldi delivered and ready for check-in.
  • Need to verify the speed of the CSLT engine.

Frame-skipping

  • Zhiyong & Liuchao will deliver the frame-skipping approach.
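
The notes do not spell out the approach, but a common frame-skipping variant evaluates the network only on every k-th frame and copies those posteriors to the skipped frames, trading a little accuracy for roughly a k-fold cut in net-forward cost. A minimal sketch; `net_forward` stands for any (frames x dims) -> (frames x states) scorer:

  import numpy as np

  def forward_with_skipping(net_forward, feats, skip=2):
      """Run the net on every `skip`-th frame and reuse its output in between."""
      kept = feats[::skip]                      # frames actually evaluated
      post_kept = net_forward(kept)             # one forward pass for all of them
      out = np.repeat(post_kept, skip, axis=0)  # duplicate rows for skipped frames
      return out[: len(feats)]                  # trim the tail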

BigLM optimization

  • Investigate BigLM retrieval optimization.
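
Which retrieval optimization is meant is not stated; one obvious candidate, sketched below with a toy backoff bigram, is memoising n-gram lookups, since on-the-fly big-LM rescoring queries the LM repeatedly with the same (history, word) pairs. All tables and scores here are made up for illustration:

  from functools import lru_cache

  # Toy backoff-bigram tables; a real BigLM is orders of magnitude larger.
  BIGRAM = {("how", "are"): -0.4}
  UNIGRAM = {"how": -1.8, "are": -2.0}
  BACKOFF = {"how": -0.3}

  @lru_cache(maxsize=100000)
  def score(prev, word):
      """Memoised backoff lookup: decoding asks for the same (history, word)
      pair many times, so caching avoids repeated table walks."""
      if (prev, word) in BIGRAM:
          return BIGRAM[(prev, word)]
      return BACKOFF.get(prev, 0.0) + UNIGRAM.get(word, -5.0)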