2014-08-01

From cslt Wiki
Revision as of 01:52, 1 August 2014 (Fri) by Cslt (talk | contribs)


Resource Building

Leftover questions

  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
  • NN LM

AM development

Sparse DNN

  • On WSJ, the sparse DNN performs slightly better than the non-sparse DNN when the network is large
  • Pre-training helps DNN training (for 4-, 5- and 6-layer networks)

Noise training

  • Journal paper writing is ongoing


Multilingual ASR

  • Combining native English speakers and Chinglish speakers obtained better performance.


Drop out & convolutional network

  • After changing the learning rate to 0.001, the training process can start.
  • Frame accuracy goes to: (with/without drop probability normalization)
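
The "with/without drop probability normalization" comparison can be illustrated with a minimal sketch. It assumes that "normalization" means scaling the surviving activations by 1/(1 - drop probability), which is the common inverted-dropout convention; the note itself does not define the term, and the function name `dropout` and its arguments are hypothetical.

```python
import random


def dropout(activations, drop_prob, normalize, rng):
    """Zero each activation independently with probability drop_prob.

    If normalize is True, surviving activations are scaled by
    1 / (1 - drop_prob) so that the expected output equals the input
    (assumed meaning of "drop probability normalization").
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_prob) if normalize else 1.0
    return [a * scale if rng.random() >= drop_prob else 0.0
            for a in activations]


rng = random.Random(0)
acts = [1.0] * 100000

# Mean activation after dropout, without and with normalization.
mean_plain = sum(dropout(acts, 0.5, False, rng)) / len(acts)
mean_norm = sum(dropout(acts, 0.5, True, rng)) / len(acts)
print("mean without normalization:", round(mean_plain, 3))
print("mean with normalization:   ", round(mean_norm, 3))
```

Without the scaling, the average activation seen in training shrinks by the keep probability, so the next layer sees a different input scale at test time unless the weights are rescaled afterwards.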


Denoising & Farfield ASR

  • Tuning the late-response lag and response time parameters yielded a performance improvement with Lasso.
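
One possible reading of the Lasso setup is delayed linear prediction with an L1 penalty: the late response is predicted from samples at least `lag` steps in the past, over `taps` delays, and subtracted. The sketch below follows that reading on a single real-valued sequence and solves the Lasso with plain iterative soft-thresholding. The mapping of lag and response time to `lag` and `taps`, the solver, and the synthetic echo signal are all assumptions for illustration, not the method actually used in the experiments.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def delayed_rows(signal, lag, taps):
    """Row t holds signal[t-lag], ..., signal[t-lag-taps+1] (zero-padded)."""
    return [[signal[t - lag - k] if t - lag - k >= 0 else 0.0
             for k in range(taps)]
            for t in range(len(signal))]


def residuals(signal, rows, weights):
    """Signal minus the predicted late response."""
    return [s - sum(w * r for w, r in zip(weights, row))
            for s, row in zip(signal, rows)]


def cost(signal, rows, weights, lam):
    """Lasso objective: 0.5 * squared error + lam * L1 norm of the weights."""
    err = residuals(signal, rows, weights)
    return 0.5 * sum(e * e for e in err) + lam * sum(abs(w) for w in weights)


def lasso_late_response(signal, lam=0.05, lag=5, taps=10, iterations=200):
    """Estimate sparse prediction weights by iterative soft-thresholding."""
    rows = delayed_rows(signal, lag, taps)
    # The squared Frobenius norm bounds the largest eigenvalue of X^T X,
    # so 1 / frob is a safe (conservative) step size.
    frob = sum(r * r for row in rows for r in row)
    step = 1.0 / frob if frob > 0 else 0.0
    weights = [0.0] * taps
    for _ in range(iterations):
        err = residuals(signal, rows, weights)
        grad = [-sum(e * row[k] for e, row in zip(err, rows))
                for k in range(taps)]
        weights = [soft_threshold(w - step * g, step * lam)
                   for w, g in zip(weights, grad)]
    return weights, residuals(signal, rows, weights), rows


# Synthetic example: impulse train with a recursive echo 7 samples later.
dry = [1.0 if t % 17 == 0 else 0.0 for t in range(200)]
signal = []
for t, d in enumerate(dry):
    signal.append(d + (0.5 * signal[t - 7] if t >= 7 else 0.0))

weights, residual, rows = lasso_late_response(signal)
start_cost = cost(signal, rows, [0.0] * len(weights), 0.05)
final_cost = cost(signal, rows, weights, 0.05)
print("objective before:", round(start_cost, 3), "after:", round(final_cost, 3))
```

In this reading, a larger `lag` protects the direct sound and early reflections from being cancelled, while `taps` controls how much of the reverberation tail the predictor can cover.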

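Simulation results:

Baseline:

                  model/test              |  far_evl92   |   near_evl92
--------------------------------------------------------------------------
                  clean_ce                |     59.38    |    19.25
                mpe_clean_ce              |     40.46    |    12.94

Lasso with optimal parameters (lambda=0.05, delta=5, N=10):

                  model/test              |  far_evl92   |   near_evl92
--------------------------------------------------------------------------
                  clean_ce                |     54.63    |    15.75
                mpe_clean_ce              |     36.58    |    11.64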
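
The size of the improvement in the simulation tables can be made explicit as a relative reduction. The sketch below assumes the table entries are error rates in percent (lower is better), which the tables do not state; the helper name `relative_reduction` is hypothetical.

```python
# Simulation results copied from the two tables above.
baseline = {
    ("clean_ce", "far_evl92"): 59.38,
    ("clean_ce", "near_evl92"): 19.25,
    ("mpe_clean_ce", "far_evl92"): 40.46,
    ("mpe_clean_ce", "near_evl92"): 12.94,
}
lasso = {
    ("clean_ce", "far_evl92"): 54.63,
    ("clean_ce", "near_evl92"): 15.75,
    ("mpe_clean_ce", "far_evl92"): 36.58,
    ("mpe_clean_ce", "near_evl92"): 11.64,
}


def relative_reduction(before, after):
    """Relative reduction in percent, assuming lower values are better."""
    return 100.0 * (before - after) / before


reductions = {key: relative_reduction(baseline[key], lasso[key])
              for key in baseline}

for (model, test_set), value in reductions.items():
    print(f"{model:>14} {test_set:>11}: {value:.2f}% relative reduction")
```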

Real data results:

                  model/test              |  far_evl92   |   near_evl92
--------------------------------------------------------------------------
                  clean_ce                |     94.86    |    63.48
                mpe_clean_ce              |     92.29    |    58.37

Dereverberated recording:

                  model/test              |  far_evl92   |   near_evl92
--------------------------------------------------------------------------
                  clean_ce                |     94.91    |    61.03
                mpe_clean_ce              |     91.28    |    54.16


  • Adaptation is running


VAD

  • Waiting for testing results


Scoring

  • Refined the acoustic model with the AMIDA database. The problem was solved by using both WSJ and AMIDA.

Confidence

  • Getting familiar with Kaldi
  • Need to extract lattice and DNN features

Embedded decoder

  • Chatting LM released (80k)
  • Training two smaller networks (500x4+600 and 400x4+500): ongoing
  • Need to upload the new client code onto git (+)
  • Build a new graph with the MPE3 AM and the chatting LM.


LM development

Domain specific LM

Domain specific LM construction

TAG LM

  • The TAG LM obtained better performance

Chatting LM

  • First version released (80k lexicon)
  • Preparing the second release (120k lexicon)
  • Test on Xiaotang long


Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.


Speaker ID

  • The full-data SRE trial is entering its final stage
  • Results will be ready soon


Translation

  • Collecting more data (Xinhua parallel text, Bible, named entities) for the second version
  • Checking possible parameters to control the phrase-pair lexicon