2014-08-01


Resource Building

Leftover questions

  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
  • NN LM
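The LOUDS item above refers to a succinct data structure: LOUDS (Level-Order Unary Degree Sequence) stores a tree as a bit vector ("10" for a super-root, then 1^degree 0 per node in BFS order) and navigates it with rank/select. A minimal sketch of the idea (naive rank/select, not the bit-packed structure a real succinct FST would use; the tree and function names are illustrative):

```python
def rank1(bits, i):
    """Number of 1-bits in bits[0..i] inclusive."""
    return sum(bits[:i + 1])

def select0(bits, j):
    """Position of the j-th 0-bit (1-indexed)."""
    seen = 0
    for pos, b in enumerate(bits):
        if b == 0:
            seen += 1
            if seen == j:
                return pos
    raise ValueError("fewer than j zero bits")

def first_child(bits, x):
    """Nodes are identified by the position of the 1-bit their parent
    emitted for them; returns the first child's position, or None."""
    y = select0(bits, rank1(bits, x)) + 1
    return y if y < len(bits) and bits[y] == 1 else None

# Tree: root -> {A, B}, A -> {C}.  BFS degrees 2, 1, 0, 0 give the bits:
bits = [1, 0,  1, 1, 0,  1, 0,  0,  0]
print(first_child(bits, 0))   # 2: root's first child (A)
print(first_child(bits, 2))   # 5: A's child (C)
print(first_child(bits, 3))   # None: B is a leaf
```

An FST is a graph rather than a tree, so a LOUDS FST needs extra arrays for labels and non-tree arcs, but the rank/select navigation above is the core trick.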

AM development

Sparse DNN

  • The WSJ sparse DNN performs slightly better than the non-sparse case when the network is large-scale.
  • Pre-training does help DNN training (for 4-, 5-, and 6-layer networks).
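A common way to obtain a sparse DNN from a trained dense network is magnitude pruning: zero out the smallest-magnitude weights and keep a mask so they stay zero during retraining. A minimal sketch (the function name and sparsity level are illustrative, not the group's actual recipe):

```python
import numpy as np

def prune_weights(W, sparsity=0.8):
    """Zero out the `sparsity` fraction of smallest-magnitude weights.
    Returns the sparse matrix and the keep-mask for masked retraining."""
    k = int(W.size * sparsity)              # number of weights to drop
    if k == 0:
        return W.copy(), np.ones_like(W, dtype=bool)
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    mask = np.abs(W) > thresh               # keep weights above the cutoff
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))         # one dense layer's weights
W_sparse, mask = prune_weights(W, sparsity=0.8)
print("kept fraction:", mask.mean())        # ~0.2
```

During retraining, gradients are simply multiplied by `mask` after each update so pruned connections never revive.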

Noise training

  • Journal paper writing is ongoing.


Multilingual ASR

  • Combining native English speaker data with Chinglish speaker data obtained better performance.


Drop out & convolutional network

  • After changing the learning rate to 0.001, the training process can be started.
  • Frame accuracy (with/without drop probability normalization):
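The "drop probability normalization" being compared is what is usually called inverted dropout: kept activations are scaled by 1/(1 - p_drop) at training time, so their expectation matches the no-dropout network and no rescaling is needed at test time. A minimal sketch of the two variants (names are illustrative):

```python
import numpy as np

def dropout(h, p_drop, normalize=True, rng=None):
    """Apply dropout to activations h. With normalize=True the kept units
    are scaled by 1/(1 - p_drop), keeping the expected activation equal
    to that of the no-dropout network."""
    rng = rng or np.random.default_rng()
    keep = rng.random(h.shape) >= p_drop    # Bernoulli keep-mask
    out = h * keep
    if normalize:
        out = out / (1.0 - p_drop)
    return out

h = np.ones(10000)
out_norm = dropout(h, p_drop=0.5, normalize=True, rng=np.random.default_rng(1))
out_raw = dropout(h, p_drop=0.5, normalize=False, rng=np.random.default_rng(1))
print(out_norm.mean(), out_raw.mean())   # ~1.0 vs ~0.5
```

Without normalization, all activations must instead be scaled by (1 - p_drop) at test time to compensate.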


Denoising & Farfield ASR

  • By tuning the late-response lag and response-time parameters, a performance improvement was obtained with Lasso.
Simulation results (baseline):
--------------------------------------------------------------------------
            model/test            |  far_evl92   |  near_evl92
--------------------------------------------------------------------------
            clean_ce              |    59.38     |    19.25
            mpe_clean_ce          |    40.46     |    12.94
--------------------------------------------------------------------------

Lasso with optimal parameters (lambda=0.05, delta=5, N=10):
--------------------------------------------------------------------------
            model/test            |  far_evl92   |  near_evl92
--------------------------------------------------------------------------
            clean_ce              |    54.63     |    15.75
            mpe_clean_ce          |    36.58     |    11.64
--------------------------------------------------------------------------

Real data results:
--------------------------------------------------------------------------
            model/test            |  far_evl92   |  near_evl92
--------------------------------------------------------------------------
            clean_ce              |    94.86     |    63.48
            mpe_clean_ce          |    92.29     |    58.37
--------------------------------------------------------------------------

Dereverberated recordings:
--------------------------------------------------------------------------
            model/test            |  far_evl92   |  near_evl92
--------------------------------------------------------------------------
            clean_ce              |    94.91     |    61.03
            mpe_clean_ce          |    91.28     |    54.16
--------------------------------------------------------------------------
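The Lasso approach above can be sketched as sparse regression on delayed frames: the late reverberation in a frame is predicted from the N frames starting delta frames back, and the prediction is subtracted. The sketch below uses the reported optimum (lambda=0.05, delta=5, N=10) on a synthetic 1-D magnitude envelope; the data, the tiny coordinate-descent solver, and all names are illustrative, not the actual system:

```python
import numpy as np

def lasso_cd(A, y, lam, n_iter=200):
    """Tiny coordinate-descent Lasso: min ||Aw - y||^2/(2m) + lam*||w||_1."""
    m, n = A.shape
    w = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0) / m
    for _ in range(n_iter):
        for j in range(n):
            r = y - A @ w + A[:, j] * w[j]        # residual excluding tap j
            rho = A[:, j] @ r / m
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
T, delta, N, lam = 400, 5, 10, 0.05               # reported optimum
clean = np.abs(rng.standard_normal(T))            # clean magnitude envelope
h = 0.6 ** np.arange(N)                           # simulated late-reverb decay
reverb = clean.copy()
for k in range(N):                                # add smeared late energy
    reverb[delta + k:] += 0.5 * h[k] * clean[:T - delta - k]

# Design matrix: column k is the observed frame delta+k steps back.
t0 = delta + N - 1
A = np.column_stack([reverb[t0 - delta - k: T - delta - k] for k in range(N)])
y = reverb[t0:] - clean[t0:]                      # oracle late-reverb target

w = lasso_cd(A, y, lam)
dereverb = np.maximum(reverb[t0:] - A @ w, 0.0)   # subtract prediction, floor at 0
err_before = np.abs(reverb[t0:] - clean[t0:]).mean()
err_after = np.abs(dereverb - clean[t0:]).mean()
print(err_before, err_after)                      # error drops after subtraction
```

The L1 penalty keeps only a few active taps, which is why the lag delta and window length N matter: they fix which delayed frames the sparse filter may draw on.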

  • Adaptation is in progress.


VAD

  • Waiting for testing results


Scoring

  • Refined the acoustic model with the AMIDA database; the problem was solved by using both WSJ and AMIDA data.

Confidence

  • Getting familiar with Kaldi
  • Need to extract lattice and DNN features

Embedded decoder

  • Chatting LM released (80k)
  • Training two smaller networks (500x4+600, 400x4+500): ongoing
  • Need to upload the new client code onto git (+)
  • Build a new graph with the MPE3 AM and the chatting LM.


LM development

Domain specific LM construction

TAG LM

  • The TAG LM obtained better performance.
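A TAG LM is a class-based LM: entity words are collapsed into a class tag before n-gram training, and p(word | history) factors as p(tag | history) * p(word | tag), so an entity never seen in that context still inherits the tag's probability. A minimal bigram sketch (the tag lexicon, corpus, and function names are illustrative):

```python
from collections import Counter, defaultdict

# Hypothetical tag lexicon: city names collapse into one class tag.
CITY = {"beijing", "shanghai", "tianjin"}
TAG = {w: "<city>" for w in CITY}

corpus = "i live in beijing . i work in shanghai .".split()
tagged = [TAG.get(w, w) for w in corpus]      # entity words -> <city>

bigram = defaultdict(Counter)                 # counts over tagged text
for a, b in zip(tagged, tagged[1:]):
    bigram[a][b] += 1

def p(word, prev):
    """p(word | prev) = p(class | prev) * p(word | class)."""
    cls = TAG.get(word, word)
    total = sum(bigram[prev].values())
    p_cls = bigram[prev][cls] / total if total else 0.0
    p_in = 1.0 / len(CITY) if word in CITY else 1.0   # uniform within the tag
    return p_cls * p_in

print(p("tianjin", "in"))   # unseen city still scores via the <city> tag
```

Here "tianjin" never appears in the training text, yet it gets the full p(<city> | "in") mass shared uniformly across the city lexicon; a word-level bigram would assign it zero.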

Chatting LM

  • First version released (80k lexicon)
  • Preparing the 2nd release (120k lexicon)
  • Test on Xiaotang long


Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained; performance is not as good as the conventional GMM.
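The pipeline being compared can be sketched as: average word2vec vectors into a document vector, fit one Gaussian model per class, and classify by likelihood. The toy below uses random stand-in embeddings and a single diagonal Gaussian per class as a 1-component stand-in for the GMM; all data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for word2vec: words of each topic cluster around a base vector.
base = {"finance": rng.standard_normal(16), "sports": rng.standard_normal(16)}
vocab = {}
for w in ["stock", "market", "price", "bank"]:
    vocab[w] = base["finance"] + 0.1 * rng.standard_normal(16)
for w in ["goal", "match", "team", "player"]:
    vocab[w] = base["sports"] + 0.1 * rng.standard_normal(16)

def doc_vec(words):
    """Document vector = mean of its word vectors."""
    return np.mean([vocab[w] for w in words], axis=0)

def fit(docs):
    """Diagonal Gaussian per class (1-component stand-in for the GMM)."""
    X = np.stack([doc_vec(d) for d in docs])
    return X.mean(axis=0), X.var(axis=0) + 1e-3   # variance floor

def loglik(x, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

finance_docs = [["stock", "market"], ["price", "bank"], ["stock", "price"]]
sports_docs = [["goal", "match"], ["team", "player"], ["match", "team"]]
models = {"finance": fit(finance_docs), "sports": fit(sports_docs)}

def classify(words):
    x = doc_vec(words)
    return max(models, key=lambda c: loglik(x, *models[c]))

print(classify(["bank", "market"]))   # finance
```

A full (variational or conventional) GMM replaces `fit` with a mixture fit per class; the document-vector and likelihood-scoring steps stay the same.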


Speaker ID

  • The full-data SRE trial is in its final stage
  • Results will be ready soon


Translation

  • Collecting more data (Xinhua parallel text, the Bible, named entities) for the second version
  • Checking possible parameters to control the phrase-pair lexicon