Difference between revisions of “ASR:2014-12-15”

From cslt Wiki
跳转至: 导航搜索
Lr (talk | contributions)
(→Text Processing)
Line 114: Line 114:
  * domain lm
  :* Sougou2T : kn-count continue .
− :* lm v2.0 set up('''this week''')
+ :* lm v2.0 done,just to test the wer.

  *  new dict.
− :* Released vocab v2.0 (mainly done by Dongxu) to JieTong.
− ::* using minimum size segmentation and artificial add the long word(like 中华人民共和国)
− :* check the v2.0-dict with small data.

  ====tag LM====

Line 125: Line 122:
  * need to do
  :* tag Probability should test add the weight(hanzhenglong) and handover to hanzhenglong (hold)
− :* make a summary about tag-lm and '''journal paper'''(wxx and yuanb)('''this weeks''').
+ ::* paper done,begin to modify .
− ::* Reviewed papers and begin to write paper ('''this week''')
+

  ====RNN LM====

Line 143: Line 139:
  ====Knowledge vector====
  * Knowledge vector started
− :* Analysis the wiki infomation of category and link into jso done, knowledge vector build graph done.
+ :* code done,to test the baseline with a task.
− :* begin to code for train
+ :* problem with weight.
  ====relation====
  * Accomplish transE with almost the same performance as the paper did(even better)[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=lr&step=view_request&cvssid=316:result]

Line 160: Line 156:
  deatil:
  ====Spell mistake====
− * add the xiaoI pingyin correct to framework.
  ====improve fuzzy match====
  * add Synonyms similarity using MERT-4 method(hold)
  ====improve lucene search====
− :* using MERT-4 method to get good value of multi-feature.like IDF,NER,baidu_weight,keyword etc.('''liurong this month''')
+ :* mutli query's performance improve from 66.228 to 68.672. detail:[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Improved_search_method_by_rewrited_query]
− :* now test the performance.
+ :* check the MERT problem that doesn't mach the qa
  ====Multi-Scene Recognition====
  * done
  ====XiaoI framework====
− * ner from xiaoI
+ * ner from xiaoI done
+ ====query normalization====
+ * using NER to normalize the word
+
  * new inter will install SEMPRE

  ====patent====
  * done

Revision as of 11:37, 15 December 2014 (Monday)

Speech Processing

AM development

Environment

  • Already bought three 760 GPUs.
  • The 760 GPUs in grid-9 and grid-12 crashed again; grid-11 shut down automatically.
  • Replace the 760 GPU cards in grid-12 and grid-14 (+).
  • First, lower the clock frequency of the 760 GPUs.

Sparse DNN

RNN AM

  • The initial nnet does not perform well; it needs pre-training, or a lower learning rate should be tested.
  • AURORA 4: about 1 h per epoch; model training done.
  • Using AURORA 4 short sentences with a smaller number of targets (+).
  • Adjusting the learning rate (+).
  • Trying Microsoft's toolkit (+).
  • details at http://liuc.cslt.org/pages/rnnam.html
  • Reading papers

A new nnet training scheduler

Dropout & Maxout & Convolutive network

  • Dropout (+)
    • Use different proportions of noisy data to investigate the effects of xEnt, MPE and dropout.
      • Questions:
        1) the effect of dropout at different noise proportions;
        2) the effect of MPE at different noise proportions;
        3) the effect of MPE+dropout at different noise proportions.
      • Dropout is effective when noisy data is the minority.
      • Find and test unknown-noise test data (++).
  • Maxout
    • Pretraining-based maxout cannot use a large learning rate.
  • P-norm
    • Need to solve the problem of the learning rate being too small:
      • Add one normalization layer after the p-norm layer.
      • Add an L2-norm upper bound.
  • Convolutive network (+)
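The two remedies listed for the p-norm learning-rate problem can be illustrated with a minimal pure-Python sketch. It assumes the common group p-norm nonlinearity y_j = (sum over group j of |x_i|^p)^(1/p), an RMS-style normalization layer, and a per-row max-norm clip as the "L2-norm upper bound". The function names and the choice of RMS normalization are hypothetical illustrations, not the group's actual nnet code.

```python
import math


def pnorm(x, group_size, p=2.0):
    """Group p-norm pooling: output j is the p-norm of the j-th group of inputs."""
    if len(x) % group_size != 0:
        raise ValueError("input size must be a multiple of group_size")
    return [
        sum(abs(v) ** p for v in x[i:i + group_size]) ** (1.0 / p)
        for i in range(0, len(x), group_size)
    ]


def normalize_rms(y, eps=1e-12):
    """Rescale y so that its root-mean-square is 1 (one choice of normalization layer)."""
    if not y:
        return []
    rms = math.sqrt(sum(v * v for v in y) / len(y))
    return [v / (rms + eps) for v in y]


def clip_row_norm(w_row, max_norm):
    """L2-norm upper bound: if a weight row is longer than max_norm, scale it back."""
    norm = math.sqrt(sum(v * v for v in w_row))
    if norm <= max_norm:
        return list(w_row)
    return [v * max_norm / norm for v in w_row]


hidden = normalize_rms(pnorm([3.0, 4.0, 0.0, 0.0], group_size=2))
print(hidden)
print(clip_row_norm([3.0, 4.0], max_norm=1.0))
```

Here `pnorm([3, 4, 0, 0], 2)` pools two groups into `[5.0, 0.0]`; the normalization step then fixes the overall scale seen by the next layer, and the clip rescales `[3, 4]` to about `[0.6, 0.8]` so weight rows cannot grow without bound.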

DAE (Deep Auto-Encoder)

 (1) Training set: train_clean
   Model / test set (WER, %)    | test_clean_wv1 | test_airport_wv1 | test_babble_wv1 | test_car_wv1
   -----------------------------+----------------+------------------+-----------------+-------------
   std-xEnt-sigmoid-baseline    | 6.04           | 29.91            | 27.76           | 16.37
   std+dae_cmvn_noFT_2-1200     | 7.10           | 15.33            | 16.58           | 9.23
   std+dae_cmvn_splice5_2-100   | 8.19           | 15.21            | 15.25           | 9.31
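To read the table as relative gains, the WER values can be converted into relative WER reductions against the sigmoid baseline. This small sketch copies the numbers from the table and computes 100 × (baseline − system) / baseline per test set; the helper names are illustrative only.

```python
# WER (%) copied from the train_clean table above.
TEST_SETS = ["test_clean_wv1", "test_airport_wv1", "test_babble_wv1", "test_car_wv1"]
BASELINE = "std-xEnt-sigmoid-baseline"
WER = {
    BASELINE: [6.04, 29.91, 27.76, 16.37],
    "std+dae_cmvn_noFT_2-1200": [7.10, 15.33, 16.58, 9.23],
    "std+dae_cmvn_splice5_2-100": [8.19, 15.21, 15.25, 9.31],
}


def relative_reduction(base, system):
    """Relative WER reduction in percent; negative means the system is worse."""
    return 100.0 * (base - system) / base


def reduction_table(wer, baseline):
    """Relative reduction of every non-baseline row, rounded to one decimal."""
    base = wer[baseline]
    return {
        name: [round(relative_reduction(b, s), 1) for b, s in zip(base, row)]
        for name, row in wer.items()
        if name != baseline
    }


for name, gains in reduction_table(WER, BASELINE).items():
    print(name, dict(zip(TEST_SETS, gains)))
```

On the three noisy sets both DAE front-ends reduce WER by roughly 40% to 49% relative, while the clean-set WER gets worse (relative changes of -17.5% and -35.6%).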

Denoising & Farfield ASR

  • ICASSP paper submitted.
  • HOLD

VAD

  • Harmonic and Teager energy features under investigation (++)
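The Teager energy feature is commonly built on the discrete Teager-Kaiser operator psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below applies it to raw samples and averages it per frame. The framing (default 200-sample frames with an 80-sample hop, i.e. 25 ms / 10 ms at 8 kHz) and the use of the frame mean as the VAD feature are assumptions for illustration, not the configuration used in these experiments.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]**2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_teager_energy(x, frame_len=200, hop=80):
    """Mean Teager energy of each frame (hypothetical frame-level VAD feature)."""
    if frame_len < 3:
        raise ValueError("frame_len must be at least 3 samples")
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        psi = teager_energy(x[start:start + frame_len])
        feats.append(sum(psi) / len(psi))
    return feats


# A 0.3 rad/sample tone of amplitude 2: every frame should give 4 * sin(0.3)**2.
tone = [2.0 * math.cos(0.3 * n + 0.1) for n in range(400)]
print(frame_teager_energy(tone, frame_len=200, hop=100))
```

For a pure tone A*cos(W*n + phi) the operator returns exactly A^2 * sin^2(W) at every sample, so the feature rises with both amplitude and frequency, whereas plain frame energy depends on amplitude only.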

Speech rate training

  • Data ready on the Tencent set; errors in the speech-rate-dependent model have been fixed.
  • Retrain a new model (+).

Scoring

  • Timbre comparison done.
  • Harmonics-based timbre comparison: frequency-based features are better. Done.
  • GMM-based timbre comparison, similar to speaker recognition. Done.
  • TODO: code check-in and technical report. Done.

Confidence

  • Reproduce the experiments on the Fisher dataset.
  • Use the Fisher DNN model to decode the all-WSJ dataset.
  • Preparing scoring for the puqiang data.
  • HOLD

Speaker ID

  • Preparing a GMM-based server.
  • EER ~4% (GMM-based system), text-independent.
  • EER ~6% (1 s) / 0.5% (5 s) (GMM-based system), text-dependent.
  • Test different numbers of components; fast i-vector computation.
  • Tested with recordings of spoken numbers: 256 components is the best.
  • Tested with text-dependent recordings: 1024 components is the best.
  • Results are not sensitive to the interpolation weight alpha.
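The EER figures above mark the operating point where the false-acceptance and false-rejection rates are equal. A minimal sketch of how such a number can be computed from verification scores by sweeping the decision threshold; the scores in the example are made up, and more careful tools may interpolate between operating points instead of taking the closest observed threshold.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over all observed scores.

    A trial is accepted when score >= threshold. FRR is the fraction of target
    trials rejected and FAR the fraction of non-target trials accepted; the EER
    is taken as the mean of the two rates at the threshold where they are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need both target and non-target scores")
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up scores: one target trial (0.4) scores below one non-target trial (0.6).
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))  # 0.25
```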

Language ID

  • The GMM-based language ID system is ready.
  • Delivered to Jietong.
  • Prepare the test cases.

Voice Conversion

  • Yiye is reading materials(+)


Text Processing

LM development

Domain specific LM

  • Domain LM
  • Sougou2T: KN counting continues.
  • LM v2.0 done; WER still to be tested.
  • New dictionary.
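The "kn-count" step gathers the statistics Kneser-Ney smoothing needs. Besides ordinary n-gram counts, KN replaces raw lower-order counts with continuation counts N1+(• w), the number of distinct words that precede w. The toy in-memory sketch below covers only the bigram-to-unigram case; the function names are hypothetical and this is not the counting pipeline being run on Sougou2T.

```python
from collections import defaultdict


def continuation_counts(tokens):
    """N1+(. w): number of distinct words seen immediately before each word w."""
    left_neighbours = defaultdict(set)
    for prev, word in zip(tokens, tokens[1:]):
        left_neighbours[word].add(prev)
    return {w: len(prevs) for w, prevs in left_neighbours.items()}


def continuation_unigram(tokens):
    """Lower-order KN distribution: N1+(. w) / number of distinct bigram types."""
    counts = continuation_counts(tokens)
    num_bigram_types = sum(counts.values())  # each bigram type adds 1 to one word
    return {w: c / num_bigram_types for w, c in counts.items()}


tokens = "san francisco is near san francisco bay".split()
print(tokens.count("francisco"), continuation_counts(tokens)["francisco"])  # 2 1
```

"francisco" occurs twice but only ever after "san", so its continuation count stays at 1; this is what keeps such words from getting a large lower-order probability in unseen contexts.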

tag LM

  • Summary done.
  • To do:
  • Test adding the weight to the tag probability (hanzhenglong), then hand over to hanzhenglong (hold).
  • Paper draft done; revision started.

RNN LM

  • RNN
  • Test the WER of the RNNLM on Chinese data from jietong-data (this week).
  • Generate an n-gram model from the RNNLM and test its perplexity on texts of different sizes [1].
  • LSTM+RNN
  • Check how the LSTM-RNNLM code initializes and updates the learning rate (hold).
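Comparing the RNNLM-derived n-gram model on texts of different sizes relies on perplexity, PPL = exp(-(1/N) * sum of log p(w_i | history)). A minimal sketch, assuming the per-token probabilities have already been obtained from the model; LM toolkits may differ in whether end-of-sentence tokens and OOVs are counted in N, which this sketch ignores.

```python
import math


def perplexity(token_probs):
    """PPL = exp(-(1/N) * sum_i log p_i) over the N scored tokens.

    token_probs holds the probability the model assigned to each token of the
    test text, so the average is taken in log space for numerical stability.
    """
    if not token_probs:
        raise ValueError("no tokens to score")
    if any(p <= 0.0 for p in token_probs):
        raise ValueError("every scored token needs a positive probability")
    avg_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-avg_log_prob)


# A model that is uniform over 4 words gives perplexity 4 (up to rounding).
print(perplexity([0.25] * 8))
```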

Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained; performance is not as good as the conventional GMM (hold).
  • Non-linear inter-language transform (English-Spanish-Czech): word-vector model training done; transform model under investigation.

Knowledge vector

  • Knowledge vector started
  • Code done; next, test the baseline on a task.
  • Problem with the weights.

relation

  • Implemented TransE with almost the same performance as reported in the paper (even better) [2].
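TransE treats a relation as a translation in embedding space, so a true triple (h, r, t) should satisfy h + r ≈ t. A minimal sketch of the scoring function and the margin ranking loss used to train it, with toy 2-D embeddings; the vectors, margin and helper names are illustrative and not taken from the reported experiment.

```python
import math


def transe_distance(h, r, t, norm=1):
    """TransE dissimilarity d(h, r, t) = ||h + r - t|| under the L1 or L2 norm."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diff)
    if norm == 2:
        return math.sqrt(sum(d * d for d in diff))
    raise ValueError("norm must be 1 or 2")


def margin_loss(pos, neg, margin=1.0, norm=1):
    """Margin ranking loss max(0, margin + d(pos) - d(neg)) for one triple pair."""
    return max(0.0, margin + transe_distance(*pos, norm=norm)
               - transe_distance(*neg, norm=norm))


# Toy 2-D embeddings: h + r lands exactly on t, so the true triple scores 0.
h, r, t = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
corrupted_tail = [1.0, 0.5]
print(transe_distance(h, r, t))                       # 0.0
print(margin_loss((h, r, t), (h, r, corrupted_tail)))  # 0.5
```

In training, the corrupted triple is formed by replacing the head or tail with a random entity, and the loss is minimized with SGD; the original TransE method also renormalizes entity embeddings to unit length.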

Character to word

  • Character-to-word conversion (hold)
  • Prepare the task: word similarity.
  • Prepare the dictionary.

Translation

  • v5.0 demo released
  • Trim the dictionary and use the new segmentation tool.

QA

Details:

Spell mistake

improve fuzzy match

  • Add synonym similarity using the MERT-4 method (hold)

improve lucene search

  • Multi-query performance improved from 66.228 to 68.672. Details: [3]
  • Check the MERT problem where the results do not match the QA.

Multi-Scene Recognition

  • done

XiaoI framework

  • NER from XiaoI done.

query normalization

  • Using NER to normalize words.
  • A new intern will install SEMPRE.

patent

  • done