2014-10-20

From cslt Wiki
Latest revision as of 07:23, 20 October 2014

Speech Processing

AM development

Contour

  • NaN problem
  • NaN recurrence check
  ------------------------------------------------------------
   grid node  |   reproducible  |    notes
  ------------------------------------------------------------
   grid-10    |     yes         |
  ------------------------------------------------------------
   grid-12    |     no          | NaN appears at a different position
  ------------------------------------------------------------
   grid-14    |     yes         |
  ------------------------------------------------------------

Sparse DNN

  • Performance improvement found when pruned slightly
  • Experiments show that
  • Suggest to use TIMIT / AURORA 4 for training

RNN AM

  • Initial test on WSJ leads to out-of-memory errors.
  • Using AURORA 4 short sentences with a smaller number of targets.

Noise training

  • First draft of the noisy training journal paper
  • Paper correction (Yinshi, Liuchao, Lin Yiye) is ongoing.

Drop out & Rectification & convolutive network

  • Drop out (a dropout sketch follows this list)
  • dataset: WSJ, test set: eval92 (WER%)
       std |  dropout0.4 | dropout0.5 | dropout0.7 | dropout0.8
    -------------------------------------------------------------
       4.5 |     5.39    |    4.80    |   4.36     |    -
  • Test on the noisy AURORA4 dataset (WER%)
       std |  dropout0.4 | dropout0.5 | dropout0.7 | dropout0.8
    -------------------------------------------------------------
      6.05 |     -       |    -       |   -        |   -
  • Continue the dropout on a normally trained XEnt NNET, e.g. WSJ. (+)
  • Draft the dropout-DNN weight distribution. (+)
  • Rectification
  • Still NaN error, needs debugging. (+)
  • MaxOut (+)
  • Convolutive network
  • Test more configurations
  • Yiye will work on CNN
  • Reading the CNN tutorial
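
A minimal NumPy sketch (not the actual Kaldi nnet recipe) of how inverted dropout would be applied to one DNN hidden layer during training. Whether a value such as 0.4 in the tables denotes the drop probability or the keep probability depends on the tool's convention, so the dropout_p argument below is illustrative only.

 import numpy as np

 def hidden_layer_forward(x, W, b, dropout_p=0.5, training=True, rng=None):
     """Affine + sigmoid hidden layer with inverted dropout."""
     rng = rng or np.random.default_rng(0)
     h = 1.0 / (1.0 + np.exp(-(x @ W + b)))   # sigmoid activations
     if training and dropout_p > 0.0:
         keep = 1.0 - dropout_p
         mask = rng.random(h.shape) < keep    # keep each unit with prob (1 - p)
         h = h * mask / keep                  # rescale so test time needs no change
     return h

 # toy usage: 10 frames of 40-dim features through a 1024-unit layer
 x = np.random.randn(10, 40)
 W = 0.01 * np.random.randn(40, 1024)
 b = np.zeros(1024)
 h_train = hidden_layer_forward(x, W, b, dropout_p=0.4, training=True)
 h_test = hidden_layer_forward(x, W, b, training=False)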

Denoising & Farfield ASR

  • ICASSP paper submitted.

VAD

  • Add more silence tags "#" to the pure-silence utterance text (training set).
  • xEntropy model is being trained
  • Need to test the baseline.
  • Sum all sil-pdfs as the silence posterior probability (see the sketch after this list).
  • Program done; the threshold still needs tuning
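
A minimal sketch of the sil-pdf summation above: per frame, sum the DNN posteriors of all pdfs tied to silence and compare the sum with a threshold. The pdf indices and the threshold value are placeholders, not values from the actual system.

 import numpy as np

 def silence_decisions(posteriors, sil_pdf_ids, threshold=0.5):
     """posteriors: (num_frames, num_pdfs) DNN output; returns a bool per frame."""
     sil_post = posteriors[:, sil_pdf_ids].sum(axis=1)   # silence posterior per frame
     return sil_post > threshold

 # toy usage: 3 frames, 6 pdfs, pdfs 0 and 1 belong to silence
 post = np.array([[0.6, 0.2, 0.1, 0.05, 0.03, 0.02],
                  [0.1, 0.1, 0.5, 0.20, 0.05, 0.05],
                  [0.3, 0.3, 0.2, 0.10, 0.05, 0.05]])
 print(silence_decisions(post, sil_pdf_ids=[0, 1], threshold=0.5))  # [ True False  True]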

Speech rate training

  • The ROS (rate-of-speech) model seems superior to the normal one on faster speech
  • Suggest to extract speech data of different ROS and construct a new test set (+) (see the sketch after this list)
  • Suggest to use the Tencent training data (+)
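
A small sketch of one way the ROS-based test set could be constructed, assuming ROS is measured as phones per second of speech; both the definition and the bucket boundaries are assumptions for illustration only.

 def rate_of_speech(num_phones, speech_seconds):
     return num_phones / max(speech_seconds, 1e-6)

 def bucket(ros, slow_max=8.0, normal_max=12.0):
     if ros < slow_max:
         return "slow"
     return "normal" if ros < normal_max else "fast"

 # utt_id -> (phone count, speech duration in seconds); made-up numbers
 utts = {"utt1": (55, 6.0), "utt2": (120, 8.0), "utt3": (40, 5.5)}
 test_sets = {}
 for utt, (n_ph, dur) in utts.items():
     test_sets.setdefault(bucket(rate_of_speech(n_ph, dur)), []).append(utt)
 print(test_sets)   # {'normal': ['utt1'], 'fast': ['utt2'], 'slow': ['utt3']}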

low resource language AM training

  • Use the Chinese NN as the initial NN and change the last layer
  • Vary the number of Chinese-trained DNN hidden layers that are reused (see the sketch after the tables below).
    • feature_transform = 6000h_transform + 6000_N*hidden-layers
 nnet.init = random (4-N)*hidden-layers + output-layer
 | N / learn_rate | 0.008         | 0.001 | 0.0001 |
 |   baseline     | 17.00(14*2h)  |       |        |
 |       4        | 17.75(9*0.6h) | 18.64 |        |
 |       3        | 16.85         |       |        |
 |       2        | 16.69         |       |        |
 |       1        | 16.87         |       |        |
 |       0        | 16.88         |       |        |  
    • feature_transform = uyghur_transform + 6000_N*hidden-layers
 nnet.init = random (4-N)*hidden-layers + output-layer
 Note: This is reproduced Yinshi's experiment
 | N / learn_rate | 0.008 | 0.001 | 0.0001 |
 |   baseline     | 17.00 |       |        |
 |       4        | 28.23 | 30.72 | 37.32  |
 |       3        | 22.40 |       |        |
 |       2        | 19.76 |       |        |
 |       1        | 17.41 |       |        |
 |       0        |       |       |        |
    • feature_transform = 6000_transform + 6000_N*hidden-layers
 nnet.init = uyghur (4-N)*hidden-layers + output-layer
 | N / learn_rate | 0.008 | 0.001 | 0.0001 |
 |   baseline     | 17.00 |       |        |
 |       4        | 17.80 | 18.55 | 21.06  |
 |       3        | 16.89 | 17.64 |        |
 |       2        |       |       |        |
 |       1        |       |       |        |
 |       0        |       |       |        |
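
The tables above keep the first N hidden layers of an existing network (after the chosen feature transform) and randomly initialise the remaining (4-N) hidden layers plus the output layer. The sketch below shows that initialisation scheme at the matrix level; the layer sizes and the chinese_layers container are made up, and the real experiments build Kaldi nnet1 models.

 import numpy as np

 def init_transfer_net(chinese_layers, n_reuse, hidden_dim, out_dim, rng=None):
     """chinese_layers: list of (W, b) from the 6000h DNN, ordered bottom-up."""
     rng = rng or np.random.default_rng(0)
     layers = list(chinese_layers[:n_reuse])                 # reused Chinese layers
     for _ in range(len(chinese_layers) - n_reuse):          # (4 - N) random hidden layers
         layers.append((0.1 * rng.standard_normal((hidden_dim, hidden_dim)),
                        np.zeros(hidden_dim)))
     layers.append((0.1 * rng.standard_normal((hidden_dim, out_dim)),   # new output layer
                    np.zeros(out_dim)))
     return layers

 # placeholder 4-hidden-layer "Chinese" DNN with 1024-dim layers
 chinese = [(np.ones((1024, 1024)), np.zeros(1024)) for _ in range(4)]
 net = init_transfer_net(chinese, n_reuse=2, hidden_dim=1024, out_dim=3000)
 print(len(net))   # 5: 2 reused + 2 random hidden layers + output layer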

Scoring

  • Global scoring done.
  • Pitch & rhythm done; needs testing
  • Harmonics program done; experiments still to be run.

Confidence

  • Reproduce the experiments on the Fisher dataset.
  • Use the Fisher DNN model to decode the all-WSJ dataset


Speaker ID

  • Preparing GMM-based server.

Emotion detection

  • Sinovoice is implementing the server


Text Processing

LM development

Domain specific LM

  • The LM based on baidu_hi and baiduzhidao is done; tested on the shujutang test set.
  • The weibo LM was trained with count pruning (cutoffs 5,10,10,20,20) because the corpus is too large; its PPL is twice as high as that of baidu_hi and baidu_zhidao (see the count-pruning sketch after this list).
  • Dongxu got a good vocabulary from the big data. Trained a 5-gram LM on the Baiduzhidao corpus (~30 GB after preprocessing) with the new lexicon. There was a mistake when counting probabilities after the merge.
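
A sketch of the count cutoffs (5,10,10,20,20) used for the weibo LM: an n-gram is kept only if its count reaches the cutoff for its order. In practice the LM toolkit applies these cutoffs during estimation; this only illustrates the rule, on made-up counts.

 CUTOFFS = {1: 5, 2: 10, 3: 10, 4: 20, 5: 20}   # n-gram order -> minimum count

 def prune_counts(ngram_counts):
     """ngram_counts: dict mapping a tuple of words to its raw count."""
     return {ng: c for ng, c in ngram_counts.items()
             if c >= CUTOFFS[len(ng)]}

 counts = {("你好",): 120,
           ("你好", "吗"): 8,             # bigram below cutoff 10 -> dropped
           ("今天", "天气", "不错"): 15}   # trigram above cutoff 10 -> kept
 print(prune_counts(counts))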

tag LM

  • Use HIT's LTP tool for segmentation, POS tagging and NER. The program is running (about 3 days) on baiduHi and baiduzhidao (365 GB in total). A tag-LM scoring sketch follows this list.
  • Will use the small test set from xiaoxi for the address tag.
  • There are now more than 1M addresses; will prune them by frequency.
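
A sketch of how a tag LM scores a word that has been replaced by a class token such as <ADDRESS>: the tag-level n-gram probability is multiplied by the in-class probability of the concrete word, p(w | h) = p(<ADDRESS> | h) * p(w | <ADDRESS>). The token name and the probabilities below are made up, only to show the factorisation.

 import math

 p_tag_given_history = {("去", "<ADDRESS>"): 0.05}      # tag-level n-gram probability
 p_word_given_tag = {"中关村": 0.002, "五道口": 0.001}   # in-class (address list) probability

 def tag_lm_logprob(history_word, word):
     p = p_tag_given_history[(history_word, "<ADDRESS>")] * p_word_given_tag[word]
     return math.log(p)

 print(tag_lm_logprob("去", "中关村"))   # log(0.05 * 0.002) = log(1e-4)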

Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
  • Non-linear inter-language transform (English-Spanish-Czech): WV model training done, transform model under investigation
  • SSA-based local linear mapping still running.
  • k-means classes changed to 2.
  • Knowledge vector started
  • Format the data
  • Yuanbin will continue this work with help from Xingchao.
  • Character to word conversion
  • Prepare the task: word similarity (see the sketch after this list)
  • Prepare the dictionary.
  • Google word vector training
  • Some ideas will be discussed in the weekly report.
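
A sketch of the word-similarity task mentioned above: rank word pairs by the cosine similarity of their word vectors. The 4-dimensional vectors here are made up; the real evaluation would load vectors trained with the word2vec tool.

 import numpy as np

 vectors = {"king":  np.array([0.8, 0.1, 0.6, 0.2]),
            "queen": np.array([0.7, 0.2, 0.6, 0.3]),
            "apple": np.array([0.1, 0.9, 0.0, 0.4])}

 def cosine(u, v):
     return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

 for a, b in [("king", "queen"), ("king", "apple")]:
     print(a, b, round(cosine(vectors[a], vectors[b]), 3))
 # "king"/"queen" should come out more similar than "king"/"apple"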

RNN LM

  • RNN
  • Get the WER baseline with n-best rescoring (see the rescoring sketch after this list).
  • LSTM+RNN
  • Trained the RNN+LSTM LM on wsj_np_data (about 200M). The network is 100*100 (LSTM cells)*10000 with 100 classes; each epoch takes about 200 minutes on 2 CPU cores.
  • Get the WER baseline with n-best rescoring.
  • More details on LSTM
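
A sketch of the n-best rescoring baseline: for each hypothesis, combine the acoustic score with an interpolation of the n-gram and RNN/LSTM LM log-probabilities, then re-rank. The weights and the two LM scoring functions are placeholders, not the actual models.

 def rescore_nbest(nbest, ngram_logprob, rnn_logprob,
                   lm_weight=12.0, interp=0.5, word_penalty=0.0):
     rescored = []
     for hyp in nbest:   # hyp: {"words": [...], "am_score": float}
         lm = (interp * rnn_logprob(hyp["words"])
               + (1.0 - interp) * ngram_logprob(hyp["words"]))
         total = hyp["am_score"] + lm_weight * lm + word_penalty * len(hyp["words"])
         rescored.append((total, hyp))
     return max(rescored, key=lambda t: t[0])[1]

 # toy usage with fake scoring functions
 fake_ngram = lambda words: -1.0 * len(words)
 fake_rnn = lambda words: -0.8 * len(words)
 nbest = [{"words": ["hello", "world"], "am_score": -50.0},
          {"words": ["hello", "word"],  "am_score": -49.0}]
 print(rescore_nbest(nbest, fake_ngram, fake_rnn)["words"])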

Translation

  • v3.0 demo released
  • Still slow
  • Re-segment the words using the new dictionary; will use the Tencent dictionary (about 110k entries).
  • Check new data.

QA

  • search method:
  • Add VSM and BM25 to improve the search, plus the strategy for selecting the answer (a BM25 sketch follows this list).
  • Spell check
  • Got the n-gram tool and made a simple demo.
  • Got the domain word list and the pinyin tool from huilan.
    • The new intern will install SEMPRE
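
A sketch of the BM25 ranking mentioned above, as one way to score candidate entries against a query. k1 and b are the usual default values; the documents and the query are toy, whitespace-tokenised examples.

 import math
 from collections import Counter

 def bm25_score(query_terms, doc_terms, docs, k1=1.2, b=0.75):
     N = len(docs)
     avgdl = sum(len(d) for d in docs) / N
     tf = Counter(doc_terms)
     score = 0.0
     for term in query_terms:
         df = sum(1 for d in docs if term in d)          # document frequency
         if df == 0 or tf[term] == 0:
             continue
         idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
         norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc_terms) / avgdl))
         score += idf * norm
     return score

 docs = [["how", "to", "connect", "wifi"],
         ["reset", "the", "router"],
         ["wifi", "password", "forgotten"]]
 query = ["wifi", "password"]
 ranked = sorted(docs, key=lambda d: bm25_score(query, d, docs), reverse=True)
 print(ranked[0])   # ['wifi', 'password', 'forgotten'] should rank first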