2014-10-20

From cslt Wiki
Latest revision as of 07:23, 20 October 2014

Speech Processing

AM development

Contour

  • NaN problem
  • NaN recurrence check
  ------------------------------------------------------------
   grid node  |   reproducible  |    notes
  ------------------------------------------------------------
   grid-10    |     yes         |
  ------------------------------------------------------------
   grid-12    |     no          | NaN appears at a different position
  ------------------------------------------------------------
   grid-14    |     yes         |
  ------------------------------------------------------------

Sparse DNN

  • Performance improvement found when pruned slightly
  • Experiments show that
  • Suggest to use TIMIT / AURORA 4 for training

RNN AM

  • Initial test on WSJ leads to out-of-memory errors.
  • Using AURORA 4 short sentences with a smaller number of targets.

Noise training

  • First draft of the noisy training journal paper
  • Paper correction (Yinshi, Liuchao, Lin Yiye) is ongoing.

Drop out & Rectification & convolutive network

  • Drop out (a dropout sketch follows this list)
  • dataset: WSJ, test set: eval92 (WER%)
       std |  dropout0.4 | dropout0.5 | dropout0.7 | dropout0.8
    -------------------------------------------------------------
       4.5 |     5.39    |    4.80    |   4.36     |    -
  • Test on the noisy AURORA4 dataset (WER%)
       std |  dropout0.4 | dropout0.5 | dropout0.7 | dropout0.8
    -------------------------------------------------------------
      6.05 |     -       |    -       |   -        |   -
  • Continue the dropout on a normally trained XEnt NNET, e.g. WSJ. (+)
  • Draft the dropout-DNN weight distribution. (+)
  • Rectification
  • Still NaN error, needs debugging. (+)
  • MaxOut (+)
  • Convolutive network
  • Test more configurations
  • Yiye will work on CNN
  • Reading the CNN tutorial
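
A minimal NumPy sketch (not the actual Kaldi nnet recipe) of how inverted dropout would be applied to one DNN hidden layer during training. Whether a value such as 0.4 in the tables denotes the drop probability or the keep probability depends on the tool's convention, so the dropout_p argument below is illustrative only.

 import numpy as np

 def hidden_layer_forward(x, W, b, dropout_p=0.5, training=True, rng=None):
     """Affine + sigmoid hidden layer with inverted dropout."""
     rng = rng or np.random.default_rng(0)
     h = 1.0 / (1.0 + np.exp(-(x @ W + b)))   # sigmoid activations
     if training and dropout_p > 0.0:
         keep = 1.0 - dropout_p
         mask = rng.random(h.shape) < keep    # keep each unit with prob (1 - p)
         h = h * mask / keep                  # rescale so test time needs no change
     return h

 # toy usage: 10 frames of 40-dim features through a 1024-unit layer
 x = np.random.randn(10, 40)
 W = 0.01 * np.random.randn(40, 1024)
 b = np.zeros(1024)
 h_train = hidden_layer_forward(x, W, b, dropout_p=0.4, training=True)
 h_test = hidden_layer_forward(x, W, b, training=False)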

Denoising & Farfield ASR

  • ICASSP paper submitted.

VAD

  • Add more silence tags "#" to the pure-silence utterance text (training set).
  • xEntropy model is being trained
  • Need to test the baseline.
  • Sum all sil-pdfs as the silence posterior probability (see the sketch after this list).
  • Program done; the threshold still needs tuning
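
A minimal sketch of the sil-pdf summation above: per frame, sum the DNN posteriors of all pdfs tied to silence and compare the sum with a threshold. The pdf indices and the threshold value are placeholders, not values from the actual system.

 import numpy as np

 def silence_decisions(posteriors, sil_pdf_ids, threshold=0.5):
     """posteriors: (num_frames, num_pdfs) DNN output; returns a bool per frame."""
     sil_post = posteriors[:, sil_pdf_ids].sum(axis=1)   # silence posterior per frame
     return sil_post > threshold

 # toy usage: 3 frames, 6 pdfs, pdfs 0 and 1 belong to silence
 post = np.array([[0.6, 0.2, 0.1, 0.05, 0.03, 0.02],
                  [0.1, 0.1, 0.5, 0.20, 0.05, 0.05],
                  [0.3, 0.3, 0.2, 0.10, 0.05, 0.05]])
 print(silence_decisions(post, sil_pdf_ids=[0, 1], threshold=0.5))  # [ True False  True]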

Speech rate training

  • The ROS (rate-of-speech) model seems superior to the normal one on faster speech
  • Suggest to extract speech data of different ROS and construct a new test set (+) (see the sketch after this list)
  • Suggest to use the Tencent training data (+)
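
A small sketch of one way the ROS-based test set could be constructed, assuming ROS is measured as phones per second of speech; both the definition and the bucket boundaries are assumptions for illustration only.

 def rate_of_speech(num_phones, speech_seconds):
     return num_phones / max(speech_seconds, 1e-6)

 def bucket(ros, slow_max=8.0, normal_max=12.0):
     if ros < slow_max:
         return "slow"
     return "normal" if ros < normal_max else "fast"

 # utt_id -> (phone count, speech duration in seconds); made-up numbers
 utts = {"utt1": (55, 6.0), "utt2": (120, 8.0), "utt3": (40, 5.5)}
 test_sets = {}
 for utt, (n_ph, dur) in utts.items():
     test_sets.setdefault(bucket(rate_of_speech(n_ph, dur)), []).append(utt)
 print(test_sets)   # {'normal': ['utt1'], 'fast': ['utt2'], 'slow': ['utt3']}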

low resource language AM training

  • Use the Chinese NN as the initial NN and change the last layer
  • Vary the number of Chinese-trained DNN hidden layers that are reused (see the sketch after the tables below).
    • feature_transform = 6000h_transform + 6000_N*hidden-layers
 nnet.init = random (4-N)*hidden-layers + output-layer
 | N / learn_rate | 0.008         | 0.001 | 0.0001 |
 |   baseline     | 17.00(14*2h)  |       |        |
 |       4        | 17.75(9*0.6h) | 18.64 |        |
 |       3        | 16.85         |       |        |
 |       2        | 16.69         |       |        |
 |       1        | 16.87         |       |        |
 |       0        | 16.88         |       |        |  
    • feature_transform = uyghur_transform + 6000_N*hidden-layers
 nnet.init = random (4-N)*hidden-layers + output-layer
 Note: This is reproduced Yinshi's experiment
 | N / learn_rate | 0.008 | 0.001 | 0.0001 |
 |   baseline     | 17.00 |       |        |
 |       4        | 28.23 | 30.72 | 37.32  |
 |       3        | 22.40 |       |        |
 |       2        | 19.76 |       |        |
 |       1        | 17.41 |       |        |
 |       0        |       |       |        |
    • feature_transform = 6000_transform + 6000_N*hidden-layers
 nnet.init = uyghur (4-N)*hidden-layers + output-layer
 | N / learn_rate | 0.008 | 0.001 | 0.0001 |
 |   baseline     | 17.00 |       |        |
 |       4        | 17.80 | 18.55 | 21.06  |
 |       3        | 16.89 | 17.64 |        |
 |       2        |       |       |        |
 |       1        |       |       |        |
 |       0        |       |       |        |
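
The tables above keep the first N hidden layers of an existing network (after the chosen feature transform) and randomly initialise the remaining (4-N) hidden layers plus the output layer. The sketch below shows that initialisation scheme at the matrix level; the layer sizes and the chinese_layers container are made up, and the real experiments build Kaldi nnet1 models.

 import numpy as np

 def init_transfer_net(chinese_layers, n_reuse, hidden_dim, out_dim, rng=None):
     """chinese_layers: list of (W, b) from the 6000h DNN, ordered bottom-up."""
     rng = rng or np.random.default_rng(0)
     layers = list(chinese_layers[:n_reuse])                 # reused Chinese layers
     for _ in range(len(chinese_layers) - n_reuse):          # (4 - N) random hidden layers
         layers.append((0.1 * rng.standard_normal((hidden_dim, hidden_dim)),
                        np.zeros(hidden_dim)))
     layers.append((0.1 * rng.standard_normal((hidden_dim, out_dim)),   # new output layer
                    np.zeros(out_dim)))
     return layers

 # placeholder 4-hidden-layer "Chinese" DNN with 1024-dim layers
 chinese = [(np.ones((1024, 1024)), np.zeros(1024)) for _ in range(4)]
 net = init_transfer_net(chinese, n_reuse=2, hidden_dim=1024, out_dim=3000)
 print(len(net))   # 5: 2 reused + 2 random hidden layers + output layer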

Scoring

  • Global scoring done.
  • Pitch & rhythm done; needs testing
  • Harmonics program done; experiments still to be run.

Confidence

  • Reproduce the experiments on the Fisher dataset.
  • Use the Fisher DNN model to decode the all-WSJ dataset


Speaker ID

  • Preparing GMM-based server.

Emotion detection

  • Sinovoice is implementing the server


Text Processing

LM development

Domain specific LM

  • The LM based on baidu_hi and baiduzhidao is done; tested on the shujutang test set.
  • The weibo LM was trained with count pruning (cutoffs 5,10,10,20,20) because the corpus is too large; its PPL is twice as high as that of baidu_hi and baidu_zhidao (see the count-pruning sketch after this list).
  • Dongxu got a good vocabulary from the big data. Trained a 5-gram LM on the Baiduzhidao corpus (~30 GB after preprocessing) with the new lexicon. There was a mistake when counting probabilities after the merge.
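
A sketch of the count cutoffs (5,10,10,20,20) used for the weibo LM: an n-gram is kept only if its count reaches the cutoff for its order. In practice the LM toolkit applies these cutoffs during estimation; this only illustrates the rule, on made-up counts.

 CUTOFFS = {1: 5, 2: 10, 3: 10, 4: 20, 5: 20}   # n-gram order -> minimum count

 def prune_counts(ngram_counts):
     """ngram_counts: dict mapping a tuple of words to its raw count."""
     return {ng: c for ng, c in ngram_counts.items()
             if c >= CUTOFFS[len(ng)]}

 counts = {("你好",): 120,
           ("你好", "吗"): 8,             # bigram below cutoff 10 -> dropped
           ("今天", "天气", "不错"): 15}   # trigram above cutoff 10 -> kept
 print(prune_counts(counts))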

tag LM

  • Use HIT's LTP tool for segmentation, POS tagging and NER. The program is running (about 3 days) on baiduHi and baiduzhidao (365 GB in total). A tag-LM scoring sketch follows this list.
  • Will use the small test set from xiaoxi for the address tag.
  • There are now more than 1M addresses; will prune them by frequency.
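
A sketch of how a tag LM scores a word that has been replaced by a class token such as <ADDRESS>: the tag-level n-gram probability is multiplied by the in-class probability of the concrete word, p(w | h) = p(<ADDRESS> | h) * p(w | <ADDRESS>). The token name and the probabilities below are made up, only to show the factorisation.

 import math

 p_tag_given_history = {("去", "<ADDRESS>"): 0.05}      # tag-level n-gram probability
 p_word_given_tag = {"中关村": 0.002, "五道口": 0.001}   # in-class (address list) probability

 def tag_lm_logprob(history_word, word):
     p = p_tag_given_history[(history_word, "<ADDRESS>")] * p_word_given_tag[word]
     return math.log(p)

 print(tag_lm_logprob("去", "中关村"))   # log(0.05 * 0.002) = log(1e-4)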

Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
  • Non-linear inter-language transform (English-Spanish-Czech): WV model training done, transform model under investigation
  • SSA-based local linear mapping still running.
  • k-means classes changed to 2.
  • Knowledge vector started
  • Format the data
  • Yuanbin will continue this work with help from Xingchao.
  • Character to word conversion
  • Prepare the task: word similarity (see the sketch after this list)
  • Prepare the dictionary.
  • Google word vector training
  • Some ideas will be discussed in the weekly report.
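
A sketch of the word-similarity task mentioned above: rank word pairs by the cosine similarity of their word vectors. The 4-dimensional vectors here are made up; the real evaluation would load vectors trained with the word2vec tool.

 import numpy as np

 vectors = {"king":  np.array([0.8, 0.1, 0.6, 0.2]),
            "queen": np.array([0.7, 0.2, 0.6, 0.3]),
            "apple": np.array([0.1, 0.9, 0.0, 0.4])}

 def cosine(u, v):
     return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

 for a, b in [("king", "queen"), ("king", "apple")]:
     print(a, b, round(cosine(vectors[a], vectors[b]), 3))
 # "king"/"queen" should come out more similar than "king"/"apple"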

RNN LM

  • RNN
  • Get the WER baseline with n-best rescoring (see the rescoring sketch after this list).
  • LSTM+RNN
  • Trained the RNN+LSTM LM on wsj_np_data (about 200M). The network is 100*100 (LSTM cells)*10000 with 100 classes; each epoch takes about 200 minutes on 2 CPU cores.
  • Get the WER baseline with n-best rescoring.
  • More details on LSTM
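
A sketch of the n-best rescoring baseline: for each hypothesis, combine the acoustic score with an interpolation of the n-gram and RNN/LSTM LM log-probabilities, then re-rank. The weights and the two LM scoring functions are placeholders, not the actual models.

 def rescore_nbest(nbest, ngram_logprob, rnn_logprob,
                   lm_weight=12.0, interp=0.5, word_penalty=0.0):
     rescored = []
     for hyp in nbest:   # hyp: {"words": [...], "am_score": float}
         lm = (interp * rnn_logprob(hyp["words"])
               + (1.0 - interp) * ngram_logprob(hyp["words"]))
         total = hyp["am_score"] + lm_weight * lm + word_penalty * len(hyp["words"])
         rescored.append((total, hyp))
     return max(rescored, key=lambda t: t[0])[1]

 # toy usage with fake scoring functions
 fake_ngram = lambda words: -1.0 * len(words)
 fake_rnn = lambda words: -0.8 * len(words)
 nbest = [{"words": ["hello", "world"], "am_score": -50.0},
          {"words": ["hello", "word"],  "am_score": -49.0}]
 print(rescore_nbest(nbest, fake_ngram, fake_rnn)["words"])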

Translation

  • v3.0 demo released
  • Still slow
  • Re-segment the words using the new dictionary; will use the Tencent dictionary (about 110k entries).
  • Check new data.

QA

  • search method:
  • Add VSM and BM25 to improve the search, plus the strategy for selecting the answer (a BM25 sketch follows this list).
  • Spell check
  • Got the n-gram tool and made a simple demo.
  • Got the domain word list and the pinyin tool from huilan.
    • The new intern will install SEMPRE
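
A sketch of the BM25 ranking mentioned above, as one way to score candidate entries against a query. k1 and b are the usual default values; the documents and the query are toy, whitespace-tokenised examples.

 import math
 from collections import Counter

 def bm25_score(query_terms, doc_terms, docs, k1=1.2, b=0.75):
     N = len(docs)
     avgdl = sum(len(d) for d in docs) / N
     tf = Counter(doc_terms)
     score = 0.0
     for term in query_terms:
         df = sum(1 for d in docs if term in d)          # document frequency
         if df == 0 or tf[term] == 0:
             continue
         idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
         norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc_terms) / avgdl))
         score += idf * norm
     return score

 docs = [["how", "to", "connect", "wifi"],
         ["reset", "the", "router"],
         ["wifi", "password", "forgotten"]]
 query = ["wifi", "password"]
 ranked = sorted(docs, key=lambda d: bm25_score(query, d, docs), reverse=True)
 print(ranked[0])   # ['wifi', 'password', 'forgotten'] should rank first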