==tool==

* LSTM/RNN training, GPU & deep supported [1]
* RNNLM: RNN LM toolkit [2]
* RWTHLM: RNN LSTM toolkit [3]
* nplm: NN LM, large scale data [http://nlg.isi.edu/software/nplm/]
* RNN toolkit from Microsoft [http://research.microsoft.com/en-us/projects/rnn/]
* cslm [http://www-lium.univ-lemans.fr/~cslm/]
  
 
==paper==

*[[14-9-30]]
*[[2014-10-9]]
 
==Steps==

[[process dict and data]]

==Test==
*[[wsj_data]]
*[[chinese_data_gigword]]
*[[jt-chinese]]

===wsj_data===

*Data
:* size: 200M, npdata
*parameter (a training-command sketch with these settings follows the table below)
      rand_seed=1
      nwords=10000          # number of words in the RNNLM vocabulary
      hidden=320            # size of the hidden layer
      class=300             # number of classes; should be somewhat larger than sqrt(nwords)
      direct=2000           # number of weights used for "direct" connections, in millions
      rnnlm_ver=rnnlm-0.3e  # version of RNNLM to use
      threads=1             # for RNNLM-HS
      bptt=2                # number of time steps BPTT is unfolded in the RNNLM
      bptt_block=20         # errors are backpropagated through time in blocks of this size
*Train RNNLM set

{| border="2px"
|+ Train Set Environment
|-
! Set !! hidden !! class !! direct !! bptt !! bptt_block !! threads !! direct-order !! rand_seed !! nwords !! time (min)
|-
! set1
| 320 || 300 || 2000 || 2 || 20 || 1 || 4 || 1 || 10000 || 3380 (56 h)
|}

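For reference, a minimal sketch of how these settings might be passed to the rnnlm binary from the rnnlm-0.3e release configured above. The training/validation file names and the output model name are placeholders, the nwords cut-off is normally applied when the training text and vocabulary are prepared rather than through an rnnlm option, and -threads is only meaningful for the RNNLM-HS variant; option names should be checked against the local build.

 # Sketch only: train the RNNLM with the parameter set listed above.
 # train.txt, valid.txt and the output model name are placeholders;
 # the rnnlm binary (from rnnlm-0.3e) is assumed to be on PATH.
 rand_seed=1
 hidden=320
 class=300
 direct=2000
 direct_order=4
 bptt=2
 bptt_block=20
 threads=1          # only used by the RNNLM-HS variant
 
 rnnlm -train train.txt -valid valid.txt \
       -rnnlm rnnlm.h${hidden}.voc10k \
       -hidden $hidden -class $class \
       -direct $direct -direct-order $direct_order \
       -bptt $bptt -bptt-block $bptt_block \
       -rand-seed $rand_seed -threads $threads \
       -binary
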
===RNNLM Rescore===

*Acoustic Model
** location: /nfs/disk/work/users/zhangzy/work/train_wsj_eng_new/data/train_si284
*test set
** location: /nfs/disk/work/users/zhangzy/work/train_wsj_eng_new/dt/test_eval92
** decode: /nfs/disk/work/users/zhangzy/work/train_wsj_eng_new/exp/tri4b_dnn_org/decode_eval92_tri4b_dnn_org
*Result
** baseline lm: 4.16%, rnnlm rescoring: 3.47%

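A hedged sketch of how such a rescoring run is typically launched with Kaldi's steps/rnnlmrescore.sh (N-best rescoring with an interpolated RNNLM/n-gram score). The interpolation weight, N-best size and all directory names below are illustrative assumptions rather than this experiment's actual paths, and the argument order should be checked against the script's usage message in the local Kaldi version.

 # Sketch only: rescore an existing decode directory with the trained RNNLM.
 # Every path below is a placeholder, not this experiment's directory layout.
 rnnweight=0.5      # interpolation weight given to the RNNLM score
 nbest=100          # size of the N-best lists extracted from the lattices
 
 oldlang=data/lang_test              # lang dir holding the original n-gram LM
 rnndir=data/local/rnnlm             # dir holding the trained RNNLM
 testdata=data/test_eval92           # test set
 indir=exp/tri4b_dnn/decode_eval92   # decode dir to rescore
 outdir=${indir}_rnnlm${rnnweight}   # rescored output dir
 
 steps/rnnlmrescore.sh --N $nbest --cmd run.pl \
     $rnnweight $oldlang $rnndir $testdata $indir $outdir
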
==chinese data==

===prepare data===

*available data
** gigaword: /work2/xingchao/corpus/Chinese_corpus/gigaword
** bing parallel corpus: /nfs/disk/work/users/xingchao/bing_dict
** baidu:
** sougou:
* data used
** sampled gigaword text, about 344M (a sampling sketch follows the table below)
** dict: tencent11w
*train set

{| border="2px"
|+ Train Set Environment
|-
! Set !! hidden !! class !! direct !! bptt !! bptt_block !! threads !! direct-order !! rand_seed !! nwords !! time (min)
|-
! set1
| 320 || 300 || 2000 || 2 || 20 || 1 || 4 || 1 || 10000 || 3380 (56 h)
|}

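A minimal sketch of how the roughly 344M gigaword sample might be drawn and restricted to the tencent11w dictionary: shuffle the segmented text, truncate at the target size, and map out-of-vocabulary words to <unk>. The file names and the <unk> convention are assumptions rather than anything recorded above.

 # Sketch only: sample ~344 MB of segmented gigaword text and map words
 # outside the tencent11w dictionary to <unk>. File names are placeholders.
 corpus=gigaword.seg.txt        # one segmented sentence per line
 dict=tencent11w.txt            # one word per line
 target_bytes=$((344 * 1024 * 1024))
 
 # Shuffle the sentences and keep roughly the first 344 MB
 # (the final line may be truncated).
 shuf "$corpus" | head -c "$target_bytes" > gigaword.sample.txt
 
 # Replace out-of-vocabulary words with <unk>.
 awk 'NR==FNR { vocab[$1] = 1; next }
      { for (i = 1; i <= NF; i++) if (!($i in vocab)) $i = "<unk>"; print }' \
     "$dict" gigaword.sample.txt > train.txt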
