Qixin Wang 2016-01-25

From cslt Wiki
Revision as of 15:29, 24 January 2016 (Sun) by Wqx (talk | contribs)


Work done this week

word vector size: 200

hidden size: 500

mlp hidden size: 400

maxout size: 300

adadelta 0.3

---

fast mode, added cut, no global, no pz

zgt:song, si, giga, update: (grid-9, grid-9, grid-17, grid-17)

psm:song, si, giga, update: (grid-15, grid-15, grid-13, grid-11)

---

with dropout & without maxout:

batch_all(zgt): grid-12

batch_all_go(zgt): grid-11

---

batch training code:

debugging in progress

---

int32 * float32 -> float64

float32 * float32 -> float32
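The two promotion rules above can be checked directly. The notes do not say which library produced them, so the sketch below uses NumPy arrays as an assumption; NumPy shows the same behaviour for array operands. The names `idx` and `w` are hypothetical. Casting the integer array to float32 first is one way to keep the product in float32.

```python
import numpy as np

# Hypothetical operands: an int32 array (e.g. indices or counts)
# and a float32 array (e.g. weights).
idx = np.arange(4, dtype=np.int32)
w = np.ones(4, dtype=np.float32)

mixed = idx * w                      # int32 * float32 is promoted to float64
same = w * w                         # float32 * float32 stays float32
cast = idx.astype(np.float32) * w    # explicit cast keeps the result in float32

print(mixed.dtype, same.dtype, cast.dtype)
```

Keeping everything in float32 matters mainly when the rest of the model is float32, since a silent float64 result can double memory use or trigger dtype mismatches downstream.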





fixed n bugs

added maxout

added update vectors

added dropout

mini-batch data-parallel training

400k sentences per day (iambics are longer than QA questions)



deleted some long 224 iambic
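
For reference, the maxout and dropout components listed above can be sketched as follows. This is a minimal NumPy illustration, not the code used in these experiments: the function names, the `pieces` parameter, the inverted-dropout scaling, and the layer shapes are all assumptions.

```python
import numpy as np

def maxout(x, W, b, pieces):
    """Maxout layer: affine map, then a max over `pieces` linear units per output.

    x: (batch, n_in), W: (n_in, n_out * pieces), b: (n_out * pieces,)
    Returns an array of shape (batch, n_out).
    """
    z = x @ W + b
    batch, total = z.shape
    z = z.reshape(batch, total // pieces, pieces)
    return z.max(axis=2)

def dropout(x, rate, rng, train=True):
    """Inverted dropout: zero each unit with probability `rate` during training
    and rescale the survivors so the expected activation is unchanged."""
    if not train or rate == 0.0:
        return x
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep
    return (x * mask / keep).astype(x.dtype)

# Usage with small, made-up sizes.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3)).astype(np.float32)
W = rng.standard_normal((3, 8)).astype(np.float32)
b = np.zeros(8, dtype=np.float32)

h = maxout(x, W, b, pieces=2)        # shape (2, 4)
h_train = dropout(h, 0.5, rng)       # some units zeroed, others scaled by 2
h_test = dropout(h, 0.5, rng, train=False)
```

At test time the dropout call is a no-op, which is why the training-time output is rescaled by `1 / keep`.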