Qixin Wang 2016-01-25


Work done this week

word vector size: 200

hidden size: 500

MLP hidden size: 400

maxout size: 300

AdaDelta: 0.3
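
For reference, the settings above collected into a single config dict; the key names are illustrative only (not taken from the actual code), and reading the AdaDelta value 0.3 as a learning-rate scale is an assumption:

config = {
    "word_vector_size": 200,   # word embedding dimension
    "hidden_size": 500,        # recurrent hidden layer size
    "mlp_hidden_size": 400,    # hidden size of the output MLP
    "maxout_size": 300,        # maxout layer output size
    "adadelta": 0.3,           # assumed: AdaDelta learning-rate scale
}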

---

fast mode, added cut, no global, no pz

zgt: song -> grid-9, si -> grid-9, giga -> grid-17, update -> grid-17

psm: song -> grid-15, si -> grid-15, giga -> grid-13, update -> grid-11

---

with dropout & without maxout:

batch_all(zgt): grid-12

batch_all_go(zgt): grid-11

---

batch training code:

debugging in progress

---

int32 * float32 -> float64

float32 * float32 -> float32
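
A minimal NumPy check of the promotion rules above (the same behaviour holds in Theano, where an int32 input silently upcasting the graph to float64 prevents float32 GPU computation); the array names are illustrative:

import numpy as np

i = np.arange(3, dtype=np.int32)
f = np.ones(3, dtype=np.float32)

print((i * f).dtype)                     # float64: int32 * float32 is upcast
print((f * f).dtype)                     # float32: stays in single precision
print((i.astype(np.float32) * f).dtype)  # float32: explicit cast avoids the upcast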




fixed several bugs

added maxout (see the sketch after these notes)

added update vectors

added dropout

deleted 224 long iambics (length > 120)
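
A minimal sketch of a maxout hidden layer, assuming k linear pieces whose element-wise maximum gives the output; the shapes follow the sizes listed above, but the function and variable names are illustrative, not from the actual code:

import numpy as np

def maxout(x, W, b):
    # x: (batch, n_in), W: (k, n_in, n_out), b: (k, n_out)
    # compute k affine pieces, then take the element-wise maximum over them
    pieces = np.einsum('bi,kio->bko', x, W) + b   # (batch, k, n_out)
    return pieces.max(axis=1)                     # (batch, n_out)

rng = np.random.RandomState(0)
x = rng.randn(4, 400).astype(np.float32)                # e.g. MLP hidden size 400
W = (rng.randn(2, 400, 300) * 0.01).astype(np.float32)  # maxout size 300, k = 2 pieces
b = np.zeros((2, 300), dtype=np.float32)
print(maxout(x, W, b).shape)                            # (4, 300)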


mini-batch data-parallel training (see the batching sketch below)

400k sentences per day (iambics are longer than QA questions), up to 50x faster (now as fast as Bengio's code)

finished the iambic-format training code and training
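
A minimal sketch of how variable-length sentences are typically packed into a mini-batch (padding plus a mask), which is the usual source of this kind of speedup; this is an assumed illustration, not the project's actual batching code:

import numpy as np

def make_batch(sentences, pad_id=0):
    # sentences: list of word-id lists, possibly of different lengths
    max_len = max(len(s) for s in sentences)
    batch = np.full((len(sentences), max_len), pad_id, dtype=np.int32)
    mask = np.zeros((len(sentences), max_len), dtype=np.float32)
    for i, s in enumerate(sentences):
        batch[i, :len(s)] = s      # copy the real tokens
        mask[i, :len(s)] = 1.0     # mark them as valid positions
    return batch, mask

batch, mask = make_batch([[5, 8, 2], [7, 3], [9, 4, 6, 1]])
print(batch.shape, mask.sum(axis=1))   # (3, 4) [3. 2. 4.]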



some results: