Qixin Wang 2016-01-25
From cslt Wiki
Work done this week
word vector size: 200
hidden size: 500
mlp hidden size: 400
maxout size: 300
adadelta 0.3
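For reference, a minimal sketch of what a maxout layer with output size 300 could look like. NumPy is used only as a stand-in (the notes do not say which framework the model uses). The number of pieces k = 2 and the choice of the 500-dim hidden state as input are assumptions, not values from these notes.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: k affine maps per output unit, keep the largest.

    x: (batch, d_in), W: (d_in, d_out, k), b: (d_out, k)
    returns: (batch, d_out)
    """
    z = np.einsum("bi,iok->bok", x, W) + b  # (batch, d_out, k)
    return z.max(axis=-1)

rng = np.random.default_rng(0)
d_in, d_out, k = 500, 300, 2  # d_out from the notes; d_in and k are assumed
x = rng.standard_normal((4, d_in)).astype(np.float32)
W = (0.01 * rng.standard_normal((d_in, d_out, k))).astype(np.float32)
b = np.zeros((d_out, k), dtype=np.float32)

h = maxout(x, W, b)  # one 300-dim vector per input row
```

With k = 2 this is the element-wise maximum of two linear projections of the same input.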
---
fast mode, added cut, no global, no pz
zgt:song, si, giga, update: (grid-9, grid-9, grid-17, grid-17)
psm:song, si, giga, update: (grid-15, grid-15, grid-13, grid-11)
---
with dropout & without maxout:
batch_all(zgt): grid-12
batch_all_go(zgt): grid-11
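A rough sketch of the dropout used in these runs, assuming the common "inverted" variant (scale at training time, identity at test time). The function name, the drop rate 0.5 and the NumPy implementation are illustrative assumptions; the notes do not state the actual rate or code.

```python
import numpy as np

def dropout(x, p_drop, rng, train=True):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale the survivors so the expected activation is unchanged."""
    if not train or p_drop == 0.0:
        return x
    keep = 1.0 - p_drop
    mask = (rng.random(x.shape) < keep).astype(x.dtype)
    # Cast the scale to x's dtype so float32 inputs stay float32.
    return x * mask / x.dtype.type(keep)

rng = np.random.default_rng(0)
hidden = np.ones((2, 500), dtype=np.float32)  # 500 = hidden size above
dropped = dropout(hidden, 0.5, rng)           # training mode
unchanged = dropout(hidden, 0.5, rng, train=False)
```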
---
batch training code:
debugging in progress
---
int32 * float32 -> float64
float32 * float32 -> float32
fixed n bugs
added maxout
added update vectors
added dropout
mini-batch data-parallel training
400k sentences per day (iambics are longer than QA questions)
deleted some long 224 iambic
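---
The two dtype lines in the list above describe type promotion. A small check, using NumPy as a stand-in (an assumption; the framework used in the training code is not stated in these notes), shows the same behaviour for arrays and one possible fix: cast integer arrays to float32 before multiplying.

```python
import numpy as np

ids = np.arange(3, dtype=np.int32)   # e.g. an integer mask or index array
w = np.ones(3, dtype=np.float32)

mixed = ids * w                      # int32 * float32 is promoted to float64
same = w * w                         # float32 * float32 stays float32
fixed = ids.astype(np.float32) * w   # explicit cast keeps the result float32

print(mixed.dtype, same.dtype, fixed.dtype)
```

The silent upcast matters when the rest of the model is float32, since a float64 intermediate can double memory use or trigger a type mismatch further down.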