Difference between revisions of "Qixin Wang 2016-01-25"
From cslt Wiki
Revision as of 15:29, 24 January 2016 (Sunday)
Work done this week
word vector size: 200
hidden size: 500
mlp hidden size: 400
maxout size: 300
adadelta 0.3
---
fast mode, added cut, no global, no pz
zgt:song, si, giga, update: (grid-9, grid-9, grid-17, grid-17)
psm:song, si, giga, update: (grid-15, grid-15, grid-13, grid-11)
---
with dropout & without maxout:
batch_all(zgt): grid-12
batch_all_go(zgt): grid-11
---
batch training code:
debugging in progress
---
int32 * float32 -> float64
float32 * float32 -> float32
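The two type rules above can be checked directly. The log does not name the numerical library in use, so the following is only a minimal sketch in NumPy, which follows the same promotion rule for arrays; the variable names are hypothetical. It also shows one possible workaround, casting the integer array to float32 before multiplying, which is an assumption and not necessarily the fix applied here.

```python
import numpy as np

# Hypothetical arrays: integer indices/counts and float32 weights.
ids = np.arange(4, dtype=np.int32)
weights = np.ones(4, dtype=np.float32)

# int32 * float32 is promoted to float64, because float32 cannot
# represent every int32 value exactly.
mixed = ids * weights

# float32 * float32 stays float32.
same = weights * weights

# Assumed workaround: cast explicitly so the product stays float32.
fixed = ids.astype(np.float32) * weights

print(mixed.dtype, same.dtype, fixed.dtype)  # float64 float32 float32
```

An unintended float64 result is a common source of dtype mismatches and slowdowns in GPU training code that expects float32 throughout.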
---
fixed multiple bugs
added maxout
added update vectors
added dropout
mini-batch data-parallel training
400k sentences per day (iambics are longer than QA questions)
---
deleted some long 224 iambic
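---
For reference, the maxout and dropout items above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the code used in this work: the function names, the number of maxout pieces (2), the dropout rate (0.5), and the use of inverted dropout are all hypothetical. Only the sizes 500 (hidden) and 300 (maxout) come from the settings listed above.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: each output unit is the max over several linear pieces.

    x: (batch, d_in), W: (d_in, units, pieces), b: (units, pieces).
    """
    z = np.einsum("bi,iup->bup", x, W) + b  # (batch, units, pieces)
    return z.max(axis=-1)                   # (batch, units)

def dropout(x, rate, rng, train=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not train or rate == 0.0:
        return x
    keep = 1.0 - rate
    mask = (rng.random(x.shape) < keep).astype(x.dtype)
    return x * mask / keep

rng = np.random.default_rng(0)
batch, hidden, units, pieces = 8, 500, 300, 2  # pieces=2 is an assumption

x = rng.standard_normal((batch, hidden)).astype(np.float32)
W = (0.01 * rng.standard_normal((hidden, units, pieces))).astype(np.float32)
b = np.zeros((units, pieces), dtype=np.float32)

h = maxout(x, W, b)          # shape (8, 300)
h_train = dropout(h, 0.5, rng)               # training: random units zeroed
h_test = dropout(h, 0.5, rng, train=False)   # inference: unchanged
```

With inverted dropout the rescaling happens at training time, so no extra scaling is needed at inference.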