<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://index.cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Sheng_Su_2015-10-19</id>
		<title>Sheng Su 2015-10-19 - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Sheng_Su_2015-10-19"/>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=Sheng_Su_2015-10-19&amp;action=history"/>
		<updated>2026-04-15T18:22:21Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php?title=Sheng_Su_2015-10-19&amp;diff=17279&amp;oldid=prev</id>
		<title>Susheng: Created page with "Last week: * I have found out why two GPUs work well but four GPUs don't. This is based on two facts: *-    1. Mini-batch training: the gradients of all the..."</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=Sheng_Su_2015-10-19&amp;diff=17279&amp;oldid=prev"/>
				<updated>2015-10-20T00:49:58Z</updated>
		
		<summary type="html">&lt;p&gt;Created page with "Last week: * I have found out why two GPUs work well but four GPUs don&amp;#039;t. This is based on two facts: *-    1. Mini-batch training: the gradients of all the..."&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Last week:&lt;br /&gt;
* I have found out why two GPUs work well but four GPUs don't. This is based on two facts:&lt;br /&gt;
*-    1. Mini-batch training: the gradients of all the frames in the batch are summed.&lt;br /&gt;
*-    2. Mini-batch size: the baseline does not converge if the mini-batch size exceeds 1024.&lt;br /&gt;
* Reason:&lt;br /&gt;
*-    With mini-batch size M, we sum all the gradients from the 4 GPUs over N mini-batches and update the net once (the net is not updated during those N mini-batches). This is equivalent to the baseline with a mini-batch size of M*N*4, much larger than the baseline's. If instead we updated the net during the N mini-batches, that would, to some extent, reduce the effective mini-batch size. That is why two GPUs work well and four GPUs don't.&lt;br /&gt;
This week:&lt;br /&gt;
* 1. Try net averaging.&lt;br /&gt;
* 2. Learn NG-SGD.&lt;/div&gt;</summary>
		<author><name>Susheng</name></author>	</entry>

	</feed>