Sheng Su 2015-10-19

Last week:

  • I have found out why two GPUs work well while four GPUs don't. This is based on two facts:
  • - 1. Mini-batch training: the gradients of all frames in the batch are summed.
  • - 2. Mini-batch size: the baseline does not converge if the mini-batch size is set above 1024.
  • Reason:
  • - With mini-batch size M, after N mini-batches we sum all the gradients from the 4 GPUs and update the net once (during these N mini-batches the net is not updated). This is equivalent to a baseline with a mini-batch size of M*N*4, far larger than the baseline's, so training does not converge. If instead the net were updated during the N mini-batches, that would, to some extent, reduce the effective mini-batch size. That is why two GPUs work well and four GPUs don't (see the sketch after this list).
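A minimal sketch of this effect, assuming summed (not averaged) per-frame gradients on a toy least-squares problem; the names M, N, NUM_GPUS, grad, and make_batch are hypothetical and not taken from the actual training code:

# Hedged sketch (not the real CSLT/Kaldi code): shows why delaying the update
# for N mini-batches across 4 GPUs acts like one huge mini-batch of M*N*4 frames.
import numpy as np

np.random.seed(0)

M, N, NUM_GPUS = 256, 4, 4           # hypothetical mini-batch size, accumulation steps, GPU count
dim = 10
w = np.zeros(dim)                    # toy "net" parameters
lr = 0.01

def make_batch(size):
    X = np.random.randn(size, dim)
    y = np.random.randn(size)
    return X, y

def grad(w, batch):
    # Summed (not averaged) gradient over all frames in the mini-batch,
    # matching fact 1 above.
    X, y = batch
    return X.T @ (X @ w - y)

# Scheme that diverges: accumulate gradients from N mini-batches on each of the
# 4 GPUs, then update once -> one update sees M * N * 4 frames.
acc = np.zeros_like(w)
for step in range(N):
    for gpu in range(NUM_GPUS):
        acc += grad(w, make_batch(M))
w_delayed = w - lr * acc

# Scheme closer to the 2-GPU setup: update after every synchronized step,
# so each update only sees M * NUM_GPUS frames.
w_sync = w.copy()
for step in range(N):
    acc = np.zeros_like(w_sync)
    for gpu in range(NUM_GPUS):
        acc += grad(w_sync, make_batch(M))
    w_sync = w_sync - lr * acc

print("frames per update (delayed update):", M * N * NUM_GPUS)
print("frames per update (synced update): ", M * NUM_GPUS)

With the sample numbers above, the delayed scheme updates on 4096 frames at once, well beyond the 1024-frame limit noted in fact 2, while the synced scheme stays at 1024 frames per update.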

This week:

  • 1. Try net averaging.
  • 2. Learn NG-SGD.