<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://index.cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Sheng_Su_2015-10-19</id>
		<title>Sheng Su 2015-10-19 - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Sheng_Su_2015-10-19"/>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=Sheng_Su_2015-10-19&amp;action=history"/>
		<updated>2026-04-15T18:22:21Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php?title=Sheng_Su_2015-10-19&amp;diff=17279&amp;oldid=prev</id>
		<title>Susheng: Created page with "Last week: * I have found out why two GPUs work well but four GPUs don't. This is based on two facts: *-    1. Mini-batch training: the gradients of all the..."</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=Sheng_Su_2015-10-19&amp;diff=17279&amp;oldid=prev"/>
				<updated>2015-10-20T00:49:58Z</updated>
		
		<summary type="html">&lt;p&gt;Created page with "Last week: * I have found out why two GPUs work well but four GPUs don&amp;#039;t. This is based on two facts: *-    1. Mini-batch training: the gradients of all the..."&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Last week:&lt;br /&gt;
* I have found out why two GPUs work well but four GPUs don't. This is based on two facts:&lt;br /&gt;
*-    1. Mini-batch training: the gradients of all the frames in the batch are summed.&lt;br /&gt;
*-    2. Mini-batch size: the baseline does not converge if the mini-batch size exceeds 1024.&lt;br /&gt;
* Reason:&lt;br /&gt;
*-    With mini-batch size M, we sum all the gradients from the 4 GPUs over N mini-batches and update the net once (the net is not updated during those N mini-batches). This is equivalent to the baseline with a mini-batch size of M*N*4, much larger than the baseline's. If instead we updated the net during the N mini-batches, that would, to some extent, reduce the effective mini-batch size. That is why two GPUs work well and four GPUs don't.&lt;br /&gt;
This week:&lt;br /&gt;
* 1. Try net averaging.&lt;br /&gt;
* 2. Learn NG-SGD.&lt;/div&gt;</summary>
		<author><name>Susheng</name></author>	</entry>

	</feed>