<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>
	Comments on: Allocation performance take 2	</title>
	<atom:link href="https://undocumentedmatlab.com/articles/allocation-performance-take-2/feed" rel="self" type="application/rss+xml" />
	<link>https://undocumentedmatlab.com/articles/allocation-performance-take-2?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=allocation-performance-take-2</link>
	<description>Professional Matlab consulting, development and training</description>
	<lastBuildDate>Thu, 05 Sep 2013 15:58:23 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.7.3</generator>
	<item>
		<title>
		By: Roberto		</title>
		<link>https://undocumentedmatlab.com/articles/allocation-performance-take-2#comment-256055</link>

		<dc:creator><![CDATA[Roberto]]></dc:creator>
		<pubDate>Thu, 05 Sep 2013 15:58:23 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=4086#comment-256055</guid>

					<description><![CDATA[The results I get from R2012b on my MacBook Pro (10.8.4) are quite different from the plot you&#039;ve been sent. The performance of &lt;i&gt;zeros&lt;/i&gt; and &lt;i&gt;ones&lt;/i&gt; is practically identical for me, to the point that the two lines in the plot are indistinguishable (the maximum difference between the two is about 0.01&#160;sec &lt;b&gt;without&lt;/b&gt; normalising by the number of iterations).
Also, the time growth is linear (no improvement for 200K+ elements), which makes me wonder if R2013a introduced a new memory allocation algorithm.]]></description>
			<content:encoded><![CDATA[<p>The results I get from R2012b on my MacBook Pro (10.8.4) are quite different from the plot you&#8217;ve been sent. The performance of <i>zeros</i> and <i>ones</i> is practically identical for me, to the point that the two lines in the plot are indistinguishable (the maximum difference between the two is about 0.01&nbsp;sec <b>without</b> normalising by the number of iterations).<br />
Also, the time growth is linear (no improvement for 200K+ elements), which makes me wonder if R2013a introduced a new memory allocation algorithm.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Yair Altman		</title>
		<link>https://undocumentedmatlab.com/articles/allocation-performance-take-2#comment-243241</link>

		<dc:creator><![CDATA[Yair Altman]]></dc:creator>
		<pubDate>Tue, 20 Aug 2013 21:36:15 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=4086#comment-243241</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://undocumentedmatlab.com/articles/allocation-performance-take-2#comment-243075&quot;&gt;Michelle Hirsch&lt;/a&gt;.

@Michelle - thanks for the clarification]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://undocumentedmatlab.com/articles/allocation-performance-take-2#comment-243075">Michelle Hirsch</a>.</p>
<p>@Michelle &#8211; thanks for the clarification</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Michelle Hirsch		</title>
		<link>https://undocumentedmatlab.com/articles/allocation-performance-take-2#comment-243075</link>

		<dc:creator><![CDATA[Michelle Hirsch]]></dc:creator>
		<pubDate>Tue, 20 Aug 2013 14:11:38 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=4086#comment-243075</guid>

					<description><![CDATA[Interesting assessment, Yair, but it turns out that the reasons for the behavior changes aren’t what you thought. The performance change for zeros in R2008b resulted from a change in the underlying MATLAB memory management architecture at that time.]]></description>
			<content:encoded><![CDATA[<p>Interesting assessment, Yair, but it turns out that the reasons for the behavior changes aren’t what you thought. The performance change for zeros in R2008b resulted from a change in the underlying MATLAB memory management architecture at that time.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Yair Altman		</title>
		<link>https://undocumentedmatlab.com/articles/allocation-performance-take-2#comment-240150</link>

		<dc:creator><![CDATA[Yair Altman]]></dc:creator>
		<pubDate>Thu, 15 Aug 2013 18:06:26 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=4086#comment-240150</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://undocumentedmatlab.com/articles/allocation-performance-take-2#comment-239592&quot;&gt;Amro&lt;/a&gt;.

@Amro - thanks for the detailed comment and references. You may indeed be correct regarding Intel, since I&#039;ve received the following results for a MacBook Pro (R2013a, Mountain Lion) from Malcolm Lidierth (thanks!):
&lt;img alt=&quot;Allocation results on Mac (R2013a, Mountain Lion)&quot; src=&quot;http://undocumentedmatlab.com/images/perfTest-mac.gif&quot; title=&quot;Allocation results on Mac (R2013a, Mountain Lion)&quot; width=&quot;488&quot; height=&quot;387&quot; /&gt;]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://undocumentedmatlab.com/articles/allocation-performance-take-2#comment-239592">Amro</a>.</p>
<p>@Amro &#8211; thanks for the detailed comment and references. You may indeed be correct regarding Intel, since I&#8217;ve received the following results for a MacBook Pro (R2013a, Mountain Lion) from Malcolm Lidierth (thanks!):<br />
<img alt="Allocation results on Mac (R2013a, Mountain Lion)" src="http://undocumentedmatlab.com/images/perfTest-mac.gif" title="Allocation results on Mac (R2013a, Mountain Lion)" width="488" height="387" /></p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Amro		</title>
		<link>https://undocumentedmatlab.com/articles/allocation-performance-take-2#comment-239592</link>

		<dc:creator><![CDATA[Amro]]></dc:creator>
		<pubDate>Thu, 15 Aug 2013 00:04:06 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=4086#comment-239592</guid>

					<description><![CDATA[Just yesterday, I was looking into a similar topic on Stack Overflow: http://stackoverflow.com/a/18217986/97160

One of the things I found while investigating the issue is that MATLAB appears to be using a custom memory allocator optimized for multi-threaded cases, namely the Intel TBB scalable memory allocator (libmx.dll had a dependency on tbbmalloc.dll, which is Intel&#039;s library). I suspect that the implementation of zeros switches to this parallel memory allocator once the size is large enough.

btw there are all sorts of memory allocators out there, each claiming to be better than the others: http://en.wikipedia.org/wiki/Malloc#Implementations

---

I should point out that the &quot;bzero&quot; you mentioned is now deprecated [1], and even appears to be using the same underlying call as &quot;memset&quot; [2]. Even the specialized &quot;ZeroMemory&quot; in the Win32 API is a macro that expands to &quot;memset&quot; [3] (which is probably optimized for your platform, whether that&#039;s implemented in kernel code or by the CRT library).

I think the difference between zeros and ones could be explained by the performance of malloc+memset vs. calloc. There&#039;s an excellent explanation over here: http://stackoverflow.com/a/2688522/97160
_
[1]: http://c-unix-linux.blogspot.com/2009/01/bzero-and-memset.html
[2]: http://fdiv.net/2009/01/14/memset-vs-bzero-ultimate-showdown
[3]: http://stackoverflow.com/questions/3038302/why-do-zeromemory-etc-exist-when-there-are-memset-etc-already]]></description>
			<content:encoded><![CDATA[<p>Just yesterday, I was looking into a similar topic on Stack Overflow: <a href="http://stackoverflow.com/a/18217986/97160" rel="nofollow ugc">http://stackoverflow.com/a/18217986/97160</a></p>
<p>One of the things I found while investigating the issue is that MATLAB appears to be using a custom memory allocator optimized for multi-threaded cases, namely the Intel TBB scalable memory allocator (libmx.dll had a dependency on tbbmalloc.dll, which is Intel&#8217;s library). I suspect that the implementation of zeros switches to this parallel memory allocator once the size is large enough.</p>
<p>btw there are all sorts of memory allocators out there, each claiming to be better than the others: <a href="http://en.wikipedia.org/wiki/Malloc#Implementations" rel="nofollow ugc">http://en.wikipedia.org/wiki/Malloc#Implementations</a></p>
<p>&#8212;</p>
<p>I should point out that the &#8220;bzero&#8221; you mentioned is now deprecated [1], and even appears to be using the same underlying call as &#8220;memset&#8221; [2]. Even the specialized &#8220;ZeroMemory&#8221; in the Win32 API is a macro that expands to &#8220;memset&#8221; [3] (which is probably optimized for your platform, whether that&#8217;s implemented in kernel code or by the CRT library).</p>
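<p>As a minimal C sketch of the bzero point (an illustration, not MATLAB code): <code>memset(p, 0, n)</code> is the standard-C equivalent of the legacy <code>bzero(p, n)</code>, which POSIX.1-2001 marked LEGACY and POSIX.1-2008 removed. The helpers <code>zero_buffer</code> and <code>malloc_zeroed</code> are hypothetical names; <code>malloc_zeroed</code> is simply the &#8220;malloc+memset&#8221; pattern in its plainest form.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Clear a buffer the portable way: memset with a fill value of 0 does
   what the legacy bzero(buf, nbytes) did, using only standard C. */
static void zero_buffer(void *buf, size_t nbytes)
{
    memset(buf, 0, nbytes);
}

/* Hypothetical helper: allocate nbytes, then clear every byte explicitly
   (the "malloc + memset" pattern). Returns NULL if malloc fails. */
static void *malloc_zeroed(size_t nbytes)
{
    void *p = malloc(nbytes);
    if (p != NULL)
        zero_buffer(p, nbytes);
    return p;
}
```

<p>Note that this pattern always pays for a full write pass over the buffer, regardless of where the memory came from.</p>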
<p>I think the difference between zeros and ones could be explained by the performance of malloc+memset vs. calloc. There&#8217;s an excellent explanation over here: <a href="http://stackoverflow.com/a/2688522/97160" rel="nofollow ugc">http://stackoverflow.com/a/2688522/97160</a><br />
_<br />
[1]: <a href="http://c-unix-linux.blogspot.com/2009/01/bzero-and-memset.html" rel="nofollow ugc">http://c-unix-linux.blogspot.com/2009/01/bzero-and-memset.html</a><br />
[2]: <a href="http://fdiv.net/2009/01/14/memset-vs-bzero-ultimate-showdown" rel="nofollow ugc">http://fdiv.net/2009/01/14/memset-vs-bzero-ultimate-showdown</a><br />
[3]: <a href="http://stackoverflow.com/questions/3038302/why-do-zeromemory-etc-exist-when-there-are-memset-etc-already" rel="nofollow ugc">http://stackoverflow.com/questions/3038302/why-do-zeromemory-etc-exist-when-there-are-memset-etc-already</a></p>
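<p>A minimal C sketch of the malloc+memset vs. calloc distinction, written as a hypothetical analogue of <i>zeros</i> and <i>ones</i>. It is not MATLAB&#8217;s actual implementation, the names <code>alloc_zeros</code> and <code>alloc_ones</code> are made up, and it assumes IEEE-754 doubles (where all-bits-zero is 0.0). The asymmetry it shows: calloc may skip its clearing pass when the block comes fresh from the OS and is therefore already zeroed, whereas filling with 1.0 must write every element up front.</p>

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical analogue of zeros(n,1): calloc returns all-bits-zero memory,
   which reads as 0.0 for IEEE-754 doubles. For blocks obtained fresh from
   the OS, an allocator may skip the clearing pass entirely. */
static double *alloc_zeros(size_t n)
{
    /* mainstream calloc implementations reject an overflowing n * size */
    return calloc(n, sizeof(double));
}

/* Hypothetical analogue of ones(n,1): no allocator hands out memory that
   is already set to 1.0, so every element (and page) is written now. */
static double *alloc_ones(size_t n)
{
    if (n > SIZE_MAX / sizeof(double))   /* guard n * sizeof(double) overflow */
        return NULL;
    double *p = malloc(n * sizeof(double));
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = 1.0;
    return p;
}
```

<p>Timing both functions over a range of <code>n</code> (for example with <code>clock_gettime</code>) is one way to check whether a given platform shows the zeros/ones gap discussed in this thread; the outcome depends on the allocator and OS.</p>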
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Joshua Leahy		</title>
		<link>https://undocumentedmatlab.com/articles/allocation-performance-take-2#comment-239578</link>

		<dc:creator><![CDATA[Joshua Leahy]]></dc:creator>
		<pubDate>Wed, 14 Aug 2013 23:41:05 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=4086#comment-239578</guid>

					<description><![CDATA[It&#039;s likely that zeros becomes much faster than ones at such large sizes because MATLAB then switches to using mmap rather than a combination of malloc and bzero. mmap provides any amount of pre-zeroed memory in constant time.

The trick works because the operating system allocates the memory lazily, on first use. If I&#039;m right, you might see a penalty on the first use of larger allocations but not of smaller ones.]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s likely that zeros becomes much faster than ones at such large sizes because MATLAB then switches to using mmap rather than a combination of malloc and bzero. mmap provides any amount of pre-zeroed memory in constant time.</p>
<p>The trick works because the operating system allocates the memory lazily, on first use. If I&#8217;m right, you might see a penalty on the first use of larger allocations but not of smaller ones.</p>
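<p>A minimal C sketch of the mmap behaviour described above, assuming Linux or macOS (<code>MAP_ANONYMOUS</code> is a long-standing extension on both). The helper names are hypothetical, and whether MATLAB actually does this is the commenter&#8217;s conjecture. Anonymous mappings come back zero-filled, and the kernel usually supplies physical pages only when they are first written, which is where the first-use penalty would appear. For comparison, glibc&#8217;s own malloc serves large requests with mmap once they exceed <code>M_MMAP_THRESHOLD</code> (128 KiB by default, adjusted dynamically).</p>

```c
#define _DEFAULT_SOURCE            /* expose MAP_ANONYMOUS with glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map nbytes of private anonymous memory.
   The contents read as zero; no clearing loop runs here. */
static void *map_zeroed(size_t nbytes)
{
    void *p = mmap(NULL, nbytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Write one byte per page. On a fresh mapping this is typically where the
   deferred cost shows up: each first write faults in a zeroed page. */
static void touch_pages(unsigned char *p, size_t nbytes)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096;   /* fall back to a common size */
    for (size_t off = 0; off < nbytes; off += page)
        p[off] = 1;
}

/* Release the mapping; returns 0 on success, -1 on error. */
static int unmap_zeroed(void *p, size_t nbytes)
{
    return munmap(p, nbytes);
}
```

<p>Timing <code>map_zeroed</code> and <code>touch_pages</code> separately for growing sizes would show whether the cost moves from the allocation call to first use, as suggested above.</p>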
]]></content:encoded>
		
			</item>
	</channel>
</rss>
