<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>
	Comments on: A few parfor tips	</title>
	<atom:link href="https://undocumentedmatlab.com/articles/a-few-parfor-tips/feed" rel="self" type="application/rss+xml" />
	<link>https://undocumentedmatlab.com/articles/a-few-parfor-tips?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=a-few-parfor-tips</link>
	<description>Professional Matlab consulting, development and training</description>
	<lastBuildDate>Tue, 08 Aug 2017 07:18:44 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.7.3</generator>
	<item>
		<title>
		By: William Smith		</title>
		<link>https://undocumentedmatlab.com/articles/a-few-parfor-tips#comment-411627</link>

		<dc:creator><![CDATA[William Smith]]></dc:creator>
		<pubDate>Tue, 08 Aug 2017 07:18:44 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=6516#comment-411627</guid>

					<description><![CDATA[If you have heterogeneous workloads, i.e. some of the parfor tasks are quick and others are slow, and they are clustered together (e.g. the slow ones are all at the start), try putting your work items into an array and then randomizing the order of that array:
&lt;pre lang=&quot;matlab&quot;&gt;
irand = randperm(numel(work));
work = work(irand);
&lt;/pre&gt;

If you have *very* heterogeneous workloads, such that some of the parfor elements take a long time while others are ‘0 work’, it may be more efficient to identify the ‘0 work’ elements beforehand and remove them from the work array. Otherwise the very heterogeneous workloads won’t balance well across the cores, and you may be left with a few cores still running long tasks while the others have already finished.]]></description>
			<content:encoded><![CDATA[<p>If you have heterogeneous workloads, i.e. some of the parfor tasks are quick and others are slow, and they are clustered together (e.g. the slow ones are all at the start), try putting your work items into an array and then randomizing the order of that array:</p>
<pre lang="matlab">
irand = randperm(numel(work));
work = work(irand);
</pre>
<p>If you have <em>very</em> heterogeneous workloads, such that some of the parfor elements take a long time while others are ‘0 work’, it may be more efficient to identify the ‘0 work’ elements beforehand and remove them from the work array. Otherwise the very heterogeneous workloads won’t balance well across the cores, and you may be left with a few cores still running long tasks while the others have already finished.</p>
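<p>Here is a minimal sketch that combines both ideas. It assumes the work items are stored in a cell array and that a cheap per-item cost estimate is available. The sample <code>work</code> values, the cost estimate, and the loop body are hypothetical stand-ins and are not part of the original tip:</p>
<pre lang="matlab">
% Hypothetical workload: each cell holds a problem size (0 = nothing to do)
work = num2cell([0 0 5000 0 200 3000 0 10 0 800]);

% Hypothetical cost estimate: here simply the problem size itself
cost = cellfun(@(w) w, work);

% Remove the '0 work' elements before entering the parfor loop
work = work(cost ~= 0);

% Shuffle the remaining items so that slow and fast tasks are interleaved
irand = randperm(numel(work));
work = work(irand);

results = zeros(1, numel(work));
parfor k = 1:numel(work)
    n = work{k};
    results(k) = sum(sqrt(1:n));   % stand-in for the real per-item computation
end

% Undo the shuffle so that ordered(j) belongs to the j-th non-empty item
ordered = zeros(size(results));
ordered(irand) = results;
</pre>
<p>Without the Parallel Computing Toolbox, a parfor loop runs serially, so this sketch can also be tried on a plain MATLAB installation.</p>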
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Yair Altman		</title>
		<link>https://undocumentedmatlab.com/articles/a-few-parfor-tips#comment-388486</link>

		<dc:creator><![CDATA[Yair Altman]]></dc:creator>
		<pubDate>Fri, 16 Sep 2016 08:47:25 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=6516#comment-388486</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://undocumentedmatlab.com/articles/a-few-parfor-tips#comment-388482&quot;&gt;Jes Vestervang Jensen&lt;/a&gt;.

@Jes, one possible explanation is that by using logical rather than physical cores, the workers take advantage of hyperthreading. Perhaps in your specific case this benefit outweighs the drawbacks: the extra overheads of parallelization management, OS context-switching, and extra memory. Using extra parallel processes could also help when the processes spend a significant portion of their time on I/O, since during the I/O wait the CPU is idle and can therefore be used by the non-I/O portions of other parallel processes. In short, this is highly system-, program- and data-dependent. In the general case I have found that limiting the number of processes to the number of physical cores is better than the default limit of the number of logical cores, but in specific cases it may well be different. This is easy to test on your specific system/program.]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://undocumentedmatlab.com/articles/a-few-parfor-tips#comment-388482">Jes Vestervang Jensen</a>.</p>
<p>@Jes, one possible explanation is that by using logical rather than physical cores, the workers take advantage of hyperthreading. Perhaps in your specific case this benefit outweighs the drawbacks: the extra overheads of parallelization management, OS context-switching, and extra memory. Using extra parallel processes could also help when the processes spend a significant portion of their time on I/O, since during the I/O wait the CPU is idle and can therefore be used by the non-I/O portions of other parallel processes. In short, this is highly system-, program- and data-dependent. In the general case I have found that limiting the number of processes to the number of physical cores is better than the default limit of the number of logical cores, but in specific cases it may well be different. This is easy to test on your specific system/program.</p>
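<p>One rough way to run such a test is to time the same parfor loop with pools of different sizes. This is a sketch only, with several assumptions. It requires the Parallel Computing Toolbox and the <code>'local'</code> cluster profile. The pool sizes 4 and 8 are placeholders for your machine's physical and logical core counts, and the matrix product is a stand-in for the real workload:</p>
<pre lang="matlab">
poolSizes = [4 8];        % placeholders: physical and logical core counts
N = 200;                  % number of parfor iterations (hypothetical)
elapsed = zeros(size(poolSizes));

c = parcluster('local');
c.NumWorkers = max(poolSizes);   % allow more workers than the default limit

for i = 1:numel(poolSizes)
    delete(gcp('nocreate'));     % close any pool that is already open
    parpool(c, poolSizes(i));    % open a pool with the requested number of workers
    total = 0;
    tic
    parfor k = 1:N
        x = rand(300);                     % stand-in for the real per-iteration work
        total = total + trace(x * x.');    % reduction variable keeps the result alive
    end
    elapsed(i) = toc;
end
delete(gcp('nocreate'));

disp(table(poolSizes(:), elapsed(:), 'VariableNames', {'Workers', 'Seconds'}))
</pre>
<p>Pool start-up time is deliberately excluded from the measurement. It is also worth repeating each run a few times, since the first parfor loop in a new pool may carry some one-off overhead.</p>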
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Jes Vestervang Jensen		</title>
		<link>https://undocumentedmatlab.com/articles/a-few-parfor-tips#comment-388483</link>

		<dc:creator><![CDATA[Jes Vestervang Jensen]]></dc:creator>
		<pubDate>Fri, 16 Sep 2016 08:35:09 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=6516#comment-388483</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://undocumentedmatlab.com/articles/a-few-parfor-tips#comment-388482&quot;&gt;Jes Vestervang Jensen&lt;/a&gt;.

I meant to say &quot;Doubling the number of workers to the number of logical cores&quot;. :-)]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://undocumentedmatlab.com/articles/a-few-parfor-tips#comment-388482">Jes Vestervang Jensen</a>.</p>
<p>I meant to say &#8220;Doubling the number of workers to the number of logical cores&#8221;. 🙂</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Jes Vestervang Jensen		</title>
		<link>https://undocumentedmatlab.com/articles/a-few-parfor-tips#comment-388482</link>

		<dc:creator><![CDATA[Jes Vestervang Jensen]]></dc:creator>
		<pubDate>Fri, 16 Sep 2016 08:33:23 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=6516#comment-388482</guid>

					<description><![CDATA[My experience is that having only as many workers as physical cores does not cause the processor (several i7s and a pair of Xeon E5-2630 v2) to clock up to full speed. Doubling to the number of logical cores, however, causes the processor to raise its clock speed to the maximum. Clock speed may very well be a bad performance metric, but I haven&#039;t looked into the actual performance. 

Yair, can you elaborate a bit on the less intuitive notion that the best performance is achieved with a lower-than-maximum clock frequency?

I have used Intel&#039;s Performance Counter Monitor on Linux to see what&#039;s going on:
https://software.intel.com/en-us/articles/intel-performance-counter-monitor]]></description>
			<content:encoded><![CDATA[<p>My experience is that having only as many workers as physical cores does not cause the processor (several i7s and a pair of Xeon E5-2630 v2) to clock up to full speed. Doubling to the number of logical cores, however, causes the processor to raise its clock speed to the maximum. Clock speed may very well be a bad performance metric, but I haven&#8217;t looked into the actual performance. </p>
<p>Yair, can you elaborate a bit on the less intuitive notion that the best performance is achieved with a lower-than-maximum clock frequency?</p>
<p>I have used Intel&#8217;s Performance Counter Monitor on Linux to see what&#8217;s going on:<br />
<a href="https://software.intel.com/en-us/articles/intel-performance-counter-monitor" rel="nofollow ugc">https://software.intel.com/en-us/articles/intel-performance-counter-monitor</a></p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Sean de Wolski		</title>
		<link>https://undocumentedmatlab.com/articles/a-few-parfor-tips#comment-387215</link>

		<dc:creator><![CDATA[Sean de Wolski]]></dc:creator>
		<pubDate>Fri, 02 Sep 2016 16:27:17 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=6516#comment-387215</guid>

					<description><![CDATA[I think the example above, where you want to run multiple different functions in parallel, might be better served by parfeval.]]></description>
			<content:encoded><![CDATA[<p>I think the example above, where you want to run multiple different functions in parallel, might be better served by parfeval.</p>
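<p>Here is a minimal sketch of that approach, assuming the Parallel Computing Toolbox. The three anonymous functions are hypothetical placeholders for the independent functions you want to run:</p>
<pre lang="matlab">
% Hypothetical independent tasks, each returning one output
funcs = {@() sum(rand(1e6, 1)), @() norm(rand(500)), @() numel(primes(1e6))};

pool = gcp();   % get the current parallel pool, starting one if needed
for k = 1:numel(funcs)
    % parfeval(pool, fcn, numOutputs) schedules fcn to run asynchronously on a worker
    futures(k) = parfeval(pool, funcs{k}, 1);
end

% Collect the results in completion order, storing each at its task index
results = cell(1, numel(funcs));
for k = 1:numel(funcs)
    [idx, value] = fetchNext(futures);   % blocks until the next task finishes
    results{idx} = value;
end
</pre>
<p>A parfor loop returns only after all of its iterations have finished. With fetchNext, by contrast, each result can be processed as soon as its own task completes.</p>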
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Sam Roberts		</title>
		<link>https://undocumentedmatlab.com/articles/a-few-parfor-tips#comment-382515</link>

		<dc:creator><![CDATA[Sam Roberts]]></dc:creator>
		<pubDate>Thu, 07 Jul 2016 12:30:20 +0000</pubDate>
		<guid isPermaLink="false">http://undocumentedmatlab.com/?p=6516#comment-382515</guid>

					<description><![CDATA[Yair, I&#039;m pretty sure that if you find a case where parpool is using the number of logical (rather than physical) cores as its default number of workers, that&#039;s a bug. As you mentioned in your section on hyperthreading, there are good reasons why it should be the number of physical cores.

I&#039;ve always felt that the documentation could be clearer in that area: perhaps even providing a little tutorial on what hyperthreading is, the difference between logical and physical cores, and why the number of physical cores is very likely the relevant number.

I know I&#039;ve answered quite a few questions on StackOverflow that were based on misunderstandings related to these topics. I&#039;ve also worked with customers who were confused when their machine appeared to be working at full tilt, but Task Manager showed each core at only half its potential, because they had hyperthreading switched on.]]></description>
			<content:encoded><![CDATA[<p>Yair, I&#8217;m pretty sure that if you find a case where parpool is using the number of logical (rather than physical) cores as its default number of workers, that&#8217;s a bug. As you mentioned in your section on hyperthreading, there are good reasons why it should be the number of physical cores.</p>
<p>I&#8217;ve always felt that the documentation could be clearer in that area: perhaps even providing a little tutorial on what hyperthreading is, the difference between logical and physical cores, and why the number of physical cores is very likely the relevant number.</p>
<p>I know I&#8217;ve answered quite a few questions on StackOverflow that were based on misunderstandings related to these topics. I&#8217;ve also worked with customers who were confused when their machine appeared to be working at full tilt, but Task Manager showed each core at only half its potential, because they had hyperthreading switched on.</p>
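<p>To check this on a given machine, a sketch along these lines compares the default worker limit of the local cluster with the physical core count. Note that <code>feature('numcores')</code> is undocumented and may change between releases. The <code>'local'</code> profile name is also an assumption, since newer releases may use a different default profile:</p>
<pre lang="matlab">
% Undocumented: returns the number of physical cores that MATLAB detects
nPhysical = feature('numcores');

% Default worker limit of the local cluster profile (Parallel Computing Toolbox)
c = parcluster('local');
nDefaultWorkers = c.NumWorkers;

fprintf('Physical cores detected: %d\n', nPhysical);
fprintf('Default NumWorkers of the local profile: %d\n', nDefaultWorkers);
if nDefaultWorkers ~= nPhysical
    % e.g. the profile was edited, or the default follows the logical core count
    fprintf('NumWorkers differs from the physical core count\n');
end
</pre>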
]]></content:encoded>
		
			</item>
	</channel>
</rss>
