Comments on: A few parfor tips
https://undocumentedmatlab.com/blog_old/a-few-parfor-tips
Charting Matlab's unsupported hidden underbelly
Wed, 20 May 2020 03:01:17 +0000

By: William Smith, Tue, 08 Aug 2017 07:18:44 +0000
https://undocumentedmatlab.com/blog_old/a-few-parfor-tips#comment-411627

If you have heterogeneous workloads, i.e. some of the parfor tasks are quick and some are slow, and they are clustered together (e.g. the slow ones are at the start), try putting your work into an array, then randomizing the array:

irand = randperm(numel(work));
work = work(irand);

If you have *very* heterogeneous workloads, such that some of the parfor elements will take a long time and some will be ‘0 work’, it may be more efficient to compute beforehand which elements have ‘0 work’ and remove them from the work array. Otherwise the workload won’t balance well across the cores, and you may well be left with a few cores running long-running tasks while the others sit idle, having already finished.
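A minimal sketch of both ideas combined (dropping the ‘0 work’ elements, then shuffling what remains) might look as follows. The `work` data and the squaring step are placeholders invented for illustration, and the code assumes the Parallel Computing Toolbox is available:

```matlab
% Hypothetical workload: each element is a task "size"; 0 means nothing to do.
work = [0 5 0 3 8 0 1];               % illustrative data only
results = zeros(size(work));          % zero-work elements keep this default

todo = find(work > 0);                % indices of elements that need processing
todo = todo(randperm(numel(todo)));   % shuffle so slow tasks are not clustered

shuffled = work(todo);                % only the real work, in random order
partial  = zeros(size(shuffled));
parfor k = 1:numel(shuffled)
    partial(k) = shuffled(k)^2;       % placeholder for the real computation
end

results(todo) = partial;              % map results back to the original order
```

Keeping the `todo` index vector is what lets the results be written back in the original order after the shuffle.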

By: Yair Altman, Fri, 16 Sep 2016 08:47:25 +0000
https://undocumentedmatlab.com/blog_old/a-few-parfor-tips#comment-388486

@Jes, one possible explanation is that using logical rather than physical cores lets the workers exploit hyperthreading, and in your specific case this benefit may outweigh the extra overheads of parallelization management, OS context-switching, and memory. Using extra parallel processes could also help when the processes have a significant I/O portion, since during the I/O wait the CPU is idle and can therefore be used by the non-I/O portions of other parallel processes. In short, this is highly system-, program- and data-dependent. In the general case I found that limiting the number of processes to the number of physical cores is better than using the number of logical cores, but in specific cases it may well be different. This is easy to test on your specific system/program.
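One way such a test could be run is sketched below. It is an illustration under assumptions rather than a definitive benchmark: `feature('numcores')` is an undocumented call, the logical core count is read through Java (so a JVM is required), and the loop body is a made-up CPU-bound placeholder that should be replaced by the real workload:

```matlab
% Sketch: time the same parfor loop with two different pool sizes.
nPhysical = feature('numcores');   % undocumented call; reports physical cores
nLogical  = java.lang.Runtime.getRuntime().availableProcessors();  % logical cores

c = parcluster();                  % default cluster profile
nIter = 64;                        % hypothetical number of tasks

for n = unique([nPhysical, nLogical])
    delete(gcp('nocreate'));       % close any existing pool
    c.NumWorkers = n;              % let this cluster object start n workers
    parpool(c, n);

    out = zeros(1, nIter);
    t = tic;
    parfor k = 1:nIter
        out(k) = sum(svd(rand(300)));   % placeholder CPU-bound task
    end
    fprintf('%d workers: %.2f s\n', n, toc(t));
end
delete(gcp('nocreate'));           % clean up the last pool
```

Pool startup is deliberately kept outside the timed region, so only the loop itself is compared.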

By: Jes Vestervang Jensen, Fri, 16 Sep 2016 08:35:09 +0000
https://undocumentedmatlab.com/blog_old/a-few-parfor-tips#comment-388483

I meant to say “Doubling the number of workers to the number of logical cores”. :-)

By: Jes Vestervang Jensen, Fri, 16 Sep 2016 08:33:23 +0000
https://undocumentedmatlab.com/blog_old/a-few-parfor-tips#comment-388482

My experience is that having only as many workers as there are physical cores does not cause the processor (several i7s and a pair of Xeon E5-2630 v2s) to clock up to full speed. Doubling to the number of logical cores, however, causes the processor to increase its clock speed to the maximum. Clock speed may very well be a bad performance metric, though; I haven’t looked into the actual performance.

Yair, can you elaborate a bit on the less intuitive notion that the best performance is achieved at a lower-than-maximum clock frequency?

I have used Intel’s performance counter monitor on Linux to see what’s going on:
https://software.intel.com/en-us/articles/intel-performance-counter-monitor

By: Sean de Wolski, Fri, 02 Sep 2016 16:27:17 +0000
https://undocumentedmatlab.com/blog_old/a-few-parfor-tips#comment-387215

I think the example above, where you want to run multiple functions, might be better served by parfeval.
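As a rough illustration of the parfeval approach (not the article's original example), the sketch below submits several different functions to the pool and collects each result as soon as it is ready. The function list and the input values are hypothetical stand-ins:

```matlab
% Sketch: run several different functions asynchronously with parfeval.
pool  = gcp();                          % current pool (starts one if needed)
funcs = {@sin, @cos, @sqrt};            % hypothetical functions to run
n     = numel(funcs);

futures(1:n) = parallel.FevalFuture;    % preallocate the future array
for k = 1:n
    % Request 1 output from funcs{k}, called with the input argument k
    futures(k) = parfeval(pool, funcs{k}, 1, k);
end

results = zeros(1, n);
for k = 1:n
    [idx, value] = fetchNext(futures);  % blocks until the next future completes
    results(idx) = value;               % idx identifies which future finished
end
```

Unlike a parfor loop, the client is free to do other work between submitting the futures and fetching their results.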

By: Sam Roberts, Thu, 07 Jul 2016 12:30:20 +0000
https://undocumentedmatlab.com/blog_old/a-few-parfor-tips#comment-382515

Yair, I’m pretty sure that if you find a case where parpool is using the number of logical (rather than physical) cores as its default number of workers, that’s a bug. As you mentioned in your section on hyperthreading, there are good reasons why it should be the number of physical cores.

I’ve always felt that the documentation could be clearer in that area, perhaps even providing a little tutorial on what hyperthreading is, the difference between logical and physical cores, and why it’s very likely that the number of physical cores is the relevant number.

I know I’ve answered quite a few questions on StackOverflow that were based on misunderstandings related to these topics. I’ve also worked with customers who were confused when their machine appeared to be working at full tilt, but Task Manager showed each core at half-potential, because they had hyperthreading switched on.
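A quick way to see which number a given installation uses by default is sketched below. It assumes the default profile is a local cluster (for a remote cluster, `NumWorkers` means something different), relies on the undocumented `feature('numcores')` call, and needs a JVM for the logical core count:

```matlab
% Sketch: compare the default worker count with the machine's core counts.
c = parcluster();                       % default cluster profile
nPhysical = feature('numcores');        % undocumented; physical cores
nLogical  = java.lang.Runtime.getRuntime().availableProcessors();  % logical cores

fprintf('Default NumWorkers: %d\n', c.NumWorkers);
fprintf('Physical cores:     %d\n', nPhysical);
fprintf('Logical cores:      %d\n', nLogical);

if c.NumWorkers > nPhysical
    disp('Default worker count exceeds the number of physical cores.');
end
```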
