Using Infiniband with Matlab Parallel Computing Toolbox

I would like to welcome guest blogger Brock Palen, who is the Associate Director for Advanced Research Computing at the University of Michigan. Brock has worked in High Performance Computing since 2004 and is one half of the Research Computing and Engineering podcast. You can find him blogging at failureasaservice.com. This is an updated repost of Brock’s article on UMich’s Flux HPC blog. Additional information on the Parallel Computing Toolbox can be found in my book Accelerating MATLAB Performance.

In High Performance Computing (HPC) a number of network types are in common use. Among them are Ethernet, the common network found on nearly all computer equipment, and Infiniband, a specialty high-performance, low-latency interconnect common on commodity clusters. There are also several proprietary types and a few other less common ones, but I will focus on Ethernet and Infiniband.

Ethernet, together with its companion protocol TCP, is the most commonly supported MPI network. Almost every computer platform supports it, and a setup can be as simple as your home network switch: it is ubiquitous and easy to support. Networks like Infiniband, by contrast, require special drivers and less common hardware, but the effort is normally worth it.

The MATLAB Parallel Computing Toolbox (PCT) provides a collection of functions that allow MATLAB users to utilize multiple compute nodes to work on larger problems. Many may not realize that MathWorks chose to implement this toolbox on top of the standard MPI routines. MathWorks also chose, for ease of use, to ship MATLAB with the MPICH2 MPI library, and the version they ship only supports Ethernet for communication between nodes.
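For example, any communicating job exercises this MPI layer whenever workers exchange data. Here is a minimal sketch (the 'myCluster' profile name and worker count are placeholders for your own setup):

% Minimal sketch: the gplus call below is an MPI collective under the
% hood, carried over whichever network the MPI library supports.
% 'myCluster' is a placeholder for your cluster profile name.
pool = parpool('myCluster', 8);
spmd
    x = labindex;       % each worker contributes its own index
    total = gplus(x);   % global sum across all workers via MPI
end
disp(total{1});         % total is a Composite; index into it on the client
delete(pool);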

Unfortunately, Ethernet is about the slowest network in common use for parallel applications. The question is how much this impacts performance.

Mmmmm Data:

The data was generated on 12 nodes of Xeon X5650 processors, 144 cores in total. The code was the stock paralleldemo_backslash_bench(1.25) from MATLAB R2013b. You can find my M-code at Gist.
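For context, the heart of that demo is a distributed backslash solve; a stripped-down sketch of the core measurement (the matrix size here is arbitrary; the actual demo scales it to fill memory) would look something like this:

% Stripped-down sketch of the benchmark's core measurement: solve
% A*x = b with A spread across the workers' combined memory.
% Assumes a parallel pool is already open.
n = 20000;                        % arbitrary; the demo sizes this to RAM
A = distributed.rand(n, n);
b = distributed.rand(n, 1);
tic;
x = A \ b;                        % communication-heavy distributed solve
t = toc;
gflops = (2/3) * n^3 / t / 1e9;   % standard LU flop-count estimate
fprintf('%.1f GFlops\n', gflops);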

The data shows two trends. The first is that, independently of the network type, many parallel algorithms do not scale unless the amount of data each core works on is sufficiently large; in this case, for Ethernet especially, peak performance is never reached. The second is that the network type really matters: without Infiniband, at many problem sizes, over half of the performance of the nodes is lost.

How can you make MATLAB use Infiniband?

MathWorks does not ship an MPI library with the Parallel Computing Toolbox that can use Infiniband by default. This is reasonable; I would be curious how large the average PCT cluster is, and how big the jobs run on the toolbox are. Luckily for us, MathWorks provides a way to introduce your own MPI library. Let me be the first to proclaim:

Thank you MathWorks for adding mpiLibConf.m as a feature.
— Brock Palen

A simple example (note that MATLAB’s local scheduler also uses mpiLibConf, so we need to check for this case):

function [lib, extras] = mpiLibConf
%MPILIBCONF MPI library overloading for Infiniband and Ethernet networks
%
%USAGE
%   place in ~/matlab/mpiLibConf.m
%   Update the paths below to point to your MVAPICH2 / Intel MPI installation
 
% Check first if we're running the local scheduler - if we are, then get the default and exit
dfcn = getenv('MDCE_DECODE_FUNCTION');
if strcmp(dfcn, 'parallel.internal.decode.localMpiexecTask')
    % Get the local scheduler's default libs
    [lib, extras] = distcomp.mpiLibConfs('default');
else
    % We're not running the local scheduler, so substitute our own
    % Infiniband-capable library for the default MATLAB libmpich
    lib = '/home/software/rhel6/mvapich2/1.8/lib/libmpich.so';
 
    % MVAPICH2 has two extra libraries, libmpl.so and libopa.so
    %  (run: ldd /home/software/rhel6/mvapich2/1.8/lib/libmpich.so)
    %  Any libraries from the MPICH/MVAPICH2 install location need to be included in extras
    extras = {'/home/software/rhel6/mvapich2/1.8/lib/libmpl.so', ...
              '/home/software/rhel6/mvapich2/1.8/lib/libopa.so'};
end

In the above test we used Intel MPI for the Infiniband runs and MPICH for the Ethernet runs. The choice of MPI library is important: the MPI standard enforces a shared API (Application Programming Interface), but not a shared ABI (Application Binary Interface). Thus the MPI library you substitute needs to match the one MATLAB is compiled against. Luckily for us, MathWorks used MPICH, so any MPICH-compatible derivative should work: MVAPICH2, Intel MPI, etc.
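To confirm the substitution actually took effect, you can call mpiLibConf from the workers themselves; a quick sanity check along these lines (again, 'myCluster' is a placeholder profile name) is worth running before any large job:

% Quick sanity check: ask each worker which MPI library it resolved.
pool = parpool('myCluster', 4);
spmd
    lib = mpiLibConf;            % invokes the mpiLibConf.m on the worker's path
    fprintf('Worker %d uses: %s\n', labindex, lib);
end
delete(pool);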

If you are using the MATLAB Parallel Computing Toolbox on more than one node, and your cluster has a network other than Ethernet/TCP (there are also non-TCP Ethernet networks that perform very well), I highly encourage you to put in the effort to ensure you use that network.

For Flux users we have this set up already, but you have to do some configuration yourself before you see the benefit. Please visit the ARC MATLAB documentation, or send us a question at hpc-support@umich.edu.


2 Responses to Using Infiniband with Matlab Parallel Computing Toolbox

  1. Obliczone says:

    Thank you for this article, especially for the source code. Very useful.

  2. Chris Marshall says:

    Thanks for the article. I’ve modified my mpiLibConf file to use a local mvapich2 lib (2.1) that I’ve compiled using gcc. Trouble is, MATLAB stops with a signal 11. Is there a way of getting this to work with gcc, or do I need to use the Intel compiler?

    Thanks
