Explicit multi-threading in Matlab part 1

One limitation of Matlab that the community has long recognized is that it does not give users direct access to threads without the PCT (Parallel Computing Toolbox). Threads would let us, for example, run some expensive computations or I/O in the background without freezing the main application. Instead, Matlab offers either implicit multiprocessing, which relies on built-in threading support in some MATLAB functions, or explicit multiprocessing using PCT (note: PCT workers use heavyweight processes, not lightweight threads). So the only way to achieve true multi-threading in Matlab is via MEX, Java or .Net, or by spawning external standalone processes (yes, there are a few other esoteric variants – don’t nit-pick).

Note that we do not save any CPU cycles by running tasks in parallel. In the overall balance, we actually increase the amount of CPU processing, due to the multi-threading overhead. However, in the vast majority of cases we are more interested in the responsiveness of Matlab’s main processing thread (known as the Main Thread, Matlab Thread, or simply MT) than in reducing the computer’s total energy consumption. In such cases, offloading work to asynchronous C++, Java or .Net threads can remove bottlenecks from Matlab’s main thread and achieve a significant speedup.

Today’s article is a derivative of a much larger section on explicit multi-threading in Matlab, that will be included in my upcoming book MATLAB Performance Tuning, which will be published later this year. It is the first in a series of articles that will be devoted to various alternatives.

Sample problem

In the following example, we compute some data, save it to file on a relatively slow USB/network disk, and then proceed with another calculation. We start with a simple synchronous implementation in plain Matlab:

tic
data = rand(5e6,1);  % pre-processing, 5M elements, ~40MB
fid = fopen('F:\test.data','w');
fwrite(fid,data,'double');
fclose(fid);
data = fft(data);  % post-processing
toc
 
Elapsed time is 9.922366 seconds.

~10 seconds happens to be too slow for our specific needs. We could perhaps improve it a bit with some fancy tricks for save or fwrite. But let’s take a different approach today, using multi-threading:

Using Java threads

Matlab uses Java for numerous tasks, including networking, data-processing algorithms and graphical user-interface (GUI). In fact, under the hood, even Matlab timers employ Java threads for their internal triggering mechanism. In order to use Java, Matlab launches its own dedicated JVM (Java Virtual Machine) when it starts (unless it’s started with the -nojvm startup option). Once started, Java can be directly used within Matlab as a natural extension of the Matlab language. Today I will only discuss Java multithreading and its potential benefits for Matlab users. Readers are assumed to know how to program Java code and how to compile Java classes.

To use Java threads in Matlab, first create a class that implements the Runnable interface or extends java.lang.Thread. In either case we need to implement at least the run() method, which runs the thread’s processing core.
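The two options can be sketched side by side. This is only a minimal illustration in plain Java: the names `ThreadStylesDemo`, `SumTask` and `SumThread` are made up for this example and are not used elsewhere in the article.

```java
class ThreadStylesDemo {
    // Option 1: implement Runnable and hand the object to a Thread
    static class SumTask implements Runnable {
        private final double[] data;
        private double result;        // read by the caller only after join()

        SumTask(double[] data) { this.data = data; }

        @Override
        public void run() {
            double sum = 0;
            for (double v : data) sum += v;
            result = sum;
        }

        double getResult() { return result; }
    }

    // Option 2: extend java.lang.Thread and override run() directly
    static class SumThread extends Thread {
        private final double[] data;
        private double result;        // read by the caller only after join()

        SumThread(double[] data) { this.data = data; }

        @Override
        public void run() {
            double sum = 0;
            for (double v : data) sum += v;
            result = sum;
        }

        double getResult() { return result; }
    }

    // Starts one thread of each kind, waits for both, returns both sums
    static double[] runBoth(double[] data) {
        SumTask task = new SumTask(data);
        Thread t1 = new Thread(task);       // a Runnable needs a Thread wrapper
        SumThread t2 = new SumThread(data); // a Thread subclass starts itself
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return new double[] { task.getResult(), t2.getResult() };
    }

    public static void main(String[] args) {
        double[] sums = runBoth(new double[] {1, 2, 3, 4});
        System.out.println(sums[0] + " " + sums[1]);   // prints "10.0 10.0"
    }
}
```

Either way, it is start() that creates the new thread; calling run() directly would simply execute the code on the calling thread.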

Now let us replace the serial I/O with a very simple dedicated Java thread. Our second calculation (fft) will not need to wait for the I/O to complete, enabling much faster responsiveness on Matlab’s MT. In this case, we get a 58x (!) speedup:

tic
data = rand(5e6,1);  % pre-processing (5M elements, ~40MB)
javaaddpath 'C:\Yair\Code\'  % path to MyJavaThread.class
start(MyJavaThread('F:\test.data',data));  % start running in parallel
data = fft(data);  % post-processing (Java I/O runs in parallel)
toc
 
Elapsed time is 0.170722 seconds.   % 58x speedup !!!

Note that the call to javaaddpath only needs to be done once in the entire Matlab session, not repeatedly. The definition of our Java thread class is very simple (real-life classes would not be as simplistic, but the purpose here is to show the basic concept, not to teach Java threading):

import java.io.DataOutputStream;
import java.io.FileOutputStream;

// Background thread that writes a double array to a binary file
public class MyJavaThread extends Thread
{
    String filename;
    double[] doubleData;

    public MyJavaThread(String filename, double[] data)
    {
        this.filename = filename;
        this.doubleData = data;
    }

    // Runs on the new thread once start() is called
    @Override
    public void run()
    {
        // try-with-resources (Java 7+) closes the file even if a write fails
        try (DataOutputStream out = new DataOutputStream(
                                      new FileOutputStream(filename)))
        {
            for (int i=0; i < doubleData.length; i++)
            {
                out.writeDouble(doubleData[i]);  // 8 bytes, high byte first
            }
        } catch (Exception ex) {
            System.out.println(ex.toString());
        }
    }
}
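One detail worth knowing about the class above: DataOutputStream.writeDouble() always writes big-endian bytes, whereas Matlab's fwrite uses the machine's native byte order by default (little-endian on Intel/AMD hardware). The file produced by the Java thread is therefore not byte-identical to the one from the synchronous Matlab version, and reading it back in Matlab would need the 'ieee-be' machine format in fopen. If a native little-endian file is needed, a variant along the following lines might be used. This is only a sketch: the class name `LittleEndianWriterThread` and the chunk size are my own assumptions, not part of the original code.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LittleEndianWriterThread extends Thread {
    private static final int CHUNK = 8192;   // doubles per write; arbitrary choice
    private final String filename;
    private final double[] data;

    LittleEndianWriterThread(String filename, double[] data) {
        this.filename = filename;
        this.data = data;
    }

    @Override
    public void run() {
        try {
            write(filename, data);
        } catch (IOException ex) {
            System.out.println(ex.toString());
        }
    }

    // Encodes the doubles in little-endian order and writes them in chunks,
    // instead of issuing one small write call per element
    static void write(String filename, double[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(CHUNK * 8)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        try (OutputStream out = new FileOutputStream(filename)) {
            for (int i = 0; i < data.length; i += CHUNK) {
                int n = Math.min(CHUNK, data.length - i);
                buf.clear();
                for (int k = 0; k < n; k++) {
                    buf.putDouble(data[i + k]);
                }
                out.write(buf.array(), 0, n * 8);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("test", ".data");
        f.deleteOnExit();
        Thread t = new LittleEndianWriterThread(f.getPath(),
                                                new double[] {1.0, 2.0, 3.0});
        t.start();
        t.join();
        System.out.println("wrote " + f.length() + " bytes");  // 3 doubles = 24 bytes
    }
}
```

Writing in chunks should also reduce the number of calls reaching the slow disk, although this only affects how long the background thread runs, not the time measured on the MT.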

Note: when compiling a Java class that should be used within Matlab, as above, ensure that you are compiling for a JVM version that is equal to, or lower than Matlab’s JVM, as reported by Matlab’s version function:

% Matlab R2013b uses JVM 1.7, so we can compile for Java versions up to 7, but not 8
>> version -java
ans =
Java 1.7.0_11-b21 ...

Matlab synchronization

Java (and C++/.Net) threads are very effective when they can run entirely independently from Matlab’s main thread. But what if we need to synchronize the other thread with Matlab's MT? For example, what if the Java code needs to run some Matlab function, or access some Matlab data? In MEX this could be done using the dedicated and documented MEX functions; in Java this can be done using the undocumented/unsupported JMI (Java-Matlab Interface) package. Note that using standard Java Threads without Matlab synchronization is fully supported; it is only the JMI package that is undocumented and unsupported.

Here is the relevant code snippet for evaluating Matlab code within a Java thread:

import com.mathworks.jmi.Matlab;  //in %matlabroot%/java/jar/jmi.jar
...
Matlab matlabEngine = new Matlab();
...
Matlab.whenMatlabReady(runnableClass);

Where runnableClass is a class whose run() method includes calls to com.mathworks.jmi.Matlab methods such as:

matlabEngine.mtEval("plot(data)");
Object value = matlabEngine.mtFeval("min", new Object[]{a,b}, 1); //2 inputs 1 output; cast the result as needed

Unfortunately, we cannot directly call matlabEngine's methods in our Java thread, since this is blocked: in order to ensure synchronization, Matlab only enables calling these methods from the MT, which is the reason for the runnableClass. Indeed, synchronizing Java code with MATLAB can be quite tricky, and can easily deadlock MATLAB. To alleviate some of the risk, I advise against using the JMI class directly: use Joshua Kaplan's MatlabControl class, a user-friendly JMI wrapper.

Note that Java's native invokeAndWait() method cannot be used to synchronize with Matlab. M-code executes as a single uninterrupted thread (MT). Events are simply queued by Matlab's interpreter and processed when we relinquish control by requesting drawnow, pause, wait, waitfor etc. Matlab synchronization is robust and predictable, yet forces us to use the whenMatlabReady(runnableClass) mechanism to add to the event queue. The next time drawnow etc. is called in M-code, the event queue is purged and our submitted code will be processed by Matlab's interpreter.
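The queuing behavior can be modeled in a few lines of plain Java. To be clear, this is only an analogy and not how Matlab or JMI are actually implemented: the names `EventQueueDemo`, `whenReady` and `drain` are invented for this sketch, with whenReady() standing in for whenMatlabReady() and drain() standing in for the moment M-code calls drawnow or pause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

class EventQueueDemo {
    // Requests posted by background threads, waiting for the main thread
    private final ConcurrentLinkedQueue<Runnable> queue =
            new ConcurrentLinkedQueue<>();

    // Callable from any thread: only enqueues, never runs the code itself
    void whenReady(Runnable r) {
        queue.add(r);
    }

    // Called by the main thread when it yields: runs everything queued so far
    // on the calling thread and returns the number of requests processed
    int drain() {
        int processed = 0;
        Runnable r;
        while ((r = queue.poll()) != null) {
            r.run();
            processed++;
        }
        return processed;
    }

    // Returns true if the posted code ran only at drain(), on the draining thread
    static boolean demo() {
        final EventQueueDemo mt = new EventQueueDemo();
        final List<String> ranOn = new ArrayList<>();  // touched only in drain()

        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                // The worker cannot touch main-thread state; it posts a request
                mt.whenReady(new Runnable() {
                    @Override
                    public void run() {
                        ranOn.add(Thread.currentThread().getName());
                    }
                });
            }
        }, "worker");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }

        boolean nothingRanYet = ranOn.isEmpty();  // still queued, like an idle callback
        int processed = mt.drain();               // the "drawnow" moment
        return nothingRanYet
                && processed == 1
                && ranOn.get(0).equals(Thread.currentThread().getName());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the model is that submitted code never runs on the submitting thread: it sits in the queue until the main thread chooses to process it, which is why long-running M-code without drawnow or pause delays such callbacks.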

Java threading can be quite tricky even without the Matlab synchronization complexity. Deadlock, starvation and race conditions are frequent problems with Java threads. Basic Java synchronization is relatively easy, using the synchronized keyword. But getting the synchronization to work correctly is much more difficult and requires Java programming expertise that is beyond most Java programmers. In fact, many Java programmers who use threads are not even aware that their thread synchronization is buggy and that their code is not thread-safe.
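A classic illustration of such a hidden bug is a shared counter. In the sketch below (the class and method names are hypothetical, chosen for this example), several threads increment two counters: one through a plain method and one through a synchronized method. The plain `count++` is a read-modify-write sequence, so concurrent updates can be lost; how many are lost varies from run to run and may even be zero, which is exactly what makes such bugs easy to miss.

```java
class CounterDemo {
    private long unsafeCount;
    private long safeCount;

    // Not thread-safe: two threads can read the same value and both write value+1
    void unsafeIncrement() {
        unsafeCount++;
    }

    // Thread-safe: only one thread at a time can execute this method on this object
    synchronized void safeIncrement() {
        safeCount++;
    }

    // Runs nThreads threads, each incrementing both counters perThread times.
    // Returns {unsafe total, safe total}.
    static long[] run(int nThreads, final int perThread) {
        final CounterDemo c = new CounterDemo();
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int k = 0; k < perThread; k++) c.unsafeIncrement();
                    for (int k = 0; k < perThread; k++) c.safeIncrement();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();   // after join(), the thread's writes are visible here
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }
        return new long[] { c.unsafeCount, c.safeCount };
    }

    public static void main(String[] args) {
        long[] totals = run(4, 1000000);
        System.out.println("unsafe: " + totals[0] + " (may be below 4000000)");
        System.out.println("safe:   " + totals[1]);
    }
}
```

The synchronized counter always ends at the full total, while the unsafe one can fall short. Real-life cases are rarely this obvious, since the shared state is usually spread over several fields and methods.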

My general advice is to use Java threads just for simple independent tasks that require minimal interactions with other threads, the Matlab engine, and/or shared resources.

Additional alternatives and musings

In addition to Java threads, we can use other technologies for multi-threading in Matlab. Next week's article will explore Dot-Net (C#) threads and timers, and that will be followed by a variety of options for C++ threads and spawned-processes IPC. So don't let anyone complain any longer about not having explicit multi-threading in Matlab. It's not trivial, but it's also not rocket science, and there are plenty of alternatives out there.

Still, admittedly, MT's current single-threaded implementation is a pain-in-the-so-and-so, a relic of a decades-old design. A likely future improvement to the Matlab M-code interpreter would be to make it thread-safe. This would enable automatic conversion of for loops into multiple threads running on multiple local CPUs/cores, significantly improving Matlab's standard performance and essentially eliminating the need for a separate parfor in PCT (imagine me drooling here). Then again, this might reduce PCT sales...

Advanced Matlab Programming course – London 10-11 March, 2014

If Matlab performance interests you, consider joining my Advanced Matlab Programming course in London on 10-11 March, 2014. In this course/seminar I will explore numerous other ways by which we can improve Matlab's performance and create professional code. This is a unique opportunity to take your Matlab skills to a higher level within a couple of days. Registration closes this Friday, so don't wait too long.

Categories: Java, Low risk of breaking in future versions

11 Responses to Explicit multi-threading in Matlab part 1

  1. Thierry Dalon says:

    Hi Yair
    Nice post!
    Another possibility you haven’t mentioned for multi-threading is also to run a new Matlab instance from Matlab.

    • @Thierry – I did mention “spawning external standalone processes” in my opening paragraph. Just note that it is not multi-threading but rather multi-processing. There’s a wide variety of things that you can do by spawning external processes, but it will always be less efficient to spawn an external heavyweight process than an in-process thread, not to mention the fact that it is harder to synchronize the data and coordinate execution. Perhaps I’ll dedicate a special post about spawning external processes, but this is a wide topic that opens the way to Matlab parallelization alternatives, and this could take me a full year of posts, so I guess I need to stop somewhere…

  2. oro77 says:

    I guess it is different but what about the Matlab Parallel Toolbox ? Can it be compared to the Java thread you explain in your article ?

    • @Oro77 – PCT is different in many respects:

      1. PCT costs $$$, multi-threading is free
      2. PCT is much easier to use than creating multi-threaded classes that need to be compiled, debugged etc.
      3. PCT enables easy integration/synchronization with Matlab data & execution; multi-threading does not (at least not easily)
      4. PCT is supported by MathWorks, multithreading is your own code that nobody will support for you
      5. PCT uses spawned Matlab processes (headless workers – Matlab processes that simply have no GUI); multi-threading uses much lighter and more efficient threads

      It is not that one is generally better than the other – both are good, for different use-cases. Depending on your specific needs you can select either one or the other (or both).

    • oro77 says:

      Thank you for your complete comment on PCT :)

  3. Eric says:

    Very intriguing Yair… :) Have you tried writing .MAT files in a background Java thread? If so what .MAT library did you use. This could be very handy functionality in certain circumstances!

  4. Amro says:

    You briefly mentioned doing multithreading in MEX-functions. I just wanted to clarify that the MEX API is *not* thread-safe. So while it is possible to spawn threads in your MEX-files and perform independent computations, you should never call any mx*/mex* functions from those threads, and should be restricted to the main running thread of the MEX-function.

    Here is an example of multithreaded C/C++ using simple OpenMP compiler directives: http://www.walkingrandomly.com/?p=1795 .
