One of the limitations of Matlab already recognized by the community is that it does not provide users direct access to threads without the PCT (Parallel Computing Toolbox). For example, we may want to run some expensive computation or I/O in the background without freezing the main application. Instead, Matlab offers either implicit multiprocessing, which relies on built-in threading support in some MATLAB functions, or explicit multiprocessing using PCT (note: PCT workers use heavyweight processes, not lightweight threads). So the only way to achieve true multi-threading in Matlab is via MEX, Java or .Net, or by spawning external standalone processes (yes, there are a few other esoteric variants – don’t nit-pick).
Note that we do not save any CPU cycles by running tasks in parallel. In the overall balance, we actually increase the amount of CPU processing, due to the multi-threading overhead. However, in the vast majority of cases we are more interested in the responsiveness of Matlab’s main processing thread (known as the Main Thread, Matlab Thread, or simply MT) than in reducing the computer’s total energy consumption. In such cases, offloading work to asynchronous C++, Java or .Net threads could remove bottlenecks from Matlab’s main thread, achieving significant speedup.
Today’s article is a derivative of a much larger section on explicit multi-threading in Matlab, that will be included in my upcoming book MATLAB Performance Tuning, which will be published later this year. It is the first in a series of articles that will be devoted to various alternatives.
Sample problem
In the following example, we compute some data, save it to file on a relatively slow USB/network disk, and then proceed with another calculation. We start with a simple synchronous implementation in plain Matlab:
```matlab
tic
data = rand(5e6,1);  % pre-processing, 5M elements, ~40MB
fid = fopen('F:\test.data','w');
fwrite(fid,data,'double');
fclose(fid);
data = fft(data);  % post-processing
toc

Elapsed time is 9.922366 seconds.
```
~10 seconds happens to be too slow for our specific needs. We could perhaps improve it a bit with some fancy tricks for save or fwrite. But let’s take a different approach today, using multi-threading:
Using Java threads
Matlab uses Java for numerous tasks, including networking, data-processing algorithms and the graphical user interface (GUI). In fact, under the hood, even Matlab timers employ Java threads for their internal triggering mechanism. In order to use Java, Matlab launches its own dedicated JVM (Java Virtual Machine) when it starts (unless it’s started with the -nojvm startup option). Once started, Java can be directly used within Matlab as a natural extension of the Matlab language. Today I will only discuss Java multithreading and its potential benefits for Matlab users; readers are assumed to know how to write Java code and how to compile Java classes.
To use Java threads in Matlab, first create a class that implements the Runnable interface or extends java.lang.Thread. In either case we need to implement at least the run() method, which runs the thread’s processing core.
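The Runnable route can be sketched as follows, using only the standard java.lang API. SumTask and its summing workload are hypothetical placeholders for a real background job; the key points are that the work lives in run(), and that a Thread object wraps the Runnable and starts it. The join() call in the helper exists only so the demo can return a result; in real use you would normally not block Matlab’s MT this way.

```java
// Hypothetical background task implemented via the Runnable interface
// (the alternative to extending java.lang.Thread).
class SumTask implements Runnable {
    private final double[] data;
    // volatile so a result written by the worker thread is visible to other threads
    private volatile double result;

    SumTask(double[] data) {
        this.data = data;
    }

    @Override
    public void run() {
        // The thread's processing core: executes on the background thread
        double sum = 0;
        for (double d : data) {
            sum += d;
        }
        result = sum;
    }

    double getResult() {
        return result;
    }

    // Demo helper: run the task on a new thread, then wait for it to finish
    static double runInBackground(double[] data) {
        SumTask task = new SumTask(data);
        Thread worker = new Thread(task);  // wrap the Runnable in a Thread
        worker.start();                    // run() now executes concurrently
        try {
            worker.join();                 // block only for the demo's sake
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return task.getResult();
    }
}
```

Extending Thread (as MyJavaThread does later in this article) is slightly shorter, while implementing Runnable leaves the class free to extend some other base class.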
Now let us replace the serial I/O with a very simple dedicated Java thread. Our second calculation (fft) will not need to wait for the I/O to complete, enabling much faster responsiveness on Matlab’s MT. In this case, the elapsed time measured on Matlab’s MT improves 58x (!), although the file write itself still takes its full time, now in the background:
```matlab
tic
data = rand(5e6,1);  % pre-processing (5M elements, ~40MB)
javaaddpath 'C:\Yair\Code\'  % path to MyJavaThread.class
start(MyJavaThread('F:\test.data',data));  % start running in parallel
data = fft(data);  % post-processing (Java I/O runs in parallel)
toc

Elapsed time is 0.170722 seconds.   % 58x speedup !!!
```
Note that the call to javaaddpath only needs to be done once in the entire Matlab session, not repeatedly. The definition of our Java thread class is very simple (real-life classes would not be as simplistic, but the purpose here is to show the basic concept, not to teach Java threading):
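```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class MyJavaThread extends Thread {
    String filename;
    double[] doubleData;

    public MyJavaThread(String filename, double[] data) {
        this.filename = filename;
        this.doubleData = data;
    }

    @Override
    public void run() {
        DataOutputStream out = null;
        try {
            // Buffering avoids a separate OS write call for every double.
            // Note: DataOutputStream writes big-endian doubles, so read the
            // file back in Matlab with fopen(filename,'r','b').
            out = new DataOutputStream(new BufferedOutputStream(
                      new FileOutputStream(filename)));
            for (int i=0; i < doubleData.length; i++) {
                out.writeDouble(doubleData[i]);
            }
        } catch (Exception ex) {
            System.out.println(ex.toString());
        } finally {
            // Close (and flush) the stream even if writing failed
            try { if (out != null) out.close(); } catch (IOException ignored) { }
        }
    }
}
```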
Note: when compiling a Java class for use within Matlab, as above, ensure that you compile for a JVM version equal to or lower than Matlab’s own JVM version, as reported by Matlab’s version function:
```matlab
% Matlab R2013b uses JVM 1.7, so we can compile for Java versions up to 7, but not 8
>> version -java
ans =
Java 1.7.0_11-b21 ...
```
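Before copying a compiled class to Matlab’s Java classpath, it may help to confirm which Java release it actually targets. Per the JVM class-file format, a .class file starts with the magic number 0xCAFEBABE, followed by a 2-byte minor and a 2-byte major version (51 for Java 7, 52 for Java 8). The utility below is a hypothetical helper, named ClassFileVersion here for illustration, that reads this header:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Reports which Java release a compiled .class file targets.
class ClassFileVersion {
    // Class-file major version -> Java release: 50 = Java 6, 51 = Java 7, 52 = Java 8, ...
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    // Parses the first 8 header bytes: magic (u4), minor_version (u2), major_version (u2)
    static int majorVersion(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("header too short");
        }
        int magic = ((header[0] & 0xFF) << 24) | ((header[1] & 0xFF) << 16)
                  | ((header[2] & 0xFF) << 8) | (header[3] & 0xFF);
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        return ((header[6] & 0xFF) << 8) | (header[7] & 0xFF);
    }

    // Usage: java ClassFileVersion MyJavaThread.class
    public static void main(String[] args) throws IOException {
        byte[] header = new byte[8];
        DataInputStream in = new DataInputStream(new FileInputStream(args[0]));
        try {
            in.readFully(header);
        } finally {
            in.close();
        }
        int major = majorVersion(header);
        System.out.println(args[0] + ": class-file version " + major
                + " (Java " + javaRelease(major) + ")");
    }
}
```

If it reports a release newer than Matlab’s JVM, recompiling with javac’s -source and -target options (e.g. -source 1.7 -target 1.7 for R2013b) is the usual remedy.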
Matlab synchronization
Java (and C++/.Net) threads are very effective when they can run entirely independently from Matlab’s main thread. But what if we need to synchronize the other thread with Matlab’s MT? For example, what if the Java code needs to run some Matlab function, or access some Matlab data? In MEX this could be done using the dedicated and documented MEX functions; in Java this can be done using the undocumented/unsupported JMI (Java-Matlab Interface) package. Note that using standard Java Threads without Matlab synchronization is fully supported; it is only the JMI package that is undocumented and unsupported.
Here is the relevant code snippet for evaluating Matlab code within a Java thread:
```java
import com.mathworks.jmi.Matlab;  // in %matlabroot%/java/jar/jmi.jar
...
Matlab matlabEngine = new Matlab();
...
Matlab.whenMatlabReady(runnableClass);
```
Where runnableClass is a class whose run() method includes calls to com.mathworks.jmi.Matlab methods such as:
```java
matlabEngine.mtEval("plot(data)");
Object value = matlabEngine.mtFeval("min", new Object[]{a,b}, 1);  // 2 inputs, 1 output
```
Unfortunately, we cannot directly call matlabEngine’s methods in our Java thread, since this is blocked in order to ensure synchronization: Matlab only enables calling these methods from the MT, which is the reason for the runnableClass. Indeed, synchronizing Java code with MATLAB could be quite tricky, and can easily deadlock MATLAB. To alleviate some of the risk, I advise not to use the JMI class directly: use Joshua Kaplan’s MatlabControl class, a user-friendly JMI wrapper.
Note that Java’s built-in invokeAndWait() method (SwingUtilities/EventQueue) cannot be used to synchronize with Matlab. M-code executes as a single uninterrupted thread (MT). Events are simply queued by Matlab’s interpreter and processed when we relinquish control by requesting drawnow, pause, wait, waitfor etc. Matlab synchronization is robust and predictable, yet forces us to use the whenMatlabReady(runnableClass) mechanism to add to the event queue. The next time drawnow etc. is called in M-code, the event queue is flushed and our submitted code is processed by Matlab’s interpreter.
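To make this queueing model concrete, here is a plain-Java analogy. It does not use JMI at all, and the names MainThreadQueue, whenReady and flush are hypothetical: worker threads only enqueue Runnables, and the owning thread executes them when it explicitly yields, much as Matlab processes whenMatlabReady submissions at the next drawnow or pause.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java analogy (NOT JMI): other threads never run code on the "main"
// thread directly; they only queue Runnables for it.
class MainThreadQueue {
    private final ConcurrentLinkedQueue<Runnable> queue = new ConcurrentLinkedQueue<Runnable>();

    // May be called from any thread; similar in spirit to whenMatlabReady(runnable)
    void whenReady(Runnable task) {
        queue.add(task);
    }

    // Called only by the main thread when it yields (like drawnow/pause).
    // Runs every queued task on the calling thread; returns how many ran.
    int flush() {
        int count = 0;
        Runnable task;
        while ((task = queue.poll()) != null) {
            task.run();
            count++;
        }
        return count;
    }

    // Demo: a worker posts 3 tasks; nothing runs until the main thread flushes.
    // Returns {tasks executed before flush, tasks executed after flush}.
    static int[] demo() {
        final MainThreadQueue mt = new MainThreadQueue();
        final AtomicInteger executed = new AtomicInteger();
        Thread worker = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 3; i++) {
                    mt.whenReady(new Runnable() {
                        public void run() {
                            executed.incrementAndGet();
                        }
                    });
                }
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int before = executed.get();  // tasks are queued, not yet executed
        mt.flush();                   // main thread "yields" and processes them
        return new int[] {before, executed.get()};
    }
}
```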
Java threading can be quite tricky even without the Matlab synchronization complexity. Deadlock, starvation and race conditions are frequent problems with Java threads. Basic Java synchronization is relatively easy, using the synchronized keyword. But getting the synchronization to work correctly in all cases is much more difficult, and requires expertise beyond that of most Java programmers. In fact, many Java programmers who use threads are not even aware that their thread synchronization is buggy and that their code is not thread-safe.
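The basic use of synchronized can be sketched with a hypothetical SharedCounter class: several threads increment one shared counter, and the synchronized methods make each read-modify-write of count atomic with respect to the others.

```java
// Several threads increment a shared counter through synchronized methods.
class SharedCounter {
    private int count = 0;

    // synchronized: only one thread at a time may execute this on a given
    // object, so count++ (read, add, write) cannot interleave between threads
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Starts nThreads workers, each calling increment() perThread times,
    // waits for all of them, and returns the final count
    static int countWithThreads(int nThreads, final int perThread) {
        final SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        counter.increment();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }
}
```

Removing the synchronized modifier from increment() would typically (though not deterministically) yield a total below nThreads × perThread, which is exactly the kind of silent race described above.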
My general advice is to use Java threads only for simple independent tasks that require minimal interaction with other threads, the Matlab engine, and/or shared resources.
Additional alternatives and musings
In addition to Java threads, we can use other technologies for multi-threading in Matlab: Next week’s article will explore Dot-Net (C#) threads and timers, and that will be followed by a variety of options for C++ threads and spawned-processes IPC. So don’t let anyone complain any longer about not having explicit multi-threading in Matlab. It’s not trivial, but it’s also not rocket science, and there are plenty of alternatives out there.
Still, admittedly, MT’s current single-threaded implementation is a pain-in-the-so-and-so, a relic of a decades-old design. A likely future improvement to the Matlab M-code interpreter would be to make it thread-safe. This would enable automatic conversion of for loops into multiple threads running on multiple local CPUs/cores, significantly improving Matlab’s standard performance and essentially eliminating the need for a separate parfor in PCT (imagine me drooling here). Then again, this might reduce PCT sales…
Advanced Matlab Programming course – London 10-11 March, 2014
If Matlab performance interests you, consider joining my Advanced Matlab Programming course in London on 10-11 March, 2014. In this course/seminar I will explore numerous other ways by which we can improve Matlab’s performance and create professional code. This is a unique opportunity to take your Matlab skills to a higher level within a couple of days. Registration closes this Friday, so don’t wait too long.
Hi Yair
Nice post!
Another possibility you haven’t mentioned for multi-threading is to run a new Matlab instance from Matlab.
@Thierry – I did mention “spawning external standalone processes” in my opening paragraph. Just note that it is not multi-threading but rather multi-processing. There’s a wide variety of things that you can do by spawning external processes, but it will always be less efficient to spawn an external heavyweight process than an in-process thread, not to mention the fact that it is harder to synchronize the data and coordinate execution. Perhaps I’ll dedicate a special post about spawning external processes, but this is a wide topic that opens the way to Matlab parallelization alternatives, and this could take me a full year of posts, so I guess I need to stop somewhere…
I guess it is different, but what about the Matlab Parallel Toolbox? Can it be compared to the Java threads you explain in your article?
@Oro77 – PCT is different in many respects:
It is not that one is generally better than the other – both are good, for different use-cases. Depending on your specific needs you can select either one or the other (or both).
Thank you for your complete comment on PCT 🙂
Very intriguing Yair… 🙂 Have you tried writing .MAT files in a background Java thread? If so, which .MAT library did you use? This could be very handy functionality in certain circumstances!
@Eric – you can use JMATIO for MAT-file I/O in Java
I did use this library. I had some issues with big MAT files. Apart from this problem, it is quite easy to use.
You briefly mentioned doing multithreading in MEX-functions. I just wanted to clarify that the MEX API is *not* thread-safe. So while it is possible to spawn threads in your MEX-files and perform independent computations, you should never call any mx*/mex* functions from those threads; such calls should be restricted to the main running thread of the MEX-function.
Here is an example of multithreaded C/C++ using simple OpenMP compiler directives: http://www.walkingrandomly.com/?p=1795 .
@Amro – thanks for the clarification, but you are providing a spoiler… My MEX C++ multithreading article will appear on March 5, as part 3 of this series.
@Yair: sorry for giving it away 🙂 Interesting articles as always, keep up the good work!
Great post Yair!
I am sure this is a stupid question (I am rather new to java), but I can’t figure it out:
I’d like to start a java thread from Matlab and continue to execute the Matlab script, analogous to what you show above.
Then, at some point in my Matlab script, I’d like Matlab to check whether the Java thread has finished and then execute some code (for instance, wait until it has really finished in order to retrieve some output arguments). Any suggestions on how this could work?
Thanks a lot!
Wolfgang
@Wolfgang – if you wish to wait for the Java thread to exit, you could try to use the Thread.join() method. See additional/related information here.
Thanks, Yair!
I can start Java threads from Matlab, but I am still stuck on how to get Matlab to interact with them. My problem is the following: how do I start a Java thread in Matlab and then retrieve information about it later in my Matlab script, e.g. ask whether the thread that I just started is still running? Please pardon my ignorance…
Any help greatly appreciated!
Wolfgang
@Yair: Don’t bother. I figured it out. Btw: Your blog is awesome!
Wolfgang
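For readers with the same question, the two standard java.lang.Thread methods involved are isAlive(), which reports whether a started thread is still running, and join(), which blocks until it exits (optionally with a timeout). The sketch below uses a hypothetical PollableWorker class with a placeholder workload. From Matlab, the pattern would roughly be w = PollableWorker(1000); w.start(); followed later by w.isAlive() or w.waitFor(5000), then w.getSquares(); for Matlab to see it, the class would need to be declared public in its own PollableWorker.java file.

```java
// Start a Java thread, keep working, then poll isAlive() or block on join()
// before fetching its output.
class PollableWorker extends Thread {
    private final long[] squares;

    PollableWorker(int n) {
        squares = new long[n];
    }

    @Override
    public void run() {
        for (int i = 0; i < squares.length; i++) {
            squares[i] = (long) i * i;  // placeholder workload
        }
    }

    // Only meaningful once the thread has finished: seeing isAlive() == false,
    // or returning from join(), guarantees run()'s writes are visible here
    public long[] getSquares() {
        return squares;
    }

    // Waits up to timeoutMillis; returns true if the worker has finished
    public boolean waitFor(long timeoutMillis) {
        try {
            join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !isAlive();
    }
}
```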
Hi Yair!
I may be a bit off topic, but while trying to add multithreading to a Matlab script I developed, I encountered a problem that I think is worth posting and putting in the public limelight. The problem regards variable transfer from client to workers when the “variable” is a Java object. I have experienced that, without multithreading, passing a Java object from a main script to a function is not a problem, while when using (for instance) parfeval to implement multithreading, the object is not passed correctly, resulting in the error “Attempt to reference field of non-structure array”. Something similar also happens when I try to add multithreading using a parfor loop, so I suspect it may be a general issue with Matlab’s multithreading tools. May I kindly ask for some clarification in this regard?
Many thanks in advance
Stefano
@Stefano – This may answer your question: http://fluffynukeit.com/tag/loadobj
(sorry for the late response, but better late than never I guess…)
Hi Yair,
So I am very new to Java and to the concept of multithreading, so what I ask may be child’s play, but why does the Java thread take so long to complete? I understand that Matlab is now free to do other things, but the test.data file takes a very, very long time to complete, much longer than just creating and saving it.
Also, is there a way to stop this thread once it is sent out?
@Ben – I don’t know what’s taking so long in your specific case. It is certainly not something general but specific to your particular implementation (perhaps the file is slow to access on some remote network drive for example?).
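Anyway, you can temporarily pause a thread via its suspend() method, and terminate it via its stop() method (note that both methods are deprecated in Java, since suspend() is deadlock-prone and stop() can leave shared objects in an inconsistent state). Read more about threads here, or in any standard Java textbook. This is a Matlab blog and not a Java one, so if you need more information on the Java aspects you should go elsewhere.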
@Yair – So i am writing to the network and was thinking that might be the issue with the length of time, but if I write out the same file to the same network path using
it only takes 0.06 sec, compared to 15 sec with the Java thread.
Do you have any thoughts on this?
Thank you for your post by the way, it’s awesome!
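One plausible explanation for slow Java writes, offered only as a hypothesis since this thread never settled it: writing one double at a time through a DataOutputStream on an unbuffered FileOutputStream can translate into a separate small write call per value, which is especially costly on a network share. Note also that DataOutputStream always writes big-endian values, whereas Matlab’s fwrite on a typical x86 PC writes little-endian, so the two files differ byte-for-byte. The hypothetical BulkDoubleWriter below sketches an alternative using java.nio: it fills a little-endian ByteBuffer and writes it through a FileChannel in large chunks.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

// Writes a double[] in bulk, in little-endian byte order (the layout that
// Matlab's fwrite(fid,data,'double') produces on x86 machines).
class BulkDoubleWriter {
    static void write(String filename, double[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(data.length * 8)  // 8 bytes per double
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.asDoubleBuffer().put(data);  // the view inherits the little-endian order
        FileOutputStream fos = new FileOutputStream(filename);
        try {
            FileChannel ch = fos.getChannel();
            while (buf.hasRemaining()) {
                ch.write(buf);           // a few large writes instead of one per value
            }
        } finally {
            fos.close();
        }
    }
}
```

Calling BulkDoubleWriter.write(filename, doubleData) from inside a thread’s run() method would replace the per-element loop; the resulting file should then read back in Matlab with a plain fread(fid,inf,'double').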
It appears the MatlabControl project has been migrated to GitHub (due to Google Code shutting down): https://github.com/jakaplan/matlabcontrol
Documentation at the Wiki: https://github.com/jakaplan/matlabcontrol/wiki
Hello Yair,
I wish to spawn a figure that shows the time while the script continues to run.
I thought figures run in their own thread, so I should be able to do this with addlistener, but I am not having any luck.
Do I need Java? If so, can you give any pointers?
Many Thanks,
@Serge – you can use a simple Matlab timer for this. Read the documentation for the timer function.
Great post, Yair. I was wondering if it is possible to interrupt the java thread from the matlab main thread, given that the java thread can catch it and exit gracefully. Is there a matlab method available to do it? Thank you very much!
@Leo – you can stop a Java thread using its stop() method:
See here for more details on Java threads.
Of course, if you create your own version of a Java thread, you can create a custom public method that signals the thread in a more elegant manner than the brute-force stop() method.
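Such a custom signal method can be sketched as follows, assuming a hypothetical StoppableWorker class whose sleep stands in for real work. The public requestStop() method sets a volatile flag and calls interrupt(), so a thread blocked in sleep() or wait() wakes up promptly, and run() then exits through its normal cleanup path rather than being killed mid-operation. From Matlab the calls would be w.start() and later w.requestStop(), provided the class is declared public in its own StoppableWorker.java file.

```java
// Worker with a custom public stop signal, as a gentler alternative to the
// deprecated, brute-force Thread.stop().
class StoppableWorker extends Thread {
    private volatile boolean stopRequested = false;  // volatile: visible across threads
    private volatile long iterations = 0;            // written only by this worker

    // Public signal method: set the flag, then wake any sleep/wait in progress
    public void requestStop() {
        stopRequested = true;
        interrupt();
    }

    @Override
    public void run() {
        while (!stopRequested) {
            iterations++;
            try {
                Thread.sleep(10);  // placeholder for one unit of real work
            } catch (InterruptedException e) {
                // Woken by interrupt(); the loop condition decides whether to exit
            }
        }
        // Graceful exit point: close files, release resources, etc.
    }

    public long getIterations() {
        return iterations;
    }
}
```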
Thank you for the post. It seems to be exactly the solution I was looking for, but I cannot start the thread as you showed above. I get the error message:
??? Undefined function or method 'start' for input arguments of type 'ParallelPortReaderThread'.
Error in ==> testThread at 8
start(ParallelPortReaderThread(rr_intervals));
This is my Matlab code:
And this is my Java class:
I’d be very thankful for any help and/or advice.
Maybe important: I’m using Matlab R2008a and JDK 1.6.0 (for both Matlab and compilation).
@Peyman – it’s probably due to one of the possible reasons that I listed here: http://undocumentedmatlab.com/blog/java-class-access-pitfalls
Thanks Yair. Clearing java solved the problem.
Great post!
I used the example you provided on Matlab R2019b, and I am getting similar results on speed. However, I noticed that if we read the file back (no matter whether it was written using Matlab or Java), it has 40,000,000 elements, whereas we have written 5,000,000 elements. Kindly comment: why is this so?
Kindly ignore my previous message; I did not specify the data type while reading the file. It works perfectly. Thanks.