Explicit multi-threading in Matlab part 4

In the past weeks, I explained how we can start asynchronous threads that run in parallel to the main Matlab processing using Java, .Net and C++ POSIX threads. Today I conclude the mini-series by examining two other alternatives, timer objects and process-spawning. As we shall see below, these are not “real” multi-threading alternatives, but they can still be important in certain use-cases.

Matlab timers

Multithreading helps application performance in two related but distinct ways:

  • By allowing code to run in parallel, on different CPUs or cores
  • By allowing code to run asynchronously, rather than in serial manner

C++, Java and .Net threads can improve performance in both of these ways. Matlab timers, on the other hand, only enable the second: running code asynchronously. The reason is that all M-code, including timer callback code, is executed by Matlab’s interpreter on a single processing thread, Matlab’s main thread (MT).

So, while a timer callback executes, no other M-code can run. On the face of it, this may seem unhelpful. But in fact, the ability to schedule a Matlab processing task for later (non-serial) invocation can be very handy, if we can time it so that the timer callback is triggered when the application is idle: for example, while waiting for user input, following a complex GUI update, or during the late hours of the night.

I continue using last week’s example, where we compute some data, save it to file on a relatively slow USB/network disk, and then proceed with another calculation. The purpose of multi-threading would be to offload the I/O onto a separate thread, so that the Matlab computation can continue in parallel without needing to wait for the slow I/O. Here is an implementation of our asynchronous I/O example, this time using Matlab timers. First we define the timer’s callback function, using pure M-code (this is the Matlab equivalent of the run() method in the previous examples):

function timerCallback(hTimer,eventData,filename,data)
    try
        fid = fopen(filename,'w');  % the filename is passed in via the TimerFcn cell array
        fwrite(fid,data,'double');  % errors (and is caught below) if fopen failed
        fclose(fid);
    catch err
        fprintf(2,'Error saving to file:\n%s\n',err.message);
    end
end

We can now use this timer in Matlab, similarly to our previous examples:

data = rand(5e6,1);  % pre-processing (5M elements, ~40MB)
timerFcn = {@timerCallback,'F:\test.data',data};
start(timer('StartDelay',2, 'TimerFcn',timerFcn)); % start after 2sec
data = fft(data);  % post-processing (timer I/O will run later!)

The difference vs. our earlier examples is that the timer code is not run in parallel to the fft post-processing, but rather 2 seconds later, when MT is hopefully idle.
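The deferred execution can be seen in isolation with a minimal sketch (not part of the original example). The variable names and the 0.2-second delay are arbitrary choices for illustration; the callback merely stores the elapsed time in the timer’s UserData property, and wait is used only so that the demo can report the result:

```matlab
t0 = tic;
% One-shot timer: the callback records when it actually ran
hTimer = timer('StartDelay',0.2, ...
               'TimerFcn',@(h,~) set(h,'UserData',toc(t0)));
start(hTimer);               % returns immediately
tStarted = toc(t0);          % the main code continues right away
wait(hTimer);                % demo only: block until the one-shot timer has fired
tCallback = get(hTimer,'UserData');
delete(hTimer);              % timer objects persist until explicitly deleted
fprintf('start() returned after %.3f sec; callback ran after %.3f sec\n', ...
        tStarted, tCallback);
```

In a real application we would not call wait, but rather let the main code carry on and let the callback fire whenever MT becomes available.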

I often employ Matlab timers in GUIs: the initial GUI is presented to the user immediately, and then a timer is used to load data from I/O, which could take many seconds. During this time the user can peruse the GUI, getting a feeling of improved responsiveness compared to having to wait all those long seconds for the GUI to initially load. This relates to the interesting topics of perceived performance and lazy/delayed evaluation. Matlab timers are certainly under-appreciated for their performance usefulness.

As a related usage, GUI callbacks should be designed to be as short as possible. The quicker the callback, the more responsive the GUI. Users may lose patience with long callback execution. Unfortunately, callbacks sometimes need to perform a lengthy computation or update (for example, updating an Excel file). In such a case, consider delegating the lengthy update task to a timer object that will execute asynchronously, and enable the synchronous callback to complete its work much quicker. This will ensure better GUI responsiveness without affecting the actual program logic. Here is a simple example (we may wish to specify some more timer properties – the snippet below is only meant for illustration):

% A utility function that performs a lengthy calculation/update
function utilityFcn(varargin)  % varargin absorbs the (hTimer,eventData) inputs passed by a timer
    % some lengthy calculation/update done here
end
 
% Regular callback function – takes a long time to complete
function myCallbackFcn(varargin)
    % Call the utility function directly (synchronously)
    utilityFcn();
end
 
% A better callback – completes much faster, using asynchronous timer
function myCallbackFcn(varargin)
    % Start an asynchronous timer to perform the lengthy update
    start(timer('StartDelay',0.5, 'TimerFcn',@utilityFcn));
end

Similarly, when plotting real-time data, we can employ a timer to periodically update the graph in near-real-time, enabling the main processing to work in near-parallel.
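A minimal sketch of such a periodic timer is shown here. To keep it self-contained, the callback only increments a counter stored in the timer’s UserData; in a real GUI the callback would instead update the graph (for example by setting the YData of a plotted line). The period and the number of repetitions are arbitrary values chosen for illustration:

```matlab
% Periodic timer: fires every 0.1 sec, 5 times in total, then stops
hTimer = timer('ExecutionMode','fixedRate', 'Period',0.1, ...
               'TasksToExecute',5, 'UserData',0, ...
               'TimerFcn',@(h,~) set(h,'UserData',get(h,'UserData')+1));
start(hTimer);
% ... the main processing would continue here ...
wait(hTimer);                       % demo only: block until the timer stops
nUpdates = get(hTimer,'UserData');  % number of times the callback ran
delete(hTimer);                     % release the timer object
fprintf('The periodic callback ran %d times\n', nUpdates);
```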

Matlab timers have an advantage over Java/C++/.Net multithreading in their synchronization with Matlab, since the M-code interpreter is single-threaded. We just need to handle cases where a timer callback might interrupt other M-code.

Matlab timers run pure M-code, so there is no need to know Java/C#/C++ or to use external compilers, and they are easy to set up and use. They can be very effective when tasks can be postponed asynchronously to when the MT is idle.

Spawning external processes

In some cases, it is impractical to create additional processing threads. For example, we might only have the processing element in executable binary format, or we might wish to use a separate memory space for the processing, to sandbox (isolate) it from the main application. In such cases, we can spawn heavyweight processes (as opposed to lightweight threads), either directly from within Matlab, or externally.

The simplest way to spawn an external process in Matlab is using the system function. This function accepts a string that will be evaluated in the OS prompt (shell), at Matlab’s current folder. By appending a ‘&’ character to the end of the string, we let Matlab return immediately, and the spawned process will run asynchronously (in parallel to Matlab); otherwise, Matlab will block until the spawned process ends (i.e., synchronous invocation of the process).

system('program arg1 arg2');    % blocking, synchronous
system('program arg1 arg2 &');  % non-blocking, asynchronous
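In the synchronous case, system can also return the command’s exit status and its console output, which is often all the “communication” that a short external program needs. A small sketch, using echo only because it is available on all major platforms:

```matlab
% Synchronous call that also captures the exit status and console output
[status, output] = system('echo hello');
if status ~= 0
    fprintf(2,'Command failed (status %d):\n%s\n', status, output);
else
    fprintf('Command output: %s', output);
end
```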

Matlab normally uses only a single core on a single CPU, except when using the Parallel Computing Toolbox (PCT) or when doing some implicit parallelization of vectorized code. Therefore, on a quad-core dual-CPU machine, we would normally see Matlab’s CPU usage at only 1/(2*4)=12.5%. The simplest way to utilize the unused CPU cores without PCT is to spawn additional Matlab processes. This can be done using the system function, as above. The spawned Matlab sessions can be made to run specific commands or functions. For example:

system('matlab -r "for idx=1:100, doSomething(idx); end" &');
system(['matlab -r "processFile(''' filename ''');" &']);
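As a rough sketch of how the work might be divided, the loop below builds one command line per spawned Matlab process, each handling an interleaved subset of the 100 indices. The number of processes and the interleaving scheme are assumptions made for illustration, and doSomething is the placeholder function used above; -nosplash and -nodesktop are standard Matlab startup options, and the trailing exit closes each spawned session when it finishes. The actual system call is commented out so that the snippet can be run safely as-is:

```matlab
numProcs = 4;                 % hypothetical: number of Matlab processes to spawn
cmds = cell(1,numProcs);
for k = 1 : numProcs
    % Process k handles indices k, k+numProcs, k+2*numProcs, ... up to 100
    code = sprintf('for idx=%d:%d:100, doSomething(idx); end; exit', k, numProcs);
    cmds{k} = sprintf('matlab -nosplash -nodesktop -r "%s" &', code);
    % system(cmds{k});        % uncomment to actually spawn the process
end
disp(cmds{1});
```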

At this point, we may possibly wish to use processor affinity to ensure that each process runs on a separate CPU. Different OSes have different ways of doing this. For example, on Windows it can easily be done using Process Explorer’s context menu.

When Matlab spawns an external process, it passes to it the set of environment variables used in Matlab. This set may differ from the one that is normally used when running the same process from the OS’s command prompt. This could lead to unexpected results, so care should be taken to update such environment variables in Matlab before spawning the process, if they could affect its outcome.
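Environment variables can be inspected and updated from within Matlab using the getenv and setenv functions. The sketch below sets a variable (MY_DEMO_VAR is a made-up name) and then verifies that a spawned shell command sees it:

```matlab
setenv('MY_DEMO_VAR','hello');      % hypothetical variable name and value
if ispc
    [status, output] = system('echo %MY_DEMO_VAR%');   % Windows syntax
else
    [status, output] = system('echo $MY_DEMO_VAR');    % Linux/Mac syntax
end
value = strtrim(output);            % remove the trailing newline
fprintf('The spawned process sees MY_DEMO_VAR = %s\n', value);
```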

Once an asynchronous (non-blocking) process is started, Matlab does not provide a way to synchronize with it. We could of course employ external signals or the state or contents of some disk file, to let the Matlab process know that one or more of the spawned processes has ended. When multiple processes are spawned, we might wish to employ some sort of load balancing for optimal throughput.
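The disk-file idea can be sketched as follows, under the assumption that the spawned process creates an agreed-upon flag file when it is done. Here a trivial asynchronous echo command stands in for the real external program, and the file name, polling interval and time limit are arbitrary:

```matlab
flagFile = [tempname '.done'];      % hypothetical flag-file name
% Stand-in for a real asynchronous process that creates the file when done
system(sprintf('echo done > "%s" &', flagFile));

timeLimit = 10;                     % give up after 10 seconds
t0 = tic;
finished = false;
while toc(t0) < timeLimit
    if exist(flagFile,'file') == 2  % 2 means an existing file
        finished = true;
        break
    end
    pause(0.1);                     % Matlab could do useful work here instead
end
if finished
    delete(flagFile);               % clean up for the next run
end
```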

We can use OS commands to check if a spawned processId is still running. This ID is not provided by system so we need to determine it right after spawning the process. On Unix systems (Linux and Mac), both of these can be done using a system call to the OS’s ps command; on Windows we can use the tasklist or wmic commands.
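Here is a minimal sketch of such a check. Since the ID of a spawned process depends on how it was launched, the snippet uses Matlab’s own process ID as a stand-in (via the undocumented feature('getpid') call); in practice pid would hold the ID of the spawned process:

```matlab
pid = feature('getpid');   % undocumented: ID of the current Matlab process
if ispc
    % tasklist prints a CSV row containing "<pid>" only if the process exists
    [~, output] = system(sprintf('tasklist /fi "PID eq %d" /fo csv /nh', pid));
    isRunning = ~isempty(strfind(output, sprintf('"%d"', pid)));
else
    % ps returns a zero exit status only if the process exists
    status = system(sprintf('ps -p %d > /dev/null', pid));
    isRunning = (status == 0);
end
fprintf('Process %d running: %d\n', pid, isRunning);
```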

An alternative is to use Java’s built-in process synchronization mechanism, which enables more control over a spawned process. The idea is to spawn an external asynchronous process via Java, continue the Matlab processing, and later (if and when needed) wait for the external process to complete:

runtime = java.lang.Runtime.getRuntime();
process = runtime.exec('program arg1 arg2');  % non-blocking
% Continue Matlab processing in parallel to spawned process

When we need to collect scalar results, we could use the process’ result code:

rc = process.waitFor();    % block Matlab until external program ends
rc = process.exitValue();  % fetch an ended process' return code

Or, if we need to abandon the work, we could stop the spawned process:

process.destroy();         % force-kill the process (rc is typically 1 on Windows)
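These calls can be combined into a polling loop with a time limit. The sketch below relies on the fact that exitValue() throws an exception while the process is still running; the commands (a short sleep or ping that merely stand in for a real program), the time limit and the polling interval are assumptions chosen for illustration:

```matlab
if ispc
    cmd = 'cmd /c ping -n 2 127.0.0.1';  % stand-in: takes about 1 second
else
    cmd = 'sleep 1';                     % stand-in: takes 1 second
end
runtime = java.lang.Runtime.getRuntime();
process = runtime.exec(cmd);             % non-blocking

timeLimit = 10;                          % seconds before giving up
t0 = tic;
rc = [];
while isempty(rc)
    try
        rc = process.exitValue();        % throws if the process has not ended
    catch
        if toc(t0) > timeLimit
            process.destroy();           % abandon the work
            break
        end
        pause(0.2);                      % Matlab could do useful work here
    end
end
timedOut = isempty(rc);
```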

While this mechanism enables synchronization of the Matlab and external process at the basic execution level, it does not enable synchronization of the data. Doing this between processes (that have independent memory spaces) is much harder (and slower) than it is between threads (that share their memory) or MEX. For inter-process data synchronization (known as IPC, or Inter-Process Communication), we can use shared memory, named pipes or data files. There are various mechanisms and libraries that enable this using C++ and Java that could be used in Matlab. Examples of memory sharing are Joshua Dillon’s sharedmatrix and Kevin Stone’s SharedMemory utilities, which use POSIX shared-memory and the Boost IPC library (SharedMemory is an improved version of sharedmatrix). Rice University’s TreadMarks library is another example of a shared-memory approach that has been used with Matlab, in the MATmarks package (whose current availability is unclear).

Named pipes can be used on Unix systems (Linux and Mac). In this case, the source process sends information to the pipe and the destination process reads from it. After setting up the pipe in the OS, it can be opened, updated and closed just like any other data file. Unfortunately, this mechanism is not generally used on Windows.

Matlab includes a dedicated doc-page showing how to synchronize inter-process data using disk files. An approach that combines memory sharing and files is use of memory-mapped files on R2008a or newer (memmapfile was buggy before then, so I suggest not using it on earlier releases).
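A minimal memmapfile sketch follows. Both maps live in the same Matlab session here only to keep the example self-contained; in real use the writer and the reader would be separate processes that map the same file. The file name and data layout (10 doubles) are arbitrary:

```matlab
fname = [tempname '.dat'];               % hypothetical shared data file
fid = fopen(fname,'w');
fwrite(fid, zeros(1,10), 'double');      % the file must exist before mapping
fclose(fid);

mWriter = memmapfile(fname, 'Format','double', 'Writable',true);
mReader = memmapfile(fname, 'Format','double');   % read-only view

mWriter.Data(3) = pi;                    % "send" a value via the mapped file
valueSeen = mReader.Data(3);             % "receive" it via the other map

clear mWriter mReader                    % release the maps before deleting
delete(fname);
```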

Finally, we can use Matlab’s documented ability to serve as a COM/DCOM (automation) server to communicate with it from the external process via the COM interface. Data can be exchanged and Matlab functionality can be invoked by the process.

Followup – MEX functions in R2014a

A few weeks ago I reported that hundreds of internal MEX functions that were previously available and which were enormously useful for a variety of use-cases, have been removed in the R2014a pre-release. Now that the official R2014a has been released, I am happy to report that most of the more-important MEX functions have been restored in the official release (see details in the article addendum), perhaps in some part due to lobbying by yours truly and by others.

MathWorks should be commended for their meaningful dialog with users and for making the fixes in such a short turn-around before the official release, despite the fact that they belong to the undocumented netherworld. MathWorks may appear superficially to be like any other corporate monolith, but when you scratch the surface you discover that there are people there who really care about users, not just the corporate bottom line. I really like this aspect of their corporate culture. I wish all software developers were as receptive to user input as MathWorks is.

Categories: Low risk of breaking in future versions, Stock Matlab function


28 Responses to Explicit multi-threading in Matlab part 4

  1. Thanks Yair for the great series on multi-threading.

    I usually spawn a new Matlab process and then use sockets to communicate between the Matlab processes (with the msocket package, which serializes Matlab objects so they can be sent via the socket). I have used this approach successfully for running experiments, where one Matlab process does the data collection (e.g. from a motion capture device), while the other does the graphics.

  2. Thanks for the kind words, Yair. Everybody involved really appreciated hearing your kudos. Just as importantly, thanks for getting in touch with me so that we could address these issues before they impacted many more users!

  3. Dan says:

    Yair,
    As always, thank you for the great info… I’ve got a quick question for you. I’m trying to achieve a poor-man’s parfor. I’ve written a function which I’ve compiled into a standalone executable, and it does have a GUI component (basically a graph showing progress). Right now I’m working with using the java.lang.Runtime approach to launch the standalone, and I can launch multiple instances. But, as soon as the second instance starts, it appears to halt the execution of the first. I don’t see any case where the two instances would be using the same file resources or anything. Is this something to do with the use of the java calling implementation? Any suggestions on a better way to achieve my goal?

    Thanks,
    Dan

    • Dan says:

      Yair,
      Just wanted to edit my earlier question… It turns out that the issue was that there was an embedded command that gave feedback to the command window, so the various solvers were competing for access to writing responses to a command window that doesn’t exist.

      Dan

    • @Dan – I’m glad this blog provided you with the necessary inspiration to solve your problem… :-)

  4. Dan says:

    Yair,
    One more question on this topic. I’ve been using the runtime.exec approach to launch my background processes. Do you know how to launch the new process at lower priority? I’m trying to do the equivalent of start /low "C:\MATLAB\myprog.exe" "{PathToDataFile}

    Thanks,
    Dan

    • @Dan – you cannot do this using java.lang.Runtime.exec(), but you can start the process directly via Matlab’s system command:

      system('start /low "C:\MATLAB\myprog.exe" "C:\Path\To\Data\File"')

      The drawback is that you would not be able to monitor or stop the process asynchronously unless you use the OS’s ps command (Linux and Mac) or the tasklist / wmic commands (Windows), as explained in the post above.

      An alternative is to use Java’s java.lang.ProcessBuilder class rather than java.lang.Runtime.exec(). Usage examples (that should be Matlabized) can be found here (Linux) and here (Windows).

    • Dan says:

      Yair,
      Thanks for the feedback…. First with response to your suggestion of using the system command: that was the first thing I tried. I don’t understand why but somehow the start command would pull the last entry and open the .mat file by launching Matlab! I tried several different variations on that and couldn’t get it to work. Actually I did find a workaround solution after some more searching. I use the Runtime.exec() to launch the process and then run:

      setPriorityCmd = 'wmic process where name="ping_fitter_exe.exe" CALL setpriority "low"';
      runtime.exec(setPriorityCmd);

      Even this is a bit of a workaround for what I’m really trying to achieve. What I’m trying to do is to run a batch of this executable without bogging down the workstation. What I really wanted to achieve was to query the total processor % load and decide whether or not to launch another sub-process based on that. Unfortunately the only ways I could find to access the total processor load (wmic and typeperf) are slow (at least a second to execute), so I don’t want to drop those in the checking loop. I suspect there’s a sexy way to achieve this using a timer object that runs in the background, but I wasn’t going to spend the time chasing it down.

      Thanks,
      Dan

  5. Oliver Woodford says:

    Yair, and everyone

    Yair, another great post. You mentioned data files as a method for interprocess communication, but didn’t give an example of this, so I thought I’d advertise my own utility :), batch_job:
    http://www.mathworks.com/matlabcentral/fileexchange/44077-batch-job
    It spawns extra MATLAB instances and uses the filesystem to communicate between them, as well as sharing memory using memory mapped files. Like the IP-based approaches, this approach can be extended to sharing work across CPUs that don’t share a memory address space. They only need access to a common filesystem, e.g. CPUs on different computers with a networked file server. Using the filesystem might seem slow, but if your computation is quite long compared to the amount of data it produces, then the write times are not significant.

    Best wishes,
    Oliver

    • @Oliver hi – I was aware of your batch_job utility but for some reason I simply forgot to add the reference in the main text, so thanks for plugging-in!

  6. Dr. Michael Scholz says:

    Dear Yair,
    many thanks for your tutorial on multi-threading in matlab, which helped me substantially in solving my process control problem for live data analysis. I needed to launch a matlab function (with parameters) in a second matlab instance. Since “system” doesn’t give me a handle to later stop the spawned process, java Runtime was an option. The java runTime exec solution however blocks on IO when matlab is called without desktop (i.e. the good old stdin/out/err problem on background processes). I wish to point you to Brian Lau’s processManager toolbox (https://github.com/brian-lau/MatlabProcessManager) with which you have full control over the spawned processes, including IO redirection.

    Best wishes,
    Michael Scholz

  7. stefan says:

    Great post Yair.

    To anyone exploring the “Spawning external processes” approach with Matlab scripts/functions, let me add some advice. First off, this approach should be considered LAST RESORT. Here are some points that may make it feasible for you to use:

    1. Use the matlab program switches to your advantage!
    The list of interesting ones are:
    -nosplash (disables the Logo window when Matlab starts up, including a small idle delay)
    -nodesktop (disables the Matlab desktop environment, giving a minimalistic interface. All Java and editor functionalities still intact)
    -nojvm (disables the java virtual machine, giving access to only the core Matlab computational engine. All Java functions disappear, but Matlab starts up in less than a second)

    Without this, it is almost outright unusable. Switches you will probably ALWAYS add are -nosplash and -nodesktop. This will make your code look something like this:

    system('matlab -nosplash -nodesktop -r "myFile" ');

    Most likely, you would like to parallelize your tasks so that the new instances of Matlab you start up do not require any Java machine. The -nojvm switch makes the memory overhead of Matlab tiny, and starts up a new instance in less than a second.

    2. End your script or function with

    quit force

    3. Communicate with the rest of your program by writing to data files.
    Fancier ways exist, but this is the easiest, and the gain you get from fancier ways is insignificant compared to what you lose in overhead by starting new matlab instances anyway (even with -nojvm).

    4. learn from your experience with this approach, and never suggest anyone else to do it :)

  8. stefan karlsson says:

    Consider replacing calls such as:

    start(timer('StartDelay',0.5, 'TimerFcn',@utilityFcn));

    with this:

    start(timer('StartDelay',0.5, 'TimerFcn',@utilityFcn, 'StopFcn', @(obj,~) delete(obj)));

    If not, repeated calls will lock the available timers for your system, until such a point as the timer function will stop working.

  9. Mark G says:

    Hi Yair,

    Thanks for sharing your immense knowledge of MATLAB.

    I was bitten recently by one behavior of timer callbacks that I haven’t seen documented anywhere. Suppose a long processing task is initiated by a UI callback, but has frequently-occurring drawnow or pause(0.01) commands. In this case, a periodic timer callback can run at those opportunities.

    Now suppose that same long processing task is initiated via a single-shot timer (similar to the example you provided in this article). In this case the task is running in the timer callback rather than the UI event callback. It turns out that the other periodic timer is completely blocked until the long task launched by the single-shot timer has entirely finished. Even long pause commands don’t allow the periodic timer task to run. This “serialization” of asynchronous tasks was a surprise to me.

    It turns out that a timer task, even if sleeping, also blocks the main MATLAB processing thread. This is R2012b, by the way; perhaps this behavior has since been changed.

    I’d be interested to know if there’s a way to launch a task asynchronously (e.g. from a timer callback) so that the new task (assuming it contains pauses) is itself interruptible.

    Keep up the great work,
    Mark

    • stefan karlsson says:

      I’ve had similar problems. I believe it is hardware specific, in that it is how the Java engine is working on different machines. Maybe someone can tell us more?

      To solve issues like this, I currently do something like this, which is ugly:

      In my timer callback, (our little friend that contains lots of pauses and drawnows), I do the following:

      start(timer('StartDelay',asLongAsItTakes, ...
                  'TimerFcn', {@(~,~,p) TaskToRun(p), SomeExtraDataToSend}, ... 
                  'StopFcn' ,  @(obj,~) delete(obj)));

      TaskToRun is your task, and “asLongAsItTakes” is to wait for the subsystem that handles these things (I guess some part of the java machine) to be able to handle it properly.

      on the systems it has worked for me, I have never had to put asLongAsItTakes to longer than a second. Would love a better solution…

      good luck

    • stefan karlsson says:

      Just to add to clarity of my previous reply. I am running with Matlab 2013a.

      Most of the time you can replace my ugly code, from previous reply, with just:

           TaskToRun(SomeExtraDataToSend);

      … and “TaskToRun” will be interruptible. There are some rare occasions when TaskToRun is not interruptible, notably when being invoked from a timer callback that experiences lots of interruptions itself. The problem can appear as other timers being blocked, or as user-interface callbacks that won’t be executed (all depending on your application)

      In these rare occasions, you can try my ugly approach which has worked for me. Hopefully someone will post a better solution.

    • @Stefan – thanks. For clarification, I believe that in your second comment you meant this:

      start(timer('StartDelay', asLongAsItTakes, ...
                  'TimerFcn', @(~,~) TaskToRun(SomeExtraDataToSend), ...
                  'StopFcn',  @(obj,~) delete(obj)));

    • stefan karlsson says:

      Hi Yair,

      The use of anonymous functions together with the timer function interface callbacks make the code look…. funky.

      If both work equally well, then ofc, shorter code wins.

    • Mark G says:

      @Stefan – Good find on the matlabcentral post. That will save me from more time pursuing unlikely prospects. Thanks all.

  10. MU says:

    Hi Yair,
    Thank you for the brilliant article on multi-threading.
    I am having an issue compiling an m-file (Main.m) which uses the batch function (parallel computing toolbox) to start a Matlab worker to asynchronously write to a mapped memory (without blocking the EDT). This works okay in MATLAB 2015aSP1. However when I compile Main.m using the Matlab Application compiler, the resulting executable doesn’t launch a new worker as instructed by the batch command (though the home GUI screen does launch).
    Is there a special way to compile scripts which use Parallel computing toolbox functions?
    Many thanks

    • MU says:

      Hi Yair,

      I was wondering if batch workers (parallel computing toolbox) have their own timers that we can use. Or are these disabled (much in the same way as GUI objects).

      Many thanks for your help.

    • @MU – I think you should ask MathWorks support about this – perhaps it’s a bug in the Matlab Compiler.

  11. Knut A Meyer says:

    Thanks Yair for this interesting series!
    I found the ability of having more control over the external calls appealing. Do you know if there is a command for checking if a process is completed?
    I would like to terminate the process if it takes too much time:

    runtime = java.lang.Runtime.getRuntime();
    process = runtime.exec('runsim.bat');
    process_start = tic;
    while(true)
        pause(CompletionCheckInterval)
        if process.isCompleted()  %I would like a command like this!
            break;
        elseif toc(process_start) > TimeLimit
            process.destroy();
            break;
        end %if
    end %while

    • @Knut – you can use the tasklist command on Windows. For example:

      >> procId = feature('getPID');  % Matlab's own process ID
      >> str = evalc('system([''tasklist /v /fi "PID eq '' num2str(procId) ''" /FO LIST'']);')
      str = 
      Image Name:   MATLAB.exe 
      PID:          15212 
      Session Name: Console 
      Session#:     1 
      Mem Usage:    1,314,408 K 
      Status:       Running 
      User Name:    Thinkpad-E530\BMPA 
      CPU Time:     0:11:52 
      Window Title: MATLAB R2016b

      Alternately, you can use System.Diagnostics.Process.GetProcessById(procId) to determine whether a process is running or not, and if it is then you can also retrieve some process properties from the returned data object:

      try
         procObject = System.Diagnostics.Process.GetProcessById(procId);
         processName = char(procObject.ProcessName);  % procObject can be queried
         isRunning = true;
      catch
         isRunning = false;
      end

      For Linux/MacOS, see here or simply use the ps (rather than tasklist) command with system.

    • Knut A Meyer says:

      Thanks for the quick reply Yair!
      I didn’t get how you identify the process ID of the called process though…? Perhaps something very easy I’m missing?
      But, I found a solution in the spirit of your last code example, which seems to be working:

      function [out, tlimexceeded] = runscript(script, settings)
      out = 0;
      tlimexceeded = 0;
      runtime = java.lang.Runtime.getRuntime();
      process = runtime.exec(script);
      process_start = tic;
       
      while(1) %Check for completion
          pause(settings.CompletionCheckInterval)
          try
              out = process.exitValue(); %Throws IllegalThreadStateException if process not completed
              break;
          catch
              %Process still running
          end %try-catch
       
          if (toc(process_start) > settings.TimeLimit)
              process.destroy();
              fprintf('Time limit exceeded\n');
              tlimexceeded = 1;
              out = 0;
              break;  % stop polling once the process has been killed
          end
      end %while