Low risk of breaking in future versions – Undocumented Matlab

Undocumented plot marker types

Yair Altman — Wed, 13 Mar 2019 11:05:14 +0000

I wanted to take a break from my miniseries on the Matlab toolstrip to describe a nice little undocumented aspect of plot line markers. Plot line marker types have remained essentially unchanged in user-facing functionality for the past two+ decades, allowing the well-known marker types (.,+,o,^ etc.). Internally, lots of things changed in the graphics engine, particularly in the transition to HG2 in R2014b and the implementation of markers using OpenGL primitives. I suspect that during the massive amount of development work that was done at that time, important functionality improvements that were implemented in the engine were forgotten and did not percolate all the way up to the user-facing functions. I highlighted a few of these in the past, for example transparency and color gradient for plot lines and markers, or various aspects of contour plots.

Fortunately, Matlab usually exposes the internal objects that we can customize and which enable these extra features, in hidden properties of the top-level graphics handle. For example, the standard Matlab plot-line handle has a hidden property called MarkerHandle that we can access. This returns an internal object that enables marker transparency and color gradients. We can also use this object to set the marker style to a couple of formats that are not available in the top-level object:

>> x=1:10; y=10*x; hLine=plot(x,y,'o-'); box off; drawnow;
>> hLine.MarkerEdgeColor = 'r';
 
>> set(hLine, 'Marker')'  % top-level marker styles
ans =
  1×14 cell array
    {'+'} {'o'} {'*'} {'.'} {'x'} {'square'} {'diamond'} {'v'} {'^'} {'>'} {'<'} {'pentagram'} {'hexagram'} {'none'}
 
>> set(hLine.MarkerHandle, 'Style')'  % low-level marker styles
ans =
  1×16 cell array
    {'plus'} {'circle'} {'asterisk'} {'point'} {'x'} {'square'} {'diamond'} {'downtriangle'} {'triangle'} {'righttriangle'} {'lefttriangle'} {'pentagram'} {'hexagram'} {'vbar'} {'hbar'} {'none'}

We see that the top-level marker styles directly correspond to the low-level styles, except for the low-level ‘vbar’ and ‘hbar’ styles. Perhaps the developers forgot to add these two styles to the top-level object in the enormous upheaval of HG2. Luckily, we can set the hbar/vbar styles directly, using the line’s MarkerHandle property:

hLine.MarkerHandle.Style = 'hbar';
set(hLine.MarkerHandle, 'Style','hbar');  % alternative

hLine.MarkerHandle.Style='hbar'

hLine.MarkerHandle.Style='vbar'

USA visit

I will be travelling in the US in May/June 2019. Please let me know (altmany at gmail) if you would like to schedule a meeting or onsite visit for consulting/training, or perhaps just to explore the possibility of my professional assistance to your Matlab programming needs.

Multi-threaded Mex

Yair Altman — Wed, 18 Jul 2018 14:56:44 +0000

I was recently asked by a consulting client to help speed up a Matlab process. Quite often there are various ways to improve the run-time, and in this particular case it turned out that the best option was to convert the core Matlab processing loop into a multi-threaded Mex function, while keeping the rest (vast majority of program code) in easy-to-maintain Matlab. This resulted in a 160x speedup (25 secs => 0.16 secs). Some of this speedup is attributed to C-code being faster in general than Matlab, another part is due to the multi-threading, and another due to in-place data manipulations that avoid costly memory access and re-allocations.

In today’s post I will share some of the insights relating to this MEX conversion, which could be adapted for many other similar use-cases. Additional Matlab speed-up techniques can be found in other performance-related posts on this website, as well in my book Accelerating MATLAB Performance.

There are quite a few online resources about creating Mex files, so I will not focus on this aspect. I’ll assume that the reader is already familiar with the concept of using Mex functions, which are simply dynamically-linked libraries that have a predefined entry-function syntax and predefined platform-specific extension. Instead, I’ll focus on how to create and debug a multi-threaded Mex function, so that it runs in parallel on all CPU cores.

The benefit of multi-threading is that threads are very light-weight objects, that have minimal performance and memory overheads. This contrasts to multi-tasking, which is what the Parallel Computing Toolbox currently does: launches duplicate copies of the entire Matlab engine process (“headless workers”) and then manages and coordinates the tasks to split up the processing work. Multi-tasking should be avoided wherever we can employ light-weight multi-threading instead. Unfortunately, Matlab does not currently have the ability to explicitly multi-thread Matlab code. But we can still use explicit multi-threading by invoking code in other languages, as I’ve already shown for Java, C# (and .NET in general), and C/C++. Today’s article will expand on the latter post (the one about C/C++ multi-threading), by showing a general framework for making a multi-threaded C-based Mex function.

There are several alternative implementation of threads. On non-Windows machines, POSIX threads (“pthreads”) are a de-facto standard; on Windows, which pthreads can still be used, they generally use native Windows threads under the hood, and we can use these native threads directly.

I have uploaded a file called max_in_place to the Matlab File Exchange. This function serves as an example showing a generic multi-threaded Mex function. A compiled version of this Mex file for Win64 is also included in the File Exchange entry, and you can run it directly on a Win64 machine.

The usage in Matlab is as follows (note how matrix1 is updated in-place):

>> matrix1 = rand(4)
matrix1 =
      0.89092      0.14929      0.81428      0.19664
      0.95929      0.25751      0.24352      0.25108
      0.54722      0.84072      0.92926      0.61604
      0.13862      0.25428      0.34998      0.47329
 
>> matrix2 = rand(4)
matrix2 =
      0.35166      0.91719      0.38045      0.53081
      0.83083      0.28584      0.56782      0.77917
      0.58526       0.7572     0.075854      0.93401
      0.54972      0.75373      0.05395      0.12991
 
>> max_in_place(matrix1, matrix2)
 
>> matrix1
matrix1 =
      0.89092      0.91719      0.81428      0.53081
      0.95929      0.28584      0.56782      0.77917
      0.58526      0.84072      0.92926      0.93401
      0.54972      0.75373      0.34998      0.47329

Source code and discussion

The pseudo-code for the MEX function is as follows:

mexFunction():
   validate the input/output args
   quick bail-out if empty inputs
   get the number of threads N from Matlab's maxNumCompThreads function
   if N == 1
       run main_loop directly
   else
       split input matrix #1 into N index blocks
       assign start/end index for each thread
       create and launch N new threads that run main_loop
       wait for all N threads to complete
       free the allocated memory resources
   end

Here’s the core source-code of this function, which was adapted from original work by Dirk-Jan Kroon:

/*====================================================================
 *
 * max_in_place.c  updates a data matrix in-place with the max value
 *                 of the matrix and a 2nd matrix of the same size
 *
 * The calling syntax is:
 *
 *		max_in_place(matrix1, matrix2)
 *
 * matrix1 will be updated with the maximal values from corresponding
 * indices of the 2 matrices
 *
 * Both inputs must be double 2D real non-sparse matrices of same size
 *
 * Yair Altman 2018-07-18
 * https://undocumentedmatlab.com/blog/multi-threaded-mex
 *
 * Adapted from original work by Dirk-Jan Kroon
 * http://mathworks.com/matlabcentral/profile/authors/1097878-dirk-jan-kroon
 *
 *==================================================================*/
 
#include 
#include "mex.h"
 
/* undef needed for LCC compiler */
#undef EXTERN_C
#ifdef _WIN32
    #include 
    #include 
#else
    #include 
#endif
 
/* Input Arguments */
#define	hMatrix1	prhs[0]
#define	hMatrix2	prhs[1]
 
/* Macros */
#if !defined(MAX)
#define	MIN(A, B)	((A) < (B) ? (A) : (B))
#endif
 
/* Main processing loop function */
void main_loop(const mxArray *prhs[], int startIdx, int endIdx)
{
    /* Assign pointers to the various parameters */
    double *p1 = mxGetPr(hMatrix1);
    double *p2 = mxGetPr(hMatrix2);
 
    /* Loop through all matrix coordinates */
    for (int idx=startIdx; idx<=endIdx; idx++)
    {
        /* Update hMatrix1 with the maximal value of hMatrix1,hMatrix2 */
        if (p1[idx] < p2[idx]) {
            p1[idx] = p2[idx];
        }
    }
}
 
/* Computation function in threads */
#ifdef _WIN32
  unsigned __stdcall thread_func(void *ThreadArgs_) {
#else
  void thread_func(void *ThreadArgs_) {
#endif
    double **ThreadArgs = ThreadArgs_;  /* void* => double** */
    const mxArray** prhs = (const mxArray**) ThreadArgs[0];
 
    int ThreadID = (int) ThreadArgs[1][0];
    int startIdx = (int) ThreadArgs[2][0];
    int endIdx   = (int) ThreadArgs[3][0];
    /*mexPrintf("Starting thread #%d: idx=%d:%d\n", ThreadID, startIdx, endIdx); */
 
    /* Run the main processing function */
    main_loop(prhs, startIdx, endIdx);
 
    /* Explicit end thread, helps to ensure proper recovery of resources allocated for the thread */
    #ifdef _WIN32
        _endthreadex( 0 );
        return 0;
    #else
        pthread_exit(NULL);
    #endif
}
 
/* validateInputs function here... */
 
/* Main entry function */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
 
{
    /* Validate the inputs */
    validateInputs(nlhs, plhs, nrhs, prhs);
 
    /* Quick bail-out in the trivial case of empty inputs */
    if (mxIsEmpty(hMatrix1))  return;
 
    /* Get the number of threads from the Matlab engine (maxNumCompThreads) */
    mxArray *matlabCallOut[1] = {0};
    mxArray *matlabCallIn[1]  = {0};
    mexCallMATLAB(1, matlabCallOut, 0, matlabCallIn, "maxNumCompThreads");
    double *Nthreadsd = mxGetPr(matlabCallOut[0]);
    int Nthreads = (int) Nthreadsd[0];
 
    /* Get the number of elements to process */
    size_t n1 = mxGetNumberOfElements(hMatrix1);
 
    if (Nthreads == 1) {
 
        /* Process the inputs directly (not via a thread) */
        main_loop(prhs, 0, n1-1);
 
    } else {  /* multi-threaded */
 
        /* Allocate memory for handles of worker threads */
        #ifdef _WIN32
            HANDLE    *ThreadList = (HANDLE*)   malloc(Nthreads*sizeof(HANDLE));
        #else
            pthread_t *ThreadList = (pthread_t*)malloc(Nthreads*sizeof(pthread_t));
        #endif
 
        /* Allocate memory for the thread arguments (attributes) */
        double **ThreadID, **ThreadStartIdx, **ThreadEndIdx, ***ThreadArgs;
        double *ThreadID1, *ThreadStartIdx1, *ThreadEndIdx1, **ThreadArgs1;
 
        ThreadID       = (double **) malloc( Nthreads* sizeof(double *) );
        ThreadStartIdx = (double **) malloc( Nthreads* sizeof(double *) );
        ThreadEndIdx   = (double **) malloc( Nthreads* sizeof(double *) );
        ThreadArgs     = (double ***)malloc( Nthreads* sizeof(double **) );
 
        /* Launch the requested number of threads */
        int i;
        int threadBlockSize = ceil( ((double)n1) / Nthreads );
        for (i=0; i<Nthreads; i++)
        {
            /* Create thread ID */
            ThreadID1 = (double *)malloc( 1* sizeof(double) );
            ThreadID1[0] = i;
            ThreadID[i] = ThreadID1;
 
            /* Compute start/end indexes for this thread */
            ThreadStartIdx1 = (double *) malloc( sizeof(double) );
            ThreadStartIdx1[0] = i * threadBlockSize;
            ThreadStartIdx[i] = ThreadStartIdx1;
 
            ThreadEndIdx1 = (double *) malloc( sizeof(double) );
            ThreadEndIdx1[0] = MIN((i+1)*threadBlockSize, n1) - 1;
            ThreadEndIdx[i] = ThreadEndIdx1;
 
            /* Prepare thread input args */
            ThreadArgs1 = (double **) malloc( 4* sizeof(double*) );
            ThreadArgs1[0] = (double *) prhs;
            ThreadArgs1[1] = ThreadID[i];
            ThreadArgs1[2] = ThreadStartIdx[i];
            ThreadArgs1[3] = ThreadEndIdx[i];
 
            ThreadArgs[i] = ThreadArgs1;
 
            /* Launch the thread with its associated args */
            #ifdef _WIN32
                ThreadList[i] = (HANDLE)_beginthreadex(NULL, 0, &thread_func, ThreadArgs[i], 0, NULL);
            #else
                pthread_create ((pthread_t*)&ThreadList[i], NULL, (void *) &thread_func, ThreadArgs[i]);
            #endif
        }
 
        /* Wait for all the treads to finish working */
        #ifdef _WIN32
            for (i=0; i<Nthreads; i++) { WaitForSingleObject(ThreadList[i], INFINITE); }
            for (i=0; i<Nthreads; i++) { CloseHandle( ThreadList[i] ); }
        #else
            for (i=0; i<Nthreads; i++) { pthread_join(ThreadList[i],NULL); }
        #endif
 
        /* Free the memory resources allocated for the threads */
        for (i=0; i<Nthreads; i++)
        {
            free(ThreadArgs[i]);
            free(ThreadID[i]);
            free(ThreadStartIdx[i]);
            free(ThreadEndIdx[i]);
        }
 
        free(ThreadArgs);
        free(ThreadID );
        free(ThreadStartIdx);
        free(ThreadEndIdx);
        free(ThreadList);
    }
 
    return;
}

This file also includes a validateInputs function. I did not include it in the code snippet above for brevity; you can read it directly in the FEX entry (max_in_place.c). This function checks that there are exactly 0 output and 2 input args, that the input args are real non-sparse matrices and that they have the same number of elements.

Note that the threads run a generic thread_func function, which in turn runs the main_loop function with the thread’s specified startIndex, endIndex values. When this function completes, the thread ends itself explicitly, to ensure resource cleanup.

Also note how the thread code is using pthreads on non-Windows (!defined(_WIN32)) machines, and native Windows threads otherwise. This means that the same MEX source-code could be used on all Matlab platforms.

The important thing to note about this framework is that we no longer need to worry about the thread plumbing. If we wish to adapt this code for any other processing, we just need to modify the main_loop function with the new processing logic. In addition, you may wish to modify the validateInputs function based on your new setup of input/output args.

A few caveats:

On Windows machines with R2017b or older, we simply compile using mex max_in_place.c; on non-Windows we might need to add the –lpthread flag to link the pthreads library, depending on your specific compiler.
On R2018a or newer on all platforms, due to MEX’s new interleaved-complex memory format, we would need to compile with the -R2017b flag if we wish to use mexGetPr, as in the sample code above (in R2018a’s new data model, the corresponding function is mxGetDoubles). Note that updating data in-place becomes more difficult with the new MEX API, so if you want to preserve the performance boost that in-place data manipulation provides, it may be better to stick with the legacy data memory model.
The sample code above splits the data between the threads based on the first input matrix’s size. Instead, you may consider sending to the MEX function the loop indexes as extra input args, and then splitting those up between the threads.
In this specific implementation of max_in_place, I have updated the data locations directly. This is generally discouraged and risky, because it conflicts with Matlab’s standard Copy-on-Write mechanism. For example, if we assign the input to any other Matlab variable(s) before calling max_in_place, then that other variable(s) will also get their data updated. If we do not want this side-effect, we should mxUnshareArray the input matrix1, and return the resulting matrix as an output of the MEX function (plhs[0]).

Speed-up tips

The core logic in the specific case that I was asked to optimize was something similar to this:

main_process:
    initialize output matrix
    loop z over all slices in a 3D data matrix
        temp_data = data(:,:,z);
        temp_data = process(temp_data);
        output = max(output, temp_data);
    end z loop

The initial speed-up attempt was naturally focused on the process and max functions. Converting them to a MEX function improved the speed by a factor of ~8, but the resulting run-time (4-5 secs) was still too slow for real-time use. The reason that we did not see a larger speed-up was, I believe, due to several reasons:

temp_data was small enough such that the overheads associated with creating and then disposing separate threads were significant compared to the processing time of each thread.
temp_data was small enough such that each thread processed a relatively small portion of the memory, in contrast to single-threaded processing that accesses memory in larger blocks, more efficiently.
In each iteration of the z loop, the overheads associated with calling the MEX function, handling input variables and validation, creating/disposing threads, and allocating/deallocating memory for temp_data, were repeatedly paid.

So, while the profiling result showed that 98% of the time was spent in the MEX function (which would seem to indicate that not much additional speedup can be achieved), in fact the MEX function was under-performing because of the inefficiencies involved in repeatedly creating threads to process small data chunks. It turned out that running in single-thread mode was actually somewhat faster than multi-threaded mode.

I then moved the entire z loop (entire main_process) into the MEX function, where the threads were split to process separate adjacent blocks of z slices (i.e. different blocks of the z loop). This way, the MEX function was called, the inputs validated, and threads created/disposed only once for the entire process, making this overhead negligible compared to each thread’s processing time. Moreover, each thread now processed the entire temp_data belonging to its z slice, so memory access was more efficient, reducing the memory I/O wait time and improving the overall processing time. Additional benefits were due to the fact that some variables could be reused within each thread across loop iterations, minimizing memory allocations and deallocations. The overall effect was to reduce the total run-time down to ~0.16 secs, a 160x speedup compared to the original (25 secs). As my client said: “You sped up [the application] from practically useless to clinically useful.”

The lesson: try to minimize MEX invocation and thread creation/disposal overheads, and let the threads process as large adjacent memory blocks as possible.

Debugging MEX files

When debugging multi-threaded MEX functions, I find that it’s often easier to run the function in single-threaded mode to debug the core logic, and once this is ready we can switch back multi-threading. This can easily be done by setting the number of threads outside the MEX function using Matlab’s builtin maxNumCompThreads function:

Nthreads = maxNumCompThreads(1);  % temporarily use only 1 thread for debugging
max_in_place(matrix1, matrix2);
maxNumCompThreads(Nthreads);      % restore previous value
%maxNumCompThreads('automatic');  % alternative

Once debugging is done and the MEX function works properly, we should remove the maxNumCompThreads calls, so that the MEX function will use the regular number of Matlab computational threads, which should be the same as the number of cores: feature(‘numCores’).

I typically like to use Eclipse as my IDE for non-Matlab code (Java, C/C++ etc.). Unfortunately, there’s a problem attaching Eclipse to Matlab processes (which is necessary for interactive MEX debugging) if you’re using any recent (post-2015) version of MinGW and Eclipse. This problem is due to a known Eclipse bug, as user Lithe pointed out. The workaround is to install an old version of MinGW, *in addition* to your existing MinGW version. Reportedly, only versions 4.9.1 or older of MinGW include gdb 7.8 (which is still supported by Eclipse), whereas newer versions of MinGW include a newer gdb that is not supported. Download and install such an old MinGW version in a separate folder from your more-modern compiler. Don’t update your MEX to use the older MinGW – just tell Eclipse to use the version of gdb in the old MinGW bin/ folder when you set up a debug configuration for debugging your MEX files.

Once you have a compatible gdb, and ask Eclipse to attach to a process, the processes list will finally appear (it won’t with an incompatible gdb). Use feature('getPID') to get your Matlab process ID, which can then used to attach to the relevant process in the Eclipse Attach-to-process window. For example, if your Matlab’s PID is 4321, then the Matlab process will be called “Program – 4321” in Eclipse’s processes list.

I wish that MathWorks would update their official Answer and their MinGW support package on File Exchange to include this information, because without it debugging on Eclipse becomes impossible. Eclipse is so powerful, easy-to-use and ubiquitous that it’s a shame for most users not to be able to work with it just because the workarounds above are not readily explained.

N.B. If you don’t like Eclipse, you can also use Visual Studio Code (VS Code), as Andy Campbell recently explained in the MathWorks Developers’ blog.

Consulting

Do you have any Matlab code that could use a bit (or a lot) of speeding-up? If so, please contact me for a private consulting offer. I can’t promise to speed up your code by a similar factor of 160x, but you never know…

String/char compatibility

Yair Altman — Thu, 28 Jun 2018 12:57:59 +0000

In numerous functions that I wrote over the years, some input arguments were expected to be strings in the old sense, i.e. char arrays for example, 'on' or 'off'. Matlab release R2016b introduced the concept of string objects, which can be created using the string function or [starting in R2017a] double quotes ("on").

The problem is that I have numerous functions that supported the old char-based strings but not the new string objects. If someone tries to enter a string object ("on") as input to a function that expects a char-array ('on'), in many cases Matlab will error. This by itself is very unfortunate – I would have liked everything to be fully backward-compatible. But unfortunately this is not the case: MathWorks did invest effort in making the new strings backward-compatible to some degree (for example, graphic object property names/values and many internal functions that now accept either form as input). However, backward compatibility of strings is not 100% perfect.

In such cases, the only solution is to make the function accept both forms (char-arrays and string objects), for example, by type-casting all such inputs as char-arrays using the builtin char function. If we do this at the top of our function, then the rest of the function can remain unchanged. For example:

function test(stage)
   if isa(stage,'string')      stage = char(stage);   end   % from this point onward, we don't need to worry about string inputs - any such strings will become plain-ol' char-arrays
 
   switch stage
      case 'stage 1', ...
      case 'stage 2', ...
      ...
   end
end

That was simple enough. But what if our function expects complex inputs (cell-arrays, structs etc.) that may contain strings in only some of their cells/fields?

Luckily, Matlab contains an internal utility function that can help us: controllib.internal.util.hString2Char. This function, whose Matlab source-code is available (%matlabroot%/toolbox/shared/controllib/general/+controllib/+internal/+util/hString2Char.m) recursively scans the input value and converts any string object into the corresponding char-array, leaving all other data-types unchanged. For example:

>> controllib.internal.util.hString2Char({123, 'char-array', "a string"})
ans =
  1×3 cell array
    {[123]}    {'char-array'}    {'a string'}
 
>> controllib.internal.util.hString2Char(struct('a',"another string", 'b',pi))
ans = 
  struct with fields:
    a: 'another string'
    b: 3.14159265358979

In order to keep our code working not just on recent releases (that support strings and controllib.internal.util.hString2Char) but also on older Matlab releases (where they did not exist), we simply wrap the call to hString2Char within a try–catch block. The adaptation of our function might then look as follows:

function test(varargin)
   try varargin = controllib.internal.util.hString2Char(varargin); catch, end   % from this point onward, we don't need to worry about string inputs - any such strings will become plain-ol' char-arrays
   ...
end

Note that controllib.internal.util.hString2Char is a semi-documented function: it contains a readable internal help section (accessible via help controllib.internal.util.hString2Char), but not a doc-page. Nor is this function mentioned anywhere in Matlab’s official documentation. I think that this is a pity, because it’s such a useful little helper function.

Blocked wait with timeout for asynchronous events

Yair Altman — Sun, 13 May 2018 20:22:08 +0000

Readers of this website may have noticed that I have recently added an IQML section to the website’s top menu bar. IQML is a software connector that connects Matlab to DTN’s IQFeed, a financial data-feed of live and historic market data. IQFeed, like most other data-feed providers, sends its data in asynchronous messages, which need to be processed one at a time by the receiving client program (Matlab in this case). I wanted IQML to provide users with two complementary modes of operation:

Streaming (asynchronous, non-blocking) – incoming server data is processed by internal callback functions in the background, and is made available for the user to query at any later time.
Blocking (synchronously waiting for data) – in this case, the main Matlab processing flows waits until the data arrives, or until the specified timeout period has passed – whichever comes first.

Implementing streaming mode is relatively simple in general – all we need to do is ensure that the underlying connector object passes the incoming server messages to the relevant Matlab function for processing, and ensure that the user has some external way to access this processed data in Matlab memory (in practice making the connector object pass incoming data messages as Matlab callback events may be non-trivial, but that’s a separate matter – read here for details).

In today’s article I’ll explain how we can implement a blocking mode in Matlab. It may sound difficult but it turns out to be relatively simple.

I had several requirements/criteria for my blocked-wait implementation:

Compatibility – It had to work on all Matlab platforms, and all Matlab releases in the past decade (which rules out using Microsoft Dot-NET objects)
Ease-of-use – It had to work out-of-the-box, with no additional installation/configuration (which ruled out using Perl/Python objects), and had to use a simple interface function
Timeout – It had to implement a timed-wait, and had to be able to tell whether the program proceeded due to a timeout, or because the expected event has arrived
Performance – It had to have minimal performance overhead

The basic idea

The basic idea is to use Matlab’s builtin waitfor, as I explained in another post back in 2012. If our underlying connector object has some settable boolean property (e.g., Done) then we can set this property in our event callback, and then waitfor(object,'Done'). The timeout mechanism is implemented using a dedicated timer object, as follows:

% Wait for data updates to complete (isDone = false if timeout, true if event has arrived)
function isDone = waitForDone(object, timeout)
    % Initialize: timeout flag = false
    object.setDone(false);
 
    % Create and start the separate timeout timer thread
    hTimer = timer('TimerFcn',@(h,e)object.setDone(true), 'StartDelay',timeout);
    start(hTimer);
 
    % Wait for the object property to change or for timeout, whichever comes first
    waitfor(object,'Done');
 
    % waitfor is over - either because of timeout or because the data changed
    % To determine which, check whether the timer callback was activated
    isDone = isvalid(hTimer) && hTimer.TasksExecuted == 0;
 
    % Delete the timer object
    try stop(hTimer);   catch, end
    try delete(hTimer); catch, end
 
    % Return the flag indicating whether or not timeout was reached
end  % waitForDone

where the event callback is responsible for invoking object.setDone(true) when the server data arrives. The usage would then be similar to this:

requestDataFromServer();
 
if isBlockingMode
   % Blocking mode - wait for data or timeout (whichever comes first)
   isDone = waitForDone(object, 10.0);  % wait up to 10 secs for data to arrive
   if ~isDone  % indicates a timeout
      fprintf(2, 'No server data has arrived within the specified timeout period!\n')
   end
else
   % Non-blocking (streaming) mode - continue with regular processing
end

Using a stand-alone generic signaling object

But what can we do if we don’t have such a Done property in our underlying object, or if we do not have programmatic access to it?

We could create a new non-visible figure and then waitfor one of its properties (e.g. Resize). The property would be initialized to 'off', and within both the event and timer callbacks we would set it to 'on', and then waitfor(hFigure,'Resize','on'). However, figures, even if non-visible, are quite heavy objects in terms of both memory, UI resources, and performance.

It would be preferable to use a much lighter-weight object, as long as it abides by the other criteria above. Luckily, there are numerous such objects in Java, which is bundled in every Matlab since 2000, on every Matlab platform. As long as we choose a small Java object that has existed forever, we should be fine. For example, we could use a simple javax.swing.JButton and its boolean property Enabled:

hSemaphore = handle(javax.swing.JButton);  % waitfor() expects a handle() object, not a "naked" Java reference
 
% Wait for data updates to complete (isDone = false if timeout, true if event has arrived)
function isDone = waitForDone(hSemaphore, timeout)
    % Initialize: timeout flag = false
    hSemaphore.setEnabled(false);
 
    % Create and start the separate timeout timer thread
    hTimer = timer('TimerFcn',@(h,e)hSemaphore.setEnabled(true), 'StartDelay',timeout);
    start(hTimer);
 
    % Wait for the object property to change or for timeout, whichever comes first
    waitfor(hSemaphore,'Enabled');
 
    % waitfor is over - either because of timeout or because the data changed
    % To determine which, check whether the timer callback was activated
    isDone = isvalid(hTimer) && hTimer.TasksExecuted == 0;
 
    % Delete the timer object
    try stop(hTimer);   catch, end
    try delete(hTimer); catch, end
 
    % Return the flag indicating whether or not timeout was reached
end  % waitForDone

In this implementation, we would need to pass the hSemaphore object handle to the event callback, so that it would be able to invoke hSemaphore.setEnabled(true) when the server data has arrived.

Under the hood, note that Enabled is not a true “property” of javax.swing.JButton, but rather exposes two simple public getter/setter methods (setEnabled() and isEnabled()), which Matlab interprets as a “property”. For all intents and purposes, in our Matlab code we can treat Enabled as a property of javax.swing.JButton, including its use by Matlab’s builtin waitfor function.

There is a light overhead to this: on my laptop the waitfor function returns ~20 mSecs after the invocation of hSemaphore.setEnabled(true) in the timer or event callback – in many cases this overhead is negligible compared to the networking latencies for the blocked data request. When event reactivity is of utmost importance, users can always use streaming (non-blocking) mode, and process the incoming data events immediately in a callback.

Of course, it would have been best if MathWorks would add a timeout option and return value to Matlab’s builtin waitfor function, similar to my waitForDone function – this would significantly simplify the code above. But until this happens, you can use waitForDone pretty-much as-is. I have used similar combinations of blocking and streaming modes with multiple other connectors that I implemented over the years (Interactive Brokers, CQG, Bloomberg and Reuters for example), and the bottom line is that it actually works well in practice.

Let me know if you’d like me to assist with your own Matlab project or data connector, either developing it from scratch or improving your existing code. I will be visiting Boston and New York in early June and would be happy to meet you in person to discuss your specific needs.

Speeding-up builtin Matlab functions – part 2

Yair Altman — Sun, 06 May 2018 16:26:19 +0000

Last week I showed how we can speed-up built-in Matlab functions, by creating local copies of the relevant m-files and then optimizing them for improved speed using a variety of techniques. Today I will show another example of such speed-up, this time of the Financial Toolbox’s maxdrawdown function, which is widely used to estimate the relative risk of a trading strategy or asset. One might think that such a basic indicator would be optimized for speed, but experience shows otherwise. In fact, this function turned out to be the main run-time performance hotspot for one of my clients. The vast majority of his code was optimized for speed, and he naturally assumed that the built-in Matlab functions were optimized as well, but this was not the case. Fortunately, I was able to create an equivalent version that was 30-40 times faster (!), and my client remains a loyal Matlab fan.

In today’s post I will show how I achieved this speed-up, using different methods than the ones I showed last week. A combination of these techniques can be used in a wide range of other Matlab functions. Additional speed-up techniques can be found in other performance-related posts on this website, as well in my book Accelerating MATLAB Performance.

Profiling

As I explained last week, the first step in speeding up any function is to profile its current run-time behavior using Matlab’s builtin Profiler tool, which can either be started from the Matlab Editor toolstrip (“Run and Time”) or via the profile function.

The profile report for the client’s function showed that it had two separate performance hotspots:

Code that checks the drawdown format (optional 2nd input argument) against a set of allowed formats
Main code section that iteratively loops over the data-series values to compute the maximal drawdown

In order top optimize the code, I copied %matlabroot%/toolbox/finance/finance/maxdrawdown.m to a local folder on the Matlab path, renaming the file (and the function) maxdrawdn, in order to avoid conflict with the built-in version.

Optimizing input args pre-processing

The main problem with the pre-processing of the optional format input argument is the string comparisons, which are being done even when the default format is used (which is by far the most common case). String comparison are often more expensive than numerical computations. Each comparison by itself is very short, but when maxdrawdown is run in a loop (as it often is), the run-time adds up.

Here’s a snippet of the original code:

if nargin < 2 || isempty(Format)
    Format = 'return';
end
if ~ischar(Format) || size(Format,1) ~= 1
    error(message('finance:maxdrawdown:InvalidFormatType'));
end
choice = find(strncmpi(Format,{'return','arithmetic','geometric'},length(Format)));
if isempty(choice)
    error(message('finance:maxdrawdown:InvalidFormatValue'));
end

An equivalent code, which avoids any string processing in the common default case, is faster, simpler and more readable:

if nargin < 2 || isempty(Format)
    choice = 1;
elseif ~ischar(Format) || size(Format,1) ~= 1
    error(message('finance:maxdrawdown:InvalidFormatType'));
else
    choice = find(strncmpi(Format,{'return','arithmetic','geometric'},length(Format)));
    if isempty(choice)
        error(message('finance:maxdrawdown:InvalidFormatValue'));
    end
end

The general rule is that whenever you have a common case, you should check it first, avoiding unnecessary processing downstream. Moreover, for improved run-time performance (although not necessarily maintainability), it is generally preferable to work with numbers rather than strings (choice rather than Format, in our case).

Vectorizing the main loop

The main processing loop uses a very simple yet inefficient iterative loop. I assume that the code was originally developed this way in order to assist debugging and to ensure correctness, and that once it was ready nobody took the time to also optimize it for speed. It looks something like this:

MaxDD = zeros(1,N);
MaxDDIndex = ones(2,N);
...
if choice == 1   % 'return' format
    MaxData = Data(1,:);
    MaxIndex = ones(1,N);
    for i = 1:T
        MaxData = max(MaxData, Data(i,:));
        q = MaxData == Data(i,:);
        MaxIndex(1,q) = i;
        DD = (MaxData - Data(i,:)) ./ MaxData;
        if any(DD > MaxDD)
            p = DD > MaxDD;
            MaxDD(p) = DD(p);
            MaxDDIndex(1,p) = MaxIndex(p);
            MaxDDIndex(2,p) = i;
        end
    end
else             % 'arithmetic' or 'geometric' formats
    ...

This loop can relatively easily be vectorized, making the code much faster, and arguably also simpler, more readable, and more maintainable:

if choice == 3
    Data = log(Data);
end
MaxDDIndex = ones(2,N);
MaxData = cummax(Data);
MaxIndexes = find(MaxData==Data);
DD = MaxData - Data;
if choice == 1	% 'return' format
    DD = DD ./ MaxData;
end
MaxDD = cummax(DD);
MaxIndex2 = find(MaxDD==DD,1,'last');
MaxIndex1 = MaxIndexes(find(MaxIndexes<=MaxIndex2,1,'last'));
MaxDDIndex(1,:) = MaxIndex1;
MaxDDIndex(2,:) = MaxIndex2;
MaxDD = MaxDD(end,:);

Let’s make a short run-time and accuracy check – we can see that we achieved a 31-fold speedup (YMMV), and received the exact same results:

>> data = rand(1,1e7);
 
>> tic, [MaxDD1, MaxDDIndex1] = maxdrawdown(data); toc  % builtin Matlab function
Elapsed time is 7.847140 seconds.
 
>> tic, [MaxDD2, MaxDDIndex2] = maxdrawdn(data); toc  % our optimized version
Elapsed time is 0.253130 seconds.
 
>> speedup = round(7.847140 / 0.253130)
speedup =
    31
 
>> isequal(MaxDD1,MaxDD2) && isequal(MaxDDIndex1,MaxDDIndex2)  % check accuracy
ans =
  logical
   1

Disclaimer: The code above seems to work (quite well in fact) for a 1D data vector. You’d need to modify it a bit to handle 2D data – the returned maximal drawdown are still computed correctly but not the returned indices, due to their computation using the find function. This modification is left as an exercise for the reader…

Very similar code could be used for the corresponding maxdrawup function. Although this function is used much less often than maxdrawdown, it is in fact widely used and very similar to maxdrawdown, so it is surprising that it is missing in the Financial Toolbox. Here is the corresponding code snippet:

% Code snippet for maxdrawup (similar to maxdrawdn)
MaxDUIndex = ones(2,N);
MinData = cummin(Data);
MinIndexes = find(MinData==Data);
DU = Data - MinData;
if choice == 1	% 'return' format
    DU = DU ./ MinData;
end
MaxDU = cummax(DU);
MaxIndex = find(MaxDD==DD,1,'last');
MinIndex = MinIndexes(find(MinIndexes<=MaxIndex,1,'last'));
MaxDUIndex(1,:) = MinIndex;
MaxDUIndex(2,:) = MaxIndex;
MaxDU = MaxDU(end,:);

Similar vectorization could be applied to the emaxdrawdown function. This too is left as an exercise for the reader…

Conclusions

Matlab is composed of thousands of internal functions. Each and every one of these functions was meticulously developed and tested by engineers, who are after all only human. Whereas supreme emphasis is always placed with Matlab functions on their accuracy, run-time performance sometimes takes a back-seat. Make no mistake about this: code accuracy is almost always more important than speed (an exception are cases where some accuracy may be sacrificed for improved run-time performance). So I’m not complaining about the current state of affairs.

But when we run into a specific run-time problem in our Matlab program, we should not despair if we see that built-in functions cause slowdown. We can try to avoid calling those functions (for example, by reducing the number of invocations, or limiting the target accuracy, etc.), or optimize these functions in our own local copy, as I’ve shown last week and today. There are multiple techniques that we could employ to improve the run time. Just use the profiler and keep an open mind about alternative speed-up mechanisms, and you’d be half-way there.

Let me know if you’d like me to assist with your Matlab project, either developing it from scratch or improving your existing code, or just training you in how to improve your Matlab code’s run-time/robustness/usability/appearance. I will be visiting Boston and New York in early June and would be happy to meet you in person to discuss your specific needs.

Speeding-up builtin Matlab functions – part 1

Yair Altman — Sun, 29 Apr 2018 09:46:29 +0000

A client recently asked me to assist with his numeric analysis function – it took 45 minutes to run, which was unacceptable (5000 runs of ~0.55 secs each). The code had to run in 10 minutes or less to be useful. It turns out that 99% of the run-time was taken up by Matlab’s built-in fitdist function (part of the Statistics Toolbox), which my client was certain is already optimized for maximal performance. He therefore assumed that to get the necessary speedup he must either switch to another programming language (C/Java/Python), and/or upgrade his computer hardware at considerable expense, since parallelization was not feasible in his specific case.

Luckily, I was there to assist and was able to quickly speed-up the code down to 7 minutes, well below the required run-time. In today’s post I will show how I did this, which is relevant for a wide variety of other similar performance issues with Matlab. Many additional speed-up techniques can be found in other performance-related posts on this website, as well in my book Accelerating MATLAB Performance.

Profiling

The first step in speeding up any function is to profile its current run-time behavior using Matlab’s builtin Profiler tool, which can either be started from the Matlab Editor toolstrip (“Run and Time”) or via the profile function.

The profile report for the client’s function showed that 99% of the time was spent in the Statistics Toolbox’s fitdist function. Drilling into this function in the profiling report, we see onion-like functions that processed input parameters, ensured data validation etc. The core processing is done inside a class that is unique to each required distribution (e.g., prob.StableDistribution, prob.BetaDistribution etc.) that is invoked within fitdist using an feval call, based on the distribution name that was specified by the user.

In our specific case, the external onion layers of sanity checks were unnecessary and could be avoided. In general, I advise not to discard such checks, because you never know whether future uses might have a problem with outlier data or input parameters. Moreover, in the specific case of fitdist they take only a very minor portion of the run-time (this may be different in other cases, such as the ismember function that I described years ago, where the sanity checks have a significant run-time impact compared to the core processing in the internal ismembc function).

However, since we wanted to significantly improve the total run-time and this was spent within the distribution class (prob.StableDistribution in the case of my client), we continue to drill-down into this class to determine what can be done.

It turns out that prob.StableDistribution basically does 3 things in its main fit() method:

pre-process the input data (prob.ToolboxFittableParametricDistribution.processFitArgs() and .removeCensoring() methods) – this turned out to be unnecessary in my client’s data, but has little run-time impact.
call stablefit() in order to get fitting parameters – this took about half the run-time
call stablelike() in order to get likelihood data – this took about half the run-time as well
call prob.StableDistribution.makeFitted() to create a probability-distribution object returned to the caller – this also took little run-time that was not worth bothering about.

The speed-up improvement process

With user-created code we could simply modify our code in-place. However, a more careful process is necessary when modifying built-in Matlab functions (either in the core Matlab distribution or in one of the toolboxes).

The basic idea here is to create a side-function that would replicate the core processing of fitdist. This is preferable to modifying Matlab’s installation files because we could then reuse the new function in multiple computers, without having to mess around in the Matlab installation (which may not even be possible if we do not have admin privileges). Also, if we ever upgrade our Matlab we won’t need to remember to update the installed files (and obviously retest).

I called the new function fitdist2 and inside it I initially placed only the very core functionality of prob.StableDistribution.fit():

% fitdist2 - fast replacement for fitdist(data,'stable')
% equivalent to pd = prob.StableDistribution.fit(data);
function pd = fitdist2(data)
    % Bypass pre-processing (not very important)
    [cens,freq,opt] = deal([]);
    %[data,cens,freq,opt] = prob.ToolboxFittableParametricDistribution.processFitArgs(data);
    %data = prob.ToolboxFittableParametricDistribution.removeCensoring(data,cens,freq,'stable');
 
    % Main processing in stablefit(), stablelike()
    params = stablefit(data,0.05,opt);
    [nll,cov] = stablelike(params,data);
 
    % Combine results into the returned probability distribution object
    pd = prob.StableDistribution.makeFitted(params,nll,cov,data,cens,freq);
end

If we try to run this as-is, we’d see errors because stablefit() and stablelike() are both sub-functions within %matlabroot%/toolbox/stats/stats/+prob/StableDistribution.m. So we copy these sub-functions (and their dependent subfunctions infoMtxCal(), intMle(), tabpdf(), neglog_pdf(), stable_nloglf(), varTrans) to the bottom of our fitdist2.m file, about 400 lines in total.

We also remove places that call checkargs(…) since that seems to be unnecessary – if you want to keep it then add the checkargs() function as well.

Now we re-run our code, after each speedup iteration verifying that the returned pd object returned by our fitdist2 is equivalent to the original object returned by fitdist.

Speeding-up stablefit()

A new profiling run shows that the vast majority of the time in stablefit() is spent in two lines:

s = load('private/StablePdfTable.mat');
[parmhat,~,err,output] = fminsearch(@(params)stable_nloglf(x,params),phi0,options);

The first of these lines is reloading a static data file. The very same static data file is later reloaded in stablelike(). Both of these data-loads is done in every single invocation of fitdist, so if we have 5000 data fits, we load the same static data file 10,000 times! This is certainly not indicative of good programming practice. It would be much faster to reload the static data once into memory, and then use this cached data for the next 9,999 invocation. Since the original authors of StableDistribution.m seem to like single-character global variables (another bad programming practice, for multiple reasons), we’ll follow their example (added lines are highlighted):

persistent s  % this should have a more meaningful name (but at least is not global...)!if isempty(s)    fit_path = fileparts(which('fitdist'));    s = load([fit_path '/private/StablePdfTable.mat']);
    a = s.a;
    b = s.b;
    xgd = s.xgd;
    p = s.p;
end

In order to speed-up the second line (that calls fminsearch), we can reduce the tolerances used by this function, by updating the options structure passed to it, so that we use tolerances of 1e-3 rather than the default 1e-6 (in our specific case this resulted in acceptable errors of ~0.1%). Specifically, we modify the code from this:

function [parmhat,parmci] = stablefit(x,alpha,options)
...
if nargin < 3 || isempty(options)
    options = statset('stablefit');
else
    options = statset(statset('stablefit'),options);
end
 
% Maximize the log-likelihood with respect to the transformed parameters
[parmhat,~,err,output] = fminsearch(@(params)stable_nloglf(x,params),phi0,options);
...
end

to this (note that the line that actually calls fminsearch remains unchanged):

function [parmhat,parmci] = stablefit(x,alpha,unused_options)...
persistent optionsif isempty(options)    options = statset('stablefit');
    options.TolX   = 1e-3;    options.TolFun = 1e-3;    options.TolBnd = 1e-3;end
 
% Maximize the log-likelihood with respect to the transformed parameters
[parmhat,~,err,output] = fminsearch(@(params)stable_nloglf(x,params),phi0,options);
...
end

The fminsearch internally calls tabpdf() repeatedly. Drilling down in the profiling report we see that it recomputes a griddedInterpolant object that is essentially the same for all iterations (and therefore a prime candidate for caching), and also that it uses the costly cubic interpolation rather than a more straight-forward linear interpolation:

function y = tabpdf(x,alpha,beta)
...
persistent G  % this should have a more meaningful name (but at least is not global...)!if isempty(G)    G = griddedInterpolant({b, a, xgd}, p, 'linear','none');  % 'linear' instead of 'cubic'end%G = griddedInterpolant({b, a, xgd}, p, 'cubic','none');  % original
y = G({beta,alpha,x});
...

These cases illustrate two important speedup technique categories: caching data in order to reduce the number of times that a certain code hotspot is being run, and modifying the parameters/inputs in order to reduce the individual run-time of the hotspot code. Variations of these techniques form the essence of effective speedup and can often be achieved by just reflecting on the problem and asking yourself two questions:

can I reduce the number of times that this code is being run? and
can I reduce the run-time of this specific code section?

Additional important speed-up categories include parallelization, vectorization and algorithmic modification. These are sometimes more difficult programmatically and riskier in terms of functional equivalence, but may be required in case the two main techniques above are not feasible. Of course, we can always combine these techniques, we don’t need to choose just one or the other.

Speeding-up stablelike()

We now turn our attentions to stablelike(). As for the loaded static file, we could simply use the cached s to load the data in order to avoid repeated reloading of the data from disk. But it turns out that this data is actually not used at all inside the function (!) so we just comment-out the old code:

%s = load('private/StablePdfTable.mat');
%a = s.a;
%b = s.b;
%xgd = s.xgd;
%p = s.p;

Think about this – the builtin Matlab code loads a data file from disk and then does absolutely nothing with it – what a waste!

Another important change is to reduce the run-time of the integral function, which is called thousands of times within a double loop. We do this by reducing the tolerances specified in the integral call from 1e-6:

F(i,j) = integral(@(x)infoMtxCal(x,params,step,i,j),-Inf,Inf,'AbsTol',1e-6,'RelTol',1e-4); % original
F(i,j) = integral(@(x)infoMtxCal(x,params,step,i,j),-Inf,Inf,'AbsTol',1e-3,'RelTol',1e-3); % new

You can see that once again these two cases follow the two techniques that I mentioned above: we reduced the number of times that we load the data file (to 0 in our case), and we improved the run-time of the individual integral calculation by reducing its tolerances.

Conclusions

The final result of applying the techniques above was a 6-fold speedup, reducing the total run-time from 45 minutes down to 7 minutes. I could probably have improved the run-time even further, but since we reached our target run-time I stopped there. The point after all was to make the code usable, not to reach a world speed record.

In my next article I will present another example of dramatically improving the run-time speed of a built-in Matlab function, this time a very commonly-used function in the Financial Toolbox that I was able to speed-up by a factor of 40.

Matlab releases improve continuously, so hopefully my techniques above (or alternatives) would find their way into the builtin Matlab functions, making them faster than today, out-of-the-box.

Until this happens, we should not lose hope when faced with a slow Matlab function, even if it is a built-in/internal one, as I hope to have clearly illustrated today, and will also show in my next article. Improving the performance is often easy. In fact, it took me much longer to write this article than it was to improve my client’s code…

Adding custom properties to GUI objects

Yair Altman — Thu, 15 Feb 2018 12:39:35 +0000

Matlab objects have numerous built-in properties (some of them publicly-accessible/documented and others not, but that’s a different story). For various purposes, it is sometimes useful to attach custom user-defined properties to such objects. While there was never a fully-documented way to do this, most users simply attached such properties as fields in the UserData property or the object’s [hidden] ApplicationData property (accessible via the documented setappdata/getappdata functions).

An undocumented way to attach actual new user-defined properties to objects such as GUI handles or Java references has historically (in HG1, up to R2014a) been to use the undocumented schema.prop function, as I explained here. As I wrote in that post, in HG2 (R2014b onward), we can use the fully-documented addprop function to add new custom properties (and methods) to such objects. What is still NOT documented, as far as I could tell, is that all of Matlab’s builtin handle graphics objects indirectly inherit the dynamicprops class, which allows this. The bottom line is that we can dynamically add custom properties in run-time to any HG object, without affecting any other object. In other words, the new properties will only be added to the handles that we specifically request, and not to any others.

All this is important, because for some unexplained reason that escapes my understanding, MathWorks chose to seal its classes, thus preventing users to extend them with sub-classes that contain the new properties. So much frustration could have been solved if MathWorks would simply remove the Sealed class meta-property from its classes. Then again, I’d have less to blog about in that case…

Anyway, why am I rehashing old news that I have already reported a few years ago?

Well, first, because my experience has been that this little tidbit is [still] fairly unknown by Matlab developers. Secondly, I happened to run into a perfect usage example a short while ago that called for this solution: a StackExchange user asked whether it is possible to tell a GUI figure’s age, in other words the elapsed time since the figure was created. The simple answer would be to use setappdata with the creation date whenever we create a figure. However, a “cleaner” approach seems to be to create new read-only properties for the figure’s CreationTime and Age:

First, create a small Matlab function as follows, that attaches the CreationTime property to a figure:

function setCreationTime(hFig,varargin)
   hProp = addprop(hFig,'CreationTime');
   hFig.CreationTime = now;
   hProp.SetAccess = 'private';  % make property read-only after setting its initial value
 
   hProp = addprop(hFig,'Age');
   hProp.GetMethod = @(h,e) etime(datevec(hFig.CreationTime), clock);  % compute on-the-fly
   hProp.SetAccess = 'private';  % make property read-only
end

Now assign this function as the default CreateFcn callback function for all new figures from now on:

set(0,'DefaultFigureCreateFcn',@setCreationTime)

That’s it – you’re done! Whenever a new figure will be created from now on, it will have two custom read-only properties: CreationTime and Age.

For example:

>> newFig = figure;
>> newFig.CreationTime
ans =
      737096.613706748
 
>> ageInDays = now - newFig.CreationTime
ageInDays = 
       0.0162507836846635
>> ageDuration = duration(ageInDays*24,0,0)
ageDuration = 
  duration
   00:23:24
>> ageString = datestr(ageInDays, 'HH:MM:SS.FFF')
ageString = 
    '00:23:24.068'
 
>> ageInSecs = newFig.Age
ageInSecs =
       1404.06771035492

Note that an alternative way to set the computed property Age would have been to set its value to be an anonymous function, but this would have necessitated invoking it with parenthesis (as in: ageInSecs = newFig.Age()). By setting the property’s GetMethod meta-property we avoid this need.

Keen readers will have noticed that the mechanism that I outlined above for the Age property/method can also be used to add custom user methods. For example, we can create a new custom property named refresh that would be read-only and have a GetMethod which is the function handle of the function that refreshes the object in some way.

Do you have any special uses for custom user-defined properties/methods in your program? or perhaps you have a use-case that might show MathWorks why sub-classing the built-in classes might improve your work? if so, then please place a comment about it below. If enough users show MathWorks why this is important, then maybe it will be fixed in some future release.

Customizing axes tick labels

Yair Altman — Wed, 24 Jan 2018 13:38:26 +0000

In last week’s post, I discussed various ways to customize bar/histogram plots, including customization of the tick labels. While some of the customizations that I discussed indeed rely on undocumented properties/features, many Matlab users are not aware that tick labels can be individually customized, and that this is a fully documented/supported functionality. This relies on the fact that the default axes TickLabelInterpreter property value is 'tex', which supports a wide range of font customizations, individually for each label. This includes any combination of symbols, superscript, subscript, bold, italic, slanted, face-name, font-size and color – even intermixed within a single label. Since tex is the default interpreter, we don’t need any special preparation – simply set the relevant X/Y/ZTickLabel string to include the relevant tex markup.

To illustrate this, have a look at the following excellent answer by user Ubi on Stack Overflow:

Axes with Tex-customized tick labels

plot(1:10, rand(1,10))
ax = gca;
 
% Simply color an XTickLabel
ax.XTickLabel{3} = ['\color{red}' ax.XTickLabel{3}];
 
% Use TeX symbols
ax.XTickLabel{4} = '\color{blue} \uparrow';
 
% Use multiple colors in one XTickLabel
ax.XTickLabel{5} = '\color[rgb]{0,1,0}green\color{orange}?';
 
% Color YTickLabels with colormap
nColors = numel(ax.YTickLabel);
cm = jet(nColors);
for i = 1:nColors
    ax.YTickLabel{i} = sprintf('\\color[rgb]{%f,%f,%f}%s', cm(i,:), ax.YTickLabel{i});
end

In addition to 'tex', we can also set the axes object’s TickLabelInterpreter to 'latex' for a Latex interpreter, or 'none' if we want to use no string interpretation at all.

As I showed in last week’s post, we can control the gap between the tick labels and the axle line, using the Ruler object’s undocumented TickLabelGapOffset, TickLabelGapMultiplier properties.

Also, as I explained in other posts (here and here), we can also control the display of the secondary axle label (typically exponent or units) using the Ruler’s similarly-undocumented SecondaryLabel property. Note that the related Ruler’s Exponent property is documented/supported, but simply sets a basic exponent label (e.g., '\times10^{6}' when Exponent==6) – to set a custom label string (e.g., '\it\color{gray}Millions'), or to modify its other properties (position, alignment etc.), we should use SecondaryLabel.

Using SQLite in Matlab

Yair Altman — Wed, 27 Dec 2017 21:53:54 +0000

MathWorks invests a huge amount of effort in recent years on supporting large distributed databases. The business case for this focus is entirely understandable, but many Matlab users have much simpler needs, which are often served by the light-weight open-source SQLite database (which claims to be the most widely-used database worldwide). Although SQLite is very widely used, and despite the fact that built-in support for SQLite is included in Matlab (for its internal use), MathWorks has chosen not to expose any functionality or wrapper function that would enable end-users to access it. In any case, I recently came across a need to do just that, when a consulting client asked me to create an interactive data-browser for their SQLite database that would integrate with their Matlab program:

In today’s post I will discuss several possible mechanisms to integrate SQLite in Matlab code, and you can take your pick among them. Except for the Database Toolbox, all the alternatives are free (open-source) libraries (even the commercial Database Toolbox relies on one of the open-source libraries, by the way).

sqlite4java

sqlite4java is a Java package by ALM Works that is bundled with Matlab for the past several years (in the %matlabroot%/java/jarext/sqlite4java/ folder). This is a lightweight open-source package that provides a minimalist and fast (although not very convenient) interface to SQLite. You can either use the package that comes with your Matlab installation, or download and use the latest version from the project repository, where you can also find documentation.

Mark Mikofski exposed this hidden nugget back in 2015, and you are welcome to view his post for additional details. Here’s a sample usage:

% Open the DB data file
db = com.almworks.sqlite4java.SQLiteConnection(java.io.File('C:\Yair\Data\IGdb 2017-11-13.sqlite'));
db.open;
 
% Prepare an SQL query statement
stmt = db.prepare(['select * from data_table where ' conditionStr]);
 
% Step through the result set rows
row = 1;
while stmt.step
   numericValues(row) = stmt.columnInt(0);    % column #0
   stringValues{row}  = stmt.columnString(1); % column #1
end
 
% Cleanup
stmt.dispose
db.dispose

Note that since sqlite4java uses a proprietary interface (similar, but not identical, to JDBC), it can take a bit of time to get used to it. I am generally a big fan of preferring built-in components over externally-installed ones, but in this particular case I prefer other alternatives.

JDBC

JDBC (Java Database Connectivity) is the industry standard for connectivity to databases. Practically all databases nowadays have at least one JDBC connector, and many DBs have multiple JDBC drivers created by different groups. As long as they all adhere to the JDBC interface standard, these drivers are all equivalent and you can choose between them based on availability, cost, support, license, performance and other similar factors. SQLite is no exception to this rule, and has several JDBC driver implementations, including xerial’s sqlite-jdbc (also discussed by Mark Mikofski) and sqlitejdbc. If you ask me,
sqlite-jdbc is better as it is being maintained with new versions released periodically.

The example above would look something like this with sqlite-jdbc:

% Add the downloaded JAR library file to the dynamic Java classpath
javaaddpath('C:\path\to\sqlite\sqlite-jdbc-3.21.0.jar')
 
% Open the DB file
jdbc = org.sqlite.JDBC;
props = java.util.Properties;
conn = jdbc.createConnection('jdbc:sqlite:C:\Yair\Data\IGdb 2017-11-13.sqlite',props);  % org.sqlite.SQLiteConnection object
 
% Prepare and run an SQL query statement
sqlStr = ['select * from data_table where ' conditionStr];
stmt = conn.createStatement;     % org.sqlite.jdbc4.JDBC4Statement object
rs = stmt.executeQuery(sqlStr);  % org.sqlite.jdbc4.JDBC4ResultSet object
 
% Step through the result set rows
rows = 1;
while rs.next
   numericValues(row) = rs.getLong('ID');
   stringValues{row}  = rs.getString('Name');
end
 
% Cleanup
rs.close
stmt.close
conn.close

Database toolbox

In addition to all the above, MathWorks sells the Database Toolbox which has an integral SQLite connector, in two flavors – native and JDBC (the JDBC connector is simply sqlite-jdbc that I mentioned above, see a short discussion here).

I assume that the availability of this feature in the DB toolbox is the reason why MathWorks has never created a documented wrapper function for the bundled sqlite4java. I could certainly understand this from a business perspective. Still, with so many free alternatives available as discussed in this post, I see not reason to purchase the toolbox merely for its SQLite connector. Then again, if you need to connect to several different database types, not just SQLite, then getting the toolbox might make sense.

mksqlite

My personal favorite is actually none of these Java-based connectors (surprise, surprise), but rather the open-source mksqlite connector by Martin Kortmann and Andreas Martin. This is a native (Mex-file) connector that acts as a direct Matlab function. The syntax is pretty straight-forward and supports SQL queries. IMHO, its usage is a much simpler than with any of the other alternatives:

% Open the DB file
mksqlite('open', 'C:\Yair\Data\IGdb 2017-11-13.sqlite');
 
% Query the database
results = mksqlite(['select * from data_table where ' conditionStr]);
numericValues = [results.ID];
stringValues  = {results.Name};
 
% Cleanup
mksqlite('close');

Can it be any simpler than this!?

However, the main benefit of mksqlite over the other connectors is not its simplicity but the connector’s speed. This speed is due to the fact that the query is vectorized and we do not need to loop over all the separate data rows and fields. With the other connectors, it is actually not the loop that takes so long in Matlab, but rather the overheads and inefficiencies of numerous library calls to fetch one single value at a time from the result-set – this is avoided in mksqlite where there is only a single call. This results in lightning speed: A couple of years ago I consulted to a client who used a JDBC connector to an SQLite database; by switching from a JDBC connector to mksqlite, I reduced the execution time from 7 secs to 70 msecs – a 100x speedup! In that specific case, this made the difference between an unusable program and a highly interactive/responsive one.

Other alternatives

In addition to all the above, we can also use a .NET-based connector or a Python one – I leave these as an exercise for the reader…

Have I forgotten some important alternative? Or perhaps you have have some related tip you’d like to share? If so, then please leave a comment below.

Happy New Year everybody!

PlotEdit context-menu customization

Yair Altman — Wed, 13 Dec 2017 12:57:14 +0000

Last week, a Matlab user asked whether it is possible to customize the context (right-click) menu that is presented in plot-edit mode. This menu is displayed by clicking the plot-edit (arrow) icon on the standard Matlab figure toolbar, then right-clicking any graphic/GUI element in the figure. Unfortunately, it seems that this context menu is only created the first time that a user right-clicks in plot-edit mode – it is not accessible before then, and so it seems impossible to customize the menu before it is presented to the user the first time.

Customized plot-edit context-menu

A few workarounds were suggested to the original poster and you are most welcome to review them. There is also some discussion about the technical reasons that none of the “standard” ways of finding and modifying menu items fail in this case.

In today’s post I wish to repost my solution, in the hope that it might help other users in similar cases.

My solution is basically this:

First, enter plot-edit mode programmatically using the plotedit function
Next, move the mouse to the screen location of the relevant figure component (e.g. axes). This can be done in several different ways (the root object’s PointerLocation property, the moveptr function, or java.awt.Robot.mouseMove() method).
Next, automate a mouse right-click using the built in java.awt.Robot class (as discussed in this blog back in 2010)
Next, locate the relevant context-menu item and modify its label, callback or any of its other properties
Next, dismiss the context-menu by simulating a follow-on right-click using the same Robot object
Finally, exit plot-edit mode and return the mouse pointer to its original location

% Create an initial figure / axes for demostration purpose
fig = figure('MenuBar','none','Toolbar','figure');
plot(1:5); drawnow; 
 
% Enter plot-edit mode temporarily
plotedit(fig,'on'); drawnow
 
% Preserve the current mouse pointer location
oldPos = get(0,'PointerLocation');
 
% Move the mouse pointer to within the axes boundary
% ref: https://undocumentedmatlab.com/blog/undocumented-mouse-pointer-functions
figPos = getpixelposition(fig);   % figure position
axPos  = getpixelposition(gca,1); % axes position
figure(fig);  % ensure that the figure is in focus
newPos = figPos(1:2) + axPos(1:2) + axPos(3:4)/4;  % new pointer position
set(0,'PointerLocation',newPos);  % alternatives: moveptr(), java.awt.Robot.mouseMove()
 
% Simulate a right-click using Java robot
% ref: https://undocumentedmatlab.com/blog/gui-automation-robot
robot = java.awt.Robot;
robot.mousePress  (java.awt.event.InputEvent.BUTTON3_MASK); pause(0.1)
robot.mouseRelease(java.awt.event.InputEvent.BUTTON3_MASK); pause(0.1)
 
% Modify the  menu item
hMenuItem = findall(fig,'Label','Clear Axes');
if ~isempty(hMenuItem)
   label = 'Undocumented Matlab';
   callback = 'web(''https://undocumentedmatlab.com'',''-browser'');';
   set(hMenuItem, 'Label',label, 'Callback',callback);
end
 
% Hide the context menu by simulating a left-click slightly offset
set(0,'PointerLocation',newPos+[-2,2]);  % 2 pixels up-and-left
pause(0.1)
robot.mousePress  (java.awt.event.InputEvent.BUTTON1_MASK); pause(0.1)
robot.mouseRelease(java.awt.event.InputEvent.BUTTON1_MASK); pause(0.1)
 
% Exit plot-edit mode
plotedit(fig,'off'); drawnow
 
% Restore the mouse pointer to its previous location
set(0,'PointerLocation',oldPos);

In this code, I sprinkled a few pauses at several locations, to ensure that everything has time to fully render. Different pause values, or perhaps no pause at all, may be needed on your specific system.

Modifying the default context-menu shown in plot-edit mode may perhaps be an uncommon use-case. But the technique that I demonstrated above – of using a combination of Matlab and Java Robot commands to automate a certain animation – can well be used in many other use-cases where we cannot easily access the underlying code. For example, when the internal code is encoded/encrypted, or when a certain functionality (such as the plot-edit context-menu) is created on-the-fly.

If you have encountered a similar use-case where such automated animations can be used effectively, please add a comment below.

Tips for accelerating Matlab performance

Yair Altman — Thu, 05 Oct 2017 18:25:06 +0000

I’m proud to report that MathWorks has recently posted my article “Tips for Accelerating MATLAB Performance” in their latest newsletter digest (September 2017). This article is an updated and expanded version of my post about consulting work that I did for the Crustal Dynamics Research Group at Harvard University, where I helped speed-up a complex Matlab-based GUI by a factor of 50-500 (depending on the specific feature). You can read the full detailed technical article here.
Crustal dynamics visualization GUI
Featuring an article on the official newsletter by a non-MathWorker is rare. Doing this with someone like myself who has a reputation for undocumented aspects, and a consultancy business that potentially competes with theirs, is certainly not obvious. I take this to be a sign that despite the possible drawbacks of publishing my article, MathWorks felt that it provided enough value to the Matlab user community to merit the risk. I applaud MathWorks for this, and thank them for the opportunity of being featured in their official newsletter and conferences. I do not take it for granted in the least.
The newsletter article provides multiple ideas of improving the run-time performance for file I/O and graphics. Many additional techniques for improving Matlab’s performance can be found under the Performance tag in this blog, as well as in my book “Accelerating MATLAB Performance” (CRC Press, 2014, ISBN 978-1482211290).
I am offering a couple of webinars about various ways to improve Matlab’s run-time performance:
Matlab performance tuning part 1 (3:39 hours, syllabus) – $195 (buy)
Matlab performance tuning part 2 (3:43 hours, syllabus) – $195 (buy)
==> or buy both Matlab performance tuning webinars for only $345 (buy)
Both the webinar videos and their corresponding slide-decks are available for download. The webinars content is based on onsite training courses that I presented at multiple client locations (details).
Email me if you would like additional information on the webinars or my consulting, or to inquire regarding an onsite training course.
Related posts:
Performance: scatter vs. line – In many circumstances, the line function can generate visually-identical plots as the scatter function, much faster...
Plot LimInclude properties – The plot objects' XLimInclude, YLimInclude, ZLimInclude, ALimInclude and CLimInclude properties are an important feature, that has both functional and performance implications....
Plot performance – Undocumented inner plot mechanisms can significantly improve plotting performance ...
Performance: accessing handle properties – Handle object property access (get/set) performance can be significantly improved using dot-notation. ...

Faster csvwrite/dlmwrite

Yair Altman — Tue, 03 Oct 2017 15:00:05 +0000

Matlab’s builtin functions for exporting (saving) data to output files are quite sub-optimal (as in slowwwwww…). I wrote a few posts about this in the past (how to improve fwrite performance, and save performance). Today I extend the series by showing how we can improve the performance of delimited text output, for example comma-separated (CSV) or tab-separated (TSV/TXT) files.
The basic problem is that Matlab’s dlmwrite function, which can either be used directly, or via the csvwrite function which calls it internally, is extremely inefficient: It processes each input data value separately, in a non-vectorized loop. In the general (completely non-vectorized) case, each data value is separately converted into a string, and is separately sent to disk (using fprintf). In the specific case of real data values with simple delimiters and formatting, row values are vectorized, but in any case the rows are processed in a non-vectorized loop: A newline character is separately exported at the end of each row, using a separate fprintf call, and this has the effect of flushing the I/O to disk each and every row separately, which is of course disastrous for performance. The output file is indeed originally opened in buffered mode (as I explained in my fprintf performance post), but this only helps for outputs done within the row – the newline output at the end of each row forces an I/O flush regardless of how the file was opened. In general, when you read the short source-code of dlmwrite.m you’ll get the distinct feeling that it was written for correctness and maintainability, and some focus on performance (e.g., the vectorization edge-case). But much more could be done for performance it would seem.
This is where Alex Nazarovsky comes to the rescue.

Alex was so bothered by the slow performance of csvwrite and dlmwrite that he created a C++ (MEX) version that runs about enormously faster (30 times faster on my system). He explains the idea in his blog, and posted it as an open-source utility (mex-writematrix) on GitHub.
Usage of Alex’s utility is very easy:
mex_WriteMatrix(filename, dataMatrix, textFormat, delimiter, writeMode);
where the input arguments are:
filename – full path name for file to export
dataMatrix – matrix of numeric values to be exported
textFormat – format of output text (sprintf format), e.g. '%10.6f'
delimiter – delimiter, for example ',' or ';' or char(9) (=tab)
writeMode – 'w+' for rewriting file; 'a+' for appending (note the lowercase: uppercase will crash Matlab!)
Here is a sample run on my system, writing a simple CSV file containing 1K-by-1K data values (1M elements, ~12MB text files):
>> data = rand(1000, 1000); % 1M data values, 8MB in memory, ~12MB on disk >> tic, dlmwrite('temp1.csv', data, 'delimiter',',', 'precision','%10.10f'); toc Elapsed time is 28.724937 seconds. >> tic, mex_WriteMatrix('temp2.csv', data, '%10.10f', ',', 'w+'); toc % 30 times faster! Elapsed time is 0.957256 seconds.
Alex’s mex_WriteMatrix function is faster even in the edge case of simple formatting where dlmwrite uses vectorized mode (in that case, the file is exported in ~1.2 secs by dlmwrite and ~0.9 secs by mex_WriteMatrix, on my system).
Trapping Ctrl-C interrupts
Alex’s mex_WriteMatrix code includes another undocumented trick that could help anyone else who uses a long-running MEX function, namely the ability to stop the MEX execution using Ctrl-C. Using Ctrl-C is normally ignored in MEX code, but Wotao Yin showed how we can use the undocumented utIsInterruptPending() MEX function to monitor for user interrupts using Ctrl-C. For easy reference, here is a copy of Wotao Yin’s usage example (read his webpage for additional details):
/* A demo of Ctrl-C detection in mex-file by Wotao Yin. Jan 29, 2010. */ #include "mex.h" #if defined (_WIN32) #include #elif defined (__linux__) #include #endif #ifdef __cplusplus extern "C" bool utIsInterruptPending(); #else extern bool utIsInterruptPending(); #endif void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) { int count = 0; while(1) { #if defined(_WIN32) Sleep(1000); /* Sleep one second */ #elif defined(__linux__) usleep(1000*1000); /* Sleep one second */ #endif mexPrintf("Count = %d\n", count++); /* print count and increase it by 1 */ mexEvalString("drawnow;"); /* flush screen output */ if (utIsInterruptPending()) { /* check for a Ctrl-C event */ mexPrintf("Ctrl-C Detected. END\n\n"); return; } if (count == 10) { mexPrintf("Count Reached 10. END\n\n"); return; } } }
Matlab performance webinars
I am offering a couple of webinars about various ways to improve Matlab’s run-time performance:
Matlab performance tuning part 1 (3:39 hours, syllabus) – $195 (buy)
Matlab performance tuning part 2 (3:43 hours, syllabus) – $195 (buy)
==> or buy both Matlab performance tuning webinars for only $345 (buy)
Both the webinar videos and their corresponding slide-decks are available for download. The webinars content is based on onsite training courses that I presented at multiple client locations (details).
Additional Matlab performance tips can be found under the Performance tag in this blog, as well as in my book “Accelerating MATLAB Performance“.
Email me if you would like additional information on the webinars, or an onsite training course, or about my consulting.
Related posts:
Parsing mlint (Code Analyzer) output – The Matlab Code Analyzer (mlint) has a lot of undocumented functionality just waiting to be used. ...
Explicit multi-threading in Matlab part 3 – Matlab performance can be improved by employing POSIX threads in C/C++ code. ...
Plot LimInclude properties – The plot objects' XLimInclude, YLimInclude, ZLimInclude, ALimInclude and CLimInclude properties are an important feature, that has both functional and performance implications....
Preallocation performance – Preallocation is a standard Matlab speedup technique. Still, it has several undocumented aspects. ...

Sending HTML emails from Matlab

Yair Altman — Wed, 02 Aug 2017 21:19:42 +0000

A few months ago I wrote about various tricks for sending email/text messages from Matlab. Unfortunately, Matlab only sends text emails by default and provides no documented way to send HTML-formatted emails. Text-only emails are naturally very bland and all mail clients in the past 2 decades support HTML-formatted emails. Today I will show how we can send such HTML emails from Matlab.
A quick recap: Matlab’s sendmail function uses Java (specifically, the standard javax.mail package) to prepare and send emails. The Java classes are extremely powerful and so there is no wonder that Mathworks chose to use them rather than reinventing the wheel. However, Matlab’s sendmail function only uses part of the functionality exposed by these classes (admittedly, the most important parts that deal with the basic mail-sending mechanism), and does not expose external hooks or input args that would enable the user to take full advantage of the more advanced features, HTML formatting included.
Only two small changes are needed in sendmail.m to support HTML formatting:
HTML formatting required calling the message-object’s setContent() method, rather than setText().
We need to specify 'text/html' as part of the message’s encoding
To implement these features, change the following (lines #119-130 in the original sendmail.m file of R2017a, changed lines highlighted):
% Construct the body of the message and attachments. body = formatText(theMessage); if numel(attachments) == 0 if ~isempty(charset) msg.setText(body, charset); else msg.setText(body); end else % Add body text. messageBodyPart = MimeBodyPart; if ~isempty(charset) messageBodyPart.setText(body, charset); ...
to this (changed lines highlighted):
% Construct the body of the message and attachments. body = formatText(theMessage); isHtml = ~isempty(body) && body(1) == '<'; % msg starting with '<' indicates HTMLif isHtml if isempty(charset) charset = 'text/html; charset=utf-8'; else charset = ['text/html; charset=' charset]; endendif numel(attachments) == 0 && ~isHtml if isHtml msg.setContent(body, charset); elseif ~isempty(charset) msg.setText(body, charset); else msg.setText(body); end else % Add body text. messageBodyPart = MimeBodyPart; if isHtml messageBodyPart.setContent(body, charset); elseif ~isempty(charset) messageBodyPart.setText(body, charset); ...
In addition, I also found it useful to remove the hard-coded 75-character line-wrapping in text messages. This can be done by changing the following (line #291 in the original sendmail.m file of R2017a):
maxLineLength = 75;
to this:
maxLineLength = inf; % or some other large numeric value
Deployment
It’s useful to note two alternatives for making these fixes:
Making the changes directly in %matlabroot%/toolbox/matlab/iofun/sendmail.m. You will need administrator rights to edit this file. You will also need to redo the fix whenever you install Matlab, either installation on a different machine, or installing a new Matlab release. In general, I discourage changing Matlab’s internal files because it is simply not very maintainable.
Copying %matlabroot%/toolbox/matlab/iofun/sendmail.m into a dedicated wrapper function (e.g., sendEmail.m) that has a similar function signature and exists on the Matlab path. This has the benefit of working on multiple Matlab releases, and being copied along with the rest of our m-files when we install our Matlab program on a different computer. The downside is that our wrapper function will be stuck with the version of sendmail.m that we copied into it, and we’d lose any possible improvements that Mathworks may implement in future Matlab releases.
The basic idea for the second alternative, the sendEmail.m wrapper, is something like this (the top highlighted lines are the additions made to the original sendmail.m, with everything placed in sendEmail.m on the Matlab path):
function sendEmail(to,subject,theMessage,attachments)%SENDEMAIL Send e-mail wrapper (with HTML formatting) sendmail(to,subject,theMessage,attachments); % The rest of this file is copied from %matlabroot%/toolbox/matlab/iofun/sendmail.m (with the modifications mentioned above): function sendmail(to,subject,theMessage,attachments) %SENDMAIL Send e-mail. % SENDMAIL(TO,SUBJECT,MESSAGE,ATTACHMENTS) sends an e-mail. TO is either a % character vector specifying a single address, or a cell array of character vector ...
We would then call the wrapper function as follows:
sendEmail('abc@gmail.com', 'email subject', 'regular text message'); % will send a regular text message sendEmail('abc@gmail.com', 'email subject', 'HTML-formatted message'); % HTML-formatted message
In this case, the code automatically infers HTML formatting based on whether the first character in the message body is a ‘<‘ character. Instead, we could just as easily have passed an additional input argument (isHtml) to our sendEmail wrapper function.
Hopefully, in some future Matlab release Mathworks will be kind enough to enable sending 21^st-century HTML-formatted emails without needing such hacks. Until then, note that sendmail.m relies on standard non-GUI Java networking classes, which are expected to be supported far into the future, well after Java-based GUI may cease to be supported in Matlab. For this reason I believe that while it seems a bit tricky, the changes that I outlined in today’s post actually have a low risk of breaking in a future Matlab release.
Do you have some other advanced email feature that you use in your Matlab program by some crafty customization to sendmail? If so, please share it in a comment below.
Related posts:
Uitable sorting – Matlab's uitables can be sortable using simple undocumented features...
Uitable customization report – Matlab's uitable can be customized in many different ways. A detailed report explains how. ...
Types of undocumented Matlab aspects – This article lists the different types of undocumented/unsupported/hidden aspects in Matlab...
Multi-line uitable column headers – Matlab uitables can present long column headers in multiple lines, for improved readability. ...

Parsing XML strings

Yair Altman — Wed, 01 Feb 2017 09:52:45 +0000

I have recently consulted in a project where data was provided in XML strings and needed to be parsed in Matlab memory in an efficient manner (in other words, as quickly as possible). Now granted, XML is rather inefficient in storing data (JSON would be much better for this, for example). But I had to work with the given situation, and that required processing the XML.
I basically had two main alternatives:
I could either create a dedicated string-parsing function that searches for a particular pattern within the XML string, or
I could use a standard XML-parsing library to create the XML model and then parse its nodes
The first alternative is quite error-prone, since it relies on the exact format of the data in the XML. Since the same data can be represented in multiple equivalent XML ways, making the string-parsing function robust as well as efficient would be challenging. I was ~~lazy~~ expedient, so I chose the second alternative.
Unfortunately, Matlab’s xmlread function only accepts input filenames (of *.xml files), it cannot directly parse XML strings. Yummy!
The obvious and simple solution is to simply write the XML string into a temporary *.xml file, read it with xmlread, and then delete the temp file:
% Store the XML data in a temp *.xml file filename = [tempname '.xml']; fid = fopen(filename,'Wt'); fwrite(fid,xmlString); fclose(fid); % Read the file into an XML model object xmlTreeObject = xmlread(filename); % Delete the temp file delete(filename); % Parse the XML model object ...
This works well and we could move on with our short lives. But cases such as this, where a built-in function seems to have a silly limitation, really fire up the investigative reporter in me. I decided to drill into xmlread to discover why it couldn’t parse XML strings directly in memory, without requiring costly file I/O. It turns out that xmlread accepts not just file names as input, but also Java object references (specifically, java.io.File, java.io.InputStream or org.xml.sax.InputSource). In fact, there are quite a few other inputs that we could use, to specify a validation parser etc. – I wrote about this briefly back in 2009 (along with other similar semi-documented input altermatives in xmlwrite and xslt).

In our case, we could simply send xmlread as input a java.io.StringBufferInputStream(xmlString) object (which is an instance of java.io.InputStream) or org.xml.sax.InputSource(java.io.StringReader(xmlString)):
% Read the xml string directly into an XML model object inputObject = java.io.StringBufferInputStream(xmlString); % alternative #1 inputObject = org.xml.sax.InputSource(java.io.StringReader(xmlString)); % alternative #2 xmlTreeObject = xmlread(inputObject); % Parse the XML model object ...
If we don’t want to depend on undocumented functionality (which might break in some future release, although it has remained unchanged for at least the past decade), and in order to improve performance even further by passing xmlread‘s internal validity checks and processing, we can use xmlread‘s core functionality to parse our XML string directly. We can add a fallback to the standard (fully-documented) functionality, just in case something goes wrong (which is good practice whenever using any undocumented functionality):
try % The following avoids the need for file I/O: inputObject = java.io.StringBufferInputStream(xmlString); % or: org.xml.sax.InputSource(java.io.StringReader(xmlString)) try % Parse the input data directly using xmlread's core functionality parserFactory = javaMethod('newInstance','javax.xml.parsers.DocumentBuilderFactory'); p = javaMethod('newDocumentBuilder',parserFactory); xmlTreeObject = p.parse(inputObject); catch % Use xmlread's semi-documented inputObject input feature xmlTreeObject = xmlread(inputObject); end catch % Fallback to standard xmlread usage, using a temporary XML file: % Store the XML data in a temp *.xml file filename = [tempname '.xml']; fid = fopen(filename,'Wt'); fwrite(fid,xmlString); fclose(fid); % Read the file into an XML model object xmlTreeObject = xmlread(filename); % Delete the temp file delete(filename); end % Parse the XML model object ...
Related posts:
Undocumented XML functionality – Matlab's built-in XML-processing functions have several undocumented features that can be used by Java-savvy users...
Matlab-Java memory leaks, performance – Internal fields of Java objects may leak memory - this article explains how to avoid this without sacrificing performance. ...
Types of undocumented Matlab aspects – This article lists the different types of undocumented/unsupported/hidden aspects in Matlab...
Pause for the better – Java's thread sleep() function is much more accurate than Matlab's pause() function. ...

Afterthoughts on implicit expansion

Yair Altman — Wed, 30 Nov 2016 20:28:44 +0000

Matlab release R2016b introduced implicit arithmetic expansion, which is a great and long-awaited natural expansion of Matlab’s arithmetic syntax (if you are still unaware of this or what it means, now would be a good time to read about it). This is a well-documented new feature. The reason for today’s post is that this new feature contains an undocumented aspect that should very well have been documented and even highlighted.
The undocumented aspect that I’m referring to is the fact that code that until R2016a produced an error, in R2016b produces a valid result:
% R2016a >> [1:5] + [1:3]' Error using + Matrix dimensions must agree. % R2016b >> [1:5] + [1:3]' ans = 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8
This incompatibility is indeed documented, but not where it matters most (read on).
I first discovered this feature by chance when trying to track down a very strange phenomenon with client code that produced different numeric results on R2015b and earlier, compared to R2016a Pre-release. After some debugging the problem was traced to a code snippet in the client’s code that looked something like this (simplified):
% Ensure compatible input data try dataA + dataB; % this will (?) error if dataA, dataB are incompatible catch dataB = dataB'; end
The code snippet relied on the fact that incompatible data (row vs. col) would error when combined, as it did up to R2015b. But in R2016a Pre-release it just gave a valid numeric matrix, which caused numerically incorrect results downstream in the code. The program never crashed, so everything appeared to be in order, it just gave different numeric results. I looked at the release notes and none of the mentioned release incompatibilities appeared relevant. It took me quite some time, using side-by-side step-by-step debugging on two separate instances of Matlab (R2015b and R2016aPR) to trace the problem to this new feature.
This implicit expansion feature was removed from the official R2016a release for performance reasons. This was apparently fixed in time for R2016b’s release.
I’m totally in favor of this great new feature, don’t get me wrong. I’ve been an ardent user of bsxfun for many years and (unlike many) have even grown fond of it, but I still find the new feature to be better. I use it wherever there is no significant performance penalty, a need to support older Matlab releases, or a possibility of incorrect results due to dimensional mismatch.
So what’s my point?
What I am concerned about is that I have not seen the new feature highlighted as a potential backward compatibility issue in the documentation or the release notes. Issues of far lesser importance are clearly marked for their backward incompatibility in the release notes, but not this important major change. A simple marking of the new feature with the warning icon () and in the “Functionality being removed or changed” section would have saved my client and me a lot of time and frustration.
MathWorks are definitely aware of the potential problems that the new feature might cause in rare use cases such as this. As Steve Eddins recently noted, there were plenty of internal discussions about this very thing. MathWorks were careful to ensure that the feature’s benefits far outweigh its risks (and I concur). But this also highlights the fact that MathWorks were fully aware that in some rare cases it might indeed break existing code. For those cases, I believe that they should have clearly marked the incompatibility implications in the release notes and elsewhere.
I have several clients who scour Matlab’s release notes before each release, trying to determine the operational risk of a Matlab upgrade. Having a program that returns different results in R2016b compared to R2016a, without being aware of this risk, is simply unacceptable to them, and leaves users with a disinclination to upgrade Matlab, to MathWorks’ detriment.
MathWorks in general are taking a very serious methodical approach to compatibility issues, and are clearly investing a lot of energy in this (a recent example). It’s too bad that sometimes this chain is broken. I find it a pity, and think that this can still be corrected in the online doc pages. If and when this is fixed, I’ll be happy to post an addendum here.
In my humble opinion from the backbenches, increasing the transparency on compatibility issues and open bugs will increase user confidence and result in greater adoption and upgrades of Matlab. Just my 2 cents…
Addendum December 27, 2016:
Today MathWorks added the following compatibility warning to the release notes (R2016b, Mathematics section, first item) – thanks for listening MathWorks
Related posts:
tic / toc – undocumented option – Matlab's built-in tic/toc functions have an undocumented option enabling multiple nested clockings...
Plot LimInclude properties – The plot objects' XLimInclude, YLimInclude, ZLimInclude, ALimInclude and CLimInclude properties are an important feature, that has both functional and performance implications....
Matrix processing performance – Matrix operations performance is affected by internal subscriptions in a counter-intuitive way....
Performance: accessing handle properties – Handle object property access (get/set) performance can be significantly improved using dot-notation. ...

Speeding up Matlab-JDBC SQL queries

Yair Altman — Wed, 16 Nov 2016 11:43:17 +0000

Many of my consulting projects involve interfacing a Matlab program to an SQL database. In such cases, using MathWorks’ Database Toolbox is a viable solution. Users who don’t have the toolbox can also easily connect directly to the database using either the standard ODBC bridge (which is horrible for performance and stability), or a direct JDBC connection (which is also what the Database Toolbox uses under the hood). I explained this Matlab-JDBC interface in detail in chapter 2 of my Matlab-Java programming book. A bare-bones implementation of an SQL SELECT query follows (data update queries are a bit different and will not be discussed here):
% Load the appropriate JDBC driver class into Matlab's memory % (but not directly, to bypass JIT pre-processing - we must do it in run-time!) driver = eval('com.mysql.jdbc.Driver'); % or com.microsoft.sqlserver.jdbc.SQLServerDriver or whatever % Connect to DB dbPort = '3306'; % mySQL=3306; SQLServer=1433; Oracle=... connectionStr = ['jdbc:mysql://' dbURL ':' dbPort '/' schemaName]; % or ['jdbc:sqlserver://' dbURL ':' dbPort ';database=' schemaName ';'] or whatever dbConnObj = java.sql.DriverManager.getConnection(connectionStr, username, password); % Send an SQL query statement to the DB and get the ResultSet stmt = dbConnObj.createStatement(java.sql.ResultSet.TYPE_SCROLL_INSENSITIVE, java.sql.ResultSet.CONCUR_READ_ONLY); try stmt.setFetchSize(1000); catch, end % the default fetch size is ridiculously small in many DBs rs = stmt.executeQuery(sqlQueryStr); % Get the column names and data-types from the ResultSet's meta-data MetaData = rs.getMetaData; numCols = MetaData.getColumnCount; data = cell(0,numCols); % initialize for colIdx = numCols : -1 : 1 ColumnNames{colIdx} = char(MetaData.getColumnLabel(colIdx)); ColumnType{colIdx} = char(MetaData.getColumnClassName(colIdx)); % http://docs.oracle.com/javase/7/docs/api/java/sql/Types.html end ColumnType = regexprep(ColumnType,'.*\.',''); % Get the data from the ResultSet into a Matlab cell array rowIdx = 1; while rs.next % loop over all ResultSet rows (records) for colIdx = 1 : numCols % loop over all columns in the row switch ColumnType{colIdx} case {'Float','Double'} data{rowIdx,colIdx} = rs.getDouble(colIdx); case {'Long','Integer','Short','BigDecimal'} data{rowIdx,colIdx} = double(rs.getDouble(colIdx)); case 'Boolean' data{rowIdx,colIdx} = logical(rs.getBoolean(colIdx)); otherwise %case {'String','Date','Time','Timestamp'} data{rowIdx,colIdx} = char(rs.getString(colIdx)); end end rowIdx = rowIdx + 1; end % Close the connection and clear resources try rs.close(); catch, end try stmt.close(); catch, end try dbConnObj.closeAllStatements(); catch, end try dbConnObj.close(); catch, end % comment this to keep the dbConnObj open and reuse it for subsequent queries
Naturally, in a real-world implementation you also need to handle database timeouts and various other errors, handle data-manipulation queries (not just SELECTs), etc.
Anyway, this works well in general, but when you try to fetch a ResultSet that has many thousands of records you start to feel the pain – The SQL statement may execute much faster on the DB server (the time it takes for the stmt.executeQuery call), yet the subsequent double-loop processing to fetch the data from the Java ResultSet object into a Matlab cell array takes much longer.
In one of my recent projects, performance was of paramount importance, and the DB query speed from the code above was simply not good enough. You might think that this was due to the fact that the data cell array is not pre-allocated, but this turns out to be incorrect: the speed remains nearly unaffected when you pre-allocate data properly. It turns out that the main problem is due to Matlab’s non-negligible overhead in calling methods of Java objects. Since the JDBC interface only enables retrieving a single data item at a time (in other words, bulk retrieval is not possible), we have a double loop over all the data’s rows and columns, in each case calling the appropriate Java method to retrieve the data based on the column’s type. The Java methods themselves are extremely efficient, but when you add Matlab’s invocation overheads the total processing time is much much slower.
So what can be done? As Andrew Janke explained in much detail, we basically need to push our double loop down into the Java level, so that Matlab receives arrays of primitive values, which can then be processed in a vectorized manner in Matlab.
So let’s create a simple Java class to do this:
// Copyright (c) Yair Altman UndocumentedMatlab.com import java.sql.ResultSet; import java.sql.ResultSetMetaData; import java.sql.SQLException; import java.sql.Types; public class JDBC_Fetch { public static int DEFAULT_MAX_ROWS = 100000; // default cache size = 100K rows (if DB does not support non-forward-only ResultSets) public static Object[] getData(ResultSet rs) throws SQLException { try { if (rs.last()) { // data is available int numRows = rs.getRow(); // row # of the last row rs.beforeFirst(); // get back to the top of the ResultSet return getData(rs, numRows); // fetch the data } else { // no data in the ResultSet return null; } } catch (Exception e) { return getData(rs, DEFAULT_MAX_ROWS); } } public static Object[] getData(ResultSet rs, int maxRows) throws SQLException { // Read column number and types from the ResultSet's meta-data ResultSetMetaData metaData = rs.getMetaData(); int numCols = metaData.getColumnCount(); int[] colTypes = new int[numCols+1]; int numDoubleCols = 0; int numBooleanCols = 0; int numStringCols = 0; for (int colIdx = 1; colIdx <= numCols; colIdx++) { int colType = metaData.getColumnType(colIdx); switch (colType) { case Types.FLOAT: case Types.DOUBLE: case Types.REAL: colTypes[colIdx] = 1; // double numDoubleCols++; break; case Types.DECIMAL: case Types.INTEGER: case Types.TINYINT: case Types.SMALLINT: case Types.BIGINT: colTypes[colIdx] = 1; // double numDoubleCols++; break; case Types.BIT: case Types.BOOLEAN: colTypes[colIdx] = 2; // boolean numBooleanCols++; break; default: // 'String','Date','Time','Timestamp',... colTypes[colIdx] = 3; // string numStringCols++; } } // Loop over all ResultSet rows, reading the data into the 2D matrix caches int rowIdx = 0; double [][] dataCacheDouble = new double [numDoubleCols] [maxRows]; boolean[][] dataCacheBoolean = new boolean[numBooleanCols][maxRows]; String [][] dataCacheString = new String [numStringCols] [maxRows]; while (rs.next() && rowIdx < maxRows) { int doubleColIdx = 0; int booleanColIdx = 0; int stringColIdx = 0; for (int colIdx = 1; colIdx <= numCols; colIdx++) { try { switch (colTypes[colIdx]) { case 1: dataCacheDouble[doubleColIdx++][rowIdx] = rs.getDouble(colIdx); break; // numeric case 2: dataCacheBoolean[booleanColIdx++][rowIdx] = rs.getBoolean(colIdx); break; // boolean default: dataCacheString[stringColIdx++][rowIdx] = rs.getString(colIdx); break; // string } } catch (Exception e) { System.out.println(e); System.out.println(" in row #" + rowIdx + ", col #" + colIdx); } } rowIdx++; } // Return only the actual data in the ResultSet int doubleColIdx = 0; int booleanColIdx = 0; int stringColIdx = 0; Object[] data = new Object[numCols]; for (int colIdx = 1; colIdx <= numCols; colIdx++) { switch (colTypes[colIdx]) { case 1: data[colIdx-1] = dataCacheDouble[doubleColIdx++]; break; // numeric case 2: data[colIdx-1] = dataCacheBoolean[booleanColIdx++]; break; // boolean default: data[colIdx-1] = dataCacheString[stringColIdx++]; // string } } return data; } }
So now we have a JDBC_Fetch class that we can use in our Matlab code, replacing the slow double loop with a single call to JDBC_Fetch.getData(), followed by vectorized conversion into a Matlab cell array (matrix):
% Get the data from the ResultSet using the JDBC_Fetch wrapper data = cell(JDBC_Fetch.getData(rs)); for colIdx = 1 : numCols switch ColumnType{colIdx} case {'Float','Double'} data{colIdx} = num2cell(data{colIdx}); case {'Long','Integer','Short','BigDecimal'} data{colIdx} = num2cell(data{colIdx}); case 'Boolean' data{colIdx} = num2cell(data{colIdx}); otherwise %case {'String','Date','Time','Timestamp'} %data{colIdx} = cell(data{colIdx}); % no need to do anything here! end end data = [data{:}];
On my specific program the resulting speedup was 15x (this is not a typo: 15 times faster). My fetches are no longer limited by the Matlab post-processing, but rather by the DB’s processing of the SQL statement (where DB indexes, clustering, SQL tuning etc. come into play).
Additional speedups can be achieved by parsing dates at the Java level (rather than returning strings), as well as several other tweaks in the Java and Matlab code (refer to Andrew Janke’s post for some ideas). But certainly the main benefit (the 80% of the gain that was achieved in 20% of the worktime) is due to the above push of the main double processing loop down into the Java level, leaving Matlab with just a single Java call to JDBC_Fetch.
Many additional ideas of speeding up database queries and Matlab programs in general can be found in my second book, Accelerating Matlab Performance.
If you’d like me to help you speed up your Matlab program, please email me (altmany at gmail), or fill out the query form on my consulting page.
Related posts:
Using SQLite in Matlab – SQLite databases can be accessed in a variety of different ways in Matlab. ...
Borderless button used for plot properties – A borderless button can be used to add unobtrusive functionality to plot axes...
Matlab-Java memory leaks, performance – Internal fields of Java objects may leak memory - this article explains how to avoid this without sacrificing performance. ...
File deletion memory leaks, performance – Matlab's delete function leaks memory and is also slower than the equivalent Java function. ...

Working with non-standard DPI displays

Yair Altman — Wed, 09 Nov 2016 21:47:27 +0000

With high-density displays becoming increasingly popular, some users set their display’s DPI to a higher-than-standard (i.e., >100%) value, in order to compensate for the increased pixel density to achieve readable interfaces. This OS setting tells the running applications that there are fewer visible screen pixels, and these are spread over a larger number of physical pixels. This works well for most cases (at least on recent OSes, it was a bit buggy in non-recet ones). Unfortunately, in some cases we might actually want to know the screen size in physical, rather than logical, pixels. Apparently, Matlab root’s ScreenSize property only reports the logical (scaled) pixel size, not the physical (unscaled) one:
>> get(0,'ScreenSize') % with 100% DPI (unscaled standard) ans = 1 1 1366 768 >> get(0,'ScreenSize') % with 125% DPI (scaled) ans = 1 1 1092.8 614.4
The same phenomenon also affects other related properties, for example MonitorPositions.
Raimund Schlüßler, a reader on this blog, was kind enough to point me to this problem and its workaround, which I thought worthy to share here: To get the physical screen-size, use the following builtin Java command:
>> jScreenSize = java.awt.Toolkit.getDefaultToolkit.getScreenSize jScreenSize = java.awt.Dimension[width=1366,height=768] >> width = jScreenSize.getWidth width = 1366 >> height = jScreenSize.getHeight height = 768
Also see the related recent article on an issue with the DPI-aware feature starting with R2015b.
Upcoming travels – London/Belfast, Zürich & Geneva
I will shortly be traveling to consult some clients in Belfast (via London), Zürich and Geneva. If you are in the area and wish to meet me to discuss how I could bring value to your work, then please email me (altmany at gmail):
Belfast: Nov 28 – Dec 1 (flying via London)
Zürich: Dec 11-12
Geneva: Dec 13-15
Related posts:
FindJObj – find a Matlab component’s underlying Java object – The FindJObj utility can be used to access and display the internal components of Matlab controls and containers. This article explains its uses and inner mechanism....
FindJObj GUI – display container hierarchy – The FindJObj utility can be used to present a GUI that displays a Matlab container's internal Java components, properties and callbacks....
Blurred Matlab figure window – Matlab figure windows can be blurred using a semi-transparent overlaid window - this article explains how...
Borderless button used for plot properties – A borderless button can be used to add unobtrusive functionality to plot axes...

Aligning uicontrol contents

Yair Altman — Thu, 22 Sep 2016 13:10:18 +0000

Matlab automatically aligns the text contents of uicontrols: button labels are centered, listbox contents are left-aligned, and table cells align depending on their contents (left-aligned for strings, centered for logical values, and right-aligned for numbers). Unfortunately, the control’s HorizontalAlignment property is generally ignored by uicontrols. So how can we force Matlab buttons (for example) to have right-aligned labels, or for listbox/table cells to be centered? Undocumented Matlab has the answer, yet again…
It turns out that there are at least two distinct ways to set uicontrol alignment, using HTML and using Java. Today I will only discuss the HTML variant.
The HTML method relies on the fact that Matlab uicontrols accept and process HTML strings. This was true ever since Matlab GUI started relying on Java Swing components (which inherently accept HTML labels) over a decade ago. This is expected to remain true even in Matlab’s upcoming web-based GUI system, since Matlab would need to consciously disable HTML in its web components, and I see no reason for MathWorks to do so. In short, HTML parsing of GUI control strings is here to stay for the foreseeable future.
% note: no need to close HTML tags, e.g. uicontrol('Style','list', 'Position',[10,10,70,70], 'String', ... {'Hello', 'world', ... 'What a', ... 'nice day!'});
Listbox with HTML items
While HTML formatting is generally frowned-upon compared to the alternatives, it provides a very quick and easy way to format text labels in various different manners, including using a combination of font faces, sizes, colors and other aspects (bold, italic, super/sub-script, underline etc.) within a single text label. This is naturally impossible to do with Matlab’s standard properties, but is super-easy with HTML placed in the label’s String property.
Unfortunately, while Java Swing (and therefore Matlab) honors only a [large] sub-set of HTML and CSS. The most important directives are parsed but some others are not, and this is often difficult to debug. Luckily, using HTML and CSS there are often multiple ways to achieve the same visual effect, so if one method fails we can usually find an alternative. Such was the case when a reader asked me why the following seemingly-simple HTML snippet failed to right-align his button label:
hButton.String = '
text';
As I explained in my answer, it’s not Matlab that ignores the CSS align directive but rather the underlying Swing behavior, which snugly fits the text in the center of the button, and of course aligning text within a tight-fitting box has no effect. The workaround that I suggested simply forces Swing to use a non-tightly-fitting boundary box, within which we can indeed align the text:
pxPos = getpixelposition(hButton); hButton.String = ['
num2str(pxPos(3)-20) 'px" align="right">text']; % button margins use 20px

Centered (default) and right-aligned button labels
This solution is very easy to set up and maintain, and requires no special knowledge other than a bit of HTML/CSS, which most programmers know in this day and age.
Of course, the solution relies on the actual button size. So, if the button is created with normalized units and changes its size when its parent container is resized, we’d need to set a callback function on the parent (e.g., SizeChangedFcn of a uipanel) to automatically adjust the button’s string based on its updated size. A better solution that would be independent of the button’s pixel-size and would work even when the button is resized needs to use Java.
A related solution for table cells uses a different HTML-based trick: this time, we embed an HTML table cell within the Matlab control’s cell, employing the fact that HTML table cells can easily be aligned. We just need to ensure that the HTML cell is defined to be larger than the actual cell width, so that the alignment fits well. We do this by setting the HTML cell width to 9999 pixels (note that the tr and td HTML tags are necessary, but the table tag is optional):
uitable('Units','norm','Pos',[0,0,0.3,0.3], 'Data', ... {'Left', ... 'Center', ... 'Right'});
Non-default alignment of uitable cells
As noted above, a better solution might be to set the underlying Java component’s alignment properties (or in the case of the uitable, its underlying JTable component’s cellrenderer’s alignment). But in the general case, simple HTML such as above could well be sufficient.
Related posts:
Spicing up Matlab uicontrol tooltips – Matlab uicontrol tooltips can be spiced-up using HTML and CSS, including fonts, colors, tables and images...
Rich-contents log panel – Matlab listboxes and editboxes can be used to display rich-contents HTML-formatted strings, which is ideal for log panels. ...
Multi-line uitable column headers – Matlab uitables can present long column headers in multiple lines, for improved readability. ...
Undocumented button highlighting – Matlab button uicontrols can easily be highlighted by simply setting their Value property. ...

Zero-testing performance

Yair Altman — Wed, 31 Aug 2016 17:00:44 +0000

I would like to introduce guest blogger Ken Johnson, a MATLAB Connections partner specializing in electromagnetic optics simulation. Today Ken will explore some performance subtleties of zero testing in Matlab.
I often have a need to efficiently test a large Matlab array for any nonzero elements, e.g.
>> a = zeros(1e4); >> tic, b = any(a(:)~=0); toc Elapsed time is 0.126118 seconds.
Simple enough. In this case, when a is all-zero, the internal search algorithm has no choice but to inspect every element of the array to determine whether it contains any nonzeros. In the more typical case where a contains many nonzeros you would expect the search to terminate almost immediately, as soon as it finds the first nonzero. But that’s not how it works:
>> a = round(rand(1e4)); >> tic, b = any(a(:)~=0); toc Elapsed time is 0.063404 seconds.
There is significant runtime overhead in constructing the logical array “a(:)~=0”, although the “any(…)” operation apparently terminates at the first true value it finds.
The overhead can be eliminated by taking advantage of the fact that numeric values may be used as logicals in Matlab, with zero implicitly representing false and nonzero representing true. Repeating the above test without “~=0”, we get a huge runtime improvement:
>> a = round(rand(1e4)); >> tic, b = any(a(:)); toc Elapsed time is 0.000026 seconds.
However, there is no runtime benefit when a is all-zero:
>> a = zeros(1e4); >> tic, b = any(a(:)); toc Elapsed time is 0.125120 seconds.
(I do not quite understand this. There should be some runtime benefit from bypassing the logical array construction.)
NaN values
There is also another catch: The above efficiency trick does not work when a contains NaN values (if you consider NaN to be nonzero), e.g.
>> any([0,nan]) ans = 0
The any function ignores entries that are NaN, meaning it treats NaNs as zero-equivalent. This is inconsistent with the behavior of the inequality operator:
>> any([0,nan]~=0) ans = 1
To avoid this problem, an explicit isnan test is needed. Efficiency is not impaired when a contains many nonzeros, but there is a 2x efficiency loss when a is all-zero:
>> a = round(rand(1e4)); >> tic, b = any(a(:)) || any(isnan(a(:))); toc Elapsed time is 0.000027 seconds. >> a = zeros(1e4); >> tic, b = any(a(:)) || any(isnan(a(:))); toc Elapsed time is 0.256604 seconds.
For testing all-nonzero the NaN problem does not occur:
>> all([1 nan]) ans = 1
In this context NaN is treated as nonzero and the all-nonzero test is straightforward:
>> a = round(rand(1e4)); >> tic, b = all(a(:)); toc Elapsed time is 0.000029 seconds.
For testing any-zero and all-zero, use the complements of the above tests:
>> b = ~any(a(:)) || any(isnan(a(:))); % all zero? >> b = ~all(a(:)); % any zero?
Efficient find
The find operation can also be optimized by bypassing construction of a logical temporary array, e.g.
>> a = round(rand(1e4)); >> tic, b = find(a(:)~=0, 1); toc Elapsed time is 0.065697 seconds. >> tic, b = find(a(:), 1); toc Elapsed time is 0.000029 seconds.
There is no problem with NaNs in this case; the find function treats NaN as nonzero, e.g.
>> find([0,nan,1], 1) ans = 2
Related posts:
uicontextmenu performance – Matlab uicontextmenus are not automatically deleted with their associated objects, leading to leaks and slow-downs. ...
rmfield performance – The performance of the builtin rmfield function (as with many other builtin functions) can be improved by simple profiling. ...
tic / toc – undocumented option – Matlab's built-in tic/toc functions have an undocumented option enabling multiple nested clockings...
Solving a MATLAB bug by subclassing – Matlab's Image Processing Toolbox's impoint function contains an annoying bug that can be fixed using some undocumented properties....

Customizing axes part 5 – origin crossover and labels

Yair Altman — Wed, 27 Jul 2016 17:00:02 +0000

When HG2 graphics was finally released in R2014b, I posted a series of articles about various undocumented ways by which we can customize Matlab’s new graphic axes: rulers (axles), baseline, box-frame, grid, back-drop, and other aspects. Today I extend this series by showing how we can customize the axes rulers’ crossover location.
Non-default axes crossover location

The documented/supported stuff
Until R2015b, we could only specify the axes’ YAxisLocation as 'left' (default) or 'right', and XAxisLocation as 'bottom' (default) or 'top'. For example:
x = -2*pi : .01 : 2*pi; plot(x, sin(x)); hAxis = gca; hAxis.YAxisLocation = 'left'; % 'left' (default) or 'right' hAxis.XAxisLocation = 'bottom'; % 'bottom' (default) or 'top'
Default axis locations: axes crossover is non-fixed
The crossover location is non-fixed in the sense that if we zoom or pan the plot, the axes crossover will remain at the bottom-left corner, which changes its coordinates depending on the X and Y axes limits.
Since R2016a, we can also specify 'origin' for either of these properties, such that the X and/or Y axes pass through the chart origin (0,0) location. For example, move the YAxisLocation to the origin:
hAxis.YAxisLocation = 'origin';
Y-axis location at origin: axes crossover at 0 (fixed), -1 (non-fixed)
And similarly also for XAxisLocation:
hAxis.XAxisLocation = 'origin';
X and Y-axis location at origin: axes crossover fixed at (0,0)
The axes crossover location is now fixed at the origin (0,0), so as we move or pan the plot, the crossover location changes its position in the chart area, without changing its coordinates. This functionality has existed in other graphic packages (outside Matlab) for a long time and until now required quite a bit of coding to emulate in Matlab, so I’m glad that we now have it in Matlab by simply updating a single property value. MathWorks did a very nice job here of dynamically updating the axles, ticks and labels as we pan (drag) the plot towards the edges – try it out!
The undocumented juicy stuff
So far for the documented stuff. The undocumented aspect is that we are not limited to using the (0,0) origin point as the fixed axes crossover location. We can use any x,y crossover location, using the FirstCrossoverValue property of the axes’ hidden XRuler and YRuler properties. In fact, we could do this since R2014b, when the new HG2 graphics engine was released, not just starting in R2016a!
% Set a fixed crossover location of (pi/2,-0.4) hAxis.YRuler.FirstCrossoverValue = pi/2; hAxis.XRuler.FirstCrossoverValue = -0.4;
Custom fixed axes crossover location at (π/2,-0.4)
For some reason (bug?), setting XAxisLocation/YAxisLocation to ‘origin’ has no visible effect in 3D plots, nor is there any corresponding ZAxisLocation property. Luckily, we can set the axes crossover location(s) in 3D plots using FirstCrossoverValue just as easily as for 2D plots. The rulers also have a SecondCrossoverValue property (default = -inf) that controls the Z-axis crossover, as Yaroslav pointed out in a comment below. For example:
N = 49; x = linspace(-10,10,N); M = peaks(N); mesh(x,x,M);
Default crossover locations at (-10,±10,-10)
hAxis.XRuler.FirstCrossoverValue = 0; % X crossover with Y axis hAxis.YRuler.FirstCrossoverValue = 0; % Y crossover with X axis hAxis.ZRuler.FirstCrossoverValue = 0; % Z crossover with X axis hAxis.ZRuler.SecondCrossoverValue = 0; % Z crossover with Y axis
Custom fixed axes crossover location at (0,0,-10)
hAxis.XRuler.SecondCrossoverValue = 0; % X crossover with Z axis hAxis.YRuler.SecondCrossoverValue = 0; % Y crossover with Z axis
Custom fixed axes crossover location at (0,0,0)
Labels
Users will encounter the following unexpected behavior (bug?) when using either the documented *AxisLocation or the undocumented FirstCrossoverValue properties: when setting an x-label (using the xlabel function, or the internal axes properties), the label moves from the center of the axes (as happens when XAxisLocation=’top’ or ‘bottom’) to the right side of the axes, where the secondary label (e.g., exponent) usually appears, whereas the secondary label is moved to the left side of the axis:
Unexpected label positions
In such cases, we would expect the labels locations to be reversed, with the main label on the left and the secondary label in its customary location on the right. The exact same situation occurs with the Y labels, where the main label unexpectedly appears at the top and the secondary at the bottom. Hopefully MathWorks will fix this in the next release (it is probably too late to make it into R2016b, but hopefully R2017a). Until then, we can simply switch the strings of the main and secondary label to make them appear at the expected locations:
% Switch the Y-axes labels: ylabel(hAxis, '\times10^{3}'); % display secondary ylabel (x10^3) at top set(hAxis.YRuler.SecondaryLabel, 'Visible','on', 'String','main y-label'); % main label at bottom % Switch the X-axes labels: xlabel(hAxis, '2^{nd} label'); % display secondary xlabel at right set(hAxis.XRuler.SecondaryLabel, 'Visible','on', 'String','xlabel'); % main label at left
As can be seen from the screenshot, there’s an additional nuisance: the main label appears a bit larger than the axes font size (the secondary label uses the correct font size). This is because by default Matlab uses a 110% font-size for the main axes label, ostensibly to make them stand out. We can modify this default factor using the rulers’ hidden LabelFontSizeMultiplier property (default=1.1). For example:
hAxis.YRuler.LabelFontSizeMultiplier = 1; % use 100% font-size (same as tick labels) hAxis.XRuler.LabelFontSizeMultiplier = 0.8; % use 80% (smaller than standard) font-size
Note: I described the ruler objects in my first article of the axes series. Feel free to read it for more ideas on customizing the axes rulers.
Related posts:
Customizing axes rulers – HG2 axes can be customized in numerous useful ways. This article explains how to customize the rulers. ...
Customizing axes part 2 – Matlab HG2 axes can be customized in many different ways. This article explains some of the undocumented aspects. ...
Undocumented scatter plot jitter – Matlab's scatter plot can automatically jitter data to enable better visualization of distribution density. ...
HG2 update – HG2 appears to be nearing release. It is now a stable mature system. ...

Low risk of breaking in future versions – Undocumented Matlab

Undocumented plot marker types

USA visit

Multi-threaded Mex

Source code and discussion

Speed-up tips

Debugging MEX files

Consulting

String/char compatibility

Blocked wait with timeout for asynchronous events

The basic idea

Using a stand-alone generic signaling object

Speeding-up builtin Matlab functions – part 2

Profiling

Optimizing input args pre-processing

Vectorizing the main loop

Related functions

Conclusions

Speeding-up builtin Matlab functions – part 1

Profiling

The speed-up improvement process

Speeding-up stablefit()

Speeding-up stablelike()

Conclusions

Adding custom properties to GUI objects

Customizing axes tick labels

Using SQLite in Matlab

sqlite4java

JDBC

Database toolbox

mksqlite

Other alternatives

PlotEdit context-menu customization

Tips for accelerating Matlab performance

Faster csvwrite/dlmwrite

Trapping Ctrl-C interrupts

Matlab performance webinars

Sending HTML emails from Matlab

Deployment

Parsing XML strings

Afterthoughts on implicit expansion

So what’s my point?

Addendum December 27, 2016:

Speeding up Matlab-JDBC SQL queries

Working with non-standard DPI displays

Upcoming travels – London/Belfast, Zürich & Geneva

Aligning uicontrol contents

Zero-testing performance

NaN values

Efficient find

Customizing axes part 5 – origin crossover and labels

The documented/supported stuff

The undocumented juicy stuff

Labels