Posts Tagged ‘performance’

Plot performance

Wednesday, June 16th, 2010

I recently consulted to a client who wanted to display an interactive plot with numerous data points that kept updating in real-time. Matlab’s standard plotting functions simply could not keep up with the rate of data change. Today, I want to share a couple of very simple undocumented hacks that significantly improve plotting performance and fixed my problem.

I begin by stating the obvious: whenever possible, try to vectorize your code, preallocate the data and other performance-improving techniques suggested by Matlab. Unfortunately, sometimes (as in my specific case above) all these cannot help. Even in such cases, we can still find important performance tricks, such as these:

Performance hack #1: manual limits

Whenever Matlab updates plot data, it checks whether any modification needs to be done to any of its limits. This computation-intensive task is done for any limit that is set to ‘Auto’ mode, which is the default axes limits mode. If instead we manually set the axes limits to the requested range, Matlab skips these checks, enabling much faster plotting performance.

Let us simulate the situation by adding 500 data points to a plot, one at a time:

>> x=0:0.02:10; y=sin(x);
>> clf; cla; tic;
>> drawnow; plot(x(1),y(1)); hold on; legend data;
>> for idx = 1 : length(x); plot(x(idx),y(idx)); drawnow; end;
>> toc
 
Elapsed time is 21.583668 seconds.

simple plot with 500 data points

simple plot with 500 data points

And now let’s use static axes limits:

>> x=0:0.02:10; y=sin(x);
>> clf; cla; tic; drawnow; 
>> plot(x(1),y(1)); hold on; legend data;
 
>> xlim([0,10]); ylim([-1,1]);  % static limits
 
>> for idx = 1 : length(x); plot(x(idx),y(idx)); drawnow; end;
>> toc
 
Elapsed time is 16.863090 seconds.

Note that this trick is the basis for the performance improvement that occurs when using the plot’s undocumented set of LimInclude properties.

Of course, setting manual limits prevents the axes limits from growing and shrinking automatically with the data, which can actually be a very useful feature sometimes. But if performance is important, we now know that we have this tool to improve it.

Performance hack #2: static legend

Hack #1 gave us a 22% performance boost, but we can do much better. Running the profiler on the code above we see that much of the time is spent recomputing the legend. Looking inside the legend code (specifically, the legendcolorbarlayout function), we detect several short-circuits that we can use to make the legend static and prevent recomputation:

>> x=0:0.02:10; y=sin(x);
>> clf; cla; tic; drawnow; 
>> plot(x(1),y(1)); hold on; legend data;
 
>> xlim([0,10]); ylim([-1,1]);  % static limits
 
>> % Static legend
>> set(gca,'LegendColorbarListeners',[]); 
>> setappdata(gca,'LegendColorbarManualSpace',1);
>> setappdata(gca,'LegendColorbarReclaimSpace',1);
 
>> for idx = 1 : length(x); plot(x(idx),y(idx)); drawnow; end;
>> toc
 
Elapsed time is 5.209053 seconds.

Now this is much much better – a 76% performance boost compared to the original plot (i.e., 4 times faster!). Of course, it prevents the legend from being dynamically updated. Sometimes we actually wish for this dynamic effect (last year I explained how to use the legend’s undocumented -DynamicLegend feature for even greater dynamic control). But when performance is important, we can still display a legend without its usual performance cost.

In conclusion, I have demonstrated that Matlab performance can often be improved significantly, even in the absence of any vectorization, by simply understanding the internal mechanisms and bypassing those which are irrelevant in our specific case.

Have you found other similar performance hacks? If so, please share them in the comments section below.

Plot LimInclude properties

Wednesday, March 31st, 2010

Concluding my three-part mini-series on hidden and undocumented plot/axes properties, I would like to present a set of properties that I find very useful in dynamic plots: XLimInclude, YLimInclude, ZLimInclude, ALimInclude and CLimInclude. These properties, which are relevant for plot/axes objects, have an ‘on’ value by default. When set to ‘off’, they exclude their object from the automatic computation of the corresponding axes limits (XLim/YLim/ZLim/ALim/CLim).

For example, here’s a simple sine wave with a wavefront line marker. Note how the too-tall wavefront line affects the entire axes Y-limits:

cla;
t=0:.01:7.5;
plot(t,sin(t));
line('xdata',[7.5,7.5], 'ydata',[-5,5], 'color','r'); 
box off

Regular plot (YLimInclude on)

Regular plot (YLimInclude on)

This situation is quickly fixed using the YLimInclude property:

cla;
t=0:.01:7.5;
plot(t,sin(t));
line('xdata',[7.5,7.5], 'ydata',[-5,5], 'color','r', ...
     'YLimInclude','off'); 
box off

YLimInclude off

YLimInclude off

Beside the functional importance of this feature, it also has a large potential for improved application performance: I recently designed a monitor-like GUI for a medical application, where the data is constantly updated from an external sensor connected to the computer. The GUI presents the latest 10 seconds of monitored data, which bounce up and down the chart. A red wave-front line is presented and constantly updated, to indicate the current data position. Since the monitored data jumps up and down, the Y-limits of the monitor chart often changes, and with it I would need to modify the wavefront’s YData based on the updated axes YLim. This turned out to steal precious CPU time from the actual monitoring application. Came YLimInclude to the rescue, by letting me specify the wavefront line as:

hWavefront = line(..., 'YData',[-99,99], 'YLimInclude','off');

Now the wavefront line never needed to update its YData (only XData, which is much less CPU-intensive) – it always spanned the entire axes height, since [-99,99] were assured (in my particular case) to exceed the actual monitored data. This looked better (no flicker effects) and performed faster than the regular (documented) approach.

Note that although all these properties exist, to the best of my knowledge, for all Handle-Graphic plot objects, they are sometimes meaningless. For example, ZLimInclude is irrelevant for a 2D patchless plot; CLimInclude relates to the axes color limits which are irrelevant if you’re not using a colormap or something similar; ALimInclude relates to patch transparency (alpha-channel) and is irrelevant elsewhere. In these and similar cases, setting these properties, while allowed and harmless, will simply have no effect.

This concludes my mini-series of undocumented plot/axes properties. To recap, the other articles dealt with the LooseInset and LineSmoothing properties.

Have you found other similar properties or use-cases that you find useful? I will be most interested to read about them in the comments section below.

Performance: scatter vs. line

Wednesday, October 14th, 2009

Following my previous article on the undocumented behavior of the scatter function, one of my readers, Benoit Charles, reported a discovery that in many circumstances the line function generates identical plots much faster than scatter.

Unlike scatter, line does not enable specific data-point marker customization, although the colors could be modified. On the other hand, line only uses a single handle object, saving memory and system resources compared to scatter keeping a separate handle for each data point. So, if you just need to quickly plot a bunch of scattered points then line could be a better choice than scatter.

Here is a simple code snippet, which generates identical plots and shows the performance difference:

>> x=rand(1000,1); y=rand(1000,1);
 
>> tic, for idx=1:100, cla; h=scatter(x,y); end; toc
Elapsed time is 2.521322 seconds.
 
>> props = {'LineStyle','none','Marker','o','MarkerEdge','b','MarkerSize',6};
>> tic, for idx=1:100, cla; h=line([x,x],[y,y],props{:}); end; toc
Elapsed time is 0.333369 seconds.

In the past, I have posted about other undocumented performance aspects, comparing the documented ismember function with the undocumented ismembc and about cellfun’s undocumented options. If you are aware of other similar functions having identical outputs and a significant performance difference, please let me know.

tic / toc – undocumented option

Friday, May 22nd, 2009

I have recently discussed undocumented features in Matlab’s built-in profiler. Another tool often used for ad-hoc performance profiling of Matlab code are the built-in duo functions tic and toc:

>> tic; magic(500); toc
Elapsed time is 0.474647 seconds.

Both tic and toc are documented functions, but they contain an undocumented option (at least until R2008b – see history below) that enables nested clocking. The problem that is solved using this undocumented option is that if we try to use tic/toc in external function parentFcn and also in the internal function childFcn, we get unexpected results. This happens because our childFcn’s invocation of tic has reset the clock and so parentFcn’s clock is now obviously incorrect:

function parentFcn
 
    % Start clocking
    tic;
 
    % Do something long...
    m1 = magic(1500);
 
    % Call nested function
    toc1 = childFcn;
 
    % Present clock timing
    toc2 = toc;
    fprintf('Elapsed time for childFcn:  %0.3f secs\n',toc1)
    fprintf('Elapsed time for parentFcn: %0.3f secs\n',toc2)
 
 
    % Nested function with separate clocking
    function t = childFcn
        tic;
        m = magic(800);
        t = toc;
    end  % childFcn
 
end  % parentFcn
>> parentFcn
Elapsed time for childFcn:  1.063 secs
Elapsed time for parentFcn: 1.068 secs

The solution: use a separate handle for each clock. The format is very simple:

>> t = tic;
>> toc(t)

This format ensures independent clocking of the clocks. Clockings can now be safely nested and even partial overlap is possible:

function parentFcn
 
    % Start clocking
    t1 = tic;
 
    % Do something long...
    m1 = magic(1500);
 
    % Call nested function
    toc1 = childFcn;
 
    % Present clock timing
    toc2 = toc(t1);
    fprintf('Elapsed time for childFcn:  %0.3f secs\n',toc1)
    fprintf('Elapsed time for parentFcn: %0.3f secs\n',toc2)
 
 
    % Nested function with separate clocking
    function t = childFcn
        tic;
        m = magic(800);
        t = toc;
    end  % childFcn
 
end  % parentFcn
>> parentFcn
Elapsed time for childFcn:  1.123 secs
Elapsed time for parentFcn: 4.992 secs

For the record, this undocumented option was first presented by the MathWorks support team as an official workaround for the aforementioned problem. The solution was originally posted for Matlab R2006b, but actually the option was supported by Matlab versions at least as early as R14SP3 (7.1) – perhaps even earlier (I don’t have ready access to earlier versions). I do know that it was not supported on release R12 aka 6.0, but I don’t know whether it was introduced in 6.5, 7.0 or 7.1. This option has only been documented since R2008b, although it has existed in many earlier releases.

Now here’s a puzzle for all you undoc fans out there: toc has a really undocumented option of using an uninitialized uint64 value, rather than an actual tic handle. It seems that whichever value is passed will always result in the same result, but this result is uncorrelated to any possible event (start of Matlab, midnight etc.). It even works without any prior tic invocation! I would love to hear your thoughts on what you think this strange toc result might represent:

>> toc(uint64(1))
Elapsed time is 72825.947547 seconds.
>> toc(uint64(1234))
Elapsed time is 72826.538296 seconds.

cellfun – undocumented performance boost

Monday, May 11th, 2009

Matlab’s built-in cellfun function has traditionally enabled several named (string) processing functions such as ‘isempty’. The relevant code would look like this:

data = cellfun('isempty',cellArray);

In recent years, newer Matlab releases has added support for function handles, so the previous code snippet can now be written as follows:

data = cellfun(@isempty,cellArray);

The newer function-handle format is “cleaner” and more extensive than the former format, accepting any function, not just the limited list of pre-specified processing function names (‘isreal’, ‘islogical’, ‘length’, ‘ndims’, ‘prodofsize’). Some have even reported that the older format has limitations vis-a-vis compilation etc.

All this is well known and documented. However, it turns out that, counter-intuitively (and undocumented), the older format is actually much faster than the newer format for those pre-specified processing function names. The reason appears to be that ‘isempty’ (as well as the other predefined string functions) uses specific code-branches optimized for performance:

>> c = mat2cell(1:1e6,1,repmat(1,1,1e6));
>> tic, d=cellfun('isempty',c); toc
Elapsed time is 0.115583 seconds.
>> tic, d=cellfun(@isempty,c); toc
Elapsed time is 7.493989 seconds.

Perhaps a future Matlab release will improve cellfun’s internal code, to check for function-handle equality to the optimized functions, and use the optimized code branch if possible. When I posted this issue today as a correction to a reader’s misconception, Matlab’s Loren Shure commented as follows:

We could improve cellfun to check function handles to see if they match specified strings. Even then MATLAB would have to be careful in case the user has overridden the built-in version of whatever the string points to.

While this comment seems to imply that the performance boost feature will be maintained and possibly improved in future releases, users should note that this is not guarantied. One could even argue that future code optimizations would be applied to the new (function-handle) format rather than the old string format. The performance pendulum might also change based on user platform. Therefore, users for whom performance is critical should always test both versions on their target system: ‘isempty’ vs. @isempty etc. (note that the corresponding function for ‘prodofsize’ is @numel).

ismembc – undocumented helper function

Wednesday, April 8th, 2009

Matlab has a variety of internal helper functions which are used by the main (documented) functions. Some of these helper functions are undocumented and unsupported, but may be helpful in their own right – not just as internal support functions.

In this post I want to present Matlab’s built-in ismembc helper function. This function is used within the stock Matlab ismember and setxor functions for fast processing of the core ismember functionality in “regular” cases: arrays of sorted, non-sparse, non-NaN data in which we’re only interested in the logical membership information (not the index locations of the found members). In such cases, ismembc can be used directly, saving ismember’s sanity-checks overhead. ismembc uses the same interface (two inputs, single logical output) as ismember and can be a drop-in replacement for ismember for these “regular” cases.

The performance improvement may be significant: In a recent post, MathWorks’ Loren Shure presented different approaches for fast data retrieval, highlighting the ismember function. Let’s compare:

>> % Initial setup
>> n=2e6; a=ceil(n*rand(n,1)); b=ceil(n*rand(n,1));

>> % Run ismember several times, to rule-out JIT compilation overheads
>> tic;ismember(a,b);toc;
Elapsed time is 2.882907 seconds.
>> tic;ismember(a,b);toc;
Elapsed time is 2.818318 seconds.
>> tic;ismember(a,b);toc;
Elapsed time is 3.005967 seconds.

>> % Now use ismembc:
>> tic;ismembc(a,b);toc;
Elapsed time is 0.162108 seconds.
>> tic;ismembc(a,b);toc;
Elapsed time is 0.204108 seconds.
>> tic;ismembc(a,b);toc;
Elapsed time is 0.156963 seconds.

ismembc is actually a MEX file (%matlabroot%\toolbox\matlab\ops\ismembc.mexw32). Its source code is included in the same folder (%matlabroot%\toolbox\matlab\ops\ismembc.cpp) and is actually very readable. From the source code comments we learn that the comment in setxor about ismembc usage is misleading: that comment stated that the inputs must be real, but the source-code indicates that imaginary numbers are also accepted and that only the real-part should be sorted.

ismembc should not be used carelessly: as noted, its inputs must be sorted non-sparse non-NaN values. In the general case we should either ensure this programmatically (as done in setxor) or use ismember, which handles this for us.

The nice thing about ismembc is that its source code (ismembc.cpp) is included, so even if future Matlab releases stop using this function, you can always mex-compile the source code and use it.

Readers interested in ismembc might also be interested in its sibling help function, ismembc2, which is also a mex file located (with source-code) in the same folder as ismembc. Whereas ismembc returns an array of logical values, ismembc2 returns the index locations of the found members.

Undocumented profiler options

Thursday, April 2nd, 2009

The Matlab profiler is a powerful tool for debugging performance-related issues in Matlab applications. However, it also has some undocumented options that facilitate other forms of debugging, namely memory bottlenecks and JIT (Just-In-Time compilation) problems.

To turn on memory stats in the profile report, run this (only once is necessary – will be remembered for future profiling runs):

profile -memory on;
profile('-memory','on');  % an alternative

To turn on JIT information, run this (again, only once is necessary, prior to profile report):

setpref('profiler','showJitLines',1);

You will then see additional JIT and memory (allocated, freed and peak) information displayed in the profile report, as well as the options to sort by allocated, freed and peak memory:


Profile report with memory & JIT infoProfile report with memory & JIT info

Profile report with memory & JIT info

Profile report with memory & JIT info

For those interested, the references to these two options appear within the code of profview.m (line 1199 on R2007b), for the JIT option:

showJitLines = getpref('profiler','showJitLines',false);

…and profile.m (lines 163-165 on R2007b), for the memory option:

if memory ~= -1
    callstats('memory', memory);
end

Note that there appears to be two undocumented additional memory-related options in profile.m (lines 311-312):

options = {'detail', 'timer', 'history', 'nohistory', 'historysize', ...
           'timestamp', 'memory', 'callmemory', 'nomemory' };

However, ‘-nomemory’ appears to simply turn the memory stats collection off, and ‘-callmemory’ is not recognized because of a bug in line 349, which looks for ‘callnomemory’…:

    case 'callnomemory'   % should be 'callmemory'
           memory = 2;

When this bug is fixed, we see that we get only partial memory information, so the ‘-callmemory’ option is really not useful – use ‘-memory’ instead.