Some Matlab performance-tuning tips

Today’s post is about performance. My goal is to show that contrary to widespread perception, Matlab is not inherently too slow to be used for real-life programs. In fact, by taking a small amount of time (compared to the overall dev time), Matlab programs can be accelerated by a large factor. I wish to demonstrate this claim with work that I recently completed for the Crustal Dynamics research group at Harvard University. They have created interactive Matlab GUIs for earthquake hazard, visualizing deformation and motion at plate boundary zones, recorded as GPS velocities:

These GUIs served them well for several years. But when they recently tried to analyze larger problems that involved far more data, it was getting so slow that it limited their ability to interrogate the GUI results effectively and do productive science (take a look at all the data points around New Zealand in the screenshot above). This is when I stepped in to help.

Two main performance issues stood out above the rest: Loading the data, and displaying the textual labels of numeric slip rates.

I/O: Loading data

I began with optimizing the data load time. This involves loading several different data files, which have custom textual formats. Of course, had the data files been initially created in some binary format, we could load it much faster. But we were faced with an existing situation where the textual data format was a given fact. Using Matlab’s profiler, it quickly emerged, as expected, that most of the time was spent parsing the text files. Two specific types of parsing were used, and they were both quite slow: reading the input files using textread, and parsing the input data using str2num.
To read the data from the files, I replaced the textread calls with corresponding textscan ones:

% Old (slow) code:
[Station.lon, Station.lat, Station.eastVel, Station.northVel, ...
 Station.eastSig, Station.northSig, Station.corr, ...
 Station.other1, Station.tog, Station.name] = textread(fileName, '%f%f%f%f%f%f%f%d%d%s');
% New (fast) code:
fid = fopen(fileName,'rt');
c = textscan(fid, '%f%f%f%f%f%f%f%d%d%s');
fclose(fid);
fn = {'lon', 'lat', 'eastVel', 'northVel', 'eastSig', 'northSig', 'corr', 'other1', 'tog', 'name'};
Station = cell2struct(c,fn,2);

To improve the performance of the str2num calls, I differentiated between two sub-cases:
In some cases, str2num was simply used to round numeric input data to a certain numeric precision. I improved this by changing the str2num calls with corresponding calls to round (which accepts an optional precision argument since R2014b):

% Old (slow) code
Station.lon = str2num(num2str(Station.lon, '%3.3f'));
% New (fast) code
Station.lon = round(Station.lon,3);        % R2014b or newer
Station.lon = round(Station.lon*1e3)/1e3;  % R2014a or older

In other cases, str2num was used to convert strings into numeric values. This should normally be done using a textscan parameter but in this specific case this was complicated due to the way the data was formatted, which required parsing iterative blocks. Still, converting strings into numbers is far faster using sscanf than str2num. The down side is that str2num also works in certain edge-cases where sscanf doesn’t. For this reason, I created a utility function (str2num_fast.m) that uses sscanf where possible, and falls back to str2num in case of problems. I then simply replaced all calls to str2num in the code to str2num_fast:

% str2num_fast - faster alternative to str2num
function data = str2num_fast(str, numCols)
    try
        % Fast code:
        str = char(str);
        str(:,end+1) = ' ';
        data = sscanf(str','%f');
        if nargin>1 && ~isempty(numCols)
            data = reshape(data,numCols,[])';
        end
    catch
        % This is much simpler but also much slower...
        data = str2num(str);
    end
end

The result: loading a medium-sized data set, which used to take 5-6 minutes (and much longer for larger data sets), now takes less than 1 second, a speedup of x500. When you continuously load and compare different data sets, it can mean the difference between a usable and an unusable program. Not bad for starters…
For many additional related techniques, read chapters 4 and 11 of my Accelerating MATLAB Performance book (string processing and I/O, respectively), or other performance-related articles on this website.

Displaying data

I now turned my attention to the graphic visualization aspects. As can be seen in the screenshot above, there are multiple layers of textual labels, arrows, lines and data points that can be added to the chart.
It turned out that in the interest of improved performance, the various checkboxes were designed such that they merely turned the visibility of the graphic components on and off. This does indeed improve performance in the specific use-case of checking and unchecking a specific checkbox. But in the general case, it significantly degrades performance by adding numerous graphic handles to the plot. By just checking 3 of these checkboxes (not all of them), I found that 37365 different graphic handles were created in the plot axes. That’s a HUGE number, and it’s no surprise that adding additional visualization layers, or zooming/panning the axes, became excruciatingly slow, even when the layers were turned off (i.e., made invisible). This is because Matlab’s internal graphics engine needs to manage all these handles, even when they are not visible.
The first rule of improving graphics performance is that except if the handles need to be frequently turned on/off, no graphic element should remain plotted if it is not visible. In our case, this meant that when a visualization layer’s checkbox is deselected, the corresponding handles are deleted, not made invisible (there is of course a throughput/latency tradeoff in the general case, between the recurring handle’s creation time and the performance impact of keeping numerous invisible handles):

% hCheckbox is the handle of the selected/deselected checkbox
% hPlotHandles is the list of corresponding plotted graphic handles
if get(hCheckbox, 'Value') == 0
   %set(hPlotHandles, 'Visible', 'off');  % Old (slow) code
   delete(hPlotHandles);  % Faster throughput in our use-case
else
   hPlotHandles = ...
end

A related aspect is that if the axes is zoomed-in (as is often the case in this specific GUI), then there is no need to plot any graphic element which is outside the axes limits:

% Old (slow) code:
text(lons, lats, labels);
% Much faster: limit the labels only to the visible axes area
hAxes = handle(gca);
validIdx = within(lons, hAxes.XLim) & within(lats, hAxes.YLim);
text(lons(validIdx), lats(validIdx), labels(validIdx,:));
function validIdx = within(data,limits)
    validIdx = data >= limits(1) & data <= limits(2);
end

Finally, in order to reduce the number of displayed graphic handles, we can unify the separate line segments into a single line that has NaN (or Inf) values interspaced between the segments. This is a very important technique, that enabled a reduction of ~7000 separate line handles into a single line, which improves both the line creation time and any subsequent axes action (e.g., zoom/pan). This is even faster than limiting the display to the axes limits (and yes, we could combine them by displaying a single line that has fewer data points that fit the axes limits, but the extra performance benefit would be negligible):

% Old (slow) code:
line([Segment.lon1'; Segment.lon2'], [Segment.lat1'; Segment.lat2']);
% Faster code: limit the display to the axes limits
hAxes = handle(gca);
lonLimits = hAxes.XLim;
latLimits = hAxes.YLim;
valid = (within(Segment.lon1,lonLimits) | within(Segment.lon2,lonLimits)) & ...
        (within(Segment.lat1,latLimits) | within(Segment.lat2,latLimits));
line([Segment.lon1(valid)', Segment.lon2(valid)'], ...
     [Segment.lat1(valid)', Segment.lat2(valid)']);
% New (fastest) code:
lons = [Segment.lon1'; Segment.lon2'; nan(1,numel(Segment.lon2))];
lats = [Segment.lat1'; Segment.lat2'; nan(1,numel(Segment.lat2))];
line(lons(:), lats(:));

The result: the time for displaying the slip-rate labels in the zoomed-in Australasia region in the screenshot above was reduced from 33 seconds to 0.6 secs; displaying the residual velocity vectors was reduced from 1.63 secs to 0.02 secs — speedups of x50-100. Again, not bad at all… The GUI is now fast enough to enable true interactivity. In Prof. Brendan Meade’s words:

[The GUI] is now tremendously fast with all arrow type visualizations! It’s amazing and enabling us to work with large scale data sets much more efficiently. Simply fantastic. Just awesome… It’s so fast on the load and showing the slip rates in zoom regions with large models almost instant! This is exactly what we were hoping for. This is really making a difference in terms of how fast we can do science!

The technique of converting multiple lines into a single line using NaNs was discussed last week by Mike Garrity. Users who are interested in additional ideas for improving Matlab graphics performance are encouraged to visit Mike’s blog. For many additional techniques, read chapter 10 of my Accelerating MATLAB Performance book, or other performance-related articles on this website.
Some other techniques for speeding up graphic objects creation that I’ve found useful over the years include:

Avoid plotting non-visible elements, including elements that are outside the current axes limits, or have their Visible property set to ‘off’ or have a color that is the same as the axes background color.
Avoid plotting overlapped elements, esp. those that are occluded by non-transparent patches, or lines having the same coordinates.
Avoid using the scatter function with fewer than 100 data points – instead, duplicate these points so that scatter will work with more than 100 points, where vectorization kicks in (details), or even better: use line rather than scatter.
Use low-level rather than high-level plotting functions – i.e., line instead of scatter/plot/plot3; surface instead of surf.
Avoid creating straight line with multiple data points – instead, only keep the end-points for plotting such lines. I find that this is a very common use-case, which is often overlooked and could have a significant performance impact.
Avoid using plot markers if possible, and use simple markers if this cannot be avoided. Various markers have different performance impacts in various situations, but ‘.’ and ‘o’ are typically faster than others.
Use the plot function’s input triplets format, rather than multiple calls to plot. For example:
plot(data1x,data1y,'r', data2x,data2y,'g', data3x,data3y,'b', ...);
plot(data1x,data1y,'r', data2x,data2y,'g', data3x,data3y,'b', ...);
Set the axes properties to static values before plotting, in order to avoid run-time dynamic computation and update of things like the limits, tick-marks etc.
Avoid creating legends or colorbars initially – let the user create them by clicking the corresponding toolbar icon if needed. Legends and colorbars take a second or more to create and can often be avoided in the initial display. If this cannot be avoided, then at least create static legends/colorbars.
Only call drawnow once you’ve finished plotting. I’ve seen numerous cases where users call drawnow within the plotting loop and this has a horrendous performance impact. However, note that in some cases drawnow is very important (example1, example2).
Generate the plot while the figure is hidden.
Data-reduce the plotted data. We can program this ourselves, or use former MathWorker Tucker McClure’s reduce_plot utility (a POTW selection) to do it for us. Data reduction is especially important when displaying images that do not need to be zoomed-in.
Cast image data to uint8 before using image or imagesc to display the image.
Avoid clearing/deleting and then recreating the axes when we need to replot – instead, just delete the relevant axes children objects and replot in the existing axes.
Avoid using the axes function to set the focus on a specific axes – instead, set the figure’s CurrentAxes property, or pass the axes handle directly to the plotting function.

Naturally, there are certain use-cases where applicative requirements might prevent using one or more of these techniques, but in the general case I find that following them improves performance.

Conclusions

Matlab is NOT inherently slow. It can be made to run much faster than many people assume, by simply using the built-in profiler tool, following several simple coding techniques and employing common sense. Too often I find that complains about Matlab’s speed stem from the fact that not even a minimal effort was invested in trying to follow these steps. The difference between an unoptimized and optimized Matlab code can be far larger in Matlab than in other programming languages, so Matlab users should invest more time optimizing their code than they would perhaps need to do in other programming environments. The potential benefits, as I’ve shown above and in my book, could be enormous.
MathWorks is constantly investing in making Matlab’s engine faster by default, and there is certainly room for improvement in certain aspects (e.g, OOP and HG2 performance). But there will always be room for human insight to optimize performance, and we should not neglect this. Moreover, we should not blame Matlab for our failure to invest even a minimal optimization effort. MathWorks cannot (of course) say this to their users, but I don’t have this limitation and I say it out loud: people should stop blaming MathWorks for everything. If you create a car with square wheels, don’t complain if it doesn’t drive as fast as you expect (even if its engine could indeed be improved). In this case, the customer is not always (or entirely) right.
Perhaps it’s just a matter of setting the user expectations straight: we do not expect Matlab to automatically solve our equations or generate the perfect engineering model. So too should we not expect Matlab to automatically run fast enough for our needs. Just as we expect to spend time to solve the scientific or engineering problem, so too should we spend a bit of time to optimize the code to run fast enough.
Luckily, there are numerous different ways in which we can improve Matlab’s performance. In fact, there are so many different ways to achieve our performance goals that we can take a pick based on aesthetic preferences and subjective experience: Some people use vectorization, others like parallelization, some others prefer to invest in smarter algorithms or faster hardware, others trade memory for performance or latency for throughput, still others display a GUI that just provides a faster impression with dynamic feedback. Even if one technique fails or is inapplicable, there are many other alternatives that we could try. Just use the profiler and some common sense and you are half-way there. Good luck!

Additional resources

Interested readers can find out more information about improving Matlab’s performance in my book “Accelerating MATLAB Performance” (CRC Press, 2014, ISBN 978-1482211290). If you already have this book, please be kind enough to post your feedback on it on Amazon (link), for the benefit of others.
I am offering a couple of webinars about various ways to improve Matlab’s run-time performance:

Matlab performance tuning part 1 (3:39 hours, syllabus) – $195 (buy)
Matlab performance tuning part 2 (3:43 hours, syllabus) – $195 (buy)
==> or buy both Matlab performance tuning webinars for only $345 (buy)

Both the webinar videos and their corresponding slide-decks are available for download. The webinars content is based on onsite training courses that I presented at multiple client locations (details).
Email me if you would like additional information on the webinars or my consulting, or to inquire regarding an onsite training course.

4 Responses

Brad Stiritz June 17, 2015 at 17:08 Reply

Hi Yair, thanks for the great blog post! Fantastic troubleshooting example, well-done 🙂

I just finished reading Code Complete 2 (highly recommended!), which discusses among other things the huge variance in productivity among programmers. With that theme in mind, I’m sure some of your other readers may be interested as well in the following questions:

May I ask you please to give some color on how long it took you to assess the indicated problems? Your post gives the impression that the Harvard team’s guidance may have been non-specific (“This is too slow, please speed up wherever you can”). Was that the case? Did you just dig in then and start methodically going through things using your own judgment? Or did the team provide an itemized list of concerns? If it’s not inappropriate to ask please, about how many hours of work went into each of these stages, to the extent they were distinct: problem assessment, code improvements, and QA validation?

I believe your answers will be helpful to the Matlab community at large. It’s always good to have benchmark impressions in mind. In this case, what can a world-class expert accomplish? Thanks in advance.

Yair Altman June 18, 2015 at 03:01 Reply

@Brad – thanks for your kind words, and for your very rich coffee – what a nice way to wake up in the morning 🙂

The program’s code size was not small (close to 80K source-code lines in ~600 files), so it took me some time to understand what’s going on and what’s relevant (only ~5K lines of code in perhaps a dozen files, it turns out). It then took me some time to figure out how to improve performance, and finally to fix all the code behind the numerous GUI controls so that they work correctly with the new graphics (i.e., now we only have a single large line handle rather than multiple separate handles, so wherever there is a reference to a particular line segment the code needed to be changed). Finally, testing took another hour or two. All-in all I’d say that I invested perhaps a full workday on this. I did get some specific guidance about things that were particularly bothersome for the team (both in terms of performance and functionality), so I focused mainly on those, but along the way I saw other parts that could be improved for speed or functionality and I improved them as well. There were also a few functional bugs that I discovered during my work, and I fixed them too. All these extra fixes took perhaps another half a day. So all in all this specific GUI took me about 15 workhours, but this included both functional and speedup improvements.

I think that it’s difficult to use these times as benchmarks, because as I said the code was large and complex, and I needed some time to get up to speed (pun intended) with it. It’s quite likely that programmers who create their program would be able to do things faster, since they’d already have all the fine details in their mind. Also, it takes some time (and perhaps a bit of nerve and self-confidence) to understand that another person’s code has a functional bug and should behave differently than the way it actually does.

Some performance-tuning tips | Undocumented Matlab | Stefan Grandl July 27, 2015 at 15:39 Reply

[…] Some performance-tuning tips | Undocumented Matlab Teilen auf:GoogleTwitterTumblr Dieser Eintrag wurde veröffentlicht in MATLAB und verschlagwortet […]
jinu August 22, 2015 at 04:05 Reply

nice writeup! about a year ago i moved to python/cython because of how slow matlab gui’s were with large datasets. i wish i had known some of your tricks back then. will definitely check out your new book. nice work again!

I/O: Loading data

Displaying data

Conclusions

Additional resources

Related posts:

4 Responses

Leave a Reply