Speeding-up builtin Matlab functions – part 1

April 29, 2018 7 Comments

A client recently asked me to assist with his numeric analysis function – it took 45 minutes to run, which was unacceptable (5000 runs of ~0.55 secs each). Accelerating MATLAB Performance The code had to run in 10 minutes or less to be useful. It turns out that 99% of the run-time was taken up by Matlab’s built-in fitdist function (part of the Statistics Toolbox), which my client was certain is already optimized for maximal performance. He therefore assumed that to get the necessary speedup he must either switch to another programming language (C/Java/Python), and/or upgrade his computer hardware at considerable expense, since parallelization was not feasible in his specific case.
Luckily, I was there to assist and was able to quickly speed-up the code down to 7 minutes, well below the required run-time. In today’s post I will show how I did this, which is relevant for a wide variety of other similar performance issues with Matlab. Many additional speed-up techniques can be found in other performance-related posts on this website, as well in my book Accelerating MATLAB Performance.

Profiling

The first step in speeding up any function is to profile its current run-time behavior using Matlab’s builtin Profiler tool, which can either be started from the Matlab Editor toolstrip (“Run and Time”) or via the profile function.
The profile report for the client’s function showed that 99% of the time was spent in the Statistics Toolbox’s fitdist function. Drilling into this function in the profiling report, we see onion-like functions that processed input parameters, ensured data validation etc. The core processing is done inside a class that is unique to each required distribution (e.g., prob.StableDistribution, prob.BetaDistribution etc.) that is invoked within fitdist using an feval call, based on the distribution name that was specified by the user.
In our specific case, the external onion layers of sanity checks were unnecessary and could be avoided. In general, I advise not to discard such checks, because you never know whether future uses might have a problem with outlier data or input parameters. Moreover, in the specific case of fitdist they take only a very minor portion of the run-time (this may be different in other cases, such as the ismember function that I described years ago, where the sanity checks have a significant run-time impact compared to the core processing in the internal ismembc function).
However, since we wanted to significantly improve the total run-time and this was spent within the distribution class (prob.StableDistribution in the case of my client), we continue to drill-down into this class to determine what can be done.
It turns out that prob.StableDistribution basically does 3 things in its main fit() method:

pre-process the input data (prob.ToolboxFittableParametricDistribution.processFitArgs() and .removeCensoring() methods) – this turned out to be unnecessary in my client’s data, but has little run-time impact.
call stablefit() in order to get fitting parameters – this took about half the run-time
call stablelike() in order to get likelihood data – this took about half the run-time as well
call prob.StableDistribution.makeFitted() to create a probability-distribution object returned to the caller – this also took little run-time that was not worth bothering about.

The speed-up improvement process

With user-created code we could simply modify our code in-place. However, a more careful process is necessary when modifying built-in Matlab functions (either in the core Matlab distribution or in one of the toolboxes).
The basic idea here is to create a side-function that would replicate the core processing of fitdist. This is preferable to modifying Matlab’s installation files because we could then reuse the new function in multiple computers, without having to mess around in the Matlab installation (which may not even be possible if we do not have admin privileges). Also, if we ever upgrade our Matlab we won’t need to remember to update the installed files (and obviously retest).
I called the new function fitdist2 and inside it I initially placed only the very core functionality of prob.StableDistribution.fit():

% fitdist2 - fast replacement for fitdist(data,'stable')
% equivalent to pd = prob.StableDistribution.fit(data);
function pd = fitdist2(data)
    % Bypass pre-processing (not very important)
    [cens,freq,opt] = deal([]);
    %[data,cens,freq,opt] = prob.ToolboxFittableParametricDistribution.processFitArgs(data);
    %data = prob.ToolboxFittableParametricDistribution.removeCensoring(data,cens,freq,'stable');
    % Main processing in stablefit(), stablelike()
    params = stablefit(data,0.05,opt);
    [nll,cov] = stablelike(params,data);
    % Combine results into the returned probability distribution object
    pd = prob.StableDistribution.makeFitted(params,nll,cov,data,cens,freq);
end

If we try to run this as-is, we’d see errors because stablefit() and stablelike() are both sub-functions within %matlabroot%/toolbox/stats/stats/+prob/StableDistribution.m. So we copy these sub-functions (and their dependent subfunctions infoMtxCal(), intMle(), tabpdf(), neglog_pdf(), stable_nloglf(), varTrans) to the bottom of our fitdist2.m file, about 400 lines in total.
We also remove places that call checkargs(…) since that seems to be unnecessary – if you want to keep it then add the checkargs() function as well.
Now we re-run our code, after each speedup iteration verifying that the returned pd object returned by our fitdist2 is equivalent to the original object returned by fitdist.

Speeding-up stablefit()

A new profiling run shows that the vast majority of the time in stablefit() is spent in two lines:

s = load('private/StablePdfTable.mat');
[parmhat,~,err,output] = fminsearch(@(params)stable_nloglf(x,params),phi0,options);

The first of these lines is reloading a static data file. The very same static data file is later reloaded in stablelike(). Both of these data-loads is done in every single invocation of fitdist, so if we have 5000 data fits, we load the same static data file 10,000 times! This is certainly not indicative of good programming practice. It would be much faster to reload the static data once into memory, and then use this cached data for the next 9,999 invocation. Since the original authors of StableDistribution.m seem to like single-character global variables (another bad programming practice, for multiple reasons), we’ll follow their example (added lines are highlighted):

persistent s  % this should have a more meaningful name (but at least is not global...)!if isempty(s)    fit_path = fileparts(which('fitdist'));    s = load([fit_path '/private/StablePdfTable.mat']);
    a = s.a;
    b = s.b;
    xgd = s.xgd;
    p = s.p;
end

In order to speed-up the second line (that calls fminsearch), we can reduce the tolerances used by this function, by updating the options structure passed to it, so that we use tolerances of 1e-3 rather than the default 1e-6 (in our specific case this resulted in acceptable errors of ~0.1%). Specifically, we modify the code from this:

function [parmhat,parmci] = stablefit(x,alpha,options)
...
if nargin < 3 || isempty(options)
    options = statset('stablefit');
else
    options = statset(statset('stablefit'),options);
end
% Maximize the log-likelihood with respect to the transformed parameters
[parmhat,~,err,output] = fminsearch(@(params)stable_nloglf(x,params),phi0,options);
...
end

to this (note that the line that actually calls fminsearch remains unchanged):

function [parmhat,parmci] = stablefit(x,alpha,unused_options)...
persistent optionsif isempty(options)    options = statset('stablefit');
    options.TolX   = 1e-3;    options.TolFun = 1e-3;    options.TolBnd = 1e-3;end
% Maximize the log-likelihood with respect to the transformed parameters
[parmhat,~,err,output] = fminsearch(@(params)stable_nloglf(x,params),phi0,options);
...
end

The fminsearch internally calls tabpdf() repeatedly. Drilling down in the profiling report we see that it recomputes a griddedInterpolant object that is essentially the same for all iterations (and therefore a prime candidate for caching), and also that it uses the costly cubic interpolation rather than a more straight-forward linear interpolation:

function y = tabpdf(x,alpha,beta)
...
persistent G  % this should have a more meaningful name (but at least is not global...)!if isempty(G)    G = griddedInterpolant({b, a, xgd}, p, 'linear','none');  % 'linear' instead of 'cubic'end%G = griddedInterpolant({b, a, xgd}, p, 'cubic','none');  % original
y = G({beta,alpha,x});
...

These cases illustrate two important speedup technique categories: caching data in order to reduce the number of times that a certain code hotspot is being run, and modifying the parameters/inputs in order to reduce the individual run-time of the hotspot code. Variations of these techniques form the essence of effective speedup and can often be achieved by just reflecting on the problem and asking yourself two questions:

can I reduce the number of times that this code is being run? and
can I reduce the run-time of this specific code section?

Additional important speed-up categories include parallelization, vectorization and algorithmic modification. These are sometimes more difficult programmatically and riskier in terms of functional equivalence, but may be required in case the two main techniques above are not feasible. Of course, we can always combine these techniques, we don’t need to choose just one or the other.

Speeding-up stablelike()

We now turn our attentions to stablelike(). As for the loaded static file, we could simply use the cached s to load the data in order to avoid repeated reloading of the data from disk. But it turns out that this data is actually not used at all inside the function (!) so we just comment-out the old code:

%s = load('private/StablePdfTable.mat');
%a = s.a;
%b = s.b;
%xgd = s.xgd;
%p = s.p;

Think about this – the builtin Matlab code loads a data file from disk and then does absolutely nothing with it – what a waste!
Another important change is to reduce the run-time of the integral function, which is called thousands of times within a double loop. We do this by reducing the tolerances specified in the integral call from 1e-6:

F(i,j) = integral(@(x)infoMtxCal(x,params,step,i,j),-Inf,Inf,'AbsTol',1e-6,'RelTol',1e-4); % original
F(i,j) = integral(@(x)infoMtxCal(x,params,step,i,j),-Inf,Inf,'AbsTol',1e-3,'RelTol',1e-3); % new

You can see that once again these two cases follow the two techniques that I mentioned above: we reduced the number of times that we load the data file (to 0 in our case), and we improved the run-time of the individual integral calculation by reducing its tolerances.

Conclusions

The final result of applying the techniques above was a 6-fold speedup, reducing the total run-time from 45 minutes down to 7 minutes. I could probably have improved the run-time even further, but since we reached our target run-time I stopped there. The point after all was to make the code usable, not to reach a world speed record.
In my next article I will present another example of dramatically improving the run-time speed of a built-in Matlab function, this time a very commonly-used function in the Financial Toolbox that I was able to speed-up by a factor of 40.
Matlab releases improve continuously, so hopefully my techniques above (or alternatives) would find their way into the builtin Matlab functions, making them faster than today, out-of-the-box.
Until this happens, we should not lose hope when faced with a slow Matlab function, even if it is a built-in/internal one, as I hope to have clearly illustrated today, and will also show in my next article. Improving the performance is often easy. In fact, it took me much longer to write this article than it was to improve my client’s code…
Let me know if you’d like me to assist with your Matlab project, either developing it from scratch or improving your existing code, or just training you in how to improve your Matlab code’s run-time/robustness/usability/appearance. I will be visiting Boston and New York in early June and would be happy to meet you in person to discuss your specific needs.

7 Responses

Hanan Kavitz April 30, 2018 at 10:09 Reply

Hi Yair,

Loved this post.
Here’s another caching example I recently implemented:
It turns out we use verLessThan function alot in our code base. I discovered that some of our UI’s call this function many times over and over until the UI is finally visible. It was one of the bottle necks for fast and UX friendly UI’s.
Since the version of a toolbox can’t just change in the middle of MATLAB process there is no need recalculating it over and over again.
I created verLessThan_fast function that caches the results of verLessThan by toolbox and version number first time this function is called and returns cached data on every consecutive call.
I got a very significant UX improvement.
Cris Luengo April 30, 2018 at 19:07 Reply

`global s` and `global options`? Yes, that’s not going to give any problems at all.

Why not use `persistent` to cache data?

Yair Altman April 30, 2018 at 19:23 Reply

@Cris – you are absolutely right that persistent is better for caching than global in most cases; and in the rare cases where global vars are used, they should be named appropriately, with meaningful names that are longer than a single character. In the code I wrote for my client this is indeed the case – I just simplified it in this article, following the [very bad] coding conventions that the original MathWorker developer used in these functions. In hindsight this was a bad decision because it is misleading (as your comment has shown), so I have now updated the text accordingly. Thanks for pointing this out.

Sam Roberts May 1, 2018 at 13:55 Reply

> single-character global variables

There’s really no excuse for this. Quite apart from the performance issues that you bring out, there will be newbie programmers all over the place that will be using global a in their code. This is going to clobber it, they’ll have no idea what happened, and it will be very hard for them to figure it out – especially if they trust MathWorks code not to do bad things like that. And if/when they do figure it out, they’re going to be pretty angry with MathWorks. At the very least, even if the developer felt a genuine need to use a global, they could have named the variable something like _internal_param_a to reduce the likelihood of a name clash.
Michelle Hirsch May 1, 2018 at 19:47 Reply

Thanks for this post, Yair. I’ve passed along the issues to the development team.
Gregor Lehmiller May 3, 2018 at 01:31 Reply

Unfortunately, while it has added many features, the Matlab stats/machine-learning toolbox has been going backwards for a number of versions. In particular, the obsession with making everything a class has made code transparency and algorithm inspection a nightmare. The related attempt to make classes have a jack-of-all-trades capability has slowed things down even more.

In our applications, we are now frequently either writing our own statistical/numerical routines (usually in C-mex) or borrowing standard algorithms. At some point we may abandon ship to a different platform – for example, just stay in C/C++ with a Python overlay.

This has been very disappointing development in Matlab evolution, especially for those of us for which speed is critical.
Matlabler May 17, 2019 at 15:00 Reply

Great tip, thank you! I speeded up the rangesearch function this way. I just replaced it with pdist2(X,Y,distance,’RADIUS’,radius); and now I save 93% of the time it takes before. So now I only need about 4 hours for my calculation, before it took over 36 hours!!

Thanks!

HTML tags such as <b> or <i> are accepted.
Wrap code fragments inside <pre lang="matlab"> tags, like this:

<pre lang="matlab">
a = magic(3);
disp(sum(a))
</pre>

I reserve the right to edit/delete comments (read the site policies).
Not all comments will be answered. You can always email me (altmany at gmail) for private consulting.

Click here to cancel reply.