Undocumented Matlab

Speeding-up builtin Matlab functions – part 3

April 6, 2020 4 Comments

A recurring theme in this website is that despite a common misperception, builtin Matlab functions are typically coded for maximal accuracy and correctness, but not necessarily best run-time performance. Despite this, we can often identify and fix the hotspots in these functions and use a modified faster variant in our code. I have shown multiple examples for this in various posts (example1, example2, many others).

Today I will show another example, this time speeding up the mvksdensity (multi-variate kernel probability density estimate) function, part of the Statistics toolbox since R2016a. You will need Matlab R2016a or newer with the Stats Toolbox to recreate my results, but the general methodology and conclusions hold well for numerous other builtin Matlab functions that may be slowing down your Matlab program. In my specific problem, this function was used to compute the probability density-function (PDF) over a 1024×1024 data mesh.

The builtin mvksdensity function took 76 seconds to run on my machine; I got this down to 13 seconds, a 6x speedup, without compromising accuracy. Here’s how I did this:

Preparing the work files

While we could in theory modify Matlab’s installed m-files if we have administrator privileges, doing this is not a good idea for several reasons. Instead, we should copy and rename the relevant internal files to our work folder, and only modify our local copies.

To see where the builtin files are located, we can use the which function:

>> which('mvksdensity')
C:\Program Files\Matlab\R2020a\toolbox\stats\stats\mvksdensity.m


In our case, we copy \toolbox\stats\stats\mvksdensity.m as mvksdensity_.m to our work folder, replace the function name at the top of the file from mvksdensity to mvksdensity_, and modify our code to call mvksdensity_ rather than mvksdensity.

If we run our code, we get an error telling us that Matlab can’t find the statkscompute function (in line #107 of our mvksdensity_.m). So we find statkscompute.m in the \toolbox\stats\stats\private\ folder, copy it as statkscompute_.m to our work folder, rename its function name (at the top of the file) to statkscompute_, and modify our mvksdensity_.m to call statkscompute_ rather than statkscompute:

[fout,xout,u] = statkscompute_(ftype,xi,xispecified,npoints,u,L,U,weight,cutoff,...


We now repeat the process over and over, until we have copied all the internal files that the program needs in order to run. In our case, it turns out that in addition to mvksdensity.m and statkscompute.m, we also need to copy statkskernelinfo.m.

Finally, we check that the numeric results using the copied files are exactly the same as from the builtin method, just to be on the safe side that we have not left out some forgotten internal file.
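One way to run this sanity check is with a small comparison script. The sketch below is only an illustration: refFcn and newFcn are hypothetical stand-ins (simple anonymous functions), chosen so that the snippet runs without the Statistics toolbox. In practice you would point them at the builtin function and at your local copy, using your own inputs.

```matlab
% Hypothetical stand-ins: replace these with the builtin function and the local copy
refFcn = @(x) sum(x.^2, 2);     % plays the role of the builtin (e.g. mvksdensity)
newFcn = @(x) sum(x.*x, 2);     % plays the role of the copied file (e.g. mvksdensity_)

x = rand(1000, 2);              % any representative input data

refOut = refFcn(x);
newOut = newFcn(x);

% isequal checks for bit-identical results; the max difference helps when they are not
isExact    = isequal(refOut, newOut);
maxAbsDiff = max(abs(refOut(:) - newOut(:)));
fprintf('identical: %d, max abs difference: %g\n', isExact, maxAbsDiff);
```

For the case in this post, the two handles could be something like @(pts) mvksdensity(data, pts, 'Bandwidth', bw) and @(pts) mvksdensity_(data, pts, 'Bandwidth', bw), where data, pts and bw are your own variables. Re-running the same check after every optimization step makes it easy to catch an accidental change in the results.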

Now that we have copied these 3 files, in practice all our attention will be focused on the dokernel sub-function inside statkscompute_.m, since the profiling report (below) indicates that this is where nearly all of the run-time is spent.

Identifying the hotspots

Now we run the code through the Matlab Profiler, using the “Run and Time” button in the Matlab Editor, or profile on/report in the Matlab console (Command Window). The results show that 99.8% of mvksdensity's time was spent in the internal dokernel function, 75% of which was spent in self-time (meaning code lines within dokernel):
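For readers who prefer the console route, here is a minimal sketch of a programmatic profiling session. The profiled workload (repeated calls to median) is just a placeholder that I assume is implemented as an m-file in your release; replace the loop with a call to your own code.

```matlab
profile on
for iter = 1:200
    medianValue = median(rand(1000, 1));   % placeholder workload: replace with your own code
end
profile off

% Fetch the collected data as a struct, instead of (or in addition to) the GUI report
p  = profile('info');
ft = p.FunctionTable;

% List the (up to) 5 functions with the largest total time
if ~isempty(ft)
    [~, order] = sort([ft.TotalTime], 'descend');
    for idx = order(1:min(5, numel(order)))
        fprintf('%-40s %8.3f sec  (%d calls)\n', ...
                ft(idx).FunctionName, ft(idx).TotalTime, ft(idx).NumCalls);
    end
end
```

Running profile viewer (or profile report) afterwards opens the same interactive report that the “Run and Time” button displays, which is the easier way to drill down into individual code lines.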

Initial profiling results - pretty slow...

Let’s drill into dokernel and see where the problems are:

Initial dokernel profiling results

Evaluating the normal kernel distribution

We can immediately see from the profiling results that a single line (#386) in statkscompute_.m is responsible for nearly 40% of the total run-time:

fk = feval(kernel,z);


In this case, kernel is a function handle to the normal-distribution function in \stats\private\statkskernelinfo>normal, which is evaluated 1,488,094 times. Using feval incurs an overhead, as can be seen by the difference in run-times: line #386 takes 29.55 secs, whereas the normal function evaluations only take 18.53 secs. In fact, if you drill into the normal function in the profiling report, you’ll see that the actual code line that computes the normal distribution only takes 8-9 seconds – all the rest (~20 secs, or ~30% of the total) is totally redundant function-call overhead. Let’s try to remove this overhead by calling the kernel function directly:

fk = kernel(z);


Now that we have a local copy of statkscompute_.m, we can safely modify the dokernel sub-function, specifically line #386 as explained above. It turns out that just bypassing the feval call and using the function-handle directly does not improve the run-time (decrease the function-call overhead) significantly, at least on recent Matlab releases (it has a greater effect on old Matlab releases, but that’s a side-issue).
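The relative cost of the different call styles is easy to measure on your own Matlab release. The following sketch uses timeit with an anonymous function as a stand-in for the toolbox kernel (an assumption: the real kernel is a handle to a sub-function in another file, so its call overhead may differ). Note that timeit itself invokes each candidate via an anonymous wrapper, so all three measurements include one extra function-handle call.

```matlab
% Stand-in for the normal kernel (the toolbox uses a handle to a sub-function instead)
kernel = @(z) exp(-0.5 * (z.*z)) ./ sqrt(2*pi);
z = randn(200, 1);   % a small vector, so that call overhead is noticeable

tFeval  = timeit(@() feval(kernel, z));                    % call via feval
tHandle = timeit(@() kernel(z));                           % call the handle directly
tInline = timeit(@() exp(-0.5 * (z.*z)) ./ sqrt(2*pi));    % inlined expression

fprintf('feval:  %.2f usec\nhandle: %.2f usec\ninline: %.2f usec\n', ...
        tFeval*1e6, tHandle*1e6, tInline*1e6);

% All three variants perform the same computation, so the results should be identical
fk1 = feval(kernel, z);
fk2 = kernel(z);
fk3 = exp(-0.5 * (z.*z)) ./ sqrt(2*pi);
sameResults = isequal(fk1, fk2) && isequal(fk2, fk3);
```

The absolute numbers depend heavily on the Matlab release and the vector length, so treat them as a rough indication only.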

We now recognize that the program only evaluates the normal-distribution kernel, which is the default kernel. So let’s handle this special case by inlining the kernel’s one-line code (from statkskernelinfo_.m) directly (note how we move the condition outside of the loop, so that it doesn’t get recomputed 1 million times):

...
isKernelNormal = strcmp(char(kernel),'normal');  % line #357
for i = 1:m
    Idx = true(n,1);
    cdfIdx = true(n,1);
    cdfIdx_allBelow = true(n,1);
    for j = 1:d
        dist = txi(i,j) - ty(:,j);
        currentIdx = abs(dist) <= halfwidth(j);
        Idx = currentIdx & Idx; % pdf boundary
        if iscdf
            currentCdfIdx = dist >= -halfwidth(j);
            cdfIdx = currentCdfIdx & cdfIdx; %cdf boundary1, equal or below the query point in all dimension
            currentCdfIdx_below = dist - halfwidth(j) > 0;
            cdfIdx_allBelow = currentCdfIdx_below & cdfIdx_allBelow; %cdf boundary2, below the pdf lower boundary in all dimension
        end
    end
    if ~iscdf
        nearby = index(Idx);
    else
        nearby = index((Idx|cdfIdx)&(~cdfIdx_allBelow));
    end
    if ~isempty(nearby)
        ftemp = ones(length(nearby),1);
        for k =1:d
            z = (txi(i,k) - ty(nearby,k))./u(k);
            if reflectionPDF
                zleft  = (txi(i,k) + ty(nearby,k)-2*L(k))./u(k);
                zright = (txi(i,k) + ty(nearby,k)-2*U(k))./u(k);
                fk = kernel(z) + kernel(zleft) + kernel(zright);  % old: =feval()+...
            elseif isKernelNormal
                fk = exp(-0.5 * (z.*z)) ./ sqrt(2*pi);
            else
                fk = kernel(z);  %old: =feval(kernel,z);
            end
            if needUntransform(k)
                fk = untransform_f(fk,L(k),U(k),xi(i,k));
            end
            ftemp = ftemp.*fk;
        end
        f(i) = weight(nearby) * ftemp;
    end
    if iscdf && any(cdfIdx_allBelow)
        f(i) = f(i) + sum(weight(cdfIdx_allBelow));
    end
end
...

This reduced the kernel evaluation run-time from ~30 secs down to 8-9 secs. Not only did we remove the direct function-call overhead, but also the overheads associated with calling a sub-function in a different m-file. The total run-time is now down to 45-55 seconds (expect some fluctuations from run to run). Not a bad start.

Main loop – bottom part

Now let’s take a fresh look at the profiling report, and focus separately on the bottom and top parts of the main loop, which you can see above. We start with the bottom part, since we already messed with it in our fix to the kernel evaluation:

Profiling results for bottom part of the main loop

The first thing we note is that there’s an inner loop that runs d=2 times (d is set in line #127 of mvksdensity_.m – it is the input mesh’s dimensionality, and also the number of columns in the txi data matrix). We can easily vectorize this inner loop, but we take care to do this only for the special case of d==2 and when some other special conditions occur.

In addition, we hoist outside of the main loop anything that we can (such as the constant exponential power, and the weight multiplication when it is constant [which is typical]), so that they are only computed once instead of 1 million times:

...
isKernelNormal = strcmp(char(kernel),'normal');
anyNeedTransform = any(needUntransform);
uniqueWeights = unique(weight);
isSingleWeight = ~iscdf && numel(uniqueWeights)==1;
isSpecialCase1 = isKernelNormal && ~reflectionPDF && ~anyNeedTransform && d==2;
expFactor = -0.5 ./ (u.*u)';
TWO_PI = 2*pi;
for i = 1:m
    ...
    if ~isempty(nearby)
        if isSpecialCase1
            z = txi(i,:) - ty(nearby,:);
            ftemp = exp((z.*z) * expFactor);
        else
            ftemp = 1;  % no need for the slow ones()
            for k = 1:d
                z = (txi(i,k) - ty(nearby,k)) ./ u(k);
                if reflectionPDF
                    zleft  = (txi(i,k) + ty(nearby,k)-2*L(k)) ./ u(k);
                    zright = (txi(i,k) + ty(nearby,k)-2*U(k)) ./ u(k);
                    fk = kernel(z) + kernel(zleft) + kernel(zright);  % old: =feval()+...
                elseif isKernelNormal
                    fk = exp(-0.5 * (z.*z)) ./ sqrt(TWO_PI);
                else
                    fk = kernel(z);  % old: =feval(kernel,z)
                end
                if needUntransform(k)
                    fk = untransform_f(fk,L(k),U(k),xi(i,k));
                end
                ftemp = ftemp.*fk;
            end
            ftemp = ftemp * TWO_PI;
        end
        if isSingleWeight
            f(i) = sum(ftemp);
        else
            f(i) = weight(nearby) * ftemp;
        end
    end
    if iscdf && any(cdfIdx_allBelow)
        f(i) = f(i) + sum(weight(cdfIdx_allBelow));
    end
end
if isSingleWeight
    f = f * uniqueWeights;
end
if isKernelNormal && ~reflectionPDF
    f = f ./ TWO_PI;
end
...

This brings the run-time down to 31-32 secs. Not bad at all, but we can still do much better:
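The vectorization of the inner loop relies on the fact that a product of per-dimension normal kernels equals a single exponential of the summed squared distances, divided by 2*pi when d==2. The following self-contained sketch checks this identity numerically. The data, query point and bandwidth values are made-up stand-ins, and the implicit expansion in txi - ty assumes Matlab R2016b or newer.

```matlab
rng(0);                       % reproducible made-up data
nNearby = 50;  d = 2;
ty  = rand(nNearby, d);       % stand-in for the nearby data points
txi = rand(1, d);             % stand-in for a single query point (one row of the mesh)
u   = [0.1, 0.2];             % stand-in bandwidths: row vector, one value per dimension

% Loop variant: multiply the per-dimension normal kernel values
ftempLoop = ones(nNearby, 1);
for k = 1:d
    z = (txi(k) - ty(:,k)) ./ u(k);
    ftempLoop = ftempLoop .* (exp(-0.5 * (z.*z)) ./ sqrt(2*pi));
end

% Vectorized variant: a single exp() over both dimensions
expFactor = -0.5 ./ (u.*u)';                   % d-by-1 column, computed once
z = txi - ty;                                  % nNearby-by-d (implicit expansion)
ftempVec = exp((z.*z) * expFactor) ./ (2*pi);  % the 2*pi division can be deferred to the end

maxRelDiff = max(abs(ftempVec - ftempLoop) ./ ftempLoop);
fprintf('max relative difference: %g\n', maxRelDiff);
```

The matrix product (z.*z) * expFactor sums the scaled squared distances over the two columns, which is why only one exp() call is needed per mesh point. The two variants agree up to floating-point round-off rather than bit-for-bit, since the operations are performed in a different order.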

Main loop – top part

Now let’s take a look at the profiling report’s top part of the main loop:

Profiling results for top part of the main loop

Again we note that there’s an inner loop that runs d=2 times, which we can again easily vectorize. In addition, we note the unnecessary repeated initializations of the true(n,1) vector, which can easily be hoisted outside the loop:

...
TRUE_N = true(n,1);
isSpecialCase2 = ~iscdf && d==2;
for i = 1:m
    if isSpecialCase2
        dist = txi(i,:) - ty;
        currentIdx = abs(dist) <= halfwidth;
        currentIdx = currentIdx(:,1) & currentIdx(:,2);
        nearby = index(currentIdx);
    else
        Idx = TRUE_N;
        cdfIdx = TRUE_N;
        cdfIdx_allBelow = TRUE_N;
        for j = 1:d
            dist = txi(i,j) - ty(:,j);
            currentIdx = abs(dist) <= halfwidth(j);
            Idx = currentIdx & Idx; % pdf boundary
            if iscdf
                currentCdfIdx = dist >= -halfwidth(j);
                cdfIdx = currentCdfIdx & cdfIdx; % cdf boundary1, equal or below the query point in all dimension
                currentCdfIdx_below = dist - halfwidth(j) > 0;
                cdfIdx_allBelow = currentCdfIdx_below & cdfIdx_allBelow; %cdf boundary2, below the pdf lower boundary in all dimension
            end
        end
        if ~iscdf
            nearby = index(Idx);
        else
            nearby = index((Idx|cdfIdx)&(~cdfIdx_allBelow));
        end
    end
    if ~isempty(nearby)
        ...

This brings the run-time down to 24 seconds.

We next note that instead of using numeric indexes to compute the nearby vector, we could use faster logical indexes:

...
%index = (1:n)';  % this is no longer needed
TRUE_N = true(n,1);
isSpecialCase2 = ~iscdf && d==2;
for i = 1:m
    if isSpecialCase2
        dist = txi(i,:) - ty;
        currentIdx = abs(dist) <= halfwidth;
        nearby = currentIdx(:,1) & currentIdx(:,2);
    else
        Idx = TRUE_N;
        cdfIdx = TRUE_N;
        cdfIdx_allBelow = TRUE_N;
        for j = 1:d
            dist = txi(i,j) - ty(:,j);
            currentIdx = abs(dist) <= halfwidth(j);
            Idx = currentIdx & Idx; % pdf boundary
            if iscdf
                currentCdfIdx = dist >= -halfwidth(j);
                cdfIdx = currentCdfIdx & cdfIdx; % cdf boundary1, equal or below the query point in all dimension
                currentCdfIdx_below = dist - halfwidth(j) > 0;
                cdfIdx_allBelow = currentCdfIdx_below & cdfIdx_allBelow; %cdf boundary2, below the pdf lower boundary in all dimension
            end
        end
        if ~iscdf
            nearby = Idx;  % not index(Idx)
        else
            nearby = (Idx|cdfIdx) & ~cdfIdx_allBelow;  % no index()
        end
    end
    if any(nearby)
        ...

This brings the run-time down to 20 seconds.
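To illustrate why the two indexing styles are interchangeable, here is a small sketch with made-up data. The mask condition and the weighted sum are hypothetical stand-ins for the “closeness” test and for the weight(nearby) * ftemp computation in dokernel.

```matlab
rng(0);
n      = 1e5;
ty     = rand(n, 2);          % made-up data points
weight = ones(1, n) / n;      % row vector of weights, as in weight(nearby) * ftemp

mask  = ty(:,1) < 0.1 & ty(:,2) < 0.1;   % logical "nearby" vector (n-by-1)
index = (1:n)';
numericIdx = index(mask);                % the numeric indexes that the original code used

% Both indexing styles select the same elements, in the same order
resultNumeric = weight(numericIdx) * ty(numericIdx, 1);
resultLogical = weight(mask)       * ty(mask, 1);
absDiff = abs(resultNumeric - resultLogical);

% The emptiness checks are also equivalent
sameEmptyCheck = (~isempty(numericIdx) == any(mask));

fprintf('difference: %g, same emptiness check: %d\n', absDiff, sameEmptyCheck);
```

With the logical mask we skip the creation of the intermediate index(mask) vector in each of the loop iterations, which is presumably where much of the saving comes from. Note that the emptiness test must change from ~isempty(nearby) to any(nearby), since a logical vector of n elements is never empty.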

We now note that the main loop runs m=1,048,576 (=1024×1024) times over all rows of txi. This is expected, since the loop runs over all the elements of a 1024×1024 mesh grid, which are reshaped as a 1,048,576-element column array at some earlier point in the processing, resulting in an m-by-d matrix (1,048,576-by-2 in our specific case). This information helps us, because we know that there are only 1024 unique values in each of the two columns of txi. Therefore, instead of computing the “closeness” metric (which leads to the nearby vector) for all 1,048,576 x 2 values of txi, we calculate separate vectors for each of the 1024 unique values in each of its 2 columns, and then merge the results inside the loop:

...
isSpecialCase2 = ~iscdf && d==2;
if isSpecialCase2
    [unique1Vals, ~, unique1Idx] = unique(txi(:,1));
    [unique2Vals, ~, unique2Idx] = unique(txi(:,2));
    dist1 = unique1Vals' - ty(:,1);
    dist2 = unique2Vals' - ty(:,2);
    currentIdx1 = abs(dist1) <= halfwidth(1);
    currentIdx2 = abs(dist2) <= halfwidth(2);
end
for i = 1:m
    if isSpecialCase2
        idx1 = unique1Idx(i);
        idx2 = unique2Idx(i);
        nearby = currentIdx1(:,idx1) & currentIdx2(:,idx2);
    else
        ...

This brings the run-time down to 13 seconds, a total speedup of almost 6x compared to the original version. Not bad at all.
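The precomputation trick can be checked in isolation on a tiny mesh. In the sketch below, the 8×8 grid, the random data points and the halfwidth values are all made-up stand-ins; the loop compares the precomputed lookup with the direct per-point computation of the previous step. As before, implicit expansion (R2016b or newer) is assumed.

```matlab
rng(0);
[gx, gy] = meshgrid(linspace(0, 1, 8));   % tiny 8x8 mesh instead of 1024x1024
txi = [gx(:), gy(:)];                     % 64-by-2 query points, only 8 unique values per column
ty  = rand(100, 2);                       % made-up data points
halfwidth = [0.2, 0.3];                   % made-up kernel cutoff per dimension

% Precompute the closeness test once per unique mesh value (8 columns instead of 64 rows)
[unique1Vals, ~, unique1Idx] = unique(txi(:,1));
[unique2Vals, ~, unique2Idx] = unique(txi(:,2));
currentIdx1 = abs(unique1Vals' - ty(:,1)) <= halfwidth(1);   % 100-by-8 logical
currentIdx2 = abs(unique2Vals' - ty(:,2)) <= halfwidth(2);   % 100-by-8 logical

allMatch = true;
for i = 1:size(txi, 1)
    % Fast variant: merge two precomputed logical columns
    nearbyFast = currentIdx1(:, unique1Idx(i)) & currentIdx2(:, unique2Idx(i));

    % Reference variant: recompute the closeness test for this mesh point
    currentIdx = abs(txi(i,:) - ty) <= halfwidth;
    nearbyRef  = currentIdx(:,1) & currentIdx(:,2);

    allMatch = allMatch && isequal(nearbyFast, nearbyRef);
end
fprintf('precomputed lookup matches direct computation: %d\n', allMatch);
```

The memory cost is two n-by-1024 logical matrices in the full-size problem, so for very large data sets this trade-off between memory and speed should be checked before adopting the approach.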

For reference, here’s a profiling summary of the dokernel function again, showing the updated performance hotspots:

Profiling results after optimization

The 2 vectorized code lines in the bottom part of the main loop now account for 72% of the remaining run-time:

    ...
    if ~isempty(nearby)
        if isSpecialCase1
            z = txi(i,:) - ty(nearby,:);
            ftemp = exp((z.*z) * expFactor);
        else
            ...

If I had the inclination, speeding up these two code lines would be the next logical step, but I stop at this point. Interested readers could pick up this challenge and post a solution in the comments section below. I haven’t tried it myself, so perhaps there’s no easy way to improve this. Then again, perhaps the answer is just around the corner – if you don’t try, you’ll never know…

Data density/resolution

So far, none of the optimizations that I made have affected code accuracy, generality or resolution. This is always the best approach if you have some spare coding time on your hands.

In some cases, we might understand our problem domain well enough to sacrifice a bit of accuracy in return for a run-time speedup. In our case, we identify the main loop over 1024×1024 elements as the deciding factor in the run-time. If we reduce the grid size by 50% in each dimension (i.e. 512×512), the run-time decreases by an additional factor of almost 4, down to ~3.5 seconds, which is what we would expect since the main loop now runs over 4 times fewer elements. While this reduces the resolution/accuracy of the results, we got a 4x speedup in a fraction of the time that it took to make all the coding changes above.
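As a rough sketch of what the coarser mesh looks like in code (the [0,1] axis range is a made-up assumption, so use your own data limits), we only need to change the number of points per axis when building the query points:

```matlab
nFine   = 1024;   % original number of mesh points per dimension
nCoarse = 512;    % reduced resolution: 50% in each dimension

% Build the coarse query points as an m-by-2 matrix, one row per mesh element
[x1, x2] = meshgrid(linspace(0, 1, nCoarse));
pts = [x1(:), x2(:)];

% The main loop runs once per row, so its iteration count shrinks by this factor
loopReduction = nFine^2 / size(pts, 1);
fprintf('%d mesh points instead of %d: %gx fewer loop iterations\n', ...
        size(pts, 1), nFine^2, loopReduction);
```

The resulting pts matrix is what would be passed as the evaluation points to the density function. The density values are then only available at the coarser resolution, which is the price for the speedup.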

Different situations may require different approaches: in some cases we cannot sacrifice accuracy/resolution, and must spend time to improve the algorithm implementation; in other cases coding time is at a premium and we can sacrifice accuracy/resolution; and in other cases still, we could use a combination of both approaches.

Conclusions

Matlab is composed of thousands of internal functions. Each and every one of these functions was meticulously developed and tested by engineers, who are after all only human. Whereas supreme emphasis is always placed on the accuracy of Matlab functions, run-time performance often takes a back seat. Make no mistake about this: code accuracy is almost always more important than speed, so I’m not complaining about the current state of affairs.

But when we run into a specific run-time problem in our Matlab program, we should not despair if we see that built-in functions cause slowdown. We can try to avoid calling those functions (for example, by reducing the number of invocations, or decreasing the data resolution, or limiting the target accuracy, etc.), or we could optimize these functions in our own local copy, as I have shown today. There are multiple techniques that we could employ to improve the run time. Just use the profiler and keep an open mind about alternative speed-up mechanisms, and you’d be half-way there. For ideas about the multitude of different speedup techniques that you could use in Matlab, see my book Accelerating Matlab Performance.

Let me know if you’d like me to assist with your Matlab project, either developing it from scratch or improving your existing code, or just training you in how to improve your Matlab code’s run-time/robustness/usability/appearance.

In the meantime, Happy Easter/Passover everyone, and stay healthy!

4 Responses
  1. Michelle Hirsch April 7, 2020 at 22:58

    Thanks Yair, as always, for the thoughtful post. I’ve forwarded it to the folks from both the Statistics team and the Performance team for their input. Happy Pesach!

  2. Loren Shure April 10, 2020 at 15:48

    Hi Yair-

    Very nice example showing people how to delve in – plus a nice heads up for us to always go back and look at our shipping code. Probably feval was the only option for evaluation at the time the functionality was first introduced. But I’m only guessing.

    Hag Sameach and take care!
    –loren

  3. Adam Langer June 17, 2020 at 00:41

    Hi Yair,

    I had no doubt you would find other ways to keep busy if you were not heading to the States this spring.

    It will only make a small difference since in your example this line of code is not called as often, but in “Main Loop – Bottom Part” you can eliminate the repetitive call to sqrt() in the inner for loop by precalculating it:

    TWO_PI = 2*pi;
    SQRT_TWO_PI = sqrt(TWO_PI); 
    ...
                    elseif isKernelNormal
                        fk = exp(-0.5 * (z.*z)) ./ SQRT_TWO_PI;


    If I understand your final profile viewer screenshot showing the 13 seconds, the non-vectorized branch under if any(nearby) gets called around 300K times, which looks like 300K calls to sqrt(). This speedup won’t show up in your profiler results (that line is even in the top results now), but it’s another easy speedup some folks may overlook.

    In the common case I see that you cleverly avoid using sqrt() at all by instead dividing by TWO_PI at the end of the snippet.

    Stay healthy and safe.

  4. Jan June 26, 2020 at 00:55

    The sad thing is that only old functions can be delved into like that. Many new functions or classes are hidden in p-code encrypted files which is a bummer. Plus they are sealed (in case of classes), so it’s quite complicated to expand or improve their functionality/performance.

    This is one of the reasons we’re looking for alternatives like python or Julia.

    Don’t get me wrong, it’s not about the price, we’re happy to pay for functionality and documentation.
    But not if the trend is to make things more and more unaccessible

Undocumented Matlab © 2009 - Yair Altman
This website and Octahedron Ltd. are not affiliated with The MathWorks Inc.; MATLAB® is a registered trademark of The MathWorks Inc.