Toolbox – Undocumented Matlab

Speeding-up builtin Matlab functions – part 2

Yair Altman — Sun, 06 May 2018 16:26:19 +0000

Last week I showed how we can speed-up built-in Matlab functions, by creating local copies of the relevant m-files and then optimizing them for improved speed using a variety of techniques. Today I will show another example of such speed-up, this time of the Financial Toolbox’s maxdrawdown function, which is widely used to estimate the relative risk of a trading strategy or asset. One might think that such a basic indicator would be optimized for speed, but experience shows otherwise. In fact, this function turned out to be the main run-time performance hotspot for one of my clients. The vast majority of his code was optimized for speed, and he naturally assumed that the built-in Matlab functions were optimized as well, but this was not the case. Fortunately, I was able to create an equivalent version that was 30-40 times faster (!), and my client remains a loyal Matlab fan.

In today’s post I will show how I achieved this speed-up, using different methods than the ones I showed last week. A combination of these techniques can be used in a wide range of other Matlab functions. Additional speed-up techniques can be found in other performance-related posts on this website, as well in my book Accelerating MATLAB Performance.

Profiling

As I explained last week, the first step in speeding up any function is to profile its current run-time behavior using Matlab’s builtin Profiler tool, which can either be started from the Matlab Editor toolstrip (“Run and Time”) or via the profile function.

The profile report for the client’s function showed that it had two separate performance hotspots:

Code that checks the drawdown format (optional 2nd input argument) against a set of allowed formats
Main code section that iteratively loops over the data-series values to compute the maximal drawdown

In order top optimize the code, I copied %matlabroot%/toolbox/finance/finance/maxdrawdown.m to a local folder on the Matlab path, renaming the file (and the function) maxdrawdn, in order to avoid conflict with the built-in version.

Optimizing input args pre-processing

The main problem with the pre-processing of the optional format input argument is the string comparisons, which are being done even when the default format is used (which is by far the most common case). String comparison are often more expensive than numerical computations. Each comparison by itself is very short, but when maxdrawdown is run in a loop (as it often is), the run-time adds up.

Here’s a snippet of the original code:

if nargin < 2 || isempty(Format)
    Format = 'return';
end
if ~ischar(Format) || size(Format,1) ~= 1
    error(message('finance:maxdrawdown:InvalidFormatType'));
end
choice = find(strncmpi(Format,{'return','arithmetic','geometric'},length(Format)));
if isempty(choice)
    error(message('finance:maxdrawdown:InvalidFormatValue'));
end

An equivalent code, which avoids any string processing in the common default case, is faster, simpler and more readable:

if nargin < 2 || isempty(Format)
    choice = 1;
elseif ~ischar(Format) || size(Format,1) ~= 1
    error(message('finance:maxdrawdown:InvalidFormatType'));
else
    choice = find(strncmpi(Format,{'return','arithmetic','geometric'},length(Format)));
    if isempty(choice)
        error(message('finance:maxdrawdown:InvalidFormatValue'));
    end
end

The general rule is that whenever you have a common case, you should check it first, avoiding unnecessary processing downstream. Moreover, for improved run-time performance (although not necessarily maintainability), it is generally preferable to work with numbers rather than strings (choice rather than Format, in our case).

Vectorizing the main loop

The main processing loop uses a very simple yet inefficient iterative loop. I assume that the code was originally developed this way in order to assist debugging and to ensure correctness, and that once it was ready nobody took the time to also optimize it for speed. It looks something like this:

MaxDD = zeros(1,N);
MaxDDIndex = ones(2,N);
...
if choice == 1   % 'return' format
    MaxData = Data(1,:);
    MaxIndex = ones(1,N);
    for i = 1:T
        MaxData = max(MaxData, Data(i,:));
        q = MaxData == Data(i,:);
        MaxIndex(1,q) = i;
        DD = (MaxData - Data(i,:)) ./ MaxData;
        if any(DD > MaxDD)
            p = DD > MaxDD;
            MaxDD(p) = DD(p);
            MaxDDIndex(1,p) = MaxIndex(p);
            MaxDDIndex(2,p) = i;
        end
    end
else             % 'arithmetic' or 'geometric' formats
    ...

This loop can relatively easily be vectorized, making the code much faster, and arguably also simpler, more readable, and more maintainable:

if choice == 3
    Data = log(Data);
end
MaxDDIndex = ones(2,N);
MaxData = cummax(Data);
MaxIndexes = find(MaxData==Data);
DD = MaxData - Data;
if choice == 1	% 'return' format
    DD = DD ./ MaxData;
end
MaxDD = cummax(DD);
MaxIndex2 = find(MaxDD==DD,1,'last');
MaxIndex1 = MaxIndexes(find(MaxIndexes<=MaxIndex2,1,'last'));
MaxDDIndex(1,:) = MaxIndex1;
MaxDDIndex(2,:) = MaxIndex2;
MaxDD = MaxDD(end,:);

Let’s make a short run-time and accuracy check – we can see that we achieved a 31-fold speedup (YMMV), and received the exact same results:

>> data = rand(1,1e7);
 
>> tic, [MaxDD1, MaxDDIndex1] = maxdrawdown(data); toc  % builtin Matlab function
Elapsed time is 7.847140 seconds.
 
>> tic, [MaxDD2, MaxDDIndex2] = maxdrawdn(data); toc  % our optimized version
Elapsed time is 0.253130 seconds.
 
>> speedup = round(7.847140 / 0.253130)
speedup =
    31
 
>> isequal(MaxDD1,MaxDD2) && isequal(MaxDDIndex1,MaxDDIndex2)  % check accuracy
ans =
  logical
   1

Disclaimer: The code above seems to work (quite well in fact) for a 1D data vector. You’d need to modify it a bit to handle 2D data – the returned maximal drawdown are still computed correctly but not the returned indices, due to their computation using the find function. This modification is left as an exercise for the reader…

Very similar code could be used for the corresponding maxdrawup function. Although this function is used much less often than maxdrawdown, it is in fact widely used and very similar to maxdrawdown, so it is surprising that it is missing in the Financial Toolbox. Here is the corresponding code snippet:

% Code snippet for maxdrawup (similar to maxdrawdn)
MaxDUIndex = ones(2,N);
MinData = cummin(Data);
MinIndexes = find(MinData==Data);
DU = Data - MinData;
if choice == 1	% 'return' format
    DU = DU ./ MinData;
end
MaxDU = cummax(DU);
MaxIndex = find(MaxDD==DD,1,'last');
MinIndex = MinIndexes(find(MinIndexes<=MaxIndex,1,'last'));
MaxDUIndex(1,:) = MinIndex;
MaxDUIndex(2,:) = MaxIndex;
MaxDU = MaxDU(end,:);

Similar vectorization could be applied to the emaxdrawdown function. This too is left as an exercise for the reader…

Conclusions

Matlab is composed of thousands of internal functions. Each and every one of these functions was meticulously developed and tested by engineers, who are after all only human. Whereas supreme emphasis is always placed with Matlab functions on their accuracy, run-time performance sometimes takes a back-seat. Make no mistake about this: code accuracy is almost always more important than speed (an exception are cases where some accuracy may be sacrificed for improved run-time performance). So I’m not complaining about the current state of affairs.

But when we run into a specific run-time problem in our Matlab program, we should not despair if we see that built-in functions cause slowdown. We can try to avoid calling those functions (for example, by reducing the number of invocations, or limiting the target accuracy, etc.), or optimize these functions in our own local copy, as I’ve shown last week and today. There are multiple techniques that we could employ to improve the run time. Just use the profiler and keep an open mind about alternative speed-up mechanisms, and you’d be half-way there.

Let me know if you’d like me to assist with your Matlab project, either developing it from scratch or improving your existing code, or just training you in how to improve your Matlab code’s run-time/robustness/usability/appearance. I will be visiting Boston and New York in early June and would be happy to meet you in person to discuss your specific needs.

Speeding-up builtin Matlab functions – part 1

Yair Altman — Sun, 29 Apr 2018 09:46:29 +0000

A client recently asked me to assist with his numeric analysis function – it took 45 minutes to run, which was unacceptable (5000 runs of ~0.55 secs each). The code had to run in 10 minutes or less to be useful. It turns out that 99% of the run-time was taken up by Matlab’s built-in fitdist function (part of the Statistics Toolbox), which my client was certain is already optimized for maximal performance. He therefore assumed that to get the necessary speedup he must either switch to another programming language (C/Java/Python), and/or upgrade his computer hardware at considerable expense, since parallelization was not feasible in his specific case.

Luckily, I was there to assist and was able to quickly speed-up the code down to 7 minutes, well below the required run-time. In today’s post I will show how I did this, which is relevant for a wide variety of other similar performance issues with Matlab. Many additional speed-up techniques can be found in other performance-related posts on this website, as well in my book Accelerating MATLAB Performance.

Profiling

The first step in speeding up any function is to profile its current run-time behavior using Matlab’s builtin Profiler tool, which can either be started from the Matlab Editor toolstrip (“Run and Time”) or via the profile function.

The profile report for the client’s function showed that 99% of the time was spent in the Statistics Toolbox’s fitdist function. Drilling into this function in the profiling report, we see onion-like functions that processed input parameters, ensured data validation etc. The core processing is done inside a class that is unique to each required distribution (e.g., prob.StableDistribution, prob.BetaDistribution etc.) that is invoked within fitdist using an feval call, based on the distribution name that was specified by the user.

In our specific case, the external onion layers of sanity checks were unnecessary and could be avoided. In general, I advise not to discard such checks, because you never know whether future uses might have a problem with outlier data or input parameters. Moreover, in the specific case of fitdist they take only a very minor portion of the run-time (this may be different in other cases, such as the ismember function that I described years ago, where the sanity checks have a significant run-time impact compared to the core processing in the internal ismembc function).

However, since we wanted to significantly improve the total run-time and this was spent within the distribution class (prob.StableDistribution in the case of my client), we continue to drill-down into this class to determine what can be done.

It turns out that prob.StableDistribution basically does 3 things in its main fit() method:

pre-process the input data (prob.ToolboxFittableParametricDistribution.processFitArgs() and .removeCensoring() methods) – this turned out to be unnecessary in my client’s data, but has little run-time impact.
call stablefit() in order to get fitting parameters – this took about half the run-time
call stablelike() in order to get likelihood data – this took about half the run-time as well
call prob.StableDistribution.makeFitted() to create a probability-distribution object returned to the caller – this also took little run-time that was not worth bothering about.

The speed-up improvement process

With user-created code we could simply modify our code in-place. However, a more careful process is necessary when modifying built-in Matlab functions (either in the core Matlab distribution or in one of the toolboxes).

The basic idea here is to create a side-function that would replicate the core processing of fitdist. This is preferable to modifying Matlab’s installation files because we could then reuse the new function in multiple computers, without having to mess around in the Matlab installation (which may not even be possible if we do not have admin privileges). Also, if we ever upgrade our Matlab we won’t need to remember to update the installed files (and obviously retest).

I called the new function fitdist2 and inside it I initially placed only the very core functionality of prob.StableDistribution.fit():

% fitdist2 - fast replacement for fitdist(data,'stable')
% equivalent to pd = prob.StableDistribution.fit(data);
function pd = fitdist2(data)
    % Bypass pre-processing (not very important)
    [cens,freq,opt] = deal([]);
    %[data,cens,freq,opt] = prob.ToolboxFittableParametricDistribution.processFitArgs(data);
    %data = prob.ToolboxFittableParametricDistribution.removeCensoring(data,cens,freq,'stable');
 
    % Main processing in stablefit(), stablelike()
    params = stablefit(data,0.05,opt);
    [nll,cov] = stablelike(params,data);
 
    % Combine results into the returned probability distribution object
    pd = prob.StableDistribution.makeFitted(params,nll,cov,data,cens,freq);
end

If we try to run this as-is, we’d see errors because stablefit() and stablelike() are both sub-functions within %matlabroot%/toolbox/stats/stats/+prob/StableDistribution.m. So we copy these sub-functions (and their dependent subfunctions infoMtxCal(), intMle(), tabpdf(), neglog_pdf(), stable_nloglf(), varTrans) to the bottom of our fitdist2.m file, about 400 lines in total.

We also remove places that call checkargs(…) since that seems to be unnecessary – if you want to keep it then add the checkargs() function as well.

Now we re-run our code, after each speedup iteration verifying that the returned pd object returned by our fitdist2 is equivalent to the original object returned by fitdist.

Speeding-up stablefit()

A new profiling run shows that the vast majority of the time in stablefit() is spent in two lines:

s = load('private/StablePdfTable.mat');
[parmhat,~,err,output] = fminsearch(@(params)stable_nloglf(x,params),phi0,options);

The first of these lines is reloading a static data file. The very same static data file is later reloaded in stablelike(). Both of these data-loads is done in every single invocation of fitdist, so if we have 5000 data fits, we load the same static data file 10,000 times! This is certainly not indicative of good programming practice. It would be much faster to reload the static data once into memory, and then use this cached data for the next 9,999 invocation. Since the original authors of StableDistribution.m seem to like single-character global variables (another bad programming practice, for multiple reasons), we’ll follow their example (added lines are highlighted):

persistent s  % this should have a more meaningful name (but at least is not global...)!if isempty(s)    fit_path = fileparts(which('fitdist'));    s = load([fit_path '/private/StablePdfTable.mat']);
    a = s.a;
    b = s.b;
    xgd = s.xgd;
    p = s.p;
end

In order to speed-up the second line (that calls fminsearch), we can reduce the tolerances used by this function, by updating the options structure passed to it, so that we use tolerances of 1e-3 rather than the default 1e-6 (in our specific case this resulted in acceptable errors of ~0.1%). Specifically, we modify the code from this:

function [parmhat,parmci] = stablefit(x,alpha,options)
...
if nargin < 3 || isempty(options)
    options = statset('stablefit');
else
    options = statset(statset('stablefit'),options);
end
 
% Maximize the log-likelihood with respect to the transformed parameters
[parmhat,~,err,output] = fminsearch(@(params)stable_nloglf(x,params),phi0,options);
...
end

to this (note that the line that actually calls fminsearch remains unchanged):

function [parmhat,parmci] = stablefit(x,alpha,unused_options)...
persistent optionsif isempty(options)    options = statset('stablefit');
    options.TolX   = 1e-3;    options.TolFun = 1e-3;    options.TolBnd = 1e-3;end
 
% Maximize the log-likelihood with respect to the transformed parameters
[parmhat,~,err,output] = fminsearch(@(params)stable_nloglf(x,params),phi0,options);
...
end

The fminsearch internally calls tabpdf() repeatedly. Drilling down in the profiling report we see that it recomputes a griddedInterpolant object that is essentially the same for all iterations (and therefore a prime candidate for caching), and also that it uses the costly cubic interpolation rather than a more straight-forward linear interpolation:

function y = tabpdf(x,alpha,beta)
...
persistent G  % this should have a more meaningful name (but at least is not global...)!if isempty(G)    G = griddedInterpolant({b, a, xgd}, p, 'linear','none');  % 'linear' instead of 'cubic'end%G = griddedInterpolant({b, a, xgd}, p, 'cubic','none');  % original
y = G({beta,alpha,x});
...

These cases illustrate two important speedup technique categories: caching data in order to reduce the number of times that a certain code hotspot is being run, and modifying the parameters/inputs in order to reduce the individual run-time of the hotspot code. Variations of these techniques form the essence of effective speedup and can often be achieved by just reflecting on the problem and asking yourself two questions:

can I reduce the number of times that this code is being run? and
can I reduce the run-time of this specific code section?

Additional important speed-up categories include parallelization, vectorization and algorithmic modification. These are sometimes more difficult programmatically and riskier in terms of functional equivalence, but may be required in case the two main techniques above are not feasible. Of course, we can always combine these techniques, we don’t need to choose just one or the other.

Speeding-up stablelike()

We now turn our attentions to stablelike(). As for the loaded static file, we could simply use the cached s to load the data in order to avoid repeated reloading of the data from disk. But it turns out that this data is actually not used at all inside the function (!) so we just comment-out the old code:

%s = load('private/StablePdfTable.mat');
%a = s.a;
%b = s.b;
%xgd = s.xgd;
%p = s.p;

Think about this – the builtin Matlab code loads a data file from disk and then does absolutely nothing with it – what a waste!

Another important change is to reduce the run-time of the integral function, which is called thousands of times within a double loop. We do this by reducing the tolerances specified in the integral call from 1e-6:

F(i,j) = integral(@(x)infoMtxCal(x,params,step,i,j),-Inf,Inf,'AbsTol',1e-6,'RelTol',1e-4); % original
F(i,j) = integral(@(x)infoMtxCal(x,params,step,i,j),-Inf,Inf,'AbsTol',1e-3,'RelTol',1e-3); % new

You can see that once again these two cases follow the two techniques that I mentioned above: we reduced the number of times that we load the data file (to 0 in our case), and we improved the run-time of the individual integral calculation by reducing its tolerances.

Conclusions

The final result of applying the techniques above was a 6-fold speedup, reducing the total run-time from 45 minutes down to 7 minutes. I could probably have improved the run-time even further, but since we reached our target run-time I stopped there. The point after all was to make the code usable, not to reach a world speed record.

In my next article I will present another example of dramatically improving the run-time speed of a built-in Matlab function, this time a very commonly-used function in the Financial Toolbox that I was able to speed-up by a factor of 40.

Matlab releases improve continuously, so hopefully my techniques above (or alternatives) would find their way into the builtin Matlab functions, making them faster than today, out-of-the-box.

Until this happens, we should not lose hope when faced with a slow Matlab function, even if it is a built-in/internal one, as I hope to have clearly illustrated today, and will also show in my next article. Improving the performance is often easy. In fact, it took me much longer to write this article than it was to improve my client’s code…

Matlab compilation quirks – take 2

Yair Altman — Wed, 31 May 2017 18:00:42 +0000

Once again I would like to welcome guest blogger Hanan Kavitz of Applied Materials. Hanan posted a couple of guest posts here over the past few years, including a post last year about quirks with Matlab-compiled DLLs. Today Hanan will follow up on that post by discussing several additional quirks that they have encountered with Matlab compilations/deployment.

Don’t fix it, if it ain’t broke…

In Applied Materials Israel (PDC) we use Matlab code for both algorithm development and deployment (production). As part of the dev-ops build system, which builds our product software versions, we build Matlab artifacts (binaries) from the Matlab source code.

A typical software version has several hundreds Matlab artifacts that are automatically rebuilt on a daily basis, and we have many such versions – totaling many thousands of compilations each day.

This process takes a long time, so we were looking for a way to make it more efficient.

The idea that we chose to implement sounds simple – take a single binary module in any software version (Ex. foo.exe – Matlab-compiled exe) and check it: if the source code for this module has not changed since the last compilation then simply don’t compile it, just copy it from previous software version repository. Since most of our code doesn’t change daily (some of it hasn’t changed in years), we can skip the compilation time of most binaries and just copy them from some repository of previously compiled binaries.

In a broader look, avoiding lengthy compilations cycles by not compiling unchanged code is a common programming practice, implemented by all modern compilers. For example, the ‘make’ utility uses a ‘makefile’ to check the time stamps of all dependencies of every object file in order to decide which object requires recompilation. In reality, this is not always the best solution as time stamps may be incorrect, but it works well in the vast majority of cases.

Coming back to Matlab, now comes the hard part – how could our build system know that nothing has changed in module X and that something has changed in module Y? How does it even know which source files it needs to ensure didn’t change?

The credit for the idea goes to my manager, Lior Cohen, as follows: You can actually check the dependency of a given binary after compilation. The basis of the solution is that a Matlab executable is in fact a compressed (zip) file. The idea is then to:

Compile the binary once
Unzip the binary and “see” all your dependencies (source files are encrypted and resources are not, but we only need the list of file names – not their content).
Now build a list of all your dependency files and compute the CRC value of each from the source control. Save it for the next time you are required to compile this module.
In the next compilation cycle, find this dependency list, review it, dependency source file at a time and make sure CRC of the dependency hasn’t changed since last time.
If no dependency CRC has changed, then copy the binary from the repository of previous software version, without compiling.
Otherwise, recompile the binary and rebuild the CRC list of all dependencies again, in preparation for the next compilation cycle.

That’s it! That simple? Well… not really – the reality is a bit more complex since there are many other dependencies that need to be checked. Some of them are:

Did the requested Matlab version of the binary change since the last compilation?
Did the compilation instructions themselves (we have a sort of ‘makefile’) change?

Basically, I implemented a policy that if anything changed, or if the dependency check itself failed, then we don’t take any chances and just compile this binary. Keeping in mind that this dependencies check and file copying is much faster than a Matlab compilation, we save a lot of actual compilation time using this method.

Bottom line: Given a software version containing hundreds of compilation instructions to execute and assuming not much has changed in the version (which is often the case), we skip over 90% of compilations altogether and only rebuild what really changed. The result is a version build that takes about half an hour, instead of many hours. Moreover, since the compilation process is working significantly less, we get fewer failures, fewer stuck or crashed mcc processes, and [not less importantly] less maintenance required by me.

Note that in our implementation we rely on the undocumented fact that Matlab binaries are in fact compressed zip archives. If and when a future Matlab release will change the implementation such that the binaries will no longer be zip archives, another way will need to be devised in order to ensure the consistency of the target executable with its dependent source files.

Don’t kill it, if it ain’t bad…

I want to share a very weird issue I investigated over a year ago when using Matlab compiled exe. It started with a user showed me a Matlab compiled exe that didn’t run – I’m not talking about a regular Matlab exception: the process was crashing with an MS Windows popup window popping, stating something very obscure.

It was a very weird behavior that I couldn’t explain – the compiler seemed to work well but the compiled executable process kept crashing. Compiling completely different code showed the same behavior.

This issue has to do with the system compiler configuration that is being used. As you might know, when installing the Matlab compiler, before the first compilation is ever made, the user has to state the C compiler that the Matlab compiler should use in its compilation process. This is done by command ‘mbuild –setup’. This command asks the users to choose the C compiler and saves the configuration (batch file back then, xml in the newer versions of Matlab) in the user’s prefdir folder. At the time we were using Microsoft Visual C++ compiler 9.0 SP1.

The breakthrough in the investigation came when I ran mcc command with –verbose flag, which outputs much more compilation info than I would typically ever want… I discovered that although the target executable file had been created, a post compilation step failed to execute, while issuing a very cryptic error message:

mt.exe : general error c101008d: Failed to write the updated manifest to the resource of file “…”. Access is denied.

cryptic compilation error (click to zoom)

The failure was in one of the ‘post link’ commands in the configuration batch file – something obscure such as this:

set POSTLINK_CMDS2=mt.exe -outputresource: %MBUILD_OUTPUT_FILE_NAME%;%MANIFEST_RESOURCE% -manifest "%MANIFEST_FILE_NAME%"

This line of code takes an XML manifest file and inserts it into the generated binary file (additional details).

If you open a valid R2010a (and probably other old versions as well) Matlab-generated exe in a text editor you can actually see a small XML code embedded in it, while in a non-functioning exe I could not see this XML code.

So why would this command fail?

It turned out, as funny as it sounds, to be an antivirus issue – our IT department updated its antivirus policies and this ‘post link’ command suddenly became an illegal operation. Once our IT eased the policy, this command worked well again and the compiled executables stopped crashing, to our great joy.

Speeding up Matlab-JDBC SQL queries

Yair Altman — Wed, 16 Nov 2016 11:43:17 +0000

Many of my consulting projects involve interfacing a Matlab program to an SQL database. In such cases, using MathWorks’ Database Toolbox is a viable solution. Users who don’t have the toolbox can also easily connect directly to the database using either the standard ODBC bridge (which is horrible for performance and stability), or a direct JDBC connection (which is also what the Database Toolbox uses under the hood). I explained this Matlab-JDBC interface in detail in chapter 2 of my Matlab-Java programming book. A bare-bones implementation of an SQL SELECT query follows (data update queries are a bit different and will not be discussed here):

% Load the appropriate JDBC driver class into Matlab's memory
% (but not directly, to bypass JIT pre-processing - we must do it in run-time!)
driver = eval('com.mysql.jdbc.Driver');  % or com.microsoft.sqlserver.jdbc.SQLServerDriver or whatever
 
% Connect to DB
dbPort = '3306'; % mySQL=3306; SQLServer=1433; Oracle=...
connectionStr = ['jdbc:mysql://' dbURL ':' dbPort '/' schemaName];  % or ['jdbc:sqlserver://' dbURL ':' dbPort ';database=' schemaName ';'] or whatever
dbConnObj = java.sql.DriverManager.getConnection(connectionStr, username, password);
 
% Send an SQL query statement to the DB and get the ResultSet
stmt = dbConnObj.createStatement(java.sql.ResultSet.TYPE_SCROLL_INSENSITIVE, java.sql.ResultSet.CONCUR_READ_ONLY);
try stmt.setFetchSize(1000); catch, end  % the default fetch size is ridiculously small in many DBs
rs = stmt.executeQuery(sqlQueryStr);
 
% Get the column names and data-types from the ResultSet's meta-data
MetaData = rs.getMetaData;
numCols = MetaData.getColumnCount;
data = cell(0,numCols);  % initialize
for colIdx = numCols : -1 : 1
    ColumnNames{colIdx} = char(MetaData.getColumnLabel(colIdx));
    ColumnType{colIdx}  = char(MetaData.getColumnClassName(colIdx));  % http://docs.oracle.com/javase/7/docs/api/java/sql/Types.html
end
ColumnType = regexprep(ColumnType,'.*\.','');
 
% Get the data from the ResultSet into a Matlab cell array
rowIdx = 1;
while rs.next  % loop over all ResultSet rows (records)
    for colIdx = 1 : numCols  % loop over all columns in the row
        switch ColumnType{colIdx}
            case {'Float','Double'}
                data{rowIdx,colIdx} = rs.getDouble(colIdx);
            case {'Long','Integer','Short','BigDecimal'}
                data{rowIdx,colIdx} = double(rs.getDouble(colIdx));
            case 'Boolean'
                data{rowIdx,colIdx} = logical(rs.getBoolean(colIdx));
            otherwise %case {'String','Date','Time','Timestamp'}
                data{rowIdx,colIdx} = char(rs.getString(colIdx));
        end
    end
    rowIdx = rowIdx + 1;
end
 
% Close the connection and clear resources
try rs.close();   catch, end
try stmt.close(); catch, end
try dbConnObj.closeAllStatements(); catch, end
try dbConnObj.close(); catch, end  % comment this to keep the dbConnObj open and reuse it for subsequent queries

Naturally, in a real-world implementation you also need to handle database timeouts and various other errors, handle data-manipulation queries (not just SELECTs), etc.

Anyway, this works well in general, but when you try to fetch a ResultSet that has many thousands of records you start to feel the pain – The SQL statement may execute much faster on the DB server (the time it takes for the stmt.executeQuery call), yet the subsequent double-loop processing to fetch the data from the Java ResultSet object into a Matlab cell array takes much longer.

In one of my recent projects, performance was of paramount importance, and the DB query speed from the code above was simply not good enough. You might think that this was due to the fact that the data cell array is not pre-allocated, but this turns out to be incorrect: the speed remains nearly unaffected when you pre-allocate data properly. It turns out that the main problem is due to Matlab’s non-negligible overhead in calling methods of Java objects. Since the JDBC interface only enables retrieving a single data item at a time (in other words, bulk retrieval is not possible), we have a double loop over all the data’s rows and columns, in each case calling the appropriate Java method to retrieve the data based on the column’s type. The Java methods themselves are extremely efficient, but when you add Matlab’s invocation overheads the total processing time is much much slower.

So what can be done? As Andrew Janke explained in much detail, we basically need to push our double loop down into the Java level, so that Matlab receives arrays of primitive values, which can then be processed in a vectorized manner in Matlab.

So let’s create a simple Java class to do this:

// Copyright (c) Yair Altman UndocumentedMatlab.com
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
 
public class JDBC_Fetch {
 
	public static int DEFAULT_MAX_ROWS = 100000;   // default cache size = 100K rows (if DB does not support non-forward-only ResultSets)
 
	public static Object[] getData(ResultSet rs) throws SQLException {
		try {
			if (rs.last()) {  // data is available
				int numRows = rs.getRow();    // row # of the last row
				rs.beforeFirst();             // get back to the top of the ResultSet
				return getData(rs, numRows);  // fetch the data
			} else {  // no data in the ResultSet
				return null;
			}
		} catch (Exception e) {
			return getData(rs, DEFAULT_MAX_ROWS);
		}
	}
 
	public static Object[] getData(ResultSet rs, int maxRows) throws SQLException {
		// Read column number and types from the ResultSet's meta-data
		ResultSetMetaData metaData = rs.getMetaData();
		int numCols = metaData.getColumnCount();
		int[] colTypes = new int[numCols+1];
		int numDoubleCols = 0;
		int numBooleanCols = 0;
		int numStringCols = 0;
		for (int colIdx = 1; colIdx <= numCols; colIdx++) {
			int colType = metaData.getColumnType(colIdx);
			switch (colType) {
				case Types.FLOAT:
				case Types.DOUBLE:
				case Types.REAL:
					colTypes[colIdx] = 1;  // double
					numDoubleCols++;
					break;
				case Types.DECIMAL:
				case Types.INTEGER:
				case Types.TINYINT:
				case Types.SMALLINT:
				case Types.BIGINT:
					colTypes[colIdx] = 1;  // double
					numDoubleCols++;
					break;
				case Types.BIT:
				case Types.BOOLEAN:
					colTypes[colIdx] = 2;  // boolean
					numBooleanCols++;
					break;
				default: // 'String','Date','Time','Timestamp',...
					colTypes[colIdx] = 3;  // string
					numStringCols++;
			}
		}
 
		// Loop over all ResultSet rows, reading the data into the 2D matrix caches
		int rowIdx = 0;
		double [][] dataCacheDouble  = new double [numDoubleCols] [maxRows];
		boolean[][] dataCacheBoolean = new boolean[numBooleanCols][maxRows];
		String [][] dataCacheString  = new String [numStringCols] [maxRows];
		while (rs.next() && rowIdx < maxRows) {
			int doubleColIdx = 0;
			int booleanColIdx = 0;
			int stringColIdx = 0;
			for (int colIdx = 1; colIdx <= numCols; colIdx++) {
				try {
					switch (colTypes[colIdx]) {
						case 1:  dataCacheDouble[doubleColIdx++][rowIdx]   = rs.getDouble(colIdx);   break;  // numeric
						case 2:  dataCacheBoolean[booleanColIdx++][rowIdx] = rs.getBoolean(colIdx);  break;  // boolean
						default: dataCacheString[stringColIdx++][rowIdx]   = rs.getString(colIdx);   break;  // string
					}
				} catch (Exception e) {
					System.out.println(e);
					System.out.println(" in row #" + rowIdx + ", col #" + colIdx);
				}
			}
			rowIdx++;
		}
 
		// Return only the actual data in the ResultSet
		int doubleColIdx = 0;
		int booleanColIdx = 0;
		int stringColIdx = 0;
		Object[] data = new Object[numCols];
		for (int colIdx = 1; colIdx <= numCols; colIdx++) {
			switch (colTypes[colIdx]) {
				case 1:   data[colIdx-1] = dataCacheDouble[doubleColIdx++];    break;  // numeric
				case 2:   data[colIdx-1] = dataCacheBoolean[booleanColIdx++];  break;  // boolean
				default:  data[colIdx-1] = dataCacheString[stringColIdx++];            // string
			}
		}
		return data;
	}
}

So now we have a JDBC_Fetch class that we can use in our Matlab code, replacing the slow double loop with a single call to JDBC_Fetch.getData(), followed by vectorized conversion into a Matlab cell array (matrix):

% Get the data from the ResultSet using the JDBC_Fetch wrapper
data = cell(JDBC_Fetch.getData(rs));
for colIdx = 1 : numCols
   switch ColumnType{colIdx}
      case {'Float','Double'}
          data{colIdx} = num2cell(data{colIdx});
      case {'Long','Integer','Short','BigDecimal'}
          data{colIdx} = num2cell(data{colIdx});
      case 'Boolean'
          data{colIdx} = num2cell(data{colIdx});
      otherwise %case {'String','Date','Time','Timestamp'}
          %data{colIdx} = cell(data{colIdx});  % no need to do anything here!
   end
end
data = [data{:}];

On my specific program the resulting speedup was 15x (this is not a typo: 15 times faster). My fetches are no longer limited by the Matlab post-processing, but rather by the DB’s processing of the SQL statement (where DB indexes, clustering, SQL tuning etc. come into play).

Additional speedups can be achieved by parsing dates at the Java level (rather than returning strings), as well as several other tweaks in the Java and Matlab code (refer to Andrew Janke’s post for some ideas). But certainly the main benefit (the 80% of the gain that was achieved in 20% of the worktime) is due to the above push of the main double processing loop down into the Java level, leaving Matlab with just a single Java call to JDBC_Fetch.

Many additional ideas of speeding up database queries and Matlab programs in general can be found in my second book, Accelerating Matlab Performance.

If you’d like me to help you speed up your Matlab program, please email me (altmany at gmail), or fill out the query form on my consulting page.

Simple GUI Tabs for Advanced Matlab Trading App

Yair Altman — Wed, 17 Feb 2016 18:00:40 +0000

I’d like to introduce guest blogger Alex Boykov, one of the developers of the Walk-Forward Analysis Toolbox for Matlab (WFAToolbox), which enables accelerated trading strategies development using Matlab. Today, Alex will explain how they used tabs in a way that can be replicated by any other Matlab GUI, not necessarily having the latest Matlab release.

In this post, we want to tell you about how we solved the problem of tab creation for WFAToolbox. We required the following criteria:

The tabs need to be attractive and look like tabs, not like buttons with panels
The tabs need to have been drawn using the editor GUIDE so that the contents of the tab panel can be easily edited
The tabs can be easily added and removed without significant code additions. They must be simple to use in different projects and tasks

The sophisticated user of Matlab might think that this is a trivial objective, seeing as there are numerous solutions for this problem in the Matlab Exchange and since Matlab R2014b, it supports creating native tabs with the help of uitab and uitabgroup functions. Also, with the addition of App Designer, it might appear that this issue will be solved with the new interface for GUI creation; tabs can be created right in the editor. However, in this post, we will attempt to explain why none of the methods above fit the three criteria stated and we will present our own solution for the tabs.

Regardless of the fact that we only took on the problem in 2013, when we first started creating our WFAToolbox, at the moment of writing this article (January 2016), this problem is still a relevant issue for many Matlab users. After the release of R2016a, it is doubtful the problem will be entirely solved. This is why we created our own example of a code which we have released on the Matlab File Exchange (see below).

Tab-enabled WFAToolbox (Matlab app for algorithmic trading)

1. The tabs have to look like tabs

When we created WFAToolbox, our goal was to create an application which would allow everyone interested to create a strategy for trading on the financial markets to be able to do so, along with having the opportunity to use the full potential of Matlab and its progressive tools, including genetic algorithms, parallel computing, econometrics, neural networks, and much, much more (basically, any data analysis that can be done in Matlab). At the same time, we do not want our users to spend time on developing an advanced software environment for testing, analysis, and strategy execution, but rather to do it from an easy-to use GUI. Thus, in WFAToolbox, you can create, test, and, finally, launch your own trading strategy or test a hypothesis within minutes, even with little prior knowledge of Matlab programming.

Of course, in order to fit these features into a single application, guarantee that it would be easy to understand even by beginners, and that it would be simple to operate, it was necessary to pay special attention to the graphic interface. In our opinion, perhaps the most intelligent solution for placing the many controls and functions necessary for sophisticated applications is by creating tabs. Because we knew that we were not the only ones who thought this way, we started looking for examples of codes that were previously created in the Matlab Exchange. We were very surprised when we found only a few solutions, most of which did not even match our first criteria of tab attractiveness! Unfortunately, a majority of them were old and quite unattractive (they looked more like buttons with panels). Even the new App Designer has tabs that in our eyes look more like buttons than tabs.

Having tried a lot of these utilities in our test versions, we came to the conclusion that Tab Panel Constructor v.2.8 would be the best option for us. It fits all three criteria above. In 2013, we used it quite successfully in our first versions of WFAToolbox. Everything looked great, but, unfortunately, it later turned out that the problem was far from being solved.

Tab-enabled WFAToolbox (Matlab app for algorithmic trading)

2. The tabs need to be created through GUIDE

Unfortunately, with time, it turned out that with the newer version of Matlab it didn’t work smoothly and the code we wanted to use as our solution practically fell apart in front of us. After adding a couple of elements in GUI, partial formatting was lost and we had to redo everything. The process of adding the tags created a lot of bugs which needed to be solved immediately.

In 2014, we already had more than 500 clients using our application. We started hearing, more and more often, that it would be great if the colors and locations of the tabs could be changed. It turned out that, depending on the operating system and Matlab version, the tab format changes. So, we made the decision to change our tabs.

By that time, a new version of Matlab was released, R2014b. It allowed us to build tabs with the help of the uitabgroup and uitab functions. The results looked exactly how we wanted: attractive, pleasant, and appeared like real tabs: UI With Tab Panel in Matlab R2014b. However, we were discouraged that they could not be created in GUIDE!

During that time, we were developing a module for WFAToolbox which would allow users to download data from Google Finance: 10,000+ free daily and intraday quotes from 20+ exchanges. Tabs were the easiest to use when switching between downloading free data from Google Finance and downloading custom user data from the Matlab Workspace. But entering so many elements through code and not through an editor? What will happen when we add 100,000+ free historical data from Yahoo Finance for futures, bonds, currency, stocks and others? We didn’t want to create all of this without the GUIDE editor! This is why we came to the conclusion that it is necessary for us to create a code of tabs, starting from scratch, so that they would correspond with all three of our criteria.

Tab-enabled WFAToolbox (Matlab app for algorithmic trading)

3. The tabs should be easy to add and edit

We chose the Simple Tab Panel, which has existed in the Matlab File Exchange since 2007, as a base for our new code because we considered it to be the most elegant and attractive example of GUIDE tabs. This solution fit our first two criteria, but we really wanted it to be universal and easy to use. We also wanted to have a simplified process of tab addition and deletion so that instead of having to copy and rewrite a large amount of code and other details, we could just add a single line of code. We wanted to save on labor costs, because we often add new features to WFAToolbox and this includes having to constantly add new elements to existing tabs, as well as adding new tabs.

So, we rewrote the code and created our own universal example so that everyone could use it to their advantage. We uploaded the code to the Matlab File Exchange, where it can be freely downloaded: Simple Optimized GUI Tab.

Next, we will describe how to use this code for tab addition and how to use the process for the implementation of tasks.

So, in order to add a new tab, you need to:

Open GUIDE and apply uipanel and uitext in a way that will make uipanel easier to work with in the future, and place uitext in a place where the tab switch will be located.
Rename the Tag of the uitext to [‘tab’,N,’text’], where N is the tab index. In our example, we are creating the 3rd tab, so our tag would be ‘tab3text’. Using this same principle, [‘tab’,N,’Panel’] needs to be renamed to tag of uipanel in the ‘tab3Panel’.
Add the name of the new tab to the TabNames variable. In our example, we use ‘Tab3’ (but you can use any name).

TabNames = {'Tab 1','Tab 2','Tab3'};

How the code was created

The primary principle of how our code works is that we create the uipanel and uitext objects in GUIDE, then we take the uitext coordinates and replace the objects to the axes and text objects. We assign a callback function to them which works when the object is clicked on. The function makes the uipanels visible/invisible and changes the colors of tab.

Let’s look at the function code SimpleOptimizedTabs2.m, which is part of the Simple Optimized GUI Tab submission.

1. Tab settings

% Settings
TabFontSize = 10;
TabNames = {'Tab 1','Tab 2'};
FigWidth = 0.265;

If we change the parameters under Settings, we can control the appearance of our GUI and tabs. So, the parameter of TabFontSize changes the font size on the tab switch, and, with the help of TabNames we can rename or add tab names, and with FigWidth, we can determine the normalized width of the GUI.

2. Changing the figure width

% Figure resize
set(handles.SimpleOptimizedTab,'Units','normalized')
pos = get(handles. SimpleOptimizedTab, 'Position');
set(handles. SimpleOptimizedTab, 'Position', [pos(1) pos(2) FigWidth pos(4)])

The GUI width changes in the code because it isn’t comfortable to manually stretch and narrow the figure. It is more beneficial to see the contents of all tabs and work with them without having to change the width every time you make a small change. If you want to make your uipanels bigger than in the example, then do this with the GUIDE editor. However, don’t forget to change the FigWidth parameter.

Please note that, due to the peculiarities of the editor, you cannot narrow a figure by height without shifting tab locations. You can only do this if you are changing the width, so we only recommend adding tabs by increasing the width of the figure and not the length.

3. Creating tabs

Do the following for each tab: obtain the uitext coordinates, which we entered into the GUI panel, and position the axes and text using these coordinates (using the necessary settings of external apparel). Using the ButtonDownFcn parameter, we can link the callback function, called ClickOnTab, in order to switch tabs when clicking on the text or axes.

% Tabs Execution
handles = TabsFun(handles,TabFontSize,TabNames);
 
% --- TabsFun creates axes and text objects for tabs
function handles = TabsFun(handles,TabFontSize,TabNames)
 
% Set the colors indicating a selected/unselected tab
handles.selectedTabColor=get(handles.tab1Panel,'BackgroundColor');
handles.unselectedTabColor=handles.selectedTabColor-0.1;
 
% Create Tabs
TabsNumber = length(TabNames);
handles.TabsNumber = TabsNumber;
TabColor = handles.selectedTabColor;
for i = 1:TabsNumber
    n = num2str(i);
 
    % Get text objects position
    set(handles.(['tab',n,'text']),'Units','normalized')
    pos=get(handles.(['tab',n,'text']),'Position');
 
    % Create axes with callback function
    handles.(['a',n]) = axes('Units','normalized',...
                    'Box','on',...
                    'XTick',[],...
                    'YTick',[],...
                    'Color',TabColor,...
                    'Position',[pos(1) pos(2) pos(3) pos(4)+0.01],...
                    'Tag',n,...
                    'ButtonDownFcn',[mfilename,'(''ClickOnTab'',gcbo,[],guidata(gcbo))']);
 
    % Create text with callback function
    handles.(['t',n]) = text('String',TabNames{i},...
                    'Units','normalized',...
                    'Position',[pos(3),pos(2)/2+pos(4)],...
                    'HorizontalAlignment','left',...
                    'VerticalAlignment','middle',...
                    'Margin',0.001,...
                    'FontSize',TabFontSize,...
                    'Backgroundcolor',TabColor,...
                    'Tag',n,...
                    'ButtonDownFcn',[mfilename,'(''ClickOnTab'',gcbo,[],guidata(gcbo))']);
 
    TabColor = handles.unselectedTabColor;
end
 
% Manage panels (place them in the correct position and manage visibilities)
set(handles.tab1Panel,'Units','normalized')
pan1pos=get(handles.tab1Panel,'Position');
set(handles.tab1text,'Visible','off')
for i = 2:TabsNumber
    n = num2str(i);
    set(handles.(['tab',n,'Panel']),'Units','normalized')
    set(handles.(['tab',n,'Panel']),'Position',pan1pos)
    set(handles.(['tab',n,'Panel']),'Visible','off')
    set(handles.(['tab',n,'text']),'Visible','off')
end

Actually, if you have long tab names and you want to change the switch size, then it you may possibly need to correct the Position parameter for the text object by adding the correcting coefficients to it. Unfortunately, this is also a feature of GUIDE. If someone can solve this problem so that the text would always be shown in the middle of the switch tab regardless of the width, we would be happy to read any suggestions in the comments to this post.

4. The callback function ClickOnTab

The callback function ClickOnTab is used every time when clicking on the tab switch and the result of the switches are visible/invisible in the uipanels and in changes to the colors of the switches.

% --- Callback function for clicking on tab
function ClickOnTab(hObject,~,handles)
m = str2double(get(hObject,'Tag'));
 
for i = 1:handles.TabsNumber;
    n = num2str(i);
    if i == m
        set(handles.(['a',n]),'Color',handles.selectedTabColor)
        set(handles.(['t',n]),'BackgroundColor',handles.selectedTabColor)
        set(handles.(['tab',n,'Panel']),'Visible','on')
    else
        set(handles.(['a',n]),'Color',handles.unselectedTabColor)
        set(handles.(['t',n]),'BackgroundColor',handles.unselectedTabColor)
        set(handles.(['tab',n,'Panel']),'Visible','off')
    end
end

More information about our Walk-Forward Analysis Toolbox for Algorithmic Trading (WFAToolbox) can be found at wfatoolbox.com.

Matlab compiler bug and workaround

Yair Altman — Wed, 28 Jan 2015 18:00:02 +0000

I recently consulted at a client who uses R2010a and compiles his code for distribution. Debugging compiled code is often tricky, since we do not have the Matlab desktop to help us debug stuff (well, actually we do have access to a scaled-down desktop for minimal debugging using some undocumented internal hooks, but that’s a topic for a separate article). In my client’s case, I needed to debug a run-time error that threw an exception to the console:

Error using strtrim
Input should be a string or a cell array of strings.
Error in updateGUI (line 121)

Sounds simple enough to debug right? Just go to updateGUI.m line #121 and fix the call to strtrim(), correct?

Well, not so fast… It turns out that updateGUI.m line #121 is an empty line surrounded on either side by one-line comments. This is certainly not the line that caused the error. The actual call to strtrim() only occurs in line #147!

What’s going on?

The answer is that the Matlab compiler has a bug with comment blocks – lines surrounded by %{ and %}:

15   %{
16       This is a comment block that is
17       ignored by the Matlab engine,
18       and causes a line-numbering
19       error in the Matlab compiler!
20   %}
21 
22   a = strtrim(3.1416);  % this generates an error

In the example above, the 6-line comment block is ignored as expected by the Matlab engine in both interactive and compiled modes. However, whereas in interactive mode the error is correctly reported for line #22, in compiled mode it is reported for line #17. Apparently the compiler replaces any block comment with a single comment line before parsing the m-file.

The workaround is simple: count all the block-comment lines that precede the reported error line in the original m-file, and add that number to the reported line. In the example above, 17 + (6-1) = 22. Note that the source code could have multiple block-comments that precede the reported line, and they should all be counted. Here is a basic code snippet that does this parsing and opens the relevant m-file at the correct location:

% Read the source file (*.m)
str = fileread(filename,'*char');
 
% Split the contents into separate lines
lines = strtrim(strsplit(str',10));
 
% Search for comment block tokens
start_idx = find(strcmp(lines,'%{'));
end_idx   = find(strcmp(lines,'%}'));
 
% Count the number of block-comment lines in the source code
delta_idx = end_idx - start_idx;
relevant_idx = sum(start_idx < reported_line_number);
total_comment_block_lines = sum(delta_idx(1:relevant_idx));
 
% Open the source file at the correct line
opentoline(filename, reported_line_number + total_comment_block_lines);

This code snippet can fairly easily be wrapped in a stand-alone utility by anyone who wishes to do so. You’d need to generate the proper full-path source-code (m-file) filename of course, since this is not reported by the deployed application in its error message. You’d also need to take care of edge-cases such as p-coded or mex files, or missing source-code m-files. You’d also need to parse the input parameters (presumably filename and reported_line_number, but possibly also other inputs).

This is another example of a bug that does not officially exist in Matlab’s public bug parade (at least, I couldn’t find it). I’ve reported other such undocumented bugs in previous articles (here and here). Please don’t start another comments tirade on MathWorks’ bug-publication policies. I think that issue has already been discussed ad nauseam.

Update: according to Tom’s comment below, this bug was apparently fixed in R2012b.

Another related bug related to block comments is that at least on some Matlab releases (I haven’t tested properly to determine which releases), block comments are NOT properly ignored by the Editor’s publish functionality. Instead, at least on those releases where the bug occurs, publishing code that includes block comments parses the block’s contents as if it was runnable code. In other words, publish treats %{ and %} as one-line comments rather than as tokens marking the ends of a block comment.

New book: Accelerating MATLAB Performance

Yair Altman — Tue, 16 Dec 2014 21:22:03 +0000

I am pleased to announce that after three years of research and hard work, following my first book on Matlab-Java programming, my new book “Accelerating MATLAB Performance” is finally published.

The Matlab programming environment is often perceived as a platform suitable for prototyping and modeling but not for “serious” applications. One of the main complaints is that Matlab is just too slow.

Accelerating MATLAB Performance (CRC Press, ISBN 9781482211290, 785 pages) aims to correct this perception, by describing multiple ways to greatly improve Matlab program speed.

The book:

Demonstrates how to profile MATLAB code for performance and resource usage, enabling users to focus on the program’s actual hotspots
Considers tradeoffs in performance tuning, horizontal vs. vertical scalability, latency vs. throughput, and perceived vs. actual performance
Explains generic speedup techniques used throughout the software industry and their adaptation for Matlab, plus methods specific to Matlab
Analyzes the effects of various data types and processing functions
Covers vectorization, parallelization (implicit and explicit), distributed computing, optimization, memory management, chunking, and caching
Explains Matlab’s memory model and shows how to profile memory usage and optimize code to reduce memory allocations and data fetches
Describes the use of GPU, MEX, FPGA, and other forms of compiled code
Details acceleration techniques for GUI, graphics, I/O, Simulink, object-oriented Matlab, Matlab startup, and deployed applications
Discusses a wide variety of MathWorks and third-party functions, utilities, libraries, and toolboxes that can help to improve performance

Ideal for novices and professionals alike, the book leaves no stone unturned. It covers all aspects of Matlab, taking a comprehensive approach to boosting Matlab performance. It is packed with thousands of helpful tips, code examples, and online references. Supported by this active website, the book will help readers rapidly attain significant reductions in development costs and program run times.

Additional information about the book, including detailed Table-of-Contents, book structure, reviews, resources and errata list, can be found in a dedicated webpage that I’ve prepared for this book and plan to maintain.

Click here to get your book copy now!
Use promo code MZK07 for a 25% discount and free worldwide shipping on crcpress.com

Instead of focusing on just a single performance aspect, I’ve attempted to cover all bases at least to some degree. The basic idea is that there are numerous different ways to speed up Matlab code: Some users might like vectorization, others may prefer parallelization, still others may choose caching, or smart algorithms, or better memory-management, or compiled C code, or improved I/O, or faster graphics. All of these alternatives are perfectly fine, and the book attempts to cover every major alternative. I hope that you will find some speedup techniques to your liking among the alternatives, and at least a few new insights that you can employ to improve your program’s speed.

I am the first to admit that this book is far from perfect. There are several topics that I would have loved to explore in greater detail, and there are probably many speedup tips that I forgot to mention or have not yet discovered. Still, with over 700 pages of speedup tips, I thought this book might be useful enough as-is, flawed as it may be. After all, it will never be perfect, but I worked very hard to make it close enough, and I really hope that you’ll agree.

If your work relies on Matlab code performance in any way, you might benefit by reading this book. If your organization has several people who might benefit, consider inviting me for dedicated onsite training on Matlab performance and other advanced Matlab topics.

As always, your comments and feedback would be greatly welcome – please post them directly on the book’s webpage.

Happy Holidays everybody!

Docs of old Matlab releases

Yair Altman — Tue, 07 Sep 2010 20:10:26 +0000

I constantly monitor the official Matlab blogs – they are interesting, well written, and I often learn something new.

A few days ago I read Steve Eddins’ latest post on his Image Processing blog. Steve pointed out a recent addition to the MathWorks.com website, which added a section of archived documentation for previous Matlab releases all the way back to R13SP2 (from 2004), including all their corresponding toolboxes.

Prehistorical Matlab documentation from Draa River
note the incomplete formula (left) and the simplistic plot figure (right)

Archived Japanese-language documentation is also available, although not in as complete a manner as English-language docs, nor in an orderly list as the English docs, for R2009b, R2008a, R2007a, R2006a, R14 and R13sp2.

Of course, the latest documentation (lately of R2010b which was released a few days ago), is always available in the website’s main Matlab documentation page.

Unfortunately, older releases are not archived at the moment. There are several versions available online (university caches etc.), but I do not know whether they have a copyright license, so you will have to find them without my help (no pun intended…). If you have the installed version, you can always access the installed documentation – that’s what I do with an R12 (Matlab 6.0, from 2000) release that I keep for my code backward-compatibility checks.

Searching the state of earlier Matlab releases has always been difficult. I often found myself wading through a long list of lengthy release notes until I found the change I was looking for, assuming it was even documented. In this blog I often refer to undocumented, under-documented or mis-documented features and these are much harder, if not impossible, to find in the release notes. When I need to use an online reference, I needed to rely on some second-hand doc archives (for example, in my recent DDE article). Having a central repository of release docs greatly simplifies some of these tasks.

As I mentioned earlier, monitoring Matlab’s official blogs is certainly worthwhile. It is very unfortunate that there is no external access to MathWork’s internal blogs – I am certain these are even more interesting! Is it possible they might describe an online section not of previous releases but rather of future ones?

Shana Tova everyone – a Happy New Year filled with new discoveries!

Solving a MATLAB bug by subclassing

Yair Altman — Sun, 07 Feb 2010 23:57:25 +0000

I would like to welcome guest blogger Matthew Whitaker. Many of Matt’s CSSM submissions offer important insight of internal Matlab functionality. As shall be seen by today’s article and some future submissions, Matt has plenty to share vis-a-vis Matlab’s undocumented functionality.

In my day-to-day work I make extensive use of MATLAB’s Image Processing Toolbox (IPT). One area of the toolbox that has seen considerable change over the last few releases has been the development of a set of modular tools to aid in GUI design for image processing applications. In this article, I examine a bug in one of those tools to illustrate how we can use the power of subclassing these objects (using an undocumented property) to design a simple and effective workaround.

The problem

The problem arose as I was refactoring some code that was written in R2006b to R2009b. The code in question uses the impoint tool on an image along with an associated text object that moves with the point to display information as it is dragged around the image. At the time of the R2006b release the impoint tool was written as an API. In R2006b the call to impoint returns a handle to an hggroup containing a structure of function handles in its application data under the tag ‘API’. This programming pattern was common before the advent of the new class syntax in MATLAB version 7.6 (R2008a).

Here is an example of how impoint would be used in R2006b:

function impointBehavior_R2006b
%IMPOINTBEHAVIOR_R2006B shows how impoint would be used in R2006b
%Note: RUN UNDER R2006B (will run under R2009b but actually uses
%classdef impoint so it will show the same issue)
 
  % Display the image in a figure window
  figure;  imshow('rice.png');
 
  % In R2006b calling impoint returns the hggroup handle
  h = impoint(gca,100,200);
 
  % In 2006b iptgetapi returns a structure of function handles
  api = iptgetapi(h);
 
  % Add a new position callback to set the text string
  api.addNewPositionCallback(@newPos_Callback);
 
  % Construct boundary constraint function so we can't go outside the axes
  fcn = makeConstrainToRectFcn('impoint',get(gca,'XLim'),get(gca,'YLim'));
  api.setDragConstraintFcn(fcn);
 
  % Fire callback so we get initial text
  newPos_Callback(api.getPosition());
 
  function newPos_Callback(newPos)
    % Display the current point position in a text label
    api.setString(sprintf('(%1.0f,%1.0f)',newPos(1),newPos(2)));
  end %newPos_Callback
 
end %impointBehavior_R2006b

The code above, when run in R2006b, produces the desired behavior of displaying a text object containing the point coordinates that moves around with the point as it is dragged around the axes.

In R2009b, impoint is now a true MATLAB class using the new classdef syntax, so I wanted to update the existing code. Initially this appeared to be a straightforward translation of the code to make use of the new impoint class syntax. The first attempt to rewrite the code was:

function impointBehavior_R2009b
%IMPOINTBEHAVIOR_R2009B shows the undesirable behavior when
%using the setString method in R2009b. 
 
  % Display the image in a figure window
  figure;  imshow('rice.png');
  h = impoint(gca,100,200);
 
  % Add a new position callback to set the text string
  h.addNewPositionCallback(@newPos_Callback);
 
  % Construct boundary constraint function so we can't go outside the axes
  fcn = makeConstrainToRectFcn('impoint',get(gca,'XLim'),get(gca,'YLim'));
 
  % Enforce boundary constraint function
  h.setPositionConstraintFcn(fcn);
 
  % Fire callback so we get initial text
  newPos_Callback(h.getPosition());
 
  function newPos_Callback(newPos)
    % Display the current point position in a text label
    h.setString(sprintf('(%1.0f,%1.0f)',newPos(1),newPos(2)))
  end %newPos_Callback
 
end %impointBehavior_R2009b

Unfortunately, when this code is run, dragging the mouse around the axes produces a trail of labels as shown below:

Before - buggy behavior

Opening up impoint.m in the editor and tracing the code revealed that impoint‘s setString method creates a new text object each time it is used. I reported this to MATLAB and the bug is now documented on the MathWorks Support site (574846).

The solution

So how do we work around this bug to get to the behavior we want? One solution would be to rewrite the offending MATLAB code but this is somewhat risky in terms of maintainability and compatibility.

A more elegant solution is to subclass the impoint class and substitute the setString behavior we want. Looking at the impoint code we find that impoint is a subclass of imroi. In the imroi property declarations we see a number of undocumented properties that are protected. We can access these properties in a subclass but not outside the class. One of these undocumented properties is h_group which is an hggroup that contains the handle graphic objects that make up the impoint on the screen. The label, when created, becomes part of this hggroup with its Tag property set to ‘label’. When performing the setString method the behavior we want to see is that if the text object exists we want to update its String property. If it does not exist we want it to perform its existing functionality:

classdef impointtextupdate < impoint
%IMPOINTTEXTUPDATE subclasses impoint to override the setString
%method of impoint so that it does not create a new text object
%each time it is called.   
 
  methods
    function obj = impointtextupdate(varargin)
      obj = obj@impoint(varargin{:});
    end %impointtextupdate
 
    function setString(obj,str)
      %override impoint setString
 
      %check to see if there is already a text label
      label = findobj(obj.h_group,'Type','text','Tag','label');
 
      if isempty(label)
        %do the default behavior
        setString@impoint(obj,str);
      else
        %update the existing tag
        set(label(1),'String',str);
      end %if
    end %setString
 
  end %methods
end %impointtextupdate

Substituting calls to impoint with the new impointupdatetext subclass now produces the desired effect as shown below:

After - expected behavior

Conclusions

This case illustrates a couple of points:

Much of the existing code in the MATLAB toolboxes is being updated to the new object oriented syntax. This presents many opportunities to easily and elegantly modify the default behavior without modifying provided toolbox code In the example above we retain all the desirable behavior of impoint while overriding the undesirable behavior.
Many of the properties and methods in the provided toolbox objects are hidden or protected and are undocumented. It takes some simple detective work to find these out through examining the code. MATLAB is very generous in providing much of the existing code openly. Open the functions and classes you use in the editor to really find out how they work. Over the years I’ve learned and adopted a lot of useful MATLAB programming patterns by examining the code in the various toolboxes (there are a few coding horrors as well!).

I hope to explore some other under-documented features of the IPT and other toolboxes in future posts.

Toolbox – Undocumented Matlab

Speeding-up builtin Matlab functions – part 2

Profiling

Optimizing input args pre-processing

Vectorizing the main loop

Related functions

Conclusions

Speeding-up builtin Matlab functions – part 1

Profiling

The speed-up improvement process

Speeding-up stablefit()

Speeding-up stablelike()

Conclusions

Matlab compilation quirks – take 2

Don’t fix it, if it ain’t broke…

Don’t kill it, if it ain’t bad…

Speeding up Matlab-JDBC SQL queries

Simple GUI Tabs for Advanced Matlab Trading App

1. The tabs have to look like tabs

2. The tabs need to be created through GUIDE

3. The tabs should be easy to add and edit

How the code was created

1. Tab settings

2. Changing the figure width

3. Creating tabs

4. The callback function ClickOnTab

Matlab compiler bug and workaround

New book: Accelerating MATLAB Performance

Docs of old Matlab releases

Solving a MATLAB bug by subclassing

The problem

The solution

Conclusions