A few days ago, a reader on StackOverflow asked whether it is possible to improve the performance of Matlab’s built-in *datenum* function. This question reminded me of a similar case that I answered exactly two years ago, of improving the performance of the built-in *ismember* function.

In both cases, the solution to the performance question can be found by simply using Matlab’s built-in profiler to isolate the core processing functionality. It often turns out that in a particular situation there is no need for all of the input-argument validity checks, and that under some known limitations we can use the core functionality directly.

In the case of *ismember*, it turned out that if we know in advance that the input data are sorted, non-sparse, non-NaN values, then we can use the undocumented built-in helper functions *ismembc* or *ismembc2* for much-improved performance over the standard *ismember*. Both *ismembc* and *ismembc2* happen to be mex files, although this is not always the case for helper functions.

Our *datenum* case is very similar. It turns out that *datenum* uses the undocumented built-in helper function *dtstr2dtnummx* for the actual processing: converting a date from text to a floating-point number. As I noted in my response to the StackOverflow question, we can use this helper function directly for improved performance. On my particular computer, *dtstr2dtnummx* is over 3 times faster than the standard *datenum* function:

```matlab
% Fast - using dtstr2dtnummx
>> tic, for i=1:1000; dateNum=dtstr2dtnummx({'2010-12-12 12:21:12.123'},'yyyy-MM-dd HH:mm:ss'); end; dateNum,toc
dateNum =
          734484.514722222
Elapsed time is 0.218423 seconds.

% Slower - using datenum
>> tic, for i=1:1000; dateNum=datenum({'2010-12-12 12:21:12.123'},'yyyy-mm-dd HH:MM:SS'); end; dateNum,toc
dateNum =
          734484.514722222       % Same value as dtstr2dtnummx - good!
Elapsed time is 0.658352 seconds.    % 3x slower than dtstr2dtnummx - bad!
```

While the per-call difference in timing may appear negligible, if you use this function to parse a text file with thousands of lines, each with its own timestamp, these seemingly negligible differences quickly add up. Of course, this only makes sense if you find (using the profiler again) that date parsing is a performance hotspot in your particular application. It was indeed such a hotspot in one of my applications, as it apparently also was for the original poster on StackOverflow.
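For the many-timestamps case, a minimal sketch of a wrapper is shown below. The name `fastDatenum` is hypothetical (not part of Matlab). It assumes that *dtstr2dtnummx* accepts a whole cell array of strings in one call, returning one value per string, and it falls back to the documented *datenum* when the helper is not reachable, so the same code keeps working on installations where the undocumented function is missing. Both format strings are passed explicitly because the two functions use different format syntaxes.

```matlab
function d = fastDatenum(strs, icuFmt, dnFmt)
% fastDatenum  Hypothetical wrapper (not a MathWorks API): convert a cell
% array of date strings to serial date numbers in a single call, using the
% undocumented dtstr2dtnummx helper when it is reachable, else datenum.
%   strs   - cell array of date strings, all in the same format
%   icuFmt - the format in dtstr2dtnummx syntax, e.g. 'yyyy-MM-dd HH:mm:ss'
%   dnFmt  - the same format in datenum syntax,   e.g. 'yyyy-mm-dd HH:MM:SS'
if exist('dtstr2dtnummx', 'file') == 3    % 3 means a MEX file on the path
    % Assumption: dtstr2dtnummx accepts the whole cell array at once and
    % returns one value per string
    d = dtstr2dtnummx(strs, icuFmt);
else
    d = datenum(strs, dnFmt);             % documented, slower fallback
end
d = reshape(d, size(strs));               % same layout as the input cell array
end
```

For example, with the timestamps of a text file already read into a cell array `ts`, a single call `dn = fastDatenum(ts, 'yyyy-MM-dd HH:mm:ss', 'yyyy-mm-dd HH:MM:SS');` replaces a per-line loop.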

Like *ismembc*, *dtstr2dtnummx* is an internal mex function. On my Windows system it is located in C:\Program Files\Matlab\R2011a\toolbox\matlab\timefun\private\dtstr2dtnummx.mexw32. It will have a different extension on non-Windows systems, but you will easily find it in its containing folder.

To gain access to *dtstr2dtnummx*, simply add its folder to the Matlab path using the *addpath* function, or copy the dtstr2dtnummx.mexw32 file to another folder that is already on your Matlab path.

Note that the format strings differ between *dtstr2dtnummx* and *datenum*: in the test case above, *dtstr2dtnummx* used `'yyyy-MM-dd HH:mm:ss'`, while *datenum* required `'yyyy-mm-dd HH:MM:SS'` (note that **mm** and **MM** swap their meanings of month and minutes, and that seconds are **ss** in one and **SS** in the other). I have no idea why MathWorks did not keep the format strings consistent. But because of this, we need to be extra careful (example1, example2). If you are interested in finding out how *datenum* format strings are translated into *dtstr2dtnummx* format strings, take a look at the helper function *cnv2icudf*, which is a very readable m-file located in the same folder as *dtstr2dtnummx*.

To those interested, the folder that contains *dtstr2dtnummx* also contains some other interesting date conversion functions, so explore and enjoy!
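The copy option can be scripted as in the hedged sketch below. It assumes the R2011a folder layout described above and uses a hypothetical destination folder under `tempdir` purely for illustration; in practice you would choose a permanent folder. As far as I know, Matlab does not accept folders named `private` on the path itself (*addpath* warns and skips them), which makes copying the file the more reliable of the two options.

```matlab
% Hypothetical setup sketch: copy the private dtstr2dtnummx MEX file into a
% folder of our own and put that folder on the path.
% Assumption: the timefun\private layout of R2011a described above.
src = fullfile(matlabroot, 'toolbox', 'matlab', 'timefun', 'private', ...
               ['dtstr2dtnummx.' mexext]);   % mexext is e.g. 'mexw32' on 32-bit Windows
destDir = fullfile(tempdir, 'dateHelpers');   % illustrative; use a permanent folder in practice
if exist(src, 'file')
    if ~exist(destDir, 'dir')
        mkdir(destDir);
    end
    copyfile(src, destDir);
    addpath(destDir);
    fprintf('dtstr2dtnummx copied to %s\n', destDir);
else
    warning('dtstr2dtnummx was not found at %s', src);
end
```

Using *mexext* rather than a hard-coded `.mexw32` keeps the same script usable on other platforms, where the extension differs.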

Perhaps the main lesson that can be learned from this article, and its * ismembc* predecessor of two years ago, is that it is very useful to profile the code for performance hotspots. When such a hotspot is found, don’t stop your profiling at the built-in Matlab functions – keep digging in the profiler results and perhaps you’ll find that you can improve performance by taking an internal shortcut.
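As a sketch of that workflow, the snippet below profiles repeated *datenum* calls and lists the five functions with the largest total time, using the documented `profile('info')` results structure. Which helpers show up in the list depends on your Matlab release; the interactive `profile viewer` report shows the same data with drill-down into child functions.

```matlab
% Profile repeated datenum calls to see which (possibly internal) functions
% consume the time.
profile on
for k = 1:1000
    dateNum = datenum({'2010-12-12 12:21:12'}, 'yyyy-mm-dd HH:MM:SS');
end
profile off

stats = profile('info');                          % programmatic access to the results
[~, order] = sort([stats.FunctionTable.TotalTime], 'descend');
for k = order(1:min(5, end))                      % top 5 functions by total time
    fprintf('%-50s %8.3f s\n', stats.FunctionTable(k).FunctionName, ...
            stats.FunctionTable(k).TotalTime);
end
% profile viewer                                  % opens the interactive report
```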

Have you discovered any other performance shortcuts in a built-in Matlab function? If so, please post a comment to tell us all about it.


I’ve published a C-Mex function for the conversion to date numbers: http://www.mathworks.com/matlabcentral/fileexchange/28093-datestr2num

E.g. for the ‘yyyy-mm-dd HH:MM:SS’ format I get these timings (Matlab 2009a, 1.5GHz Pentium-M, 1000 iterations as in your example):

And for a {1 x 1000} cell string:

The speed comes from two simplifications: 1. The format has to be one of a very limited set of the 6 most common formats. 2. The value is not checked for validity: while DATENUM and DTSTR2DTNUMMX interpret '2011-04-180' more or less correctly as the 179th day after 2011-04-01, DateStr2Num fails, and does not even catch '2011-04-AB' as an error. Fortunately, when calculating the fractional part for '25:61:62', the overflow can be ignored.

Therefore, if you know that the date string is valid, very simple C code can be 100 times faster than DATENUM and 25 times faster than DTSTR2DTNUMMX.

The C-Mex DATENUMMX.c was part of Matlab 6.5. What a pity that modern Matlab versions include fewer of such cookies.

Kind regards, Jan
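The same trade-off (a single fixed format and no validity checks) can be sketched in plain Matlab. The function below is a hypothetical illustration of the idea, not Jan's C code: it assumes every string is exactly in `'yyyy-mm-dd HH:MM:SS'` form, reads the digits by fixed character positions, and passes the numbers to the numeric form of *datenum*, which skips all string parsing. Malformed input such as '2011-04-AB' is silently turned into garbage rather than reported.

```matlab
function d = parseFixedDates(strs)
% parseFixedDates  Hypothetical sketch of fixed-position date parsing with
% NO validity checks. Assumes every string in the cell array strs is exactly
% 'yyyy-mm-dd HH:MM:SS' (19 characters); malformed input is not detected.
c = char(strs) - '0';                           % N x 19 matrix of digit values
year   = c(:, 1:4)   * [1000; 100; 10; 1];      % columns 1-4   -> yyyy
month  = c(:, 6:7)   * [10; 1];                 % columns 6-7   -> mm
day    = c(:, 9:10)  * [10; 1];                 % columns 9-10  -> dd
hour   = c(:, 12:13) * [10; 1];                 % columns 12-13 -> HH
minute = c(:, 15:16) * [10; 1];                 % columns 15-16 -> MM
second = c(:, 18:19) * [10; 1];                 % columns 18-19 -> SS
d = datenum(year, month, day, hour, minute, second);  % numeric form: no string parsing
end
```

The separators are never looked at, which is exactly what makes this fast and what makes it unsafe for input that is not known to be well formed.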

@Jan – thanks for the tip. I remember being impressed with your DateStr2Num utility back then, but I simply forgot about it lately. So your comment is right on.

In general, the basic lesson here, as elsewhere in Matlab, is that wherever we can remove unnecessary input formats, options and validity checks, we can greatly improve performance.


Hi,

The built-in DATENUMMX converts [1 x 6] date vectors 4 times faster than DATENUM. It is included in at least Matlab 5.3 through 2009a; I cannot check this in newer versions. As mentioned already, the source code datenummx.c was shipped with Matlab 6.5.

DATENUMMX is still available in the latest Matlab release (R2011a)
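A quick way to check this on your own installation is the hedged timing sketch below. It only calls *datenummx* if `exist` reports that the name is reachable, since it is undocumented and may be absent or placed differently in other releases; the measured speedup will of course depend on your release and machine.

```matlab
% Compare datenum with the undocumented datenummx helper on a [1 x 6] date vector.
% Assumption: datenummx may or may not be reachable; this is checked first.
dv = [2010 12 12 12 21 12.123];
n = 10000;
tic; for k = 1:n, d1 = datenum(dv); end; tDatenum = toc;
if exist('datenummx') > 0
    tic; for k = 1:n, d2 = datenummx(dv); end; tMx = toc;
    fprintf('datenum: %.3f s, datenummx: %.3f s, same result: %d\n', ...
            tDatenum, tMx, abs(d1 - d2) < 1e-9);
else
    disp('datenummx is not reachable on this installation');
end
```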

I have another tip for such processing. For given vectors, it could save a huge amount of time:

Suppose you want to process the vector A and get the results in the variable Result:

Then use the unique function first:

For a vector A with many repeated values, it saved my life.
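The code snippets of this comment did not survive. As a hedged sketch of the idea described (not the commenter's actual code), the example below converts each distinct timestamp only once and then expands the results back using the third output of *unique*. The input data are hypothetical; the gain grows with the number of repeated values in A.

```matlab
% Hypothetical input: 10000 timestamps with only 2 distinct values
A = repmat({'2010-12-12 12:21:12'; '2011-04-01 00:00:00'}, 5000, 1);
% Direct way (converts all 10000 strings): Result = datenum(A, 'yyyy-mm-dd HH:MM:SS');
[uA, ~, j] = unique(A);                       % uA holds each distinct string once; A equals uA(j)
uNum = datenum(uA, 'yyyy-mm-dd HH:MM:SS');    % convert only the distinct strings
Result = reshape(uNum(j), size(A));           % expand back to one value per element of A
```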


Hello,

MathWorks seems to have done some black magic: when I used an older Matlab (2011), datenum conversion of a time vector with 2.1 million entries in a given format took 600+ seconds. With DTSTR2DTNUMMX, it took 220 s. In Matlab 2013, it takes only 28 s and seems to give a correct answer too!

Best,

Jakub