A few days ago, a reader on StackOverflow asked whether it is possible to improve the performance of Matlab’s built-in ***datenum*** function. This question reminded me of a similar case that I answered exactly two years ago, of improving the performance of the built-in ***ismember*** function.

In both cases, the solution to the performance question can be found by simply using Matlab’s built-in profiler in order to extract just the core processing functionality. It is often found that in a particular situation there is no need for all the validity checks on the input arguments, and under some known limitations we can indeed use the core functionality directly.

In the case of ***ismember***, it turned out that if we are assured in advance that the input data are sorted non-sparse non-NaN values, then we can use the undocumented built-in helper functions ***ismembc*** or ***ismembc2*** for much-improved performance over the standard ***ismember***. Both ***ismembc*** and ***ismembc2*** happen to be mex files, although this is not always the case for helper functions.

Our ***datenum*** case is very similar. It turns out that ***datenum*** uses the undocumented built-in helper function ***dtstr2dtnummx*** for the actual processing: converting a date from text to a floating-point number. As I noted in my response to the StackOverflow question, we can directly use this helper function for improved performance. On my particular computer, ***dtstr2dtnummx*** is over 3 times faster than the standard ***datenum*** function:

```matlab
% Fast - using dtstr2dtnummx
>> tic, for i=1:1000; dateNum=dtstr2dtnummx({'2010-12-12 12:21:12.123'},'yyyy-MM-dd HH:mm:ss'); end; dateNum,toc
dateNum =
          734484.514722222
Elapsed time is 0.218423 seconds.

% Slower - using datenum
>> tic, for i=1:1000; dateNum=datenum({'2010-12-12 12:21:12.123'},'yyyy-mm-dd HH:MM:SS'); end; dateNum,toc
dateNum =
          734484.514722222     % Same value as dtstr2dtnummx - good!
Elapsed time is 0.658352 seconds.    % 3x slower than dtstr2dtnummx - bad!
```

While the difference in timing may appear negligible, if you are using this function to parse a text file with thousands of lines, each with its own timestamp, then these seemingly negligible time differences quickly add up. Of course, this only makes sense to do if you find out (using the profiler again) that this date parsing is a performance hotspot in your particular application. It was indeed such a performance hotspot in one of my applications, as it apparently was also for the original poster on StackOverflow.
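For the text-file scenario described above, a minimal sketch might look like the following. It assumes that all timestamps have already been read into a cell array (the `timestamps` variable and its contents are made up for illustration) and that ***dtstr2dtnummx*** accepts the whole cell array in one call, as ***datenum*** does. The `try`/`catch` fallback is my own precaution for systems where the helper is not reachable, not something that the helper itself requires.

```matlab
% Hypothetical sample data: 1000 identical timestamp strings
timestamps = repmat({'2010-12-12 12:21:12'}, 1000, 1);

try
    % Fast path: undocumented helper, ICU-style format (MM = month, mm = minute)
    dateNums = dtstr2dtnummx(timestamps, 'yyyy-MM-dd HH:mm:ss');
catch
    % Fallback: documented function, datenum-style format (mm = month, MM = minute)
    dateNums = datenum(timestamps, 'yyyy-mm-dd HH:MM:SS');
end
```

Converting the entire cell array in a single call also avoids the per-iteration function-call overhead of the timing loops shown earlier.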

Like ***ismembc***, ***dtstr2dtnummx*** is an internal mex function. On my Windows system it is located in C:\Program Files\Matlab\R2011a\toolbox\matlab\timefun\private\dtstr2dtnummx.mexw32. It will have a different extension on non-Windows systems, but you will easily find it in its containing folder.

To gain access to ***dtstr2dtnummx***, simply add its folder to the Matlab path using the ***addpath*** function, or copy the dtstr2dtnummx.mexw32 file to another folder that is already on your Matlab path.

Note that the string format is different between ***dtstr2dtnummx*** and ***datenum***: In the test case above, ***dtstr2dtnummx*** used `'yyyy-MM-dd HH:mm:ss'`, while ***datenum*** required `'yyyy-mm-dd HH:MM:SS'` (the month, minute and second specifiers differ in letter case). I have no idea why MathWorks did not keep consistent formatting strings. But because of this, we need to be extra careful (example1, example2). If you are interested in finding out how a ***datenum*** format string translates into a ***dtstr2dtnummx*** one, take a look at the helper function ***cnv2icudf***, which is a very readable m-file located in the same folder as ***dtstr2dtnummx***.

To those interested, the folder that contains ***dtstr2dtnummx*** also contains some other interesting date conversion functions, so explore and enjoy!
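To make the format difference concrete, here is a minimal sketch that converts the ***datenum***-style format from the test case into its ***dtstr2dtnummx*** counterpart. It only handles the numeric tokens that appear in that test case (`mm`, `MM`, `SS`) and is not a substitute for ***cnv2icudf***, which covers the full set of specifiers; the variable names are illustrative.

```matlab
datenumFormat = 'yyyy-mm-dd HH:MM:SS';            % datenum convention

% Park the minutes token first, so that the month/minute swap does not collide
icuFormat = strrep(datenumFormat, 'MM', char(1));
icuFormat = strrep(icuFormat, 'mm', 'MM');        % month:   mm -> MM
icuFormat = strrep(icuFormat, char(1), 'mm');     % minutes: MM -> mm
icuFormat = strrep(icuFormat, 'SS', 'ss');        % seconds: SS -> ss

disp(icuFormat)   % displays yyyy-MM-dd HH:mm:ss
```

Formats that contain month names (`mmm`), weekday names or other specifiers would need additional handling, which is exactly what ***cnv2icudf*** is for.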

Perhaps the main lesson that can be learned from this article, and its ***ismembc*** predecessor of two years ago, is that it is very useful to profile the code for performance hotspots. When such a hotspot is found, don’t stop your profiling at the built-in Matlab functions: keep digging in the profiler results and perhaps you’ll find that you can improve performance by taking an internal shortcut.
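As a rough sketch of such a profiling session, the snippet below profiles the ***datenum*** loop from the timing test and lists the most time-consuming functions from the command line. The top-5 cutoff and the variable names are arbitrary choices for illustration; running `profile viewer` instead opens the interactive report, where you can drill down into the internal calls.

```matlab
profile on
for i = 1:1000
    dateNum = datenum({'2010-12-12 12:21:12'}, 'yyyy-mm-dd HH:MM:SS');
end
profile off

% Retrieve the collected statistics and sort the functions by total time
stats = profile('info');
[~, order] = sort([stats.FunctionTable.TotalTime], 'descend');

% Print the (up to) 5 most expensive functions
for k = order(1:min(5, numel(order)))
    fprintf('%-50s %8.3f s\n', ...
            stats.FunctionTable(k).FunctionName, ...
            stats.FunctionTable(k).TotalTime);
end
```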

Have you discovered any other performance shortcuts in a built-in Matlab function? If so, please post a comment to tell us all about it.

I’ve published a C-Mex function for the conversion to date numbers: http://www.mathworks.com/matlabcentral/fileexchange/28093-datestr2num

E.g. for the ‘yyyy-mm-dd HH:MM:SS’ format I get these timings (Matlab 2009a, 1.5GHz Pentium-M, 1000 iterations as in your example):

And for a {1 x 1000} cell string:

The speed comes from two restrictions: 1. The format must be one of a very limited set of the 6 most common formats. 2. The value is not checked for validity: while DATENUM and DTSTR2DTNUMMX recognize ‘2011-04-180’ more or less correctly as the 179th day after 2011-04-01, DateStr2Num fails and does not even catch ‘2011-04-AB’ as an error. Fortunately, the overflow can be ignored when calculating the fractional part for ‘25:61:62’.

Therefore, if you know that the date string is valid, very simple C code can be 100 times faster than DATENUM and 25 times faster than DTSTR2DTNUMMX.

The C-Mex DATENUMMX.c was part of Matlab 6.5. What a pity that modern Matlab versions include fewer such cookies.

Kind regards, Jan

@Jan – thanks for the tip. I remember being impressed with your datestr2num utility back when, but I simply forgot it lately. So your comment is right on.

In general the basic lesson here, as elsewhere in Matlab, is that wherever we can remove unnecessary input formats, options and validity checks, this could greatly increase the performance.

Hi,

The built-in DATENUMMX converts [1 x 6] date vectors 4 times faster than DATENUM. It is included in at least Matlab 5.3 through 2009a; I cannot check this in newer versions. As said already, the source code datenummx.c was shipped with Matlab 6.5.

DATENUMMX is still available in the latest Matlab release (R2011a)

I have another tip for such processing. For certain vectors, it can save a huge amount of time:

Suppose you want to process the vector A and get the results in the variable Result:

Apply the unique function first:

For a vector A with many repeated values, it saved my life.
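A minimal sketch of this idea, assuming that the expensive operation is a ***datenum*** conversion of a cell array with many repeated strings (apart from A and Result, the sample data and variable names are illustrative):

```matlab
% Hypothetical input with repeated timestamps
A = {'2010-12-12 12:21:12'; '2010-12-12 12:21:13'; '2010-12-12 12:21:12'};

% Convert each distinct string only once...
[uniqueStrs, ~, idx] = unique(A);
uniqueNums = datenum(uniqueStrs, 'yyyy-mm-dd HH:MM:SS');
uniqueNums = uniqueNums(:);

% ...then expand the converted values back to the original order and length
Result = uniqueNums(idx);
```

The saving grows with the amount of repetition: if A holds a million strings but only a few thousand distinct values, the costly conversion runs only a few thousand times.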

Hello,

Mathworks seem to have done some black magic – when I used older Matlab (2011), datenum conversion of a time vector (2.1*10^6 entries) with a given format took 600+ seconds. With DTSTR2DTNUMMX, it took 220 s. In Matlab 2013, it takes 28s only and seems to give a correct answer too!

Best,

Jakub

Hi

Any idea where dtstr2dtnummx has disappeared to in the newer versions? I can’t find it in 2015a, for example. I know it’s there because I can call it, but for the life of me I can’t find it.

Regards,

Phil

`which dtstr2dtnummx`

Yair, Thanks so much on this. I have a particular problem that using dtstr2dtnummx doesn’t solve, and I was wondering if you knew of a simple fix. You are correct that dtstr2dtnummx is much faster but if you need milliseconds, this doesn’t seem to catch that. For instance, using your code above but adding milliseconds to the time string gives two different results.

The difference is the added millisecond value. Even adding the “.FFF” to the format string doesn’t seem to catch the milliseconds in the faster case. This must happen outside of the dtstr2dtnummx function.

@David – there is indeed an answer to this but I make it a personal point not to answer any pro-bono questions from JHU-APL following a few cases in previous years where I felt that my goodwill was taken advantage of by your peers. If you want a professional answer to your question then email me for a paid consulting request.