Datenum performance

A few days ago, a reader on StackOverflow asked whether it is possible to improve the performance of Matlab’s built-in datenum function. This question reminded me of a similar case that I answered exactly two years ago, of improving the performance of the built-in ismember function.

In both cases, the solution to the performance question can be found by using Matlab’s built-in profiler to identify and extract just the core processing functionality. In a particular situation there is often no need for all the validity checks that the function performs on its input arguments, and under some known limitations we can use the core functionality directly.
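
As a minimal sketch of this profiling step, the snippet below profiles a loop of datenum calls and lists the functions that consumed the most time. The loop count and the date string are just illustrative values, and which helper functions show up in the report depends on your Matlab release:

```matlab
% Profile a loop of datenum calls to see where the time is actually spent
profile clear
profile on
for k = 1:1000
    d = datenum({'2010-12-12 12:21:12'}, 'yyyy-mm-dd HH:MM:SS');
end
profile off

% Inspect the results programmatically: sort profiled functions by total time
stats = profile('info');
names = {stats.FunctionTable.FunctionName};
times = [stats.FunctionTable.TotalTime];
[~, order] = sort(times, 'descend');
disp(names(order(1:min(5,end)))');   % the (up to) 5 most expensive functions

% Or open the interactive report and drill down into datenum's child functions:
% profile viewer
```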

In the case of ismember, it turned out that if we are assured in advance that the input data are sorted non-sparse non-NaN values, then we can use the undocumented built-in helper functions ismembc or ismembc2 for much-improved performance over the standard ismember. Both ismembc and ismembc2 happen to be mex files, although this is not always the case for helper functions.
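
A small sketch of that ismember shortcut is shown below. The sample values are made up for illustration; since ismembc and ismembc2 are undocumented, they may be missing or behave differently in your Matlab release, so treat this as an assumption to verify rather than a guaranteed API:

```matlab
% The set being searched MUST be sorted, non-sparse and NaN-free
s = sort([10 3 7 5 3]);
a = [3 4 7 11];

tfStandard = ismember(a, s);   % documented function, with full input checks
tfFast     = ismembc(a, s);    % undocumented helper: logical membership flags
idxFast    = ismembc2(a, s);   % undocumented helper: position in s, 0 if absent

% Sanity check: the shortcut must agree with the documented function
assert(isequal(tfStandard, tfFast));
```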

Our datenum case is very similar. It turns out that datenum uses the undocumented built-in helper function dtstr2dtnummx for the actual processing – converting a date from text to floating-point number. As I noted in my response to the StackOverflow question, we can directly use this helper function for improved performance: On my particular computer, dtstr2dtnummx is over 3 times faster than the standard datenum function:

% Fast - using dtstr2dtnummx
>> tic, for i=1:1000; dateNum=dtstr2dtnummx({'2010-12-12 12:21:12.123'},'yyyy-MM-dd HH:mm:ss'); end; dateNum,toc
dateNum =
          734484.514722222
Elapsed time is 0.218423 seconds.
 
% Slower - using datenum
>> tic, for i=1:1000; dateNum=datenum({'2010-12-12 12:21:12.123'},'yyyy-mm-dd HH:MM:SS'); end; dateNum,toc
dateNum =
          734484.514722222   % Same value as dtstr2dtnummx - good!
Elapsed time is 0.658352 seconds.   % 3x slower than dtstr2dtnummx - bad!

While the difference in timing may appear negligible, if you are using this function to parse a text file with thousands of lines, each with its own timestamp, then these seemingly negligible time differences quickly add up. Of course, this only makes sense to do if you find out (using the profiler again) that this date parsing is a performance hotspot in your particular application. It was indeed such a performance hotspot in one of my applications, as it apparently was also for the original poster on StackOverflow.
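
For the text-file scenario, a hedged sketch could convert all the timestamps in a single call rather than one line at a time. The timestamps variable below is hypothetical sample data standing in for strings read from a file, and the fallback to datenum is my own addition for systems where the helper is not on the path:

```matlab
% Hypothetical sample data standing in for timestamps read from a text file
timestamps = {'2010-12-12 12:21:12'; '2010-12-12 12:21:13'; '2010-12-13 00:00:00'};

if exist('dtstr2dtnummx', 'file') == 3   % 3 means a MEX-file on the Matlab path
    % Fast path: note the ICU-style format string (MM = month, mm = minute)
    dateNums = dtstr2dtnummx(timestamps, 'yyyy-MM-dd HH:mm:ss');
else
    % Fallback: the documented function with its own format convention
    dateNums = datenum(timestamps, 'yyyy-mm-dd HH:MM:SS');
end
```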

Like ismembc, dtstr2dtnummx is an internal mex function. On my Windows system it is located in C:\Program Files\Matlab\R2011a\toolbox\matlab\timefun\private\dtstr2dtnummx.mexw32. It will have a different extension on non-Windows systems, but you will easily find it in its containing folder.

To gain access to dtstr2dtnummx, simply add its folder to the Matlab path using the addpath function, or copy the dtstr2dtnummx.mexw32 file to another folder that is already on your Matlab path.
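
The copy option could be sketched as follows. I have assumed the copy route here because some Matlab releases may not accept a private folder on the path. The source location is the one from my R2011a installation and may differ in other releases, and destDir is a hypothetical folder name chosen only for illustration:

```matlab
% Assumption: the helper lives in timefun/private, as in R2011a
srcFile = fullfile(matlabroot, 'toolbox', 'matlab', 'timefun', 'private', ...
                   ['dtstr2dtnummx.' mexext]);   % mexext = platform MEX extension

% Hypothetical destination folder; any folder on your Matlab path will do
destDir = fullfile(tempdir, 'fastdate');

if exist(srcFile, 'file')
    if ~exist(destDir, 'dir')
        mkdir(destDir);
    end
    copyfile(srcFile, destDir);   % copy the MEX-file out of the private folder
    addpath(destDir);             % make the copy callable from anywhere
else
    warning('fastdate:notFound', 'dtstr2dtnummx not found at %s', srcFile);
end
```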

Note that the format string differs between dtstr2dtnummx and datenum: in the test case above, dtstr2dtnummx used 'yyyy-MM-dd HH:mm:ss', while datenum required 'yyyy-mm-dd HH:MM:SS'. I have no idea why MathWorks did not keep the format strings consistent, but because of this we need to be extra careful (example1, example2). If you are interested in finding out how a datenum format string translates into a dtstr2dtnummx format string, take a look at the helper function cnv2icudf, a very readable m-file located in the same folder as dtstr2dtnummx.
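
One way to be careful is to verify a format-string pair once against datenum before relying on the shortcut. The following is only a sketch: it assumes that dtstr2dtnummx is already accessible on the Matlab path, and the variable names are illustrative:

```matlab
% The same timestamp, described in each function's own format convention
str        = '2010-12-12 12:21:12';
fmtDatenum = 'yyyy-mm-dd HH:MM:SS';   % datenum: mm = month, MM = minute
fmtHelper  = 'yyyy-MM-dd HH:mm:ss';   % dtstr2dtnummx: MM = month, mm = minute

expected = datenum(str, fmtDatenum);
actual   = dtstr2dtnummx({str}, fmtHelper);

% If the two format strings do not describe the same layout, this check fails
assert(abs(expected - actual) < 1e-8, 'Format strings disagree');
```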

To those interested, the folder that contains dtstr2dtnummx also contains some other interesting date conversion functions, so explore and enjoy!

Perhaps the main lesson that can be learned from this article, and its ismembc predecessor of two years ago, is that it is very useful to profile the code for performance hotspots. When such a hotspot is found, don’t stop your profiling at the built-in Matlab functions – keep digging in the profiler results and perhaps you’ll find that you can improve performance by taking an internal shortcut.

Have you discovered any other performance shortcuts in a built-in Matlab function? If so, please post a comment to tell us all about it.


Categories: Medium risk of breaking in future versions, Stock Matlab function, Undocumented function


9 Responses to Datenum performance

  1. Jan Simon says:

    I’ve published a C-Mex function for the conversion to date numbers: http://www.mathworks.com/matlabcentral/fileexchange/28093-datestr2num
    E.g. for the ‘yyyy-mm-dd HH:MM:SS’ format I get these timings (Matlab 2009a, 1.5GHz Pentium-M, 1000 iterations as in your example):

    DATENUM: 0.93 sec, DTSTR2DTNUMMX: 0.21 sec, DateStr2Num: 0.0087 sec

    And for a {1 x 1000} cell string:

    DATENUM: 2.52 sec, DTSTR2DTNUMMX: 1.77 sec, DateStr2Num: 0.027 sec

    The speed is based on two methods: 1. The format has to be one of a very limited set of the 6 most common formats. 2. The value is not checked for validity: While DATENUM and DTSTR2DTNUMMX recognize '2011-04-180' more or less correctly as the 179th day after 2011-04-01, DateStr2Num fails and does not even catch '2011-04-AB' as an error. Fortunately, when calculating the fractional part for '25:61:62', the overflow can be ignored.

    Therefore, if you know that the date string is valid, a very simple C-code can be 100 times faster than DATENUM and 25 times faster than DTSTR2DTNUMMX.

    The C-Mex DATENUMMX.c was part of Matlab 6.5. What a pity that modern Matlab versions include fewer such cookies.

    Kind regards, Jan

    • @Jan – thanks for the tip. I remember being impressed with your datestr2num utility back when, but I simply forgot it lately. So your comment is right on.

      In general the basic lesson here, as elsewhere in Matlab, is that wherever we can remove unnecessary input formats, options and validity checks, this could greatly increase the performance.

  2. Pingback: Datenum performance | Undocumented Matlab | 零度季节

  3. Jan Simon says:

    Hi,

    The built-in DATENUMMX converts [1 x 6] date vectors 4 times faster than DATENUM. It is at least included in Matlab 5.3 to 2009a – I cannot check this in newer versions. As said already, the source code datenummx.c was shipped with Matlab 6.5.

  4. Teegee says:

    I have another Tip for such processing. For given vectors, it could save a huge amount of time:

    You want to process the vector A and get the results in the variable Result:

    Result=datestr(A); % To avoid

    Use the unique function before:

    [b, m, n] = unique(A); % Reduce your vector A
    b=datestr(b); % Apply your function to b which is much smaller
    Result=b(n,:); % Assign it to a vector the same size as A

    For a vector A with a lot of same values it saved my life :-)

  5. Pingback: datestr performance | Undocumented Matlab

  6. Pingback: sprintfc – undocumented helper function | Undocumented Matlab

  7. JakubT says:

    Hello,
    Mathworks seem to have done some black magic – when I used older Matlab (2011), datenum conversion of a time vector (2.1*10^6 entries) with a given format took 600+ seconds. With DTSTR2DTNUMMX, it took 220 s. In Matlab 2013, it takes 28s only and seems to give a correct answer too!
    Best,
    Jakub
