Datenum performance

A few days ago, a reader on StackOverflow asked whether it is possible to improve the performance of Matlab’s built-in datenum function. This question reminded me of a similar case I answered exactly two years ago, about improving the performance of the built-in ismember function.

In both cases, the answer is found simply by running Matlab’s built-in profiler to isolate the core processing functionality. In a particular situation we often do not need all the input-argument validity checks, and under some known limitations we can call the core functionality directly.
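A minimal sketch of that profiling step, using the article's test string without its milliseconds part. Sorting the profiler's function table by TotalTime is just one way to surface the hotspots. The helper names it reports will depend on your Matlab release.

```matlab
% Profile a datenum-heavy loop to see which internal helpers it calls
profile on
for i = 1:1000
    dateNum = datenum({'2010-12-12 12:21:12'}, 'yyyy-mm-dd HH:MM:SS');
end
profile off

% List the recorded functions, most expensive (total time) first
p = profile('info');
[~, order] = sort([p.FunctionTable.TotalTime], 'descend');
names = {p.FunctionTable(order).FunctionName}';
disp(names(1:min(10, numel(names))))

% Alternatively, browse interactively and drill down from datenum:
% profile viewer
```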

In the case of ismember, it turned out that if we know in advance that the input data are sorted, non-sparse, non-NaN values, then we can use the undocumented built-in helper functions ismembc or ismembc2 for much better performance than the standard ismember. Both ismembc and ismembc2 happen to be mex files, although this is not always the case for helper functions.
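For illustration, here is a sketch comparing the two on inputs that satisfy these preconditions. Since ismembc is undocumented, its behavior on other inputs (unsorted sets, NaNs, sparse arrays) is not guaranteed and may change between releases.

```matlab
% Inputs are real, non-sparse and NaN-free; the searched set is sorted,
% which is the precondition under which ismembc is assumed to be safe.
a = [1 3 5 7];
s = [1 2 3 4 5 6];

tfStandard = ismember(a, s);   % documented, with full input validation
tfFast     = ismembc(a, s);    % undocumented helper, no validation

disp(isequal(tfStandard, tfFast))   % both should mark 1, 3 and 5 as members
```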

Our datenum case is very similar. It turns out that datenum uses the undocumented built-in helper function dtstr2dtnummx for the actual processing: converting a date from text to a floating-point number. As I noted in my response to the StackOverflow question, we can use this helper function directly for improved performance. On my particular computer, dtstr2dtnummx is over 3 times faster than the standard datenum function:

% Fast - using dtstr2dtnummx
>> tic, for i=1:1000; dateNum=dtstr2dtnummx({'2010-12-12 12:21:12.123'},'yyyy-MM-dd HH:mm:ss'); end; dateNum,toc
dateNum =
          734484.514722222
Elapsed time is 0.218423 seconds.
 
% Slower - using datenum
>> tic, for i=1:1000; dateNum=datenum({'2010-12-12 12:21:12.123'},'yyyy-mm-dd HH:MM:SS'); end; dateNum,toc
dateNum =
          734484.514722222   % Same value as dtstr2dtnummx - good!
Elapsed time is 0.658352 seconds.   % 3x slower than dtstr2dtnummx - bad!

While the difference in timing may appear negligible, if you are using this function to parse a text file with thousands of lines, each with its own timestamp, then these seemingly negligible time differences quickly add up. Of course, this only makes sense to do if you find out (using the profiler again) that this date parsing is a performance hotspot in your particular application. It was indeed such a performance hotspot in one of my applications, as it apparently was also for the original poster on StackOverflow.
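When the timestamps come from a file, you can also avoid per-call overhead by passing all the strings to the parser in a single call instead of looping line by line. The sketch below generates 1000 sample timestamps in memory as a stand-in for file contents. It assumes that dtstr2dtnummx is already accessible (see below) and that it accepts a cell array of strings, as the cell-string timings in the comments suggest.

```matlab
% 1000 sample timestamps, one second apart (stand-ins for lines of a log file)
base = datenum(2010, 12, 12, 12, 21, 12);
dn   = base + (0:999)' / 86400;
strs = cellstr(datestr(dn, 'yyyy-mm-dd HH:MM:SS'));   % 1000x1 cell of strings

% One vectorized call each, instead of 1000 separate calls
tic; dnStandard = datenum(strs, 'yyyy-mm-dd HH:MM:SS'); toc
tic; dnFast     = dtstr2dtnummx(strs, 'yyyy-MM-dd HH:mm:ss'); toc   % note the different format string

% Both parsers read the same strings, so results should agree to rounding error
maxDiff = max(abs(dnStandard(:) - dnFast(:)))
```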

Like ismembc, dtstr2dtnummx is an internal mex function. On my Windows system it is located in C:\Program Files\Matlab\R2011a\toolbox\matlab\timefun\private\dtstr2dtnummx.mexw32. It will have a different extension on non-Windows systems, but you will easily find it in its containing folder.

To gain access to dtstr2dtnummx, copy the dtstr2dtnummx.mexw32 file (or its equivalent for your platform) to another folder that is already on your Matlab path. Simply adding its folder with the addpath function will not work, because Matlab does not allow private folders on its path.
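Here is a sketch of the copy step that picks the right extension for the current platform via mexext. It assumes the R2011a folder layout shown above. As one of the comments below notes, later releases may not ship the file in this location. The destination folder under tempdir is a hypothetical choice, so substitute any folder of your own.

```matlab
% Assumption: R2011a folder layout; newer releases may differ
privDir = fullfile(matlabroot, 'toolbox', 'matlab', 'timefun', 'private');
src     = fullfile(privDir, ['dtstr2dtnummx.' mexext]);   % e.g. .mexw32, .mexw64, .mexa64

% Hypothetical destination folder of your own choosing
destDir = fullfile(tempdir, 'myDateHelpers');
if ~exist(destDir, 'dir')
    mkdir(destDir);
end

if exist(src, 'file')
    copyfile(src, destDir);   % copy the mex file out of the private folder
    addpath(destDir);         % put the ordinary destination folder on the path
else
    warning('dtstr2dtnummx was not found in %s', privDir);
end
```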

Note that the format strings differ between dtstr2dtnummx and datenum. In the test case above, dtstr2dtnummx used 'yyyy-MM-dd HH:mm:ss', while datenum required 'yyyy-mm-dd HH:MM:SS': the month and minute codes are swapped, and the seconds code is lowercase. I have no idea why MathWorks did not keep the format strings consistent, but because of this we need to be extra careful (example1, example2). If you are interested in how datenum format strings are translated into dtstr2dtnummx format strings, take a look at the helper function cnv2icudf, a very readable m-file located in the same folder as dtstr2dtnummx.
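The two conventions differ mainly in which letter case denotes months and which denotes minutes. A naive sequence of strrep calls would therefore clobber one swap with the other. The hypothetical helper below is not part of Matlab and is far less complete than cnv2icudf. It handles only the tokens used in this article and swaps them via temporary placeholders.

```matlab
function icuFmt = toIcuFormat(dnFmt)
% TOICUFORMAT  Hypothetical helper (not part of Matlab): translate the few
% datenum format tokens used in this article into the tokens dtstr2dtnummx
% expects. Tokens such as 'mmm' or 'FFF' are NOT handled; use cnv2icudf
% for the real translation.

    % Move month and minute codes to placeholders first so the two swaps
    % do not overwrite each other (strrep is case-sensitive)
    icuFmt = strrep(dnFmt,  'mm', '<MONTH>');   % datenum: mm = month
    icuFmt = strrep(icuFmt, 'MM', '<MIN>');     % datenum: MM = minutes
    icuFmt = strrep(icuFmt, 'SS', 'ss');        % datenum: SS = seconds

    % Write the placeholders back in dtstr2dtnummx's convention
    icuFmt = strrep(icuFmt, '<MONTH>', 'MM');   % dtstr2dtnummx: MM = month
    icuFmt = strrep(icuFmt, '<MIN>',   'mm');   % dtstr2dtnummx: mm = minutes
end
```

For example, toIcuFormat('yyyy-mm-dd HH:MM:SS') should yield 'yyyy-MM-dd HH:mm:ss', the pair of format strings used in the test case above.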

For those interested, the folder that contains dtstr2dtnummx also contains some other interesting date conversion functions, so explore and enjoy!
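To start exploring, you can list the folder's files. This assumes the same R2011a folder layout described earlier, and the listing will simply be empty if your release keeps these files elsewhere.

```matlab
% List the files in the private timefun folder (R2011a layout assumed)
privDir   = fullfile(matlabroot, 'toolbox', 'matlab', 'timefun', 'private');
listing   = dir(privDir);
fileNames = {listing(~[listing.isdir]).name}';   % skip '.', '..' and subfolders
disp(fileNames)
```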

Perhaps the main lesson that can be learned from this article, and its ismembc predecessor of two years ago, is that it is very useful to profile the code for performance hotspots. When such a hotspot is found, don’t stop your profiling at the built-in Matlab functions – keep digging in the profiler results and perhaps you’ll find that you can improve performance by taking an internal shortcut.

Have you discovered any other performance shortcuts in a built-in Matlab function? If so, please post a comment to tell us all about it.

Categories: Medium risk of breaking in future versions, Stock Matlab function, Undocumented function



14 Responses to Datenum performance

  1. Jan Simon says:

    I’ve published a C-Mex function for the conversion to date numbers: http://www.mathworks.com/matlabcentral/fileexchange/28093-datestr2num
    E.g. for the ‘yyyy-mm-dd HH:MM:SS’ format I get these timings (Matlab 2009a, 1.5GHz Pentium-M, 1000 iterations as in your example):

    DATENUM: 0.93 sec, DTSTR2DTNUMMX: 0.21 sec, DateStr2Num: 0.0087 sec

    And for a {1 x 1000} cell string:

    DATENUM: 2.52 sec, DTSTR2DTNUMMX: 1.77 sec, DateStr2Num: 0.027 sec

    The speed comes from two simplifications: 1. the format must be one of a very limited set of the 6 most common formats; 2. the value is not checked for validity. While DATENUM and DTSTR2DTNUMMX interpret '2011-04-180' more or less correctly as the 179th day after 2011-04-01, DateStr2Num fails, and does not even catch '2011-04-AB' as an error. Fortunately, when calculating the fractional part for '25:61:62', the overflow can simply be ignored.

    Therefore, if you know that the date string is valid, very simple C code can be 100 times faster than DATENUM and 25 times faster than DTSTR2DTNUMMX.

    The C-Mex DATENUMMX.c was part of Matlab 6.5. What a pity that modern Matlab versions include fewer of such cookies.

    Kind regards, Jan

    • @Jan – thanks for the tip. I remember being impressed with your datestr2num utility back then, but I had simply forgotten about it lately. So your comment is right on.

      In general the basic lesson here, as elsewhere in Matlab, is that wherever we can remove unnecessary input formats, options and validity checks, this could greatly increase the performance.

  2. Pingback: Datenum performance | Undocumented Matlab | 零度季节

  3. Jan Simon says:

    Hi,

    The built-in DATENUMMX converts [1 x 6] date vectors 4 times faster than DATENUM. It is at least included in Matlab 5.3 to 2009a – I cannot check this in newer versions. As said already, the source code datenummx.c was shipped with Matlab 6.5.

  4. Teegee says:

    I have another tip for such processing. For vectors with many repeated values, it can save a huge amount of time:

    Suppose you want to convert the vector A and store the results in the variable Result:

    Result=datestr(A); % To avoid

    Use the unique function before:

    [b, m, n] = unique(A); % Reduce your vector A
    b=datestr(b); % Apply your function to b which is much smaller
    Result=b(n,:); % Assign it to a vector the same size as A

    For a vector A with many repeated values, it saved my life :-)

  5. Pingback: datestr performance | Undocumented Matlab

  6. Pingback: sprintfc – undocumented helper function | Undocumented Matlab

  7. JakubT says:

    Hello,
    Mathworks seem to have done some black magic – when I used older Matlab (2011), datenum conversion of a time vector (2.1*10^6 entries) with a given format took 600+ seconds. With DTSTR2DTNUMMX, it took 220 s. In Matlab 2013, it takes 28s only and seems to give a correct answer too!
    Best,
    Jakub

  8. Phillip says:

    Hi

    Any idea where dtstr2dtnummx has disappeared to in newer versions? I can’t find it in R2015a, for example. I know it’s there because I can call it, but for the life of me I can’t find it.

    Regards,
    Phil

  9. David Long says:

    Yair, thanks so much for this. I have a particular problem that using dtstr2dtnummx doesn’t solve, and I was wondering if you knew of a simple fix. You are correct that dtstr2dtnummx is much faster, but if you need milliseconds, it doesn’t seem to catch them. For instance, using your code above but adding milliseconds to the format string gives two different results.

    tic, 
    for i=1:1000 
        dateNum=dtstr2dtnummx({'2010-12-12 12:21:12.123'},'yyyy-MM-dd HH:mm:ss.FFF');
    end; 
    dateNum
    toc
     
    dateNum =
              734484.514722222
     
    Elapsed time is 0.099181 seconds.
     
    % Slower - using datenum
    tic
    for i=1:1000 
        dateNum=datenum({'2010-12-12 12:21:12.123'},'yyyy-mm-dd HH:MM:SS.FFF');
    end; 
    dateNum
    toc
     
    dateNum =
              734484.514723646
     
    Elapsed time is 0.172265 seconds.

    The difference is the added millisecond value. Even adding the “.FFF” to the format string doesn’t seem to catch the milliseconds in the faster case. This must happen outside of the dtstr2dtnummx function.

    • @David – there is indeed an answer to this but I make it a personal point not to answer any pro-bono questions from JHU-APL following a few cases in previous years where I felt that my goodwill was taken advantage of by your peers. If you want a professional answer to your question then email me for a paid consulting request.

    • David says:

      Yair, sorry for the late response and sorry for your previous experience dealing with APL. Not sure what happened, but APL usually has very nice people. On my question and the paid consultation issue… I would if I could, but I’m just an engineer and that is way above my pay grade. Anyway, thanks so much.
