A few months ago, I posted an article showing how we can use some internal helper functions to significantly improve the performance of the commonly-used datenum function. The catch is that we must verify certain preconditions before using this method, otherwise we might get incorrect results.
Today I wish to share a related experience that happened to me yesterday, when I needed to improve the performance of a client’s application. When profiling the application, I found that a major performance hotspot was a repeated call to the datestr function, each time with several thousand date items.
datestr is the inverse of datenum: it accepts date values and returns date strings. Unlike datenum, however, datestr does not use a highly-optimized native-code library function that we could call directly. Instead, it loops over all the date values and sequentially applies the requested string pattern to each.
The natural reaction in such a case would perhaps be to vectorize the code (something that MathWorks should arguably have done in the first place). But in this case I used a different solution, which I would like to share today:
In any programming language, Matlab included, one of the most effective performance tips is to cache processing results. Caching often makes the code slightly more complex and less maintainable, but the performance benefits are immediate and significant. In Matlab, the benefits of caching can often surpass even those of vectorization (using both vectorization and caching is of course even better).
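To illustrate the general caching pattern, here is a minimal generic memoization sketch (cachedCompute and slowComputation are hypothetical names; containers.Map requires R2008b or newer):

function result = cachedCompute(key)
    persistent cache
    if isempty(cache)
        cache = containers.Map('KeyType','char', 'ValueType','any');
    end
    if cache.isKey(key)
        result = cache(key);            % cache hit - return the stored result
    else
        result = slowComputation(key);  % cache miss - compute once and store
        cache(key) = result;
    end
end  % cachedCompute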
In the case of datestr, if we can be certain of the precondition that the output string format is always the same, we can cache the results, and even use vectorization. In my case, I plotted historical daily stock-quote data, so I was assured that (1) all dates are integers and (2) I always use the same date-string format ‘dd-mmm-yyyy’.
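For example, the integer-dates precondition could be verified with a quick sanity check before taking the cached path (a sketch; dateVals is the input vector):

assert(all(dateVals == fix(dateVals)), 'expecting integer (whole-day) date values');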
First, let’s define the wrapper function datestr2 with the necessary caching and vectorization:
% datestr2 - faster variant of datestr, for integer date values since 1/1/2000
function dateStrs = datestr2(dateVals,varargin)
    persistent dateStrsCache
    persistent dateValsCache
    if isempty(dateStrsCache)
        % One-time cache preparation: all integer dates from 1-Jan-2000
        % until ~100 days from now, formatted with the first call's format
        origin = datenum('1-Jan-2000');
        dateValsCache = origin : (now+100);
        dateStrsCache = datestr(dateValsCache,varargin{:});
    end
    % Look up the requested dates in the cache (inputs need not be sorted)
    [tf,loc] = ismember(dateVals, dateValsCache);
    if all(tf)
        dateStrs = dateStrsCache(loc,:);            % all found - use the cache
    else
        dateStrs = datestr(dateVals,varargin{:});   % fall back to built-in datestr
    end
end  % datestr2
As can be seen, the first time that datestr2 is called, it computes and caches the datestr values of all the dates since Jan 1, 2000. Subsequent calls to datestr2 simply retrieve the relevant cached values. Note that the input date entries need not be sorted.
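This relies on ismember returning, for each input value, its position within the cache, regardless of input order or duplicates. A tiny sketch with made-up values:

[tf,loc] = ismember([12 10 12], 10:20);
% => tf = [1 1 1], loc = [3 1 3] - i.e., cache rows 3, 1 and 3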
If an input date number is not found in the cache, datestr2 automatically falls back to using the built-in datestr for the entire input list. This could of course be improved to add the new entries to the cache – I leave this as an exercise for the reader.
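For example, here is one possible (untested) sketch of such an extension, which would replace the else-branch of datestr2 above:

    % Append the missing dates to the cache, then redo the lookup
    newVals = unique(dateVals(~tf));                        % dates not yet cached
    dateValsCache = [dateValsCache, newVals(:).'];          % extend the values cache
    dateStrsCache = [dateStrsCache; datestr(newVals,varargin{:})];  % extend the strings cache
    [tf,loc] = ismember(dateVals, dateValsCache);
    dateStrs = dateStrsCache(loc,:);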
The bottom line was a 150-times (!!!) speed improvement for a 1000-item date vector (50ms => 0.3ms on my system):
% Prepare a 1000-vector of dates, starting ~3 years ago until today
>> dateVals = fix(now)+(-1000:0);

% Run the standard datestr function => 50ms
>> tic; s1=datestr(dateVals); toc
Elapsed time is 0.049089 seconds.
>> tic; s1=datestr(dateVals); toc
Elapsed time is 0.048086 seconds.

% Now run our datestr2 function => 0.3ms (the cache is prepared on the first call)
>> tic; s2=datestr2(dateVals); toc
Elapsed time is 0.222031 seconds.   % initial cache preparation takes 222ms
>> tic; s2=datestr2(dateVals); toc
Elapsed time is 0.000313 seconds.   % subsequent datestr2 calls take 0.3ms
>> tic; s2=datestr2(dateVals); toc
Elapsed time is 0.000296 seconds.

% Ensure that the two functions give exactly the same results
>> isequal(s1,s2)
ans =
     1
So what have we learned from this?
- To improve performance we must first profile the code (see the short profiling sketch after this list). Very often the performance bottlenecks occur in non-intuitive, very specific places that can be handled surgically, without any major redesign. In this case, I simply had to replace calls to datestr with datestr2 in the application’s code.
- Vectorization is not always as cost-effective as caching.
- Major performance improvements do NOT necessarily involve undocumented functions or tricks: In fact, today’s post about caching uses fully-documented pure-Matlab code.
- Different performance hotspots can have different solutions: caching, vectorization, internal library functions, undocumented graphics properties, smart property selection, smart function selection, smart indexing, smart parameter selection etc.
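As for the profiling mentioned in the first bullet above, a typical session might look like this (a generic sketch; myApplication stands in for the actual application code):

profile on          % start collecting timing statistics
myApplication();    % exercise the code paths of interest
profile viewer      % open the report and drill down into the hotspots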
In a later post I will show how similar modifications to internal Matlab functions can dramatically improve the performance of the uitable function. Anyone who has tried using uitable with more than a few dozen cells will surely understand why this is important…
Do you have a favorite performance trick not mentioned above? If so, please post a comment.
I love this blog. This is fantastic.
Also, logical indexing is fast, both to write and to process, but if you are only after a select few values within a large/huge array, find may be faster (and less memory-hungry), especially with the ‘k’ parameter.
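For example (a sketch; x is a hypothetical large data vector):

vals = x(x > 0.5);                 % logical indexing: extracts all matching values
idx  = find(x > 0.5, 5, 'first');  % find with k: returns at most the first 5 indices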
Really cool idea. I normally cringe when I see persistent variables, but this seems like a great use of them. Interesting indentation scheme you have there – kind of C-like to have your function guts indented like that.
Yair, good tip – what about separating out the cache creation into its own function, and putting that call into startup.m? Then you’d move the first-run time hit to startup, which is innocuous… You could next try making the cache creation one-time by saving it into a MAT-file stored in prefdir(?), using similar logic to check whether the file exists at the entry to the cache-creation function. This may not be any faster though 🙂
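A minimal sketch of that suggestion (the file name and the inlined cache-creation logic are hypothetical):

cacheFile = fullfile(prefdir, 'datestrCache.mat');
if exist(cacheFile, 'file')
    load(cacheFile, 'dateValsCache', 'dateStrsCache');   % reuse the saved cache
else
    dateValsCache = datenum('1-Jan-2000') : fix(now+100);
    dateStrsCache = datestr(dateValsCache);
    save(cacheFile, 'dateValsCache', 'dateStrsCache');   % persist for future sessions
end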
One power-user tip that used to work is to move any often-called/performance-critical ‘user library’ functions into the Matlab library directory, and then update the function cache. This used to decrease the time required by Matlab to look up the function, although recent improvements may have made this unnecessary in newer versions. A related but little-known tip is that ‘clear all’ also deletes the internal function cache, which can reduce performance, especially for user functions, until the cache is gradually rebuilt as functions are called.
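The cache refresh itself can be done with the documented rehash command (a sketch; assumes the function file was copied into a toolbox folder that is already on the path):

rehash toolboxcache   % rebuild Matlab's cached listing of toolbox-folder functions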
Cheers,
EBS
@Eric – Thanks for the new ideas