Undocumented Matlab
  • SERVICES
    • Consulting
    • Development
    • Training
    • Gallery
    • Testimonials
  • PRODUCTS
    • IQML: IQFeed-Matlab connector
    • IB-Matlab: InteractiveBrokers-Matlab connector
    • EODML: EODHistoricalData-Matlab connector
    • Webinars
  • BOOKS
    • Secrets of MATLAB-Java Programming
    • Accelerating MATLAB Performance
    • MATLAB Succinctly
  • ARTICLES
  • ABOUT
    • Policies
  • CONTACT
  • SERVICES
    • Consulting
    • Development
    • Training
    • Gallery
    • Testimonials
  • PRODUCTS
    • IQML: IQFeed-Matlab connector
    • IB-Matlab: InteractiveBrokers-Matlab connector
    • EODML: EODHistoricalData-Matlab connector
    • Webinars
  • BOOKS
    • Secrets of MATLAB-Java Programming
    • Accelerating MATLAB Performance
    • MATLAB Succinctly
  • ARTICLES
  • ABOUT
    • Policies
  • CONTACT

Faster csvwrite/dlmwrite

October 3, 2017 5 Comments

Matlab’s builtin functions for exporting (saving) data to output files are quite sub-optimal (as in slowwwwww…). I wrote a few posts about this in the past (how to improve fwrite performance, and save performance). Today I extend the series by showing how we can improve the performance of delimited text output, for example comma-separated (CSV) or tab-separated (TSV/TXT) files.
The basic problem is that Matlab’s dlmwrite function, which can either be used directly, or via the csvwrite function which calls it internally, is extremely inefficient: It processes each input data value separately, in a non-vectorized loop. In the general (completely non-vectorized) case, each data value is separately converted into a string, and is separately sent to disk (using fprintf). In the specific case of real data values with simple delimiters and formatting, row values are vectorized, but in any case the rows are processed in a non-vectorized loop: A newline character is separately exported at the end of each row, using a separate fprintf call, and this has the effect of flushing the I/O to disk each and every row separately, which is of course disastrous for performance. The output file is indeed originally opened in buffered mode (as I explained in my fprintf performance post), but this only helps for outputs done within the row – the newline output at the end of each row forces an I/O flush regardless of how the file was opened. In general, when you read the short source-code of dlmwrite.m you’ll get the distinct feeling that it was written for correctness and maintainability, and some focus on performance (e.g., the vectorization edge-case). But much more could be done for performance it would seem.
This is where Alex Nazarovsky comes to the rescue.

Alex was so bothered by the slow performance of csvwrite and dlmwrite that he created a C++ (MEX) version that runs about enormously faster (30 times faster on my system). He explains the idea in his blog, and posted it as an open-source utility (mex-writematrix) on GitHub.
Usage of Alex’s utility is very easy:

mex_WriteMatrix(filename, dataMatrix, textFormat, delimiter, writeMode);

mex_WriteMatrix(filename, dataMatrix, textFormat, delimiter, writeMode);

where the input arguments are:

  • filename – full path name for file to export
  • dataMatrix – matrix of numeric values to be exported
  • textFormat – format of output text (sprintf format), e.g. '%10.6f'
  • delimiter – delimiter, for example ',' or ';' or char(9) (=tab)
  • writeMode – 'w+' for rewriting file; 'a+' for appending (note the lowercase: uppercase will crash Matlab!)

Here is a sample run on my system, writing a simple CSV file containing 1K-by-1K data values (1M elements, ~12MB text files):

>> data = rand(1000, 1000);  % 1M data values, 8MB in memory, ~12MB on disk
>> tic, dlmwrite('temp1.csv', data, 'delimiter',',', 'precision','%10.10f'); toc
Elapsed time is 28.724937 seconds.
>> tic, mex_WriteMatrix('temp2.csv', data, '%10.10f', ',', 'w+'); toc   % 30 times faster!
Elapsed time is 0.957256 seconds.

>> data = rand(1000, 1000); % 1M data values, 8MB in memory, ~12MB on disk >> tic, dlmwrite('temp1.csv', data, 'delimiter',',', 'precision','%10.10f'); toc Elapsed time is 28.724937 seconds. >> tic, mex_WriteMatrix('temp2.csv', data, '%10.10f', ',', 'w+'); toc % 30 times faster! Elapsed time is 0.957256 seconds.

Alex’s mex_WriteMatrix function is faster even in the edge case of simple formatting where dlmwrite uses vectorized mode (in that case, the file is exported in ~1.2 secs by dlmwrite and ~0.9 secs by mex_WriteMatrix, on my system).

Trapping Ctrl-C interrupts

Alex’s mex_WriteMatrix code includes another undocumented trick that could help anyone else who uses a long-running MEX function, namely the ability to stop the MEX execution using Ctrl-C. Using Ctrl-C is normally ignored in MEX code, but Wotao Yin showed how we can use the undocumented utIsInterruptPending() MEX function to monitor for user interrupts using Ctrl-C. For easy reference, here is a copy of Wotao Yin’s usage example (read his webpage for additional details):

/* A demo of Ctrl-C detection in mex-file by Wotao Yin. Jan 29, 2010. */
#include "mex.h"
#if defined (_WIN32)
    #include <windows.h>
#elif defined (__linux__)
    #include <unistd.h>
#endif
#ifdef __cplusplus
    extern "C" bool utIsInterruptPending();
#else
    extern bool utIsInterruptPending();
#endif
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    int count = 0;
    while(1) {
        #if defined(_WIN32)
            Sleep(1000);        /* Sleep one second */
        #elif defined(__linux__)
            usleep(1000*1000);  /* Sleep one second */
        #endif
        mexPrintf("Count = %d\n", count++);  /* print count and increase it by 1 */
        mexEvalString("drawnow;");           /* flush screen output */
        if (utIsInterruptPending()) {        /* check for a Ctrl-C event */
            mexPrintf("Ctrl-C Detected. END\n\n");
            return;
        }
        if (count == 10) {
            mexPrintf("Count Reached 10. END\n\n");
            return;
        }
    }
}

/* A demo of Ctrl-C detection in mex-file by Wotao Yin. Jan 29, 2010. */ #include "mex.h" #if defined (_WIN32) #include <windows.h> #elif defined (__linux__) #include <unistd.h> #endif #ifdef __cplusplus extern "C" bool utIsInterruptPending(); #else extern bool utIsInterruptPending(); #endif void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) { int count = 0; while(1) { #if defined(_WIN32) Sleep(1000); /* Sleep one second */ #elif defined(__linux__) usleep(1000*1000); /* Sleep one second */ #endif mexPrintf("Count = %d\n", count++); /* print count and increase it by 1 */ mexEvalString("drawnow;"); /* flush screen output */ if (utIsInterruptPending()) { /* check for a Ctrl-C event */ mexPrintf("Ctrl-C Detected. END\n\n"); return; } if (count == 10) { mexPrintf("Count Reached 10. END\n\n"); return; } } }

Matlab performance webinars


I am offering a couple of webinars about various ways to improve Matlab’s run-time performance:

  • Matlab performance tuning part 1 (3:39 hours, syllabus) – $195 (buy)
  • Matlab performance tuning part 2 (3:43 hours, syllabus) – $195 (buy)
        ==> or buy both Matlab performance tuning webinars for only $345 (buy)

Both the webinar videos and their corresponding slide-decks are available for download. The webinars content is based on onsite training courses that I presented at multiple client locations (details).
Additional Matlab performance tips can be found under the Performance tag in this blog, as well as in my book “Accelerating MATLAB Performance“.
 Email me if you would like additional information on the webinars, or an onsite training course, or about my consulting.

Related posts:

  1. Faster findjobj – the ubiquitous findjobj utility has been significantly improved for speed for the most common use case. ...
  2. MEX ctrl-c interrupt – An undocumented MEX function can help interrupt running MEX functions. ...
  3. Matrix processing performance – Matrix operations performance is affected by internal subscriptions in a counter-intuitive way....
  4. Explicit multi-threading in Matlab part 3 – Matlab performance can be improved by employing POSIX threads in C/C++ code. ...
  5. Preallocation performance – Preallocation is a standard Matlab speedup technique. Still, it has several undocumented aspects. ...
  6. datestr performance – Caching is a simple and very effective means to improve code performance, as demonstrated for the datestr function....
Mex Performance Pure Matlab Undocumented feature
Print Print
« Previous
Next »
5 Responses
  1. AndrewV October 3, 2017 at 20:04 Reply

    Any resources out there for faster csvread (or any other text parsing) functions? I believe there were a few on file exchange but I had no luck with them, either from minimal speed improvements or functions that just do not work.

    • Yair Altman October 3, 2017 at 21:09 Reply

      @Andrew – my advise is to read the entire CSV file into memory (a long string) using fread and then parse the string, for example using textscan. This works well with files that are not huge; with huge files you may want to read them in chunks.

  2. Michal Kvasnicka October 4, 2017 at 12:07 Reply

    @Yair – thanks for very interesting post!

    Any examples how to read/write HUGE text files (CSV) by fread+textscan in chunks?

  3. Saket October 5, 2017 at 10:52 Reply

    @Andrew, @Yair – Try out the Mex Version of CSV Reader from Stanislaw Adaszewski: http://adared.ch/efficient-csv-reader-for-matlab

    You might need to modify the C code as per your need and create a .mex file from the c-code. I have been using it for last 2 Years. This is quick – very quick..!

    • Yair Altman October 11, 2017 at 02:51 Reply

      @Saket – thanks for sharing, useful indeed

Leave a Reply
HTML tags such as <b> or <i> are accepted.
Wrap code fragments inside <pre lang="matlab"> tags, like this:
<pre lang="matlab">
a = magic(3);
disp(sum(a))
</pre>
I reserve the right to edit/delete comments (read the site policies).
Not all comments will be answered. You can always email me (altmany at gmail) for private consulting.

Click here to cancel reply.

Useful links
  •  Email Yair Altman
  •  Subscribe to new posts (feed)
  •  Subscribe to new posts (reader)
  •  Subscribe to comments (feed)
 
Accelerating MATLAB Performance book
Recent Posts

Speeding-up builtin Matlab functions – part 3

Improving graphics interactivity

Interesting Matlab puzzle – analysis

Interesting Matlab puzzle

Undocumented plot marker types

Matlab toolstrip – part 9 (popup figures)

Matlab toolstrip – part 8 (galleries)

Matlab toolstrip – part 7 (selection controls)

Matlab toolstrip – part 6 (complex controls)

Matlab toolstrip – part 5 (icons)

Matlab toolstrip – part 4 (control customization)

Reverting axes controls in figure toolbar

Matlab toolstrip – part 3 (basic customization)

Matlab toolstrip – part 2 (ToolGroup App)

Matlab toolstrip – part 1

Categories
  • Desktop (45)
  • Figure window (59)
  • Guest bloggers (65)
  • GUI (165)
  • Handle graphics (84)
  • Hidden property (42)
  • Icons (15)
  • Java (174)
  • Listeners (22)
  • Memory (16)
  • Mex (13)
  • Presumed future risk (394)
    • High risk of breaking in future versions (100)
    • Low risk of breaking in future versions (160)
    • Medium risk of breaking in future versions (136)
  • Public presentation (6)
  • Semi-documented feature (10)
  • Semi-documented function (35)
  • Stock Matlab function (140)
  • Toolbox (10)
  • UI controls (52)
  • Uncategorized (13)
  • Undocumented feature (217)
  • Undocumented function (37)
Tags
AppDesigner (9) Callbacks (31) Compiler (10) Desktop (38) Donn Shull (10) Editor (8) Figure (19) FindJObj (27) GUI (141) GUIDE (8) Handle graphics (78) HG2 (34) Hidden property (51) HTML (26) Icons (9) Internal component (39) Java (178) JavaFrame (20) JIDE (19) JMI (8) Listener (17) Malcolm Lidierth (8) MCOS (11) Memory (13) Menubar (9) Mex (14) Optical illusion (11) Performance (78) Profiler (9) Pure Matlab (187) schema (7) schema.class (8) schema.prop (18) Semi-documented feature (6) Semi-documented function (33) Toolbar (14) Toolstrip (13) uicontrol (37) uifigure (8) UIInspect (12) uitable (6) uitools (20) Undocumented feature (187) Undocumented function (37) Undocumented property (20)
Recent Comments
Contact us
Captcha image for Custom Contact Forms plugin. You must type the numbers shown in the image
Undocumented Matlab © 2009 - Yair Altman
This website and Octahedron Ltd. are not affiliated with The MathWorks Inc.; MATLAB® is a registered trademark of The MathWorks Inc.
Scroll to top