Improving fwrite performance

Readers of this blog are probably aware by now that I am currently writing my second book, MATLAB Performance Tuning (expected publication date: early 2014, CRC Press). During my work on this book, I encounter many surprising aspects of Matlab performance. In many cases these aspects are not undocumented per se, but are certainly not well known in the Matlab community. So taking some artistic liberty coupled with some influence over this blog’s owner, I’ll mention some of these interesting discoveries here, even if they are not strictly speaking undocumented.

Today’s post is about the well-known fwrite function, which is used to write binary data to file. In many cases, using fwrite provides the fastest alternative to saving data files (save(…,'-v6') coming a close second). This function is in fact so low-level, and is used so often, that some readers may be surprised that its default speed can be improved. Today’s article applies equally to the fprintf function, which is used to save data in text format.

Apparently, there are things to be learned even with such standard low-level functions; there’s a deep moral here I guess.

Flushing and buffering

Unlike C/C++’s implementation, Matlab’s fprintf and fwrite automatically flush the output buffer whenever they are called, even when '\n' is not present in the output stream. This is not mentioned outright in the main documentation, but is stated loud and clear in the official technical support solution 1-PV371.

The only exceptions to this rule are when the file was fopen’ed with the 'W' or 'A' specifiers (which for some inexplicable reason is NOT mentioned in the technical solution!), or when outputting to MATLAB’s Command Window (more precisely, to STDOUT (fid=1) and STDERR (fid=2)). Writing data without buffering, as happens with the default 'w' and 'a' specifiers, severely degrades I/O performance:

data = randi(250,1e6,1);  % 1M integer values between 1-250
 
% Standard unbuffered writing - slow
fid = fopen('test.dat', 'wb');
tic, for idx = 1:length(data), fwrite(fid,data(idx)); end, toc
fclose(fid);
  => Elapsed time is 14.006194 seconds.
 
% Buffered writing – x4 faster
fid = fopen('test.dat', 'Wb');
tic, for idx = 1:length(data), fwrite(fid,data(idx)); end, toc
fclose(fid);
  => Elapsed time is 3.471557 seconds.

If I were in a generous mood, I could say that we could infer this information from fopen’s doc page, where it mentions using the 'W' and 'A' permission specifiers to prevent automatic flushing, although it qualifies this with the very misleading statement that these specifiers are “Used with tape drives”. So first of all, who ever uses tape drives with Matlab nowadays?! Secondly, these specifiers are very useful for regular buffered I/O on standard disks and other I/O interfaces. I really think this was a poor choice of words. At the very least some extra clarification about these specifiers could be added.

It was also (IMHO) a poor design choice by MathWorks in the first place to break consistency with the C/C++ implementation for 'w' and associate the functionality to 'W' (and similarly, 'a' vs. 'A'). C’s fopen was in widespread usage for a decade before Matlab was invented, so there is really no excuse, certainly when Matlab’s fopen was so clearly modeled after the C implementation. It would have been more reasonable (again – IMHO) to preserve consistency of 'w' and 'a' for a default of buffered I/O (which is faster!), while providing the non-buffered functionality in 'W' and 'A'.

The vast majority of Matlab users fopen their files using 'w' and not 'W'. Even Matlab’s own documentation always uses 'w' and not 'W'. So coupled with the poorly-worded qualification about the tape drives, and the unintuitive inconsistency with C’s implementation, Matlab users could well be excused for not taking advantage of this feature.
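The 'A' specifier works the same way for appending. As a rough illustration only, here is a minimal sketch that appends text lines to a log file opened with 'A' and then reads the file back to confirm that everything reached the disk once the file was closed. The file name and the line format are made up for this example.

```matlab
% Minimal sketch: buffered appending with 'A' (file name is hypothetical)
logFile = fullfile(tempdir, 'fwrite_demo.log');
if exist(logFile, 'file'), delete(logFile); end

fid = fopen(logFile, 'A');   % append, without automatic flushing
assert(fid ~= -1, 'Could not open %s', logFile);
for idx = 1 : 1000
    fprintf(fid, 'line %d\n', idx);   % buffered, not flushed on each call
end
fclose(fid);   % closing the file flushes whatever is still in the buffer

% Read the file back and count the line-feed characters
txt = fileread(logFile);
numLines = sum(txt == char(10));
delete(logFile);
```

Replacing 'A' with 'a' should produce an identical file, only with a flush after every fprintf call.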

Chunking I/O

The idea of buffering, and the reason behind the speedup above, is that I/O is faster when writing full pages (typically 4KB, but this changes on different platforms) and when bunched together to remove the disk access time between adjacent writes. This idea can be extended by preparing the entire file data in memory, and then using a single fwrite to write everything at once:

fid = fopen('test.dat', 'wb');
tic, fwrite(fid,data); toc
fclose(fid);
  => Elapsed time is 0.014025 seconds.

In fact, assembling the entire data in memory, within a long numeric or char array, and then using a single fwrite to save this array to file, is almost as fast as we can expect to get. Further improvement lies in optimizing the array assembly (which is CPU and memory-intensive) rather than the I/O itself.
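For the char-array case, the idea might look something like the following sketch. It assumes the text fits comfortably in memory, uses a smaller data set than the one above, and writes to a temporary file whose name is arbitrary. The read-back at the end is only there as a sanity check.

```matlab
% Minimal sketch: assemble all the text in memory, then write it in one call
data = randi(250, 1e5, 1);      % same kind of data as above, but smaller
fileName = [tempname '.txt'];   % hypothetical temporary file

% sprintf recycles the format for each element, giving one long char array
str = sprintf('%d\n', data);

fid = fopen(fileName, 'w');
fwrite(fid, str, 'char');       % a single I/O call for the whole file
fclose(fid);

% Sanity check: parse the file back into numbers
readBack = sscanf(fileread(fileName), '%d');
delete(fileName);
isSame = isequal(readBack, data);
```

Here the cost moves from the disk to the sprintf call, which is the array-assembly step mentioned above.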

In this example, the I/O was so fast (14 ms) that it makes sense to write everything at once. But for enormously large data files and slower disks (I use a local SSD; network hard disks are way slower), writing the entire file’s data in this manner might take long minutes. In such cases, it is advisable to deliberately break up the data into smaller chunks, and fwrite them separately in a loop, all the time providing feedback to the user about the I/O’s progress. This could help improve the operation’s perceived performance. Here’s a bare-bones example:

h = waitbar(0, 'Saving data...', 'Name','Saving data...');
cN = 100;  % number of steps/chunks
 
% Divide the data into cN nearly-equal chunks
% (linspace always yields exactly cN+1 boundaries, even when dN is not a multiple of cN)
dN = length(data);
dataIdx = round(linspace(1, dN+1, cN+1));  % cN+1 chunk location indexes
 
% Save the data
fid = fopen('test.dat', 'Wb');
for chunkIdx = 0 : cN-1
   % Update the progress bar
   fraction = chunkIdx/cN;
   msg = sprintf('Saving data... (%d%% done)', round(100*fraction));
   waitbar(fraction, h, msg);
 
   % Save the next data chunk
   chunkData = data(dataIdx(chunkIdx+1) : dataIdx(chunkIdx+2)-1);
   fwrite(fid,chunkData);
end
 
fclose(fid);
close(h);
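To check that chunk boundaries cover every element exactly once, a GUI-free variant of the loop can be run on a data length that is deliberately not a multiple of the chunk count. This is only a sketch: the boundaries are computed with round(linspace(...)), which is one possible choice, and the variable names edges and readBack are made up for the example.

```matlab
% Minimal sketch: chunked writing without a progress bar, plus a read-back check
data = randi(250, 1037, 1);    % length deliberately not a multiple of cN
cN = 10;                       % number of chunks
dN = numel(data);
edges = round(linspace(1, dN+1, cN+1));   % cN+1 boundaries, from 1 to dN+1

fileName = [tempname '.dat'];  % hypothetical temporary file
fid = fopen(fileName, 'W');    % buffered writing
for k = 1 : cN
    fwrite(fid, data(edges(k) : edges(k+1)-1));   % default precision: uint8
end
fclose(fid);

% Read the file back and compare with the original data
fid = fopen(fileName, 'r');
readBack = fread(fid, Inf, 'uint8');
fclose(fid);
delete(fileName);
isSame = isequal(readBack, data);
```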

Of course, rather than using a plain-ol’ waitbar window, we could integrate a progress bar directly into our GUI. Using my statusbar utility is one way to do it, but there are of course many other possible ways to dynamically present progress:

[Figure: Dynamically updating progress using the statusbar utility]

Note: the entire article above applies equally well to fprintf in addition to fwrite. Storing and loading data in binary format (using fwrite/fread) is often faster than text format (using fprintf/fscanf/textscan), so we should generally use text format only if the file needs to be human-readable for any reason.
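For readers who want to try the fprintf case on their own machine, here is a small sketch that repeats the 'w' vs. 'W' comparison with text output. It uses fewer values than the fwrite benchmark above to keep the run short, writes to temporary files, and makes no claim about the timings you will see, since these depend on the disk and platform.

```matlab
% Minimal sketch: fprintf in a loop, unbuffered ('w') vs. buffered ('W')
data = randi(250, 1e4, 1);
modes = {'w', 'W'};
elapsed = zeros(1, 2);
contents = cell(1, 2);

for m = 1 : 2
    fileName = [tempname '.txt'];   % hypothetical temporary file
    fid = fopen(fileName, modes{m});
    t = tic;
    for idx = 1 : numel(data)
        fprintf(fid, '%d\n', data(idx));
    end
    elapsed(m) = toc(t);
    fclose(fid);
    contents{m} = fileread(fileName);   % keep the text for comparison
    delete(fileName);
end

% Both modes should produce the same file; only the elapsed time differs
fprintf('''w'': %.3f s,  ''W'': %.3f s\n', elapsed);
```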

Do you know of any other trick to store data efficiently? If so, please share it in a comment.

Next week: some surprising performance aspects of Matlab’s save function.

Categories: Low risk of breaking in future versions, Semi-documented feature, Stock Matlab function


11 Responses to Improving fwrite performance

  1. Thierry Dalon says:

    Thanks Yair for the post! I was absolutely not aware of this issue. (I’ve always used fprintf with small w)
    I’ve already encountered “Out of Memory” in Matlab with big data files and had to go the packaging/chunking way as well so I could write big data to file.

  2. Amro says:

    There is also a mention of this feature on Loren Shure’s blog when it was apparently first introduced: http://blogs.mathworks.com/loren/2006/04/19/high-performance-file-io/

    • @Amro – thanks for the link. I’m still not sure why TMW didn’t make this feature on by default, as apparently they were well aware of its performance implications for general I/O.

  3. Pingback: Improving save performance | Undocumented Matlab

  4. Jan Simon says:

    Thanks for explaining this feature.
    There has been a ‘b’ flag for the permissions in FOPEN in Matlab 5.3, but it has been removed from the documentation for higher versions. Now a missing ‘t’ is enough to trigger the binary mode.

    It is a pleasure to remind you, Yair, that you use an undocumented feature.

  5. Isaac Stoddard says:

    I went through this tuning effort at my previous job about 4 years ago, in 2007b era.
    I would have wished for the statusbar. (Or had you published it to the software library @ TMW?)

    Thanks for the blog, and enjoy a cup of coffee!

  6. Anonymous says:

    Just wonderful! As usual :) I’m very likely to start teaching this in my Matlab classes…

  7. Pingback: Explicit multi-threading in Matlab – part 1 | Undocumented Matlab

  8. Joe V says:

    On a related note: When reading binary files, don’t use fseek inside a loop if you can avoid it — it’s very slow. In almost all cases I’ve tried, it’s faster to do

    fread(fid, N, '*uint8');

    and throw away the result than to do

    fseek(fid, N, 'cof');

    (although, of course, you can’t move backwards in a file using fread, so you’re stuck with fseek in that case). This seems to be a limitation of the fseek function in the C library, and it can be especially egregious on network drives.

    In general, when doing binary file I/O, touch the file as few times as possible. Reading or writing the entire file in one go is fastest. Make use of fread’s (and fwrite’s) skip parameter when possible. When performance is truly critical, read the data into memory as a uint8 array and parse it yourself using typecast, rather than relying on fread to do the parsing for you. It can be complicated and tricky and error-prone, but it’s tons faster.

  9. Kerim says:

    Very surprising detail, thank you.
    But it’s strange that you didn’t mention the possibility of improving read/write performance by adjusting Java. Is that real?
