Undocumented Matlab

Improving fwrite performance

April 24, 2013 13 Comments

Readers of this blog are probably aware by now that I am currently writing my second book, MATLAB Performance Tuning (expected publication date: early 2014, CRC Press). During my work on this book, I have encountered many surprising aspects of Matlab performance. In many cases these aspects are not undocumented per se, but they are certainly not well known in the Matlab community. So, taking some artistic liberty coupled with some influence over this blog’s owner, I’ll mention some of these interesting discoveries here, even if they are not, strictly speaking, undocumented.
Today’s post is about the well-known fwrite function, which is used to write binary data to file. In many cases, using fwrite is the fastest way to save data files (save(…,'-v6') coming a close second). This function is in fact so low-level, and is used so often, that some readers may be surprised that its default speed can be improved. Today’s article applies equally to the fprintf function, which is used to save data in text format.
Apparently, there are things to be learned even with such standard low-level functions; there’s a deep moral here I guess.

Flushing and buffering

Unlike C/C++’s implementation, Matlab’s fprintf and fwrite automatically flush the output buffer whenever they are called, even when '\n' is not present in the output stream. This is not mentioned outright in the main documentation, but is stated loud and clear in the official technical support solution 1-PV371.
The only exceptions to this rule are when the file was fopen’ed with the 'W' or 'A' specifiers (which for some inexplicable reason is NOT mentioned in the technical solution!), or when outputting to MATLAB’s Command Window (more precisely, to STDOUT (fid=1) and STDERR (fid=2)). Writing data without buffering, as happens with the default 'w' and 'a' specifiers, severely degrades I/O performance:

data = randi(250,1e6,1);  % 1M integer values between 1-250
% Standard unbuffered writing - slow
fid = fopen('test.dat', 'wb');
tic, for idx = 1:length(data), fwrite(fid,data(idx)); end, toc
fclose(fid);
  => Elapsed time is 14.006194 seconds.
% Buffered writing – x4 faster
fid = fopen('test.dat', 'Wb');
tic, for idx = 1:length(data), fwrite(fid,data(idx)); end, toc
fclose(fid);
  => Elapsed time is 3.471557 seconds.


If I were in a generous mood, I could say that we could infer this information from fopen’s doc page, where it mentions using the 'W' and 'A' permission specifiers to prevent automatic flushing, although it qualifies this with the very misleading statement that these specifiers are “Used with tape drives”. So first of all, who ever uses tape drives with Matlab nowadays?! Secondly, these specifiers are very useful for regular buffered I/O on standard disks and other I/O interfaces. I really think this was a poor choice of words. At the very least some extra clarification about these specifiers could be added.
It was also (IMHO) a poor design choice by MathWorks in the first place to break consistency with the C/C++ implementation for 'w' and associate the functionality to 'W' (and similarly, 'a' vs. 'A'). C’s fopen was in widespread usage for a decade before Matlab was invented, so there is really no excuse, certainly when Matlab’s fopen was so clearly modeled after the C implementation. It would have been more reasonable (again – IMHO) to preserve consistency of 'w' and 'a' for a default of buffered I/O (which is faster!), while providing the non-buffered functionality in 'W' and 'A'.
The vast majority of Matlab users fopen their files using 'w' and not 'W'. Even Matlab’s own documentation always uses 'w' and not 'W'. So coupled with the poorly-worded qualification about the tape drives, and the unintuitive inconsistency with C’s implementation, Matlab users could well be excused for not taking advantage of this feature.
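Since the same flushing behavior applies to fprintf, the comparison above can be repeated for text output. The following is only a minimal sketch: the file name test.txt, the data and the number format are illustrative assumptions, and the timings are printed rather than quoted here because they depend heavily on the disk and platform.

```matlab
% Sketch: unbuffered ('w') vs. buffered ('W') text output with fprintf.
% The file name, data size and number format are illustrative assumptions.
values = rand(1e5,1);

% Lowercase 'w': every fprintf call flushes the output buffer
fid = fopen('test.txt', 'w');
tic
for idx = 1:numel(values)
    fprintf(fid, '%.6f\n', values(idx));
end
tUnbuffered = toc;
fclose(fid);

% Uppercase 'W': no automatic flushing after each call
fid = fopen('test.txt', 'W');
tic
for idx = 1:numel(values)
    fprintf(fid, '%.6f\n', values(idx));
end
tBuffered = toc;
fclose(fid);

fprintf('unbuffered: %.2f sec, buffered: %.2f sec\n', tUnbuffered, tBuffered);
```

Both loops produce the same file contents; only the permission specifier passed to fopen differs.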

Chunking I/O

The idea of buffering, and the reason behind the speedup above, is that I/O is faster when writing full pages (typically 4KB, but this changes on different platforms) and when bunched together to remove the disk access time between adjacent writes. This idea can be extended by preparing the entire file data in memory, and then using a single fwrite to write everything at once:

fid = fopen('test.dat', 'wb');
tic, fwrite(fid,data); toc
fclose(fid);
  => Elapsed time is 0.014025 seconds.


In fact, assembling the entire data in memory, within a long numeric or char array, and then using a single fwrite to save this array to file, is almost as fast as we can expect to get (example). Further improvement lies in optimizing the array assembly (which is CPU and memory-intensive) rather than the I/O itself.
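For text output, assembling the data in memory could look something like the sketch below. It relies on sprintf recycling its format string over all array elements, so one call yields a single long char array. The data, the number format and the file name test.txt are assumptions chosen for illustration only.

```matlab
% Sketch: build the entire text payload in memory, then write it once.
values = rand(1e5,1);   % illustrative data (an assumption, not from the post)

% sprintf reuses the format for every element of values, so a single call
% produces one long char array containing all the output lines
txt = sprintf('%.6f\n', values);

fid = fopen('test.txt', 'w');
fwrite(fid, txt, 'char');   % a single I/O call for the entire file
fclose(fid);
```

With a single write there is nothing left to buffer, so 'w' and 'W' should behave much the same here.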
In the single-fwrite example above, the I/O was so fast (14 ms) that it makes sense to write everything at once. But for enormously large data files and slower disks (I use a local SSD; network hard disks are way slower), writing the entire file’s data in this manner might take many minutes. In such cases, it is advisable to deliberately break up the data into smaller chunks, and fwrite them separately in a loop, all the time providing feedback to the user about the I/O’s progress. This could help improve the operation’s perceived performance. Here’s a bare-bones example:

h = waitbar(0, 'Saving data...', 'Name','Saving data...');
cN = 100;  % number of steps/chunks
% Divide the data into cN nearly-equal chunks
% (linspace always yields exactly cN+1 boundaries, for any data length)
dN = length(data);
dataIdx = round(linspace(1, dN+1, cN+1));  % cN+1 chunk location indexes
% Save the data
fid = fopen('test.dat', 'Wb');
for chunkIdx = 0 : cN-1
   % Update the progress bar
   fraction = chunkIdx/cN;
   msg = sprintf('Saving data... (%d%% done)', round(100*fraction));
   waitbar(fraction, h, msg);
   % Save the next data chunk
   chunkData = data(dataIdx(chunkIdx+1) : dataIdx(chunkIdx+2)-1);
   fwrite(fid,chunkData);
end
fclose(fid);
close(h);
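Where no GUI is available (batch jobs, for instance), the same chunking idea can report its progress in the Command Window instead. This is only a sketch under stated assumptions: the chunk size of 65536 elements is an arbitrary choice, and the onCleanup guard is an addition of this sketch (not part of the loop above) that ensures the file gets closed even if a write fails midway.

```matlab
% Sketch: chunked writing with fixed-size chunks and text-based progress.
data = randi(250, 1e6, 1);     % same kind of data as in the examples above
chunkSize = 65536;             % elements per chunk (arbitrary assumption)
dN = numel(data);
numChunks = ceil(dN / chunkSize);

fid = fopen('test.dat', 'W');
assert(fid ~= -1, 'Could not open test.dat for writing');
cleanupObj = onCleanup(@() fclose(fid));   % close the file even on error

for chunkIdx = 1 : numChunks
    firstIdx = (chunkIdx-1)*chunkSize + 1;
    lastIdx  = min(chunkIdx*chunkSize, dN);   % the last chunk may be shorter
    fwrite(fid, data(firstIdx:lastIdx));
    fprintf('Saved chunk %d of %d (%d%% done)\n', ...
            chunkIdx, numChunks, round(100*lastIdx/dN));
end

clear cleanupObj   % destroys the guard object, which calls fclose(fid)
```

Fixing the chunk size rather than the number of chunks keeps each write at a predictable size regardless of how large the data grows.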


Of course, rather than using a plain-ol’ waitbar window, we could integrate a progress bar directly into our GUI. Using my statusbar utility is one way to do it, but there are of course many other possible ways to dynamically present progress:

Dynamically updating progress using the statusbar utility

Note: the entire article above applies equally well to fprintf in addition to fwrite. Storing and loading data in binary format (using fwrite/fread) is often faster than text format (using fprintf/fscanf/textscan), so we should generally use text format only if the file needs to be human-readable for any reason.
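To make the binary/text trade-off concrete, the sketch below saves the same data both ways, reads it back, and prints the resulting file sizes. Note that fwrite’s default precision is 'uint8' (one byte per element), so the binary round-trip is lossless here only because all values fit within 0..255. The file names are illustrative assumptions.

```matlab
% Sketch: round-trip the same data through binary and text formats.
data = randi(250, 1e6, 1);   % doubles holding integer values 1..250

% Binary: one byte per element with 'uint8' (fwrite's default precision)
fid = fopen('test.dat', 'w');
fwrite(fid, data, 'uint8');
fclose(fid);

fid = fopen('test.dat', 'r');
dataBin = fread(fid, Inf, 'uint8');   % returned as a column of doubles
fclose(fid);

% Text: one value per line (the format is recycled for each element)
fid = fopen('test.txt', 'W');
fprintf(fid, '%d\n', data);
fclose(fid);

fid = fopen('test.txt', 'r');
dataTxt = fscanf(fid, '%d');          % also a column of doubles
fclose(fid);

binInfo = dir('test.dat');
txtInfo = dir('test.txt');
fprintf('binary: %d bytes, text: %d bytes\n', binInfo.bytes, txtInfo.bytes);
assert(isequal(dataBin, data) && isequal(dataTxt, data));
```

For data that does not fit in a single byte, an explicit wider precision (for example 'double') would have to be passed to both fwrite and fread.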
Do you know of any other trick to store data efficiently? If so, please share it in a comment.
Next week: some surprising performance aspects of Matlab’s save function.

13 Responses
  1. Thierry Dalon April 26, 2013 at 00:52

    Thanks Yair for the post! I was absolutely not aware of this issue. (I’ve always used fprintf with small w)
    I’ve already encountered “Out of Memory” in Matlab with big data files and had to go the packaging/chunking way as well so I could write big data to file.

  2. Amro April 28, 2013 at 17:58

    There is also a mention of this feature on Loren Shure’s blog when it was apparently first introduced: http://blogs.mathworks.com/loren/2006/04/19/high-performance-file-io/

    • Yair Altman April 28, 2013 at 23:18

      @Amro – thanks for the link. I’m still not sure why TMW didn’t make this feature on by default, as apparently they were well aware of its performance implications for general I/O.

  3. Improving save performance | Undocumented Matlab May 8, 2013 at 12:44

    […] Two weeks ago I posted an article about improving fwrite’s performance […]

  4. Jan Simon May 22, 2013 at 14:33

    Thanks for explaining this feature.
    There has been a ‘b’ flag for the permissions in FOPEN in Matlab 5.3, but it has been removed from the documentation for higher versions. Now a missing ‘t’ is enough to trigger the binary mode.

    It is a pleasure to remind you, Yair, that you use an undocumented feature.

  5. Isaac Stoddard May 29, 2013 at 12:28

    I went through this tuning effort at my previous job about 4 years ago, in 2007b era.
    I would have wished for the statusbar. (Or had you published it to the software library @ TMW?)

    Thanks for the blog, and enjoy a cup of coffee!

    • Yair Altman May 29, 2013 at 12:47

      @Isaac – I first posted statusbar over 6 years ago, in April 2007…

      Thanks for the coffee 🙂

  6. Anonymous November 22, 2013 at 02:37

    Just wonderful! As usual 🙂 I’m very likely to start teaching this in my Matlab classes…

  7. Explicit multi-threading in Matlab – part 1 | Undocumented Matlab February 19, 2014 at 13:18

    […] slow for our specific needs. We could perhaps improve it a bit with some fancy tricks for save or fwrite. But let’s take a different approach today, using multi-threading: Using Java threads Matlab […]

  8. Joe V May 1, 2015 at 08:18

    On a related note: When reading binary files, don’t use fseek inside a loop if you can avoid it — it’s very slow. In almost all cases I’ve tried, it’s faster to do

    fread(fid, N, '*uint8');

    and throw away the result than to do

    fseek(fid, N, 'cof');

    (although, of course, you can’t move backwards in a file using fread, so you’re stuck with fseek in that case). This seems to be a limitation of the fseek function in the C library, and it can be especially egregious on network drives.

    In general, when doing binary file I/O, touch the file as few times as possible. Reading or writing the entire file in one go is fastest. Make use of fread’s (and fwrite’s) skip parameter when possible. When performance is truly critical, read the data into memory as a uint8 array and parse it yourself using typecast, rather than relying on fread to do the parsing for you. It can be complicated and tricky and error-prone, but it’s tons faster.

  9. Kerim September 30, 2018 at 23:03

    Very surprising detail, thank you.
    But it’s strange that you didn’t mention a possibility to improve read/write performance by adjusting Java. Is that real?

  10. Michelle Kline May 17, 2023 at 21:22

    Thank you, Yair! With this previously-unknown-to-me tip about fwrite() performance, you have saved me literally hours of processing time.
    Michelle Kline
    Department of Radiology and Imaging Research
    University of Utah

  11. Michelle Kline May 17, 2023 at 21:25

    *edit*
    tip about fopen(), not about fwrite(). ‘Wb’ vs. ‘wb’


Undocumented Matlab © 2009 - Yair Altman
This website and Octahedron Ltd. are not affiliated with The MathWorks Inc.; MATLAB® is a registered trademark of The MathWorks Inc.