Undocumented Matlab
  • SERVICES
    • Consulting
    • Development
    • Training
    • Gallery
    • Testimonials
  • PRODUCTS
    • IQML: IQFeed-Matlab connector
    • IB-Matlab: InteractiveBrokers-Matlab connector
    • EODML: EODHistoricalData-Matlab connector
    • Webinars
  • BOOKS
    • Secrets of MATLAB-Java Programming
    • Accelerating MATLAB Performance
    • MATLAB Succinctly
  • ARTICLES
  • ABOUT
    • Policies
  • CONTACT
  • SERVICES
    • Consulting
    • Development
    • Training
    • Gallery
    • Testimonials
  • PRODUCTS
    • IQML: IQFeed-Matlab connector
    • IB-Matlab: InteractiveBrokers-Matlab connector
    • EODML: EODHistoricalData-Matlab connector
    • Webinars
  • BOOKS
    • Secrets of MATLAB-Java Programming
    • Accelerating MATLAB Performance
    • MATLAB Succinctly
  • ARTICLES
  • ABOUT
    • Policies
  • CONTACT

Improving fwrite performance

April 24, 2013 11 Comments

Readers of this blog are probably aware by now that I am currently writing my second book, MATLAB Performance Tuning (expected publication date: early 2014, CRC Press). During my work on this book, I encounter many surprising aspects of Matlab performance. In many cases these aspects are not un-documented per-se, but are certainly not well known in the Matlab community. So taking some artistic liberty coupled with some influence over this blog’s owner, I’ll mention some of these interesting discoveries here, even if they are not strictly-speaking undocumented.
Today’s post is about the well-known fwrite function, which is used to write binary data to file. In many cases, using fwrite provides the fastest alternative to saving data files (save(…,’-v6′) coming a close second). This function is in fact so low-level, and is used so often, that some readers may be surprised that its default speed can be improved. Today’s article applies equally to the fprintf function, which is used to save data in text format.
Apparently, there are things to be learned even with such standard low-level functions; there’s a deep moral here I guess.

Flushing and buffering

Unlike C/C++’s implementation, Matlab’s fprintf and fwrite automatically flush the output buffer whenever they are called, even when '\n' is not present in the output stream. This is not mentioned outright in the main documentation, but is stated loud and clear in the official technical support solution 1-PV371.
The only exception to this rule is when the file was fopen‘ed with the 'W' or 'A' specifiers (which for some inexplicable reason is NOT mentioned in the technical solution!), or when outputting to the MATLAB’s Command Window (more precisely, to STDOUT (fid=1) and STDERR (fid=2)). Writing data without buffering in this manner severely degrades I/O performance:

data = randi(250,1e6,1);  % 1M integer values between 1-250
% Standard unbuffered writing - slow
fid = fopen('test.dat', 'wb');
tic, for idx = 1:length(data), fwrite(fid,data(idx)); end, toc
fclose(fid);
  => Elapsed time is 14.006194 seconds.
% Buffered writing – x4 faster
fid = fopen('test.dat', 'Wb');
tic, for idx = 1:length(data), fwrite(fid,data(idx)); end, toc
fclose(fid);
  => Elapsed time is 3.471557 seconds.

data = randi(250,1e6,1); % 1M integer values between 1-250 % Standard unbuffered writing - slow fid = fopen('test.dat', 'wb'); tic, for idx = 1:length(data), fwrite(fid,data(idx)); end, toc fclose(fid); => Elapsed time is 14.006194 seconds. % Buffered writing – x4 faster fid = fopen('test.dat', 'Wb'); tic, for idx = 1:length(data), fwrite(fid,data(idx)); end, toc fclose(fid); => Elapsed time is 3.471557 seconds.

If I were in a generous mood, I could say that we could infer this information from fopen‘s doc page, where it mentions using the 'W' and 'A' permission specifiers to prevent automatic flushing, although it qualifies this with the very misleading statement that these specifiers are “Used with tape drives“. So first of all, who ever uses tape drives with Matlab nowadays?! Secondly, these specifiers are very useful for regular buffered I/O on standard disks and other I/O interfaces. I really think this was a poor choice of words. At the very least some extra clarification about these specifiers could be added.
It was also (IMHO) a poor design choice by MathWorks in the first place to break consistency with the C/C++ implementation for 'w' and associate the functionality to 'W' (and similarly, 'a' vs. 'A'). C’s fopen was in widespread usage for a decade before Matlab was invented, so there is really no excuse, certainly when Matlab’s fopen was so clearly modeled after the C implementation. It would have been more reasonable (again – IMHO) to preserve consistency of 'w' and 'a' for a default of buffered I/O (which is faster!), while providing the non-buffered functionality in 'W' and 'A'.
The vast majority of Matlab users fopen their files using 'w' and not 'W'. Even Matlab’s own documentation always uses 'w' and not 'W'. So coupled with the poorly-worded qualification about the tape drives, and the unintuitive inconsistency with C’s implementation, Matlab users could well be excused for not taking advantage of this feature.

Chunking I/O

The idea of buffering, and the reason behind the speedup above, is that I/O is faster when writing full pages (typically 4KB, but this changes on different platforms) and when bunched together to remove the disk access time between adjacent writes. This idea can be extended by preparing the entire file data in memory, and then using a single fwrite to write everything at once:

fid = fopen('test.dat', 'wb');
tic, fwrite(fid,data); toc
fclose(fid);
  => Elapsed time is 0.014025 seconds.

fid = fopen('test.dat', 'wb'); tic, fwrite(fid,data); toc fclose(fid); => Elapsed time is 0.014025 seconds.

In fact, assembling the entire data in memory, within a long numeric or char array, and then using a single fwrite to save this array to file, is almost as fast as we can expect to get (example). Further improvement lies in optimizing the array assembly (which is CPU and memory-intensive) rather than the I/O itself.
In this example, the I/O was so fast (14mS) that it makes sense to write everything at once. But for enormously large data files and slower disks (I use a local SSD; network hard disks are way slower), writing the entire file’s data in this manner might take long minutes. In such cases, it is advisable to deliberately break up the data into smaller chunks, and fwrite them separately in a loop, all the time providing feedback to the user about the I/O’s progress. This could help improve the operation’s perceived performance. Here’s a bare-bones example:

h = waitbar(0, 'Saving data...', 'Name','Saving data...');
cN = 100;  % number of steps/chunks
% Divide the data into chunks (last chunk is smaller than the rest)
dN = length(data);
dataIdx = [1 : round(dN/cN) : dN, dN+1];  % cN+1 chunk location indexes
% Save the data
fid = fopen('test.dat', 'Wb');
for chunkIdx = 0 : cN-1
   % Update the progress bar
   fraction = chunkIdx/cN;
   msg = sprintf('Saving data... (%d%% done)', round(100*fraction));
   waitbar(fraction, h, msg);
   % Save the next data chunk
   chunkData = data(dataIdx(chunkIdx+1) : dataIdx(chunkIdx+2)-1);
   fwrite(fid,chunkData);
end
fclose(fid);
close(h);

h = waitbar(0, 'Saving data...', 'Name','Saving data...'); cN = 100; % number of steps/chunks % Divide the data into chunks (last chunk is smaller than the rest) dN = length(data); dataIdx = [1 : round(dN/cN) : dN, dN+1]; % cN+1 chunk location indexes % Save the data fid = fopen('test.dat', 'Wb'); for chunkIdx = 0 : cN-1 % Update the progress bar fraction = chunkIdx/cN; msg = sprintf('Saving data... (%d%% done)', round(100*fraction)); waitbar(fraction, h, msg); % Save the next data chunk chunkData = data(dataIdx(chunkIdx+1) : dataIdx(chunkIdx+2)-1); fwrite(fid,chunkData); end fclose(fid); close(h);

Of course, rather than using a plain-ol’ waitbar window, we could integrate a progress bar directly into our GUI. Using my statusbar utility is one way to do it, but there are of course many other possible ways to dynamically present progress:

Dynamically updating progress using the statusbar utility (click for details)
Dynamically updating progress using the statusbar utility

Note: the entire article above applies equally well to fprintf in addition to fwrite. Storing and loading data in binary format (using fwrite/fread) is often faster than text format (using fprintf/fscanf/textscan), so we should generally use text format only if the file needs to be human-readable for any reason.
Do you know of any other trick to store data efficiently? If so, please share it in a comment.
Next week: some surprising performance aspects of Matlab’s save function.

Related posts:

  1. Improving save performance – There are many different ways of improving Matlab's standard save function performance. ...
  2. Improving Simulink performance – Simulink simulation run-time performance can be improved by orders of magnitude by following some simple steps. ...
  3. Improving graphics interactivity – Matlab R2018b added default axes mouse interactivity at the expense of performance. Luckily, we can speed-up the default axes. ...
  4. rmfield performance – The performance of the builtin rmfield function (as with many other builtin functions) can be improved by simple profiling. ...
  5. File deletion memory leaks, performance – Matlab's delete function leaks memory and is also slower than the equivalent Java function. ...
  6. Datenum performance – The performance of the built-in Matlab function datenum can be significantly improved by using an undocumented internal help function...
Performance Pure Matlab
Print Print
« Previous
Next »
11 Responses
  1. Thierry Dalon April 26, 2013 at 00:52 Reply

    Thanks Yair for the post! I was absolutely not aware of this issue. (I’ve always used fprintf with small w)
    I ‘ve already encountered “Out of Memory” in Matlab with big data files and had to go the packaging/chunking way as well so I could write big data to file.

  2. Amro April 28, 2013 at 17:58 Reply

    There is also a mention of this feature on Loren Shure’s blog when it was apparently first introduced: http://blogs.mathworks.com/loren/2006/04/19/high-performance-file-io/

    • Yair Altman April 28, 2013 at 23:18 Reply

      @Amro – thanks for the link. I’m still not sure why TMW didn’t make this feature on by default, as apparently they were well aware of its performance implications for general I/O.

  3. Improving save performance | Undocumented Matlab May 8, 2013 at 12:44 Reply

    […] Two weeks ago I posted an article about improving fwrite’s performance […]

  4. Jan Simon May 22, 2013 at 14:33 Reply

    Thanks to explaining this feature.
    There has been a ‘b’ flag for the permissions in FOPEN in Matlab 5.3, but it has been removed from the documentation for higher versions. Now a missing ‘t’ is enough to trigger the binary mode.

    It is a pleasure to remind you, Yair, that you use an undocumented feature.

  5. Isaac Stoddard May 29, 2013 at 12:28 Reply

    I went through this tuning effort at my previous job about 4 years ago, in 2007b era.
    I would have wished for the statusbar. (Or had you published it to the software library @ TMW?)

    Thanks for the blog, and enjoy a cup of coffee!

    • Yair Altman May 29, 2013 at 12:47 Reply

      @Isaac – I first posted statusbar over 6 years ago, in April 2007…

      Thanks for the coffee 🙂

  6. Anonymous November 22, 2013 at 02:37 Reply

    Just wonderful! As usual 🙂 I’m very likely to start teaching this in my Matlab classes…

  7. Explicit multi-threading in Matlab – part 1 | Undocumented Matlab February 19, 2014 at 13:18 Reply

    […] slow for our specific needs. We could perhaps improve it a bit with some fancy tricks for save or fwrite. But let’s take a different approach today, using multi-threading:Using Java threadsMatlab […]

  8. Joe V May 1, 2015 at 08:18 Reply

    On a related note: When reading binary files, don’t use fseek inside a loop if you can avoid it — it’s very slow. In almost all cases I’ve tried, it’s faster to do

    fread(fid, N, '*uint8');

    fread(fid, N, '*uint8');

    and throw away the result than to do

    fseek(fid, N, 'cof');

    fseek(fid, N, 'cof');

    (although, of course, you can’t move backwards in a file using fread, so you’re stuck with fseek in that case). This seems to be a limitation of the fseek function in the C library, and it can be especially egregious on network drives.

    In general, when doing binary file I/O, touch the file as few times as possible. Reading or writing the entire file in one go is fastest. Make use of fread’s (and fwrite’s) skip parameter when possible. When performance is truly critical, read the data into memory as a uint8 array and parse it yourself using typecast, rather than relying on fread to do the parsing for you. It can be complicated and tricky and error-prone, but it’s tons faster.

  9. Kerim September 30, 2018 at 23:03 Reply

    Very surprising deatail, thank you.
    But it’s strange that you didn’t mention a possibility to improve read/write perfomance by adjusting Java. Is that real?

Leave a Reply
HTML tags such as <b> or <i> are accepted.
Wrap code fragments inside <pre lang="matlab"> tags, like this:
<pre lang="matlab">
a = magic(3);
disp(sum(a))
</pre>
I reserve the right to edit/delete comments (read the site policies).
Not all comments will be answered. You can always email me (altmany at gmail) for private consulting.

Click here to cancel reply.

Useful links
  •  Email Yair Altman
  •  Subscribe to new posts (email)
  •  Subscribe to new posts (feed)
  •  Subscribe to new posts (reader)
  •  Subscribe to comments (feed)
 
Accelerating MATLAB Performance book
Recent Posts

Speeding-up builtin Matlab functions – part 3

Improving graphics interactivity

Interesting Matlab puzzle – analysis

Interesting Matlab puzzle

Undocumented plot marker types

Matlab toolstrip – part 9 (popup figures)

Matlab toolstrip – part 8 (galleries)

Matlab toolstrip – part 7 (selection controls)

Matlab toolstrip – part 6 (complex controls)

Matlab toolstrip – part 5 (icons)

Matlab toolstrip – part 4 (control customization)

Reverting axes controls in figure toolbar

Matlab toolstrip – part 3 (basic customization)

Matlab toolstrip – part 2 (ToolGroup App)

Matlab toolstrip – part 1

Categories
  • Desktop (45)
  • Figure window (59)
  • Guest bloggers (65)
  • GUI (165)
  • Handle graphics (84)
  • Hidden property (42)
  • Icons (15)
  • Java (174)
  • Listeners (22)
  • Memory (16)
  • Mex (13)
  • Presumed future risk (394)
    • High risk of breaking in future versions (100)
    • Low risk of breaking in future versions (160)
    • Medium risk of breaking in future versions (136)
  • Public presentation (6)
  • Semi-documented feature (10)
  • Semi-documented function (35)
  • Stock Matlab function (140)
  • Toolbox (10)
  • UI controls (52)
  • Uncategorized (13)
  • Undocumented feature (217)
  • Undocumented function (37)
Tags
ActiveX (6) AppDesigner (9) Callbacks (31) Compiler (10) Desktop (38) Donn Shull (10) Editor (8) Figure (19) FindJObj (27) GUI (141) GUIDE (8) Handle graphics (78) HG2 (34) Hidden property (51) HTML (26) Icons (9) Internal component (39) Java (178) JavaFrame (20) JIDE (19) JMI (8) Listener (17) Malcolm Lidierth (8) MCOS (11) Memory (13) Menubar (9) Mex (14) Optical illusion (11) Performance (78) Profiler (9) Pure Matlab (187) schema (7) schema.class (8) schema.prop (18) Semi-documented feature (6) Semi-documented function (33) Toolbar (14) Toolstrip (13) uicontrol (37) uifigure (8) UIInspect (12) uitools (20) Undocumented feature (187) Undocumented function (37) Undocumented property (20)
Recent Comments
  • Yair Altman (1 day 17 hours ago): Ren – This is usually the expected behavior, which avoids unnecessary duplications of the Excel process in CPU/memory. If you want to kill the process you can always run...
  • Yair Altman (2 days 7 hours ago): When you use plot() without hold(‘on’), each new plot() clears the axes and draws a new line, so your second plot() of p2 caused the first plot() line (p1) to be...
  • Cesim Dumlu (8 days 14 hours ago): Hello. I am trying to do a gradient plot for multiple functions to be displayed on the same axes and each one is colorcoded by respective colordata, using the same scaling. The...
  • Yair Altman (16 days 17 hours ago): @Veronica – you are using the new version of uitree, which uses HTML-based uifigures, and my post was about the Java-based uitree which uses legacy Matlab figures. For...
  • Veronica Taurino (16 days 17 hours ago): >> [txt1,txt2] ans = ‘abrakadabra’
  • Veronica Taurino (16 days 18 hours ago): Hello, I am just trying to change the uitree node name as you suggested: txt1 = 'abra'; txt2 = 'kadabra'; node.setName([txt1,txt2]); >> "Unrecognized method, property, or...
  • Yair Altman (19 days 17 hours ago): The version of JGraph that you downloaded uses a newer version of Java (11) than the one that Matlab supports (8). You need to either (1) find an earlier version of JGraph that...
  • mrv (19 days 22 hours ago): hello, I used MATLAB 2019b update9, I have add jgraphx.jar to javaclassapth, and restart matlab, but still got errors below: Warning: A Java exception occurred trying to load the...
  • xuejie wu (43 days 18 hours ago): Hi: I’m wondering if i can add my customized section or tab ?
  • Yair Altman (55 days 10 hours ago): @Sagar – use the view(az,el) function to rotate the 3D axes.
  • Sagar Chawla (55 days 10 hours ago): I want to know how to change the x-axis to the z-axis. I mean the position. Like if there is a 3d animated graph then how to change position of the axis. X-axis in place of...
  • Ren (55 days 10 hours ago): I noticed that xlsread will create a hidden and never-dying special server that always has priority when actxGetRunningServer is called. So this cause a problem that no matter how many...
  • Ben Abbott (59 days 1 hour ago): Thanks Yair, it was the second. I didn’t not include the drawnow ()
  • Yair Altman (59 days 4 hours ago): @Ben – it looks perfectly ok (with color gradient and all) on my R2022a… Perhaps you missed some of the steps (e.g. setting the ColorBinding to 'interpolated') or...
  • Ben Abbott (59 days 5 hours ago): The graded color is not working for me using R2021a. The plot “HG2 plot line color, transparency gradient” looks exactly like “Transparent HG2 plot...
Contact us
Undocumented Matlab © 2009 - Yair Altman
This website and Octahedron Ltd. are not affiliated with The MathWorks Inc.; MATLAB® is a registered trademark of The MathWorks Inc.
Scroll to top