- Undocumented Matlab - https://undocumentedmatlab.com/blog_old -
File deletion memory leaks, performance
Posted By Yair Altman On September 5, 2012 | 7 Comments
Last week I wrote about Matlab’s built-in pause function, which not only leaks memory but also appears to be less accurate than the equivalent Java function. Today I write about a very similar case: Matlab’s delete function apparently not only leaks memory but is also slower than the equivalent Java function.
The memory leak in delete was (to the best of my knowledge) originally reported [3] in the CSSM newsgroup and on this blog [4] a few weeks ago. The reporter mentioned that after deleting 760K files using delete, he got a Java Heap Space out-of-memory error. The reported solution was to use the Java equivalent, java.io.File(filename).delete(), which does not leak anything.
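For repeated use, the Java workaround can be wrapped in a small function. The following is only a sketch under stated assumptions: deleteFileJava is a hypothetical name, not a built-in, and the absolute-path handling is an added precaution, since Java typically resolves relative paths against the JVM's startup folder rather than Matlab's current folder. Note also that, unlike delete, java.io.File does not expand wildcards, so it handles one file per call.

```matlab
function ok = deleteFileJava(filename)
% deleteFileJava  Delete a single file via java.io.File (hypothetical helper).
%   ok is true if the file was removed, false otherwise.
    filename = char(filename);
    f = java.io.File(filename);
    if ~f.isAbsolute()
        % Assumption: anchor relative paths to Matlab's current folder,
        % because Java does not follow Matlab's cd
        f = java.io.File(fullfile(pwd, filename));
    end
    % isFile() is false for folders and missing files, so only regular files are deleted
    ok = f.isFile() && f.delete();
end
```

Calling deleteFileJava(tn) in place of delete(tn) returns true when the file was removed and false otherwise, which delete itself does not report.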
I was able to recreate the report on my WinXP R2012a system, and discovered what appears to be a memory leak of ~150 bytes per file. This appears to be a very small number, but multiply by 760K (=111MB) and you can understand the problem. Of course, you can always increase the size of the Java heap used by Matlab (here’s how [5]), but this should only be used as a last resort and certainly not when the solution is so simple.
For those interested, here’s the short test harness that I’ve used to test the memory leak:
function perfTest()
    rt = java.lang.Runtime.getRuntime;
    rt.gc();
    java.lang.Thread.sleep(1000);  % wait 1 sec to give the GC time to finish
    orig = rt.freeMemory;  % in bytes
    testSize = 50000;
    for idx = 1 : testSize
        % Create a temp file
        tn = [tempname '.tmp'];
        fid = fopen(tn,'wt');
        fclose(fid);
        % Delete the temp file
        delete(tn);
        %java.io.File(tn).delete();
    end
    rt.gc();
    java.lang.Thread.sleep(1000);  % wait 1 sec to give the GC time to finish
    free = rt.freeMemory;
    totalLeak = orig - free;
    leakPerCall = totalLeak / testSize
end
I placed it in a function to remove command-prompt-generated fluctuations, but it must still be run several times to smooth the data. The main reason for the changes across runs is the fact that the Java heap is constantly growing and shrinking in a seesaw manner [6], and explicitly calling the garbage collector as I have done does not guarantee that it actually gets performed immediately or fully. By running a large-enough loop, and rerunning the test several times, the results become consistent due to the law of large numbers [7].
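The rerun-and-compare step can itself be scripted. The sketch below is a hypothetical wrapper (repeatLeakTest is not part of the original harness) that repeats the measurement and reports the median across runs. As a deliberate variation, it measures used heap as totalMemory minus freeMemory, on the assumption that freeMemory alone also shifts whenever the JVM resizes the heap.

```matlab
function leaks = repeatLeakTest(numRuns, testSize)
% repeatLeakTest  Repeat the leak measurement (hypothetical wrapper).
%   leaks(k) is the apparent leak of run k, in bytes per delete call.
    rt = java.lang.Runtime.getRuntime;
    leaks = zeros(1, numRuns);
    for runIdx = 1 : numRuns
        rt.gc();
        java.lang.Thread.sleep(1000);  % give the GC time to finish
        usedBefore = rt.totalMemory - rt.freeMemory;  % used heap, in bytes
        for idx = 1 : testSize
            tn = [tempname '.tmp'];
            fid = fopen(tn, 'wt');
            fclose(fid);
            delete(tn);
        end
        rt.gc();
        java.lang.Thread.sleep(1000);  % give the GC time to finish
        usedAfter = rt.totalMemory - rt.freeMemory;
        leaks(runIdx) = (usedAfter - usedBefore) / testSize;
    end
    % The median is less sensitive than the mean to a single noisy run
    fprintf('median: %.1f bytes/call (min %.1f, max %.1f)\n', ...
            median(leaks), min(leaks), max(leaks));
end
```

For example, repeatLeakTest(5, 50000) runs five passes of the 50K loop. Individual runs can still come out negative or inflated when the garbage collector does not cooperate, which is why the spread is printed alongside the median.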
Running the test above with the delete line commented and the java.io.File line uncommented, shows no discernible memory leak.
To monitor Matlab’s Java heap space size in runtime, see my article [8] from several months ago, or use Elmar Tarajan’s memory-monitor utility [9] from the File Exchange.
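For a quick look without any extra tool, the standard java.lang.Runtime methods can be queried directly from Matlab. This is a minimal sketch, not a replacement for the utilities mentioned above; the variable names are illustrative only.

```matlab
rt = java.lang.Runtime.getRuntime;
MB = 2^20;  % bytes per megabyte
usedMB      = (rt.totalMemory - rt.freeMemory) / MB;  % heap currently in use
allocatedMB = rt.totalMemory / MB;                    % heap currently allocated by the JVM
maxMB       = rt.maxMemory / MB;                      % upper limit the heap may grow to
fprintf('Java heap: %.1f MB used, %.1f MB allocated, %.1f MB max\n', ...
        usedMB, allocatedMB, maxMB);
```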
Note: there are numerous online resources about Java’s garbage collector. Here’s one interesting article [10] that I have recently come across.
When running the test function using java.io.File, we notice a significant speedup compared to running using delete. The reason is that (at least on my system, YMMV) delete takes 1.5-2 milliseconds to run while java.io.File only takes 0.4-0.5 ms. Again, this doesn’t seem like much, but multiply by thousands of files and it starts to be appreciable. For our 50K test harness, the difference translates into ~50 seconds, or 40% of the overall time.
Since we’re dealing with file I/O, it is important to run the testing multiple times and within a function (not the Matlab Command Prompt), to get rid of spurious measurement artifacts.
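One way to time the two alternatives side by side is sketched below. The function names compareDeleteTiming and makeTempFiles are hypothetical, and creating the files before the timed loops is an assumption made here so that only the deletion itself is measured. Absolute numbers will differ across systems, disks and Matlab releases.

```matlab
function [msDelete, msJava] = compareDeleteTiming(numFiles)
% compareDeleteTiming  Average per-file deletion time in milliseconds
%   for Matlab's delete and for java.io.File (hypothetical helper).
    namesA = makeTempFiles(numFiles);
    namesB = makeTempFiles(numFiles);

    t = tic;
    for idx = 1 : numFiles
        delete(namesA{idx});
    end
    msDelete = 1000 * toc(t) / numFiles;

    t = tic;
    for idx = 1 : numFiles
        java.io.File(namesB{idx}).delete();
    end
    msJava = 1000 * toc(t) / numFiles;
end

function names = makeTempFiles(numFiles)
% Create numFiles empty temp files and return their full paths
    names = cell(1, numFiles);
    for idx = 1 : numFiles
        names{idx} = [tempname '.tmp'];
        fid = fopen(names{idx}, 'wt');
        fclose(fid);
    end
end
```

Calling [msDelete, msJava] = compareDeleteTiming(5000) a few times in a row gives a feel for the run-to-run spread before drawing conclusions from any single result.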
Have you encountered any other Matlab function, where the equivalent in Java is better? If so, please add a comment [11] below.
Article printed from Undocumented Matlab: https://undocumentedmatlab.com/blog_old
URL to article: https://undocumentedmatlab.com/blog_old/file-deletion-memory-leaks-performance
URLs in this post:
[1] Image: https://undocumentedmatlab.com/feed/
[2] email feed: https://undocumentedmatlab.com/subscribe_email.html
[3] reported: https://www.mathworks.com/matlabcentral/newsreader/view_thread/305515#885698
[4] on this blog: https://undocumentedmatlab.com/blog/matlab-java-memory-leaks-performance/#comment-104833
[5] here’s how: http://www.mathworks.co.uk/support/solutions/en/data/1-18I2C/
[6] growing and shrinking in a seesaw manner: http://www.javaperformancetuning.com/tools/gcviewer/index.shtml
[7] law of large numbers: http://en.wikipedia.org/wiki/Law_of_large_numbers
[8] my article: https://undocumentedmatlab.com/blog/profiling-matlab-memory-usage/
[9] memory-monitor utility: http://www.mathworks.com/matlabcentral/fileexchange/8169-matlab-memory-monitor-v2-4
[10] interesting article: http://middlewaremagic.com/weblogic/?p=6388
[11] add a comment: https://undocumentedmatlab.com/blog/file-deletion-memory-leaks-performance/#respond
[12] Matlab-Java memory leaks, performance : https://undocumentedmatlab.com/blog_old/matlab-java-memory-leaks-performance
[13] Pause for the better : https://undocumentedmatlab.com/blog_old/pause-for-the-better
[14] Matlab installation woes : https://undocumentedmatlab.com/blog_old/matlab-installation-woes
[15] Array resizing performance : https://undocumentedmatlab.com/blog_old/array-resizing-performance
[16] Waiting for asynchronous events : https://undocumentedmatlab.com/blog_old/waiting-for-asynchronous-events
[17] New book: Accelerating MATLAB Performance : https://undocumentedmatlab.com/blog_old/new-book-accelerating-matlab-performance
Copyright © Yair Altman - Undocumented Matlab. All rights reserved.
7 Comments To "File deletion memory leaks, performance"
#1 Comment By Jan Simon On September 7, 2012 @ 1:46 am
In text you explain “*pause* takes 1.5-2 milliseconds to run while java.io.File only takes 0.4-0.5 ms”. Do you really mean *pause* or *delete*?
#2 Comment By Yair Altman On September 7, 2012 @ 5:33 am
@Jan – thanks, sharp eyes! (now corrected)
#3 Comment By Jeremy On June 3, 2013 @ 2:51 pm
Yair- Dug up this post while investigating a similar issue with:
In the Matlab call of exist, they are caching the result so as to make subsequent calls fast, but if you do large numbers of “exist” calls on files with different names (e.g. with constantly increasing index values in the file name), each one of those results is permanently cached. The result is a constant memory “leak” (well, technically memory is being put to good use, so not a leak in that sense, but there is apparently no way to clear that cache and recover the memory). However, using:
works like a charm… AS LONG AS you don’t have the need to search the entire Matlab path. That is the ONE thing the Java call won’t do that the exist() call does.
#4 Comment By YoGabbaGabba On April 17, 2015 @ 1:40 pm
Do we know if this memory leak still exists in Matlab 2013b or newer versions?
If so, if I want to override these files for a project, how do I avoid shadow warnings if I wish to keep the files named the same and on my path? Thanks
#5 Comment By Yair Altman On April 18, 2015 @ 10:19 am
@Bobby – the test script above should be fairly easy for you to run on your specific system to check this. And as noted, you can always use the java.io.File(tn).delete() workaround.
#6 Comment By stefan On June 16, 2015 @ 2:13 am
I’ve got a problem with saving matlab files (.mat). Yesterday I overwrote my existing file. No errors occurred. Today, my file is almost empty (one of 4 folders left).
is there ANY possibility to recover/restore?
#7 Comment By Yair Altman On June 16, 2015 @ 4:42 am
@Stefan – MAT files do not contain history (except if you designed them to contain it), so your only option is to check if this file was backed up.