**- Undocumented Matlab - https://undocumentedmatlab.com -**

Preallocation performance

Posted By Yair Altman On May 16, 2012 | __30 Comments__

Array preallocation ^{[3]} is a standard and quite well-known technique for improving Matlab loop run-time performance. Today’s article will show that there is more to even such a simple coding technique than meets the eye.

A note of caution: in the examples that follow, don’t take any reported speedup as an expected value – the actual value may well differ on your system; your mileage may vary. I only mean to show the relative differences between the alternatives.

Memory management has a direct influence on performance. I have already shown some examples of this ^{[4]} in past articles here.

Preallocation solves a basic problem in simple program loops, where an array is iteratively enlarged with new data (dynamic array growth). Unlike other programming languages (such as C, C++, C# or Java) that use static typing, Matlab uses dynamic typing. This means that it is natural and easy to modify array size dynamically during program execution. For example:

```matlab
fibonacci = [0, 1];
for idx = 3 : 100
    fibonacci(idx) = fibonacci(idx-1) + fibonacci(idx-2);
end
```

While this may be simple to program, it is not wise with regard to performance. The reason is that whenever an array is resized (typically enlarged), Matlab allocates an entirely new contiguous block of memory for the array, copying the old values from the previous block to the new, then releasing the old block for potential reuse. This operation takes time to execute. In some cases, this reallocation might require accessing virtual memory and page swaps, which would take an even longer time to execute. If the operation is done in a loop, then performance could quickly drop off a cliff.

The cost of such naïve array growth is theoretically quadratic. This means that multiplying the number of elements by N multiplies the execution time by about N^{2}. The reason for this is that Matlab needs to reallocate N times more than before, and each time takes N times longer due to the larger allocation size (the average block size multiplies by N), and N times more data elements to copy from the old to the new memory blocks.
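The quadratic trend can be made concrete with a rough timing sketch (my own, not from the original article): time naive growth at a few sizes and watch the elapsed time grow faster than linearly with n. On older releases the effect is dramatic; newer releases mitigate it via internal over-allocation, so treat the numbers as illustrative only.

```matlab
% Rough sketch (illustrative; timings are machine- and release-dependent):
% time naive dynamic growth at a few increasing sizes.
for n = [1e4, 2e4, 4e4]
    clear data
    tic
    data = [];
    for idx = 1 : n
        data(idx) = idx;   % array grows on every iteration
    end
    fprintf('n = %d: elapsed %.4f sec\n', n, toc);
end
```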

A very interesting discussion of this phenomenon and various solutions can be found in a newsgroup thread from 2005 ^{[5]}. Three main solutions were presented: preallocation, selective dynamic growth (*allocating headroom*) and using cell arrays. The best solution among these in terms of ease of use and performance is preallocation.

The basic idea of preallocation is to create a data array in the final expected size before actually starting the processing loop. This saves any reallocations within the loop, since all the data array elements are already available and can be accessed. This solution is useful when the final size is known in advance, as the following snippet illustrates:

```matlab
% Regular dynamic array growth:
tic
fibonacci = [0,1];
for idx = 3 : 40000
    fibonacci(idx) = fibonacci(idx-1) + fibonacci(idx-2);
end
toc
=> Elapsed time is 0.019954 seconds.

% Now use preallocation – 5 times faster than dynamic array growth:
tic
fibonacci = zeros(40000,1);
fibonacci(1) = 0;  fibonacci(2) = 1;
for idx = 3 : 40000
    fibonacci(idx) = fibonacci(idx-1) + fibonacci(idx-2);
end
toc
=> Elapsed time is 0.004132 seconds.
```

On pre-R2011a releases the effect of preallocation is even more pronounced: I got a 35-times speedup on the same machine using Matlab 7.1 (R14 SP3). R2011a (Matlab 7.12) had a dramatic performance boost for such cases in the internal accelerator, so newer releases are much faster in dynamic allocations, but preallocation is still 5 times faster even on R2011a.

Because the effect of preallocation is so dramatic on all Matlab releases, it makes sense to utilize it even in cases where the data array’s final size is not known in advance. We can do this by estimating an upper bound to the array’s size, preallocate this large size, and when we’re done remove any excess elements:

```matlab
% The final array size is unknown – assume a 1Kx3K upper bound (~23MB)
data = zeros(1000,3000);   % estimated maximal size
numRows = 0;  numCols = 0;
while (someCondition)
    colIdx = someValue1;  numCols = max(numCols,colIdx);
    rowIdx = someValue2;  numRows = max(numRows,rowIdx);
    data(rowIdx,colIdx) = someOtherValue;
end
% Now remove any excess elements
data(:,numCols+1:end) = [];   % remove excess columns
data(numRows+1:end,:) = [];   % remove excess rows
```

It turns out that MathWorks’ official suggestion ^{[6]} for preallocation, namely using the *zeros* function, is not the most efficient:

```matlab
% MathWorks' suggested variant
clear data1, tic, data1 = zeros(1000,3000); toc
=> Elapsed time is 0.016907 seconds.

% A much faster alternative – 500 times faster!
clear data1, tic, data1(1000,3000) = 0; toc
=> Elapsed time is 0.000034 seconds.
```

The second variant is so much faster because it only allocates the memory, without worrying about the internal values (they get a default of 0, *false* or '' – the empty string – in case you wondered). *zeros*, on the other hand, has to place a value in each of the allocated locations, which takes precious time.
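The default values mentioned above are easy to verify interactively (a quick sketch of mine; the variable names are arbitrary):

```matlab
% Implicit allocation fills all unset elements with the type's default:
clear n;  n(2,3) = 5;      % numeric: other elements default to 0
clear b;  b(2,3) = true;   % logical: other elements default to false
disp(n(1,1))   % displays 0
disp(b(1,1))   % displays 0 (i.e. false)
```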

In most cases the differences are immaterial since the preallocation code would only run once in the program, and an extra 17ms isn’t such a big deal. But in some cases we may have a need to periodically refresh our data, where the extra run-time could quickly accumulate.
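In such refresh scenarios, one way to avoid paying the allocation cost on every pass is to allocate once and then reset the array in place (a sketch of mine, not from the original article):

```matlab
data = zeros(1000,3000);          % allocate once, outside the loop
for iteration = 1 : 100
    data(:) = 0;                  % reset in place – reuses the same memory block
    % data = zeros(1000,3000);    % alternative: would reallocate on every pass
    % ... fill and process data here ...
end
```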

**Update (October 27, 2015)**: As Marshall notes below ^{[7]}, this behavior changed in R2015b, when the new LXE ^{[8]} (Matlab’s new execution engine) replaced the previous engine: *zeros* is now significantly faster than the end-assignment variant shown above.

When we need to preallocate a specific value into every data array element, we cannot use Variant #2. The reason is that Variant #2 only sets the very last data element, and all other array elements get assigned the default value (0, ‘’ or false, depending on the array’s data type). In this case, we can use one of the following alternatives (with their associated timings for a 1000×3000 data array):

```matlab
scalar = pi;  % for example...
data = scalar(ones(1000,3000));           % Variant A: 87.680 msecs
data(1:1000,1:3000) = scalar;             % Variant B: 28.646 msecs
data = repmat(scalar,1000,3000);          % Variant C: 17.250 msecs
data = scalar + zeros(1000,3000);         % Variant D: 17.168 msecs
data(1000,3000) = 0; data = data+scalar;  % Variant E: 16.334 msecs
```

As can be seen, Variants C-E are about twice as fast as Variant B, and 5 times faster than Variant A.

Preallocating non-double data

When preallocating an array of a type that is not *double*, we should be careful to create it using the desired type, to prevent memory and/or performance inefficiencies. For example, if we need to process a large array of small integers (*int8*), it would be wasteful to allocate a *double* array and then convert it – we should allocate the array as *int8* directly:

```matlab
% Bad idea: allocates an 8MB double array, then converts it to a 1MB int8 array
data = int8(zeros(1000,1000));   % 1M elements
=> Elapsed time is 0.008170 seconds.

% Better: directly allocate the array as a 1MB int8 array – x80 faster
data = zeros(1000,1000,'int8');
=> Elapsed time is 0.000095 seconds.
```

To preallocate a cell array we can use the *cell* function (explicit preallocation), or assign to the maximal cell index (implicit preallocation). Explicit preallocation is faster than implicit preallocation, although the two are functionally equivalent (note that this is contrary to the experience with allocation of numeric arrays, shown above):

```matlab
% Variant #1: Explicit preallocation of a 1Kx3K cell array
data = cell(1000,3000);
=> Elapsed time is 0.004637 seconds.

% Variant #2: Implicit preallocation – x3 slower than explicit
clear('data'), data{1000,3000} = [];
=> Elapsed time is 0.012873 seconds.
```

To preallocate an array of structs or class objects, we can use the *repmat* function to replicate copies of a single data element (explicit preallocation), or just assign to the maximal data index (implicit preallocation). In this case, unlike the case of cell arrays, implicit preallocation is much faster than explicit preallocation, since the single element does not actually need to be copied multiple times (ref ^{[9]}):

```matlab
% Variant #1: Explicit preallocation of a 100x300 struct array
element = struct('field1',magic(2), 'field2',{[]});
data = repmat(element, 100, 300);
=> Elapsed time is 0.002804 seconds.

% Variant #2: Implicit preallocation – x7 faster than explicit
element = struct('field1',magic(2), 'field2',{[]});
clear('data'), data(100,300) = element;
=> Elapsed time is 0.000429 seconds.
```

When preallocating structs, we can also use a third variant, which relies on the built-in behavior of the *struct* function to replicate the struct when it is passed a cell array. For example,

`struct('field1',cell(100,1), 'field2',5)`

will create 100 structs, each of them having an empty *field1* and a *field2* value of 5.

When preallocating in general, ensure that you use the maximal expected array size. There is no point in preallocating an empty array, or an array smaller than the expected maximum, since dynamic memory reallocation will automatically kick in within the processing loop. For this reason, do not use ^{[10]} the *empty()* method of class objects to preallocate, but rather *repmat* as explained above.
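As a quick sanity check of the cell-replication form above (the variable name is mine):

```matlab
s = struct('field1',cell(100,1), 'field2',5);
size(s)                 % a 100x1 struct array
isempty(s(1).field1)    % true – field1 is [] in every element
s(100).field2           % 5 – the scalar was replicated into each element
```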

When using *repmat* to replicate class objects, always be careful to note whether you are replicating the object itself (this happens if your class does NOT derive from *handle*), or merely a reference to it (if your class does derive from *handle*). In the latter case, all of the replicated array elements point to the same single object, so modifying one element modifies them all; to get distinct objects you may need to call ^{[11]} the constructor separately for each array element.
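The handle-class caveat can be demonstrated with a minimal handle class (the class name is my own invention; classdef code must live in its own MyCounter.m file on the path):

```matlab
% MyCounter.m – a minimal handle class
classdef MyCounter < handle
    properties
        count = 0;
    end
end
```

Replicating it with *repmat* copies the handle, not the underlying object, so every array element aliases one object:

```matlab
objs = repmat(MyCounter, 1, 5);    % 5 copies of ONE handle
objs(1).count = 99;
objs(5).count                      % also 99 – all elements share the same object
```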

Next week: what if we can’t avoid dynamic array resizing? – apparently, all is not lost. Stay tuned…

Do you have any similar allocation-related tricks that you use, or have you observed unexpected differences such as the ones shown above? If so, then please do post a comment ^{[12]}.

Categories: Low risk of breaking in future versions, Memory, Stock Matlab function, Undocumented feature


Article printed from Undocumented Matlab: **https://undocumentedmatlab.com**

URL to article: **https://undocumentedmatlab.com/blog/preallocation-performance**

URLs in this post:


[3] preallocation: **http://www.mathworks.com/help/techdoc/matlab_prog/f8-784135.html#f8-793781**

[4] some examples of this: **https://undocumentedmatlab.com/blog/matlab-java-memory-leaks-performance/**

[5] newsgroup thread from 2005: **https://www.mathworks.com/matlabcentral/newsreader/view_thread/102704**

[6] official suggestion: **http://www.mathworks.com/help/techdoc/matlab_prog/f8-784135.html#f8-793795**

[7] notes below: **https://undocumentedmatlab.com/blog/preallocation-performance#comment-359956**

[8] the new LXE: **https://undocumentedmatlab.com/blog/callback-functions-performance**

[9] ref: **http://www.mathworks.com/support/solutions/en/data/1-7S1YKO/**

[10] do not use: **http://stackoverflow.com/questions/2510427/how-to-preallocate-an-array-of-class-in-matlab**

[11] need to call: **http://stackoverflow.com/questions/591495/matlab-preallocate-a-non-numeric-vector#591788**

[12] post a comment: **https://undocumentedmatlab.com/blog/preallocation-performance/#respond**

[13] Array resizing performance : **https://undocumentedmatlab.com/blog/array-resizing-performance**

[14] Internal Matlab memory optimizations : **https://undocumentedmatlab.com/blog/internal-matlab-memory-optimizations**

[15] Allocation performance take 2 : **https://undocumentedmatlab.com/blog/allocation-performance-take-2**

[16] Quirks with parfor vs. for : **https://undocumentedmatlab.com/blog/quirks-with-parfor-vs-for**

[17] Plot LimInclude properties : **https://undocumentedmatlab.com/blog/plot-liminclude-properties**

[18] Customizing axes part 2 : **https://undocumentedmatlab.com/blog/customizing-axes-part-2**

[19] : **http://www.psi.toronto.edu/~vincent/matlabindexrepmat.html**

[20] : **http://stackoverflow.com/questions/14169222/faster-way-to-initilize-arrays-via-empty-matrix-multiplication-matlab#comment19679128_14169222**

[21] : **http://undocumentedmatlab.com/blog/allocation-performance-take-2**

[22] : **http://undocumentedmatlab.com/blog/callback-functions-performance**


Copyright © Yair Altman - Undocumented Matlab. All rights reserved.

30 Comments To "Preallocation performance"

#1 Comment by Richard Kennaway on May 16, 2012 @ 5:53 am

> It turns out that MathWorks’ official suggestion for preallocation, namely using the zeros function, is not the most efficient:

Doesn’t work for me (Matlab 2011a on Mac OS 10.6.8):

For “Pre-allocating non-default values”, my timings mostly agree, except that C and D are the other way round:

A 0.076208 seconds.

B 0.010146 seconds.

C 0.024495 seconds.

D 0.014459 seconds.

E 0.012212 seconds.

The other examples all worked the same for me as in the post.

#2 Comment by Cris on May 16, 2012 @ 7:25 am

This is strange! Can you explain why the 2nd variant is only ~100 times faster the first time around? Only after a ‘clear all’ do you get a fast execution again.

#3 Comment by Yair Altman on May 16, 2012 @ 9:39 am

@Chris & Richard: perhaps this is platform or situation dependent. On my Windows XP running both R2010b and R2012a, `data1=zeros(1000,3000)` is consistently hundreds of times slower than `data1(1000,3000)=0`.

Perhaps you are running this on a “warm” Matlab session that is running on fragmented memory? Try running on a new Matlab session that does not clash for system memory with other heavy processes (e.g. a multi-tab browser).

#4 Comment by pedro on May 24, 2012 @ 2:07 am

greetings!

“@Chris & Richard: perhaps this is platform or situation dependent. On my Windows XP running both R2010b and R2012a, data1=zeros(1000,3000) is consistently hundreds of times faster than data1(1000,3000)=0.”

now I’m confused: did you mean slower, by any chance?

cheers,

pedro

#5 Comment by Yair Altman on May 24, 2012 @ 3:10 am

@Pedro – you’re right, it was a typo (now corrected) – thanks

#6 Comment by Daniel on May 17, 2012 @ 5:14 am

I can recreate Yair’s differences between `data1 = zeros(1000,3000);` and `data2(1000,3000) = 0;` on 32-bit Windows XP with R2010a, but not on 64-bit Linux with R2011a. So it looks like it might be OS or 32/64-bit dependent.

But I wanted to explore a little more, so I decided to up the size of the matrix so it would take more time to compute in Linux.

It appears it is faster to preallocate a 1e4 x 1e4 (800MB) matrix than a 1e3 x 1e3 (8MB) matrix.

#7 Comment by Marc on May 16, 2012 @ 2:03 pm

Of course it’s good to profile first, then use this technique if warranted. Otherwise, write clear, simple code that’s easy to reason about. I usually shut off this particular Code Analyzer warning because it’s almost never been an important bottleneck for me. YMMV

#8 Comment by Héctor on May 18, 2012 @ 6:56 am

I run Windows 7, Matlab R2010a. My Matlab has been doing things for 2 days and was released from heavy duty 5 minutes ago. These are my results:

#9 Comment by Alexander on May 20, 2012 @ 11:44 pm

Dear Yair,

in your nice post you used the code

I’ve never seen this way of using a scalar, mapping it to an array. Is there any passage in the documentation you could cite or explanation why and how precisely this works?

Alexander

#10 Comment by Yair Altman on May 21, 2012 @ 12:20 am

@Alexander – using indexing to repeat array elements is such a standard technique for me that I didn’t even think twice about the fact that it may not be well known… This technique is used numerous times in the Matlab code corpus, although now that you asked I couldn’t find a direct doc reference for it (there probably is some reference, I just couldn’t find it).

Here is an unofficial reference ^{[19]} that explains the use.

#11 Comment by Alexander on May 21, 2012 @ 2:59 am

Dear Yair,

thanks for the link. Indeed I’ve never seen this way of indexing before and I’m sure I’ll use it in many places.

The main step to understanding seems to be to think of scalars as being implemented as arrays with a singleton dimension. Then everything becomes clear. Things could be so easy …

Thanks

#12 Pingback by Internal Matlab memory optimizations | Undocumented Matlab on May 30, 2012 @ 5:09 am

[…] Since I discussed related topics in the past two weeks (preallocation performance, array resizing performance), these internal optimizations seemed a natural topic for today […]

#13 Pingback by preallocation is not an option | Matlabtips.com on June 25, 2012 @ 10:29 pm

[…] from UndocumentedMatlab recently introduced a very nice trick to boost this even further. It turns out that most of the time spent in the function zeros is […]

#14 Comment by Troy on December 7, 2012 @ 8:19 am

For “Variants for pre-allocation”:

Using zeros is about as fast for me, as long as you specify the class, e.g.:

is comparable on my machine to setting z(10000,10000) = 0

(Elapsed time is 0.000030 seconds.)

#15 Comment by Mike on December 17, 2012 @ 6:46 am

Thanks for putting this information up. To comment on the timings, I think they are too small to be measured correctly this way. You should run these commands in a loop a couple of thousand times to get timings into seconds, otherwise they will be influenced by context switching of your operating system.

I doubt it matters whether you use the zeros command or direct indexing. As you say, the array gets initialized with zeros either way (false or '' are zeros as well). Using different data types might cause a smaller number of bytes to be allocated, and a smaller initialization time.

#16 Comment by natan on January 7, 2013 @ 1:35 pm

Dear Yair,

Thanks for this fascinating post.

Incidentally, after noticing that empty matrix multiplication also produces a matrix of zeros:

I’ve discovered that this is another way to improve pre-allocation performance. For example,

vs:

That improvement is array-size dependent and only happens above some matrix size; more info can be seen in ^{[20]}.

#17 Comment by Yair Altman on January 7, 2013 @ 2:07 pm

@Natan – Thanks for the update. I think that `a(1000,1000)=0;` is simpler to use. IMHO, the potential minor extra saving from the product trick is not worth the extra complexity. Still, I admit it’s a very neat trick.

#18 Pingback by Allocation performance take 2 | Undocumented Matlab on August 14, 2013 @ 11:01 am

[…] I used my finding that zeros(n,m)+scalar is often faster than ones(n,m)*scalar (see my article on pre-allocation performance). Applying this to Mike’s powers-vector example, zeros(n,m)+scalar only gave me a 25% […]

#19 Pingback by Class object creation performance | Undocumented Matlab on December 13, 2013 @ 3:12 am

[…] (or demand-driven) evaluation. For additional aspects of preallocation performance, refer to my article from last year, which discusses class objects only briefly, but expands on the other data types. […]

#20 Comment by hackndo on May 4, 2015 @ 8:15 am

Hi, thanks for your post, very useful. I tried some stuff, and I found this:

From data(1,1) to data(2346,2346), time increases linearly, but when reaching 2347×2347, execution time drops – about 100 times faster. I’m wondering why 2346 is a “limit” … Weird!

Thanks again for your posts

#21 Comment by Yair Altman on May 9, 2015 @ 9:57 pm

Read this related post: ^{[21]}

#22 Pingback by Faster way to initialize arrays via empty matrix multiplication? (Matlab) – DexPage on July 18, 2015 @ 9:52 am

[…] doing some research, I’ve found this article in “Undocumented Matlab”, in which Mr. Yair Altman had already come to the conclusion […]

#23 Pingback by Pivot to binary matrix from categorial array – DexPage on July 25, 2015 @ 10:25 am

[…] both these approaches use the same hacky technique to pre-allocate as listed in Undocumented MATLAB and also listed in the other answer by @rayryeng. On top of it, it uses a raw version of […]

#24 Comment by Marshall on October 27, 2015 @ 8:52 am

Hi Yair,

I’m wondering if Mathworks have changed zeros() in the more recent revisions. I’m currently seeing zeros() as being significantly faster than the end-referencing technique, for example:

#25 Comment by Yair Altman on October 27, 2015 @ 9:45 am

@Marshall – indeed: the behavior changed in R2015b, when the new LXE ^{[22]} (Matlab’s new execution engine) replaced the previous engine.

#26 Comment by Xiangrui Li on November 1, 2016 @ 4:44 pm

I can replicate this version difference since R2015b. The following observation may be interesting.

If I open Windows Task Manager or Linux System Monitor to watch memory usage: whenever the allocation (across the various Matlab versions and zeros / end-referencing combinations) is slower, it involves increased memory usage, while the faster one does not. This suggests the faster one is using memory pre-allocated when Matlab starts. Any idea how to take advantage of this? Thanks.

#27 Comment by Darrell on October 28, 2015 @ 10:52 am

I just upgraded from R2015a to R2015b and a MATLAB script I have that used to consume about 2.1 GB now consumes 4.6 GB of memory. That’s a big problem for me because I used to be able to run three copies of MATLAB in parallel with different parameter settings and now I can only run one on my 8 GB, 4 processor machine. You mentioned a new LXE above, and I’m wondering if that is the culprit and if anything can be done about it. If it’s faster than the old version, that’s great, but not if it uses way more memory.

#28 Comment by Anoush Najarian on November 5, 2015 @ 9:35 pm

@Darrell, thank you for bringing this to our attention!

Would it be possible for us to take a look at your code?

Would you be open to connecting with us offline to discuss your use case? Feel free to email Anoush dot Najarian at mathworks dot com.

#29 Pingback by Array resizing in MATLAB | Possibly Wrong on February 8, 2017 @ 5:40 pm

[…] Pre-allocation is usually recommended as a fix: that is, size the entire array in advance, before assigning any elements. But what if you don’t know the size of the array in advance? Although there are several approaches to dealing with this situation, with varying complexity, the subject of this post is to describe just how much execution time may be eliminated by only a modest change to the above code. […]

#30 Pingback by 变量出现在每个循环迭代的大小改变-什么？ (“Variable appears to change size on every loop iteration – what?”) – CodingBlog on May 21, 2017 @ 6:00 pm

[…] Preallocation performance […]