I would like to introduce guest blogger Ken Johnson, a MATLAB Connections partner specializing in electromagnetic optics simulation. Today Ken will explore some performance subtleties of zero testing in Matlab.
I often have a need to efficiently test a large Matlab array for any nonzero elements, e.g.
>> a = zeros(1e4);
>> tic, b = any(a(:)~=0); toc
Elapsed time is 0.126118 seconds.
Simple enough. In this case, when a is all-zero, the internal search algorithm has no choice but to inspect every element of the array to determine whether it contains any nonzeros. In the more typical case where a contains many nonzeros you would expect the search to terminate almost immediately, as soon as it finds the first nonzero. But that’s not how it works:
>> a = round(rand(1e4));
>> tic, b = any(a(:)~=0); toc
Elapsed time is 0.063404 seconds.
There is significant runtime overhead in constructing the logical array “a(:)~=0”, although the “any(…)” operation apparently terminates at the first true value it finds.
The overhead can be eliminated by taking advantage of the fact that numeric values may be used as logicals in Matlab, with zero implicitly representing false and nonzero representing true. Repeating the above test without “~=0”, we get a huge runtime improvement:
>> a = round(rand(1e4));
>> tic, b = any(a(:)); toc
Elapsed time is 0.000026 seconds.
However, there is no runtime benefit when a is all-zero:
>> a = zeros(1e4);
>> tic, b = any(a(:)); toc
Elapsed time is 0.125120 seconds.
(I do not quite understand this. There should be some runtime benefit from bypassing the logical array construction.)
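Single tic/toc readings like the ones above can be noisy, so the comparison may be worth repeating with timeit, which runs each expression several times. The following is only a sketch: the variable names and the smaller array size are my own choices, not part of the original measurements, and for very fast expressions the reported figure is only a rough indication.

```matlab
% Hypothetical benchmark sketch; n is an arbitrary size chosen to limit memory use.
n = 4e3;
aZero  = zeros(n);          % worst case: every element must be inspected
aMixed = round(rand(n));    % typical case: a nonzero appears early

% Each handle wraps one of the tests discussed above.
tCmpZero   = timeit(@() any(aZero(:) ~= 0));   % builds a logical temporary
tBareZero  = timeit(@() any(aZero(:)));        % no temporary
tCmpMixed  = timeit(@() any(aMixed(:) ~= 0));
tBareMixed = timeit(@() any(aMixed(:)));

fprintf('all-zero: ~=0 %.6f s, bare %.6f s\n', tCmpZero, tBareZero);
fprintf('mixed:    ~=0 %.6f s, bare %.6f s\n', tCmpMixed, tBareMixed);
```

Absolute numbers will depend on the machine and the Matlab release; only the relative ordering is of interest here.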
There is another catch: the above efficiency trick does not work when a contains NaN values (if you consider NaN to be nonzero), e.g.
>> any([0,nan])
ans = 0
The any function ignores entries that are NaN, meaning it treats NaNs as zero-equivalent. This is inconsistent with the behavior of the inequality operator:
>> any([0,nan]~=0)
ans = 1
To avoid this problem, an explicit isnan test is needed. Efficiency is not impaired when a contains many nonzeros, because the || operator short-circuits and skips the isnan test once any(a(:)) is true. When a is all-zero, however, both tests must scan the whole array and the runtime roughly doubles:
>> a = round(rand(1e4));
>> tic, b = any(a(:)) || any(isnan(a(:))); toc
Elapsed time is 0.000027 seconds.
>> a = zeros(1e4);
>> tic, b = any(a(:)) || any(isnan(a(:))); toc
Elapsed time is 0.256604 seconds.
For testing all-nonzero the NaN problem does not occur:
>> all([1 nan])
ans = 1
In this context NaN is treated as nonzero and the all-nonzero test is straightforward:
>> a = round(rand(1e4));
>> tic, b = all(a(:)); toc
Elapsed time is 0.000029 seconds.
For testing any-zero and all-zero, use the complements of the above tests:
>> b = ~(any(a(:)) || any(isnan(a(:)))); % all zero? (the negation must cover the whole NaN-aware test)
>> b = ~all(a(:)); % any zero?
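As a quick sanity check, the four tests can be wrapped in anonymous functions and tried on small inputs that include NaN. This is a minimal sketch: the helper names are hypothetical, NaN is counted as nonzero throughout, and the negation in the all-zero test is applied to the whole NaN-aware expression.

```matlab
% Hypothetical helper names; NaN is treated as nonzero in all four tests.
anyNonzero = @(a) any(a(:)) || any(isnan(a(:)));  % explicit isnan test needed
allNonzero = @(a) all(a(:));                      % all already counts NaN as nonzero
allZero    = @(a) ~anyNonzero(a);                 % complement of any-nonzero
anyZero    = @(a) ~allNonzero(a);                 % complement of all-nonzero

% Small sample inputs covering the NaN edge cases.
samples = {zeros(2), [0 NaN], [1 NaN], [0 1]};
for k = 1:numel(samples)
    s = samples{k};
    fprintf('%-10s anyNonzero=%d allNonzero=%d allZero=%d anyZero=%d\n', ...
        mat2str(s), anyNonzero(s), allNonzero(s), allZero(s), anyZero(s));
end
```

Writing the complements in terms of the first two helpers keeps the NaN handling in one place, so the all-zero test cannot drift out of step with the any-nonzero test.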
The find operation can also be optimized by bypassing construction of a logical temporary array, e.g.
>> a = round(rand(1e4));
>> tic, b = find(a(:)~=0, 1); toc
Elapsed time is 0.065697 seconds.
>> tic, b = find(a(:), 1); toc
Elapsed time is 0.000029 seconds.
There is no problem with NaNs in this case; the find function treats NaN as nonzero, e.g.
>> find([0,nan,1], 1)
ans = 2