Comments on: ismembc – undocumented helper function
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function
Charting Matlab's unsupported hidden underbelly
Wed, 20 May 2020 03:01:17 +0000

By: Yair Altman
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-432443
Sun, 15 Jul 2018 14:39:08 +0000

@Rik: in a vast number of real-life use cases we already know in advance that a, b, or both are sorted, and in this case ismembc is still faster than ismember, to this very day.
Even if you sort the arrays yourself (as in your 3rd usage example), ismembc is still much faster than ismember on most Matlab releases, and almost as fast even on the latest release.
To make a long story short, there is very little (or no) downside to using ismembc, at least from a performance viewpoint, as long as your inputs are non-sparse etc.

By: Rik Wisselink
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-432439
Sun, 15 Jul 2018 13:01:12 +0000

I’m sorry, I edited my code after I had already pasted it here, and I forgot to update the posted code to match the output. The remarks in the posted text still stand, though. Below is a 20-iteration version along with the summed timings.

clc,n=2e6; a=ceil(n*rand(n,1)); b=ceil(n*rand(n,1)); d=sort(b);c=sort(a);
t_option=zeros(4,20);
for n_tic=1:size(t_option,2)
   tic; ismembc(c,d);              t_option(1,n_tic)=toc;
   tic; ismember(a,b);             t_option(2,n_tic)=toc;
   tic; ismembc(sort(a),sort(b));  t_option(3,n_tic)=toc;
   tic; ismembc(a,sort(b));        t_option(4,n_tic)=toc;
end
fprintf('%06.3f sec (pre-sorted a and b)\n',sum(t_option(1,:)));
fprintf('%06.3f sec (ismember)\n',          sum(t_option(2,:)));
fprintf('%06.3f sec (sort(a),sort(b))\n',   sum(t_option(3,:)));
fprintf('%06.3f sec (unsorted a,sort(b))\n',sum(t_option(4,:)));
%ML6.5
02.835 sec (pre-sorted a and b)
19.311 sec (ismember)
12.037 sec (sort(a),sort(b))
18.909 sec (unsorted a,sort(b))
 
%R2012b
02.415 sec (pre-sorted a and b)
14.289 sec (ismember)
05.581 sec (sort(a),sort(b))
14.176 sec (unsorted a,sort(b))
 
%R2018a
02.635 sec (pre-sorted a and b)
05.794 sec (ismember)
06.050 sec (sort(a),sort(b))
14.147 sec (unsorted a,sort(b))

By: Yair Altman
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-432383
Sat, 14 Jul 2018 17:55:19 +0000

@Rik: your outputs don’t correspond to your code, so they do not make any sense. Moreover, instead of reporting 3 separate timing instances, it would be better to report the total run-time of a loop of [say] 10-20 iterations.
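An alternative to a hand-written tic/toc loop is MATLAB's documented timeit function (available since R2013b), which calls the function handle repeatedly and returns a typical run time. The following is only a sketch under stated assumptions: it uses the same kind of random test data as this thread but with a smaller n, and it assumes the undocumented ismembc is present in the release being tested.

```matlab
% Sketch only: timeit-based comparison, not the measurements reported in this thread.
n  = 2e5;                          % assumption: smaller than the 2e6 used above
a  = ceil(n*rand(n,1));
b  = ceil(n*rand(n,1));
sa = sort(a);                      % pre-sort once, outside the timed calls
sb = sort(b);

t_ismember = timeit(@() ismember(a, b));
t_presort  = timeit(@() ismembc(sa, sb));              % undocumented helper
t_withsort = timeit(@() ismembc(sort(a), sort(b)));    % sorting cost included

fprintf('%.3f sec (ismember)\n',              t_ismember);
fprintf('%.3f sec (pre-sorted a and b)\n',    t_presort);
fprintf('%.3f sec (sort(a),sort(b))\n',       t_withsort);
```

Timings from such a sketch depend on the release and the machine, so they should be compared only against each other.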

By: Rik Wisselink
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-432381
Sat, 14 Jul 2018 17:40:38 +0000

@Yair, unfortunately, that is the same sample data as you use in your article. If you include the sort, ismembc becomes much slower to use. Interestingly, ismember on the older releases performs similarly to sorting b and then calling ismembc, while in R2018a its timings are similar to a double sort followed by a call to ismembc. So sorting your own data is worth it (even if you call sort on an already sorted array, the combination is still faster than ismember).

clc,n=2e6; a=ceil(n*rand(n,1)); b=ceil(n*rand(n,1)); c=sort(b);d=sort(a);
tic;ismember(a,b);fprintf('%.3f sec, ',toc);
tic;ismember(a,b);fprintf('%.3f sec, ',toc);
tic;ismember(a,b);fprintf('%.3f sec (ismember)\n',toc);
tic;ismembc(a,sort(b));fprintf('%.3f sec, ',toc);
tic;ismembc(a,sort(b));fprintf('%.3f sec, ',toc);
tic;ismembc(a,sort(b));fprintf('%.3f sec (ismembc and sort)\n',toc);
tic;ismembc(a,c);fprintf('%.3f sec, ',toc);
tic;ismembc(a,c);fprintf('%.3f sec, ',toc);
tic;ismembc(a,c);fprintf('%.3f sec (ismembc, unsorted a)\n',toc);
tic;ismembc(d,c);fprintf('%.3f sec, ',toc);
tic;ismembc(d,c);fprintf('%.3f sec, ',toc);
tic;ismembc(d,c);fprintf('%.3f sec (ismembc, sorted a)\n',toc);

On my Windows 10 x64 machine this returns the following timings:

%ML6.5
0.137 sec, 0.139 sec, 0.138 sec (pre-sorted a and b)
0.896 sec, 0.901 sec, 0.909 sec (ismember)
0.582 sec, 0.582 sec, 0.578 sec (sort(a),sort(b))
0.865 sec, 0.862 sec, 0.866 sec (unsorted a,sort(b))
 
%R2012b
0.120 sec, 0.119 sec, 0.119 sec (pre-sorted a and b)
0.676 sec, 0.662 sec, 0.651 sec (ismember)
0.257 sec, 0.258 sec, 0.261 sec (sort(a),sort(b))
0.676 sec, 0.673 sec, 0.639 sec (unsorted a,sort(b))
 
%R2018a
0.127 sec, 0.124 sec, 0.123 sec (pre-sorted a and b)
0.249 sec, 0.248 sec, 0.256 sec (ismember)
0.287 sec, 0.272 sec, 0.289 sec (sort(a),sort(b))
0.664 sec, 0.676 sec, 0.664 sec (unsorted a,sort(b))

By: Ilya
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-354685
Fri, 07 Aug 2015 13:16:57 +0000

If I’ve understood (and tested) it correctly, Inf-padding at the end of either of the two input vectors should not be a problem (sometimes vectors are 0 or NaN padded to preserve certain dimensionality)…
Otherwise, a very nice post, it really helped me a lot!
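The reason Inf-padding is plausible can be checked with documented functions only: Inf compares greater than every finite value, so appending it to a sorted vector keeps the vector sorted, whereas appending zeros generally does not. The vectors below are made-up illustrations, and the final comparison assumes the undocumented ismembc is available; no particular output is claimed for it.

```matlab
% Sketch: trailing Inf padding preserves sort order, zero padding may not.
b      = [2 4 6 8];          % hypothetical sorted data
b_inf  = [b Inf Inf];        % padded with Inf at the end
b_zero = [b 0 0];            % padded with zeros at the end

disp(issorted(b_inf))        % sorted: Inf is larger than any finite value
disp(issorted(b_zero))       % not sorted: 0 follows 8

% Assumption: ismembc exists in this release. Compare it against ismember
% on the Inf-padded vector instead of trusting either padding blindly.
q = [4 5 8];
disp(isequal(ismembc(q, b_inf), ismember(q, b_inf)))
```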

By: Oleg Komarov
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-333334
Sun, 05 Oct 2014 16:45:41 +0000

For ismembc(), the first input does NOT need to be sorted. The following randomized test checks this assertion (terminate execution with CTRL+C):

c = 0;
while true 
    c          = c+1;
    A          = randi(1e6,1e5,1);
    B          = sort(randi(1e6,1e3,1)); 
    [idx, pos] = ismember(A,B);
    if ~isequal(idx,ismembc(A,B))
        disp('fail')
        disp(c)
        break 
    end
end

The same holds true for the pos output of ismembc2(), i.e. the first input does not need to be sorted, if and only if we compare against the positions returned under the ‘legacy’ flag:

c = 0;
while true 
    c          = c+1;
    A          = randi(1e6,1e5,1);
    B          = sort(randi(1e6,1e3,1)); 
    [idx, pos] = ismember(A,B,'legacy');
    if ~isequal(pos,ismembc2(A,B))
        disp('fail')
        disp(c)
        break 
    end
end

To reproduce the same pos as ismember() in releases >= R2012b, A does NOT need to be sorted, but B should be unique and sorted.
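When B is neither sorted nor unique, one way to meet this requirement is to run unique on B first and then map the resulting positions back to the original B. This is a sketch under assumptions: it relies on the default (non-legacy) behavior of unique in current releases, which returns first-occurrence indices, and it assumes ismembc2 returns an index into its second input and 0 for non-members.

```matlab
% Sketch: reproduce ismember's pos output from ismembc2 for arbitrary B.
A = randi(1e3, 1e4, 1);              % unsorted queries
B = randi(1e3, 500, 1);              % unsorted, may contain duplicates

[Bu, first] = unique(B);             % Bu sorted and unique, Bu == B(first)
pos   = ismembc2(A, Bu);             % assumed: index into Bu, 0 if absent
found = pos > 0;
pos(found) = first(pos(found));      % map back to lowest index in original B

[~, ref] = ismember(A, B);           % documented reference result
disp(isequal(pos, ref))
```

The extra unique call has its own cost, so this only pays off when the same B is reused for many queries.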

By: Yair Altman
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-299830
Thu, 28 Nov 2013 08:16:32 +0000

@Ramy: I am not sure I understand. You ran the same ismember command several times, so of course each run takes a similar amount of time. If you run the same thing with ismembc you’ll see that it’s much faster. On the other hand, remember that ismembc must have sorted inputs, and your inputs are currently random…

By: Ramy
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-299787
Thu, 28 Nov 2013 01:56:57 +0000

There is almost no difference in the new 2013b version of Matlab:

>> n=2e6; a=ceil(n*rand(n,1)); b=ceil(n*rand(n,1));
>> tic;ismember(a,b);toc;
Elapsed time is 0.846894 seconds.
>> tic;ismember(a,b);toc;
Elapsed time is 0.817701 seconds.
>> tic;ismember(a,b);toc;
Elapsed time is 0.808824 seconds.
>> tic;ismember(a,b);toc;
Elapsed time is 0.817153 seconds.
>> tic;ismember(a,b);toc;
Elapsed time is 0.817318 seconds.
>> tic;ismember(a,b);toc;
Elapsed time is 0.810535 seconds.

By: sprintfc – undocumented helper function | Undocumented Matlab
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-299747
Wed, 27 Nov 2013 20:19:19 +0000

[…] bumping into them within the m-code of standard functions. Such was the case, for example, of the ismembc function, that I described here back in 2009, and the dtstr2dtnummx function that I described in 2011. Today […]

By: Yair Altman
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-58796
Thu, 06 Oct 2011 20:14:42 +0000

@Vivien: the preconditions required by ismembc were indeed mentioned in the article:

ismembc should not be used carelessly: as noted, its inputs must be sorted non-sparse non-NaN values. In the general case we should either ensure this programmatically (as done in setxor) or use ismember, which handles this for us.
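Ensuring these preconditions programmatically could look like the sketch below. The wrapper name safe_ismembc is hypothetical (it is not part of MATLAB), and the checks are deliberately conservative assumptions: real, full, NaN-free double inputs with a sorted second input. Anything else falls back to the documented ismember.

```matlab
% Usage (script with a local function at the end, R2016b or later):
tf_unsorted = safe_ismembc([3 5], [1 2 3 4 9 5]);   % unsorted b: uses ismember
tf_sorted   = safe_ismembc([3 5], [1 2 3 4 5 9]);   % sorted b: may use ismembc
disp(tf_unsorted)
disp(tf_sorted)

function tf = safe_ismembc(a, s)
% SAFE_ISMEMBC  Hypothetical wrapper: call ismembc only when its
% preconditions appear to hold, otherwise fall back to ismember.
    ok = isa(a, 'double') && isa(s, 'double') ...     % conservative class check
         && isreal(a) && isreal(s) ...
         && ~issparse(a) && ~issparse(s) ...          % non-sparse
         && ~any(isnan(a(:))) && ~any(isnan(s(:))) ... % non-NaN
         && issorted(s(:));                           % second input sorted
    if ok
        tf = ismembc(a, s);      % undocumented fast path
    else
        tf = ismember(a, s);     % documented, handles the general case
    end
end
```

The issorted and isnan checks are linear-time passes, so the wrapper gives up part of the speed advantage in exchange for safety.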

By: Vivien
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-58773
Thu, 06 Oct 2011 08:04:58 +0000

ismembc doesn’t work if the second input is not sorted. The function stops as soon as it meets a higher value. The given example is bad because the function stops too early most of the time (but ismembc is indeed faster).

>> a = [3,5]; b = [1,2,3,4,9,5];
>> ismembc(a,b)
ans =
     1     0
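The output above is consistent with a binary search over b, although that is an assumption about the internals of ismembc rather than something shown in this thread. A plain-MATLAB sketch of such a search on the same data reproduces the miss for the value 5, which sits after the larger value 9:

```matlab
% Sketch: textbook binary search, assumed (not confirmed) to resemble ismembc.
b       = [1 2 3 4 9 5];          % not sorted: 5 comes after 9
targets = [3 5];
found   = false(size(targets));

for k = 1:numel(targets)
    lo = 1;
    hi = numel(b);
    while lo <= hi
        mid = floor((lo + hi) / 2);
        if b(mid) == targets(k)
            found(k) = true;      % exact hit
            break
        elseif b(mid) < targets(k)
            lo = mid + 1;         % continue in the upper half
        else
            hi = mid - 1;         % continue in the lower half
        end
    end
end

disp(found)                       % 3 is found, 5 is missed
```

For the value 5 the search probes 3, then 9, then 4, and gives up without ever looking at the last element.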

By: Undocumented Matlab at Nordt Blog
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-54334
Thu, 18 Aug 2011 09:35:08 +0000

[…] and infos. (Click here) Especially the performance // hidden functions are to be looked at. The ismembc2 tipp, hell yeah « One […]

By: Rob
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-45884
Thu, 02 Jun 2011 00:17:48 +0000

Interestingly, it appears that the variable “a” does not have to be sorted when using ismembc(), but the algorithm runs much more quickly if it is.

By: Datenum performance | Undocumented Matlab
https://undocumentedmatlab.com/blog_old/ismembc-undocumented-helper-function#comment-42711
Thu, 05 May 2011 18:37:37 +0000

[…] This question reminded me of a similar case that I answered exactly two years ago, of improving the performance of the built-in ismember function. In both cases, the solution to the performance question can be found by simply using Matlab’s built-in profiler in order to extract just the core processing functionality. […]
