Comments on: Convolution performance

By: Coo Coo

Coo Coo — Tue, 06 Dec 2022 20:26:46 +0000

FFT-based convolution is circular, whereas MATLAB's conv functions offer several shape options ('valid', 'same', 'full') but unfortunately no 'circ'. To get circular behavior you need to wrap your own function around convn, at the cost of replicating part of the array as wrap-around padding.

function C = cconvn(A,B)
% cconvn  N-dimensional circular convolution

sA = size(A);
sB = size(B);

% indices with wrapped endpoints (':' for dimensions that need no padding)
s = cell(1, numel(sA));
for k = 1:numel(sA)
    if sA(k)==1 || k > numel(sB) || sB(k)==1
        s{k} = ':';
    else
        s{k} = [sA(k)-ceil(sB(k)/2)+2:sA(k) 1:sA(k) 1:floor(sB(k)/2)];
    end
end

% pad the array by wrapped indexing, then keep only the 'valid' part of convn
C = convn(A(s{:}),B,'valid');
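
As a rough sanity check of the wrap-padding idea (a sketch, not part of the function above), the 1-D case can be compared against an FFT-based circular convolution. The test vectors are arbitrary, the padding is written out inline with the same index arithmetic, and the circshift is an assumption needed because the padded version centers the kernel while the plain FFT product puts the kernel origin at B(1):

```matlab
% 1-D check: wrap-padding + 'valid' convolution vs. FFT-based circular convolution
A = (1:8).';            % example signal (arbitrary values)
B = [1; 2; 3];          % example kernel (arbitrary values)
N = numel(A);
m = numel(B);

padL = ceil(m/2) - 1;   % samples wrapped in from the end of A
padR = floor(m/2);      % samples wrapped in from the start of A
P = A([N-padL+1:N, 1:N, 1:padR]);       % wrapped copy of A, length N+m-1
C = conv(P, B, 'valid');                % same length as A

D = real(ifft(fft(A) .* fft(B, N)));    % circular convolution, kernel origin at B(1)
D = circshift(D, -floor(m/2));          % re-center to match the padded version

maxDiff = max(abs(C - D));              % should be at round-off level
```

If the two results agree to round-off, the padded 'valid' convolution is a centered circular convolution, which is what a hypothetical 'circ' option would presumably return.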

By: Yair Altman

Yair Altman — Sun, 01 Oct 2017 14:42:38 +0000

In reply to Alex. @Alex - this is due to your use of the optional 'replicate' option in your call to imfilter. You are not doing the same with conv2fft or convn, which causes the results to look different. Border-pixel replication is especially important in cases such as yours, where the kernel is the same size as the input image; if you remove the 'replicate' option in your call to imfilter, you will see that the results look the same (to the naked eye at least...). If you want to use conv2fft or convn rather than the slow imfilter, and yet you still want to see a nice-looking image, then you should either reduce the kernel size, or enlarge the input image (so that the original image is at its center) and take care of the boundary pixels. You can either do it the same way as the 'replicate' option, or in a different way. For example, here is a simple implementation that, at least to my eyes, gives superior results even compared to imfilter:

c2 = repmat(CICcut,3,3);  % c2 is 3072x3072, CICcut is 1024x1024
filteredN = convn (g, c2, 'same');
subplot 155, imshow (filteredN, []);   title ({'Gravitational potential' 'convn'})
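
For the other route mentioned above (doing it the same way as the 'replicate' option), a minimal sketch might look as follows. It assumes an odd-sized kernel and the Image Processing Toolbox (padarray, fspecial, imfilter); the image and kernel are small arbitrary examples, not the data from this thread:

```matlab
% Sketch: replicate-style border handling without imfilter (odd-sized kernel assumed)
A = magic(7);                       % arbitrary example image
K = fspecial('gaussian', 5, 1);     % arbitrary odd-sized kernel
r = (size(K) - 1) / 2;              % kernel half-width in each dimension

Ap = padarray(A, r, 'replicate');   % repeat the border pixels outward on all sides
F1 = conv2(Ap, K, 'valid');         % 'valid' crops back to the size of A
F2 = imfilter(A, K, 'replicate', 'conv');

maxDiff = max(abs(F1(:) - F2(:)));  % should be at round-off level
```

For even-sized kernels the centering conventions need more care, so the padding amounts above should not be reused as-is.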

By: Alex

Alex — Sat, 30 Sep 2017 13:50:28 +0000

In reply to Alex. Hello, I am having a problem trying to do FFT-based convolution in 2D. convnfft is definitely the fastest one, but only imfilter produces a valid result. For convnfft and convn the result is wrong, as can be seen in the minimal working example below:

% generate image
len = 2^10;
CICcut = zeros (len);
CICcut = imnoise (CICcut, 'salt & pepper', 0.0001);
CICcut = CICcut.*(rand(len)).^2;
gauss = fspecial('gaussian', round(sqrt(len)), sqrt(sqrt(len)));
CICcut = imfilter (CICcut, gauss, 'replicate', 'conv');

% generate kernel
g = zeros(len);
lenMone = len-1;
for i = 1:len
    for j = 1:len
        g(i, j) = ((i-1)/lenMone - 0.5)^2 + ((j-1)/lenMone - 0.5)^2;
    end
end
g = -log(sqrt(g));

% convolution
tic
filtered    = imfilter (g, CICcut, 'replicate', 'conv');
toc
tic
filteredFFT = conv2fft (g, CICcut, 'same');
toc
tic
filteredN   = convn (g, CICcut, 'same');
toc

% display
figure('units', 'normalized', 'outerposition', [0 0.25 1 0.5])
subplot 151, imshow (CICcut, []);      title ('Mass density')
subplot 152, imshow (g, []);           title ('Green''s function')
subplot 153, imshow (filtered, []);    title ({'Gravitational potential' 'imfilter'})
subplot 154, imshow (filteredFFT, []); title ({'Gravitational potential' 'conv2fft'})
subplot 155, imshow (filteredN, []);   title ({'Gravitational potential' 'convn'})

Best regards, Alex

By: Yair Altman

Yair Altman — Tue, 06 Sep 2016 07:12:22 +0000

In reply to Jackie Shan. @Jackie - I believe this is due to a sub-optimal implementation. MathWorks has limited engineering resources and probably decided that 2D convolution is much more common than 3D. I assume that MathWorks focused its engineers on improving the performance of the 2D case and then moved on to more pressing matters, instead of also solving the harder and less-used 3D case. In a world with limited resources this is certainly understandable.

By: Jackie Shan

Jackie Shan — Mon, 05 Sep 2016 22:27:33 +0000

When looking at the CPU utilization, I noticed that the N-D convolution function (convn) does not use multiple cores when operating on arrays with more than two dimensions.

A=randn(500,500);
B=randn(500,500);
C=convn(A,B,'same'); % all 12 CPUs are utilized

A=randn(500,50,10);
B=randn(500,50,10);
C=convn(A,B,'same'); % only 1 CPU is utilized

I was wondering if there's any reason for this?
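
One way to check this observation without watching a CPU monitor is to time convn with the default thread count and again with a single computational thread. This is only a sketch: the array sizes are arbitrary (and smaller than above to keep the run short), and the timings will depend on the machine and the MATLAB release:

```matlab
% Sketch: compare convn timings with all threads vs. a single thread
A2 = randn(150,150);    B2 = randn(150,150);     % 2-D case (sizes arbitrary)
A3 = randn(100,20,10);  B3 = randn(100,20,10);   % 3-D case (sizes arbitrary)

tMulti2 = timeit(@() convn(A2, B2, 'same'));
tMulti3 = timeit(@() convn(A3, B3, 'same'));

nOld = maxNumCompThreads(1);          % limit MATLAB to one computational thread
tSingle2 = timeit(@() convn(A2, B2, 'same'));
tSingle3 = timeit(@() convn(A3, B3, 'same'));
maxNumCompThreads(nOld);              % restore the previous setting

fprintf('2-D: %.3f s (multi) vs %.3f s (single)\n', tMulti2, tSingle2);
fprintf('3-D: %.3f s (multi) vs %.3f s (single)\n', tMulti3, tSingle3);
```

If the 3-D timings barely change between the two settings while the 2-D timings do, that would be consistent with the single-core behavior described above.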

By: Yair Altman

Yair Altman — Sun, 28 Feb 2016 19:00:34 +0000

In reply to Alex. @Alex - you can take a look at the m-code within Bruno's convnfft utility for this. The speedup depends on several factors, including the size of the data, the MATLAB release, and your available memory. So it is quite possible that on your specific system with your specific data you do not see a significant speedup, but in many cases Bruno's convnfft does improve the processing speed.
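
For reference, the basic idea behind a 2-D FFT-based convolution can be sketched in a few lines. This is not Bruno's convnfft code, only a minimal illustration with arbitrary example data; it zero-pads both inputs to the full output size so that the circular wrap-around of the FFT does not leak into the result:

```matlab
% Sketch: 2-D linear convolution via zero-padded FFTs (equivalent to conv2 'full')
A = rand(64, 48);                 % arbitrary example data
B = rand(9, 7);                   % arbitrary example kernel
sz = size(A) + size(B) - 1;       % size of the full linear convolution

% fft2(X, m, n) zero-pads X to m-by-n before transforming
C = ifft2(fft2(A, sz(1), sz(2)) .* fft2(B, sz(1), sz(2)));
C = real(C);                      % inputs are real; drop round-off imaginary parts

maxDiff = max(max(abs(C - conv2(A, B))));   % should be at round-off level
```

Whether this beats conv2 depends on the kernel size: for small kernels the direct method is often faster, while for kernels comparable in size to the image the FFT route usually wins.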

By: Alex

Alex — Sun, 28 Feb 2016 17:22:21 +0000

Could you please provide code for the 2D version? In my case, the linked .mex does not run any faster than the standard convolution.