Undocumented Matlab
  • SERVICES
    • Consulting
    • Development
    • Training
    • Gallery
    • Testimonials
  • PRODUCTS
    • IQML: IQFeed-Matlab connector
    • IB-Matlab: InteractiveBrokers-Matlab connector
    • EODML: EODHistoricalData-Matlab connector
    • Webinars
  • BOOKS
    • Secrets of MATLAB-Java Programming
    • Accelerating MATLAB Performance
    • MATLAB Succinctly
  • ARTICLES
  • ABOUT
    • Policies
  • CONTACT
  • SERVICES
    • Consulting
    • Development
    • Training
    • Gallery
    • Testimonials
  • PRODUCTS
    • IQML: IQFeed-Matlab connector
    • IB-Matlab: InteractiveBrokers-Matlab connector
    • EODML: EODHistoricalData-Matlab connector
    • Webinars
  • BOOKS
    • Secrets of MATLAB-Java Programming
    • Accelerating MATLAB Performance
    • MATLAB Succinctly
  • ARTICLES
  • ABOUT
    • Policies
  • CONTACT

Convolution performance

February 3, 2016 6 Comments

MathWorks’ latest MATLAB Digest (January 2016) featured my book “Accelerating MATLAB Performance“. I am deeply honored and appreciative of MathWorks for this.
Accelerating MATLAB Performance bookI would like to dedicate today’s post to a not-well-known performance trick from my book, that could significantly improve the speed when computing the convolution of two data arrays. Matlab’s internal implementation of convolution (conv, conv2 and convn) appears to rely on a sliding window approach, using implicit (internal) multithreading for speed.
However, this can often be sped up significantly if we use the Convolution Theorem, which states in essence that conv(a,b) = ifft(fft(a,N) .* fft(b,N)), an idea proposed by Bruno Luong. In the following usage example we need to remember to zero-pad the data to get comparable results:

% Prepare the input vectors (1M elements each)
x = rand(1e6,1);
y = rand(1e6,1);
% Compute the convolution using the builtin conv()
tic, z1 = conv(x,y); toc
 => Elapsed time is 360.521187 seconds.
% Now compute the convolution using fft/ifft: 780x faster!
n = length(x) + length(y) - 1;  % we need to zero-pad
tic, z2 = ifft(fft(x,n) .* fft(y,n)); toc
 => Elapsed time is 0.463169 seconds.
% Compare the relative accuracy (the results are nearly identical)
disp(max(abs(z1-z2)./abs(z1)))
 => 2.75200348450538e-10

% Prepare the input vectors (1M elements each) x = rand(1e6,1); y = rand(1e6,1); % Compute the convolution using the builtin conv() tic, z1 = conv(x,y); toc => Elapsed time is 360.521187 seconds. % Now compute the convolution using fft/ifft: 780x faster! n = length(x) + length(y) - 1; % we need to zero-pad tic, z2 = ifft(fft(x,n) .* fft(y,n)); toc => Elapsed time is 0.463169 seconds. % Compare the relative accuracy (the results are nearly identical) disp(max(abs(z1-z2)./abs(z1))) => 2.75200348450538e-10

This latest result shows that the results are nearly identical, up to a tiny difference, which is certainly acceptable in most cases when considering the enormous performance speedup (780x in this specific case). Bruno’s implementation (convnfft) is made even more efficient by using MEX in-place data multiplications, power-of-2 FFTs, and use of GPU/Jacket where available.
It should be noted that the builtin Matlab functions can still be faster for relatively small data arrays, or if your machine has a large number of CPU cores and free memory that Matlab’s builtin conv* functions can utilize, and of course also depending on the Matlab release. So, your mileage might well vary. But given the significant speedup potential, I contend that you should give it a try and see how well it performs on your specific system and data.
If you have read my book, please be kind enough to post your feedback about it on Amazon (link), for the benefit of others. Thanks in advance!

Related posts:

  1. Performance: scatter vs. line – In many circumstances, the line function can generate visually-identical plots as the scatter function, much faster...
  2. Performance: accessing handle properties – Handle object property access (get/set) performance can be significantly improved using dot-notation. ...
  3. rmfield performance – The performance of the builtin rmfield function (as with many other builtin functions) can be improved by simple profiling. ...
  4. Zero-testing performance – Subtle changes in the way that we test for zero/non-zero entries in Matlab can have a significant performance impact. ...
  5. Matrix processing performance – Matrix operations performance is affected by internal subscriptions in a counter-intuitive way....
  6. Preallocation performance – Preallocation is a standard Matlab speedup technique. Still, it has several undocumented aspects. ...
Book Performance Pure Matlab Undocumented feature
Print Print
« Previous
Next »
6 Responses
  1. Alex February 28, 2016 at 19:22 Reply

    Could you please provide a code for 2D version? In case of linked .mex is not working any faster than standard convolution.

    • Yair Altman February 28, 2016 at 21:00 Reply

      @Alex – you can take a look at the m-code within Bruno’s convnfft utility for this. The speedup depends on several factors, including the size of the data, the Matlab release, and your available memory. So it is quite possible that on your specific system with your specific data you do not see significant speedup, but in many cases Bruno’s convnfft does improve the processing speed.

    • Alex September 30, 2017 at 16:50 Reply

      Hello,

      I am having a problem trying to do FFT-based convolution in 2D. convnfft is definitely the fastest one, but only imfilter produces a valid result. For convnfft and convn the result is wrong, as can be seen in the minimal working example below:

      % generate image
      len = 2^10;
      CICcut = zeros (len);
      CICcut = imnoise (CICcut, 'salt & pepper', 0.0001);
      CICcut = CICcut.*(rand(len)).^2;
      gauss = fspecial('gaussian', round(sqrt(len)), sqrt(sqrt(len)));
      CICcut = imfilter (CICcut, gauss, 'replicate', 'conv');
       
      % generate kernel
      g = zeros(len);
      lenMone = len-1;
      for i = 1:len
          for j = 1:len
              g(i, j) = ((i-1)/lenMone - 0.5)^2 + ((j-1)/lenMone - 0.5)^2;
          end
      end
      g = -log(sqrt(g));
       
      % convolution
      tic
      filtered    = imfilter (g, CICcut, 'replicate', 'conv');
      toc
      tic
      filteredFFT = conv2fft (g, CICcut, 'same');
      toc
      tic
      filteredN   = convn (g, CICcut, 'same');
      toc
       
      % display
      figure('units', 'normalized', 'outerposition', [0 0.25 1 0.5])
      subplot 151, imshow (CICcut, []);      title ('Mass density')
      subplot 152, imshow (g, []);           title ('Green`s function')
      subplot 153, imshow (filtered, []);    title ({'Gravitational potential' 'imfilter'})
      subplot 154, imshow (filteredFFT, []); title ({'Gravitational potential' 'conv2fft'})
      subplot 155, imshow (filteredN, []);   title ({'Gravitational potential' 'convn'})

      % generate image len = 2^10; CICcut = zeros (len); CICcut = imnoise (CICcut, 'salt & pepper', 0.0001); CICcut = CICcut.*(rand(len)).^2; gauss = fspecial('gaussian', round(sqrt(len)), sqrt(sqrt(len))); CICcut = imfilter (CICcut, gauss, 'replicate', 'conv'); % generate kernel g = zeros(len); lenMone = len-1; for i = 1:len for j = 1:len g(i, j) = ((i-1)/lenMone - 0.5)^2 + ((j-1)/lenMone - 0.5)^2; end end g = -log(sqrt(g)); % convolution tic filtered = imfilter (g, CICcut, 'replicate', 'conv'); toc tic filteredFFT = conv2fft (g, CICcut, 'same'); toc tic filteredN = convn (g, CICcut, 'same'); toc % display figure('units', 'normalized', 'outerposition', [0 0.25 1 0.5]) subplot 151, imshow (CICcut, []); title ('Mass density') subplot 152, imshow (g, []); title ('Green`s function') subplot 153, imshow (filtered, []); title ({'Gravitational potential' 'imfilter'}) subplot 154, imshow (filteredFFT, []); title ({'Gravitational potential' 'conv2fft'}) subplot 155, imshow (filteredN, []); title ({'Gravitational potential' 'convn'})

      Best regards,
      Alex

      • Yair Altman October 1, 2017 at 17:42

        @Alex – this is due to your use of the optional 'replicate' option in your call to imfilter. You are not doing the same with conv2fft or convn, which causes the results to look different. Border-pixels replication is especially important in cases such as yours where the kernel size is the same size as the input image;

        If you remove the 'replicate' option in your call to imfilter, you will see that the results look the same (to the naked eye at least…).

        If you want to use conv2fft or convn rather than the slow imfilter, and yet you still want to see a nice-looking image, then you should either reduce the kernel size, or enlarge the input image (so that the original image is at its center) and take care of the boundary pixels. You can either do it the same way as the 'replicate' option, or in a different way. For example, here is a simple implementation that at least in my eyes gives superior results even compared to imfilter:

        c2 = repmat(CICcut,3,3);  % c2 is 3072x3072, CICcut is 1024x1024
        filteredN = convn (g, c2, 'same');
        subplot 155, imshow (filteredN, []);   title ({'Gravitational potential' 'convn'})

        c2 = repmat(CICcut,3,3); % c2 is 3072x3072, CICcut is 1024x1024 filteredN = convn (g, c2, 'same'); subplot 155, imshow (filteredN, []); title ({'Gravitational potential' 'convn'})

  2. Jackie Shan September 6, 2016 at 01:27 Reply

    When looking at the CPU utilization, I noticed that the ND convolution function (convn) does not use multiple cores when operating on greater than 2D arrays.

    A=randn(500,500);
    B=randn(500,500);
    C=convn(A,B,'same'); % all 12 CPUs are utilized
     
    A=randn(500,50,10);
    B=randn(500,50,10);
    C=convn(A,B,'same'); % only 1 CPU is utilized

    A=randn(500,500); B=randn(500,500); C=convn(A,B,'same'); % all 12 CPUs are utilized A=randn(500,50,10); B=randn(500,50,10); C=convn(A,B,'same'); % only 1 CPU is utilized

    I was wondering if there’s any reason for this?

    • Yair Altman September 6, 2016 at 10:12 Reply

      @Jackie – I believe this is due to a sub-optimal implementation. MathWorks has limited engineering resources and probably decided that 2D convolution is much more common than 3D. I assume that MathWorks focused its engineers on improving the performance of the 2D case and then moved on to more pressing matters, instead of also solving the harder and less-used 3D case. In a world with limited resources this is certainly understandable.

Leave a Reply
HTML tags such as <b> or <i> are accepted.
Wrap code fragments inside <pre lang="matlab"> tags, like this:
<pre lang="matlab">
a = magic(3);
disp(sum(a))
</pre>
I reserve the right to edit/delete comments (read the site policies).
Not all comments will be answered. You can always email me (altmany at gmail) for private consulting.

Click here to cancel reply.

Useful links
  •  Email Yair Altman
  •  Subscribe to new posts (email)
  •  Subscribe to new posts (feed)
  •  Subscribe to new posts (reader)
  •  Subscribe to comments (feed)
 
Accelerating MATLAB Performance book
Recent Posts

Speeding-up builtin Matlab functions – part 3

Improving graphics interactivity

Interesting Matlab puzzle – analysis

Interesting Matlab puzzle

Undocumented plot marker types

Matlab toolstrip – part 9 (popup figures)

Matlab toolstrip – part 8 (galleries)

Matlab toolstrip – part 7 (selection controls)

Matlab toolstrip – part 6 (complex controls)

Matlab toolstrip – part 5 (icons)

Matlab toolstrip – part 4 (control customization)

Reverting axes controls in figure toolbar

Matlab toolstrip – part 3 (basic customization)

Matlab toolstrip – part 2 (ToolGroup App)

Matlab toolstrip – part 1

Categories
  • Desktop (45)
  • Figure window (59)
  • Guest bloggers (65)
  • GUI (165)
  • Handle graphics (84)
  • Hidden property (42)
  • Icons (15)
  • Java (174)
  • Listeners (22)
  • Memory (16)
  • Mex (13)
  • Presumed future risk (394)
    • High risk of breaking in future versions (100)
    • Low risk of breaking in future versions (160)
    • Medium risk of breaking in future versions (136)
  • Public presentation (6)
  • Semi-documented feature (10)
  • Semi-documented function (35)
  • Stock Matlab function (140)
  • Toolbox (10)
  • UI controls (52)
  • Uncategorized (13)
  • Undocumented feature (217)
  • Undocumented function (37)
Tags
ActiveX (6) AppDesigner (9) Callbacks (31) Compiler (10) Desktop (38) Donn Shull (10) Editor (8) Figure (19) FindJObj (27) GUI (141) GUIDE (8) Handle graphics (78) HG2 (34) Hidden property (51) HTML (26) Icons (9) Internal component (39) Java (178) JavaFrame (20) JIDE (19) JMI (8) Listener (17) Malcolm Lidierth (8) MCOS (11) Memory (13) Menubar (9) Mex (14) Optical illusion (11) Performance (78) Profiler (9) Pure Matlab (187) schema (7) schema.class (8) schema.prop (18) Semi-documented feature (6) Semi-documented function (33) Toolbar (14) Toolstrip (13) uicontrol (37) uifigure (8) UIInspect (12) uitools (20) Undocumented feature (187) Undocumented function (37) Undocumented property (20)
Recent Comments
  • Marcel (9 days 12 hours ago): Hi, I am trying to set the legend to Static, but this command seems not to work in R2022a anymore: set(gca,’LegendColorbarL isteners’,[]); Any ideas? THANKS / marcel
  • Gres (9 days 16 hours ago): In 2018b, you can get the icons by calling [hh,icons,plots,txt] = legend({‘Line 1’});
  • Yair Altman (11 days 11 hours ago): @Mitchell – in most cases the user wants a single string identifier for the computer, that uniquely identifies it with a distinct fingerprint that is different from any...
  • Mitchell (11 days 20 hours ago): Great post! I’m not very familiar with the network interfaces being referenced here, but it seems like the java-based cross-platform method concatenates all network...
  • Yair Altman (14 days 13 hours ago): Dani – You can use jViewport.setViewPosition(java .awt.Point(0,0)) as I showed in earlier comments here
  • dani (15 days 9 hours ago): hi!! how i can set the horizontal scrollbar to the leftside when appearing! now it set to right side of text
  • Yair Altman (24 days 5 hours ago): Dom – call drawnow *just before* you set hLine.MarkerHandle.FaceColorTy pe to 'truecoloralpha'. Also, you made a typo in your code: it’s truecoloralpha, not...
  • Dom (25 days 4 hours ago): Yair I have tried your code with trucoloralpha and the markers do not appear transparent in R2021b, same as for Oliver.
  • Yair Altman (28 days 11 hours ago): Ren – This is usually the expected behavior, which avoids unnecessary duplications of the Excel process in CPU/memory. If you want to kill the process you can always run...
  • Yair Altman (29 days 1 hour ago): When you use plot() without hold(‘on’), each new plot() clears the axes and draws a new line, so your second plot() of p2 caused the first plot() line (p1) to be...
  • Cesim Dumlu (35 days 9 hours ago): Hello. I am trying to do a gradient plot for multiple functions to be displayed on the same axes and each one is colorcoded by respective colordata, using the same scaling. The...
  • Yair Altman (43 days 12 hours ago): @Veronica – you are using the new version of uitree, which uses HTML-based uifigures, and my post was about the Java-based uitree which uses legacy Matlab figures. For...
  • Veronica Taurino (43 days 12 hours ago): >> [txt1,txt2] ans = ‘abrakadabra’
  • Veronica Taurino (43 days 12 hours ago): Hello, I am just trying to change the uitree node name as you suggested: txt1 = 'abra'; txt2 = 'kadabra'; node.setName([txt1,txt2]); >> "Unrecognized method, property, or...
  • Yair Altman (46 days 12 hours ago): The version of JGraph that you downloaded uses a newer version of Java (11) than the one that Matlab supports (8). You need to either (1) find an earlier version of JGraph that...
Contact us
Undocumented Matlab © 2009 - Yair Altman
This website and Octahedron Ltd. are not affiliated with The MathWorks Inc.; MATLAB® is a registered trademark of The MathWorks Inc.
Scroll to top