Comments on: rmfield performance

By: Yair Altman

Yair Altman — Thu, 16 Jan 2025 15:29:56 +0000

In reply to tommsch. I agree, you have a good point :-)

By: tommsch

tommsch — Thu, 16 Jan 2025 14:30:49 +0000

I suggest the name `rmfield_fast`, because after some time one usually forgets that a fast version of `rmfield` exists. But if “fast” is appended at the end, then the intellisense-thingy of Matlab will show you the function `rmfield_fast` whenever you type `rmfield`.

By: Hoi Wong

Hoi Wong — Mon, 30 Jan 2017 19:32:09 +0000

Thanks for pointing that out. I didn’t even notice (or expect) that rmfield() is not a built-in, low-level function.

Since the rmfield() code calls struct2cell() then cell2struct(), it looks like, behind the scenes, struct() is basically a high-level wrapper around cells that uses hash keys to map names to indices: a useful piece of information to keep in mind for performance tuning. Actually, table() and dataset() objects deal with cells under the hood; I just wasn’t expecting struct() to be the same given its origins in C.
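The struct2cell()/cell2struct() round trip mentioned above can be sketched roughly as follows. This is only an illustration for a scalar struct, not the actual rmfield() source; the sample struct and the `toRemove` list are made up for the example.

```matlab
s = struct('a', 1, 'b', 2, 'c', 3);    % scalar struct used as sample input
toRemove = {'b'};                      % hypothetical list of fields to drop

names  = fieldnames(s);                % N-by-1 cell array of field names
values = struct2cell(s);               % N-by-1 cell array of field values
keep   = ~ismember(names, toRemove);   % logical mask of the fields to retain

% Rebuild a scalar struct from the retained names and values (along dim 1)
t = cell2struct(values(keep), names(keep), 1);
```

For struct arrays the cell returned by struct2cell() has extra dimensions, so the indexing would need adjusting; that case is not covered here.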

I found rmfield() so slow that I actually wrote a keepField() a long time ago, for exactly the same reason as your application scenario: if I need to remove 5000 fields, I might as well keep what I want by adding to an empty (fieldless) struct one field at a time, i.e.

Y = struct();  % start from an empty (fieldless) struct
for k=1:length(fieldsToKeep)
  Y.(fieldsToKeep{k}) = X.(fieldsToKeep{k});  % copy one wanted field
end

It turned out to be much faster too because there are no names to search for. Dynamic field names are implemented with a hash table (checked with TMW, it’s not documented), so each access is O(1) on average. It boils down to the same O(n log(n)) time as the proposed setdiff() approach if you ultimately have to identify the fields to remove instead.

Unfortunately MATLAB has only rmfield(), so I suspect a lot of people have done a set operation (spending O(n log(n)) time) to turn the list to keep into its complement (the set to remove) and then run that through the O(n) algorithm in rmfield(), when they could have done it in average O(1) time per field by just transferring the wanted fields.
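A minimal sketch of the two cases discussed above, assuming a scalar struct. The variable names (`X`, `Y`, `Z`, `fieldsToRemove`) and the sample data are illustrative only, and no timings are claimed; profile on your own data and release.

```matlab
% Build a sample scalar struct with many fields (illustrative data only)
X = struct();
for k = 1:1000
    X.(sprintf('f%d', k)) = k;
end

% Case 1: the fields to keep are already known, so just copy them over
fieldsToKeep = {'f1', 'f500', 'f1000'};
Y = struct();                        % empty (fieldless) scalar struct
for k = 1:numel(fieldsToKeep)
    name = fieldsToKeep{k};
    Y.(name) = X.(name);             % dynamic field name access
end

% Case 2: only the fields to remove are known, so a set operation
% is needed first to find what remains
fieldsToRemove = {'f2', 'f3'};
remaining = setdiff(fieldnames(X), fieldsToRemove, 'stable');
Z = struct();
for k = 1:numel(remaining)
    Z.(remaining{k}) = X.(remaining{k});
end
```

The 'stable' option keeps the original field order; without it setdiff() returns the names sorted.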

By: Malcolm Lidierth

Malcolm Lidierth — Mon, 06 Jun 2016 08:34:48 +0000

I found this too with a package that was heavily profiled up to R2012a. Maybe things have changed as JIT acceleration has improved but there were two 'tricks' I used often. Conditional statements were frequently the bottleneck, but served little purpose in the specific context e.g.

if ~isa(x,'double')
   x=double(x);
end

could often safely be replaced with

x=double(x);

Also, a try-catch sequence in place of conditional tests was often faster. ML
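As a rough illustration of the try-catch idea, the two forms below produce the same result when reading an optional field. The struct `s` and the field name `offset` are made up for the example. Whether the try-catch form is actually faster depends on the MATLAB release and on how often the catch branch is taken, so this is a pattern to profile, not a guaranteed speedup.

```matlab
s = struct('gain', 2);       % sample data; the field names are illustrative

% Conditional version: test before every access
if isfield(s, 'offset')
    offset1 = s.offset;
else
    offset1 = 0;             % default when the field is absent
end

% try-catch version: attempt the access and fall back only on error
try
    offset2 = s.offset;
catch
    offset2 = 0;             % same default, reached only if the access fails
end
```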

By: Fernando

Fernando — Sun, 05 Jun 2016 18:36:37 +0000

This is really cool. I remember years ago having to accelerate an algorithm that used Matlab’s built-in Kronecker tensor product. Luckily, I was able to find this: http://www.mathworks.com/matlabcentral/fileexchange/23606-fast-and-efficient-kronecker-multiplication

By: Yair Altman

Yair Altman — Thu, 26 May 2016 10:02:20 +0000

In reply to Peter. @Peter - excellent usage example. Thanks for sharing.

By: Peter

Peter — Thu, 26 May 2016 00:08:37 +0000

I recently wrote a function that needed to calculate many many thousands of dot products. When I profiled my function, it was spending a ton of time in the dot function. When I opened dot.m, it was mostly sanity checks I didn't need, so I just inlined the dot calculation. The function went from minutes to about a second to complete.
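For reference, inlining a dot product of two vectors of equal size can look like the sketch below. The sample vectors are illustrative, and the inlined forms skip dot()'s input checks, so they are only safe when the sizes and types are already known to be valid.

```matlab
a = [1; 2; 3];
b = [4; 5; 6];

d1 = dot(a, b);          % library call, including its input validation
d2 = sum(a .* b);        % inlined form for real vectors of equal size
d3 = a.' * b;            % inlined form for real column vectors

% For complex vectors dot() conjugates its first argument,
% so the inlined equivalent needs conj() as well
ac = [1+2i; 3];
bc = [2; 1i];
dc1 = dot(ac, bc);
dc2 = sum(conj(ac) .* bc);
```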