In numerous functions that I wrote over the years, some input arguments were expected to be strings in the old sense, i.e. char arrays, for example 'on' or 'off'. Matlab release R2016b introduced the concept of string objects, which can be created using the string function or [starting in R2017a] double quotes ("on").
The problem is that I have numerous functions that supported the old char-based strings but not the new string objects. If someone tries to enter a string object ("on") as input to a function that expects a char-array ('on'), in many cases Matlab will error. This by itself is very unfortunate – I would have liked everything to be fully backward-compatible, but this is not the case: MathWorks did invest effort in making the new strings backward-compatible to some degree (for example, graphic object property names/values and many internal functions now accept either form as input). However, backward compatibility of strings is not 100% perfect.
In such cases, the only solution is to make the function accept both forms (char-arrays and string objects), for example, by type-casting all such inputs as char-arrays using the builtin char function. If we do this at the top of our function, then the rest of the function can remain unchanged. For example:
    function test(stage)
        if isa(stage,'string')
            stage = char(stage);
        end
        % from this point onward, we don't need to worry about string inputs - any such strings will become plain-ol' char-arrays
        switch stage
            case 'stage 1'
                % ...
            case 'stage 2'
                % ...
            % ...
        end
    end
That was simple enough. But what if our function expects complex inputs (cell-arrays, structs etc.) that may contain strings in only some of their cells/fields?
Luckily, Matlab contains an internal utility function that can help us: controllib.internal.util.hString2Char. This function, whose Matlab source-code is available (%matlabroot%/toolbox/shared/controllib/general/+controllib/+internal/+util/hString2Char.m) recursively scans the input value and converts any string object into the corresponding char-array, leaving all other data-types unchanged. For example:
    >> controllib.internal.util.hString2Char({123, 'char-array', "a string"})
    ans =
      1×3 cell array
        {[123]}    {'char-array'}    {'a string'}

    >> controllib.internal.util.hString2Char(struct('a',"another string", 'b',pi))
    ans =
      struct with fields:
        a: 'another string'
        b: 3.14159265358979
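To illustrate what such a recursive conversion involves, here is a minimal sketch of a home-made equivalent. The function name strings2chars is hypothetical, and this is not the MathWorks source code: it only assumes that scalar strings should become char-arrays, that string arrays should become cell-arrays of char-arrays, and that cells and struct fields should be scanned recursively. It does not handle missing strings.

```matlab
function value = strings2chars(value)
    % Hypothetical helper (not MathWorks code): recursively convert
    % string objects into char-arrays, leaving other data-types unchanged.
    if isa(value,'string')
        if isscalar(value)
            value = char(value);      % "abc" => 'abc'
        else
            value = cellstr(value);   % string array => cell-array of chars
        end
    elseif iscell(value)
        for idx = 1 : numel(value)
            value{idx} = strings2chars(value{idx});
        end
    elseif isstruct(value)
        names = fieldnames(value);
        for elemIdx = 1 : numel(value)
            for fieldIdx = 1 : numel(names)
                name = names{fieldIdx};
                value(elemIdx).(name) = strings2chars(value(elemIdx).(name));
            end
        end
    end
end
```

Because isa(value,'string') simply returns false on releases that have no string class, a helper of this kind should also load and run harmlessly on older Matlab releases.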
In order to keep our code working not just on recent releases (that support strings and controllib.internal.util.hString2Char) but also on older Matlab releases (where they did not exist), we simply wrap the call to hString2Char within a try–catch block. The adaptation of our function might then look as follows:
    function test(varargin)
        try varargin = controllib.internal.util.hString2Char(varargin); catch, end
        % from this point onward, we don't need to worry about string inputs - any such strings will become plain-ol' char-arrays
        ...
    end
Note that controllib.internal.util.hString2Char is a semi-documented function: it contains a readable internal help section (accessible via help controllib.internal.util.hString2Char), but not a doc-page. Nor is this function mentioned anywhere in Matlab’s official documentation. I think that this is a pity, because it’s such a useful little helper function.
Have you seen the functions convertStringsToChars and convertCharsToStrings? I think they do something pretty close to what you’re describing here, and have been available in base MATLAB since R2017b.
Not _exactly_ what you’re describing, but close enough that they’re worth a mention, at least.
@Sam – thanks for your note. I agree that convertStringsToChars is worth a mention, but in my opinion its only advantage over controllib.internal.util.hString2Char is the fact that it’s documented. In all other aspects I think that hString2Char is superior: it is recursive; it runs on fields of struct arrays and on cell arrays (recursively!); it returns '' (rather than 0×0 empty char array) for ""; it is shorter (although it does a lot more!); it works on R2016b(?) and R2017a; and it does not demand that you specify the output variables:

I have no idea why MathWorks chose to document and support the inferior convertStringsToChars, rather than the more powerful controllib.internal.util.hString2Char. Moreover, I don’t understand why convertStringsToChars was even developed, since hString2Char already existed at that time.

I was looking up how to do this exact task today. I was about to hold my nose and use the internal controllib function, when I happened to chance across a slightly newer MathWorks function, convertContainedStringsToChars: https://www.mathworks.com/help/matlab/ref/convertcontainedstringstochars.html
That function recurses in just the way I want it to, and it meets several of your other benefits: it works recursively inside cell arrays and struct fields, and it works with ans without needing an output variable. It does still turn "" into a 0x0 empty char array, but that’s fine for my use case.

@Alex – thanks, but keep in mind that new functions will only work on the recent Matlab releases. If your code needs to work on older Matlab releases, you could revert to the previous “hold-your-nose” undocumented functionality. This can easily be achieved using a simple try-catch block:
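A minimal sketch of such a fallback chain might look as follows. The wrapper name stringsToChars is hypothetical; the sketch only assumes that the documented function should be tried first, then the undocumented one, and that inputs can be left as-is when neither exists:

```matlab
function args = stringsToChars(varargin)
    % Hypothetical wrapper: convert any string objects in the inputs
    % into char-arrays, using whichever converter this release provides.
    args = varargin;
    try
        % Documented function, only available on recent Matlab releases
        args = convertContainedStringsToChars(args);
    catch
        try
            % Undocumented internal fallback for somewhat older releases
            args = controllib.internal.util.hString2Char(args);
        catch
            % Neither function exists: leave the inputs unchanged
        end
    end
end
```

On a release that has neither function, both attempts fail silently and the inputs pass through untouched.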
I’ve been hanging on to cell arrays of character vectors because in R2016-2017 they were still quite a bit faster, but it looks like in the most recent release (R2018a) they’ve definitely sped things up quite a bit:
I wonder if they have some fancy compression for storing Strings in .mat files. The ‘-nocompression’ does nothing, but I’m guessing that’s simply not using HDF5’s zlib compression. I’m going to have to do some porting, I have some big databases of character arrays…
Sorry, I had written arrows like \<– that probably needed an escape character. Those save() commands shouldn't have been commented.
I fixed your comment above accordingly
Surely verLessThan check would be more elegant than the try/catch.
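For comparison, a version-check variant might look like the sketch below. The function name normalizeStage is hypothetical, the input is assumed to be scalar, and the sketch relies on string objects having been introduced in Matlab 9.1 (R2016b):

```matlab
function stage = normalizeStage(stage)
    % Hypothetical helper: convert a string object into a char-array,
    % guarded by an explicit release check instead of a try/catch block.
    % The short-circuit && ensures that isstring is never called on
    % releases older than 9.1 (R2016b), where it does not exist.
    if ~verLessThan('matlab','9.1') && isstring(stage)
        stage = char(stage);
    end
end
```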
@David – I disagree: quite a few people still use older Matlab releases which do not have verLessThan. You need to take such things into consideration when you’re writing code that ought to work on as many Matlab releases as possible.
I believe that try/catch blocks are extremely under-rated. As long as you clearly comment what you’re doing and why you’re using them, they can be effective, readable and fast.
Yes I’ve seen you like try/catch Yair. All too often I see it used by people as a lazy way out so don’t like to see it when there are other ways. If you know why something will fail then you should write a proper check in my opinion. Options other than verLessThan exist if people are unfortunate enough to need to be running in R2006b or older as well as R2016b or newer.
It’s Easier To Ask Forgiveness Than To Get Permission.
David – I can understand where you’re coming from, but my experience with the code that I write is different. Consider the following code snippet for example:
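The two variants compared in this comment might be sketched as follows. This is only an illustration: someCondition and doSomethingWith are placeholder names, and their bodies (as well as the demoVersions wrapper) are assumptions made purely so that the example can run:

```matlab
function [out1, out2] = demoVersions(data)
    % Hypothetical demo that runs both variants on the same input
    out1 = version1(data);
    out2 = version2(data);
end

function data = version1(data)
    % Version 1: explicit check before processing
    if someCondition(data)
        data = doSomethingWith(data);
    end
end

function data = version2(data)
    % Version 2: just try; on any error, data is left unchanged
    try
        data = doSomethingWith(data);
    catch
        % deliberately ignored - proceed with the original data
    end
end

function flag = someCondition(data)
    % Placeholder condition (assumption): only process numeric data
    flag = isnumeric(data);
end

function data = doSomethingWith(data)
    % Placeholder processing (assumption): errors for cell-array inputs
    data = data * 2;
end
```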
In this case, Version 1 might fail to do the expected thing if you have a problem (bug or exception) in either someCondition() or doSomethingWith(). On the other hand, Version 2 will, in the worst case, keep data unchanged and then proceed with the rest of the downstream processing. For many (but certainly not all) use-cases this would be better than erroring-out on an exception, as in Version 1. Version 2 is also faster than Version 1 due to the avoided condition checks (some checks could take a noticeable time to execute).

Perhaps the question is a matter of expertise level: just as inexperienced drivers should be much more careful in their driving than experienced ones, so too should inexperienced programmers be more careful in their programming while experienced developers have more leeway.
You might want to rethink that statement about version 2 being faster. It’s only slightly faster in the new release, where you are effectively doing a needless check. However, it’s a lot slower (total time in my example) in the old release, where the error will occur inside the try/catch. If you are trying to support old releases as well, then the try/catch could have just introduced a big performance degradation. Obviously it all comes down to the use case.
Yair, note that there are cases where your doSomethingWith will itself slow down, just by virtue of being inside a try/catch block – for example, if it contains any in-place optimizations (which are disabled when run inside a try/catch block).

Sam & David – I don’t want to pounce on the performance issue too much, after all my main point was about robustness and compatibility, not performance (which indeed is use-case dependent). But just for the record, please read Hanan’s recent comment about how repeated use of verLessThan turned out to be a performance bottleneck in his code.
@Yair,
1) Just a precision: I tried the command controllib.internal.util.hString2Char("") (with Matlab R2017b) and the result was a 0×0 empty char array, not ''.
Looking at the M-code, we see that '' is reserved for an empty string, but "" is a scalar string, so the code returns char(""), not ''.
2) Since I have a problem with the tcpip command (a thread/async problem with a big data stream to transfer over TCP), I looked at the source code of the tcpip command, and I discovered this function:
instrument.internal.stringConversionHelpers.str2char
OK, it is in the Instrument Control Toolbox, but if you have it, it seems better (clearer code than hString2Char, because it uses switch/case instead of if/then/else). Perhaps it could be a good option too.
Hi, Yair, I read your blog a lot. Thanks for sharing all the useful skills and fun tricks.
The difference between "string" and 'char' did annoy me sometimes, until I recently found that Matlab itself solved this issue by providing the convertStringsToChars and convertContainedStringsToChars functions. This page tells the details: Update Your Code to Accept Strings

It helps me, so I’d like to share it with you. Hope it will interest you.
@Henry – read my response to Sam’s comment above, where I explain the benefits of controllib.internal.util.hString2Char over convertStringsToChars.