- Undocumented Matlab - https://undocumentedmatlab.com -

String/char compatibility

Posted By Yair Altman On June 28, 2018 | 18 Comments

In numerous functions that I wrote over the years, some input arguments were expected to be strings in the old sense, i.e. char arrays, for example 'on' or 'off'. Matlab release R2016b introduced the concept of string objects [1], which can be created using the string function or [starting in R2017a] double quotes ("on").
The problem is that I have numerous functions that support the old char-based strings but not the new string objects. If someone passes a string object ("on") to a function that expects a char array ('on'), in many cases Matlab will error. This is unfortunate – I would have liked everything to be fully backward-compatible, but this is not the case. MathWorks did invest effort in making the new strings backward-compatible to some degree (for example, graphic object property names/values and many internal functions now accept either form as input). However, backward compatibility of strings is not 100% perfect.
In such cases, the only solution is to make the function accept both forms (char-arrays and string objects), for example, by type-casting all such inputs as char-arrays using the builtin char function. If we do this at the top of our function, then the rest of the function can remain unchanged. For example:

function test(stage)
   if isa(stage,'string')
      stage = char(stage);
   end
   % from this point onward, we don't need to worry about string inputs - any such strings will become plain-ol' char-arrays
   switch stage
      case 'stage 1', ...
      case 'stage 2', ...
   end

That was simple enough. But what if our function expects complex inputs (cell-arrays, structs etc.) that may contain strings in only some of their cells/fields?
Luckily, Matlab contains an internal utility function that can help us: controllib.internal.util.hString2Char. This function, whose Matlab source-code is available (%matlabroot%/toolbox/shared/controllib/general/+controllib/+internal/+util/hString2Char.m), recursively scans the input value and converts any string object into the corresponding char-array, leaving all other data-types unchanged. For example:

>> controllib.internal.util.hString2Char({123, 'char-array', "a string"})
ans =
  1×3 cell array
    {[123]}    {'char-array'}    {'a string'}
>> controllib.internal.util.hString2Char(struct('a',"another string", 'b',pi))
ans =
  struct with fields:
    a: 'another string'
    b: 3.14159265358979
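
To make the "recursively scans" part concrete, here is a minimal sketch of how such a conversion could be written by hand. This is not the source of hString2Char: the function name strings2chars is hypothetical, and turning non-scalar string arrays into cell arrays of char vectors is an assumption of this sketch. Edge cases such as missing strings are not handled.

```matlab
function value = strings2chars(value)
% STRINGS2CHARS  Hypothetical helper (NOT MathWorks' hString2Char code).
% Recursively replaces string objects with char arrays inside cell arrays
% and struct fields, leaving all other data types unchanged.
    if isa(value, 'string')
        if isscalar(value)
            value = char(value);       % "abc" -> 'abc'
        else
            value = cellstr(value);    % assumption: string array -> cell array of chars
        end
    elseif iscell(value)
        for k = 1:numel(value)
            value{k} = strings2chars(value{k});   % recurse into each cell
        end
    elseif isstruct(value)
        names = fieldnames(value);
        for idx = 1:numel(value)                  % also covers struct arrays
            for f = 1:numel(names)
                value(idx).(names{f}) = strings2chars(value(idx).(names{f}));
            end
        end
    end
end
```

On releases that predate string objects, the isa check simply returns false, so the function passes its input through unchanged.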

In order to keep our code working not just on recent releases (that support strings and controllib.internal.util.hString2Char) but also on older Matlab releases (where they did not exist), we simply wrap the call to hString2Char within a try-catch block. The adaptation of our function might then look as follows:

function test(varargin)
   try varargin = controllib.internal.util.hString2Char(varargin); catch, end
   % from this point onward, we don't need to worry about string inputs - any such strings will become plain-ol' char-arrays

Note that controllib.internal.util.hString2Char is a semi-documented function: it contains a readable internal help section (accessible via help controllib.internal.util.hString2Char), but not a doc-page. Nor is this function mentioned anywhere in Matlab’s official documentation. I think that this is a pity, because it’s such a useful little helper function.

Categories: Low risk of breaking in future versions, Semi-documented function


18 Comments To "String/char compatibility"

#1 Comment By Sam Roberts On June 28, 2018 @ 19:04

Have you seen the functions convertStringsToChars and convertCharsToStrings? I think they do something pretty close to what you’re describing here, and have been available in base MATLAB since R2017b.
Not _exactly_ what you’re describing, but close enough that they’re worth a mention, at least.

#2 Comment By Yair Altman On June 28, 2018 @ 19:27

@Sam – thanks for your note. I agree that convertStringsToChars is worth a mention, but in my opinion its only advantage over controllib.internal.util.hString2Char is the fact that it’s documented. In all other aspects I think that hString2Char is superior: it is recursive; it runs on fields of struct arrays and on cell arrays (recursively!); it returns '' (rather than 0×0 empty char array) for ""; it is shorter (although it does a lot more!); it works on R2016b(?) and R2017a; and it does not demand that you specify the output variables:

>> convertStringsToChars("on")
Error using convertStringsToChars (line 51)
The number of outputs must match the number of inputs. 

>> controllib.internal.util.hString2Char("on")
ans =
    'on'

I have no idea why MathWorks chose to document and support the inferior convertStringsToChars, rather than the more powerful controllib.internal.util.hString2Char. Moreover, I don’t understand why convertStringsToChars was even developed, since hString2Char already existed at that time.
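
For reference, the documented way to satisfy the "number of outputs must match the number of inputs" requirement is to assign the results back to the same variables. The sketch below shows this with varargin; the function name showArgClasses is hypothetical and exists only to make the effect visible. It assumes R2017b or newer, and it relies on my understanding that string scalars become char vectors while non-string inputs pass through unchanged.

```matlab
function classes = showArgClasses(varargin)
% SHOWARGCLASSES  Hypothetical demo: reports the class of each input
% after passing all inputs through convertStringsToChars.
    if nargin > 0
        % one output per input, as convertStringsToChars requires
        [varargin{:}] = convertStringsToChars(varargin{:});
    end
    classes = cellfun(@class, varargin, 'UniformOutput', false);
end
```

Note that this converts only the top-level inputs; strings nested inside cell arrays or struct fields are not touched, which is the limitation discussed above.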

#3 Comment By Alex Churchill On February 8, 2021 @ 20:14

I was looking up how to do this exact task today. I was about to hold my nose and use the internal controllib function, when I happened to chance across a slightly newer MathWorks function, convertContainedStringsToChars: [8]

That function recurses in just the way I want it to, and it meets several of your other benefits: it works recursively inside cell arrays and struct fields, and it works with ans without needing an output variable. It does still turn "" into a 0x0 empty char array, but that’s fine for my use case.

#4 Comment By Yair Altman On February 8, 2021 @ 20:41

@Alex – thanks, but keep in mind that new functions will only work on the recent Matlab releases. If your code needs to work on older Matlab releases, you could revert to the previous “hold-your-nose” undocumented functionality. This can easily be achieved using a simple try-catch block:

try
    % use documented functions that only run on very recent Matlab releases
catch
    % use undocumented functions that work on older Matlab releases
end
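
A minimal sketch of such a layered fallback might look as follows. The wrapper name toChars is hypothetical, and the final no-op branch (for releases that have neither function) is an assumption about what "older releases" should do.

```matlab
function data = toChars(data)
% TOCHARS  Hypothetical wrapper: convert any contained strings to chars,
% preferring the documented function and falling back on older releases.
    try
        % documented function, only available on recent Matlab releases
        data = convertContainedStringsToChars(data);
    catch
        try
            % undocumented function that works on older Matlab releases
            data = controllib.internal.util.hString2Char(data);
        catch
            % neither function exists: assume a release without string
            % objects, so there is nothing to convert
        end
    end
end
```

One caveat of this design is that the catch branches also swallow genuine errors raised by either function, not just "function not found" errors.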

#5 Comment By Marshall Crumiller On June 28, 2018 @ 21:15

I’ve been hanging on to cell arrays of character vectors because in R2016-2017 they were still quite a bit faster, but it looks like in the most recent release (R2018a) they’ve definitely sped things up quite a bit:

% generate 1e5 array of random strings sized 0-100
str_siz = [1e5 1];
cell_char = arrayfun(@(x) char(uint16(rand(1,x)*42+48)), round(rand(str_siz)*100),'UniformOutput',false);

% make a String array version
array_string = string(cell_char);

>> whos cell_char array_string
  Name                   Size               Bytes  Class     Attributes

  array_string      100000x1             15503656  string              
  cell_char         100000x1             21192102  cell

>> tic; save('test.mat','cell_char'); toc
Elapsed time is 6.074972 seconds.

>> tic; save('test2.mat','array_string'); toc
Elapsed time is 0.264601 seconds.

I wonder if they have some fancy compression for storing Strings in .mat files. The ‘-nocompression’ does nothing, but I’m guessing that’s simply not using HDF5’s zlib compression. I’m going to have to do some porting, I have some big databases of character arrays…

#6 Comment By Marshall Crumiller On June 28, 2018 @ 21:17

Sorry, I had written arrows like <– that probably needed an escape character. Those save() commands shouldn't have been commented.

#7 Comment By Yair Altman On June 28, 2018 @ 21:37

I fixed your comment above accordingly

#8 Comment By David On June 28, 2018 @ 21:17

Surely verLessThan check would be more elegant than the try/catch.

#9 Comment By Yair Altman On June 28, 2018 @ 21:36

@David – I disagree: quite a few people still use older Matlab releases which do not have verLessThan. You need to take such things into consideration when you’re writing code that ought to work for as many Matlab releases as possible.

I believe that try/catch blocks are extremely under-rated. As long as you clearly comment what you’re doing and why you’re using them, they can be effective, readable and fast.

#10 Comment By David On June 28, 2018 @ 21:56

Yes, I’ve seen that you like try/catch, Yair. All too often I see it used by people as a lazy way out, so I don’t like to see it when there are other ways. If you know why something will fail then you should write a proper check, in my opinion. Options other than verLessThan exist if people are unfortunate enough to need to be running in R2006b or older as well as R2016b or newer.

#11 Comment By Siyi Deng On June 29, 2018 @ 01:21

It’s Easier To Ask Forgiveness Than To Get Permission.

#12 Comment By Yair Altman On June 29, 2018 @ 14:25

David – I can understand where you’re coming from, but my experience with the code that I write is different. Consider the following code snippet for example:

% Version 1:
if someCondition()
   data = doSomethingWith(data);
else
   % do nothing in this specific case because...
end

% Version 2:
try
   data = doSomethingWith(data);
catch
   % do nothing in this specific case because...
end

In this case, Version 1 might fail to do the expected thing if you have a problem (bug or exception) in either someCondition() or doSomethingWith(). On the other hand, Version 2 will, in the worst case, keep data unchanged and then proceed with the rest of the downstream processing. For many (but certainly not all) use-cases this would be better than erroring-out on an exception, as in Version 1. Version 2 is also faster than Version 1 due to the avoided condition checks (some checks could take a noticeable time to execute).

Perhaps the question is a matter of expertise level: just as inexperienced drivers should be much more careful in their driving than experienced ones, so too should inexperienced programmers be more careful in their programming while experienced developers have more leeway.

#13 Comment By David On June 29, 2018 @ 17:03

You might want to rethink that statement about version 2 being faster. It’s only slightly faster in the new release, where you are effectively doing a needless check. However, it’s a lot slower (total time in my example) in the old release, where the error will occur inside the try/catch. If you are trying to support old releases as well, then the try/catch could have just introduced a big performance degradation. Obviously it all comes down to the use case.

%% Version 1
nThings = 1e4;
for iThing = 1:nThings
    if verLessThan('matlab', '9.1')
        % nothing to do in old release
    end
end

%% Version 2
for iThing = 1:nThings
    try
        % try to run function in old release which will error because it doesn't exist
    catch
    end
end
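
The cost of the error path can be measured on any release with a rough timing sketch like the one below. It is not the benchmark from this comment: the function name someFunctionThatDoesNotExist is a hypothetical stand-in for a call that is missing on an old release, and abs is used only as an arbitrary call that always succeeds. Absolute timings will vary by machine and release.

```matlab
nThings = 1e4;

% Case A: the guarded call succeeds on every iteration
tic
for iThing = 1:nThings
    try
        y = abs(-1);   % arbitrary call that always succeeds
    catch
    end
end
tSuccess = toc;

% Case B: the guarded call errors on every iteration
tic
for iThing = 1:nThings
    try
        y = someFunctionThatDoesNotExist();   % hypothetical missing function
    catch
        % error is swallowed, mimicking the old-release fallback path
    end
end
tError = toc;

fprintf('success path: %.4f s, error path: %.4f s\n', tSuccess, tError);
```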

#14 Comment By Sam Roberts On June 29, 2018 @ 19:22

Yair, note that there are cases where your doSomethingWith will itself slow down, just by virtue of being inside a try/catch block – for example, if it contains any in-place optimizations (which are disabled when run inside a try/catch block).

#15 Comment By Yair Altman On July 1, 2018 @ 17:06

Sam & David – I don’t want to pounce on the performance issue too much, after all my main point was about robustness and compatibility, not performance (which indeed is use-case dependent). But just for the record, please read [9] about how repeated use of verLessThan turned out to be a performance bottleneck in his code.
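
One way to keep an explicit version check while avoiding its repeated cost is to cache the result. The following is a minimal sketch; the function name hasStringSupport is hypothetical, and the version threshold 9.1 (R2016b) is the one used in the verLessThan example above.

```matlab
function tf = hasStringSupport()
% HASSTRINGSUPPORT  Hypothetical helper: true on releases with string objects.
% The verLessThan check runs only once per Matlab session; later calls
% return the cached value.
    persistent cached
    if isempty(cached)
        % string objects were introduced in R2016b (Matlab version 9.1)
        cached = ~verLessThan('matlab', '9.1');
    end
    tf = cached;
end
```

This still relies on verLessThan being available, so it does not help on releases that predate that function.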

#16 Comment By Gelth On July 6, 2018 @ 12:16

1) A small correction: I just tried the command controllib.internal.util.hString2Char("") (with Matlab R2017b) and the result was a 0×0 empty char array, not ''.
Looking at the M-code, we see that '' is reserved for an empty string, BUT "" is a scalar string, so the code returns char(""), not ''.

2) Since I had a problem with the tcpip command (a thread/async problem with a big data stream to transfer over TCP), I looked at the source code of the tcpip command, and I discovered this command:

OK, it is in the Instrument Toolbox, but if you have it, it seems better (clearer code than hString2Char, because it uses switch/case instead of if/then/else). Perhaps it would be a good option too.

#17 Comment By Henry W.H. On January 16, 2019 @ 10:58

Hi, Yair, I read your blogs a lot. Thanks for sharing all the useful skills and fun tricks.

The difference between "string" and 'char' did annoy me sometimes. Recently, I found that Matlab itself solved this issue by providing the convertStringsToChars and convertContainedStringsToChars functions. This page gives the details: [10]

It helped me, so I’d like to share it with you. I hope it will interest you.

#18 Comment By Yair Altman On January 16, 2019 @ 11:24

@Henry – read my response to [11], where I explain the benefits of controllib.internal.util.hString2Char over convertStringsToChars.

Article printed from Undocumented Matlab: https://undocumentedmatlab.com

URL to article: https://undocumentedmatlab.com/articles/string-char-compatibility

URLs in this post:

[1] string objects: https://www.mathworks.com/help/matlab/characters-and-strings.html

[2] Sliders in Matlab GUI – part 2 : https://undocumentedmatlab.com/articles/sliders-in-matlab-gui-part-2

[3] Converting Java vectors to Matlab arrays : https://undocumentedmatlab.com/articles/converting-java-vectors-to-matlab-arrays

[4] Types of undocumented Matlab aspects : https://undocumentedmatlab.com/articles/types-of-undocumented-matlab-aspects

[5] ismembc – undocumented helper function : https://undocumentedmatlab.com/articles/ismembc-undocumented-helper-function

[6] Bug and workaround in timeseries plot : https://undocumentedmatlab.com/articles/bug-and-workaround-in-timeseries-plot

[7] The hgfeval function : https://undocumentedmatlab.com/articles/hgfeval

[8] : https://www.mathworks.com/help/matlab/ref/convertcontainedstringstochars.html

[9] : https://undocumentedmatlab.com/blog/speeding-up-builtin-matlab-functions-part-1#comment-424394

[10] : https://www.mathworks.com/help/matlab/matlab_prog/update-your-code-to-accept-strings.html

[11] : https://undocumentedmatlab.com/blog/string-char-compatibility#comment-431274

Copyright © Yair Altman - Undocumented Matlab. All rights reserved.