String/char compatibility

In numerous functions that I wrote over the years, some input arguments were expected to be strings in the old sense, i.e. char arrays, for example 'on' or 'off'. Matlab release R2016b introduced the concept of string objects, which can be created using the string function or (starting in R2017a) double quotes ("on").

The problem is that many of my functions supported the old char-based strings but not the new string objects. If someone passes a string object ("on") to a function that expects a char-array ('on'), in many cases Matlab will error. This is very unfortunate: I would have liked everything to be fully backward-compatible. MathWorks did invest effort in making the new strings backward-compatible to some degree (for example, graphic object property names/values and many internal functions now accept either form as input), but this backward compatibility is not 100% perfect.

In such cases, the only solution is to make the function accept both forms (char-arrays and string objects), for example by converting all such inputs to char-arrays using the builtin char function. If we do this at the top of our function, then the rest of the function can remain unchanged. For example:

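function test(stage)
   % Convert a string object ("stage 1") into a char-array ('stage 1').
   % isa() is used instead of isstring() so that this also runs on releases before R2016b
   if isa(stage,'string'),  stage = char(stage);  end
   % From this point onward, we don't need to worry about string inputs:
   % any such strings have become plain-ol' char-arrays

   switch stage
      case 'stage 1'
         disp('Running stage 1')   % placeholder for the stage 1 code
      case 'stage 2'
         disp('Running stage 2')   % placeholder for the stage 2 code
      otherwise
         error('Unknown stage')
   end
end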
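One caveat of the char() approach, shown here as a small sketch: for a scalar string, char() returns a plain char row vector, but for a multi-element string array it returns a space-padded char matrix. If a function may receive string arrays, the standard cellstr() function is one alternative that keeps each element as a separate char-array:

```matlab
% Double-quoted string literals require R2017a or newer
s  = ["on" "off"];    % a 1x2 string array
c  = char(s);         % 2x3 char matrix; 'on' is padded with a trailing space
cs = cellstr(s);      % 1x2 cell array {'on','off'}, without padding
```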

That was simple enough. But what if our function expects complex inputs (cell-arrays, structs etc.) that may contain strings in only some of their cells/fields?

Luckily, Matlab contains an internal utility function that can help us: controllib.internal.util.hString2Char. This function, whose Matlab source-code is available (%matlabroot%/toolbox/shared/controllib/general/+controllib/+internal/+util/hString2Char.m), recursively scans the input value and converts any string object into the corresponding char-array, leaving all other data-types unchanged. For example:

>> controllib.internal.util.hString2Char({123, 'char-array', "a string"})
ans =
  1×3 cell array
    {[123]}    {'char-array'}    {'a string'}
 
>> controllib.internal.util.hString2Char(struct('a',"another string", 'b',pi))
ans = 
  struct with fields:
    a: 'another string'
    b: 3.14159265358979
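To make the idea of such a recursive scan concrete, here is a minimal sketch of a converter along the same lines. It is not the actual hString2Char source: the name str2charRecursive is hypothetical, and converting multi-element string arrays into cell arrays of char-arrays (rather than padded char matrices) is an assumption of this sketch. Save it as str2charRecursive.m:

```matlab
function value = str2charRecursive(value)
% STR2CHARRECURSIVE  Hypothetical sketch (not hString2Char itself):
%   recursively convert string objects into char-arrays.
%   - scalar strings become char row vectors
%   - non-scalar string arrays become cell arrays of char-arrays
%   - cell arrays and struct arrays are scanned recursively
%   - all other data types are returned unchanged
   if isa(value,'string')   % isa() is safe even on pre-R2016b releases
      if isscalar(value)
         value = char(value);
      else
         value = cellstr(value);
      end
   elseif iscell(value)
      % Recurse into every cell element
      for idx = 1 : numel(value)
         value{idx} = str2charRecursive(value{idx});
      end
   elseif isstruct(value)
      % Recurse into every field of every element of the struct array
      fields = fieldnames(value);
      for idx = 1 : numel(value)
         for f = 1 : numel(fields)
            value(idx).(fields{f}) = str2charRecursive(value(idx).(fields{f}));
         end
      end
   end
end
```

Because isa(value,'string') simply returns false on releases that predate string objects, this sketch should also be harmless on older Matlab releases, where it has nothing to convert.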

In order to keep our code working not just on recent releases (that support strings and controllib.internal.util.hString2Char) but also on older Matlab releases (where they did not exist), we simply wrap the call to hString2Char within a try/catch block. The adaptation of our function might then look as follows:

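function test(varargin)
   % Convert any string objects (at any nesting depth) into char-arrays.
   % On old releases that lack hString2Char, the call errors and is silently skipped.
   try varargin = controllib.internal.util.hString2Char(varargin); catch, end
   % From this point onward, we don't need to worry about string inputs:
   % any such strings have become plain-ol' char-arrays
   % ... (rest of the function) ...
end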

Note that controllib.internal.util.hString2Char is a semi-documented function: it contains a readable internal help section (accessible via help controllib.internal.util.hString2Char), but no doc-page. Nor is this function mentioned anywhere in Matlab’s official documentation. I think that this is a pity, because it’s such a useful little helper function.

Categories: Low risk of breaking in future versions, Semi-documented function


15 Responses to String/char compatibility

  1. Have you seen the functions convertStringsToChars and convertCharsToStrings? I think they do something pretty close to what you’re describing here, and have been available in base MATLAB since R2017b.

    • Not _exactly_ what you’re describing, but close enough that they’re worth a mention, at least.

    • @Sam – thanks for your note. I agree that convertStringsToChars is worth a mention, but in my opinion its only advantage over controllib.internal.util.hString2Char is the fact that it’s documented. In all other aspects I think that hString2Char is superior: it is recursive; it runs on fields of struct arrays and on cell arrays (recursively!); it returns '' (rather than 0×0 empty char array) for ""; it is shorter (although it does a lot more!); it works on R2016b(?) and R2017a; and it does not demand that you specify the output variables:

      >> convertStringsToChars("on")
      Error using convertStringsToChars (line 51)
      The number of outputs must match the number of inputs. 
       
      >> controllib.internal.util.hString2Char("on")
      ans =
          'on'

      I have no idea why MathWorks chose to document and support the inferior convertStringsToChars, rather than the more powerful controllib.internal.util.hString2Char. Moreover, I don’t understand why convertStringsToChars was even developed, since hString2Char already existed at that time.
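For comparison, the documented convertStringsToChars (R2017b or newer) does work once you request one output per input. Below is a minimal sketch of the usual varargin idiom, wrapped in a hypothetical function argClasses that merely reports the class of each converted argument. It converts only the top-level arguments; this sketch makes no claim about strings nested inside cells or structs:

```matlab
function classes = argClasses(varargin)
% ARGCLASSES  Hypothetical example: convert top-level string arguments using
%   the documented convertStringsToChars (R2017b+), then report their classes.
   if nargin > 0
      % One output is required per input, so expand varargin on both sides
      [varargin{:}] = convertStringsToChars(varargin{:});
   end
   classes = cellfun(@class, varargin, 'UniformOutput', false);
end
```

For example, argClasses("on", 'off', 5) should return {'char','char','double'}.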

  2. Marshall Crumiller says:

    I’ve been hanging on to cell arrays of character vectors because in R2016-2017 they were still quite a bit faster, but it looks like MathWorks has definitely sped things up in the most recent release (R2018a):

    % generate 1e5 array of random strings sized 0-100
    str_siz = [1e5 1];
    cell_char = arrayfun(@(x) char(uint16(rand(1,x)*42+48)), round(rand(str_siz)*100),'UniformOutput',false);
     
    % make a String array version
    array_string = string(cell_char);
     
    >> whos cell_char array_string
      Name                   Size               Bytes  Class     Attributes
     
      array_string      100000x1             15503656  string              
      cell_char         100000x1             21192102  cell
     
    >> tic; save('test.mat','cell_char'); toc
    Elapsed time is 6.074972 seconds.
     
    >> tic; save('test2.mat','array_string'); toc
    Elapsed time is 0.264601 seconds.

    I wonder if they have some fancy compression for storing Strings in .mat files. The ‘-nocompression’ option does nothing, but I’m guessing that’s simply because it only disables HDF5’s zlib compression. I’m going to have to do some porting: I have some big databases of character arrays…

  3. David says:

    Surely a verLessThan check would be more elegant than the try/catch.

    • @David – I disagree: quite a few people still use older Matlab releases which do not have verLessThan. You need to take such things into consideration when you’re writing code that ought to work on as many Matlab releases as possible.

      I believe that try/catch blocks are extremely under-rated. As long as you clearly comment what you’re doing and why you’re using them, they can be effective, readable and fast.

    • David says:

      Yes, I’ve seen you like try/catch, Yair. All too often I see people use it as a lazy way out, so I don’t like to see it when there are other ways. If you know why something will fail, then you should write a proper check, in my opinion. Options other than verLessThan exist if people are unfortunate enough to need to run on R2006b or older as well as on R2016b or newer.

    • Siyi Deng says:

      It’s Easier To Ask Forgiveness Than To Get Permission.

    • David – I can understand where you’re coming from, but my experience with the code that I write is different. Consider the following code snippet for example:

      % Version 1:
      if someCondition()
         data = doSomethingWith(data);
      else
         % do nothing in this specific case because...
      end
       
      % Version 2:
      try
         data = doSomethingWith(data);
      catch
         % do nothing in this specific case because...
      end

      In this case, Version 1 might fail to do the expected thing if you have a problem (bug or exception) in either someCondition() or doSomethingWith(). On the other hand, Version 2 will, in the worst case, keep data unchanged and then proceed with the rest of the downstream processing. For many (but certainly not all) use-cases this would be better than erroring-out on an exception, as in Version 1. Version 2 is also faster than Version 1 due to the avoided condition checks (some checks could take a noticeable time to execute).

      Perhaps the question is a matter of expertise level: just as inexperienced drivers should be much more careful in their driving than experienced ones, so too should inexperienced programmers be more careful in their programming while experienced developers have more leeway.

    • David says:

      You might want to rethink that statement about Version 2 being faster. It’s only slightly faster in the new release, where you are effectively avoiding a needless check. However, it’s a lot slower (total time in my example) in the old release, where the error will occur inside the try/catch. If you are trying to support old releases as well, then the try/catch could introduce a big performance degradation. Obviously it all comes down to the use case.

      %% Version 1
      nThings = 1e4;
      tic;
      for iThing = 1:nThings
          if verLessThan('matlab', '9.1')
              % nothing to do in old release
          end
      end
      toc;
       
      %% Version 2
      tic;
      for iThing = 1:nThings
          try
              % try to run function in old release which will error because it doesn't exist
              someFunctionThatWillErrorInOldRelease
          catch
          end
      end
      toc;

    • Yair, note that there are cases where your doSomethingWith will itself slow down, just by virtue of being inside a try/catch block – for example, if it contains any in-place optimizations (which are disabled when run inside a try/catch block).

    • Sam & David – I don’t want to pounce on the performance issue too much, after all my main point was about robustness and compatibility, not performance (which indeed is use-case dependent). But just for the record, please read Hanan’s recent comment about how repeated use of verLessThan turned out to be a performance bottleneck in his code.
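On the performance point about repeated verLessThan calls, one common mitigation is to run the version check only once and cache the result in a persistent variable. A minimal sketch, using the hypothetical helper name isR2016bOrNewer (R2016b corresponds to Matlab version 9.1):

```matlab
function tf = isR2016bOrNewer()
% ISR2016BORNEWER  Hypothetical helper: true if running on R2016b (Matlab 9.1)
%   or newer. The relatively slow verLessThan check runs only on the first
%   call; later calls just return the cached persistent value.
%   Note: verLessThan itself is not available on very old releases
%   (see the comments above).
   persistent cachedResult
   if isempty(cachedResult)
      cachedResult = ~verLessThan('matlab', '9.1');
   end
   tf = cachedResult;
end
```

After the first call, each call should cost little more than a persistent-variable lookup, so such a helper could be placed inside tight loops like the benchmark above.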

  4. Gelth says:

    @Yair,
    1) Just a precision: I just tried the command controllib.internal.util.hString2Char("") (with Matlab R2017b) and the result was a 0×0 empty char array, not ''.
    Looking at the M-code, we see that '' is reserved for the empty string, BUT "" is a scalar String, so the code returns char(""), not ''.

    2) Since I had a problem with the tcpip command (thread/async problems when transferring a big data stream over TCP), I looked at the source code of the tcpip command, and discovered this function:
    instrument.internal.stringConversionHelpers.str2char

    OK, it is in the Instrument Control Toolbox, but if you have it, it seems better (its code is clearer than hString2Char’s, because it uses switch/case instead of if/else). Perhaps it is a good option too.
