- Undocumented Matlab - https://undocumentedmatlab.com -

Serializing/deserializing Matlab data

Posted By Yair Altman On January 22, 2014 | 22 Comments

Last year I wrote an article on improving the performance of the save function [1]. The article discussed various ways by which we can store Matlab data on disk. However, in many cases we are interested in a byte-stream serialization, in order to transmit information to external processes.
The request to get a serialized byte-stream of Matlab data has been around for many years (example [2]), but MathWorks has never released a documented way of serializing and deserializing data, except by storing it in a disk file and later loading it from that file. Naturally, using a disk file significantly degrades performance. We could always use a RAM-disk or flash memory for improved performance, but in any case this seems like major overkill for such a simple requirement.
In last year’s article, I presented a File Exchange utility [3] for such generic serialization/deserialization. However, that utility is limited in the types of data that it supports, and while it is relatively fast, there is a much better, more generic and faster solution.
The solution is to use the undocumented built-in functions getByteStreamFromArray and getArrayFromByteStream, which are apparently used internally by the save and load functions. The usage is very simple:

byteStream = getByteStreamFromArray(anyData);  % 1xN uint8 array
anyData = getArrayFromByteStream(byteStream);

Many Matlab functions, documented and undocumented alike, are defined in XML files within the %matlabroot%/bin/registry/ folder; our specific functions can be found in %matlabroot%/bin/registry/hgbuiltins.xml. While other functions include information about their location and number of input/output args, these functions do not. Their only XML attribute is type = ":all:", which seems to indicate that they accept all data types as input. Despite the fact that the functions are defined in hgbuiltins.xml, they are not limited to HG objects – we can serialize basically any Matlab data: structs, class objects, numeric/cell arrays, sparse data, Java handles, timers, etc. For example:

% Simple Matlab data
>> byteStream = getByteStreamFromArray(pi)  % 1x72 uint8 array
byteStream =
  Columns 1 through 19
    0    1   73   77    0    0    0    0   14    0    0    0   56    0    0    0    6    0    0
  Columns 20 through 38
    0    8    0    0    0    6    0    0    0    0    0    0    0    5    0    0    0    8    0
  Columns 39 through 57
    0    0    1    0    0    0    1    0    0    0    1    0    0    0    0    0    0    0    9
  Columns 58 through 72
    0    0    0    8    0    0    0   24   45   68   84  251   33    9   64
>> getArrayFromByteStream(byteStream)
ans =
          3.14159265358979
% A cell array of several data types
>> byteStream = getByteStreamFromArray({pi, 'abc', struct('a',5)});  % 1x312 uint8 array
>> getArrayFromByteStream(byteStream)
ans =
    [3.14159265358979]    'abc'    [1x1 struct]
% A Java object
>> byteStream = getByteStreamFromArray(java.awt.Color.red);  % 1x408 uint8 array
>> getArrayFromByteStream(byteStream)
ans =
java.awt.Color[r=255,g=0,b=0]
% A Matlab timer
>> byteStream = getByteStreamFromArray(timer);  % 1x2160 uint8 array
>> getArrayFromByteStream(byteStream)
   Timer Object: timer-2
   Timer Settings
      ExecutionMode: singleShot
             Period: 1
           BusyMode: drop
            Running: off
   Callbacks
           TimerFcn: ''
           ErrorFcn: ''
           StartFcn: ''
            StopFcn: ''
% A Matlab class object
>> byteStream = getByteStreamFromArray(matlab.System);  % 1x1760 uint8 array
>> getArrayFromByteStream(byteStream)
ans =
  System: matlab.System

Serializing HG objects

Of course, we can also serialize/deserialize HG controls, plots/axes and even entire figures. When doing so, it is important to serialize the object's handle() wrapper rather than its numeric handle, since we are interested in serializing the graphic object, not the scalar numeric value of the handle:

% Serializing a simple figure with toolbar and menubar takes almost 0.5 MB !
>> hFig = handle(figure);  % a new default Matlab figure
>> length(getByteStreamFromArray(hFig))
ans =
      479128
% Removing the menubar and toolbar removes much of this amount:
>> set(hFig, 'menuBar','none', 'toolbar','none')
>> length(getByteStreamFromArray(hFig))
ans =
       11848   %!!!
% Plot lines are not nearly as "expensive" as the toolbar/menubar
>> x=0:.01:5; hp=plot(x,sin(x));
>> byteStream = getByteStreamFromArray(hFig);
>> length(byteStream)
ans =
       33088
>> delete(hFig);
>> hFig2 = getArrayFromByteStream(byteStream)
hFig2 =
	figure

The interesting thing here is that when we deserialize a byte-stream of an HG object, it is automatically rendered onscreen. This could be very useful for persistence mechanisms of GUI applications. For example, we can save the figure handles in a file, so that if the application crashes and is relaunched, it simply loads the file and gets exactly the same GUI state, complete with graphs and what-not, just as before the crash. Note that although the figure was deleted in the last example, deserializing the data caused the figure to reappear.
We do not need to serialize the entire figure. Instead, we could choose to serialize only a specific plot line or axes. For example:

>> x=0:0.01:5; hp=plot(x,sin(x));
>> byteStream = getByteStreamFromArray(handle(hp));  % 1x13080 uint8 array
>> hLine = getArrayFromByteStream(byteStream)
hLine =
	graph2d.lineseries

This could also be used to easily clone (copy) any figure or other HG object, by serializing it and then calling getArrayFromByteStream on the resulting byte-stream (note the corresponding copyobj function, which I bet uses the same underlying mechanism).
Also note that unlike HG objects, deserialized timers are NOT automatically restarted; perhaps the Running property is labeled transient or dependent. Properties defined with these attributes are apparently not serialized [4].

Performance aspects

Using the builtin getByteStreamFromArray and getArrayFromByteStream functions can provide significant performance speedups when caching Matlab data. In fact, it could be used to store otherwise unsupported objects using the save -v6 or savefast alternatives, which I discussed in my save performance article [1]. Robin Ince has shown [5] how this can be used to reduce the combined caching/uncaching run-time from 115 secs with plain-vanilla save, to just 11 secs using savefast. Robin hasn’t tested this in his post, but since the serialized data is a simple uint8 array, it is intrinsically supported by the save -v6 option, which is the fastest alternative of all:

>> byteStream = getByteStreamFromArray(hFig);
>> tic, save('test.mat','-v6','byteStream'); toc
Elapsed time is 0.001924 seconds.
>> load('test.mat')
>> data = load('test.mat')
data =
    byteStream: [1x33256 uint8]
>> getArrayFromByteStream(data.byteStream)
ans =
	figure

Moreover, we can now use java.util.Hashtable to store a cache map of any Matlab data, rather than use the much slower and more limited containers.Map class provided in Matlab.
Finally, note that as undocumented built-in functions, these functions could change without prior notice in any future Matlab release.

MEX interface – mxSerialize/mxDeserialize

To complete the picture, MEX includes a couple of undocumented functions mxSerialize and mxDeserialize, which correspond to the above functions. getByteStreamFromArray and getArrayFromByteStream apparently call them internally, since they provide the same results. Back in 2007, Brad Phelan wrote a MEX wrapper that could be used directly in Matlab (mxSerialize.c [6], mxDeserialize.c [7]). The C interface was very simple, and so was the usage:

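#include "mex.h"
// Undocumented libmx C interface (removed from libmx in R2014a)
EXTERN_C mxArray* mxSerialize(mxArray const *);
EXTERN_C mxArray* mxDeserialize(const void *, size_t);
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nlhs && nrhs) {
        // Serialize the first input (same result as getByteStreamFromArray)
        plhs[0] = (mxArray *) mxSerialize(prhs[0]);
        // In the deserializing MEX file, use this line instead:
        //plhs[0] = (mxArray *) mxDeserialize(mxGetData(prhs[0]), mxGetNumberOfElements(prhs[0]));
    }
}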

Unfortunately, MathWorks has removed the C interface for these functions from libmx in R2014a, keeping only their C++ interfaces:

mxArray* matrix::detail::noninlined::mx_array_api::mxSerialize(mxArray const *anyData)
mxArray* matrix::detail::noninlined::mx_array_api::mxDeserialize(void const *byteStream, unsigned __int64 numberOfBytes)
mxArray* matrix::detail::noninlined::mx_array_api::mxDeserializeWithTag(void const *byteStream, unsigned __int64 numberOfBytes, char const* *tagName)

These are not the only MEX functions that were removed from libmx in R2014a. Hundreds of other C functions were also removed with them, some of them quite important (e.g., mxCreateSharedDataCopy [8]). A few hundred new C++ functions were added in their place, but I fear that these are not accessible to MEX users without a code change (see below). libmx has always changed between Matlab releases, but not so drastically for many years. If you rely on any undocumented MEX functions in your code, now would be a good time to recheck it, before R2014a is officially released.
Thanks to Bastian Ebeling, we can still use these interfaces in our MEX code by simply renaming the MEX file from .c to .cpp and modifying the code as follows:

#include "mex.h"
// MX_API_VER has unfortunately not changed between R2013b and R2014a,
// so we use the new MATRIX_DLL_EXPORT_SYM as an ugly hack instead
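#if defined(__cplusplus) && defined(MATRIX_DLL_EXPORT_SYM)
    // declare the functions with C++ (not extern "C") linkage, inside libmx's namespace
    #undef EXTERN_C
    #define EXTERN_C extern
    namespace matrix{ namespace detail{ namespace noninlined{ namespace mx_array_api{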
#endif
EXTERN_C mxArray* mxSerialize(mxArray const *);
EXTERN_C mxArray* mxDeserialize(const void *, size_t);
// and so on, for any other MEX C functions that migrated to C++ in R2014a
#if defined(__cplusplus) && defined(MATRIX_DLL_EXPORT_SYM)
    }}}}
    using namespace matrix::detail::noninlined::mx_array_api;
#endif
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nlhs && nrhs) {
        plhs[0] = (mxArray *) mxSerialize(prhs[0]);
        //plhs[0] = (mxArray *) mxDeserialize(mxGetData(prhs[0]), mxGetNumberOfElements(prhs[0]));
    }
}

Unfortunately, pre-R2014a code cannot coexist with R2014a code (since libmx is different), so separate MEX files must be built for pre-R2014a and R2014a releases. This highlights the risk of using such unsupported functions.
The roundabout alternative is of course to use mexCallMATLAB [9] to invoke getByteStreamFromArray and getArrayFromByteStream. This is actually rather silly, but it works…
p.s. – Happy 30th anniversary, MathWorks!

Addendum March 9, 2014

Now that the official R2014a has been released, I am happy to report that most of the important MEX functions that were removed in the pre-release have been restored in the official release. These include mxCreateSharedDataCopy [8], mxFastZeros [10], mxCreateUninitDoubleMatrix [11], mxCreateUninitNumericArray, mxCreateUninitNumericMatrix and mxGetPropertyShared [12]. Unfortunately, mxSerialize and mxDeserialize remain among the functions that were left out, which is a real pity considering their usefulness, but we can use one of the workarounds mentioned above. At least those functions that were critical for in-place data manipulation and improved MATLAB performance have been restored, perhaps in some part due to lobbying by yours truly and by others.
MathWorks should be commended for their meaningful dialog with users and for making the fixes in such a short turn-around before the official release, despite the fact that they belong to the undocumented netherworld. MathWorks may appear superficially to be like any other corporate monolith, but when you scratch the surface you discover that there are people there who really care about users, not just the corporate bottom line. I must say that I really like this aspect of their corporate culture.

Categories: Handle graphics, High risk of breaking in future versions, Mex, Undocumented function



22 Comments To "Serializing/deserializing Matlab data"

#1 Comment By Andre Kuehne On January 23, 2014 @ 07:13

There seems to be a ~4 GB size limitation for objects to be serialized with getByteStreamFromArray:

a = rand(2^30,1,'single');
aStream = getByteStreamFromArray(a);

will result in

Error using getByteStreamFromArray
Error during serialization

but using a slightly smaller array (haven’t found the exact overhead) will work

a = rand(2^30-64,1,'single');
aStream = getByteStreamFromArray(a);

My data is usually larger than 4GB, so unfortunately I cannot use this method to quickly save it to files. Also, since I usually have complex (real/imag) data, I cannot use savefast either since it only supports real data. Do you have any ideas for accelerating saving in that scenario?

#2 Comment By Yair Altman On January 23, 2014 @ 07:30

@Andre – you can split the data into <4GB chunks and/or into separate components for the real and imaginary portions.
If you're using MEX and simple numeric arrays, then mxGetPr will get you a pointer to the real data and mxGetPi will get you a similar pointer to the imaginary data, that you can process separately.

#3 Comment By Martin On September 12, 2014 @ 16:50

@Andre – The limit is either 2 or 4 GB depending on the type of data, since the format uses 32-bit signed integers in some places and 32-bit unsigned integers in other places. If we stick to plain arrays, the limit is 2^31-1 bytes or entries in one dimension, i.e. this stays within the limit (and thus works):

getByteStreamFromArray(zeros(1, 2 ^ 31 - 1, 'uint8'));

while

getByteStreamFromArray(zeros(1, 2 ^ 31, 'uint8'));
getByteStreamFromArray(zeros(0, 2 ^ 31, 'uint8'));

both fail.

If you use aggregate data types, e.g. cells or structs, then the limit is 4 GB. You can put 3 arrays of size 1 GB into a cell array and then successfully serialize it.

(However, there’s an elegant way to get beyond this limitation. Watch out for one of my future comments.)

#4 Comment By Dr. Bastian Ebeling On January 23, 2014 @ 23:08

Regarding your question above: I’ve successfully used some of those undocumented C++ functions, even in MEX files.
Greets
Bastian

#5 Comment By HexiuM On January 27, 2014 @ 06:20

I am not 100% sure that I understand the topic of data serialization, but while testing I found this strange (according to what I expected) behavior. If I give the following command:

a = getByteStreamFromArray(2)

I am getting a 72-column array. If I then issue:

b = getByteStreamFromArray(3)

I get the exact same array. Confirmed by:

sum(abs(a-b))

which gives 0.

Then if I do:

getArrayFromByteStream(a)

I get 2
and if I do

getArrayFromByteStream(b)

I get 3 (as expected). So the question comes down to (irrespective of what getArrayFromByteStream does) how can it give different results for the same input data (remember a==b)?

Thanks!

#6 Comment By HexiuM On January 27, 2014 @ 06:24

Oops…my mistake! sum(abs(a-b)) is obviously not right when handling uint8 data!

Sorry

#7 Comment By Yair Altman On January 27, 2014 @ 06:25

@Hexium – you made a mistake. The 2 byte-streams differ in the 71st element: it is 8 for getByteStreamFromArray(3) and 0 for getByteStreamFromArray(2).

#8 Comment By Ian On January 27, 2014 @ 06:34

Try:

sum(abs(b-a))

And you’ll get 8. This is because the data type is uint8, so negative differences in a-b are clipped to 0; comparing directly with a ~= b does show the difference.

#9 Comment By Ian On January 27, 2014 @ 06:39

Thanks Yair this is incredibly helpful, I’ve been wanting a better way to serialize matlab class objects to send across a TCP connection for ages, saving to disk was such a slow workaround…

#10 Comment By Andreas Martin On February 3, 2014 @ 23:23

Thank you, this would be very useful for me. Do you know if this is independent of platform and OS (32/64bit, little/big endian, win/linux)?

#11 Comment By Yair Altman On February 4, 2014 @ 04:31

@Andreas – I don’t know. I assume so, because I believe that this is the underlying mechanism used by the save/load functions, but I cannot be certain since I do not have the source code. It should be easy enough to test, though.

#12 Comment By Martin On September 12, 2014 @ 16:25

As far as I know, this is the same mechanism that is used to distribute data when using the Parallel Computing Toolbox or the MATLAB Distributed Computing Server (parfor, spmd, etc.). This works across multiple machines with different operating systems. I think endianness is not really relevant, as all currently supported platforms are little-endian. And since the format is 32-bit even on a 64-bit machine, I would be surprised if this were an issue. In short: I’m fairly certain the format is highly portable.

#13 Pingback By savezip utility | Undocumented Matlab On September 4, 2014 @ 11:14

[…] A few months ago I wrote about Matlab’s undocumented serialization/deserialization functions, getByteStreamFromArray and getArrayFromByteStream […]

#14 Comment By TD On September 7, 2014 @ 22:49

Thanks Yair, this is a great option for saving customized objects that control graphics features (HG1 *sigh*). I have noticed, though, that listener objects created by addlistener cannot be cleanly recreated from a byte stream. The warning suggests the constructor needs a name. Other objects may have similar vulnerabilities.

#15 Comment By Sam Roberts On February 22, 2016 @ 12:11

It’s worth noting that getByteStreamFromArray calls saveobj. For regular MATLAB variables, this won’t make any difference, but if you have a MATLAB class, you can overload saveobj, and then getByteStreamFromArray will be serializing the output of saveobj. (This is why Transient and Dependent properties are not serialized).

Analogously, getArrayFromByteStream also calls loadobj.

#16 Pingback By Serializing MATLAB data | Possibly Wrong On August 24, 2016 @ 17:58

[…] actually a very simple and robust built-in solution… as long as we’re comfortable with undocumented functionality.  The function b=getByteStreamFromArray(v) converts a value to a uint8 array of […]

#17 Comment By Roc Woods On March 28, 2018 @ 13:10

Thanks Yair, this would be very useful. But there seems to be a bug when deserializing some objects with getArrayFromByteStream in deployed mode (DLL). Take AlexNet (available from the Add-On Explorer) for example: when we serialize and deserialize alexnet using the following code:

net = alexnet;
netByte = getByteStreamFromArray(net);
net2 = getArrayFromByteStream(netByte);

it works well both in MATLAB and in deployed mode.

but if we save netByte in a mat file in advance, then it does not work in dll mode:

netByte = load('netByte.mat');
netByte = netByte.netByte;
net2 = getArrayFromByteStream(netByte);
save D:\net2.mat net2

We cannot load net2.mat. It doesn’t seem to have been created correctly.

Do you have any ideas for solving this problem?

Thanks!

#18 Comment By Yair Altman On March 28, 2018 @ 13:44

@Roc – try to convert your data to int16 before saving, and then convert back to uint8 after loading:

netByte = getByteStreamFromArray(net);
netByteSaved = int16(netByte);  % convert to int16
save(filename, 'netByteSaved');
...
fileData = load(filename);
netByte2 = uint8(fileData.netByteSaved);  % convert back to uint8
net2 = getArrayFromByteStream(netByte2);

#19 Comment By Roc Woods On March 28, 2018 @ 18:46

@Yair, Thank you very much. I tried your suggestion, but it still has the same result as previous. It is noteworthy that the following code works fine in MATLAB:

function testDeserialized(matfilename)
   fileData = load(matfilename);
   netByte2 = uint8(fileData.netByteSaved);  % convert back to uint8
   net2 = getArrayFromByteStream(netByte2);
   save D:\net00.mat net2

but when I use the following code

mcc -W cpplib:testDeserialized -T link:lib testDeserialized.m  -C;

to generate deployed files such as testDeserialized.dll, testDeserialized.lib and testDeserialized.ctf, and then call testDeserialized.dll from C++, I get an error when I try to load D:\net00.mat. The error message is as follows:

load('D:\net00.mat')
Error using load
Cannot read file D:\net00.mat. 
try open('load(''D:\net00.mat'')
          ↑
Error: Character vector is not terminated properly.

When I double-click net00.mat, the error message changes to:

Warning: Unable to read some of the variables due to unknown MAT-file error.
 
> In matfinfo (line 9)
  In finfo (line 118)
  In internal.matlab.importtool.ImportableFileIdentifier.isTextFile (line 113)
  In internal.matlab.importtool.ImportableFileIdentifier.useTextImportTool (line 91)
  In uiimport>useTextImportTool (line 998)
  In uiimport (line 237) 
Warning: Unable to read some of the variables due to unknown MAT-file error.
 
> In matfinfo (line 9)
  In finfo (line 118)
  In uiimport/gatherFilePreviewData (line 416)
  In uiimport (line 245) 
Error using load
Number of columns on line 2 of ASCII file D:\net00.mat must be the same as previous lines.
Error in uiimport/runImportdata (line 467)
                    datastruct = load('-ascii', fileAbsolutePath);
Error in uiimport/gatherFilePreviewData (line 435)
        [datastruct, textDelimiter, headerLines]= runImportdata(fileAbsolutePath,type);
Error in uiimport (line 245)
    gatherFilePreviewData(fileAbsolutePath); 

Do you have any good Suggestions? Thanks.

#20 Comment By Roc Woods On April 2, 2018 @ 13:08

I solved the problem. The problem is that when you compile to DLL, if you use the following command

mcc -W cpplib:testDeserialized -T link:lib testDeserialized.m -C;

the compiler cannot accurately package all the functions that deserialization depends on. The workaround is to create an empty object of the type that needs to be deserialized, save it (e.g., as nullObj.mat), and then package it with the -a option; the compiler will then analyze the MAT-file and automatically find all the required dependencies, as shown below:

mcc -W cpplib:testDeserialized -T link:lib testDeserialized.m -a nullObj.mat -C;

Thanks again @Yair

#21 Comment By Yair Altman On April 2, 2018 @ 13:15

@Roc – thanks for the follow-up for the benefit of other readers

#22 Comment By Blake On August 16, 2018 @ 16:26

This is a great article! Does anyone know the format of the header information in the converted bytestream?

Thanks!


Article printed from Undocumented Matlab: https://undocumentedmatlab.com

URL to article: https://undocumentedmatlab.com/articles/serializing-deserializing-matlab-data

URLs in this post:

[1] performance of the save function: http://undocumentedmatlab.com/blog/improving-save-performance/

[2] example: http://stackoverflow.com/questions/4807035/is-it-possible-to-intercept-a-matlab-save-bytestream

[3] File Exchange utility: https://www.mathworks.com/matlabcentral/fileexchange/34564-fast-serialize-deserialize

[4] apparently not serialized: http://www.mathworks.com/help/matlab/matlab_oop/understanding-the-save-and-load-process.html

[5] shown: http://www.robinince.net/blog/2013/06/18/saving-structures-quickly-with-serialization/

[6] mxSerialize.c: https://sccn.ucsd.edu/svn/software/tags/EGLAB7_0_1_3beta/external/fileio-20090511/private/mxSerialize.c

[7] mxDeserialize.c: https://sccn.ucsd.edu/svn/software/tags/EGLAB7_0_1_3beta/external/fileio-20090511/private/mxDeserialize.c

[8] mxCreateSharedDataCopy: http://stackoverflow.com/questions/19813718/mex-files-how-to-return-an-already-allocated-matlab-array

[9] mexCallMATLAB: http://www.mathworks.com/help/matlab/apiref/mexcallmatlab.html

[10] mxFastZeros: http://www.mathworks.com.au/matlabcentral/answers/58055-faster-way-to-initilize-arrays-via-empty-matrix-multiplication#answer_70351

[11] mxCreateUninitDoubleMatrix: http://www.mathworks.com/matlabcentral/fileexchange/31362-uninit-create-an-uninitialized-variable-like-zeros-but-faster

[12] mxGetPropertyShared: http://undocumentedmatlab.com/blog/accessing-private-object-properties/

[13] Inter-Matlab data transfer with memcached: https://undocumentedmatlab.com/articles/inter-matlab-data-transfer-with-memcached

[14] Controlling plot data-tips: https://undocumentedmatlab.com/articles/controlling-plot-data-tips

[15] Matlab mex in-place editing: https://undocumentedmatlab.com/articles/matlab-mex-in-place-editing

[16] Draggable plot data-tips: https://undocumentedmatlab.com/articles/draggable-plot-data-tips

[17] Additional license data: https://undocumentedmatlab.com/articles/additional-license-data

[18] Accessing plot brushed data: https://undocumentedmatlab.com/articles/accessing-plot-brushed-data

Copyright © Yair Altman - Undocumented Matlab. All rights reserved.