Undocumented Matlab
  • SERVICES
    • Consulting
    • Development
    • Training
    • Gallery
    • Testimonials
  • PRODUCTS
    • IQML: IQFeed-Matlab connector
    • IB-Matlab: InteractiveBrokers-Matlab connector
    • EODML: EODHistoricalData-Matlab connector
    • Webinars
  • BOOKS
    • Secrets of MATLAB-Java Programming
    • Accelerating MATLAB Performance
    • MATLAB Succinctly
  • ARTICLES
  • ABOUT
    • Policies
  • CONTACT
  • SERVICES
    • Consulting
    • Development
    • Training
    • Gallery
    • Testimonials
  • PRODUCTS
    • IQML: IQFeed-Matlab connector
    • IB-Matlab: InteractiveBrokers-Matlab connector
    • EODML: EODHistoricalData-Matlab connector
    • Webinars
  • BOOKS
    • Secrets of MATLAB-Java Programming
    • Accelerating MATLAB Performance
    • MATLAB Succinctly
  • ARTICLES
  • ABOUT
    • Policies
  • CONTACT

Parsing XML strings

February 1, 2017 8 Comments

I have recently consulted in a project where data was provided in XML strings and needed to be parsed in Matlab memory in an efficient manner (in other words, as quickly as possible). Now granted, XML is rather inefficient in storing data (JSON would be much better for this, for example). But I had to work with the given situation, and that required processing the XML.
I basically had two main alternatives:

  • I could either create a dedicated string-parsing function that searches for a particular pattern within the XML string, or
  • I could use a standard XML-parsing library to create the XML model and then parse its nodes

The first alternative is quite error-prone, since it relies on the exact format of the data in the XML. Since the same data can be represented in multiple equivalent XML ways, making the string-parsing function robust as well as efficient would be challenging. I was lazy expedient, so I chose the second alternative.
Unfortunately, Matlab’s xmlread function only accepts input filenames (of *.xml files), it cannot directly parse XML strings. Yummy!
The obvious and simple solution is to simply write the XML string into a temporary *.xml file, read it with xmlread, and then delete the temp file:

% Store the XML data in a temp *.xml file
filename = [tempname '.xml'];
fid = fopen(filename,'Wt');
fwrite(fid,xmlString);
fclose(fid);
% Read the file into an XML model object
xmlTreeObject = xmlread(filename);
% Delete the temp file
delete(filename);
% Parse the XML model object
...

% Store the XML data in a temp *.xml file filename = [tempname '.xml']; fid = fopen(filename,'Wt'); fwrite(fid,xmlString); fclose(fid); % Read the file into an XML model object xmlTreeObject = xmlread(filename); % Delete the temp file delete(filename); % Parse the XML model object ...

This works well and we could move on with our short lives. But cases such as this, where a built-in function seems to have a silly limitation, really fire up the investigative reporter in me. I decided to drill into xmlread to discover why it couldn’t parse XML strings directly in memory, without requiring costly file I/O. It turns out that xmlread accepts not just file names as input, but also Java object references (specifically, java.io.File, java.io.InputStream or org.xml.sax.InputSource). In fact, there are quite a few other inputs that we could use, to specify a validation parser etc. – I wrote about this briefly back in 2009 (along with other similar semi-documented input altermatives in xmlwrite and xslt).

In our case, we could simply send xmlread as input a java.io.StringBufferInputStream(xmlString) object (which is an instance of java.io.InputStream) or org.xml.sax.InputSource(java.io.StringReader(xmlString)):

% Read the xml string directly into an XML model object
inputObject = java.io.StringBufferInputStream(xmlString);                % alternative #1
inputObject = org.xml.sax.InputSource(java.io.StringReader(xmlString));  % alternative #2
xmlTreeObject = xmlread(inputObject);
% Parse the XML model object
...

% Read the xml string directly into an XML model object inputObject = java.io.StringBufferInputStream(xmlString); % alternative #1 inputObject = org.xml.sax.InputSource(java.io.StringReader(xmlString)); % alternative #2 xmlTreeObject = xmlread(inputObject); % Parse the XML model object ...

If we don’t want to depend on undocumented functionality (which might break in some future release, although it has remained unchanged for at least the past decade), and in order to improve performance even further by passing xmlread‘s internal validity checks and processing, we can use xmlread‘s core functionality to parse our XML string directly. We can add a fallback to the standard (fully-documented) functionality, just in case something goes wrong (which is good practice whenever using any undocumented functionality):

try
    % The following avoids the need for file I/O:
    inputObject = java.io.StringBufferInputStream(xmlString);  % or: org.xml.sax.InputSource(java.io.StringReader(xmlString))
    try
        % Parse the input data directly using xmlread's core functionality
        parserFactory = javaMethod('newInstance','javax.xml.parsers.DocumentBuilderFactory');
        p = javaMethod('newDocumentBuilder',parserFactory);
        xmlTreeObject = p.parse(inputObject);
    catch
        % Use xmlread's semi-documented inputObject input feature
        xmlTreeObject = xmlread(inputObject);
    end
catch
    % Fallback to standard xmlread usage, using a temporary XML file:
    % Store the XML data in a temp *.xml file
    filename = [tempname '.xml'];
    fid = fopen(filename,'Wt');
    fwrite(fid,xmlString);
    fclose(fid);
    % Read the file into an XML model object
    xmlTreeObject = xmlread(filename);
    % Delete the temp file
    delete(filename);
end
% Parse the XML model object
...

try % The following avoids the need for file I/O: inputObject = java.io.StringBufferInputStream(xmlString); % or: org.xml.sax.InputSource(java.io.StringReader(xmlString)) try % Parse the input data directly using xmlread's core functionality parserFactory = javaMethod('newInstance','javax.xml.parsers.DocumentBuilderFactory'); p = javaMethod('newDocumentBuilder',parserFactory); xmlTreeObject = p.parse(inputObject); catch % Use xmlread's semi-documented inputObject input feature xmlTreeObject = xmlread(inputObject); end catch % Fallback to standard xmlread usage, using a temporary XML file: % Store the XML data in a temp *.xml file filename = [tempname '.xml']; fid = fopen(filename,'Wt'); fwrite(fid,xmlString); fclose(fid); % Read the file into an XML model object xmlTreeObject = xmlread(filename); % Delete the temp file delete(filename); end % Parse the XML model object ...

Related posts:

  1. Parsing mlint (Code Analyzer) output – The Matlab Code Analyzer (mlint) has a lot of undocumented functionality just waiting to be used. ...
  2. File deletion memory leaks, performance – Matlab's delete function leaks memory and is also slower than the equivalent Java function. ...
  3. Matlab compiler bug and workaround – Both the Matlab compiler and the publish function have errors when parsing block-comments in Matlab m-code. ...
  4. Undocumented XML functionality – Matlab's built-in XML-processing functions have several undocumented features that can be used by Java-savvy users...
  5. Matlab-Java memory leaks, performance – Internal fields of Java objects may leak memory - this article explains how to avoid this without sacrificing performance. ...
  6. Improving fwrite performance – Standard file writing performance can be improved in Matlab in surprising ways. ...
Java Performance Semi-documented feature XML
Print Print
« Previous
Next »
8 Responses
  1. americanlamboard.com February 14, 2017 at 11:55 Reply

    @shsteimer I am passing in xml string and it is returning null. It does not throw any exception. What must be wrong?

  2. Ondrej March 22, 2017 at 09:47 Reply

    Hi Yair,
    I am curious about some profiling and how fast each of the method is.
    Thanks.

  3. Michal Kvasnicka October 9, 2017 at 17:16 Reply

    @Yair Undocumented functionality (first code box) does not work at all (R2017b)!!!
    I always got null result. Any workaround?

    • Yair Altman October 11, 2017 at 02:44 Reply

      @Michal – when you say that something “does not work at all” you provide zero information about what you tried to do and what happened. If you expect me to spend time to try to help you, then the minimum that you should do is to spend the time to include useful information that could help diagnose the problem. If you don’t have the time to add this information, then I don’t have the time to assist you, sorry.

  4. Peter Cook October 26, 2017 at 21:58 Reply

    Yair,

    I have an application I wrote that inhales rather large XML files using xml2struct and the performance is less-than-ideal. I’ve got an idea from a different blog post of yours about faster ASCII file parsing via fread. The idea would be to inhale the xml file in binary mode, and look for ASCII 60 and ASCII 62 and partition the data from that starting point. Do you think this approach would lend itself to faster xml reads?

    • Yair Altman October 26, 2017 at 22:35 Reply

      @Peter – in general, yes it would indeed be faster but you need to take into consideration that XML is much more than simple tags enclosed within < and >. Tags can have attributes, and comments and binary CData etc. etc. If your XML is very simple and does not contain such nuisances then text parsing would be simple and faster, but if there is a chance that the XML file contains them then you’d better off rely on the more general parser for robustness.

  5. KE November 19, 2019 at 14:54 Reply

    Once you read your XML information into Matlab, what kind of ‘container’ do you find it most useful for handing the data? I have tried nonscalar structure arrays (because that’s what xml2struct produces) but it is hard to extract field data across records, for example to make a plot. Is there a better way?

    • Yair Altman November 19, 2019 at 15:41 Reply

      I typically use structs, due to the easy and efficient way that I can aggregate data from different nodes (as long as the nesting is not too deep). For example, [data.age] or {data.name}.

      There are numerous versions of xml2struct on the Matlab File Exchange (link). Depending on your specific needs, you may find that some versions produce cleaner or more compact structs. I use a self-modified version of Wouter Falkena’s utility. My version handles a bunch of edge cases (CDATA entries etc.), and makes the resulting struct cleaner and more compact where possible – it can be downloaded here.

Leave a Reply
HTML tags such as <b> or <i> are accepted.
Wrap code fragments inside <pre lang="matlab"> tags, like this:
<pre lang="matlab">
a = magic(3);
disp(sum(a))
</pre>
I reserve the right to edit/delete comments (read the site policies).
Not all comments will be answered. You can always email me (altmany at gmail) for private consulting.

Click here to cancel reply.

Useful links
  •  Email Yair Altman
  •  Subscribe to new posts (email)
  •  Subscribe to new posts (feed)
  •  Subscribe to new posts (reader)
  •  Subscribe to comments (feed)
 
Accelerating MATLAB Performance book
Recent Posts

Speeding-up builtin Matlab functions – part 3

Improving graphics interactivity

Interesting Matlab puzzle – analysis

Interesting Matlab puzzle

Undocumented plot marker types

Matlab toolstrip – part 9 (popup figures)

Matlab toolstrip – part 8 (galleries)

Matlab toolstrip – part 7 (selection controls)

Matlab toolstrip – part 6 (complex controls)

Matlab toolstrip – part 5 (icons)

Matlab toolstrip – part 4 (control customization)

Reverting axes controls in figure toolbar

Matlab toolstrip – part 3 (basic customization)

Matlab toolstrip – part 2 (ToolGroup App)

Matlab toolstrip – part 1

Categories
  • Desktop (45)
  • Figure window (59)
  • Guest bloggers (65)
  • GUI (165)
  • Handle graphics (84)
  • Hidden property (42)
  • Icons (15)
  • Java (174)
  • Listeners (22)
  • Memory (16)
  • Mex (13)
  • Presumed future risk (394)
    • High risk of breaking in future versions (100)
    • Low risk of breaking in future versions (160)
    • Medium risk of breaking in future versions (136)
  • Public presentation (6)
  • Semi-documented feature (10)
  • Semi-documented function (35)
  • Stock Matlab function (140)
  • Toolbox (10)
  • UI controls (52)
  • Uncategorized (13)
  • Undocumented feature (217)
  • Undocumented function (37)
Tags
ActiveX (6) AppDesigner (9) Callbacks (31) Compiler (10) Desktop (38) Donn Shull (10) Editor (8) Figure (19) FindJObj (27) GUI (141) GUIDE (8) Handle graphics (78) HG2 (34) Hidden property (51) HTML (26) Icons (9) Internal component (39) Java (178) JavaFrame (20) JIDE (19) JMI (8) Listener (17) Malcolm Lidierth (8) MCOS (11) Memory (13) Menubar (9) Mex (14) Optical illusion (11) Performance (78) Profiler (9) Pure Matlab (187) schema (7) schema.class (8) schema.prop (18) Semi-documented feature (6) Semi-documented function (33) Toolbar (14) Toolstrip (13) uicontrol (37) uifigure (8) UIInspect (12) uitools (20) Undocumented feature (187) Undocumented function (37) Undocumented property (20)
Recent Comments
  • Marcel (9 days 11 hours ago): Hi, I am trying to set the legend to Static, but this command seems not to work in R2022a anymore: set(gca,’LegendColorbarL isteners’,[]); Any ideas? THANKS / marcel
  • Gres (9 days 15 hours ago): In 2018b, you can get the icons by calling [hh,icons,plots,txt] = legend({‘Line 1’});
  • Yair Altman (11 days 10 hours ago): @Mitchell – in most cases the user wants a single string identifier for the computer, that uniquely identifies it with a distinct fingerprint that is different from any...
  • Mitchell (11 days 19 hours ago): Great post! I’m not very familiar with the network interfaces being referenced here, but it seems like the java-based cross-platform method concatenates all network...
  • Yair Altman (14 days 12 hours ago): Dani – You can use jViewport.setViewPosition(java .awt.Point(0,0)) as I showed in earlier comments here
  • dani (15 days 8 hours ago): hi!! how i can set the horizontal scrollbar to the leftside when appearing! now it set to right side of text
  • Yair Altman (24 days 4 hours ago): Dom – call drawnow *just before* you set hLine.MarkerHandle.FaceColorTy pe to 'truecoloralpha'. Also, you made a typo in your code: it’s truecoloralpha, not...
  • Dom (25 days 3 hours ago): Yair I have tried your code with trucoloralpha and the markers do not appear transparent in R2021b, same as for Oliver.
  • Yair Altman (28 days 10 hours ago): Ren – This is usually the expected behavior, which avoids unnecessary duplications of the Excel process in CPU/memory. If you want to kill the process you can always run...
  • Yair Altman (29 days 0 hours ago): When you use plot() without hold(‘on’), each new plot() clears the axes and draws a new line, so your second plot() of p2 caused the first plot() line (p1) to be...
  • Cesim Dumlu (35 days 8 hours ago): Hello. I am trying to do a gradient plot for multiple functions to be displayed on the same axes and each one is colorcoded by respective colordata, using the same scaling. The...
  • Yair Altman (43 days 11 hours ago): @Veronica – you are using the new version of uitree, which uses HTML-based uifigures, and my post was about the Java-based uitree which uses legacy Matlab figures. For...
  • Veronica Taurino (43 days 11 hours ago): >> [txt1,txt2] ans = ‘abrakadabra’
  • Veronica Taurino (43 days 11 hours ago): Hello, I am just trying to change the uitree node name as you suggested: txt1 = 'abra'; txt2 = 'kadabra'; node.setName([txt1,txt2]); >> "Unrecognized method, property, or...
  • Yair Altman (46 days 11 hours ago): The version of JGraph that you downloaded uses a newer version of Java (11) than the one that Matlab supports (8). You need to either (1) find an earlier version of JGraph that...
Contact us
Undocumented Matlab © 2009 - Yair Altman
This website and Octahedron Ltd. are not affiliated with The MathWorks Inc.; MATLAB® is a registered trademark of The MathWorks Inc.
Scroll to top