XML – Undocumented Matlab https://undocumentedmatlab.com/blog_old Charting Matlab's unsupported hidden underbelly Thu, 27 Oct 2022 10:42:32 +0000 en-US hourly 1 https://wordpress.org/?v=4.4.1 Parsing XML stringshttps://undocumentedmatlab.com/blog_old/parsing-xml-strings https://undocumentedmatlab.com/blog_old/parsing-xml-strings#comments Wed, 01 Feb 2017 09:52:45 +0000 https://undocumentedmatlab.com/?p=6838 Related posts:
  1. Undocumented XML functionality Matlab's built-in XML-processing functions have several undocumented features that can be used by Java-savvy users...
  2. Matlab-Java memory leaks, performance Internal fields of Java objects may leak memory - this article explains how to avoid this without sacrificing performance. ...
  3. Types of undocumented Matlab aspects This article lists the different types of undocumented/unsupported/hidden aspects in Matlab...
  4. Pause for the better Java's thread sleep() function is much more accurate than Matlab's pause() function. ...
]]>
I have recently consulted in a project where data was provided in XML strings and needed to be parsed in Matlab memory in an efficient manner (in other words, as quickly as possible). Now granted, XML is rather inefficient in storing data (JSON would be much better for this, for example). But I had to work with the given situation, and that required processing the XML.

I basically had two main alternatives:

  • I could either create a dedicated string-parsing function that searches for a particular pattern within the XML string, or
  • I could use a standard XML-parsing library to create the XML model and then parse its nodes

The first alternative is quite error-prone, since it relies on the exact format of the data in the XML. Since the same data can be represented in multiple equivalent XML ways, making the string-parsing function robust as well as efficient would be challenging. I was lazy expedient, so I chose the second alternative.

Unfortunately, Matlab’s xmlread function only accepts input filenames (of *.xml files), it cannot directly parse XML strings. Yummy!

The obvious and simple solution is to simply write the XML string into a temporary *.xml file, read it with xmlread, and then delete the temp file:

% Store the XML data in a temp *.xml file
filename = [tempname '.xml'];
fid = fopen(filename,'Wt');
fwrite(fid,xmlString);
fclose(fid);
 
% Read the file into an XML model object
xmlTreeObject = xmlread(filename);
 
% Delete the temp file
delete(filename);
 
% Parse the XML model object
...

This works well and we could move on with our short lives. But cases such as this, where a built-in function seems to have a silly limitation, really fire up the investigative reporter in me. I decided to drill into xmlread to discover why it couldn’t parse XML strings directly in memory, without requiring costly file I/O. It turns out that xmlread accepts not just file names as input, but also Java object references (specifically, java.io.File, java.io.InputStream or org.xml.sax.InputSource). In fact, there are quite a few other inputs that we could use, to specify a validation parser etc. – I wrote about this briefly back in 2009 (along with other similar semi-documented input altermatives in xmlwrite and xslt).

In our case, we could simply send xmlread as input a java.io.StringBufferInputStream(xmlString) object (which is an instance of java.io.InputStream) or org.xml.sax.InputSource(java.io.StringReader(xmlString)):

% Read the xml string directly into an XML model object
inputObject = java.io.StringBufferInputStream(xmlString);                % alternative #1
inputObject = org.xml.sax.InputSource(java.io.StringReader(xmlString));  % alternative #2
 
xmlTreeObject = xmlread(inputObject);
 
% Parse the XML model object
...

If we don’t want to depend on undocumented functionality (which might break in some future release, although it has remained unchanged for at least the past decade), and in order to improve performance even further by passing xmlread‘s internal validity checks and processing, we can use xmlread‘s core functionality to parse our XML string directly. We can add a fallback to the standard (fully-documented) functionality, just in case something goes wrong (which is good practice whenever using any undocumented functionality):

try
    % The following avoids the need for file I/O:
    inputObject = java.io.StringBufferInputStream(xmlString);  % or: org.xml.sax.InputSource(java.io.StringReader(xmlString))
    try
        % Parse the input data directly using xmlread's core functionality
        parserFactory = javaMethod('newInstance','javax.xml.parsers.DocumentBuilderFactory');
        p = javaMethod('newDocumentBuilder',parserFactory);
        xmlTreeObject = p.parse(inputObject);
    catch
        % Use xmlread's semi-documented inputObject input feature
        xmlTreeObject = xmlread(inputObject);
    end
catch
    % Fallback to standard xmlread usage, using a temporary XML file:
 
    % Store the XML data in a temp *.xml file
    filename = [tempname '.xml'];
    fid = fopen(filename,'Wt');
    fwrite(fid,xmlString);
    fclose(fid);
 
    % Read the file into an XML model object
    xmlTreeObject = xmlread(filename);
 
    % Delete the temp file
    delete(filename);
end
 
% Parse the XML model object
...
]]>
https://undocumentedmatlab.com/blog_old/parsing-xml-strings/feed 6
Undocumented XML functionalityhttps://undocumentedmatlab.com/blog_old/undocumented-xml-functionality https://undocumentedmatlab.com/blog_old/undocumented-xml-functionality#comments Thu, 19 Nov 2009 01:12:06 +0000 https://undocumentedmatlab.com/?p=751 Related posts:
  1. Parsing XML strings Matlab's xmlread function cannot process XML data directly, but this can easily be overcome. ...
  2. Types of undocumented Matlab aspects This article lists the different types of undocumented/unsupported/hidden aspects in Matlab...
  3. Matlab-Java memory leaks, performance Internal fields of Java objects may leak memory - this article explains how to avoid this without sacrificing performance. ...
  4. Sending email/text messages from Matlab Sending emails and SMS (text) messages from Matlab is easy, once you know a few quirks....
]]>
Matlab’s built-in XML-processing functions have several undocumented features that can be used by Java-savvy users. We should note that the entire XML-support functionality in Matlab is java-based. I understand that some Matlab users have a general aversion to Java, some even going as far as to disable it using the -nojvm startup option. But if you disable Java, Matlab’s XML functions will simply not work. Matlab’s own documentation points users to Sun’s official Java website for explanations of how to use the XML functionality (the link in the Matlab docpage is dead – the correct link should probably be https://jaxp-sources.dev.java.net/nonav/docs/api/, but Sun keeps changing its website so this link could also be dead soon…).

Using the full Java XML parsing (JAXP) functionality is admittedly quite intimidating for the uninitiated, but extremely powerful once you understand how all the pieces fit together. Over the years, several interesting utilities were submitted to the Matlab File Exchange that simplify this intimidating post-processing. See for example XML parsing tools, the extremely popular XML Toolbox and xml_io_tools, the recent XML data import and perhaps a dozen other utilities.

Each of Matlab’s main built-in XML-processing functions, xmlread, xmlwrite and xslt has an internal set of undocumented and unsupported functionalities, which builds on their internal Java implementation. As far as I could tell, these unsupported functionalities were supported at least as early as Matlab 7.2 (R2006a), and possibly even on earlier releases. For the benefit of the Java and/or JAXP -speakers out there (it will probably not help any others), I list Matlab’s internal description of these unsupported functionalities, annotated with API hyperlinks. These description (sans the links) can be seen by simply editing the m file, as in (the R2008a variant is described below):

edit xmlread

xmlread

function [parseResult,p] = xmlread(fileName,varargin)
  • FILENAME can also be an InputSource, File, or InputStream object
  • DOMNODE = XMLREAD(FILENAME,…,P,…) where P is a DocumentBuilder object
  • DOMNODE = XMLREAD(FILENAME,…,’-validating’,…) will create a validating parser if one was not provided.
  • DOMNODE = XMLREAD(FILENAME,…,ER,…) where ER is an EntityResolver will set the EntityResolver before parsing
  • DOMNODE = XMLREAD(FILENAME,…,EH,…) where EH is an ErrorHandler will set the ErrorHandler before parsing
  • [DOMNODE,P] = XMLREAD(FILENAME,…) will return a parser suitable for passing back to XMLREAD for future parses.

xmlwrite

function xmlwrite(FILENAME,DOMNODE);
function str = xmlwrite(DOMNODE);
function str = xmlwrite(SOURCE);

xslt

function [xResultURI,xProcessor] = xslt(SOURCE,STYLE,DEST,varargin)
  • SOURCE can also be a XSLTInputSource
  • STYLE can also be a StylesheetRoot or XSLTInputSource
  • DEST can also be an XSLTResultTarget. Note that RESULT may be empty in this case since it may not be possible to determine a URL. If STYLE is absent or empty, the function uses the stylesheet named in the xml-stylesheet processing instruction in the SOURCE XML file. (This does not always work)
  • There is also an entirely undocumented feature: passing a ‘-tostring’ input argument transforms the inputs into a displayed text segment, rather than into a displayed URI; the transformed text is returned in the xResultURI output argument.

Note: internal comments within the Matlab code seem to indicate that XSLT is SAXON-based, so interested users might use SAXON’s documentation for accessing additional XSLT-related features/capabilities (also see this related thread).

]]>
https://undocumentedmatlab.com/blog_old/undocumented-xml-functionality/feed 6