Undocumented XML functionality

Matlab’s built-in XML-processing functions have several undocumented features that can be used by Java-savvy users. Note that the entire XML-support functionality in Matlab is Java-based. I understand that some Matlab users have a general aversion to Java, some even going as far as to disable it using the -nojvm startup option. But if you disable Java, Matlab’s XML functions will simply not work. Matlab’s own documentation points users to Sun’s official Java website for explanations of how to use the XML functionality (the link in the Matlab docpage is dead; the correct link should probably be https://jaxp-sources.dev.java.net/nonav/docs/api/, but Sun keeps changing its website, so this link could also be dead soon…).

Using the full Java XML parsing (JAXP) functionality is admittedly intimidating for the uninitiated, but extremely powerful once you understand how all the pieces fit together. Over the years, several interesting utilities that simplify working with the parsed XML data were submitted to the Matlab File Exchange. See, for example, XML parsing tools, the extremely popular XML Toolbox and xml_io_tools, the recent XML data import, and perhaps a dozen other utilities.
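To make the Java dependency concrete, here is a minimal sketch showing that xmlread hands back a JAXP org.w3c.dom.Document rather than a Matlab struct, so all further processing uses standard DOM methods such as getElementsByTagName, getTextContent and getAttribute. The sample file, its tag names and attribute values are made up for illustration; the usejava guard simply fails early when Matlab was started with -nojvm.

```matlab
% Guard: the XML functions need the JVM (they fail under -nojvm)
assert(usejava('jvm'), 'xmlread/xmlwrite/xslt require Java; do not start Matlab with -nojvm');

% Write a small sample file (made-up content, for illustration only)
fname = [tempname '.xml'];
fid = fopen(fname, 'w');
fprintf(fid, '<library><book year="2008">Matlab</book><book year="2009">Java</book></library>');
fclose(fid);

% xmlread returns a Java org.w3c.dom.Document, navigated with standard DOM calls
dom = xmlread(fname);
books = dom.getElementsByTagName('book');      % org.w3c.dom.NodeList
nBooks = books.getLength();
titles = cell(1, nBooks);
years  = zeros(1, nBooks);
for k = 0:nBooks-1                             % Java NodeList indices are 0-based
    node = books.item(k);
    titles{k+1} = char(node.getTextContent()); % java.lang.String -> char
    years(k+1)  = str2double(char(node.getAttribute('year')));
end
delete(fname);
```

Note the 0-based loop index and the explicit char() conversions: these are the usual friction points when driving JAXP objects from Matlab.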

Each of Matlab’s main built-in XML-processing functions (xmlread, xmlwrite and xslt) has an internal set of undocumented and unsupported functionalities, which build on their internal Java implementation. As far as I could tell, these unsupported functionalities have been available at least since Matlab 7.2 (R2006a), and possibly even in earlier releases. For the benefit of the Java and/or JAXP speakers out there (it will probably not help any others), I list Matlab’s internal description of these unsupported functionalities, annotated with API hyperlinks. These descriptions (sans the links) can be seen by simply editing the m-file, for example (the R2008a variant is described below):

edit xmlread

xmlread

function [parseResult,p] = xmlread(fileName,varargin)
  • FILENAME can also be an InputSource, File, or InputStream object
  • DOMNODE = XMLREAD(FILENAME,…,P,…) where P is a DocumentBuilder object
  • DOMNODE = XMLREAD(FILENAME,…,'-validating',…) will create a validating parser if one was not provided.
  • DOMNODE = XMLREAD(FILENAME,…,ER,…) where ER is an EntityResolver will set the EntityResolver before parsing
  • DOMNODE = XMLREAD(FILENAME,…,EH,…) where EH is an ErrorHandler will set the ErrorHandler before parsing
  • [DOMNODE,P] = XMLREAD(FILENAME,…) will return a parser suitable for passing back to XMLREAD for future parses.
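A minimal sketch of three of the options above, assuming xmlread accepts them as the R2008a help text describes: FILENAME given as a SAX InputSource (here wrapping a Matlab string, which avoids a temporary file), a caller-supplied DocumentBuilder, and reuse of the parser returned in the second output. The XML content and the namespace-aware setting are arbitrary choices for illustration.

```matlab
% 1) FILENAME as an InputSource: parse XML held in a Matlab string (no temp file)
xmlStr = '<root><item id="1">alpha</item><item id="2">beta</item></root>';
src = org.xml.sax.InputSource(java.io.StringReader(xmlStr));

% 2) Pass our own DocumentBuilder (here configured to be namespace-aware)
factory = javax.xml.parsers.DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
builder = factory.newDocumentBuilder();
[dom, parser] = xmlread(src, builder);

items = dom.getElementsByTagName('item');
nItems = items.getLength();
firstText = char(items.item(0).getTextContent());

% 3) Reuse the returned parser for a further parse
src2 = org.xml.sax.InputSource(java.io.StringReader('<root><item/></root>'));
dom2 = xmlread(src2, parser);
```

Reusing the parser mainly pays off when parsing many documents in a loop, since the DocumentBuilder is configured only once.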

xmlwrite

function xmlwrite(FILENAME,DOMNODE);
function str = xmlwrite(DOMNODE);
function str = xmlwrite(SOURCE);
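The three xmlwrite signatures can be sketched together as follows. The DOM is built with com.mathworks.xml.XMLUtils.createDocument (the helper used in the official xmlwrite help); the element names and text are made up. The last call assumes that the undocumented SOURCE form accepts a SAX InputSource, as one of the reader comments on this post demonstrates with a FileReader; here the InputSource wraps a string instead.

```matlab
% Build a small DOM using the documented helper (made-up content)
docNode = com.mathworks.xml.XMLUtils.createDocument('catalog');
root = docNode.getDocumentElement();
book = docNode.createElement('book');
book.setAttribute('id', 'b1');
book.appendChild(docNode.createTextNode('Undocumented Secrets'));
root.appendChild(book);

% str = xmlwrite(DOMNODE): serialize the DOM into a Matlab string
str = xmlwrite(docNode);

% xmlwrite(FILENAME,DOMNODE): write the same DOM to a (temporary) file
fname = [tempname '.xml'];
xmlwrite(fname, docNode);

% str = xmlwrite(SOURCE), undocumented: here SOURCE is a SAX InputSource
src = org.xml.sax.InputSource(java.io.StringReader(str));
str2 = xmlwrite(src);
```

The temporary file is left in place so it can be inspected; delete it when done.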

xslt

function [xResultURI,xProcessor] = xslt(SOURCE,STYLE,DEST,varargin)
  • SOURCE can also be an XSLTInputSource
  • STYLE can also be a StylesheetRoot or XSLTInputSource
  • DEST can also be an XSLTResultTarget. Note that RESULT may be empty in this case, since it may not be possible to determine a URL. If STYLE is absent or empty, the function uses the stylesheet named in the xml-stylesheet processing instruction in the SOURCE XML file. (This does not always work.)
  • There is also an entirely undocumented feature: passing a '-tostring' input argument returns the transformation result as text, rather than as the URI of a result file; the transformed text is returned in the xResultURI output argument.
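A minimal sketch of the '-tostring' option, assuming it can be passed in the DEST position (we have not verified this across releases). The XML data and the tiny text-output stylesheet are made up for illustration and are written to temporary files first.

```matlab
% Made-up input document
xmlFile = [tempname '.xml'];
fid = fopen(xmlFile, 'w');
fprintf(fid, '<?xml version="1.0"?>\n<people><person>Ann</person><person>Bob</person></people>\n');
fclose(fid);

% Made-up XSLT 1.0 stylesheet producing plain text: each name followed by ";"
xslFile = [tempname '.xsl'];
fid = fopen(xslFile, 'w');
fprintf(fid, ['<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n' ...
              '<xsl:output method="text"/>\n' ...
              '<xsl:template match="/"><xsl:for-each select="people/person">' ...
              '<xsl:value-of select="."/>;</xsl:for-each></xsl:template>\n' ...
              '</xsl:stylesheet>\n']);
fclose(fid);

% Undocumented: '-tostring' returns the transformed text instead of a result URI
txt = xslt(xmlFile, xslFile, '-tostring');
disp(txt)

delete(xmlFile);
delete(xslFile);
```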

Note: internal comments within the Matlab code seem to indicate that xslt is Saxon-based, so interested users can consult Saxon’s documentation for additional XSLT-related features and capabilities (also see this related thread).

Categories: Java, Low risk of breaking in future versions, Semi-documented feature, Stock Matlab function



6 Responses to Undocumented XML functionality

  1. Donn Shull says:

    Hi Yair,

    It seems that many people follow an unusual convention for the format of their XML files. Even The MathWorks’ info.xml files don’t follow the standard indented form for XML files. A quick way to “pretty print” an XML file is to use the saveXML method of the DOM node, i.e.:

    x = xmlread(which('info.xml'));
    x.saveXML(x.getDocumentElement)

    Thanks for the good work,

    Donn

    • Here’s a nicer “pretty-print”, which uses the undocumented InputSource input argument format of xmlwrite:

      fReader=java.io.FileReader(java.io.File(which('info.xml')));
      xmlwrite(org.xml.sax.InputSource(fReader))
  2. xmlread does a great job of parsing an XML file, but I’ve found that actually extracting data from the DOM hierarchy is pretty slow. For example, the parseXML function in the xmlread help takes about 250 times longer than the call to xmlread itself, and the vast majority of that time is spent interrogating the DOM objects.

    Do you have any suggestions?

    • @James – DOM is normally used for small XML models; SAX is usually better for large models that can be processed sequentially. There are numerous SAX parsers available online that you can use in Matlab. Perhaps the most widely used open-source XML parser, which includes support for both SAX and DOM, is Xerces, which is already pre-bundled in Matlab (take a look at the %matlabroot%/java/jarext/ folder), so you can use it in Matlab out-of-the-box. Other well-known XML support packages, namely Xalan and Saxon, are also pre-bundled.

    • @Yair, That certainly sounds like a better idea, but I’m not sure how to proceed. Could you indicate how to modify the parseXML function from the xmlread help so that it used the SAX parser rather than the DOM returned by xmlread? Or am I on the wrong track?

    • @James – I don’t have an immediate answer for you, it requires some investigation. This sounds like a good idea for a future article. If you cannot wait for this article to appear, you could contact me by email (link at the top right of this page) to discuss a short consulting gig.
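A true SAX parse requires a Java ContentHandler (typically a compiled subclass of DefaultHandler), which cannot be defined in Matlab code alone. A lower-friction streaming alternative is StAX (javax.xml.stream), a pull-based API that Matlab can drive directly. This is a swapped-in technique, not the SAX approach discussed above, and it assumes Matlab’s JVM is Java 6 or newer (where StAX is bundled). The sketch below reads hypothetical `item` elements sequentially without ever building a DOM.

```matlab
% Streaming (pull) parse with StAX; assumes a Java 6+ JVM inside Matlab
xmlStr  = '<root><item id="1">alpha</item><item id="2">beta</item></root>';
factory = javax.xml.stream.XMLInputFactory.newInstance();
reader  = factory.createXMLStreamReader(java.io.StringReader(xmlStr));

START_ELEMENT = javax.xml.stream.XMLStreamConstants.START_ELEMENT;
ids   = {};
texts = {};
while reader.hasNext()
    % Advance one event; only react to <item> start tags
    if reader.next() == START_ELEMENT && strcmp(char(reader.getLocalName()), 'item')
        ids{end+1}   = char(reader.getAttributeValue([], 'id')); %#ok<AGROW> [] is passed as Java null
        texts{end+1} = char(reader.getElementText());            %#ok<AGROW> moves to </item>
    end
end
reader.close();
```

For a large file, replace the StringReader with java.io.FileReader(fileName); memory use then stays roughly constant, because only the current event is held.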
