Parsing mlint (Code Analyzer) output

Mlint, Matlab’s static code-analysis parser, was written by Stephen Johnson (the original developer of the enormously successful lint tool for C, back in 1978), after MathWorks lured him in 2002 to develop a similar tool for Matlab. Since its introduction (in R14, I believe), and especially since its incorporation into Matlab’s Editor in R2006a (Matlab 7.2), mlint has become a very important tool for reporting potential problems in m-files.

Unfortunately, to this day (R2013a), there is no documented way to programmatically separate mlint warnings from errors, nor to access any of the multitude of features that are readily available in mlint. Naturally, there is (and always has been) an undocumented back door.

From its earliest beginnings, mlint has relied on C code (presumably modeled after lint). For many years, mlint relied on a mex file (%matlabroot%/toolbox/matlab/codetools/mlintmex.mex*), which is basically just a wrapper for mlint.dll, where the core algorithm resides. In recent releases, mlintmex, like many other core mex files, was ported into a core Matlab library (libmwbuiltins.dll on Windows). However, the name and interface of the mlintmex function have remained unchanged over the years. The core mlintmex function is wrapped by the mlint m-function (%matlabroot%/toolbox/matlab/codetools/mlint.m), which calls mlintmex internally. In R2011b (Matlab 7.13), the official function name was changed to checkcode, although for some reason this was never documented in the release notes. Still, mlint continues to work even today. Wrapping all of that is the mlintrpt function, which calls mlint/checkcode internally.

The core function mlintmex returns a long string with embedded newlines to separate the messages. For example:

>> str = mlintmex('perfTest.m')
str = 
L 3 (C 1): The value assigned to variable 'A' might be unused.
L 4 (C 1): The value assigned to variable 'B' might be unused.
L 5 (C 1-3): Variable 'ops', apparently a structure, is changed but the value seems to be unused.
L 12 (C 9): This statement (and possibly following ones) cannot be reached.
L 53 (C 19-25): The function 'subFunc' might be unused.
L 53 (C 27-35): Input argument 'iteration' might be unused. If this is OK, consider replacing it by ~.
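For illustration, here is what such manual parsing might look like. This is a minimal sketch, assuming every message line has the form `L <line> (C <col>[-<endCol>]): <text>` as in the listing above; the variable names are my own, and the sample string simply reuses three of the messages shown. Manual parsing is mostly relevant when calling mlintmex directly with options whose text output is not converted into a struct.

```matlab
% Sample mlintmex output, copied from three of the messages above.
% In practice you would use:  str = mlintmex('perfTest.m');
str = sprintf('%s\n', ...
    'L 3 (C 1): The value assigned to variable ''A'' might be unused.', ...
    'L 5 (C 1-3): Variable ''ops'', apparently a structure, is changed but the value seems to be unused.', ...
    'L 53 (C 19-25): The function ''subFunc'' might be unused.');

% Each message line has the form "L <line> (C <col>[-<endCol>]): <text>"
msgLines = regexp(str, '\n', 'split');
msgs = struct('line', {}, 'column', {}, 'message', {});
for k = 1:numel(msgLines)
    t = regexp(msgLines{k}, '^L (\d+) \(C ([0-9-]+)\): (.*)$', 'tokens', 'once');
    if isempty(t)
        continue   % skip empty lines and anything that is not a message
    end
    msgs(end+1).line  = str2double(t{1});                          %#ok<AGROW>
    msgs(end).column  = str2double(regexp(t{2}, '\d+', 'match'));  % [col] or [start end]
    msgs(end).message = t{3};
end
```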

We can parse this long string ourselves, but there is no need since mlint/checkcode do this for us, returning a struct array:

>> results = mlint('perfTest.m')
results = 
6x1 struct array with fields:
    message
    line
    column
    fix
>> results(5)
ans = 
    message: 'The function 'subFunc' might be unused.'
       line: 53
     column: [19 25]
        fix: 0
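Once we have the struct array, it is easy to post-process. As one possible use, the sketch below formats the messages in the `file:line:col: message` style that compilers such as gcc emit, which is handy for logs or build scripts. The struct here is a hand-built stand-in with the same fields (its first entry is my own guess at how the line-3 message would look), so treat it as an illustration rather than actual mlint output.

```matlab
fileName = 'perfTest.m';
% Hand-built stand-in with the same fields that mlint/checkcode return;
% in practice:  results = mlint(fileName);   % or checkcode(fileName)
results = struct('message', {'The value assigned to variable ''A'' might be unused.', ...
                             'The function ''subFunc'' might be unused.'}, ...
                 'line',    {3, 53}, ...
                 'column',  {1, [19 25]}, ...
                 'fix',     {0, 0});

% Sort by line number, then format as "file:line:col: message"
[~, order] = sort([results.line]);
results = results(order);
report = cell(numel(results), 1);
for k = 1:numel(results)
    % column(1) is the first column of the reported range
    report{k} = sprintf('%s:%d:%d: %s', fileName, results(k).line, ...
                        results(k).column(1), results(k).message);
end
fprintf('%s\n', report{:});
```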

As can be seen, the message severity (warning/error) does not appear in the results. This information obviously exists internally, since the Editor and the Code Analyzer report both display it: orange for warnings, red for errors.

In one of my projects I needed to enable users to dynamically create executable Matlab code that would then be run interactively. This enabled users to create dynamic data-analysis functions without actually needing to know Matlab or to code all the nuts and bolts of a regular Matlab function. For this, I needed to display warnings and errors on the fly (the dynamic cell tooltips used a custom table cell-renderer). Here is the end result:


Figure: Analysis definition panel

Figure: Dynamic analysis alert tooltips


My solution was to use mlintmex, as follows:

% Get the relevant message strings (undocumented -m options)
errMsgs = mlintmex('-m2', srcFileName);   % errors only
allMsgs = mlintmex('-m0', srcFileName);   % all errors and warnings

% Each message ends with a newline, so count the newlines,
% after discarding everything from the first '***' marker onward
numErrors = numel(strfind(regexprep(errMsgs,'\*\*\*.*',''), char(10)));
numAllMsg = numel(strfind(regexprep(allMsgs,'\*\*\*.*',''), char(10)));
numWarns  = numAllMsg - numErrors;

(I also extracted the actual error/warning locations from the message strings themselves, errMsgs and allMsgs.)
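Given the errMsgs and allMsgs strings from the code above, one way to label each individual message as an error or a warning is to check whether the same text also appears in the -m2 (errors-only) output. This is a sketch, not my original project code: it assumes that a message appears with identical text in both outputs, and the error line in the stand-in strings is hypothetical.

```matlab
% Hypothetical stand-ins for the two mlintmex strings:
%   errMsgs = mlintmex('-m2', srcFileName);   allMsgs = mlintmex('-m0', srcFileName);
errMsgs = sprintf('%s\n', 'L 20 (C 5): (hypothetical error text)');
allMsgs = sprintf('%s\n', ...
    'L 3 (C 1): The value assigned to variable ''A'' might be unused.', ...
    'L 20 (C 5): (hypothetical error text)', ...
    'L 53 (C 19-25): The function ''subFunc'' might be unused.');

% Split each string into message lines (dropping anything from the first '***' on)
errLines = regexp(regexprep(errMsgs, '\*\*\*.*', ''), '\n', 'split');
errLines = errLines(~cellfun('isempty', errLines));
allLines = regexp(regexprep(allMsgs, '\*\*\*.*', ''), '\n', 'split');
allLines = allLines(~cellfun('isempty', allLines));

% A message is treated as an error if the same text also appears in the -m2 output
isError   = ismember(allLines, errLines);
errorMsgs = allLines(isError);
warnMsgs  = allLines(~isError);

% Line number of each message (the "L <n>" prefix), e.g. for per-row tooltips
lineNums = cellfun(@(m) sscanf(m, 'L %d'), allLines);
```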

Alternatively, I could have used mlint directly, as I have recently explained:

% Note that mlint returns struct arrays, so the following are all structs, not strings
errMsgs = mlint('-m2',srcFileName); % m2 = errors only
m1Msgs  = mlint('-m1',srcFileName); % m1 = errors and severe warnings only
allMsgs = mlint('-m0',srcFileName); % m0 = all errors and warnings
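With struct arrays, counting becomes trivial, and each message can be tagged with its severity. The sketch below matches on line number plus message text, so that identical texts on different lines are not confused. The stand-in structs (including the error message) are hypothetical and only mimic the fields that mlint returns.

```matlab
% Hypothetical stand-ins for the two mlint struct arrays:
%   errMsgs = mlint('-m2', srcFileName);   allMsgs = mlint('-m0', srcFileName);
errMsgs = struct('message', {'(hypothetical error text)'}, 'line', {20}, ...
                 'column', {[5 5]}, 'fix', {0});
allMsgs = struct('message', {'The value assigned to variable ''A'' might be unused.', ...
                             '(hypothetical error text)'}, ...
                 'line', {3, 20}, 'column', {[1 1], [5 5]}, 'fix', {0, 0});

numErrors = numel(errMsgs);
numWarns  = numel(allMsgs) - numErrors;

% Build a "line:text" key per message and look up each key in the errors list
keyOf    = @(s) arrayfun(@(m) sprintf('%d:%s', m.line, m.message), s, 'UniformOutput', false);
isError  = ismember(keyOf(allMsgs), keyOf(errMsgs));
severity = repmat({'warning'}, size(allMsgs));
severity(isError) = {'error'};
```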

The original information about mlintmex and the undocumented -m0/m1/m2 options came from Urs (us) Schwartz, whose contributions are an endless source of such gems. Urs also provided a list of other undocumented mlint options (the comment annotations are mostly mine):

'-all'        % ???
'-allmsg'     % display the full list of possible mlint messages and their codes
'-amb'        % display all possibly-ambiguous identifiers (variable/function)
'-body'       % ???
'-callops'    % display the internal call tree, with nesting levels and function types
'-calls'      % (looks similar to -callops, not sure what the difference is)
'-com'        % ???
'-cyc'        % display McCabe complexity value of all functions in the analyzed file
% '-db'       % == -set + -ud + -tab
'-dty'        % debug info for the mlint parsing tree
'-edit'       % display all encountered identifiers and their assumed types
'-en'         % messages in English
'-id'         % display the mlint code associated with each message
'-ja'         % messages in Japanese
'-lex'        % display the LEX parse-tree for the analyzed file
'-m0'         % + other opt
'-m1'         % + other opt
'-m2'         % + other opt
'-m3'         % + other opt
'-mess'       % debug info for mlint message-reporting (start/end locations etc.)
'-msg'        % (looks similar to -allmsg above, not sure what the difference is)
'-notok'      % disregard %#ok directives and report messages on lines having them
'-pf'         % ???
'-set'        % debug info for the mlint parsing tree
'-spmd'       % ??? (presumably display SPMD-related messages)
'-stmt'       % display the number of statements in each function within the analyzed file
'-tab'        % set-by/used-by table for all identifiers (see -edit)
'-tmtree'     % not valid anymore
'-tmw'        % not valid anymore
'-toks'       % ???
'-tree'       % debug info for the mlint parsing tree
'-ty'         % display the line numbers where each of the file's identifiers are used
'-ud'         % debug info for the mlint parsing tree
'-yacc'       % ONLY: !mlint FILE -yacc -...

In recent years, options such as ‘-eml’ and ‘-codegen’ were added to this list (see the checkcode doc page). Also note that not all Matlab releases support all options. For example, ‘-tmw’ is ignored in R2013a, which returns the same data as ‘-all’ plus a warning about the ignored option.
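As an example of putting one of these options to use, the sketch below extracts the McCabe complexity of each function from the ‘-cyc’ output and flags overly complex functions. It assumes that checkcode(fileName,'-cyc') returns the complexity as regular messages of the form `The McCabe complexity of 'name' is N.`; since the exact wording may differ between releases, the pattern is deliberately loose. The sample messages, their values and the threshold are all hypothetical.

```matlab
% Hypothetical sample messages; in practice (assumption):
%   info = checkcode('perfTest.m', '-cyc');   msgs = {info.message};
msgs = {'The McCabe complexity of ''perfTest'' is 3.', ...
        'The value assigned to variable ''A'' might be unused.', ...
        'The McCabe complexity of ''subFunc'' is 12.'};
maxCyc = 10;   % arbitrary example threshold

names = {};
cyc   = [];
for k = 1:numel(msgs)
    % Loose pattern, since the exact message wording may vary between releases
    t = regexp(msgs{k}, 'McCabe.*complexity of ''([^'']+)'' is (\d+)', 'tokens', 'once');
    if ~isempty(t)
        names{end+1} = t{1};              %#ok<AGROW>
        cyc(end+1)   = str2double(t{2});  %#ok<AGROW>
    end
end
tooComplex = names(cyc > maxCyc);   % functions exceeding the threshold
```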

Urs prepared a short utility called doli that accepts an m-file name and returns a struct whose fields are the respective outputs of mlint for each of the corresponding options:

>> results = doli('perfTest.m')
MLINT >   C:\Yair\Books\MATLAB Performance Tuning\Code\perfTest.m
OPTION>   -all       6
OPTION>   -allmsg    501
OPTION>   -amb       17
OPTION>   -body      6
OPTION>   -callops   15
OPTION>   -calls     15
OPTION>   -com       6
OPTION>   -cyc       8
OPTION>   -dty       162
OPTION>   -edit      92
OPTION>   -en        7
...
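The general idea behind such a utility can be sketched in a few lines. This is not Urs’ actual doli code: it simply calls mlintmex once per option (using the same option-first calling form as shown earlier) and stores each raw text output in a struct field. The number printed here is just the count of output lines, which may or may not be what doli’s column shows. The try/catch guards against options (or mlintmex itself) that are unavailable in a given release.

```matlab
fileName = 'perfTest.m';                            % file to analyze
opts = {'-all', '-amb', '-calls', '-cyc', '-id'};   % any subset of the options above
out = struct();
counts = zeros(1, numel(opts));
for k = 1:numel(opts)
    try
        txt = mlintmex(opts{k}, fileName);   % raw text output for this option
    catch
        txt = '';                            % option (or mlintmex itself) not available
    end
    out.(opts{k}(2:end)) = txt;              % e.g. out.calls holds the '-calls' text
    counts(k) = numel(strfind(txt, char(10)));   % number of output lines
    fprintf('OPTION>   %-10s %d\n', opts{k}, counts(k));
end
```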

Some of these options are used by Urs’ farg and fdep utilities. Their use of mlint, rather than direct m-code parsing, is part of the reason that these functions are so lightning-fast.

For example, we can use the ‘-calls’ option to parse an m-file and get the names, types, and code locations of its contained functions:

>> mlint('-calls','perfTest.m')
M0 1 10 perfTest
E0 51 3 perfTest
U1 3 5 randi
U1 4 5 num2cell
U1 4 14 randn
U1 6 1 whos
U1 7 1 tic
U1 7 6 save
U1 7 45 toc
U1 9 6 savefast
S0 53 19 subFunc
E0 60 3 subFunc
U1 55 8 isempty
U1 56 20 load
U1 57 29 sin

With so many useful features, I really cannot understand why they were never exposed to the public in a documented manner. After all, they have remained pretty much unchanged for many years and can provide enormous benefits for developers of unit tests and interactive analysis frameworks (as I have shown above).

As a side note, in R2010a (Matlab 7.10) mlint was renamed “Code Analyzer”, but this was really just a name change: its core functionality has changed little in the past decade. Some might argue that new checks were added, and that the Editor interface has improved by allowing auto-fixes and message suppression. But for a tool that is over a decade old (much more, if you count lint’s development), I contend that this is not much. Don’t get me wrong: I have the utmost respect for Steve. Serious Unix C/C++ development relies on his lint and yacc tools on a regular basis, and I think they show astonishing ingenuity and intelligence. It’s just that I had expected more after a decade of mlint development (I bet it’s not due to Steve suddenly losing his touch).

Addendum: A little birdie tells me that Steve left MathWorks a few years ago, which does explain things… I apologize to Steve for any misguided snideness on my part. As I said above, I have nothing but the utmost respect for his work. The question of why MathWorks left his mlint work hanging without serious continuation remains open.

Addendum 2: Additional and much more detailed information about the nature of functions can be found using the semi-documented mtree function (or rather, Matlab class: %matlabroot%/toolbox/matlab/codetools/@mtree/mtree.m). This is a huge class-file (3200+ lines of code) that is well worth a dedicated future article, so stay tuned…

Categories: Medium risk of breaking in future versions, Mex, Stock Matlab function, Undocumented feature


7 Responses to Parsing mlint (Code Analyzer) output

  1. Zipuni says:

    Awesome post! What I have been actually wondering is if one can augment the settings of Code Analyzer so that a group can enforce their own programming best practices such as camelBack notation, load() with left side assignment (no magic variables), no eval() etc.

    I would basically love to go under Preferences->Code Analyzer-> Default Settings and add completely new tests, not just enable/disable the existing ones ..

    Of course, one can write their own parser for MATLAB but it’s rather hard with all the syntactic sugar and weak types …

    • @Zipuni – that’s a very tall order you have there. Note that [almost] everything about mlint is undocumented… In theory, you might be able to use mlint’s lex and yacc outputs for this. But I must say that I believe this to be quite a challenge…

  2. Jim Hokanson says:

    Incredible.

  3. Matt B. says:

    In finding a solution for a StackOverflow question asking “Is there a way to fix all MATLAB mlint messages at once?”, I discovered another mlint flag by trial and error: -fix. It exposes the autofix hints and changes. As an example:

    >> checkcode(matlab.desktop.editor.getActiveFilename(),'-fix')
    L 2 (C 3): Terminate statement with semicolon to suppress output (in functions). (CAN FIX)
    ----FIX MESSAGE <Add a semicolon.>
    ----CHANGE MESSAGE L 2 (C 13); L 2 (C 12): <;>
    L 30 (C 52-53): Input argument 'in' might be unused. If this is OK, consider replacing it by ~. (CAN FIX)
    ----FIX MESSAGE <Replace name by ~.>
    ----CHANGE MESSAGE L 30 (C 52); L 30 (C 53): <~>

  4. Pingback: Function definition meta-info | Undocumented Matlab

  5. The flag -pf displays parfor related messages.

  6. Ed Yu says:

    Hi Yair,

    Thank you for this posting… I recently delivered a MATLAB database product to a client, and I needed to check my source code to minimize errors. One thing MATLAB lacks is a good way of catching the use of “undefined” variables. Basically, I wanted to know whether I use a variable that was never defined (due to typing mistakes). I understand the interpreted nature of MATLAB and don’t expect it to behave like a compiled language such as Java. But still, MATLAB should provide some help in this regard. When I looked into mlint it seemed useful, but when I tried to turn on the preference option:

    Code Analyzer cannot determine whether <name> is a variable or a function, and assumes it is a function.
    

    The report came back with more than the preset limit of 500 warnings, and the report was truncated. This is very annoying, as I work with GUIDE output, which usually contains a couple of thousand lines for any data-entry screen beyond the academic. Also, the warnings are mostly due to the internal DLL being unable to recognize simple MATLAB commands such as get, set, true, false, upper, lower, figure, close, delete, strcmp, regexprep, etc.

    So I ended up copying mlintrpt.m and modifying it to check the <name> in each output message and see whether I can resolve it using the ‘which’ command; if so, the message is simply filtered out. This produces a pretty clean and useful mlint report: I actually found a couple of typos in variable names that would otherwise have been delivered to clients, lurking until they hit that line of code and MATLAB produced the “ding” sound. I do log these error messages in a log file, but the users won’t necessarily know that the application has erred out, because they have no speakers attached to their computers or have the sound turned off.

    If anyone wants to know what I did, just contact me.
