Database – Undocumented Matlab https://undocumentedmatlab.com/blog_old Charting Matlab's unsupported hidden underbelly Tue, 29 Oct 2019 15:26:09 +0000 en-US hourly 1 https://wordpress.org/?v=4.4.1 Using SQLite in Matlabhttps://undocumentedmatlab.com/blog_old/using-sqlite-in-matlab https://undocumentedmatlab.com/blog_old/using-sqlite-in-matlab#respond Wed, 27 Dec 2017 21:53:54 +0000 https://undocumentedmatlab.com/?p=7255 Related posts:
  1. Speeding up Matlab-JDBC SQL queries Fetching SQL ResultSet data from JDBC into Matlab can be made significantly faster. ...
  2. Types of undocumented Matlab aspects This article lists the different types of undocumented/unsupported/hidden aspects in Matlab...
  3. ScreenCapture utility The ScreenCapture utility uses purely-documented Matlab for capturing a screen region as an image from within Matlab. ...
  4. Smart listbox & editbox scrollbars Matlab listbox and multi-line editbox scrollbars can easily be made smarter, for improved appearance. ...
]]>
MathWorks invests a huge amount of effort in recent years on supporting large distributed databases. The business case for this focus is entirely understandable, but many Matlab users have much simpler needs, which are often served by the light-weight open-source SQLite database (which claims to be the most widely-used database worldwide). Although SQLite is very widely used, and despite the fact that built-in support for SQLite is included in Matlab (for its internal use), MathWorks has chosen not to expose any functionality or wrapper function that would enable end-users to access it. In any case, I recently came across a need to do just that, when a consulting client asked me to create an interactive data-browser for their SQLite database that would integrate with their Matlab program:
SQLite data browser

In today’s post I will discuss several possible mechanisms to integrate SQLite in Matlab code, and you can take your pick among them. Except for the Database Toolbox, all the alternatives are free (open-source) libraries (even the commercial Database Toolbox relies on one of the open-source libraries, by the way).

sqlite4java

sqlite4java is a Java package by ALM Works that is bundled with Matlab for the past several years (in the %matlabroot%/java/jarext/sqlite4java/ folder). This is a lightweight open-source package that provides a minimalist and fast (although not very convenient) interface to SQLite. You can either use the package that comes with your Matlab installation, or download and use the latest version from the project repository, where you can also find documentation.

Mark Mikofski exposed this hidden nugget back in 2015, and you are welcome to view his post for additional details. Here’s a sample usage:

% Open the DB data file
db = com.almworks.sqlite4java.SQLiteConnection(java.io.File('C:\Yair\Data\IGdb 2017-11-13.sqlite'));
db.open;
 
% Prepare an SQL query statement
stmt = db.prepare(['select * from data_table where ' conditionStr]);
 
% Step through the result set rows
row = 1;
while stmt.step
   numericValues(row) = stmt.columnInt(0);    % column #0
   stringValues{row}  = stmt.columnString(1); % column #1
end
 
% Cleanup
stmt.dispose
db.dispose

Note that since sqlite4java uses a proprietary interface (similar, but not identical, to JDBC), it can take a bit of time to get used to it. I am generally a big fan of preferring built-in components over externally-installed ones, but in this particular case I prefer other alternatives.

JDBC

JDBC (Java Database Connectivity) is the industry standard for connectivity to databases. Practically all databases nowadays have at least one JDBC connector, and many DBs have multiple JDBC drivers created by different groups. As long as they all adhere to the JDBC interface standard, these drivers are all equivalent and you can choose between them based on availability, cost, support, license, performance and other similar factors. SQLite is no exception to this rule, and has several JDBC driver implementations, including xerial’s sqlite-jdbc (also discussed by Mark Mikofski) and sqlitejdbc. If you ask me,
sqlite-jdbc is better as it is being maintained with new versions released periodically.

The example above would look something like this with sqlite-jdbc:

% Add the downloaded JAR library file to the dynamic Java classpath
javaaddpath('C:\path\to\sqlite\sqlite-jdbc-3.21.0.jar')
 
% Open the DB file
jdbc = org.sqlite.JDBC;
props = java.util.Properties;
conn = jdbc.createConnection('jdbc:sqlite:C:\Yair\Data\IGdb 2017-11-13.sqlite',props);  % org.sqlite.SQLiteConnection object
 
% Prepare and run an SQL query statement
sqlStr = ['select * from data_table where ' conditionStr];
stmt = conn.createStatement;     % org.sqlite.jdbc4.JDBC4Statement object
rs = stmt.executeQuery(sqlStr);  % org.sqlite.jdbc4.JDBC4ResultSet object
 
% Step through the result set rows
rows = 1;
while rs.next
   numericValues(row) = rs.getLong('ID');
   stringValues{row}  = rs.getString('Name');
end
 
% Cleanup
rs.close
stmt.close
conn.close

Database toolbox

In addition to all the above, MathWorks sells the Database Toolbox which has an integral SQLite connector, in two flavors – native and JDBC (the JDBC connector is simply sqlite-jdbc that I mentioned above, see a short discussion here).

I assume that the availability of this feature in the DB toolbox is the reason why MathWorks has never created a documented wrapper function for the bundled sqlite4java. I could certainly understand this from a business perspective. Still, with so many free alternatives available as discussed in this post, I see not reason to purchase the toolbox merely for its SQLite connector. Then again, if you need to connect to several different database types, not just SQLite, then getting the toolbox might make sense.

mksqlite

My personal favorite is actually none of these Java-based connectors (surprise, surprise), but rather the open-source mksqlite connector by Martin Kortmann and Andreas Martin. This is a native (Mex-file) connector that acts as a direct Matlab function. The syntax is pretty straight-forward and supports SQL queries. IMHO, its usage is a much simpler than with any of the other alternatives:

% Open the DB file
mksqlite('open', 'C:\Yair\Data\IGdb 2017-11-13.sqlite');
 
% Query the database
results = mksqlite(['select * from data_table where ' conditionStr]);
numericValues = [results.ID];
stringValues  = {results.Name};
 
% Cleanup
mksqlite('close');

Can it be any simpler than this!?

However, the main benefit of mksqlite over the other connectors is not its simplicity but the connector’s speed. This speed is due to the fact that the query is vectorized and we do not need to loop over all the separate data rows and fields. With the other connectors, it is actually not the loop that takes so long in Matlab, but rather the overheads and inefficiencies of numerous library calls to fetch one single value at a time from the result-set – this is avoided in mksqlite where there is only a single call. This results in lightning speed: A couple of years ago I consulted to a client who used a JDBC connector to an SQLite database; by switching from a JDBC connector to mksqlite, I reduced the execution time from 7 secs to 70 msecs – a 100x speedup! In that specific case, this made the difference between an unusable program and a highly interactive/responsive one.

Other alternatives

In addition to all the above, we can also use a .NET-based connector or a Python one – I leave these as an exercise for the reader…

Have I forgotten some important alternative? Or perhaps you have have some related tip you’d like to share? If so, then please leave a comment below.

Happy New Year everybody!

]]>
https://undocumentedmatlab.com/blog_old/using-sqlite-in-matlab/feed 0
Speeding up Matlab-JDBC SQL querieshttps://undocumentedmatlab.com/blog_old/speeding-up-matlab-jdbc-sql-queries https://undocumentedmatlab.com/blog_old/speeding-up-matlab-jdbc-sql-queries#comments Wed, 16 Nov 2016 11:43:17 +0000 http://undocumentedmatlab.com/?p=6742 Related posts:
  1. Using SQLite in Matlab SQLite databases can be accessed in a variety of different ways in Matlab. ...
  2. Borderless button used for plot properties A borderless button can be used to add unobtrusive functionality to plot axes...
  3. Matlab-Java memory leaks, performance Internal fields of Java objects may leak memory - this article explains how to avoid this without sacrificing performance. ...
  4. File deletion memory leaks, performance Matlab's delete function leaks memory and is also slower than the equivalent Java function. ...
]]>
Many of my consulting projects involve interfacing a Matlab program to an SQL database. In such cases, using MathWorks’ Database Toolbox is a viable solution. Users who don’t have the toolbox can also easily connect directly to the database using either the standard ODBC bridge (which is horrible for performance and stability), or a direct JDBC connection (which is also what the Database Toolbox uses under the hood). I explained this Matlab-JDBC interface in detail in chapter 2 of my Matlab-Java programming book. A bare-bones implementation of an SQL SELECT query follows (data update queries are a bit different and will not be discussed here):

% Load the appropriate JDBC driver class into Matlab's memory
% (but not directly, to bypass JIT pre-processing - we must do it in run-time!)
driver = eval('com.mysql.jdbc.Driver');  % or com.microsoft.sqlserver.jdbc.SQLServerDriver or whatever
 
% Connect to DB
dbPort = '3306'; % mySQL=3306; SQLServer=1433; Oracle=...
connectionStr = ['jdbc:mysql://' dbURL ':' dbPort '/' schemaName];  % or ['jdbc:sqlserver://' dbURL ':' dbPort ';database=' schemaName ';'] or whatever
dbConnObj = java.sql.DriverManager.getConnection(connectionStr, username, password);
 
% Send an SQL query statement to the DB and get the ResultSet
stmt = dbConnObj.createStatement(java.sql.ResultSet.TYPE_SCROLL_INSENSITIVE, java.sql.ResultSet.CONCUR_READ_ONLY);
try stmt.setFetchSize(1000); catch, end  % the default fetch size is ridiculously small in many DBs
rs = stmt.executeQuery(sqlQueryStr);
 
% Get the column names and data-types from the ResultSet's meta-data
MetaData = rs.getMetaData;
numCols = MetaData.getColumnCount;
data = cell(0,numCols);  % initialize
for colIdx = numCols : -1 : 1
    ColumnNames{colIdx} = char(MetaData.getColumnLabel(colIdx));
    ColumnType{colIdx}  = char(MetaData.getColumnClassName(colIdx));  % http://docs.oracle.com/javase/7/docs/api/java/sql/Types.html
end
ColumnType = regexprep(ColumnType,'.*\.','');
 
% Get the data from the ResultSet into a Matlab cell array
rowIdx = 1;
while rs.next  % loop over all ResultSet rows (records)
    for colIdx = 1 : numCols  % loop over all columns in the row
        switch ColumnType{colIdx}
            case {'Float','Double'}
                data{rowIdx,colIdx} = rs.getDouble(colIdx);
            case {'Long','Integer','Short','BigDecimal'}
                data{rowIdx,colIdx} = double(rs.getDouble(colIdx));
            case 'Boolean'
                data{rowIdx,colIdx} = logical(rs.getBoolean(colIdx));
            otherwise %case {'String','Date','Time','Timestamp'}
                data{rowIdx,colIdx} = char(rs.getString(colIdx));
        end
    end
    rowIdx = rowIdx + 1;
end
 
% Close the connection and clear resources
try rs.close();   catch, end
try stmt.close(); catch, end
try dbConnObj.closeAllStatements(); catch, end
try dbConnObj.close(); catch, end  % comment this to keep the dbConnObj open and reuse it for subsequent queries

Naturally, in a real-world implementation you also need to handle database timeouts and various other errors, handle data-manipulation queries (not just SELECTs), etc.

Anyway, this works well in general, but when you try to fetch a ResultSet that has many thousands of records you start to feel the pain – The SQL statement may execute much faster on the DB server (the time it takes for the stmt.executeQuery call), yet the subsequent double-loop processing to fetch the data from the Java ResultSet object into a Matlab cell array takes much longer.

In one of my recent projects, performance was of paramount importance, and the DB query speed from the code above was simply not good enough. You might think that this was due to the fact that the data cell array is not pre-allocated, but this turns out to be incorrect: the speed remains nearly unaffected when you pre-allocate data properly. It turns out that the main problem is due to Matlab’s non-negligible overhead in calling methods of Java objects. Since the JDBC interface only enables retrieving a single data item at a time (in other words, bulk retrieval is not possible), we have a double loop over all the data’s rows and columns, in each case calling the appropriate Java method to retrieve the data based on the column’s type. The Java methods themselves are extremely efficient, but when you add Matlab’s invocation overheads the total processing time is much much slower.

So what can be done? As Andrew Janke explained in much detail, we basically need to push our double loop down into the Java level, so that Matlab receives arrays of primitive values, which can then be processed in a vectorized manner in Matlab.

So let’s create a simple Java class to do this:

// Copyright (c) Yair Altman UndocumentedMatlab.com
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
 
public class JDBC_Fetch {
 
	public static int DEFAULT_MAX_ROWS = 100000;   // default cache size = 100K rows (if DB does not support non-forward-only ResultSets)
 
	public static Object[] getData(ResultSet rs) throws SQLException {
		try {
			if (rs.last()) {  // data is available
				int numRows = rs.getRow();    // row # of the last row
				rs.beforeFirst();             // get back to the top of the ResultSet
				return getData(rs, numRows);  // fetch the data
			} else {  // no data in the ResultSet
				return null;
			}
		} catch (Exception e) {
			return getData(rs, DEFAULT_MAX_ROWS);
		}
	}
 
	public static Object[] getData(ResultSet rs, int maxRows) throws SQLException {
		// Read column number and types from the ResultSet's meta-data
		ResultSetMetaData metaData = rs.getMetaData();
		int numCols = metaData.getColumnCount();
		int[] colTypes = new int[numCols+1];
		int numDoubleCols = 0;
		int numBooleanCols = 0;
		int numStringCols = 0;
		for (int colIdx = 1; colIdx <= numCols; colIdx++) {
			int colType = metaData.getColumnType(colIdx);
			switch (colType) {
				case Types.FLOAT:
				case Types.DOUBLE:
				case Types.REAL:
					colTypes[colIdx] = 1;  // double
					numDoubleCols++;
					break;
				case Types.DECIMAL:
				case Types.INTEGER:
				case Types.TINYINT:
				case Types.SMALLINT:
				case Types.BIGINT:
					colTypes[colIdx] = 1;  // double
					numDoubleCols++;
					break;
				case Types.BIT:
				case Types.BOOLEAN:
					colTypes[colIdx] = 2;  // boolean
					numBooleanCols++;
					break;
				default: // 'String','Date','Time','Timestamp',...
					colTypes[colIdx] = 3;  // string
					numStringCols++;
			}
		}
 
		// Loop over all ResultSet rows, reading the data into the 2D matrix caches
		int rowIdx = 0;
		double [][] dataCacheDouble  = new double [numDoubleCols] [maxRows];
		boolean[][] dataCacheBoolean = new boolean[numBooleanCols][maxRows];
		String [][] dataCacheString  = new String [numStringCols] [maxRows];
		while (rs.next() && rowIdx < maxRows) {
			int doubleColIdx = 0;
			int booleanColIdx = 0;
			int stringColIdx = 0;
			for (int colIdx = 1; colIdx <= numCols; colIdx++) {
				try {
					switch (colTypes[colIdx]) {
						case 1:  dataCacheDouble[doubleColIdx++][rowIdx]   = rs.getDouble(colIdx);   break;  // numeric
						case 2:  dataCacheBoolean[booleanColIdx++][rowIdx] = rs.getBoolean(colIdx);  break;  // boolean
						default: dataCacheString[stringColIdx++][rowIdx]   = rs.getString(colIdx);   break;  // string
					}
				} catch (Exception e) {
					System.out.println(e);
					System.out.println(" in row #" + rowIdx + ", col #" + colIdx);
				}
			}
			rowIdx++;
		}
 
		// Return only the actual data in the ResultSet
		int doubleColIdx = 0;
		int booleanColIdx = 0;
		int stringColIdx = 0;
		Object[] data = new Object[numCols];
		for (int colIdx = 1; colIdx <= numCols; colIdx++) {
			switch (colTypes[colIdx]) {
				case 1:   data[colIdx-1] = dataCacheDouble[doubleColIdx++];    break;  // numeric
				case 2:   data[colIdx-1] = dataCacheBoolean[booleanColIdx++];  break;  // boolean
				default:  data[colIdx-1] = dataCacheString[stringColIdx++];            // string
			}
		}
		return data;
	}
}

So now we have a JDBC_Fetch class that we can use in our Matlab code, replacing the slow double loop with a single call to JDBC_Fetch.getData(), followed by vectorized conversion into a Matlab cell array (matrix):

% Get the data from the ResultSet using the JDBC_Fetch wrapper
data = cell(JDBC_Fetch.getData(rs));
for colIdx = 1 : numCols
   switch ColumnType{colIdx}
      case {'Float','Double'}
          data{colIdx} = num2cell(data{colIdx});
      case {'Long','Integer','Short','BigDecimal'}
          data{colIdx} = num2cell(data{colIdx});
      case 'Boolean'
          data{colIdx} = num2cell(data{colIdx});
      otherwise %case {'String','Date','Time','Timestamp'}
          %data{colIdx} = cell(data{colIdx});  % no need to do anything here!
   end
end
data = [data{:}];

On my specific program the resulting speedup was 15x (this is not a typo: 15 times faster). My fetches are no longer limited by the Matlab post-processing, but rather by the DB’s processing of the SQL statement (where DB indexes, clustering, SQL tuning etc. come into play).

Additional speedups can be achieved by parsing dates at the Java level (rather than returning strings), as well as several other tweaks in the Java and Matlab code (refer to Andrew Janke’s post for some ideas). But certainly the main benefit (the 80% of the gain that was achieved in 20% of the worktime) is due to the above push of the main double processing loop down into the Java level, leaving Matlab with just a single Java call to JDBC_Fetch.

Many additional ideas of speeding up database queries and Matlab programs in general can be found in my second book, Accelerating Matlab Performance.

If you’d like me to help you speed up your Matlab program, please email me (altmany at gmail), or fill out the query form on my consulting page.

]]>
https://undocumentedmatlab.com/blog_old/speeding-up-matlab-jdbc-sql-queries/feed 7
Secure SSL connection between Matlab and PostgreSQLhttps://undocumentedmatlab.com/blog_old/secure-ssl-connection-between-matlab-and-postgresql https://undocumentedmatlab.com/blog_old/secure-ssl-connection-between-matlab-and-postgresql#respond Fri, 18 Mar 2016 10:39:49 +0000 http://undocumentedmatlab.com/?p=6318 Related posts:
  1. Matlab’s internal memory representation Matlab's internal memory structure is explored and discussed. ...
  2. Matlab compilation quirks – take 2 A few hard-to-trace quirks with Matlab compiler outputs are explained. ...
  3. Customizing uiundo This article describes how Matlab's undocumented uiundo undo/redo manager can be customized...
  4. JMI wrapper – remote MatlabControl An example using matlabcontrol for calling Matlab from a separate Java process is explained....
]]>
I’d like to introduce guest blogger Jeff Mandel of the Perelman School of Medicine at the University of Pennsylvania. Today Jeff will discuss a how-to guide for setting up an SSL connection between Matlab and a PostgreSQL database. While this specific topic may be of interest to only a few readers, it involves hard-to-trace problems that are not well documented anywhere. The techniques discussed below may also be applicable, with necessary modifications, to other SSL targets and may thus be of use to a wider group of Matlab users.

PostgreSQL database
I’m developing software for pharmacokinetic control, and needed secure access to a central database from users at remote sites. The client software is written in Matlab, and while I have targeted MacOS, this could be adapted to Windows fairly easily. Hopefully, this will save someone the week it took me to figure all this out.

My environment:

  • PostgreSQL 9.4 installed on the server (Windows 7 PC, but Linux would be equally good)
  • DynDNS CNAME pointing at the server (diseserver.mydomain.org)
  • CACert.org registration for domain mydomain.org
  • Matlab 2015b running on El Capitan

Here are the neccesary steps:

  1. First, we need a certificate for the server. We can generate this with OpenSSL:
    $openssl req -out diseserver.csr -new -newkey rsa:2048 -nodes -keyout diseserver.key

    Specify any information you want on the key, but ensure CN=diseserver.mydomain.org.

  2. Paste the resulting diseserver.csr file into a new key request at CACert.org. Save the resulting certificate as diseserver.crt on your machine.
  3. While still at CACert.org, grab the Class 1 root certificate and save it as root.crt.
  4. Put the files diseserver.key, diseserver.crt, and root.crt in the PostgreSQL data directory.
  5. Edit your postgresql.conf file:
    ssl = on
    ssl_cert_file = 'diseserver.crt'  # (change requires restart)
    ssl_key_file  = 'diseserver.key'  # (change requires restart)
    ssl_ca_file   = 'root.crt'        # (change requires restart)
  6. Restart the PostgreSQL server. The server will now permit SSL connections, a necessary pre-condition for certificate authentication.
  7. We now add 2 lines to pg_hba.conf:
    hostnossl  all    all   0.0.0.0/0   reject
    hostssl	 mytable  all   0.0.0.0/0   cert map=ssl clientcert=1

    The first line causes all non-SSL connections to be rejected. The second allows certificate logins for mytable using the map ssl that is defined in pg_ident.conf:

    ssl  /^(.*).mydomain\.org$ \1

    this line extracts the username prefix from CN=username.mydomain.org.

  8. Now we need to generate client certificates. PostgreSQL expects these to be in ~/.postgresql (Windows %appdata%\postgresql\):
    $mkdir ~/.postgresql
    $cd ~/.postgresql
    $openssl req -out postgresql.csr -new -newkey rsa:2048 -nodes -keyout postgresql.key

    for this key, make CN=username.mydomain.org.

  9. Again, paste the resulting postgresql.csr file into CACert.org, saving the certificate as postgresql.crt.
  10. Test this:
    $psql "sslmode=verify-full host=diseserver.mydomain.org dbname=effect user=username"

    The server should respond:

    psql (9.4.6, server 9.4.1)
    SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
  11. Next we need to convert our key into pkcs8 format so that Java can read it:
    $openssl pkcs8 -topk8 -inform PEM -outform DER -in postgresql.key -out postgresql.pk8 -nocrypt
  12. Next, ensure that we have the correct version of the JDBC driver (Java-to-database connector). From the Mac command line:
    $java -version
    java version "1.8.0_05"

    and in Matlab:

    >> version -java
    ans =
    Java 1.7.0_75-b13 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode

    This shows that although we have Java 8 installed on El Capitan (at the OS level), Matlab uses a private Java 7 version. So we need the correct version of the jdbc on our static java classpath that is used by Matlab:

    ~/Matlab/postgresql-9.4.1208.jre7.jar
  13. The next part is very poorly documented in both the MathWorks and the PostgreSQL documentation, but I found it in Russel Gray’s Basildon Coder blog: We need to use the jdbc postgresql driver to check the client certificate. To do this, we need a custom SSLSocketFactory – LibPQFactory. This will grab our certificate and key from ~/.postgresql and present them to the server. The url is (note the trailing &):
    jdbc:postgresql://diseserver.mydomain.org/mytable?ssl=true&sslfactory=org.postgresql.ssl.jdbc4.LibPQFactory&sslmode=verify-full&
  14. Next we need the username. Rather than hard-coding this in the source code, we get the system username:
    >> username = java.lang.System.getProperty('user.name');
  15. Bundle this all up in a Matlab function, stripping the trailing CR from the username:
    function dbtest
       driver = 'org.postgresql.Driver';
       [~,username] = system('whoami');
       url = 'jdbc:postgresql://diseserver.mydomain.org/mytable?ssl=true&sslfactory=org.postgresql.ssl.jdbc4.LibPQFactory&sslmode=verify-full&';
       myconn = database('mytable', username, '', driver, url);
       if ~isempty(myconn.Message)
          fprintf(2,'%s\n', myconn.Message);
       else
          fprintf(1, 'Connected!\n');
       end
    end

Now we can connect from the Matlab command line or a Matlab program.

What if we’re deployed? We also need to add the contents of our .postgresql directory, plus the jdbc jar file to our deployed app:

>> mcc -m dbtest.m -a ~/.postgresql -a ~/Matlab/postgresql-9.4.1208.jre7.jar

Let’s test the compiled program from the OS command line:

$./run_dbtest.sh /Applications/Matlab/Matlab_Runtime/v90
Connected!

Note that the key and certificates are part of the encrypted bundle produced by Matlab’s mcc compiler.

I hope this helps someone!

Yair’s note: the Matlab code above uses Matlab’s Database Toolbox (specifically, the database function) to connect to the database. In future posts I plan to show how we can connect Matlab directly to a database via JDBC. This topic is covered in detail in chapter 2 of my Matlab-Java programming secrets book.

p.s. – this blog celebrates a 7-year anniversary tomorrow: I published my very first post here on March 19, 2009, showing how to change Matlab’s command-window colors (a post that later led to the now-famous cprintf utility). It’s been a long and very interesting ride indeed, but I have no plans to retire anytime soon :-)

]]>
https://undocumentedmatlab.com/blog_old/secure-ssl-connection-between-matlab-and-postgresql/feed 0
Static Java classpath hackshttps://undocumentedmatlab.com/blog_old/static-java-classpath-hacks https://undocumentedmatlab.com/blog_old/static-java-classpath-hacks#comments Wed, 29 Jul 2015 08:00:05 +0000 http://undocumentedmatlab.com/?p=5945 Related posts:
  1. FindJObj – find a Matlab component’s underlying Java object The FindJObj utility can be used to access and display the internal components of Matlab controls and containers. This article explains its uses and inner mechanism....
  2. Matlab callbacks for Java events Events raised in Java code can be caught and handled in Matlab callback functions - this article explains how...
  3. Types of undocumented Matlab aspects This article lists the different types of undocumented/unsupported/hidden aspects in Matlab...
  4. Variables Editor scrolling The Matlab Variables Editor can be accessed to provide immediate scrolling to a specified cell location. ...
]]>
A few days ago I encountered a situation where I needed to incorporate a JAR file into Matlab’s static Java classpath. I came across several relevant hacks that I thought could be useful for others:

The documented approach

In most use-cases, adding Java classes (or ZIP/JAR files) to the dynamic Java classpath is good enough. This can easily be done in run-time using the javaaddpath function. In certain cases, for example where the Java code uses asynchronous events, the Java files need to be added to the static, rather than the dynamic, Java classpath. In other cases, the Java code misbehaves when loaded using Matlab’s dynamic Java classloader, rather than the system classloader (which is used for the static classpath (some additional info). One example of this are the JAR/ZIP files used to connect to various databases (each database has its own Java JDBC connector, but they all need to reside in the static Java classpath to work properly).

Adding class-file folders or ZIP/JAR files to the static Java classpath can be done in several manners: we can update the Matlab installation’s classpath.txt file, or (starting in R2012b) a user-prepared javaclasspath.txt file. Either of these files can be placed in the Matlab (or deployed application’s) startup folder, or the user’s Matlab preferences folder (prefdir). This is all documented.

There are several important drawbacks to this approach:

  1. Both classpath.txt and javaclasspath.txt do not accept relative paths, only full path-names. In other words, we cannot specify ../../libs/mylib.jar but only C:\Yair\libs\mylib.jar or similar variants. We can use the $matlabroot, $arch, and $jre_home place-holder macros, but not user folders. We can easily do this in our personal computer where the full path is known, but this is difficult to ensure if we deploy the application onto other computers.
    Matlab’s compiler only supports adding Java classes and JARs to the dynamic classpath, but not to the static one – we need to manually add a classpath.txt or javaclasspath.txt to the deployment folder, complete with the deployment target’s correct full path. Naturally this is problematic when we wish to grant the user the flexibility of installing the deployed application anywhere on their computer disk.
  2. The javaclasspath.txt file is only supported on R2012b onward. The classpath.txt file is supported in both old and new releases, but (1) modifying it in Matlab’s installation folder may require system privileges, and (2) require update upon each and every Matlab reinstallation or upgrade. A classpath.txt file from one Matlab release will NOT be compatible with another Matlab release – if we try using an incorrect file Matlab will crash or hang. If you are not using just a single matlab release, this quickly turns into a maintenance nightmare.
  3. The javaclasspath.txt file loads the user-specified classes after those loaded in Matlab’s pre-defined classpath.txt. If we edit The classpath.txt file, we can place our classes at the top of the file, but this again entails the maintenance nightmare described above

Luckily, there are a few hacks that can alleviate some of the pain when dealing with deployed applications that use static Java classes:

Fixing javaclasspath.txt in run-time

At the very top of our main function, we could fix (or create) the javaclasspath.txt file to include the current folder’s actual full path, and then relaunch the application and exit the current session. The newly-launched application will use this javaclasspath.txt file and everything should work well from that moment onward. Here’s a bare-bones implementation example:

function mainProgram()
   if ~exist('javaclasspath.txt','file')
      try
         fid = fopen('javaclasspath.txt', 'wt');
         fprintf(fid, [pwd '/classesFolder/']);  % add folder of *.class files
         fprintf(fid, [pwd '/subFolder/classes.jar']);  % add jar/zip file
         fclose(fid);
         system(['./' mfilename ' &']);
      catch err
         msg = sprintf('Could not create %s/javaclasspath.txt - error was: %s', pwd, err.message);
         uiwait(msgbox(msg, 'Error', 'warn'));  % need uiwait since otherwise exit below will delete the msgbox
      end
      exit
   end
 
   % If we reached this point, then everything is ok - proceed normally
   ...

A related solution is to recreate the classpath.txt file using the build-generation script, as described by Andrew Janke.

Preloading Java class files in javaclasspath.txt

Class folders and JAR/ZIP files can be loaded before those in classpath.txt, without having to modify the system’s classpath.txt, as exposed by Christopher Barber. All we need to do is to add a line containing the <before> string as a separate line in our javaclasspath.txt user-file. All entries beneath this line will be loaded into the front (rather than the end) of the static Java classpath. For example:

# This will be placed at the end of the static Java classpath
C:\placed\at\the\end\of\the\static\classpath\mylib1.jar
 
<before>
 
# This will be placed at the beginning of the static Java classpath
C:\placed\at\the\top\of\the\static\classpath\mylib2.jar

Loading Java class files into the static classpath in run-time (!)

The crown jewels in Java static classpath hacking belong without a doubt to power-users Andrew Janke and Amro. Based on Amro’s tip (based on a Java forum post from back in 2002), Andrew created a Matlab function that loads a Matlab class into the static classpath at run-time (i.e., without needing to modify/create either classpath.txt or javaclasspath.txt), slightly modified by me to include an optional classname input arg. Here’s the resulting javaaddpathstatic function:

function javaaddpathstatic(file, classname)
%JAVAADDPATHSTATIC Add an entry to the static classpath at run time
%
% javaaddpathstatic(file, classname)
%
% Adds the given file to the static classpath. This is in contrast to the
% regular javaaddpath, which adds a file to the dynamic classpath.
%
% Files added to the path will not show up in the output of
% javaclasspath(), but they will still actually be on there, and classes
% from it will be picked up.
%
% Caveats:
%  * This is a HACK and bound to be unsupported.
%  * You need to call this before attempting to reference any class in it,
%    or Matlab may "remember" that the symbols could not be resolved. Use
%    the optional classname input arg to let Matlab know this class exists.
%  * There is no way to remove the new path entry once it is added.
 
% Andrew Janke 20/3/2014 http://stackoverflow.com/questions/19625073/how-to-run-clojure-from-matlab/22524112#22524112
 
parms = javaArray('java.lang.Class', 1);
parms(1) = java.lang.Class.forName('java.net.URL');
loaderClass = java.lang.Class.forName('java.net.URLClassLoader');
addUrlMeth = loaderClass.getDeclaredMethod('addURL', parms);
addUrlMeth.setAccessible(1);
 
sysClassLoader = java.lang.ClassLoader.getSystemClassLoader();
 
argArray = javaArray('java.lang.Object', 1);
jFile = java.io.File(file);
argArray(1) = jFile.toURI().toURL();
addUrlMeth.invoke(sysClassLoader, argArray);
 
if nargin > 1
    % load the class into Matlab's memory (a no-args public constructor is expected for classname)
    eval(classname);
end

The usage is super simple: instead of using javaaddpath to load a class-files folder (or ZIP/JAR file) into Matlab’s dynamic Java classpath, we’d use javaaddpathstatic to load the same input into the static classpath – in runtime.

As Andrew notes, there are important caveats to using the new javaaddpathstatic function. One of the important caveats is that the class needs to be loaded before Matlab gets around to “discover” that it does not exist, because from then on even if we load the class into the static classpath, Matlab will report the class as missing. Therefore, we need to call javaaddpathstatic at the very top of the program (possibly even in our startup.m file). Moreover, if we directly call the class anywhere in the same m-file as our call to javaaddpathstatic, it will fail. The reason is that Matlab’s built-in interpreter processes the direct Java call when it pre-parses the file, before the javaaddpathstatic had a chance to run:

function this_Will_Fail()
   % mylib.jar contains the com.app.MyClass class
   javaaddpathstatic([pwd '/mylib.jar']);
   obj = com.app.MyClass;  % this will fail!

Instead, we need to prevent the interpreter from processing the class before run-time, by loading the class at run-time:

function this_Should_Work()
   % mylib.jar contains the com.app.MyClass class
   javaaddpathstatic([pwd '/mylib.jar']);
   obj = javaObject('com.app.MyClass');  % this should work ok

At least in one case, for a recent client project that required deploying a compiled Matlab application that used JDBC to connect to a database (and therefore relied on the static classpath), this saved the day for me.

I’m sad I wasn’t aware of this years ago, but now that I have posted it, we have less of an excuse to complain about static classpath files in Matlab. Hopefully MathWorks will incorporate this idea into Matlab so that we can more easily load classes into the static classpath in runtime, and remove the need for ugly deployment hacks (and yes, this includes MathWorks’ own toolboxes), hopefully removing the caveats listed above along the way.

Analyzing loaded Java classes in run-time

A related Matlab function, whereisjavaclassloadingfrom, was posted by Andrew Janke to report information about loaded classes, including whether they were loaded by the static (=standard Java) classloader, or the dynamic (=MathWorks’ custom) classloader, and from which location. For example:

% This is loaded into the static classpath as part of Matlab's standard classpath.txt file:
>> whereisjavaclassloadingfrom('java.util.HashMap')
Version: 8.6.0.232648 (R2015b)
Class:       java.util.HashMap
ClassLoader: sun.misc.Launcher$AppClassLoader@69365360
URL:         jar:file:/C:/Program%20Files/Matlab/R2015b/sys/java/jre/win64/jre/lib/rt.jar!/java/util/HashMap.class
 
% This is loaded into the dynamic classpath using MathWorks' custom classloader
>> javaaddpath('C:\Yair\Work\MultiClassTableModel')
>> whereisjavaclassloadingfrom('MultiClassTableModel')
Version: 8.6.0.232648 (R2015b)
Class:       MultiClassTableModel
ClassLoader: com.mathworks.jmi.CustomURLClassLoader@5270d6ad
URL:         file:/C:/Yair/Work/MultiClassTableModel/MultiClassTableModel.class
 
% This is an example of a class that is not loaded
>> whereisjavaclassloadingfrom('no.such.Class')
Error using javaArray
No class no.such.Class can be located on the Java class path
Error in whereisjavaclassloadingfrom (line 25)

Parting words

How risky is all this in terms of future compatibility? well, I’m not a prophet, but I’d say that using these hacks is pretty risky in this regard. Matlab’s custom classloader is way-deep in undocumented territory. This does not mean that the functionality will change tomorrow, but don’t be surprised if and when it does.

Have you ever used any additional undocumented hack relating to using the Java classpaths in Matlab? If so then please post a comment below.

Users who have endured my post this far, should probably also read my related article about Java class access pitfalls.

]]>
https://undocumentedmatlab.com/blog_old/static-java-classpath-hacks/feed 3