Reading non-Latin text files

In the spirit of the Jewish New Year that begins tonight, I would like to share a workaround that I received from blog reader Ro’ee Gilron of Tel-Aviv University:

Matlab users with non-Latin computer Locales are well aware of the problems that the Matlab Command Window has had with such languages for many years. I am not sure whether these problems are due to the RTL (right-to-left) nature of Hebrew and Arabic, to their use of an unsupported code page, or to some other reason. To this day (R2011b), I am not aware of any fix or workaround for them.

In addition, it seems that Matlab has problems reading files that contain text in these languages, even when the computer’s Locale is correctly set to one that supports the non-Latin text. This is where Ro’ee’s workaround helps. In his words:

To give some more background, this used to work on a 32-bit system with an older version of Matlab (7.1). Now it doesn’t. Saving the file as UTF-8 and reading it with fopen and textscan instead of importdata gives me this:

nowords =
‘× ×™×›×˜×¨’
‘×œ×§×˜× ‘
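The ‘×’ characters in this output are the typical signature of UTF-8 text decoded with a single-byte Western code page: in UTF-8, every Hebrew letter is stored as two bytes whose first byte is 0xD7, and Windows-1252 maps 0xD7 to ‘×’. The following sketch is my own illustration, not part of Ro’ee’s message. It assumes that the default decoding was Windows-1252 and reproduces the effect using the stock unicode2native and native2unicode functions:

```matlab
% Two Hebrew letters, nun and yod (U+05E0 and U+05D9), built from their code
% points so that the snippet does not depend on the .m file's own encoding
word = char([1504 1497]);

% The bytes that a UTF-8 text file stores for these letters: 215 160 215 153
bytes = unicode2native(word, 'UTF-8');

% Decoding these bytes with the wrong code page (assumed here: Windows-1252)
% yields '×', no-break space, '×', '™', the same kind of gibberish as above
garbled = native2unicode(bytes, 'windows-1252');

% Decoding the same bytes as UTF-8 restores the original letters
recovered = native2unicode(bytes, 'UTF-8');

disp(double(garbled))            % code points of the garbled characters
disp(isequal(recovered, word))   % displays 1 (true)
```

Since decoding the very same bytes as UTF-8 recovers the letters, the underlying problem appears to be the encoding that Matlab assumes when it reads the file, rather than the file itself.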

The solution is as follows (requires Simulink):

1) Change the system Locale to Hebrew (this doesn’t change the language of the OS etc.).

2) Change the character encoding that Matlab uses:

They tell you not to, but I did… You must change it to an encoding that works for Hebrew.

Any other language should work as well (I hope…). For Hebrew, the encoding that works for me is ISO_8859-8.

3) You should now be able to read TXT files that have Hebrew characters in them.

>> a='הצלחה!'
a =
>> currentCharacterEncoding = slCharacterEncoding();
>> currentCharacterEncoding = get_param(0, 'CharacterEncoding')  % equivalent alternative
currentCharacterEncoding =
% Now modify the default encoding to something more useful
>> slCharacterEncoding('ISO_8859-8')
>> set_param(0, 'CharacterEncoding', 'ISO_8859-8');   % equivalent alternative
>> currentCharacterEncoding = slCharacterEncoding()
currentCharacterEncoding =
>> a='הצלחה!'
a =
!                  % still no good in the Command Window...
% Let's try to read a file with some Hebrew words:
>> neutral = importdata('neutral.txt')
neutral = 
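For readers without Simulink, one possible alternative (not Ro’ee’s method, and not something I have tested on R2011b) is to bypass Matlab’s default character decoding altogether: read the file’s raw bytes with fread and decode them explicitly with native2unicode. The sketch below assumes the file was saved as UTF-8; the temporary file and the two words written to it are hypothetical test data:

```matlab
% Hypothetical test data: the words "shana" and "tova" in Hebrew, built from
% their code points so the snippet does not depend on the .m file's encoding
words = {char([1513 1504 1492]); char([1496 1493 1489 1492])};
fname = [tempname '.txt'];

% Write the words to a temporary file as UTF-8 bytes, one word per line
fid = fopen(fname, 'w');
fwrite(fid, unicode2native([words{1} char(10) words{2} char(10)], 'UTF-8'), 'uint8');
fclose(fid);

% Read the raw bytes, bypassing Matlab's default character decoding
fid = fopen(fname, 'r');
assert(fid > 0, 'Could not open %s', fname);
bytes = fread(fid, Inf, '*uint8')';
fclose(fid);
delete(fname);

% Decode the bytes explicitly as UTF-8
str = native2unicode(bytes, 'UTF-8');
if ~isempty(str) && str(1) == char(65279)   % drop a byte-order mark, if any
    str = str(2:end);
end

% Split into lines (LF or CRLF) and drop empty lines, as a column cell array
readWords = regexp(str, '\r?\n', 'split');
readWords = readWords(~cellfun('isempty', readWords))';
```

This only addresses the reading: displaying the decoded strings in the Command Window may still fail, for the reasons mentioned above. For a file saved in a Hebrew code page rather than UTF-8, passing 'windows-1255' or 'ISO-8859-8' to native2unicode instead should presumably work, assuming your Matlab release supports these encoding names.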

So, it appears that while we did not solve the problems with the Command Window, at least we can now read the prayer book for our New Year prayers…

Let this be a year of fulfillment, prosperity, health and happiness to all. Shana Tova everybody!

Categories: Low risk of breaking in future versions, Stock Matlab function
