- Undocumented Matlab - https://undocumentedmatlab.com/blog_old -

Reading non-Latin text files

Posted By Yair Altman On September 28, 2011 | 2 Comments

In the spirit of the Jewish New Year that begins tonight, I would like to share a workaround that I received from blog reader Ro’ee Gilron of Tel-Aviv University:

Matlab users with non-Latin computer Locales are well aware of the problems that the Matlab Command Window has had with such languages for many years. I am not sure whether these problems are due to the RTL (right-to-left) nature of Hebrew and Arabic, their use of an unsupported code page, or some other reason. To this day (R2011b), I am not aware of any fix or workaround for these issues.

In addition, it seems that Matlab also has problems reading files that contain text in these languages, even when the computer's Locale is correctly set to one that supports the non-Latin text. This is where Ro'ee's workaround helps. In his words:

To give some more background, this used to work on a 32-bit system with an older version of Matlab (7.1). Now it doesn't. Saving the file as UTF-8 and reading it with fopen and textscan instead of importdata gives me this:

nowords =
    'שלבק'
    'התלכב'
    'ניכטר'
    'תלפורש'
    'לקטנ'
    'מזוחש'
    'שלטיק'
    'טיבר'
    'עולג'
    'סלבוחד'
    'משוחגות'
    'מלוגסות'
    'סבק'
    'צמשר'
    'הכריב'
    'תמציל'
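When textscan returns garbled text for a UTF-8 file, the usual cause is that the file's bytes were decoded with a Western code page such as windows-1252 (the default encoding reported further below). As an alternative that does not need Simulink, and which is not part of Ro'ee's workaround, fopen accepts the file's encoding as its fourth argument, so textscan receives correctly decoded characters. This is a minimal sketch under that assumption: the helper name readUtf8Words and the file name nowords.txt are hypothetical, it has not been verified on R2011b, and it only addresses reading the file, not the Command Window display problems described above.

```matlab
function words = readUtf8Words(fileName)
% readUtf8Words  Read a UTF-8 text file with one word per line (hypothetical helper).
%   words = readUtf8Words('nowords.txt') returns a cell array of strings.
%   The 4th fopen argument tells MATLAB to decode the file's bytes as UTF-8
%   instead of using the default character encoding.
fid = fopen(fileName, 'r', 'n', 'UTF-8');
if fid < 0
    error('readUtf8Words:cannotOpen', 'Cannot open file: %s', fileName);
end
closer = onCleanup(@() fclose(fid));   % close the file even if textscan errors

% Treat each line as a single field so every word becomes one cell entry
C = textscan(fid, '%s', 'Delimiter', '\n');
words = C{1};

% Editors such as Notepad prepend a byte-order mark (U+FEFF) to UTF-8 files;
% strip it from the first word if it was passed through
if ~isempty(words) && ~isempty(words{1}) && words{1}(1) == char(65279)
    words{1} = words{1}(2:end);
end
end
```

The onCleanup object closes the file when the function exits, whether normally or through an error, so no file handle is leaked.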

The solution is as follows (requires Simulink):

1) Change the system Locale to Hebrew (this does not change the language of the OS itself): http://windows.microsoft.com/en-US/windows7/Change-the-system-locale [3]

2) Change the encoding that Matlab uses:
http://www.mathworks.com/help/toolbox/simulink/slref/slcharacterencoding.html [4]

The documentation advises against it, but I did it anyway. You must change it to an encoding that works for Hebrew; the IANA character-set registry lists the registered names: http://www.iana.org/assignments/character-sets [5]

Any other language should work as well (I hope…). For Hebrew, the encoding that works for me is ISO_8859-8.

3) You should now be able to read TXT files that have Hebrew characters in them.

>> a='הצלחה!'
a =
!
 
>> currentCharacterEncoding = slCharacterEncoding();
>> currentCharacterEncoding = get_param(0, 'CharacterEncoding')  % equivalent alternative
currentCharacterEncoding =
windows-1252
 
% Now modify the default encoding to something more useful
>> slCharacterEncoding('ISO_8859-8')
>> set_param(0, 'CharacterEncoding', 'ISO_8859-8');   % equivalent alternative
 
>> currentCharacterEncoding = slCharacterEncoding()
currentCharacterEncoding =
ISO-8859-8
 
>> a='הצלחה!'
a =
!                  % still no good in the Command Window...
 
% Let's try to read a file with some Hebrew words:
>> neutral = importdata('neutral.txt')
neutral = 
    'שולחן'
    'כסא'
    'מנורה'
    'צלחת'
    'סיר'
    'מזלג'
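Because the encoding change above applies to the whole session, one option is to switch it only for the duration of the read. The sketch below is a hypothetical helper (importWithEncoding is not a MathWorks function) that combines the get_param/set_param calls shown above with onCleanup, so the original encoding is restored even if importdata fails. It still requires Simulink, the documentation's caveats about changing the encoding still apply, and it is untested. Strings that were already read are held in memory as Unicode characters, so restoring the encoding afterwards should not alter them.

```matlab
function data = importWithEncoding(fileName, encoding)
% importWithEncoding  Call importdata while Simulink's CharacterEncoding is
% temporarily set to ENCODING (hypothetical helper; requires Simulink).
%   neutral = importWithEncoding('neutral.txt', 'ISO_8859-8');
if nargin < 2
    encoding = 'ISO_8859-8';   % the value that worked for Hebrew in this post
end

% Remember the current setting (e.g. 'windows-1252') so it can be restored
previous = get_param(0, 'CharacterEncoding');

% onCleanup runs when this function returns or errors, restoring the setting
restorer = onCleanup(@() set_param(0, 'CharacterEncoding', previous));

set_param(0, 'CharacterEncoding', encoding);
data = importdata(fileName);
end
```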

So, it appears that while we did not solve the problems with the Command Window, at least we can now read the prayer book for our New Year prayers…

Let this be a year of fulfillment, prosperity, health and happiness to all. Shana Tova everybody!

Categories: Low risk of breaking in future versions, Stock Matlab function



2 Comments To "Reading non-Latin text files"

#1 Comment By Nir On May 21, 2017 @ 2:59 pm

Do you know how I can change the character encoding from within compiled code?

i.e., set_param(0, 'CharacterEncoding', 'ISO_8859-8') could not be added to the compiled Matlab exe file.

Thanks

#2 Comment By Yair Altman On May 21, 2017 @ 3:36 pm

Try to place this command in a startup.m file in your code folder, and then recompile your application. I’m not sure it will help, but it’s worth a try.


Article printed from Undocumented Matlab: https://undocumentedmatlab.com/blog_old

URL to article: https://undocumentedmatlab.com/blog_old/reading-non-latin-text-files

URLs in this post:

[1] Image: https://undocumentedmatlab.com/feed/

[2] email feed: https://undocumentedmatlab.com/subscribe_email.html

[3] http://windows.microsoft.com/en-US/windows7/Change-the-system-locale

[4] http://www.mathworks.com/help/toolbox/simulink/slref/slcharacterencoding.html

[5] http://www.iana.org/assignments/character-sets

[6] FIG files format: https://undocumentedmatlab.com/blog_old/fig-files-format

[7] Matlab installation woes: https://undocumentedmatlab.com/blog_old/matlab-installation-woes

[8] Bug and workaround in timeseries plot: https://undocumentedmatlab.com/blog_old/bug-and-workaround-in-timeseries-plot

[9] Performance: accessing handle properties: https://undocumentedmatlab.com/blog_old/performance-accessing-handle-properties

[10] Some Matlab performance-tuning tips: https://undocumentedmatlab.com/blog_old/some-performance-tuning-tips

[11] Convolution performance: https://undocumentedmatlab.com/blog_old/convolution-performance

Copyright © Yair Altman - Undocumented Matlab. All rights reserved.