- Undocumented Matlab - https://undocumentedmatlab.com -

Reading non-Latin text files

Posted By Yair Altman On September 28, 2011 | 2 Comments

In the spirit of the Jewish New Year that begins tonight, I would like to share a workaround that I received from blog reader Ro’ee Gilron of Tel-Aviv University:
Matlab users who use non-Latin computer Locales are aware of the issues that the Matlab Command Window has had with such languages for many years. I am not sure whether these problems are due to the RTL (right-to-left) nature of Hebrew/Arabic, or their use of a non-supported code-page, or some other reason. To this day (R2011b), I am not aware of any fix or workaround for these issues.
But it seems that in addition, Matlab has a problem reading files that contain text in these languages, even when the computer’s Locale is set correctly, to a Locale that supports the non-Latin text. This is where Ro’ee’s workaround helps. In his words:
To give some more background, this used to work with a 32-bit system and an older version of Matlab (7.1). Now it doesn’t. Saving the file in UTF-8 and using fopen and textscan instead of importdata gives me this:
nowords =
‘שלבק’
‘התלכב’
‘× ×™×›×˜×¨’
‘תלפורש’
‘×œ×§×˜× ‘
‘מזוחש’
‘שלטיק’
‘טיבר’
‘עולג’
‘סלבוחד’
‘משוחגות’
‘מלוגסות’
‘סבק’
‘צמשר’
‘הכריב’
‘תמציל’
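
For comparison, here is a minimal sketch of reading a UTF-8 file with the encoding stated explicitly in the fopen call, rather than relying on Matlab's default encoding. This is not Ro’ee's code: the temporary file, the sample words and the variable names are assumptions made for illustration, and whether it avoids the garbling shown above on a given Matlab release and Locale has not been verified here.

```matlab
% Hypothetical sample data: two Hebrew words built from code points
% (shalom, tova), so the snippet does not depend on how this source
% file itself is encoded.
sample = {char([1513 1500 1493 1501]), char([1496 1493 1489 1492])};
fname  = [tempname '.txt'];                % hypothetical temporary file

% Write the sample as UTF-8. The 4th fopen argument names the encoding;
% the 3rd ('n', native byte order) has to be supplied in order to reach it.
fid = fopen(fname, 'w', 'n', 'UTF-8');
fprintf(fid, '%s\n', sample{:});           % one word per line
fclose(fid);

% Read the file back, again naming the encoding explicitly.
fid = fopen(fname, 'r', 'n', 'UTF-8');
c = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
delete(fname);

nowords = c{1};                            % cell array, one string per line
```

To read an existing file, only the second half is needed, with fname pointing at that file. The Command Window may still display the strings incorrectly even when they were read correctly.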

The solution is as follows (requires Simulink):
1) Change system Locale to Hebrew: http://windows.microsoft.com/en-US/windows7/Change-the-system-locale [1]
(this doesn’t change the language of the OS etc.).
2) Change the encoding that Matlab uses:
http://www.mathworks.com/help/toolbox/simulink/slref/slcharacterencoding.html [2]
They tell you not to, but I did… – you must change it to an encoding that works for Hebrew: http://www.iana.org/assignments/character-sets [3]
Any other language should work as well (I hope…). For Hebrew the code that works for me is ISO_8859-8
3) You should now be able to read TXT files that have Hebrew characters in them.

>> a='הצלחה!'
a =
!
>> currentCharacterEncoding = slCharacterEncoding();
>> currentCharacterEncoding = get_param(0, 'CharacterEncoding')  % equivalent alternative
currentCharacterEncoding =
windows-1252
% Now modify the default encoding to something more useful
>> slCharacterEncoding('ISO_8859-8')
>> set_param(0, 'CharacterEncoding', 'ISO_8859-8');   % equivalent alternative
>> currentCharacterEncoding = slCharacterEncoding()
currentCharacterEncoding =
ISO-8859-8
>> a='הצלחה!'
a =
!                  % still no good in the Command Window...
% Let's try to read a file with some Hebrew words:
>> neutral = importdata('neutral.txt')
neutral =
    'שולחן'
    'כסא'
    'מנורה'
    'צלחת'
    'סיר'
    'מזלג'
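
Since slCharacterEncoding changes a session-wide setting, it may be worth restoring the previous value once the file has been read. The following is a sketch under that assumption, not part of Ro’ee's workaround; it requires Simulink, the variable names are illustrative, and the Simulink documentation should be checked for any restrictions on changing the encoding while models are open.

```matlab
% Requires Simulink. 'neutral.txt' is the file name used in the session above.
oldEncoding = slCharacterEncoding();       % remember the current setting

% onCleanup restores the old encoding even if importdata throws an error.
restoreEncoding = onCleanup(@() slCharacterEncoding(oldEncoding));

slCharacterEncoding('ISO_8859-8');         % encoding name used in the post
if exist('neutral.txt', 'file')
    neutral = importdata('neutral.txt');   % read while the Hebrew encoding is active
end

clear restoreEncoding                      % runs the cleanup: encoding restored
```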

So, it appears that while we did not solve the problems with the Command Window, at least we can now read the prayer book for our New Year prayers…
Let this be a year of fulfillment, prosperity, health and happiness to all. Shana Tova everybody!

Categories: Low risk of breaking in future versions, Stock Matlab function



2 Comments To "Reading non-Latin text files"

#1 Comment By Nir On May 21, 2017 @ 14:59

Do you know how I can change the character encoding from within compiled code?

I.e., set_param(0, 'CharacterEncoding', 'ISO_8859-8') could not be added to the Matlab compiled exe file.

Thanks

#2 Comment By Yair Altman On May 21, 2017 @ 15:36

Try to place this command in a startup.m file in your code folder, and then recompile your application. I’m not sure it will help, but it’s worth a try.


Article printed from Undocumented Matlab: https://undocumentedmatlab.com

URL to article: https://undocumentedmatlab.com/articles/reading-non-latin-text-files

URLs in this post:

[1] http://windows.microsoft.com/en-US/windows7/Change-the-system-locale

[2] http://www.mathworks.com/help/toolbox/simulink/slref/slcharacterencoding.html

[3] http://www.iana.org/assignments/character-sets

[4] FIG files format : https://undocumentedmatlab.com/articles/fig-files-format

[5] Command Window text manipulation : https://undocumentedmatlab.com/articles/command-window-text-manipulation

[6] Another Command Window text color hack : https://undocumentedmatlab.com/articles/another-command-window-text-color-hack

[7] Bold color text in the Command Window : https://undocumentedmatlab.com/articles/bold-color-text-in-the-command-window

[8] Setting status-bar text : https://undocumentedmatlab.com/articles/setting-status-bar-text

[9] cprintf – display formatted color text in the Command Window : https://undocumentedmatlab.com/articles/cprintf-display-formatted-color-text-in-command-window

Copyright © Yair Altman - Undocumented Matlab. All rights reserved.