commons-user mailing list archives

From Jason Lea <ja...@kumachan.net.nz>
Subject Re: [Configuration] Problems reading Chinese text from an XMLConfiguration
Date Sat, 03 Sep 2005 21:01:32 GMT
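XML lets you declare the encoding of the document in the XML declaration 
on its first line. Without a declaration (and without a byte order mark) 
a parser assumes UTF-8.

Files normally have something like this at the top:

<?xml version="1.0" encoding="ISO-8859-1"?>

If your file is saved as UTF-8 but declares ISO-8859-1, the parser decodes 
every byte as a separate character, which produces exactly the kind of 
garbled text you describe. Change the declaration to:

<?xml version="1.0" encoding="UTF-8"?>

If the declaration is missing, or is not the very first thing in the 
document, add it as the first line.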
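
To see the effect in isolation, here is a minimal sketch. It uses the JDK's 
own DOM parser rather than XMLConfiguration, on the assumption that 
XMLConfiguration hands the file to a standard JAXP parser; the class name 
EncodingDemo and the helper readName are made up for this illustration. 
The same UTF-8 bytes are parsed twice, once with each declaration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EncodingDemo {

    // Hypothetical helper: encodes the given XML text as UTF-8 bytes (as if
    // it had been saved to a UTF-8 file), parses it, and returns the 'name'
    // attribute of the first <source> element.
    static String readName(String xml) {
        try {
            byte[] utf8Bytes = xml.getBytes(StandardCharsets.UTF_8);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(utf8Bytes));
            Element source = (Element) doc.getElementsByTagName("source").item(0);
            return source.getAttribute("name");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse test document", e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String chinese = "\u6211\u7684\u6587\u4ef6";
        String body = "<sources><source name=\"" + chinese + "\"/></sources>";

        // Declaration does not match the actual bytes: each byte becomes one char.
        String wrong = readName("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body);
        // Declaration matches the actual bytes.
        String right = readName("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body);

        System.out.println("wrong declaration matches original: " + wrong.equals(chinese));
        System.out.println("right declaration matches original: " + right.equals(chinese));
        System.out.println("lengths: " + wrong.length() + " vs " + right.length());
    }
}
```

A String read with the wrong declaration contains different characters 
from the original, so it has a different hash code and will never match 
the real key in a HashMap.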

Matthias Bräuer wrote:

>Hello,
>
>I'm having problems reading Chinese data from an XMLConfiguration. The 
>configuration file is encoded in UTF-8. For instance, in my test file 
>the attribute 'name' of the element 'source' is called "我的文件" 
>(Chinese for "My files"). When I request this value from the 
>configuration I get back "我的文件" which obviously is the result of 
>some wrong character decoding.
>
>The 'name' attribute is used as a key for a HashMap. Consequently, 
>searching with the key '我的文件' (the original Chinese characters) does 
>not return the entry because apparently the hash code of this Unicode 
>string differs from what the XMLConfiguration returned. Also, printing 
>the name on a JLabel with a Chinese-capable font like "SimSun" gives the 
>wrong result listed above. However, when I write the configuration back 
>to a file, the correct Unicode characters are written.
>
>I used the following code fragment to investigate the problem:
>
>        XMLConfiguration config = null;
>       
>        try {
>            config = new XMLConfiguration("tests/conf/sources_chinese.xml");
>        }
>        catch (ConfigurationException e) {
>            e.printStackTrace();
>        }
>       
>        String name = config.getString("source(0)[@name]");
>        String name2 = "我的文件";
>       
>When I use a debugger to check the memory content I see the correct 
>Chinese characters in the debug view for the (manually constructed) 
>'name2'. However, the variable 'name' (which is read from the 
>XMLConfiguration) shows the garbled "我的文件" string.
>
>It appears to me that XMLConfiguration reads and writes in a different 
>format than UTF-8 which, however, would be a bit strange. I had no time 
>today to check this in the sources. Maybe someone on the list has 
>experienced similar problems. Please do not point me to the Java 
>internationalization pages, I've browsed through these a long time. :-)
>
>Thank you very much in advance,
>Kind regards from Taiwan, Matthias
>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: commons-user-help@jakarta.apache.org

-- 
Jason Lea





