commons-user mailing list archives

From Oliver Heger <oliver.he...@t-online.de>
Subject Re: [Configuration] Problems reading Chinese text from an XMLConfiguration
Date Sat, 03 Sep 2005 09:11:03 GMT
Matthias Bräuer wrote:

> Hello,
>
> I'm having problems reading Chinese data from an XMLConfiguration. The 
> configuration file is encoded in UTF-8. For instance, in my test file 
> the attribute 'name' of the element 'source' is called "我的文件" 
> (Chinese for "My files"). When I request this value from the 
> configuration I get back "我的文件" which obviously is the result 
> of some wrong character decoding.
>
> The 'name' attribute is used as a key for a HashMap. Consequently, 
> searching with the key '我的文件' (the original Chinese characters) 
> does not return the entry because apparently the hash code of this 
> Unicode string differs from what the XMLConfiguration returned. Also, 
> printing the name on a JLabel with a Chinese-capable font like 
> "SimSun" gives the wrong result listed above. However, when I write 
> the configuration back to a file, the correct Unicode characters are 
> written.
>
> I used the following code fragment to investigate the problem:
>
>        XMLConfiguration config = null;
>        try {
>            config = new XMLConfiguration("tests/conf/sources_chinese.xml");
>        }
>        catch (ConfigurationException e) {
>            e.printStackTrace();
>        }
>        String name = config.getString("source(0)[@name]");
>        String name2 = "我的文件";
>
> When I use a debugger to check the memory content I see the 
> correct Chinese characters in the debug view for the (manually 
> constructed) 'name2'. However, the variable 'name' (which is read from 
> the XMLConfiguration) shows the garbled "我的文件" string.
>
> It appears to me that XMLConfiguration reads and writes in a different 
> format than UTF-8 which, however, would be a bit strange. I had no 
> time today to check this in the sources. Maybe someone on the list has 
> experienced similar problems. Please do not point me to the Java 
> internationalization pages, I've browsed through these a long time. :-)
>
> Thank you very much in advance,
> Kind regards from Taiwan, Matthias
>
I am no expert on the encoding of Chinese characters, so I am not sure 
whether this really helps: XMLConfiguration lets you specify the exact 
encoding to use by calling the setEncoding() method. This method must be 
called before load(). Did you try this?
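
To illustrate why the encoding has to be fixed before the file is read, here 
is a minimal JDK-only sketch. It does not use Commons Configuration at all; 
the class name EncodingDemo and the helper decode() are made up for this 
example, and ISO-8859-1 merely stands in for whatever wrong default charset 
might be in effect. It decodes the same UTF-8 bytes once correctly and once 
wrongly, then repeats the HashMap lookup described in the quoted message.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class EncodingDemo {

    /** Hypothetical helper: turns raw file bytes into a String using the given charset. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "My files" in Chinese, written with Unicode escapes so the example
        // does not depend on the encoding of this source file.
        String original = "\u6211\u7684\u6587\u4ef6";

        // What a UTF-8 encoded configuration file would contain on disk.
        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(fileBytes, StandardCharsets.UTF_8);
        String wrong = decode(fileBytes, StandardCharsets.ISO_8859_1); // assumed wrong default

        // The wrongly decoded value is used as the map key, as in the report.
        Map<String, String> sources = new HashMap<>();
        sources.put(wrong, "entry");

        System.out.println(right.equals(original));        // true
        System.out.println(wrong.length());                // 12 (one char per byte)
        System.out.println(sources.containsKey(original)); // false
    }
}
```

With XMLConfiguration the analogous step would be calling setEncoding("UTF-8") 
before load(). As far as I recall, the constructor that takes a file name 
loads the file immediately, so this probably means creating the object with 
the no-argument constructor and setting the file name separately; please 
check this against the version you are using.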

Note also that Configuration 1.1 final had a bug where the encoding was 
not always taken into account 
(http://issues.apache.org/bugzilla/show_bug.cgi?id=34204). So you might 
want to check out the newest version from SVN.

HTH
Oliver
