commons-user mailing list archives

From Matthias Bräuer <>
Subject [Configuration] Problems reading Chinese text from an XMLConfiguration
Date Fri, 02 Sep 2005 14:19:18 GMT

I'm having problems reading Chinese data from an XMLConfiguration. The 
configuration file is encoded in UTF-8. For instance, in my test file 
the attribute 'name' of the element 'source' has the value "我的文件" 
(Chinese for "My files"). When I request this value from the 
configuration, I get back a garbled string instead, which is obviously 
the result of some wrong character decoding.
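This kind of garbling can be reproduced outside XMLConfiguration by encoding the Chinese string as UTF-8 and decoding the bytes with a different charset. The sketch below assumes ISO-8859-1 as the "wrong" charset purely for illustration. Which charset is actually applied here is not known, and the platform default differs from system to system. The class `MojibakeDemo` is hypothetical. The literal uses `\u` escapes so the example does not depend on the encoding javac assumes for source files.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes the string as UTF-8 (as in the configuration file), then
    // decodes those bytes with ISO-8859-1, an assumed "wrong" charset.
    public static String decodeWrongly(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u6211\u7684\u6587\u4ef6"; // the four Chinese characters
        String garbled = decodeWrongly(original);
        // Each of these characters takes 3 bytes in UTF-8, and ISO-8859-1
        // turns every byte into one char: 4 chars become 12.
        System.out.println(original.length() + " -> " + garbled.length()); // 4 -> 12
        System.out.println(garbled.equals(original));                      // false
    }
}
```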

The 'name' attribute is used as a key for a HashMap. Consequently, 
searching with the key '我的文件' (the original Chinese characters) does 
not find the entry, apparently because this Unicode string is not equal 
to (and has a different hash code from) the string the XMLConfiguration 
returned. Also, displaying the name in a JLabel with a Chinese-capable 
font like "SimSun" shows the same garbled text. However, when I write 
the configuration back to a file, the correct Unicode characters are 
written.
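The failed lookup and the correct write-back are consistent with each other if the same wrong charset is used in both directions. The following sketch (hypothetical class `RoundTripDemo`, again assuming ISO-8859-1 as the wrong charset) shows both effects. The wrongly decoded key is not equal to the original string, so `HashMap.get` returns null. Re-encoding it with the same charset reproduces the original UTF-8 bytes exactly, because ISO-8859-1 maps each of the 256 byte values to one char and back. With other charsets the round trip is not guaranteed to be lossless.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class RoundTripDemo {
    // Assumed wrong charset; ISO-8859-1 maps all 256 byte values to chars and back.
    static final Charset WRONG = StandardCharsets.ISO_8859_1;

    // Simulates reading the UTF-8 file with the wrong charset.
    public static String readWrongly(byte[] fileBytes) {
        return new String(fileBytes, WRONG);
    }

    // Simulates writing the value back with the same wrong charset.
    public static byte[] writeWrongly(String value) {
        return value.getBytes(WRONG);
    }

    public static void main(String[] args) {
        String original = "\u6211\u7684\u6587\u4ef6";
        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_8);

        String name = readWrongly(fileBytes);
        Map<String, String> sources = new HashMap<>();
        sources.put(name, "entry");

        // name and original are different strings, so the lookup misses.
        System.out.println(sources.get(original)); // null
        // The wrong decode and the wrong encode cancel out: the bytes are
        // identical to the original file content.
        System.out.println(Arrays.equals(writeWrongly(name), fileBytes)); // true
    }
}
```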

I used the following code fragment to investigate the problem:

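        import org.apache.commons.configuration.ConfigurationException;
        import org.apache.commons.configuration.XMLConfiguration;

        XMLConfiguration config = null;
        try {
            config = new XMLConfiguration("tests/conf/sources_chinese.xml");
        } catch (ConfigurationException e) {
            e.printStackTrace(); // config stays null if loading failed
        }
        String name = config.getString("source(0)[@name]");
        String name2 = "我的文件";
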
When I use a debugger to check the memory content, I see the correct 
Chinese characters in the debug view for the (manually constructed) 
'name2'. However, the variable 'name' (which is read from the 
XMLConfiguration) shows a garbled string.
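Instead of relying on the debugger view or on font rendering, the two strings can be compared char by char in plain ASCII. The helper below (hypothetical `CodePointDump.escape`) prints each UTF-16 char as a `\uXXXX` escape. `name2` should come out as `\u6211\u7684\u6587\u4ef6`, while a wrongly decoded `name` would show a different sequence of values.

```java
public class CodePointDump {
    // Renders every UTF-16 char of s as a Java-style escape (backslash, 'u',
    // four lowercase hex digits), so strings can be compared without
    // depending on fonts, consoles, or debugger views.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name2 = "\u6211\u7684\u6587\u4ef6";
        System.out.println(escape(name2));
    }
}
```

Calling `System.out.println(CodePointDump.escape(name));` on the value read from the configuration then shows exactly which chars it contains.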

It appears to me that XMLConfiguration reads and writes using an 
encoding other than UTF-8, which, however, would be a bit strange. I had 
no time today to check this in the sources. Maybe someone on the list 
has experienced similar problems. Please do not point me to the Java 
internationalization pages; I've been browsing through them for a long 
time. :-)
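One way the suspected behaviour could arise is if the file is turned into a character stream with some non-UTF-8 charset before it reaches the XML parser. This is an assumption, not something checked against the XMLConfiguration sources. According to the SAX `InputSource` documentation, when a character stream is supplied, the parser reads it directly and disregards the encoding declaration. Only when it is given raw bytes does `encoding="UTF-8"` take effect. The sketch below (hypothetical class `XmlEncodingDemo`, plain JAXP/DOM with no Commons Configuration involved, and a made-up miniature of the test file) parses the same UTF-8 bytes both ways.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlEncodingDemo {
    // A tiny stand-in for the test file; the real file layout is assumed.
    public static final String XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<sources><source name=\"\u6211\u7684\u6587\u4ef6\"/></sources>";

    // Raw bytes: the parser detects the encoding from the XML declaration.
    public static String nameFromStream(byte[] xmlBytes) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return firstSourceName(db.parse(new ByteArrayInputStream(xmlBytes)));
    }

    // Character stream: the bytes were already decoded with `charset`,
    // and the parser disregards the encoding declaration.
    public static String nameFromReader(byte[] xmlBytes, Charset charset) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), charset);
        return firstSourceName(db.parse(new InputSource(reader)));
    }

    private static String firstSourceName(Document doc) {
        Element source = (Element) doc.getElementsByTagName("source").item(0);
        return source.getAttribute("name");
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = XML.getBytes(StandardCharsets.UTF_8);
        String expected = "\u6211\u7684\u6587\u4ef6";
        System.out.println(nameFromStream(bytes).equals(expected));
        System.out.println(nameFromReader(bytes, StandardCharsets.UTF_8).equals(expected));
        System.out.println(nameFromReader(bytes, StandardCharsets.ISO_8859_1).equals(expected));
    }
}
```

If something like this is the cause, loading the configuration from an `InputStream`, or from a `Reader` created with an explicit UTF-8 charset, would be worth trying. Whether the XMLConfiguration version in use offers such load methods would need to be checked in its API documentation.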

Thank you very much in advance,
Kind regards from Taiwan, Matthias
