commons-user mailing list archives

From Oliver Heger <oliver.he...@oliver-heger.de>
Subject Re: UTF-8 sequence problem.
Date Mon, 27 Nov 2006 21:43:51 GMT
Thomas Thomas wrote:
> Hi,
> 
> In my configuration file, I have the following :
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> 
> <!DOCTYPE configuration [
>  <!ENTITY amp "&#x26;">
>  <!ENTITY lt "&#x3C;">
>  <!ENTITY minus "&#45;">
> ]>
> 
> When I do operations in this file (read & write) ,
> something really weird happens :
> 
> The above code changes to :
> 
> <?xml version="1.0" encoding="UTF-8"?>
> 
> Then when I reload the program it says :
> 
> [27/11/06 16:07:46:951 CET] 6083bafc SystemErr     R
> org.apache.commons.configuration.ConfigurationException: Octet 2 incorrect
> dans la séquence UTF-8 à 2-octets.
<snip/>

Hi,

if I understand you correctly, there are two points:
1. the encoding is changed and
2. the DTD with the entity definitions is dropped.

ad 1: There seems to be no portable way to extract the encoding from 
an XML document using standard Java APIs (at least I have found none). 
As a workaround, you have to set the encoding manually before you save 
the configuration, using the setEncoding() method.
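
To illustrate why the encoding has to be set explicitly, here is a minimal sketch that uses only the JDK's JAXP classes, not Commons Configuration itself. The class name EncodingSketch and the helpers load() and save() are made up for this example, and it is an assumption that the library's save path behaves like a plain JAXP Transformer. When no encoding is requested, the transformer's default is used; setting OutputKeys.ENCODING keeps the declaration and the written bytes consistent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class for illustration; not part of Commons Configuration.
public class EncodingSketch {

    /** Parses XML bytes; the parser honours the encoding declared in the prolog. */
    public static Document load(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }

    /** Serializes the document; a null encoding leaves the transformer default in place. */
    public static byte[] save(Document doc, String encoding) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        if (encoding != null) {
            // Controls both the declared encoding and the bytes that are written.
            transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] original = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<configuration><name>s\u00e9quence</name></configuration>")
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = load(original);

        // Explicit encoding: the prolog and the bytes agree, so the result reloads cleanly.
        byte[] latin1 = save(doc, "ISO-8859-1");
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
        System.out.println(load(latin1).getDocumentElement().getTextContent());
    }
}
```

With Commons Configuration, the equivalent step would be to call setEncoding("ISO-8859-1") on the configuration object before saving it.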

ad 2: Here, again, the limitation lies in the underlying Java API, which 
does not support writing DTDs. I have found a reference [1] on this 
topic. There does not appear to be an easy solution to this problem.
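
The effect can be reproduced outside Commons Configuration with a small round trip through the JDK's parser and Transformer. This is only a sketch: the class name DoctypeRoundTrip, its helpers, and the sample element <range> are invented for the example. The parser resolves the entity reference while reading, so the DOM holds the plain text. The Transformer API offers OutputKeys.DOCTYPE_SYSTEM and OutputKeys.DOCTYPE_PUBLIC, but no output property for an internal subset with entity declarations, so those declarations are typically not reproduced on output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demonstration class; not part of Commons Configuration.
public class DoctypeRoundTrip {

    // Sample document with an internal DTD subset declaring one entity.
    public static final String SOURCE =
            "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE configuration [\n"
          + "  <!ENTITY minus \"&#45;\">\n"
          + "]>\n"
          + "<configuration><range>1&minus;5</range></configuration>";

    /** Parses the XML text into a DOM tree; entity references are expanded by default. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Writes the DOM tree back to a string using the default identity transformer. */
    public static String write(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SOURCE);
        // The entity has already been replaced by its value in the DOM.
        System.out.println(doc.getDocumentElement().getTextContent());
        // Compare this output with SOURCE to see what happened to the DOCTYPE block.
        System.out.println(write(doc));
    }
}
```

Because the entity values are already expanded in the DOM, the saved file keeps the resolved text even when the declarations are gone.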

Oliver

[1] http://forum.java.sun.com/thread.jspa?threadID=784467&messageID=4459240
