commons-user mailing list archives

From Simon Kitching <>
Subject Re: [Configuration] UTF-8 encoding problem
Date Thu, 28 Dec 2006 12:00:51 GMT
On Thu, 2006-12-28 at 11:15 +0000, Andrew Shirley wrote:
> On Thu, Dec 28, 2006 at 11:30:07AM +0100, DECAFFMEYER MATHIEU wrote:
> > 
> > Hi,
> > 
> > I am using Jakarta Configuration to manipulate some XML files.
> > 
> > 
> > What do you suggest I do?
> >
> > Thank you for any help! It will be greatly appreciated!
> This may be because the file isn't actually UTF-8, i.e. it contains some
> extended ASCII characters. The usual problem in the UK is the pound
> sign, but the euro is probably a good candidate as well. I would check
> that you are only using the standard (i.e. < 128) ASCII characters.

The UTF-8 encoding can represent any Unicode character at all, not just ASCII.
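To make that concrete, here is a small Java sketch (the class name Utf8RoundTrip is made up for illustration) that encodes a few characters mentioned in this thread, plus a CJK ideograph and an emoji, using the JDK's StandardCharsets.UTF_8, and then decodes them again. ASCII needs one byte, the pound sign and é two, the euro sign and the CJK ideograph three, and characters outside the Basic Multilingual Plane four; all of them survive the round trip.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Encodes a string as UTF-8 bytes and decodes those bytes again.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Number of bytes UTF-8 needs for the given string.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII letter, pound sign, e-acute, euro sign, a CJK ideograph,
        // and an emoji (stored as a surrogate pair in a Java String)
        String[] samples = {"A", "\u00A3", "\u00E9", "\u20AC", "\u65E5", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X -> %d byte(s), round trip ok: %b%n",
                    s.codePointAt(0), utf8Length(s), s.equals(roundTrip(s)));
        }
    }
}
```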

The error message you are seeing is not being generated by
commons-configuration, but by the underlying xml parser:

> Caused by: Invalid byte 2 of 3-byte UTF-8 sequence.
>         at 

In other words, your input file is corrupt; the xml parser has
encountered a sequence of bytes that does not correspond to any valid
UTF-8 character.

You will need to fix your input file so that it is valid UTF-8. There is
no way that the commons-configuration library can process your data if
the xml parser refuses to parse it.
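A minimal way to reproduce this, assuming the JDK's built-in JAXP parser stands in for whatever parser commons-configuration was using: the hypothetical document below declares encoding="UTF-8", but one copy of it is written out as Latin-1 bytes. The class name, element names and text are all made up. The exact exception type and message depend on the parser, the JDK version and the locale (the message quoted above is a localized one), so the sketch only reports whether parsing succeeded and prints whatever the parser threw.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ParserRejectsLatin1 {
    // Parses raw bytes as an XML document. Returns null on success,
    // otherwise the exception the parser threw, as text.
    static String tryParse(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler throws on fatal errors without printing anything to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xmlBytes));
            return null;
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Hypothetical document that claims to be UTF-8 and contains an e-acute
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<config><name>caf\u00E9</name></config>";
        byte[] asUtf8 = xml.getBytes(StandardCharsets.UTF_8);
        byte[] asLatin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 bytes:   " + (tryParse(asUtf8) == null ? "parsed OK" : "failed"));
        System.out.println("Latin-1 bytes: " + tryParse(asLatin1));
    }
}
```

The text is identical in both copies; only the bytes differ, which is why the problem has to be fixed in the file rather than in the configuration code.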

One possibility is that the input file is actually encoded in an 8-bit
character encoding such as LATIN-1, NOT UTF-8 at all.
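If that is the case, one fix is to re-encode the file. The sketch below (class and method names are hypothetical) first checks strictly whether the bytes are valid UTF-8, using a CharsetDecoder that reports errors, because new String(bytes, UTF_8) would quietly substitute U+FFFD instead of failing. If the check fails, it assumes the bytes are Latin-1 and converts them. That assumption is the weak point: a file saved on Windows may really be windows-1252, which differs from Latin-1 in the 128 to 159 range (that is where it puts the euro sign), so use the charset that matches how the file was actually produced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Latin1ToUtf8 {
    // Strict check: true only if the bytes decode as UTF-8 with no malformed
    // input. REPORT is already the default for a fresh decoder; it is set
    // explicitly here for clarity.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Re-encodes Latin-1 bytes as UTF-8. Every byte value 0-255 is a valid
    // Latin-1 character, so the decoding step itself cannot fail.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical input: "cafe" with an e-acute, saved as Latin-1
        byte[] input = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("valid UTF-8 before: " + isValidUtf8(input));
        byte[] fixed = isValidUtf8(input) ? input : latin1ToUtf8(input);
        System.out.println("valid UTF-8 after:  " + isValidUtf8(fixed));
    }
}
```

A pure-ASCII file passes the check unchanged, since ASCII is valid UTF-8. For a real file you would read it with Files.readAllBytes, convert, write it back with Files.write, and make sure the XML declaration (if any) says UTF-8.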

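With UTF-8, any byte from 0 through 127 is an ASCII character, while
every byte from 128 through 255 is part of a multibyte sequence (two to
four bytes) that represents a character that is NOT in the ASCII
character set. The first byte of such a sequence says how many bytes
follow, and each following byte must lie in the range 128 through 191.
The parser's error means that the second byte of a sequence whose first
byte announced three bytes broke this rule.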
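The rule can be checked by hand. This simplified scanner (hypothetical, not part of any library) classifies each lead byte, checks that the right number of continuation bytes in the range 0x80 to 0xBF follow, and reports the first violation in roughly the parser's wording. It only checks byte ranges, so it is not a complete validator; for real validation a CharsetDecoder configured to REPORT malformed input is the safer choice. Fed the Latin-1 bytes of "cafe" with an e-acute followed by "s", it stops at 0xE9, which looks like the start of a three-byte sequence, because the next byte is plain ASCII.

```java
public class Utf8Scanner {
    // Returns null if the bytes are structurally valid UTF-8, otherwise a
    // description of the first bad byte. Simplified: checks lead and
    // continuation byte ranges only, so it does not reject the overlong forms,
    // surrogates, or code points above U+10FFFF that lead bytes E0, ED, F0
    // and F4 can still produce.
    static String firstError(byte[] bytes) {
        int i = 0;
        while (i < bytes.length) {
            int b = bytes[i] & 0xFF;
            int len;
            if (b <= 0x7F) len = 1;                      // 0..127: plain ASCII
            else if (b >= 0xC2 && b <= 0xDF) len = 2;
            else if (b >= 0xE0 && b <= 0xEF) len = 3;
            else if (b >= 0xF0 && b <= 0xF4) len = 4;
            else return "Invalid lead byte at offset " + i; // stray continuation or unused value
            for (int k = 1; k < len; k++) {
                if (i + k >= bytes.length) {
                    return "Truncated " + len + "-byte UTF-8 sequence at offset " + i;
                }
                int c = bytes[i + k] & 0xFF;
                if (c < 0x80 || c > 0xBF) {
                    return "Invalid byte " + (k + 1) + " of " + len
                            + "-byte UTF-8 sequence at offset " + i;
                }
            }
            i += len;
        }
        return null;
    }

    public static void main(String[] args) {
        // Latin-1 bytes for "caf" + e-acute (0xE9) + "s"
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9, 0x73};
        // prints: Invalid byte 2 of 3-byte UTF-8 sequence at offset 3
        System.out.println(firstError(latin1));
        // UTF-8 bytes for "caf" + e-acute (0xC3 0xA9): prints null
        byte[] utf8 = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9};
        System.out.println(firstError(utf8));
    }
}
```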

With an 8-bit encoding like LATIN-1, values from 128 to 255 are NOT
multibyte sequences, but instead represent a specific set of 128
"extended characters", and there is no way to represent a character that
is not in the set associated with that encoding.
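Java's standard charsets show the limit directly. In the sketch below (class and method names are hypothetical), é and the pound sign each map to a single Latin-1 byte, but the euro sign, which postdates ISO-8859-1, has no byte value in it at all: CharsetEncoder.canEncode reports false, and String.getBytes silently substitutes '?'. ISO-8859-15 and windows-1252 do include the euro, at different byte values, which is one more reason to know exactly which 8-bit encoding a file was written in.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Limits {
    // True if the character has a mapping in ISO-8859-1 (Latin-1).
    static boolean inLatin1(char c) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        // e-acute is a single Latin-1 byte, 0xE9
        byte[] eAcute = "\u00E9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.printf("e-acute in Latin-1: %d byte, 0x%02X%n", eAcute.length, eAcute[0] & 0xFF);
        System.out.println("pound in Latin-1: " + inLatin1('\u00A3'));
        // The euro sign has no Latin-1 byte; getBytes replaces it with '?'
        System.out.println("euro in Latin-1:  " + inLatin1('\u20AC'));
        byte[] euro = "\u20AC".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("euro written as Latin-1: " + new String(euro, StandardCharsets.ISO_8859_1));
    }
}
```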
