commons-user mailing list archives

From "Anders Kofoed" <anders.kof...@avaleo.net>
Subject Out Of Office until 2nd January 2007
Date Thu, 28 Dec 2006 12:01:18 GMT
Thank you for your mail.
 
The Avaleo office is closed from December 23 to January 2, when we will address your request.
If you have urgent requests, you can call the following people:
 
- From December 23rd to December 27th (both days included), please call Niels Dyrnesli on +45 40741496.
- From December 28th to January 1st (both days included), please call Anders Kofoed on +4529442941.
 
Kind regards
Anders Kofoed
Avaleo ApS


-----Original Message-----
Reply-To: "Jakarta Commons Users List" <commons-user@jakarta.apache.org>
Subject: Re: [Configuration] UTF-8 encoding problem
From: Simon Kitching <skitching@apache.org>
To: Jakarta Commons Users List <commons-user@jakarta.apache.org>
Date: Fri, 29 Dec 2006 01:00:51 +1300

> On Thu, 2006-12-28 at 11:15 +0000, Andrew Shirley wrote:
> > On Thu, Dec 28, 2006 at 11:30:07AM +0100, DECAFFMEYER MATHIEU wrote:
> > > 
> > > Hi,
> > > 
> > > I am using Jakarta Configuration to manipulate some XML files.
> > > 
> > 
> > 
> > > 
> > > What do you suggest I do?
> > > 
> > > Thank you for any help! It will be greatly appreciated!
> > 
> > It may be that the file isn't actually UTF-8, i.e. it contains some
> > extended ASCII characters. The usual problem in the UK is the pound
> > sign, but the euro is probably a good candidate as well. I would check
> > that you are only using the standard (i.e. < 128) ASCII characters.

> The UTF-8 encoding can handle any character at all, not just ASCII.

> The error message you are seeing is not being generated by
> commons-configuration, but by the underlying xml parser:

> > 
> > Caused by: java.io.UTFDataFormatException: Octet 2 incorrect dans la
> > séquence UTF-8 à 3-octets. 
> >         at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown 

> In other words, your input file is corrupt; the xml parser has
> encountered a sequence of bytes that does not correspond to any valid
> character. 

> You will need to fix your input file so that it is valid UTF-8. There is
> no way that the commons-configuration library can process your data if
> the xml parser refuses to parse it.
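
One way to check a file before handing it to the parser is to decode it strictly as UTF-8 and see whether the decoder complains. The following is only a minimal sketch using the standard java.nio.charset API; the class name Utf8Validator and its method are hypothetical helpers, not part of commons-configuration or Xerces.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
class Utf8Validator {

    // Returns true if the bytes decode as well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently
        // substituting a replacement character for bad input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Usage: java Utf8Validator <path-to-file>
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```

If this reports the file as not valid UTF-8, the XML parser can be expected to reject it as well, and the file (or its declared encoding) needs to be fixed first.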

> One possibility is that the input file is actually encoded in an 8-bit
> character encoding such as LATIN-1, NOT UTF-8 at all.

> With UTF-8, any byte from 0 through 127 is an ASCII character, while a
> byte from 128 through 255 is always part of a multibyte sequence
> (two or more bytes) that represents a character that is NOT in the ASCII
> set.
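
To make the byte layout concrete, the sketch below prints the bytes of "é" and "€" in UTF-8 and of "é" in ISO-8859-1 (Latin-1), and classifies single bytes by their high bits. The class EncodingBytes and its methods are hypothetical names chosen for illustration. It also shows one plausible (not confirmed) explanation for the quoted error, which reports an incorrect second byte in a 3-byte UTF-8 sequence: the Latin-1 byte for "é", 0xE9, has the bit pattern of a 3-byte lead byte.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class EncodingBytes {

    // Formats bytes as space-separated upper-case hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Classifies one byte by its high bits only. Some values in these
    // ranges (0xC0, 0xC1, 0xF5..0xF7) still never occur in valid UTF-8.
    static String utf8Role(int value) {
        int b = value & 0xFF;
        if (b < 0x80) return "ASCII";
        if (b < 0xC0) return "continuation byte";
        if (b < 0xE0) return "lead of 2-byte sequence";
        if (b < 0xF0) return "lead of 3-byte sequence";
        if (b < 0xF8) return "lead of 4-byte sequence";
        return "invalid in UTF-8";
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // é
        String euro = "\u20ac";   // €
        System.out.println(hex(eAcute.getBytes(StandardCharsets.UTF_8)));      // C3 A9
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_8)));        // E2 82 AC
        System.out.println(hex(eAcute.getBytes(StandardCharsets.ISO_8859_1))); // E9
        System.out.println(utf8Role(0xE9)); // lead of 3-byte sequence
    }
}
```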

> With an 8-bit encoding like LATIN-1, values from 128 to 255 are NOT
> multibyte sequences, but instead represent a specific set of 128
> "extended characters", and there is no way to represent a character that
> is not in the set associated with that encoding.
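
If the file does turn out to be in an 8-bit encoding, one option is to transcode it to UTF-8 before parsing. The sketch below assumes the source really is ISO-8859-1; that is an assumption, and the class name Latin1ToUtf8 is hypothetical. ISO-8859-1 has no euro sign, so if the file was written on Windows and contains one, the actual encoding is more likely windows-1252 and that charset would have to be used instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
class Latin1ToUtf8 {

    // Interprets the input bytes as ISO-8859-1 and re-encodes them as UTF-8.
    // Every byte value is a valid ISO-8859-1 character, so this never fails,
    // but the result is only meaningful if the input really was Latin-1.
    static byte[] transcode(byte[] latin1) {
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Usage: java Latin1ToUtf8 <input-file> <output-file>
    public static void main(String[] args) throws IOException {
        byte[] input = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), transcode(input));
    }
}
```

An alternative that leaves the bytes untouched is to change the encoding named in the XML declaration so that it matches the file's real encoding, for example encoding="ISO-8859-1".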

> Regards,

> Simon


> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-user-help@jakarta.apache.org

