ant-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 5837] - Unable to build when project file contains double-byte characters
Date Tue, 15 Jan 2002 15:20:10 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5837>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5837

------- Additional Comments From conor@cortexebusiness.com.au  2002-01-15 07:20 -------
The result from the zip is the same. I still feel it is not valid UTF-8.

See http://www.ietf.org/rfc/rfc2044.txt?number=2044 for the details. In
particular, it contains this table:

 UCS-4 range (hex.)           UTF-8 octet sequence (binary)
   0000 0000-0000 007F   0xxxxxxx
   0000 0080-0000 07FF   110xxxxx 10xxxxxx
   0000 0800-0000 FFFF   1110xxxx 10xxxxxx 10xxxxxx

   0001 0000-001F FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   0020 0000-03FF FFFF   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
   0400 0000-7FFF FFFF   1111110x 10xxxxxx ... 10xxxxxx

You will see that the byte sequence from your file, c6 d8 c5 22, which in binary is

11000110 11011000 11000101 00100010

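is not a valid sequence according to the above. The first byte, c6, matches
110xxxxx and so starts a two-byte sequence, which means the next byte must
match 10xxxxxx. Instead d8 is another 110xxxxx lead byte, and c5 after it has
the same problem (it is followed by 22, which is plain ASCII).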
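
If you want to check this yourself rather than take my word for it, something
like the following should do. It is only a sketch: the class and method names
are made up for illustration, and the "well-formed" comparison bytes assume the
characters you meant were the Latin-1 letters at c6, d8 and c5, which is a
guess on my part.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Ant or of any XML parser.
class Utf8Check {
    // Returns true only if the whole array decodes as UTF-8 with no errors.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                // REPORT makes the decoder throw instead of silently substituting U+FFFD.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The four bytes quoted from the project file.
        byte[] fromBugReport = {(byte) 0xC6, (byte) 0xD8, (byte) 0xC5, 0x22};
        // Assumption: U+00C6, U+00D8, U+00C5 encoded as real UTF-8, then the quote.
        byte[] wellFormed = {(byte) 0xC3, (byte) 0x86, (byte) 0xC3, (byte) 0x98,
                             (byte) 0xC3, (byte) 0x85, 0x22};
        System.out.println(isValidUtf8(fromBugReport)); // false
        System.out.println(isValidUtf8(wellFormed));    // true
    }
}
```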

BTW, don't worry about the FXE. The od command just cannot represent the chars
c6 d8 c5 so it strips the high bit (FXE is 46 58 45). 
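
To see where FXE comes from, you can mask off the high bit of each byte
yourself. This is just an illustration of the arithmetic (the class name is
invented), not a claim about how od is implemented.

```java
// Hypothetical demo class: clears bit 7 of each byte and shows the ASCII result.
class HighBitStrip {
    static String stripHighBit(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // 0x7F keeps the low seven bits, e.g. 0xC6 & 0x7F == 0x46 ('F').
            sb.append((char) (b & 0x7F));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC6, (byte) 0xD8, (byte) 0xC5};
        System.out.println(stripHighBit(bytes)); // prints FXE
    }
}
```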

The default for a Reader is not UTF-8 - it is the platform default encoding,
something like Cp1252 on Windows. The squished AE thing is what you get when
you read your bytes as Cp1252. By using the reader with the default encoding you
are bypassing the XML parser's normal default of UTF-8 (because you are feeding
it characters, it assumes character decoding has been done outside the parser).
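
You can reproduce the Cp1252 reading without touching the parser at all. A
rough sketch, assuming your JDK ships the windows-1252 charset (that is the
canonical name for Cp1252; the class name here is made up):

```java
import java.nio.charset.Charset;

// Hypothetical demo class: decodes the bytes from the file as Cp1252.
class Cp1252Decode {
    static String decodeAsCp1252(byte[] bytes) {
        // In Cp1252 every one of these bytes maps to exactly one character.
        return new String(bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        byte[] fromBugReport = {(byte) 0xC6, (byte) 0xD8, (byte) 0xC5, 0x22};
        String text = decodeAsCp1252(fromBugReport);
        // Print code points so the result does not depend on the console encoding.
        for (char c : text.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+00C6, U+00D8, U+00C5, U+0022
        }
    }
}
```

U+00C6 is the AE ligature, which matches what you are seeing.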

Try opening your reader as 
        is = new FileInputStream(inputFile);
        reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));

and see what happens.
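
For what it's worth, here is a self-contained version of the same idea that
you can run without your project file. It is only a sketch: it reads from a
byte array instead of a FileInputStream, the class and method names are
invented, and the well-formed bytes again assume the intended characters were
U+00C6, U+00D8 and U+00C5. As far as I know InputStreamReader does not throw on
malformed input but substitutes U+FFFD, so treat that part as an assumption
about the JDK you are running.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reads bytes through a Reader opened explicitly as UTF-8.
class ExplicitUtf8Reader {
    static String readAll(byte[] bytes) {
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] fromBugReport = {(byte) 0xC6, (byte) 0xD8, (byte) 0xC5, 0x22};
        byte[] wellFormed = {(byte) 0xC3, (byte) 0x86, (byte) 0xC3, (byte) 0x98,
                             (byte) 0xC3, (byte) 0x85, 0x22};
        // Malformed input should show up as U+FFFD replacement characters.
        System.out.println(readAll(fromBugReport).indexOf('\uFFFD') >= 0);
        // Well-formed input should come back as the three letters plus the quote.
        System.out.println(readAll(wellFormed).equals("\u00C6\u00D8\u00C5\""));
    }
}
```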


