Date: 15 Jan 2002 15:20:10 -0000
From: bugzilla@apache.org
To: ant-dev@jakarta.apache.org
Subject: DO NOT REPLY [Bug 5837] - Unable to build when project file contains double-byte characters

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5837

Unable to build when project file contains double-byte characters

------- Additional Comments From conor@cortexebusiness.com.au 2002-01-15 07:20 -------

The result from the zip is the same. I still feel it is not valid UTF-8. See http://www.ietf.org/rfc/rfc2044.txt?number=2044 for the details. In particular, there is this section:

   UCS-4 range (hex.)      UTF-8 octet sequence (binary)
   0000 0000-0000 007F     0xxxxxxx
   0000 0080-0000 07FF     110xxxxx 10xxxxxx
   0000 0800-0000 FFFF     1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-001F FFFF     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   0020 0000-03FF FFFF     111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
   0400 0000-7FFF FFFF     1111110x 10xxxxxx ... 10xxxxxx

You will see that the byte sequence from your file, c6 d8 c5 22, which in binary is 11000110 11011000 11000101 00100010, is not a valid sequence according to the table above. The lead byte 11000110 announces a two-byte sequence, but the byte after it, 11011000, does not match the required continuation pattern 10xxxxxx.

BTW, don't worry about the FXE. The od command just cannot represent the chars c6 d8 c5, so it strips the high bit (FXE is 46 58 45).

The default for a Reader is not UTF-8; it is the platform default encoding, something like Cp1252 on Windows. The squished AE thing is what you get when you read your bytes as Cp1252 (byte c6 is Æ in Cp1252). By using a Reader with the default encoding, you are bypassing the XML parser's normal default of UTF-8: because you are feeding it characters, it assumes character decoding has already been done outside the parser.

Try opening your reader as

   is = new FileInputStream(inputFile);
   reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));

and see what happens.
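To make the byte-level argument concrete, here is a minimal Java sketch. The class Utf8Check and all of its method names are hypothetical, written for illustration only. firstBadByte applies just the bit patterns from the RFC 2044 table, isValidUtf8 asks the JDK's CharsetDecoder for a strict verdict (it follows today's stricter UTF-8 rules, not RFC 2044), decodeAs shows what a Cp1252 Reader makes of the same bytes, and openUtf8Reader follows the suggested fix. It assumes Java 7 or later (for StandardCharsets and try-with-resources) and a JDK that ships the windows-1252 charset, as standard JDKs do.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Sequence length announced by a lead byte, following the RFC 2044 table.
    // Returns -1 for a continuation byte (10xxxxxx) or for 0xFE/0xFF.
    public static int sequenceLength(int b) {
        if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
        if ((b & 0xFC) == 0xF8) return 5; // 111110xx
        if ((b & 0xFE) == 0xFC) return 6; // 1111110x
        return -1;
    }

    // Checks only the bit patterns from the table (no overlong-form checks).
    // Returns the index of the first byte that breaks a pattern, or -1 if none does.
    public static int firstBadByte(byte[] data) {
        int i = 0;
        while (i < data.length) {
            int len = sequenceLength(data[i] & 0xFF);
            if (len < 0) return i; // not a valid lead byte
            for (int k = 1; k < len; k++) {
                if (i + k >= data.length) return i;             // sequence cut off at end of input
                if ((data[i + k] & 0xC0) != 0x80) return i + k; // expected 10xxxxxx here
            }
            i += len;
        }
        return -1;
    }

    // Strict check using the JDK's UTF-8 decoder with errors reported, not replaced.
    public static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Decodes the bytes the way a Reader using the named charset would.
    public static String decodeAs(byte[] data, String charsetName) {
        return new String(data, Charset.forName(charsetName));
    }

    // Like InputStreamReader(is, "UTF-8"), but passes a fresh decoder (whose default
    // action is REPORT) so malformed bytes throw instead of silently becoming U+FFFD.
    public static BufferedReader openUtf8Reader(InputStream is) {
        return new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8.newDecoder()));
    }

    public static void main(String[] args) throws IOException {
        byte[] fromBugReport = { (byte) 0xC6, (byte) 0xD8, (byte) 0xC5, 0x22 };
        System.out.println("first bad byte index: " + firstBadByte(fromBugReport)); // 1
        System.out.println("valid UTF-8: " + isValidUtf8(fromBugReport));           // false
        System.out.println("as Cp1252: " + decodeAs(fromBugReport, "windows-1252"));
        try (BufferedReader r = openUtf8Reader(new ByteArrayInputStream(fromBugReport))) {
            r.readLine();
            System.out.println("strict reader accepted the input");
        } catch (MalformedInputException e) {
            System.out.println("strict reader rejected the input: " + e);
        }
    }
}
```

For the bytes from the report, firstBadByte returns 1 (the d8 byte, where a 10xxxxxx continuation was required), and decodeAs(..., "windows-1252") returns ÆØÅ", whose first character is the squished AE. One design choice worth noting: the InputStreamReader(is, "UTF-8") form replaces malformed bytes with U+FFFD rather than throwing, so the sketch passes a CharsetDecoder instead, which makes a badly encoded project file fail loudly.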