From: "Richard Liang (JIRA)"
To: harmony-dev@incubator.apache.org
Date: Tue, 28 Feb 2006 09:43:40 +0100 (CET)
Subject: [jira] Commented: (HARMONY-137) CharsetDecoder should replace undefined bytes with replacement string

    [ http://issues.apache.org/jira/browse/HARMONY-137?page=comments#action_12368087 ]

Richard Liang commented on HARMONY-137:
---------------------------------------

Please see the bug info in the ICU bug system:
http://bugs.icu-project.org/cgi-bin/icu-bugs?findid=5085&go=Go

Attached here is the ICU team's response to this bug:

You are expecting incorrect behavior from cp1250. Both Microsoft's conversion APIs and IBM mapping tables convert byte 81 to Unicode character 0081. This conversion behavior will not change. The tables on unicode.org may tell you about the official mappings, but there are other mappings that are commonly expected.

More details about ICU charset conversion can be found on this page:
http://icu.sourceforge.net/charts/charset/

This charset conversion works as expected.

> CharsetDecoder should replace undefined bytes with replacement string
> ---------------------------------------------------------------------
>
>          Key: HARMONY-137
>          URL: http://issues.apache.org/jira/browse/HARMONY-137
>      Project: Harmony
>         Type: Bug
>   Components: Classlib
>     Reporter: Vladimir Strigun
>     Priority: Minor
>
> According to the cp1250 mapping table, byte 0x81 is undefined. See http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1250.TXT
> So the charset decoder should replace undefined bytes with the default replacement, i.e. 0xFFFD.
> Testcase for reproducing this issue:
> import java.nio.charset.*;
> import java.nio.*;
> public class Harmony137 {
>     public static void main(String[] args) throws Exception {
>         // A single byte that the unicode.org cp1250 table leaves undefined.
>         ByteBuffer bb = ByteBuffer.allocate(5);
>         bb.put((byte)0x81);
>         bb.flip();
>         Charset cp1250 = Charset.forName("cp1250");
>         CharBuffer cb = cp1250.newDecoder()
>                 .onMalformedInput(CodingErrorAction.REPLACE)
>                 .onUnmappableCharacter(CodingErrorAction.REPLACE)
>                 .decode(bb);
>         // 65533 == 0xFFFD, the default replacement character.
>         if (cb.get(0) != 65533) {
>             System.out.println("FAIL: expected 0xFFFD but result is: 0x"
>                     + Integer.toHexString(cb.get(0)).toUpperCase());
>         }
>     }
> }
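
The disagreement in this thread is between two mapping tables for the same charset, so the result a given runtime produces for byte 0x81 depends on which table it ships. The following is a minimal sketch for probing that, not part of the original report: the names ReplacementProbe, decodeByte and describe are hypothetical, and ISO-8859-1 and US-ASCII are added only as control cases (the first maps every byte to the code point of the same value, the second treats bytes above 0x7F as malformed). What it prints for cp1250 is deliberately not stated here, since that is exactly the point under dispute.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
class ReplacementProbe {

    /**
     * Decodes a single byte with REPLACE configured for both malformed
     * input and unmappable characters, and returns the first decoded char.
     */
    static char decodeByte(Charset cs, int b) {
        ByteBuffer in = ByteBuffer.wrap(new byte[] { (byte) b });
        try {
            CharBuffer out = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .decode(in);
            return out.get(0);
        } catch (CharacterCodingException e) {
            // Not expected when both actions are REPLACE; rethrow unchecked.
            throw new IllegalStateException(e);
        }
    }

    /** Formats the result of decodeByte for printing. */
    static String describe(Charset cs, int b) {
        char c = decodeByte(cs, b);
        String note = (c == 0xFFFD) ? " (replacement character)" : "";
        return String.format("%s: 0x%02X -> U+%04X%s",
                cs.name(), b & 0xFF, (int) c, note);
    }

    public static void main(String[] args) {
        // Control cases with well-known behaviour.
        System.out.println(describe(StandardCharsets.ISO_8859_1, 0x81));
        System.out.println(describe(StandardCharsets.US_ASCII, 0x81));

        // The charset under discussion; it may be absent from some runtimes.
        if (Charset.isSupported("cp1250")) {
            System.out.println(describe(Charset.forName("cp1250"), 0x81));
        } else {
            System.out.println("cp1250 is not available in this runtime");
        }
    }
}
```

Running this against different class libraries shows whether each one follows the unicode.org table (U+FFFD) or the mapping described in the ICU response (U+0081).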