Message-ID: <1939778209.1232553119858.JavaMail.jira@brutus>
Date: Wed, 21 Jan 2009 07:51:59 -0800 (PST)
From: Alexander Kjäll (JIRA)
To: issues@commons.apache.org
Subject: [jira] Commented: (LANG-480) StringEscapeUtils.escapeHtml incorrectly converts unicode characters above U+00FFFF into 2 characters
In-Reply-To: <2053374472.1232473020229.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/LANG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665868#action_12665868 ]

Alexander Kjäll commented on LANG-480:
--------------------------------------

That is a bit sad.
How likely do you think the JDK 5 version is? Will it happen within this quarter?
I guess I could try to write a patch that is compatible with Java 1.2, but that would require me to do my own parsing of the format that Java uses to store characters in memory, so I would really like to avoid having that code in a library.

> StringEscapeUtils.escapeHtml incorrectly converts unicode characters above U+00FFFF into 2 characters
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LANG-480
>                 URL: https://issues.apache.org/jira/browse/LANG-480
>             Project: Commons Lang
>          Issue Type: Bug
>    Affects Versions: 2.4
>         Environment: doesn't matter
>            Reporter: Alexander Kjäll
>            Priority: Minor
>         Attachments: lang-480.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Characters that Java represents internally as 2 chars (a surrogate pair) are incorrectly converted by the function. The following test displays the problem quite nicely:
> import org.apache.commons.lang.*;
> public class J2 {
>     public static void main(String[] args) throws Exception {
>         // this is the UTF-8 representation of the character:
>         // COUNTING ROD UNIT DIGIT THREE
>         // in Unicode
>         // code point: U+1D362
>         byte[] data = new byte[] { (byte)0xF0, (byte)0x9D, (byte)0x8D, (byte)0xA2 };
>         // output is: &#55348;&#57186;
>         // should be: &#119650;
>         System.out.println("'" + StringEscapeUtils.escapeHtml(new String(data, "UTF8")) + "'");
>     }
> }
> Should be very quick to fix, feel free to drop me an email if you want a patch.
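
To make the two options discussed in the comment concrete, here is a rough sketch of both approaches. It is not the attached lang-480.patch and not Commons Lang code: the class and method names are hypothetical, and the escaping rule is deliberately simplified (every character above 0x7F becomes a decimal numeric entity; named entities and markup characters such as `<` and `&` are ignored). The first method assumes JDK 5's code point API; the second decodes the UTF-16 surrogate pair by hand, which is roughly what a Java 1.2 compatible patch would have to do.

```java
// Illustrative only: hypothetical names, simplified escaping rule.
final class SupplementaryEscapeSketch {

    // JDK 5+ approach: String.codePointAt joins a valid surrogate pair
    // into a single code point, so one entity is written per character.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            if (codePoint > 0x7F) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.append((char) codePoint);
            }
            // Advance by 2 chars for supplementary characters, otherwise by 1.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    // Pre-JDK 5 approach: decode the surrogate pair manually.
    static String escapeNonAsciiManual(String input) {
        StringBuffer out = new StringBuffer(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int codePoint = c;
            boolean isHigh = c >= 0xD800 && c <= 0xDBFF;
            if (isHigh && i + 1 < input.length()) {
                char low = input.charAt(i + 1);
                if (low >= 0xDC00 && low <= 0xDFFF) {
                    // Combine the 10 payload bits of each half, then add the
                    // 0x10000 offset of the supplementary planes.
                    codePoint = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                    i++; // the low surrogate has been consumed
                }
            }
            if (codePoint > 0x7F) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.append((char) codePoint);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D362 COUNTING ROD UNIT DIGIT THREE as a UTF-16 surrogate pair
        String rod = "\uD834\uDF62";
        System.out.println(escapeNonAscii(rod));       // prints &#119650;
        System.out.println(escapeNonAsciiManual(rod)); // prints &#119650;
    }
}
```

Both variants leave an unpaired surrogate as its own entity rather than rejecting it; whether that is the right behaviour for the library is a separate decision that this sketch does not settle.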