Return-Path:
X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io
Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io
Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 992F4200B91 for ; Thu, 15 Sep 2016 06:16:22 +0200 (CEST)
Received: by cust-asf.ponee.io (Postfix) id 97BA1160AD4; Thu, 15 Sep 2016 04:16:22 +0000 (UTC)
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id DED21160AB4 for ; Thu, 15 Sep 2016 06:16:21 +0200 (CEST)
Received: (qmail 53386 invoked by uid 500); 15 Sep 2016 04:16:20 -0000
Mailing-List: contact dev-help@cocoon.apache.org; run by ezmlm
Precedence: bulk
list-help:
list-unsubscribe:
List-Post:
Reply-To: dev@cocoon.apache.org
List-Id:
Delivered-To: mailing list dev@cocoon.apache.org
Received: (qmail 53367 invoked by uid 99); 15 Sep 2016 04:16:20 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 15 Sep 2016 04:16:20 +0000
Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id B0E162C0D57 for ; Thu, 15 Sep 2016 04:16:20 +0000 (UTC)
Date: Thu, 15 Sep 2016 04:16:20 +0000 (UTC)
From: "Ben Fortuna (JIRA)"
To: dev@cocoon.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
archived-at: Thu, 15 Sep 2016 04:16:22 -0000

    [ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492310#comment-15492310 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

A possibly less intrusive approach would be to leave the method signatures as they are, but when a surrogate char is detected, record it and return an empty char array. Expect the second surrogate of the pair to be encoded next, and return the correct char array result at that point. If the next char encoded is not the second surrogate of the pair, throw an encoding exception.

> XMLEncoder doesn't support Unicode surrogate pairs
> --------------------------------------------------
>
>                 Key: COCOON-2352
>                 URL: https://issues.apache.org/jira/browse/COCOON-2352
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core
>            Reporter: Ben Fortuna
>
> Whilst investigating an issue with the Sling project and support for emoji characters, I've come to notice that the XMLEncoder used by HTMLSerializer doesn't support Unicode surrogate pairs to represent higher order Unicode characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
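
For illustration only, the approach suggested in the comment above (buffer the first surrogate, return an empty array, emit the result when the second surrogate arrives) could look roughly like the sketch below. This is not Cocoon's actual XMLEncoder: the class name SurrogateAwareEncoder, the encode(char) signature, the ASCII pass-through rule, and the use of IllegalArgumentException as a stand-in for an encoding exception are all assumptions made for the example, and markup escaping is ignored.

```java
/** Hypothetical sketch of the stateful approach; not Cocoon's XMLEncoder. */
class SurrogateAwareEncoder {
    private static final char[] EMPTY = new char[0];

    // High surrogate seen on the previous call; 0 means none is pending.
    private char pendingHigh = 0;

    /** Encodes one UTF-16 char; the signature stays char-in, char[]-out. */
    char[] encode(char c) {
        if (pendingHigh != 0) {
            char high = pendingHigh;
            pendingHigh = 0;
            if (!Character.isLowSurrogate(c)) {
                // Stand-in for the "encoding exception" mentioned in the comment.
                throw new IllegalArgumentException("high surrogate not followed by low surrogate");
            }
            // Combine the pair into one code point and emit a single reference.
            return reference(Character.toCodePoint(high, c));
        }
        if (Character.isHighSurrogate(c)) {
            pendingHigh = c;   // record it and wait for the second half
            return EMPTY;
        }
        if (Character.isLowSurrogate(c)) {
            throw new IllegalArgumentException("low surrogate without preceding high surrogate");
        }
        // Assumption for the sketch: ASCII passes through, everything else is escaped.
        return c < 0x80 ? new char[] { c } : reference(c);
    }

    private static char[] reference(int codePoint) {
        return String.format("&#x%X;", codePoint).toCharArray();
    }

    public static void main(String[] args) {
        SurrogateAwareEncoder encoder = new SurrogateAwareEncoder();
        StringBuilder out = new StringBuilder();
        for (char c : "Hi \uD83D\uDE00".toCharArray()) {
            out.append(encoder.encode(c));
        }
        System.out.println(out); // prints "Hi &#x1F600;"
    }
}
```

One consequence of keeping the signature unchanged is that the encoder becomes stateful, so an instance could not safely be shared across concurrent serializations, and a high surrogate at the very end of the input would go unreported unless the caller checks for a pending char when it finishes.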