Date: Fri, 16 Sep 2016 07:23:20 +0000 (UTC)
From: "Ben Fortuna (JIRA)"
To: dev@cocoon.apache.org
Reply-To: dev@cocoon.apache.org
Subject: [jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

    [ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495624#comment-15495624 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

Ok, I'll first create a unit test to demonstrate the issue.
I'd prefer not to change the Encoder interface, so I'll see if it's possible to just update XMLEncoder. I have looked at the EncodingSerializer; however, I think a surrogate pair needs to be encoded "together", so the logic really needs to be in the delegate encoder (i.e. XMLEncoder).

> XMLEncoder doesn't support Unicode surrogate pairs
> --------------------------------------------------
>
>                 Key: COCOON-2352
>                 URL: https://issues.apache.org/jira/browse/COCOON-2352
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core, Blocks: Serializers
>            Reporter: Ben Fortuna
>
> Whilst investigating an issue with the Sling project and support for emoji characters, I've come to notice that the XMLEncoder used by HTMLSerializer doesn't support Unicode surrogate pairs to represent higher order Unicode characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
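
To illustrate the point in the comment that a surrogate pair needs to be encoded "together", here is a minimal sketch in plain Java. It is not Cocoon's XMLEncoder and does not use the Encoder interface; the class name SurrogateAwareEncoder and its encode method are hypothetical, and the choice to replace an unpaired surrogate with U+FFFD is an assumption made only for this example. It shows one way to walk a string by code point so that a high/low surrogate pair becomes a single numeric character reference rather than two invalid ones.

```java
// Hypothetical illustration only: not the Cocoon XMLEncoder implementation.
final class SurrogateAwareEncoder {
    private SurrogateAwareEncoder() {}

    /**
     * Emits ASCII characters unchanged and every other code point as a
     * hexadecimal numeric character reference. Markup escaping of
     * characters such as '<' and '&' is deliberately left out to keep
     * the focus on surrogate handling.
     */
    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // codePointAt combines a valid high/low surrogate pair into one code point.
            int codePoint = input.codePointAt(i);
            if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
                // Unpaired surrogate: assumption for this sketch, emit U+FFFD instead.
                out.append("&#xFFFD;");
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase())
                   .append(';');
            }
            // Advance by 2 chars for a supplementary code point, otherwise by 1.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

For example, SurrogateAwareEncoder.encode("a\uD83D\uDE00b") returns "a&#x1F600;b": the pair U+D83D U+DE00 is emitted as the single code point U+1F600. An encoder that only ever sees one char at a time cannot do this without keeping state between calls, which is consistent with the comment's view that the logic belongs in the delegate encoder.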