cocoon-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ben Fortuna (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
Date Thu, 20 Oct 2016 01:25:58 GMT

    [ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590441#comment-15590441
] 

Ben Fortuna commented on COCOON-2352:
-------------------------------------

My sincerest apologies, but I discovered a bug in the patch I submitted. Unfortunately I had
assumed we can cast an int to a char to encode the higher order unicode characters, but of
course this isn't possible and is why unicode surrogate pairs exist in the first place..

So I had to make a slight change to the code (again) - I have updated two files: XMLEncoder
and XMLEncoderTestCase to ensure that after combining a surrogate pair to a code point we
are then correctly encoding the int value as an HTML-compatible string.

https://github.com/apache/cocoon/pull/3/files

Thanks again, and fingers crossed there are no more changes required. :-)

> XMLEncoder doesn't support Unicode surrogate pairs
> --------------------------------------------------
>
>                 Key: COCOON-2352
>                 URL: https://issues.apache.org/jira/browse/COCOON-2352
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core, Blocks: Serializers
>    Affects Versions: 2.1.12
>            Reporter: Ben Fortuna
>            Assignee: Francesco Chicchiriccò
>             Fix For: 2.1.13
>
>
> Whilst investigating an issue with the Sling project and support for emoji characters,
I've come to notice that the XMLEncoder used by HTMLSerializer doesn't support Unicode surrogate
pairs to represent higher order unicode characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message