cocoon-dev mailing list archives

From "Ben Fortuna (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
Date Mon, 17 Oct 2016 12:45:58 GMT

    [ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582147#comment-15582147 ]

Ben Fortuna edited comment on COCOON-2352 at 10/17/16 12:45 PM:
----------------------------------------------------------------

Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old
code. I noticed the error is on line 42, but the test I submitted only has 33 lines. 

Note that the test must pass both halves of the surrogate pair to the encoder, in order,
which is why I wrote the sequence like this:

```
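// 127808 == 0x1F340, the code point represented by the surrogate pair below
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2 (high): nothing should be emitted yet
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2 (low): the complete character is emitted now
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));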
```
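
To make the ordering requirement concrete, here is a minimal Java sketch of an encoder that buffers the high surrogate and only emits output once the low surrogate arrives. The class `SurrogatePairSketch`, its `pendingHigh` field, and the `&#x...;` output format are assumptions made for illustration; this is not Cocoon's `XMLEncoder` and may not match its actual output. The only library calls used are the standard `java.lang.Character` helpers. In plain Java a `char` holds 16 bits, so a code point above U+FFFF such as 127808 (U+1F340) has to travel as two `char` values.

```java
// Hypothetical stand-in for illustration only; NOT Cocoon's XMLEncoder.
final class SurrogatePairSketch {
    // High surrogate seen on the previous call, or 0 if none is pending.
    private char pendingHigh = 0;

    /**
     * Encodes one char as a numeric character reference (assumed format).
     * A high surrogate yields an empty array; the following low surrogate
     * yields the reference for the combined code point.
     */
    char[] encode(char c) {
        if (Character.isHighSurrogate(c)) {
            pendingHigh = c;      // wait for the second half of the pair
            return new char[0];
        }
        int codePoint = c;        // BMP char or unpaired low surrogate: use as-is
        if (Character.isLowSurrogate(c) && pendingHigh != 0) {
            codePoint = Character.toCodePoint(pendingHigh, c);
        }
        pendingHigh = 0;
        String ref = "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
        return ref.toCharArray();
    }

    public static void main(String[] args) {
        SurrogatePairSketch encoder = new SurrogatePairSketch();
        // The pair D83C/DF40 combines to code point 127808 (0x1F340).
        System.out.println(Character.toCodePoint('\uD83C', '\uDF40')); // 127808
        System.out.println(encoder.encode('\uD83C').length);           // 0
        System.out.println(new String(encoder.encode('\uDF40')));      // &#x1F340;
    }
}
```

Because the sketch keeps state between calls, encoding the two halves out of order or on different encoder instances would not produce the combined reference, which is the behaviour the test sequence above is meant to exercise.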




> XMLEncoder doesn't support Unicode surrogate pairs
> --------------------------------------------------
>
>                 Key: COCOON-2352
>                 URL: https://issues.apache.org/jira/browse/COCOON-2352
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core, Blocks: Serializers
>    Affects Versions: 2.1.12
>            Reporter: Ben Fortuna
>            Assignee: Francesco Chicchiriccò
>             Fix For: 2.1.13
>
>
> Whilst investigating an issue with the Sling project and support for emoji characters,
> I've come to notice that the XMLEncoder used by HTMLSerializer doesn't support Unicode surrogate
> pairs to represent higher order unicode characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
