cocoon-dev mailing list archives

From "Ben Fortuna (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
Date Thu, 15 Sep 2016 03:55:21 GMT

    [ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492282#comment-15492282 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

I've looked at XMLEncoder, and it seems the fix will require a change to a method
signature, since a single char can hold only one half of a surrogate pair. Specifically,
XMLEncoder.encode(char c):

https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java#L88

Unfortunately this also means the Encoder interface needs to change, so we will first need
to identify what else implements this interface. The proposed change would be something like:

public char[] Encoder.encode(char[] chars)

https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/Encoder.java#L36
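
To illustrate why an array-based signature helps, here is a minimal sketch of what such a
method might look like. It is not the Cocoon implementation: the class name
SurrogateAwareEncoderSketch is hypothetical, and both the escaping rules (hex numeric
character references for anything outside ASCII) and the handling of unpaired surrogates
(replaced with U+FFFD) are assumptions made for the example. The point is only that with
the whole array in hand, a high/low surrogate pair can be read as one code point.

```java
// Hypothetical sketch only; not the actual Cocoon Encoder or XMLEncoder API.
final class SurrogateAwareEncoderSketch {

    private SurrogateAwareEncoderSketch() {
    }

    /**
     * Encodes the input by code point rather than by char, so that a
     * surrogate pair yields a single numeric character reference.
     */
    static char[] encode(char[] chars) {
        StringBuilder out = new StringBuilder(chars.length);
        int i = 0;
        while (i < chars.length) {
            // Returns the supplementary code point when chars[i] and chars[i + 1]
            // form a valid high/low surrogate pair, otherwise chars[i] itself.
            int codePoint = Character.codePointAt(chars, i);
            int width = Character.charCount(codePoint); // 1 or 2 chars consumed

            // Assumption: an unpaired surrogate is replaced with U+FFFD,
            // since a lone surrogate is not a legal XML character.
            if (codePoint >= Character.MIN_SURROGATE && codePoint <= Character.MAX_SURROGATE) {
                codePoint = 0xFFFD;
            }

            if (codePoint == '&') {
                out.append("&amp;");
            } else if (codePoint == '<') {
                out.append("&lt;");
            } else if (codePoint == '>') {
                out.append("&gt;");
            } else if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else {
                // Assumption: non-ASCII is written as a hex character reference.
                out.append("&#x").append(Integer.toHexString(codePoint)).append(';');
            }
            i += width;
        }
        return out.toString().toCharArray();
    }
}
```

With this approach, an emoji such as U+1F600 (the chars \uD83D \uDE00) would come out as a
single reference, &#x1f600;, rather than as two separately encoded surrogate halves.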

I'm happy to implement a fix and submit a pull request, just looking for some acknowledgement
of the issue before proceeding.


> XMLEncoder doesn't support Unicode surrogate pairs
> --------------------------------------------------
>
>                 Key: COCOON-2352
>                 URL: https://issues.apache.org/jira/browse/COCOON-2352
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core
>            Reporter: Ben Fortuna
>
> Whilst investigating an issue with the Sling project and support for emoji characters,
> I've noticed that the XMLEncoder used by HTMLSerializer doesn't support the Unicode surrogate
> pairs that represent supplementary characters (code points above U+FFFF).
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
