commons-issues mailing list archives

From "Henri Yandell (JIRA)" <j...@apache.org>
Subject [jira] Closed: (LANG-617) StringEscapeUtils.escapeXML() can't process UTF-16 supplementary characters
Date Fri, 17 Sep 2010 05:13:32 GMT

     [ https://issues.apache.org/jira/browse/LANG-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell closed LANG-617.
------------------------------

    Resolution: Fixed

Marking this as closed. Feel free to reopen if you find the current codebase is still problematic,
David. Things seem good with the data you provided (many thanks for that, by the way).

> StringEscapeUtils.escapeXML() can't process UTF-16 supplementary characters
> ---------------------------------------------------------------------------
>
>                 Key: LANG-617
>                 URL: https://issues.apache.org/jira/browse/LANG-617
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 2.4
>            Reporter: David Garcia
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: utf8-fragment.txt, xml-escaped-fragment.txt
>
>
> Supplementary characters in UTF-16 are those whose code points are above 0xffff, that
> is, those that require more than one Java char to encode, as explained here: http://java.sun.com/developer/technicalArticles/Intl/Supplementary/
> Currently, StringEscapeUtils.escapeXML() isn't aware of this encoding scheme and treats
> each char as one character, which is not always correct.
> A possible solution in class Entities would be:
>     public void escape(Writer writer, String str) throws IOException {
>         int len = str.length();
>         for (int i = 0; i < len; i++) {
>             // Read a full code point, which may span two Java chars.
>             int code = str.codePointAt(i);
>             String entityName = this.entityName(code);
>             if (entityName != null) {
>                 writer.write('&');
>                 writer.write(entityName);
>                 writer.write(';');
>             } else if (code > 0x7F) {
>                 writer.write("&#");
>                 // Writer.write(int) emits a single char, so format the number explicitly.
>                 writer.write(Integer.toString(code, 10));
>                 writer.write(';');
>             } else {
>                 writer.write((char) code);
>             }
>             // A supplementary code point occupies two chars; skip the low surrogate.
>             if (code > 0xffff) {
>                 i++;
>             }
>         }
>     }
> Besides fixing escapeXML(), this will also affect HTML escaping functions. I guess that's
> a good thing, but please remember I have only tested escapeXML().
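
For readers of the archive, here is a minimal, self-contained sketch of the code-point-aware loop described in the report. It is not the Commons Lang implementation: the class name SupplementaryEscapeSketch and its five-entry entityName() table are hypothetical stand-ins for the Entities class, and the sketch assumes well-formed input (no unpaired surrogates).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class SupplementaryEscapeSketch {

    // Hypothetical stand-in for Entities.entityName(int): covers only the
    // five predefined XML entities and returns null for everything else.
    static String entityName(int codePoint) {
        switch (codePoint) {
            case '<':  return "lt";
            case '>':  return "gt";
            case '&':  return "amp";
            case '"':  return "quot";
            case '\'': return "apos";
            default:   return null;
        }
    }

    // Walks the string by code point rather than by char, so a surrogate
    // pair is escaped as one numeric character reference.
    static void escape(Writer writer, String str) throws IOException {
        int len = str.length();
        for (int i = 0; i < len; ) {
            int code = str.codePointAt(i);
            String name = entityName(code);
            if (name != null) {
                writer.write('&');
                writer.write(name);
                writer.write(';');
            } else if (code > 0x7F) {
                writer.write("&#");
                // Format as decimal text; Writer.write(int) would emit one char.
                writer.write(Integer.toString(code, 10));
                writer.write(';');
            } else {
                writer.write(code); // ASCII: a single char is correct here
            }
            // Advance by 1 for BMP code points, 2 for supplementary ones.
            i += Character.charCount(code);
        }
    }

    // Convenience wrapper returning the escaped text as a String.
    static String escape(String str) {
        StringWriter out = new StringWriter();
        try {
            escape(out, str);
        } catch (IOException e) {
            throw new IllegalStateException(e); // StringWriter does not throw
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) is a supplementary character.
        String s = "a<" + new String(Character.toChars(0x1D11E)) + "\u00e9";
        System.out.println(s.length());                      // prints 5
        System.out.println(s.codePointCount(0, s.length())); // prints 4
        System.out.println(escape(s)); // prints a&lt;&#119070;&#233;
    }
}
```

Using Character.charCount() to advance the index is equivalent to the "code > 0xffff" check in the proposal; it just keeps the increment in one place.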

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

