commons-issues mailing list archives

From "Taro Yabuki (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LANG-720) StringEscapeUtils.escapeXml(input) outputs wrong results when an input contains characters in Supplementary Planes.
Date Thu, 14 Jul 2011 13:25:00 GMT
StringEscapeUtils.escapeXml(input) outputs wrong results when an input contains characters
in Supplementary Planes.
-------------------------------------------------------------------------------------------------------------------

                 Key: LANG-720
                 URL: https://issues.apache.org/jira/browse/LANG-720
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*, lang.text.translate.*
    Affects Versions: 3.0
            Reporter: Taro Yabuki


Hello.

I use StringEscapeUtils.escapeXml(input) to escape special characters for XML.
This method returns incorrect output when the input contains characters in the
Supplementary Planes (code points above U+FFFF, which Java stores as surrogate pairs).

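// Imports needed by this snippet (Commons Lang 3.0 uses the lang3 package):
// import java.net.URLEncoder;
// import org.apache.commons.lang3.StringEscapeUtils;

String str1 = "\uD842\uDFB7" + "A"; // U+20BB7 (one surrogate pair) followed by "A"
String str2 = StringEscapeUtils.escapeXml(str1);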

// The value of str2 must be equal to that of str1,
// because str1 does not contain any characters to be escaped.
// However, str2 is different from str1.

System.out.println(URLEncoder.encode(str1, "UTF-16BE")); //%D8%42%DF%B7A
System.out.println(URLEncoder.encode(str2, "UTF-16BE")); //%D8%42%DF%B7%FF%FD

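The cause of this problem is a wrong bound on the loop that translates the input character by character.
In CharSequenceTranslator.translate(CharSequence input, Writer out),
the loop counter "i" is used as an index in char units, but it moves from 0 to
Character.codePointCount(input, 0, input.length()), which counts code points.
It should move from 0 to input.length().
For str1 above, the code point count is 2 while the length is 3, so a loop bounded by
the code point count never reaches the index of the final "A".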
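
To illustrate the difference between the two bounds, here is a minimal sketch. It is not the Commons Lang source; the class CodePointLoopSketch and the helper copyByCodePoint are hypothetical names introduced only for this example. It prints the char length and the code point count of str1, then copies the string with a loop bounded by input.length() that advances by Character.charCount, so surrogate pairs stay intact.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical illustration only; not the actual CharSequenceTranslator code.
class CodePointLoopSketch {

    // Copies input to out one code point at a time.
    // The index i is in char units, so the bound is input.length().
    static void copyByCodePoint(CharSequence input, Writer out) throws IOException {
        int len = input.length();
        int i = 0;
        while (i < len) {
            int cp = Character.codePointAt(input, i);
            out.write(Character.toChars(cp));
            // Advance by 2 for a supplementary code point, by 1 otherwise.
            i += Character.charCount(cp);
        }
    }

    public static void main(String[] args) throws IOException {
        String str1 = "\uD842\uDFB7" + "A";

        System.out.println(str1.length());                                    // 3
        System.out.println(Character.codePointCount(str1, 0, str1.length())); // 2

        StringWriter w = new StringWriter();
        copyByCodePoint(str1, w);
        System.out.println(w.toString().equals(str1));                        // true
    }
}
```

A loop that used the code point count (2) as the upper bound for the char index would stop after index 1 and never visit the "A" at index 2.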


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
