commons-issues mailing list archives

From "Kazuki Hamasaki (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LANG-857) StringIndexOutOfBoundsException in CharSequenceTranslator
Date Wed, 21 Nov 2012 14:51:58 GMT

    [ https://issues.apache.org/jira/browse/LANG-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502022#comment-13502022 ]

Kazuki Hamasaki commented on LANG-857:
--------------------------------------

I have created additional test cases.
The tests for {{escapeJava}} and {{escapeEcmaScript}} currently fail because of [LANG-858].

{code:java}
@Test
public void testEscapeSurrogatePairs() throws Exception {
    assertEquals("\uD83D\uDE30", StringEscapeUtils.escapeCsv("\uD83D\uDE30"));
    // Examples from https://en.wikipedia.org/wiki/UTF-16
    assertEquals("\uD800\uDC00", StringEscapeUtils.escapeCsv("\uD800\uDC00"));
    assertEquals("\uD834\uDD1E", StringEscapeUtils.escapeCsv("\uD834\uDD1E"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeCsv("\uDBFF\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeHtml3("\uDBFF\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeHtml4("\uDBFF\uDFFD"));
    assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeJava("\uDBFF\uDFFD"));       //fail
    assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeEcmaScript("\uDBFF\uDFFD")); //fail
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeXml("\uDBFF\uDFFD"));
}

@Test
public void testUnEscapeSurrogatePairs() throws Exception {
    assertEquals("\uD83D\uDE30", StringEscapeUtils.unescapeCsv("\uD83D\uDE30"));
    // Examples from https://en.wikipedia.org/wiki/UTF-16
    assertEquals("\uD800\uDC00", StringEscapeUtils.unescapeCsv("\uD800\uDC00"));
    assertEquals("\uD834\uDD1E", StringEscapeUtils.unescapeCsv("\uD834\uDD1E"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeCsv("\uDBFF\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeHtml3("\uDBFF\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeHtml4("\uDBFF\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeJava("\\uDBFF\\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeEcmaScript("\\uDBFF\\uDFFD"));
    assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeXml("\uDBFF\uDFFD"));
}
{code}
                
> StringIndexOutOfBoundsException in CharSequenceTranslator
> ---------------------------------------------------------
>
>                 Key: LANG-857
>                 URL: https://issues.apache.org/jira/browse/LANG-857
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.text.translate.*
>    Affects Versions: 3.x
>            Reporter: Kazuki Hamasaki
>            Priority: Minor
>              Labels: patch
>             Fix For: 3.2
>
>         Attachments: CharSequenceTranslator_translate.patch
>
>
> I found that CharSequenceTranslator handles surrogate pairs incorrectly.
> Here is a simple test case that reproduces the problem.
> \uD83D\uDE30 is a surrogate pair.
> {code:java}
> @Test
> public void testEscapeSurrogatePairs() throws Exception {
>     assertEquals("\uD83D\uDE30", StringEscapeUtils.escapeCsv("\uD83D\uDE30"));
> }
> {code}
> You'll get the exception as shown below.
> {code}
> java.lang.StringIndexOutOfBoundsException: String index out of range: 2
> 	at java.lang.String.charAt(String.java:658)
> 	at java.lang.Character.codePointAt(Character.java:4668)
> 	at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:95)
> 	at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:59)
> 	at org.apache.commons.lang3.StringEscapeUtils.escapeCsv(StringEscapeUtils.java:556)
> {code}
> A patch is attached. The affected method is:
> # public final void translate(CharSequence input, Writer out) throws IOException
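
The stack trace in the report shows Character.codePointAt being called with index 2 on a string of length 2. The following is a minimal standalone sketch of one way that can happen, assuming a loop that advances one code point per step but is given a char count as the number of steps. It is not the Commons Lang source; the class name SurrogateAdvanceSketch and the helper advance are hypothetical and exist only for illustration.

```java
public class SurrogateAdvanceSketch {

    /**
     * Hypothetical helper: starting at pos, step forward by `steps` code points
     * and return the resulting char index.
     */
    static int advance(CharSequence input, int pos, int steps) {
        for (int i = 0; i < steps; i++) {
            // A supplementary code point occupies two chars, so pos may grow by 2 here.
            pos += Character.charCount(Character.codePointAt(input, pos));
        }
        return pos;
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE30"; // one code point stored as two chars

        // Stepping by the number of code points stays inside the string.
        int codePoints = Character.codePointCount(s, 0, s.length()); // 1
        System.out.println("end index: " + advance(s, 0, codePoints)); // returns 2

        // Stepping by the number of chars (2) reads past the end on the second step.
        try {
            advance(s, 0, s.length());
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```

The sketch only shows why char counts and code point counts must not be mixed when walking a string that contains surrogate pairs. Whether the attached patch addresses the problem this way is not stated in the report.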

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
