commons-issues mailing list archives

From "Gary Gregory (JIRA)" <j...@apache.org>
Subject [jira] Created: (LANG-607) StringUtils.containsAny methods incorrectly matches Unicode 2.0+ supplementary characters.
Date Sun, 14 Mar 2010 00:40:27 GMT
StringUtils.containsAny methods incorrectly matches Unicode 2.0+ supplementary characters.
------------------------------------------------------------------------------------------

                 Key: LANG-607
                 URL: https://issues.apache.org/jira/browse/LANG-607
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 2.5
         Environment: java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

Microsoft Windows [Version 6.0.6002]

Apache Maven 2.2.1 (r801777; 2009-08-06 12:16:01-0700)
Java version: 1.6.0_16
Java home: C:\Program Files\Java\jdk1.6.0_16\jre
Default locale: en_US, platform encoding: Cp1252
OS name: "windows vista" version: "6.0" arch: "amd64" Family: "windows"
            Reporter: Gary Gregory
            Assignee: Gary Gregory
            Priority: Minor
             Fix For: 3.0


The StringUtils.containsAny methods incorrectly match Unicode 2.0+ supplementary characters.

For example, take as test fixtures the supplementary character U+20000, which is written
in Java source as the surrogate pair "\uD840\uDC00", and its neighbor U+20001:

	private static final String CharU20000 = "\uD840\uDC00";
	private static final String CharU20001 = "\uD840\uDC01";
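
To make the fixtures concrete, here is a minimal sketch of how the two strings are laid out in UTF-16. The class name SupplementaryFixture and the helper codeUnits are hypothetical and exist only for illustration; everything else is standard java.lang.String API. Each fixture is a single code point stored as two char values, and both share the same high surrogate.

```java
public class SupplementaryFixture {
    // Same fixtures as in the report: U+20000 and U+20001 as surrogate pairs.
    public static final String CHAR_U20000 = "\uD840\uDC00";
    public static final String CHAR_U20001 = "\uD840\uDC01";

    /** Hypothetical helper: formats each UTF-16 code unit of s as 4 hex digits. */
    public static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two char values, but only one code point.
        System.out.println(CHAR_U20000.length());                // 2
        System.out.println(CHAR_U20000.codePointCount(0, 2));    // 1
        // The first code unit (the high surrogate) is identical in both fixtures.
        System.out.println(codeUnits(CHAR_U20000));              // D840 DC00
        System.out.println(codeUnits(CHAR_U20001));              // D840 DC01
    }
}
```

Any comparison that works on individual char values will therefore see the two fixtures as sharing a character.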

The JRE handles Unicode supplementary characters correctly, as this call shows:

	assertEquals(-1, CharU20000.indexOf(CharU20001));

But these assertions fail, because containsAny reports a match between two different supplementary characters:

	assertEquals(false, StringUtils.containsAny(CharU20000, CharU20001));
	assertEquals(false, StringUtils.containsAny(CharU20001, CharU20000));
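
The following sketch illustrates one way such a false match can arise and one possible way to avoid it. It is not the Commons Lang source and not the committed fix: the class ContainsAnySketch and both methods are hypothetical. naiveContainsAny assumes the check is done one char at a time, which is enough to reproduce the reported result; codePointContainsAny walks the string by code point instead.

```java
public class ContainsAnySketch {

    /** Assumed buggy shape: compares individual UTF-16 code units. */
    public static boolean naiveContainsAny(String str, String searchChars) {
        for (int i = 0; i < str.length(); i++) {
            // A lone high surrogate from str can match the high surrogate
            // of a different supplementary character in searchChars.
            if (searchChars.indexOf(str.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** Sketch of a code-point-aware check (unpaired surrogates get no special handling). */
    public static boolean codePointContainsAny(String str, String searchChars) {
        int i = 0;
        while (i < str.length()) {
            int cp = str.codePointAt(i);
            // String.indexOf(int) looks for the full surrogate pair
            // when cp is a supplementary code point.
            if (searchChars.indexOf(cp) >= 0) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        String u20000 = "\uD840\uDC00";
        String u20001 = "\uD840\uDC01";
        System.out.println(naiveContainsAny(u20000, u20001));              // true (false match)
        System.out.println(codePointContainsAny(u20000, u20001));          // false
        System.out.println(codePointContainsAny(u20000 + u20001, u20001)); // true
    }
}
```

The code-point loop advances by Character.charCount(cp), so a surrogate pair is always consumed as one unit and never compared half at a time.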

This is fine:

	assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20000));
	assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20001));
	assertEquals(true, StringUtils.contains(CharU20000, CharU20000));
	assertEquals(false, StringUtils.contains(CharU20000, CharU20001));

because StringUtils.contains calls the JRE to perform the match.

More than you want to know:
- http://java.sun.com/developer/technicalArticles/Intl/Supplementary/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

