db-derby-dev mailing list archives

From "Mamta Satoor" <msat...@gmail.com>
Subject Re: Single character does not match high value unicode character with collation TERRITORY_BASED. Is this a bug
Date Mon, 23 Jul 2007 16:47:55 GMT
Hi Kathey,

I debugged the program below, and it looks like '_' not matching '\uFA2D'
might be a bug.
The actual comparison happens in existing code that was left over from the
National character types. SQLChar and the newly introduced collation classes
both have these two methods:

public BooleanDataValue like(DataValueDescriptor pattern)
public BooleanDataValue like(DataValueDescriptor pattern, DataValueDescriptor escape) throws StandardException

In SQLChar, we check whether we are dealing with national character types
and, if so, run special code for the LIKE implementation. The same special
code is used by the collation classes such as CollatorSQLChar.

The special processing gets the collation elements for the character string
from the RuleBasedCollator, via
RuleBasedCollator.getCollationElementIterator(characterString.getString()).
Taking Norwegian as a specific example, '\uFA2D' converts into 2 collation
elements (not 1, and this is the cause of the problem). These collation
elements are passed as an int array to the following method in the
iapi.types.Like class:

public static Boolean like(int[] value, int valueLength, int[] pattern, int patternLength, RuleBasedCollator collator)
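The element counts described above can be checked outside Derby with the JDK's java.text API. The sketch below collects a string's collation elements into an int[] via getCollationElementIterator(), roughly the kind of array handed to Like.like. The class name, the fromRules helper, the "no-NO" language tag and the cast to RuleBasedCollator are assumptions for illustration, not Derby code.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;
import java.util.stream.IntStream;

/** Sketch: collect a string's collation elements into an int[] (not Derby code). */
public class CollationElementsDemo {

    /** Returns every collation element of s, in iteration order. */
    static int[] of(RuleBasedCollator collator, String s) {
        CollationElementIterator it = collator.getCollationElementIterator(s);
        IntStream.Builder out = IntStream.builder();
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
            out.add(e);
        }
        return out.build().toArray();
    }

    /** Hypothetical helper: builds a collator from explicit rules for a reproducible demo. */
    static RuleBasedCollator fromRules(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the JDK hands back a RuleBasedCollator for this locale.
        RuleBasedCollator no =
            (RuleBasedCollator) Collator.getInstance(Locale.forLanguageTag("no-NO"));
        System.out.println("'_'      -> " + of(no, "_").length + " element(s)");
        System.out.println("'\\uFA2D' -> " + of(no, "\uFA2D").length + " element(s)");

        // With these explicit rules 'a' is mapped and 'z' is not.
        RuleBasedCollator small = fromRules("< a < b");
        System.out.println("'a' -> " + of(small, "a").length
            + ", 'z' -> " + of(small, "z").length);
    }
}
```

In OpenJDK's CollationElementIterator, a character with no entry in the collator's rules appears to be reported as two elements (a fixed marker followed by the character value), which would account for the two elements seen for '\uFA2D'.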

The method above uses the passed RuleBasedCollator to find the collation
elements for '_'. In Norwegian, '_' translates into only one collation
element (vs. 2 elements for '\uFA2D'). So when matching '_', we skip only 1
of the 2 collation elements created for '\uFA2D', because '_' itself
translated into only 1 collation element. The following code is copied from
Like.like:
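    else if (matchSpecial(pat, pLoc, pEnd, anyCharInts))
    {
        // regardless of the char, it matches
        // (anyCharInts holds the collation elements of '_'; vLoc advances by
        // that count, not by the element count of the value character)
        vLoc += anyCharInts.length;
        pLoc += anyCharInts.length;

        result = checkLengths(vLoc, vEnd, pLoc, pat, pEnd, anyStringInts);
        if (result != null)
            return result;
    }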

So the code above cannot assume that every character in, say, Norwegian
maps to exactly one collation element just because '_' maps to one.
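The mismatch can be reproduced in a few lines. The hypothetical sketch below handles only literals and '_' (no '%' or escape) and uses made-up collation rules in which '_' maps to one element and 'z' is unmapped, so 'z' stands in for '\uFA2D'. likeByCount mirrors the excerpt above by advancing anyChar.length elements for '_'; likeByCharacter is one possible alternative, not Derby's code, in which '_' consumes the elements of exactly one value character.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.stream.IntStream;

/** Sketch (not Derby code): LIKE with literals and '_' only, matched two ways. */
public class UnderscoreMatchDemo {

    static int[] elements(RuleBasedCollator c, String s) {
        CollationElementIterator it = c.getCollationElementIterator(s);
        IntStream.Builder b = IntStream.builder();
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) b.add(e);
        return b.build().toArray();
    }

    static String str(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    /** Hypothetical rules: '_' gets one element; letters other than a and b are unmapped. */
    static RuleBasedCollator demoCollator() {
        try {
            return new RuleBasedCollator("< '_' < a < b");
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Mirrors the excerpt: '_' skips as many elements as '_' itself produces. */
    static boolean likeByCount(RuleBasedCollator c, String value, String pattern) {
        int[] v = elements(c, value);
        int[] anyChar = elements(c, "_");
        int vLoc = 0;
        for (int p : pattern.codePoints().toArray()) {
            int[] step = (p == '_') ? anyChar : elements(c, str(p));
            if (vLoc + step.length > v.length) return false;
            // '_' accepts whatever elements are there; a literal must match them exactly.
            if (p != '_' && !Arrays.equals(Arrays.copyOfRange(v, vLoc, vLoc + step.length), step)) {
                return false;
            }
            vLoc += step.length;
        }
        return vLoc == v.length; // every value element must be consumed
    }

    /** Alternative: '_' consumes exactly one value character, however many elements it has. */
    static boolean likeByCharacter(RuleBasedCollator c, String value, String pattern) {
        int[] vs = value.codePoints().toArray();
        int[] ps = pattern.codePoints().toArray();
        if (vs.length != ps.length) return false; // without '%', one pattern char per value char
        for (int i = 0; i < ps.length; i++) {
            if (ps[i] != '_' && !Arrays.equals(elements(c, str(vs[i])), elements(c, str(ps[i])))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RuleBasedCollator c = demoCollator();
        System.out.println("by count:     'z' LIKE '_' -> " + likeByCount(c, "z", "_"));
        System.out.println("by character: 'z' LIKE '_' -> " + likeByCharacter(c, "z", "_"));
    }
}
```

Grouping elements per code point, as likeByCharacter does, ignores contractions (several characters sorting as one unit), so a real fix would also have to respect the collator's contractions; the sketch only shows why counting elements for '_' is not enough.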

I think we should go ahead and open a JIRA entry for this. I would like to
hear if anyone has comments on this.

thanks,
Mamta

On 7/20/07, Kathey Marsden <kmarsdenderby@sbcglobal.net> wrote:
>
> With TERRITORY_BASED collation, '_' does not match the character
> \uFA2D. It is the same for English and Norwegian. With collation
> UCS_BASIC it matches fine. Could you tell me if this is a bug?
> Here is a program to reproduce it.
>
>
> Kathey
>
>
> import java.sql.*;
>
> public class HighCharacter {
>
>     public static void main(String[] args) throws Exception {
>         // Load the Derby embedded driver.
>         Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>
>         System.out.println("\n Territory no_NO");
>         Connection conn = DriverManager.getConnection(
>             "jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>
>         System.out.println("\n Territory en_US");
>         conn = DriverManager.getConnection(
>             "jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>
>         System.out.println("\n Collation UCS_BASIC");
>         conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>     }
>
>     public static void testLikeWithHighestValidCharacter(Connection conn)
>             throws SQLException {
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("drop table t1");
>         } catch (SQLException se) {
>             // Drop failure is OK: the table may not exist yet.
>         }
>         stmt.executeUpdate("create table t1(c11 int)");
>         stmt.executeUpdate("insert into t1 values 1");
>         stmt.close();
>
>         // \uFA2D - the highest valid character according to
>         // Character.isDefined() of JDK 1.4
>         PreparedStatement ps =
>             conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>
>         String[] match = { "%", "_", "\uFA2D" };
>
>         for (int i = 0; i < match.length; i++) {
>             System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>             ps.setString(1, match[i]);
>             ResultSet rs = ps.executeQuery();
>             if (rs.next() && rs.getString(1).equals("1"))
>                 System.out.println("PASS");
>             else
>                 System.out.println("FAIL: no match");
>             rs.close();
>         }
>         ps.close();
>     }
> }
>
>
>
