db-derby-dev mailing list archives

From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-2967) Single character does not match high value unicode character with collation TERRITORY_BASED
Date Thu, 18 Oct 2007 16:10:50 GMT

    [ https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535958 ]

Mamta A. Satoor commented on DERBY-2967:

Thanks, Knut, for checking my commit. I was hesitant too about all the object creation.

I think we can definitely make the first change suggested by you. I will go ahead and give
it a try.
*************part of the change suggested by Knut****************
a) We could use the compare() method instead of iterators. It caches and reuses the iterators
across calls and therefore it might be more efficient. It would also simplify the code, since
the else clause in checkEquality() could be rewritten to: 

} else { // dealing with territory based character string
    return collator.compare(new String(pat, pLoc, 1), new String(val, vLoc, 1)) == 0;
}
*************end of part of the change suggested by Knut*********
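
To make the suggestion above concrete, here is a minimal, self-contained sketch of a one-character comparison through Collator.compare(). The class name, the method signature, and the null-collator convention for UCS_BASIC are assumptions made for illustration; only the names checkEquality, pat, pLoc, val and vLoc come from the quoted snippet, and this is not the actual Derby code.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical sketch, not Derby's implementation.
class SingleCharCompareSketch {

    // Compares one character of the pattern with one character of the value.
    // Assumption: a null collator stands for UCS_BASIC (plain code unit equality).
    static boolean checkEquality(Collator collator,
                                 char[] pat, int pLoc,
                                 char[] val, int vLoc) {
        if (collator == null) {
            return pat[pLoc] == val[vLoc];
        }
        // Territory based: let the collator decide whether the two
        // one-character strings are equal.
        return collator.compare(new String(pat, pLoc, 1),
                                new String(val, vLoc, 1)) == 0;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("no-NO"));
        char[] val = "caad".toCharArray();
        char[] pat = "a".toCharArray();
        // Compare the pattern character against each character of the value.
        for (int i = 0; i < val.length; i++) {
            System.out.println("'" + val[i] + "' equals 'a': "
                    + checkEquality(collator, pat, 0, val, i));
        }
    }
}
```

Note that this variant still creates two short String objects per comparison, so it simplifies the code but does not remove all object creation.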

But as for the second alternative, we can't create a CollationElementIterator for the entire
string ahead of time for the LIKE operation. Let me use an example to illustrate why. In Norwegian,
the collation element(s) returned for the string 'aa' are not the same as the collation element(s)
returned for one 'a' at a time. So, when the user has a WHERE clause 'caad' LIKE '%a%', the SQL spec
requires us to return TRUE for this WHERE clause. We would not get that behavior if we generated
collation elements for the entire string 'caad' in one shot: in Norwegian, the iterator would find
only 3 characters in that string, namely 'c', 'aa' and 'd', so the single 'a' in the pattern would
never match. Instead, we need to break 'caad' into its four characters and generate the collation
element(s) for each of those 4 characters, one character at a time.
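
The difference described above can be observed with a small experiment that counts collation elements for the whole string and then one character at a time. This is only a sketch: the class and method names are made up for illustration, and whether 'aa' is treated as a single unit depends on the collation rules shipped with the JVM in use, so no output is claimed here.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical experiment, not part of Derby.
class CollationElementCountSketch {

    // Counts the collation elements the iterator returns for the whole string.
    static int countElements(RuleBasedCollator collator, String s) {
        CollationElementIterator it = collator.getCollationElementIterator(s);
        int count = 0;
        while (it.next() != CollationElementIterator.NULLORDER) {
            count++;
        }
        return count;
    }

    // Counts collation elements when each character is handed to the
    // collator separately, which is what the LIKE code has to do.
    static int countElementsPerCharacter(RuleBasedCollator collator, String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            count += countElements(collator, s.substring(i, i + 1));
        }
        return count;
    }

    public static void main(String[] args) {
        Collator c = Collator.getInstance(Locale.forLanguageTag("no-NO"));
        if (!(c instanceof RuleBasedCollator)) {
            System.out.println("Collator is not rule based; cannot iterate elements");
            return;
        }
        RuleBasedCollator collator = (RuleBasedCollator) c;
        System.out.println("whole string 'caad': "
                + countElements(collator, "caad"));
        System.out.println("one character at a time: "
                + countElementsPerCharacter(collator, "caad"));
    }
}
```

If the two counts differ on a given JVM, that is the effect described above: the whole-string iterator has grouped 'aa', and a pattern character 'a' can no longer be lined up against it.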

Would love to hear if there are any other ideas to cut down on object creation.

> Single character does not match high value unicode character with collation TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>                 Key: DERBY-2967
>                 URL: https://issues.apache.org/jira/browse/DERBY-2967
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions:
>            Reporter: Kathey Marsden
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY2967_Oct11_07_diff.txt, DERBY2967_Oct11_07_stat.txt, DERBY2967_offset_based_diff_Oct02_07.txt,
DERBY2967_offset_based_stat_Oct02_07.txt, fullcoll.out, patch2_setOffset_fullcoll.out, patch2_with_setOffset_diff_Sep2007.txt,
patch2_with_setOffset_stat_Sep2007.txt, step1_iteratorbased_Sep1507_diff.txt, step1_iteratorbased_Sep1507_stat.txt,
temp_diff.txt, temp_stat.txt, TestFrench.java, TestNorway.java
> With TERRITORY_BASED collation '_' does not match the character \uFA2D. It is the same
for English or Norwegian. For collation UCS_BASIC it matches fine. Could you tell me if this
is a bug?
> Here is a program to reproduce.
> import java.sql.*;
> public class HighCharacter {
>    public static void main(String args[]) throws Exception
>    {
>    System.out.println("\n Territory no_NO");
>    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>    Connection conn = DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Territory en_US");
>    conn = DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Collation UCS_BASIC");
>    conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>    testLikeWithHighestValidCharacter(conn);
>    }
> public static  void testLikeWithHighestValidCharacter(Connection conn) throws SQLException
>    {
>    Statement stmt = conn.createStatement();
>    try {
>    stmt.executeUpdate("drop table t1");
>    }catch (SQLException se)
>    {// drop failure ok.
>    }
>    stmt.executeUpdate("create table t1(c11 int)");
>    stmt.executeUpdate("insert into t1 values 1");
>    // \uFA2D - the highest valid character according to
>    // Character.isDefined() of JDK 1.4;
>    PreparedStatement ps =
>    conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>      String[] match = { "%", "_", "\uFA2D" };
>    for (int i = 0; i < match.length; i++) {
>    System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>    ps.setString(1, match[i]);
>    ResultSet rs = ps.executeQuery();
>    if( rs.next() && rs.getString(1).equals("1"))
>        System.out.println("PASS");
>    else          System.out.println("FAIL: no match");
>    rs.close();
>    }
>   }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
