db-derby-dev mailing list archives

From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-2967) Single character does not match high value unicode character with collation TERRITORY_BASED
Date Thu, 23 Aug 2007 06:22:31 GMT

    [ https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522039 ]

Mamta A. Satoor commented on DERBY-2967:
----------------------------------------

I spent some time on this Jira entry exploring Dan's suggestion for handling _ when searching a string:
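*************
// Note that iterator is of type java.text.CollationElementIterator.
int currentChar = iterator.getOffset();
do {
  // Stop at the end of the text; otherwise the offset never moves and the loop never ends.
  if (iterator.next() == CollationElementIterator.NULLORDER) break;
} while (iterator.getOffset() == currentChar);
*************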

I believe the code suggested by Dan above will do the trick, but I am not sure how to fit that
logic into the current code inside the iapi.types.Like.like method (the method starting at line
258), which is where the current implementation of _ resides.

Some background on the classes and methods involved in this discussion: there are two like
methods in WorkHorseForCollatorDatatypes (which handles collation-sensitive methods for
character string types with territory-based collation), and they differ only in that one
accepts the escape DVD while the other does not. Both of these methods call the like method
(starting at line 96) in iapi.types.Like. That like method in turn calls another like method in
the same class (starting at line 258), which provides the actual implementation.
Notice that this like method does not work with CollationElementIterator. Instead, it expects
the caller to pass int arrays containing the collation elements for the string being searched,
the pattern being matched, and the escape character. This is done for performance reasons: we
do not want to construct the collation element array for the strings on every call to like.
Instead, we want to construct it once and reuse it on every subsequent call. Hence, the current
implementation does not work with CollationElementIterator.
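The "build once, reuse afterwards" idea can be sketched roughly as follows. This is a hypothetical illustration, assuming a java.text.RuleBasedCollator obtained from Collator.getInstance(Locale); the class CachedCollationElements and its methods are invented for the example and are not Derby's WorkHorseForCollatorDatatypes code. Note that the flat array it produces records nothing about which elements came from which character.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical holder (not Derby code): computes a string's collation
// elements once, on first use, and hands back the same array afterwards.
public class CachedCollationElements {
    private final RuleBasedCollator collator;
    private final String value;
    private int[] elements; // null until first requested

    public CachedCollationElements(RuleBasedCollator collator, String value) {
        this.collator = collator;
        this.value = value;
    }

    // Builds the array on the first call and reuses it on later calls.
    public int[] getCollationElements() {
        if (elements == null) {
            elements = collationElementsOf(collator, value);
        }
        return elements;
    }

    // Drains a CollationElementIterator into one flat int array. Character
    // boundaries are lost: a character that expands to several elements
    // looks the same as several single-element characters.
    public static int[] collationElementsOf(RuleBasedCollator collator, String s) {
        CollationElementIterator it = collator.getCollationElementIterator(s);
        int[] buf = new int[Math.max(4, s.length() * 2)];
        int n = 0;
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
            if (n == buf.length) {
                buf = Arrays.copyOf(buf, n * 2);
            }
            buf[n++] = e;
        }
        return Arrays.copyOf(buf, n);
    }

    public static void main(String[] args) {
        RuleBasedCollator collator =
            (RuleBasedCollator) Collator.getInstance(Locale.forLanguageTag("no-NO"));
        CachedCollationElements cached = new CachedCollationElements(collator, "abc");
        int[] first = cached.getCollationElements();
        int[] second = cached.getCollationElements();
        System.out.println("elements: " + first.length + ", reused: " + (first == second));
    }
}
```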

As a solution, I am thinking that maybe I should add another int array to WorkHorseForCollatorDatatypes
which keeps track of the starting position of the collation elements for each of the characters.
We already have an int array, collationElementsForString, which holds the collation elements
for all the characters of the string that this WorkHorseForCollatorDatatypes holds. If we knew where
each character's collation elements start in collationElementsForString, we could simply advance
to the next character's starting position when we encounter a _.
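A rough sketch of the proposed extra array, under stated assumptions: elements[] plays the role of collationElementsForString, and starts[i] is the index where character i's collation elements begin (starts[length] == elements.length). The class CharBoundaryLike and its methods are hypothetical, not Derby's Like.like. To keep the boundaries exact it collates each char on its own instead of recording offsets during one pass over the whole string, so it ignores contractions and splits surrogate pairs; it also handles only % and _, with no escape character.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch (not Derby code) of a flat collation-element array plus
// a parallel array of per-character starting positions.
public class CharBoundaryLike {
    private final int[] elements;
    private final int[] starts;

    public CharBoundaryLike(RuleBasedCollator collator, String s) {
        // Assumption: each char is collated separately, which ignores
        // contractions and treats a surrogate pair as two characters.
        int[] buf = new int[s.length() * 2 + 4];
        int n = 0;
        starts = new int[s.length() + 1];
        for (int i = 0; i < s.length(); i++) {
            starts[i] = n;
            CollationElementIterator it =
                collator.getCollationElementIterator(s.substring(i, i + 1));
            for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
                if (n == buf.length) {
                    buf = Arrays.copyOf(buf, n * 2);
                }
                buf[n++] = e;
            }
        }
        starts[s.length()] = n;
        elements = Arrays.copyOf(buf, n);
    }

    private int length() {
        return starts.length - 1;
    }

    // True if character i here has exactly the same collation elements as
    // character j of other.
    private boolean sameChar(int i, CharBoundaryLike other, int j) {
        int len = starts[i + 1] - starts[i];
        if (len != other.starts[j + 1] - other.starts[j]) {
            return false;
        }
        for (int k = 0; k < len; k++) {
            if (elements[starts[i] + k] != other.elements[other.starts[j] + k]) {
                return false;
            }
        }
        return true;
    }

    // '_' consumes exactly one character of the value, however many collation
    // elements it has; '%' consumes zero or more characters.
    private static boolean like(CharBoundaryLike value, int vi,
                                String pattern, CharBoundaryLike pat, int pi) {
        if (pi == pattern.length()) {
            return vi == value.length();
        }
        char p = pattern.charAt(pi);
        if (p == '%') {
            for (int k = vi; k <= value.length(); k++) {
                if (like(value, k, pattern, pat, pi + 1)) {
                    return true;
                }
            }
            return false;
        }
        if (vi == value.length()) {
            return false;
        }
        boolean charMatches = p == '_' || value.sameChar(vi, pat, pi);
        return charMatches && like(value, vi + 1, pattern, pat, pi + 1);
    }

    public static boolean like(RuleBasedCollator collator, String value, String pattern) {
        return like(new CharBoundaryLike(collator, value), 0,
                    pattern, new CharBoundaryLike(collator, pattern), 0);
    }

    public static void main(String[] args) {
        RuleBasedCollator collator =
            (RuleBasedCollator) Collator.getInstance(Locale.forLanguageTag("no-NO"));
        for (String p : new String[] { "%", "_", "\uFA2D" }) {
            System.out.println("'\\uFA2D' like '" + p + "': " + like(collator, "\uFA2D", p));
        }
    }
}
```

Because '_' advances from starts[i] to starts[i + 1], it skips one whole character even when that character expands to more than one collation element, so main prints true for all three patterns from the bug report.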

Let me know if anyone has any feedback on this approach or has any other suggestions on fixing
the problem.

> Single character does not match high value unicode character with collation TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2967
>                 URL: https://issues.apache.org/jira/browse/DERBY-2967
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.0.0
>            Reporter: Kathey Marsden
>         Attachments: TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation, '_' does not match the character \uFA2D. It is the same
> for English or Norwegian. For collation UCS_BASIC it matches fine. Could you tell me if this
> is a bug?
> Here is a program to reproduce the problem:
> import java.sql.*;
>
> public class HighCharacter {
>     public static void main(String[] args) throws Exception {
>         Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>
>         System.out.println("\n Territory no_NO");
>         Connection conn = DriverManager.getConnection(
>             "jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>
>         System.out.println("\n Territory en_US");
>         conn = DriverManager.getConnection(
>             "jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>
>         System.out.println("\n Collation UCS_BASIC");
>         conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>     }
>
>     public static void testLikeWithHighestValidCharacter(Connection conn)
>             throws SQLException {
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("drop table t1");
>         } catch (SQLException se) {
>             // Drop failure is OK: the table may not exist yet.
>         }
>         stmt.executeUpdate("create table t1(c11 int)");
>         stmt.executeUpdate("insert into t1 values 1");
>         stmt.close();
>
>         // \uFA2D - the highest valid character according to
>         // Character.isDefined() of JDK 1.4.
>         PreparedStatement ps =
>             conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>         String[] match = { "%", "_", "\uFA2D" };
>         for (int i = 0; i < match.length; i++) {
>             System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>             ps.setString(1, match[i]);
>             ResultSet rs = ps.executeQuery();
>             if (rs.next() && rs.getString(1).equals("1"))
>                 System.out.println("PASS");
>             else
>                 System.out.println("FAIL: no match");
>             rs.close();
>         }
>         ps.close();
>     }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

