db-derby-dev mailing list archives

From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-2967) Single character does not match high value unicode character with collation TERRITORY_BASED
Date Fri, 14 Sep 2007 08:26:33 GMT

     [ https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-2967:

    Attachment: step1_iteratorbased_Sep1507_stat.txt

Attaching a new patch (svn diff is attached as step1_iteratorbased_Sep1507_diff.txt and svn
stat -q is attached as step1_iteratorbased_Sep1507_stat.txt). This patch does not build the
collation elements for the value string in advance; instead, it fetches each collation element
from the CollationElementIterator as needed for the value string. In addition, it does not
build a CollationElementIterator on the entire pattern string. The metacharacters in the pattern
are compared using their Unicode values. The rest of the characters in the pattern will have a
CollationElementIterator associated with them. In other words, for the pattern string, collation
elements are used only for non-metacharacters.
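
To make the idea concrete, here is a minimal sketch of matching one run of non-metacharacters from the pattern against a value iterator that is consumed lazily. It is not the code in the attached patch: the class LiteralRunMatch and the method matchesLiteralRun are hypothetical names, and the sketch assumes that comparing raw collation elements is sufficient (it ignores collator strength and ignorable elements).

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

final class LiteralRunMatch {
    /**
     * Hypothetical helper: consumes collation elements from valueIterator
     * only as they are needed and compares them with the elements of one
     * run of non-metacharacters taken from the pattern.
     */
    static boolean matchesLiteralRun(CollationElementIterator valueIterator,
                                     String literalRun,
                                     RuleBasedCollator collator) {
        // Only the literal run gets an iterator, not the whole pattern.
        CollationElementIterator runIterator =
            collator.getCollationElementIterator(literalRun);
        int expected;
        while ((expected = runIterator.next()) != CollationElementIterator.NULLORDER) {
            // An exhausted value iterator yields NULLORDER, which never
            // equals a real element, so a short value is a mismatch.
            if (valueIterator.next() != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: the JDK returns a RuleBasedCollator for this locale.
        RuleBasedCollator collator =
            (RuleBasedCollator) Collator.getInstance(Locale.US);
        CollationElementIterator valueIterator =
            collator.getCollationElementIterator("abc");
        // The same value iterator is shared across runs, so the second
        // call continues where the first one stopped.
        System.out.println(matchesLiteralRun(valueIterator, "ab", collator));
        System.out.println(matchesLiteralRun(valueIterator, "c", collator));
    }
}
```

Because the value iterator is shared between calls, nothing is computed for the part of the value string that is never reached.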

The new logic for the LIKE implementation is as follows (this is really the javadoc for
iapi.types.Like:like(CollationElementIterator valueIterator, String pattern, String escape,
RuleBasedCollator collator)). I have 2 questions that I would appreciate help with. The 2
questions are at the end of the nice :) javadoc below.

	 * This method will be called for character string types with territory
	 * based collation. The logic of the method is as follows:
	 * A)If the pattern string or the value iterator is null, then this method
	 * will return null, because the result of LIKE can't be established in
	 * such a situation.
	 * B)Initialize the pointer into the pattern string to 0.
	 * C)Start the loop
	 *   a)Check if we have reached the end of the value iterator. If yes:
	 *     1)Check if we have reached the end of the pattern string. If yes,
	 *     return TRUE.
	 *     2)Check if the pattern string only has % left. If yes, then
	 *     return TRUE.
	 *     3)If neither a1) nor a2) is true, then return FALSE.
	 *   c)Start looking at the pattern where the pointer is pointing and keep
	 *     going until you find the end of the pattern or one of the
	 *     metacharacters, i.e. %, _ or the escape character.
	 *   d)Get a CollationElementIterator for the non-metacharacters found in
	 *     step c) (using the Collator passed to this method; the same Collator
	 *     was used to construct the CollationElementIterator for the value
	 *     string) and make sure that its collation elements match the
	 *     collation elements found in the value CollationElementIterator.
	 *     A mismatch requires us to return FALSE from this method.
	 *   e)Do the checks performed by step Ca).
	 *   f)Check which metacharacter the offset in the pattern is pointing to.
	 *     1)If it is the escape character, then convert the next character
	 *       in the pattern to its collation element(s) and compare those
	 *       collation elements to the elements in valueIterator. If they do
	 *       not match, we need to return FALSE.
	 *     2)If it is not the escape character, then check if it is a '_'.
	 *       If yes, then skip all the collation elements in valueIterator
	 *       corresponding to the next character in value.
	 *     3)If it is neither the escape character nor a '_', then check if
	 *       it is a '%'. If not, then go back to step C). If yes, then check
	 *       if we have reached the end of the pattern. If so, we can simply
	 *       return TRUE from this method. I have a question, Q1 (written
	 *       below), at this point. If the check questioned in Q1 is not
	 *       satisfied and we have not reached the end of the pattern, then
	 *       check if the rest of the characters in the pattern are all '%'.
	 *       If yes, then we can simply return TRUE from this method. I have
	 *       question Q2 at this point.
	 *       Q1)I copied the code from the old method implementation, which at
	 *          this point checks whether we have reached the end of
	 *          valueIterator and, if so, returns TRUE. I think that is
	 *          incorrect because, although we have reached the end of
	 *          valueIterator, there might be more characters in the pattern
	 *          that we have not matched yet.
	 *       Q2)What would be the best way to implement the logic to handle
	 *          valueIterator for a % found in the pattern?
	 *   g)Go back to step C).
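
As one possible way to approach Q2, the sketch below handles '%' by backtracking: it tries each character offset in the value as the place where the rest of the pattern starts, repositioning the value iterator with CollationElementIterator.setOffset(). This is only an illustration under stated assumptions, not the Derby implementation: CollationLikeSketch, like and match are hypothetical names, the escape character is left out, raw collation elements are compared without regard to collator strength or ignorable elements, and expansions or contractions that straddle a metacharacter boundary are not handled.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

final class CollationLikeSketch {
    /** Hypothetical entry point: LIKE with '%' and '_' only, no escape. */
    static boolean like(String value, String pattern, RuleBasedCollator collator) {
        CollationElementIterator valueIterator =
            collator.getCollationElementIterator(value);
        return match(value, valueIterator, 0, pattern, 0, collator);
    }

    private static boolean match(String value, CollationElementIterator valueIterator,
                                 int valueOffset, String pattern, int p,
                                 RuleBasedCollator collator) {
        valueIterator.setOffset(valueOffset);
        while (p < pattern.length()) {
            char c = pattern.charAt(p);
            if (c == '%') {
                // Collapse consecutive '%'; a trailing '%' matches anything.
                while (p < pattern.length() && pattern.charAt(p) == '%') {
                    p++;
                }
                if (p == pattern.length()) {
                    return true;
                }
                // Backtracking: try every character boundary in the value as
                // the start of the remaining pattern (worst case is slow).
                int o = valueIterator.getOffset();
                while (o <= value.length()) {
                    if (match(value, valueIterator, o, pattern, p, collator)) {
                        return true;
                    }
                    o += (o < value.length())
                        ? Character.charCount(value.codePointAt(o)) : 1;
                }
                return false;
            } else if (c == '_') {
                // Skip one character of the value, and with it all of its
                // collation elements, by moving the iterator's text offset.
                int o = valueIterator.getOffset();
                if (o >= value.length()) {
                    return false;
                }
                valueIterator.setOffset(o + Character.charCount(value.codePointAt(o)));
                p++;
            } else {
                // Literal run: compare collation elements up to the next
                // metacharacter or the end of the pattern.
                int runEnd = p;
                while (runEnd < pattern.length() && pattern.charAt(runEnd) != '%'
                        && pattern.charAt(runEnd) != '_') {
                    runEnd++;
                }
                CollationElementIterator runIterator =
                    collator.getCollationElementIterator(pattern.substring(p, runEnd));
                int expected;
                while ((expected = runIterator.next()) != CollationElementIterator.NULLORDER) {
                    if (valueIterator.next() != expected) {
                        return false;
                    }
                }
                p = runEnd;
            }
        }
        // Pattern exhausted: it is a match only if the value is exhausted too.
        return valueIterator.next() == CollationElementIterator.NULLORDER;
    }

    public static void main(String[] args) {
        // Assumption: the JDK returns a RuleBasedCollator for this locale.
        RuleBasedCollator collator =
            (RuleBasedCollator) Collator.getInstance(Locale.US);
        System.out.println(like("abc", "a%c", collator));
        System.out.println(like("\uFA2D", "_", collator));
    }
}
```

Regarding Q1, note that this sketch never returns TRUE merely because the value iterator is exhausted at a '%'; it returns TRUE there only when nothing but '%' remains in the pattern, and otherwise lets the rest of the pattern fail against the empty remainder.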

> Single character does not match high value unicode character with collation TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>                 Key: DERBY-2967
>                 URL: https://issues.apache.org/jira/browse/DERBY-2967
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions:
>            Reporter: Kathey Marsden
>            Assignee: Mamta A. Satoor
>         Attachments: step1_iteratorbased_Sep1507_diff.txt, step1_iteratorbased_Sep1507_stat.txt,
temp_diff.txt, temp_stat.txt, TestFrench.java, TestNorway.java
> With TERRITORY_BASED collation, '_' does not match the character \uFA2D. It is the same
> for English or Norwegian. For collation UCS_BASIC it matches fine. Could you tell me if this
> is a bug?
> Here is a program to reproduce.
> import java.sql.*;
>
> public class HighCharacter {
>     public static void main(String args[]) throws Exception {
>         System.out.println("\n Territory no_NO");
>         Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>         Connection conn = DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>         System.out.println("\n Territory en_US");
>         conn = DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>         System.out.println("\n Collation UCS_BASIC");
>         conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>         testLikeWithHighestValidCharacter(conn);
>     }
>
>     public static void testLikeWithHighestValidCharacter(Connection conn) throws SQLException {
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("drop table t1");
>         } catch (SQLException se) {
>             // drop failure ok.
>         }
>         stmt.executeUpdate("create table t1(c11 int)");
>         stmt.executeUpdate("insert into t1 values 1");
>         // \uFA2D - the highest valid character according to
>         // Character.isDefined() of JDK 1.4;
>         PreparedStatement ps =
>             conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>         String[] match = { "%", "_", "\uFA2D" };
>         for (int i = 0; i < match.length; i++) {
>             System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>             ps.setString(1, match[i]);
>             ResultSet rs = ps.executeQuery();
>             if (rs.next() && rs.getString(1).equals("1"))
>                 System.out.println("PASS");
>             else
>                 System.out.println("FAIL: no match");
>             rs.close();
>         }
>     }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
