db-derby-dev mailing list archives

From "Kathey Marsden (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-2967) Single character does not match high value unicode character with collation TERRITORY_BASED
Date Wed, 08 Aug 2007 20:58:59 GMT

    [ https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518548 ]

Kathey Marsden commented on DERBY-2967:
---------------------------------------

Dan said:

>As for implementing it, I think one has to use the getOffset/setOffset methods on CollationElementIterator.
>E.g. along the lines of this to skip a character. The real solution would be more than
>this but you get the idea.

>   if (patternChar == '_')
>       iterator.setOffset(iterator.getOffset() + 1);

So for the Norwegian aa, which is one collation element but two characters, I think this code
would set us back to the same offset where we started, since getOffset always returns the
first character of the collation element.  That would, I think, leave us in an unpleasant loop.
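
To make the "one collation element but two characters" point concrete, here is a minimal sketch that counts collation elements with java.text.CollationElementIterator. It does not use the real Norwegian tailoring: the rule string "< a < b < aa" is a hypothetical stand-in that merely declares "aa" as a contraction, and the class and method names (ContractionDemo, contractionCollator, countElements) are made up for illustration.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class ContractionDemo {

    // Hypothetical rules, NOT the Norwegian tailoring: they only declare
    // "aa" as a contracting sequence that sorts after 'b'.
    public static RuleBasedCollator contractionCollator() {
        try {
            return new RuleBasedCollator("< a < b < aa");
        } catch (ParseException e) {
            throw new IllegalStateException("bad collation rules", e);
        }
    }

    // Counts the collation elements the collator produces for the text.
    public static int countElements(RuleBasedCollator collator, String text) {
        CollationElementIterator it = collator.getCollationElementIterator(text);
        int count = 0;
        while (it.next() != CollationElementIterator.NULLORDER) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = contractionCollator();
        // Compare the character count with the collation element count.
        for (String s : new String[] {"aa", "ab"}) {
            System.out.println(s + ": length=" + s.length()
                    + ", collation elements=" + countElements(collator, s));
        }
    }
}
```

With these assumed rules, "aa" has two characters but a single collation element, so any logic that advances by one character offset per '_' has to account for landing in the middle of an element.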
 

Attaching TestNorway.java showing current behavior for Norwegian aa.

default strength:TERTIARY
default decomposition:NO_DECOMPOSITION
aa.length()2
jdbc:derby:nordb
1 rows matching SELECT COUNT(*) FROM T WHERE VC = aa
1 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE aa
0 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE a_
1 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE _
0 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE __
jdbc:derby:regdb
1 rows matching SELECT COUNT(*) FROM T WHERE VC = aa
1 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE aa
1 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE a_
0 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE _
1 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE __

I think the correct results for nordb should be:
1 rows matching SELECT COUNT(*) FROM T WHERE VC = aa
1 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE aa
1 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE a_
0 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE _
1 rows matching SELECT COUNT(*) FROM T WHERE VC LIKE __

I am going to try to look at another database product to compare behavior so I can better
understand what needs to be implemented, because I am still a bit fuzzy on all this.



> Single character does not match high value unicode character with collation TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2967
>                 URL: https://issues.apache.org/jira/browse/DERBY-2967
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.0.0
>            Reporter: Kathey Marsden
>            Assignee: Kathey Marsden
>         Attachments: TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation, '_' does not match the character \uFA2D.  It is the same
> for English or Norwegian. For collation UCS_BASIC it matches fine.  Could you tell me if this
> is a bug?
> Here is a program to reproduce.
> import java.sql.*;
> public class HighCharacter {
>    public static void main(String args[]) throws Exception
>    {
>    System.out.println("\n Territory no_NO");
>    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>    Connection conn = DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Territory en_US");
>    conn = DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Collation UCS_BASIC");
>    conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>    testLikeWithHighestValidCharacter(conn);
>    }
> public static void testLikeWithHighestValidCharacter(Connection conn) throws SQLException {
>    Statement stmt = conn.createStatement();
>    try {
>    stmt.executeUpdate("drop table t1");
>    }catch (SQLException se)
>    {// drop failure ok.
>    }
>    stmt.executeUpdate("create table t1(c11 int)");
>    stmt.executeUpdate("insert into t1 values 1");
>  
>    // \uFA2D - the highest valid character according to
>    // Character.isDefined() of JDK 1.4;
>    PreparedStatement ps =
>    conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>      String[] match = { "%", "_", "\uFA2D" };
>    for (int i = 0; i < match.length; i++) {
>    System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>    ps.setString(1, match[i]);
>    ResultSet rs = ps.executeQuery();
>    if( rs.next() && rs.getString(1).equals("1"))
>        System.out.println("PASS");
>    else
>        System.out.println("FAIL: no match");
>    rs.close();
>    }
>   }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.