db-derby-dev mailing list archives

From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-2967) Single character does not match high value unicode character with collation TERRITORY_BASED
Date Mon, 22 Oct 2007 18:01:04 GMT

    [ https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536768 ]

Mamta A. Satoor commented on DERBY-2967:
----------------------------------------

Knut, yes, I did mean the SQL = operation. Also, thanks for your testing.

From various past discussions on the Derby list about aa and å in Norwegian, I had assumed
that the JVM's collation table for Norwegian must have the same collation element for aa and å.
But that is not the case, as shown by your test case inside ij. I also wrote a very simple
test case outside of Derby (copied below) which shows that the collation elements for aa and å
are different in Norwegian, and that is why the SQL operation 'aa'='å' returns false.

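import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class CollationElementTest {
    public static void main(String[] args) {
        // Collator for the da_DK locale, as used in the original test
        RuleBasedCollator myCollator = (RuleBasedCollator) Collator.getInstance(new Locale("da", "DK"));

        // Walk the collation elements of the two-character string "aa"
        System.out.println("what happens if iterator is on aa string");
        CollationElementIterator aIterator = myCollator.getCollationElementIterator("aa");
        System.out.println("next is " + aIterator.next());
        System.out.println("offset is " + aIterator.getOffset());
        System.out.println("next is " + aIterator.next());
        System.out.println("offset is " + aIterator.getOffset());

        // Walk the collation elements of the single-character string "å"
        System.out.println("what happens if iterator is on å string");
        aIterator = myCollator.getCollationElementIterator("å");
        System.out.println("next is " + aIterator.next());
        System.out.println("offset is " + aIterator.getOffset());
        System.out.println("next is " + aIterator.next());
        System.out.println("offset is " + aIterator.getOffset());
    }
}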

Output of the code above
what happens if iterator is on aa string
next is 7405570
offset is 2
next is -1
offset is 2
what happens if iterator is on å string
next is 7405568
offset is 1
next is -1
offset is 1

So, my example to show different behavior of SQL LIKE and SQL = is not correct. 
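
The two element values printed above can be split into their primary, secondary, and tertiary parts with the static helpers on java.text.CollationElementIterator. The sketch below only decodes the integers reported in this message; the class and method names (DecodeCollationElements, describe) are made up for illustration, and it does not query any collator or Derby itself.

```java
import java.text.CollationElementIterator;

class DecodeCollationElements {
    // Split a collation element, as returned by CollationElementIterator.next(),
    // into its three documented components.
    static String describe(int element) {
        return "primary=" + CollationElementIterator.primaryOrder(element)
            + " secondary=" + CollationElementIterator.secondaryOrder(element)
            + " tertiary=" + CollationElementIterator.tertiaryOrder(element);
    }

    public static void main(String[] args) {
        // Values copied from the output quoted in this message.
        System.out.println("aa     -> " + describe(7405570));
        System.out.println("\u00e5      -> " + describe(7405568));
    }
}
```

For these two values the primary and secondary parts agree (113 and 0) and only the tertiary part differs (2 versus 0), so the mismatch reported by the test is a tertiary-level difference. Whether that matters for a given comparison depends on the strength of the collator in use, which is not examined here.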

I am wondering if anyone knows of any characters in a language where the characters are different
but have the same collation elements in that language. The test case is going to require
a different *number* of characters on each side of =. Having a different *number* of characters (but
the same collation element(s)) is crucial to show the difference between = and LIKE.
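
One way to screen candidate pairs outside of Derby is to collect the collation elements of both strings and compare the lists. The sketch below is only a JDK-level check under stated assumptions: the class and method names (CollationCandidateCheck, elements, decomposingCollator) are hypothetical, the en_US locale is an arbitrary choice, and the collator is explicitly switched to CANONICAL_DECOMPOSITION, which may not be the setting Derby uses for TERRITORY_BASED collation. The sample pair is a precomposed é (one char) against e followed by a combining acute accent (two chars).

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CollationCandidateCheck {
    // Collect every collation element the collator produces for the string.
    static List<Integer> elements(RuleBasedCollator collator, String s) {
        List<Integer> result = new ArrayList<>();
        CollationElementIterator it = collator.getCollationElementIterator(s);
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
            result.add(e);
        }
        return result;
    }

    // Assumption: canonical decomposition is enabled explicitly here;
    // this is not necessarily how Derby configures its collator.
    static RuleBasedCollator decomposingCollator(Locale locale) {
        RuleBasedCollator c = (RuleBasedCollator) Collator.getInstance(locale);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return c;
    }

    public static void main(String[] args) {
        RuleBasedCollator c = decomposingCollator(Locale.US);
        String precomposed = "\u00e9";   // single char
        String decomposed = "e\u0301";   // base letter + combining acute accent
        System.out.println("lengths:  " + precomposed.length() + " vs " + decomposed.length());
        System.out.println("elements: " + elements(c, precomposed) + " vs " + elements(c, decomposed));
        System.out.println("compare:  " + c.compare(precomposed, decomposed));
    }
}
```

If a pair like this also produces identical elements under the collator Derby actually creates, it would give the different-length, same-elements case needed to contrast = with LIKE; that part still has to be confirmed inside ij.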

> Single character does not match high value unicode character with collation TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2967
>                 URL: https://issues.apache.org/jira/browse/DERBY-2967
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.0.0
>            Reporter: Kathey Marsden
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY2967_Oct11_07_diff.txt, DERBY2967_Oct11_07_stat.txt, DERBY2967_offset_based_diff_Oct02_07.txt,
>                      DERBY2967_offset_based_stat_Oct02_07.txt, fullcoll.out, patch2_setOffset_fullcoll.out, patch2_with_setOffset_diff_Sep2007.txt,
>                      patch2_with_setOffset_stat_Sep2007.txt, step1_iteratorbased_Sep1507_diff.txt, step1_iteratorbased_Sep1507_stat.txt,
>                      temp_diff.txt, temp_stat.txt, TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation '_' does not match the character \uFA2D. It is the same
> for English or Norwegian. For collation UCS_BASIC it matches fine. Could you tell me if this
> is a bug?
> Here is a program to reproduce.
> import java.sql.*;
> public class HighCharacter {
>    public static void main(String args[]) throws Exception
>    {
>    System.out.println("\n Territory no_NO");
>    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>    Connection conn = DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Territory en_US");
>    conn = DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Collation UCS_BASIC");
>    conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>    testLikeWithHighestValidCharacter(conn);
>    }
> public static void testLikeWithHighestValidCharacter(Connection conn) throws SQLException {
>    Statement stmt = conn.createStatement();
>    try {
>    stmt.executeUpdate("drop table t1");
>    }catch (SQLException se)
>    {// drop failure ok.
>    }
>    stmt.executeUpdate("create table t1(c11 int)");
>    stmt.executeUpdate("insert into t1 values 1");
>  
>    // \uFA2D - the highest valid character according to
>    // Character.isDefined() of JDK 1.4;
>    PreparedStatement ps =
>    conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>      String[] match = { "%", "_", "\uFA2D" };
>    for (int i = 0; i < match.length; i++) {
>    System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>    ps.setString(1, match[i]);
>    ResultSet rs = ps.executeQuery();
>    if( rs.next() && rs.getString(1).equals("1"))
>        System.out.println("PASS");
>    else
>        System.out.println("FAIL: no match");
>    rs.close();
>    }
>   }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

