db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DERBY-5959) Territory-based collation is not robust against changes in the collation rules
Date Tue, 23 Oct 2012 15:47:14 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482363#comment-13482363 ]

Knut Anders Hatlen edited comment on DERBY-5959 at 10/23/12 3:46 PM:
---------------------------------------------------------------------

Java 8 fixes a bug in the Thai locale ( http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6755060 ),
and that fix causes a similar issue when a database with territory-based collation is moved
to the newer JVM.

For example, create a database on Java 7:

connect 'jdbc:derby:thaidb;territory=th;collation=TERRITORY_BASED;create=true';
create table t(x int, c char(1) unique not null);
insert into t values (1, '๎'), (2, '์');

(The character in row 1 is \u0e4e, and the one in row 2 is \u0e4c.)

Then update the database on Java 8, which orders these characters differently:

connect 'jdbc:derby:thaidb';
insert into t values (3, '๎');

(The character is \u0e4e.)

The table contents now are:

ij> select * from t;
X          |C
-------------
1          |๎
2          |์
3          |๎

3 rows selected

The value of C is identical in row 1 and row 3, even though there is a UNIQUE constraint on
the column.
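
The following is a minimal Java sketch, not part of the original report, for checking how
the running JVM orders the two characters used above. It assumes that the language tag "th"
matches the database's territory=th setting and that the default collator strength applies.
The class name ThaiCollationProbe and the helper compareSign are hypothetical names chosen
for illustration.

```java
import java.text.Collator;
import java.util.Locale;

public class ThaiCollationProbe {
    // Returns -1, 0 or 1: the sign of comparing a and b with the JVM's
    // default collator for the given language tag (for example "th").
    static int compareSign(String languageTag, String a, String b) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(languageTag));
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        String first = "\u0e4e";  // U+0E4E, the character in rows 1 and 3
        String second = "\u0e4c"; // U+0E4C, the character in row 2

        // Print the JVM version next to the result so that runs on
        // different runtimes can be compared side by side.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("sign of compare(U+0E4E, U+0E4C) = "
                + compareSign("th", first, second));
    }
}
```

Running it on both JVMs and comparing the printed sign shows whether the two runtimes
disagree on the relative order of the characters. No particular output is assumed here.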

[Comment edited: Added spaces around URL to prevent JIRA from garbling it.]
                
                  
> Territory-based collation is not robust against changes in the collation rules
> ------------------------------------------------------------------------------
>
>                 Key: DERBY-5959
>                 URL: https://issues.apache.org/jira/browse/DERBY-5959
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.10.0.0
>            Reporter: Knut Anders Hatlen
>
> When accessing a database with territory-based collation, Derby will use the collation
> rules of the collator returned by Collator.getInstance(databaseLocale). However, there is
> no guarantee that those rules are consistent across different JVM vendors and versions. This
> means that the ordering could vary, and inconsistencies could sneak into the indexes.
> One example is that Oracle's JDK changed the collation rules for Turkish between Java
> 5 and Java 6, so if you run the following script
> connect 'jdbc:derby:memory:db;territory=tr_TR;collation=TERRITORY_BASED;create=true';
> create table t(c char(2));
> insert into t values 'ıa', 'Ia', 'ia', 'İa', 'ıb', 'Ib', 'ib', 'İb';
> select * from t order by c;
> you'll get different results on Java 5 and on Java 6 and later.
> Java 5 will order the results like this:
> ij> select * from t order by c;
> C   
> ----
> ıa  
> Ia  
> ia  
> İa  
> ıb  
> Ib  
> ib  
> İb  
> 8 rows selected
> Java 6 and later order them like this:
> ij> select * from t order by c;
> C   
> ----
> ıa  
> Ia  
> ıb  
> Ib  
> ia  
> İa  
> ib  
> İb  
> 8 rows selected
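
The Turkish example can also be reproduced outside Derby. The following is a minimal Java
sketch, not taken from the issue, that sorts the same eight strings with the collator the
running JVM returns for the language tag "tr-TR", which is assumed here to correspond to
territory=tr_TR. The class name TurkishSortProbe and the helper sortWithTerritory are
hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class TurkishSortProbe {
    // Returns a sorted copy of the values, ordered by the JVM's default
    // collator for the given language tag. The input array is not modified.
    static String[] sortWithTerritory(String languageTag, String... values) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(languageTag));
        String[] sorted = values.clone();
        Arrays.sort(sorted, collator);
        return sorted;
    }

    public static void main(String[] args) {
        // The same values as in the script: dotless i is U+0131,
        // capital I with dot above is U+0130.
        String[] rows = {
            "\u0131a", "Ia", "ia", "\u0130a",
            "\u0131b", "Ib", "ib", "\u0130b"
        };
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (String row : sortWithTerritory("tr-TR", rows)) {
            System.out.println(row);
        }
    }
}
```

If the printed order differs between two JVMs, an index built under one of them may no
longer be consistent with the ordering the other one applies.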
