db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5959) Territory-based collation is not robust against changes in the collation rules
Date Mon, 22 Oct 2012 15:02:12 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481411#comment-13481411 ]

Knut Anders Hatlen commented on DERBY-5959:
-------------------------------------------

Example where this would lead to inconsistencies:

Create a database using Java 5:

connect 'jdbc:derby:tr-db;territory=tr_TR;collation=TERRITORY_BASED;create=true';
create table t(c char(2));
insert into t values 'ıa', 'Ia', 'ia', 'İa', 'ıb', 'Ib', 'ib', 'İb';
create unique index idx on t(c);

Then update the database using Java 6 or later:

connect 'jdbc:derby:tr-db';
insert into t values 'ıb';
select * from t;

The result of the SELECT statement is:

ij> select * from t;
C   
----
ıa  
Ia  
ıb  
ia  
İa  
ıb  
Ib  
ib  
İb  

9 rows selected

The value 'ıb' appears twice, even though there is a unique index on the column.
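
To check which ordering a given JVM applies before opening such a database, a small stand-alone probe along the following lines may help. It is only a sketch: the class name CollationProbe and the method sortWith are made up for illustration, and it does not go through Derby at all. It only asks the JVM for the tr_TR collator via Collator.getInstance, which is the call the issue description says Derby relies on, and sorts the eight values from the example. The strings use Unicode escapes so that the result does not depend on the source file encoding.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Derby: prints how the running JVM
// orders the values from the example under the tr_TR collation rules.
final class CollationProbe {

    // \u0131 is the dotless lower-case i, \u0130 is the dotted upper-case I.
    static final List<String> VALUES = Arrays.asList(
            "\u0131a", "Ia", "ia", "\u0130a",
            "\u0131b", "Ib", "ib", "\u0130b");

    // Returns a sorted copy of the values, using the JVM's collator
    // for the given locale. The input list is left untouched.
    static List<String> sortWith(Locale locale, List<String> values) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<String>(values);
        Collections.sort(sorted, collator);
        return sorted;
    }

    public static void main(String[] args) {
        Locale turkish = new Locale("tr", "TR");
        System.out.println("java.version = "
                + System.getProperty("java.version"));
        for (String s : sortWith(turkish, VALUES)) {
            System.out.println(s);
        }
    }
}
```

If the order printed on two JVMs differs, an index built under one of them cannot be assumed to be correctly sorted under the other, which is presumably how the duplicate 'ıb' slips past the unique index above.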
                
> Territory-based collation is not robust against changes in the collation rules
> ------------------------------------------------------------------------------
>
>                 Key: DERBY-5959
>                 URL: https://issues.apache.org/jira/browse/DERBY-5959
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.10.0.0
>            Reporter: Knut Anders Hatlen
>
> When accessing a database with territory-based collation, Derby will use the collation
> rules of the collator returned by Collator.getInstance(databaseLocale). However, there is
> no guarantee that those rules are consistent across different JVM vendors and versions. This
> means that the ordering could vary, and inconsistencies could sneak into the indexes.
> One example is that Oracle's JDK changed the collation rules for Turkish between Java
> 5 and Java 6, so if you run the following script
> connect 'jdbc:derby:memory:db;territory=tr_TR;collation=TERRITORY_BASED;create=true';
> create table t(c char(2));
> insert into t values 'ıa', 'Ia', 'ia', 'İa', 'ıb', 'Ib', 'ib', 'İb';
> select * from t order by c;
> you'll get different results on Java 5 and on Java 6 and later.
> Java 5 will order the results like this:
> ij> select * from t order by c;
> C   
> ----
> ıa  
> Ia  
> ia  
> İa  
> ıb  
> Ib  
> ib  
> İb  
> 8 rows selected
> Java 6 and later order them like this:
> ij> select * from t order by c;
> C   
> ----
> ıa  
> Ia  
> ıb  
> Ib  
> ia  
> İa  
> ib  
> İb  
> 8 rows selected

