db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation
Date Tue, 16 Dec 2008 22:54:45 GMT

     [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Knut Anders Hatlen updated DERBY-3975:
--------------------------------------

    Attachment: derby-3975-1a.stat
                derby-3975-1a.diff

The attached patch attempts to fix the problem by implementing a hashCode() method in CollatorSQLChar,
CollatorSQLVarchar, CollatorSQLLongvarchar and CollatorSQLClob based on CollationKey.hashCode().
It also extends CollationTest.compareAgrave() with a test case for SELECT DISTINCT, and makes
it test both CHAR and VARCHAR (previously it only tested VARCHAR). CollationTest fails without
the fix and passes with the fix. The test is based on the fact that in the French locale,
À (Unicode code point 00C0) is the same as À ('A' + Unicode code point 0300), whereas they
are different in UCS_BASIC.

I will start the regression tests now.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I
tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an
example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the
duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates
were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message