db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation
Date Wed, 17 Dec 2008 13:07:44 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657373#action_12657373
] 

Knut Anders Hatlen commented on DERBY-3975:
-------------------------------------------

One thing I forgot to mention in the description of the patch:

The patch changes the hashCode() implementation in CollatorSQL{Char,Varchar,Longvarchar,Clob},
but it only tests CHAR and VARCHAR. That's because LONG VARCHAR and CLOB columns cannot be
used in distinct queries without casting them to another data type first, and since you cannot
compare two such columns with =, you cannot perform a hash join to test it either. I'm not
aware of any other code than distinct scans and hash scans that will ever call the hashCode()
methods of these objects, so I don't know of any way to test those two data types.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I
tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an
example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the
duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates
were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message