db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation
Date Mon, 08 Dec 2008 17:16:46 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654469#action_12654469
] 

Knut Anders Hatlen commented on DERBY-3975:
-------------------------------------------

Just a guess, but I think the problem might be that o.a.d.iapi.types.CollatorSQLVarchar doesn't
override SQLChar.hashCode().

The reason why the abba/baab duplicate was removed, is that SQLChar.hashCode() just adds up
the char values ignoring the positions, so the two strings have the same hash code because
they contain the exact same characters (though in different order). Not necessary for this
issue, but it would probably be better to calculate with a formula similar to the one described
here <URL:http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode()>. Just
adding the char values will give a poor distribution and the higher bits won't be used unless
the string is very long.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I
tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an
example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the
duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates
were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message