db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-6227) Distinct aggregates don't work well with territory-based collation
Date Wed, 18 Jun 2014 13:43:02 GMT

     [ https://issues.apache.org/jira/browse/DERBY-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-6227:

    Attachment: d6227-1a.diff

I think the fix for this bug is to make the list of distinct values hold DataValueDescriptors
instead of java.lang.String objects, since the DataValueDescriptor classes implement equals()
in a way that takes the database collation into consideration.
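To illustrate the difference between the two kinds of equality, here is a minimal sketch that does not use any Derby code. It assumes that java.text.Collator with PRIMARY strength for en_US is a reasonable stand-in for the database collation, and that CollationKey.equals() plays the role that DataValueDescriptor.equals() plays in the engine. The class and method names are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DistinctCountSketch {

    // Counts distinct values using plain String.equals(), which ignores collation.
    static int countDistinctByString(List<String> values) {
        return new HashSet<>(values).size();
    }

    // Counts distinct values using collation keys, so that values the
    // collator considers equal collapse into a single entry.
    static int countDistinctByCollation(List<String> values, Collator collator) {
        Set<CollationKey> seen = new HashSet<>();
        for (String v : values) {
            seen.add(collator.getCollationKey(v));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Assumption: approximates territory=en_US;collation=TERRITORY_BASED:PRIMARY.
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);

        // The values of column S for group I = 3 in the example from the issue.
        List<String> group3 = Arrays.asList("a", "A", "b", "B", "c");

        System.out.println(countDistinctByString(group3));               // prints 5
        System.out.println(countDistinctByCollation(group3, collator));  // prints 3
    }
}
```

The String-based count matches the wrong result reported in the issue (5 for group 3), and the collation-aware count matches the expected result (3).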

The attached patch [^d6227-1a.diff] changes GroupedAggregateResultSet so that it does that.
All regression tests ran cleanly with that patch, including a new test case for this bug.

I wasn't quite sure if it was safe to store the DVDs directly in the list of distinct values,
or if they would need to be cloned first (because some result sets reuse old DVDs when reading
new rows, and we don't want the values in the list to change). I concluded that it was safe
to store them directly, because distinct aggregates always take their rows from a sorter and
not directly from the result set, and the sorter returns clones of the original DVDs. I added
an assert that verifies that the row is read from the sorter, and a comment that says we should
check ResultSet.needsToClone() if we ever start reading directly from the result set when
processing distinct aggregates.
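The cloning concern can be sketched in isolation. The example below is not Derby code: MutableValue is a hypothetical stand-in for a value holder that a result set reuses from row to row, and copy() stands in for the cloning that the sorter is described as doing. It only shows why storing a reused holder directly would corrupt the list, and why storing clones is safe.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedValueSketch {

    // Hypothetical mutable value holder; not Derby's DataValueDescriptor.
    static final class MutableValue {
        private String value;

        MutableValue(String value) { this.value = value; }

        void setValue(String value) { this.value = value; }

        String getValue() { return value; }

        MutableValue copy() { return new MutableValue(value); }
    }

    // Reads the rows through a single reused holder and stores either the
    // holder itself or a copy of it, then returns what the list ends up holding.
    static List<String> collect(String[] rows, boolean cloneBeforeStoring) {
        MutableValue reused = new MutableValue(null);
        List<MutableValue> stored = new ArrayList<>();
        for (String row : rows) {
            reused.setValue(row); // the same object is overwritten for every row
            stored.add(cloneBeforeStoring ? reused.copy() : reused);
        }
        List<String> result = new ArrayList<>();
        for (MutableValue v : stored) {
            result.add(v.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] rows = {"a", "b", "c"};
        System.out.println(collect(rows, false)); // prints [c, c, c]
        System.out.println(collect(rows, true));  // prints [a, b, c]
    }
}
```

Without the copy, every list entry aliases the same holder and ends up showing the last row read, which is the failure mode that a needsToClone() check would guard against.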

> Distinct aggregates don't work well with territory-based collation
> ------------------------------------------------------------------
>                 Key: DERBY-6227
>                 URL: https://issues.apache.org/jira/browse/DERBY-6227
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions:,,,,,,,
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>              Labels: derby_triage10_11
>         Attachments: d6227-1a.diff
> When working on DERBY-5840, I noticed that GroupedAggregateResultSet would do duplicate
> elimination by comparing the java.lang.String representation of the values. With territory-based
> collation, it is possible that two values that have different java.lang.String representation
> should be considered duplicates, and this logic will produce incorrect results.
> Example:
> ij version 10.10
> ij> connect 'jdbc:derby:memory:db;territory=en_US;collation=TERRITORY_BASED:PRIMARY;create=true';
> ij> create table t(i int, s varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values (1, 'a'), (1, 'a'), (2, 'b'), (2, 'B'), (3, 'a'), (3, 'A'), (3, 'b'), (3, 'B'), (3, 'c');
> 9 rows inserted/updated/deleted
> ij> select distinct s from t;
> S         
> ----------
> b         
> a         
> c         
> 3 rows selected
> ij> select i, count(distinct s) from t group by i;
> I          |2          
> -----------------------
> 1          |1          
> 2          |2          
> 3          |5          
> 3 rows selected
> I would have expected the last query to return
> (1, 1)
> (2, 1)
> (3, 3)

This message was sent by Atlassian JIRA
