db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <derby-...@db.apache.org>
Subject [jira] Commented: (DERBY-504) SELECT DISTINCT returns duplicates when selecting from subselects
Date Mon, 05 Sep 2005 09:27:32 GMT
    [ http://issues.apache.org/jira/browse/DERBY-504?page=comments#action_12322650 ] 

Knut Anders Hatlen commented on DERBY-504:

It is true that the change in SelectNode does redundant work, but not because
(resultColumns.countNumberOfSimpleColumnReferences() == resultColumns.size())
guarantees that all result columns are simple columns (if by "simple column" we
mean a column in a base table, with no aggregates etc.). For example, in the query
'SELECT a FROM (SELECT AVG(age) AS a FROM names) AS n',
resultColumns.countNumberOfSimpleColumnReferences() equals resultColumns.size(),
but the result column is not simple. The redundancy is the other way around: if
(but not only if) all columns are simple, then
(countNumberOfSimpleColumnReferences() == size()) is true.
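
To make the direction of the implication concrete, here is a minimal Java sketch. It is a toy model only: the class SimpleColumnSketch, its ResultColumn type and the two flags are hypothetical and are not Derby's ResultColumnList code. It assumes the count covers every result column that is syntactically a bare column reference, whatever that reference resolves to.

```java
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Derby's
// ResultColumn/ResultColumnList implementation.
final class SimpleColumnSketch {

    /** One column in a SELECT list. */
    static final class ResultColumn {
        final String name;
        // True if the expression is syntactically a bare column reference.
        final boolean isColumnReference;
        // True if that reference resolves to a column of a base table
        // (and not, for example, to an aggregate computed in a subselect).
        final boolean resolvesToBaseColumn;

        ResultColumn(String name, boolean isColumnReference, boolean resolvesToBaseColumn) {
            this.name = name;
            this.isColumnReference = isColumnReference;
            this.resolvesToBaseColumn = resolvesToBaseColumn;
        }
    }

    /** Models the check (countNumberOfSimpleColumnReferences() == size()). */
    static boolean countEqualsSize(List<ResultColumn> columns) {
        long refs = columns.stream().filter(c -> c.isColumnReference).count();
        return refs == columns.size();
    }

    /** Models "all result columns are simple" in the base-table sense. */
    static boolean allSimple(List<ResultColumn> columns) {
        return columns.stream()
                .allMatch(c -> c.isColumnReference && c.resolvesToBaseColumn);
    }

    public static void main(String[] args) {
        // Outer SELECT list of:
        //   SELECT a FROM (SELECT AVG(age) AS a FROM names) AS n
        List<ResultColumn> overAggregate =
                List.of(new ResultColumn("A", true, false));
        // SELECT list of: SELECT name FROM names
        List<ResultColumn> overBaseTable =
                List.of(new ResultColumn("NAME", true, true));

        // prints "true false": the count check passes, yet the column is not simple
        System.out.println(countEqualsSize(overAggregate) + " " + allSimple(overAggregate));
        // prints "true true": all columns simple implies the count check passes
        System.out.println(countEqualsSize(overBaseTable) + " " + allSimple(overBaseTable));
    }
}
```

In this model allSimple() can only be true when countEqualsSize() is also true, while the reverse does not hold.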

I can submit a patch which removes this redundant check.
ResultColumnList.countNumberOfSimpleColumnReferences() does not seem to be used
anywhere else in the code. If I remove the call to
countNumberOfSimpleColumnReferences() and the method is not used anywhere else,
should I also remove its definition to make the code cleaner, or should I leave
the method in place in case it is needed in the future?

> SELECT DISTINCT returns duplicates when selecting from subselects
> -----------------------------------------------------------------
>          Key: DERBY-504
>          URL: http://issues.apache.org/jira/browse/DERBY-504
>      Project: Derby
>         Type: Bug
>   Components: SQL
>     Versions:
>  Environment: Latest development sources (SVN revision 232227), Sun JDK 1.5, Solaris/x86
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen
>     Priority: Minor
>  Attachments: DERBY-504.diff, DERBY-504.stat, DERBY-504_b.diff, DERBY-504_b.stat, DERBY-504_c-CRLF.diff, DERBY-504_c-CRLF.diff, DERBY-504_c.diff, DERBY-504_c.stat
> When one performs a SELECT DISTINCT on a table generated by a subselect, the result sometimes contains duplicates. The following example shows the problem:
> ij> CREATE TABLE names (id INT PRIMARY KEY, name VARCHAR(10));
> 0 rows inserted/updated/deleted
> ij> INSERT INTO names (id, name) VALUES
>        (1, 'Anna'), (2, 'Ben'), (3, 'Carl'),
>        (4, 'Carl'), (5, 'Ben'), (6, 'Anna');
> 6 rows inserted/updated/deleted
> ij> SELECT DISTINCT(name) FROM (SELECT name, id FROM names) AS n;
> NAME      
> ----------
> Anna      
> Ben       
> Carl      
> Carl      
> Ben       
> Anna      
> Six names are returned, although only three names should have been returned.
> When the result is explicitly sorted (using ORDER BY) or the id column is removed from the subselect, the query returns three names as expected:
> ij> SELECT DISTINCT(name) FROM (SELECT name, id FROM names) AS n ORDER BY name;
> NAME      
> ----------
> Anna      
> Ben       
> Carl      
> 3 rows selected
> ij> SELECT DISTINCT(name) FROM (SELECT name FROM names) AS n;
> NAME      
> ----------
> Anna      
> Ben       
> Carl      
> 3 rows selected
