db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bryan Pendleton (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5584) Select statement with subqueries with group by and count distinct statements returns wrong number of results
Date Wed, 25 Jan 2012 03:12:43 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192835#comment-13192835
] 

Bryan Pendleton commented on DERBY-5584:
----------------------------------------

If you remove DISTINCT from the COUNT() aggregates, the problem disappears (from what I can
tell).

I think it's possible that the problem involves this section of GroupedAggregateResultSet:

                        /*
                        ** If there was a distinct aggregate, then that column
                        ** was automatically included as the last column in
                        ** the sort ordering. But we don't want it to be part
                        ** of the ordering anymore, because we aren't grouping
                        ** by that column, we just sorted it so that distinct
                        ** aggregation would see the values in order.
                        */

This part was quite tricky, and I remember struggling with it for a long time.

I think that the solution that is currently implemented in GroupedAggregateResultSet
may be assuming that the result set is only opened and read once.

However, during a query plan such as this cartesian product, the GROUP BY
subquery is created, then opened/read/closed, opened/read/closed, etc.,
once per row of the other side of the cartesian product.

Perhaps what we need to do  is have a better way of handling that extra
invisible column, so that we can consider it sometimes, and ignore it other times,
without doing something as destructive as physically removing it from the
ordering array, which is what we do now.

                
> Select statement with subqueries with group by and count distinct statements returns
wrong number of results
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5584
>                 URL: https://issues.apache.org/jira/browse/DERBY-5584
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Server
>    Affects Versions: 10.7.1.1
>         Environment: Output from sysinfo
> java.specification.name: Java Platform API Specification
> java.specification.version: 1.6
> java.runtime.version: 1.6.0_20-b02
> --------- Derby Information --------
> JRE - JDBC: Java SE 6 - JDBC 4.0
> [/home/piotrz/Desktop/db-derby-10.7.1.1-bin/lib/derby.jar] 10.7.1.1 - (1040133)
> [/home/piotrz/Desktop/db-derby-10.7.1.1-bin/lib/derbytools.jar] 10.7.1.1 - (1040133)
> [/home/piotrz/Desktop/db-derby-10.7.1.1-bin/lib/derbynet.jar] 10.7.1.1 - (1040133)
> [/home/piotrz/Desktop/db-derby-10.7.1.1-bin/lib/derbyclient.jar] 10.7.1.1 - (1040133)
> ------------------------------------------------------
> ----------------- Locale Information -----------------
> Current Locale :  [English/United States [en_US]]
> Found support for locale: [cs]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [de_DE]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [es]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [fr]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [hu]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [it]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [ja_JP]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [ko_KR]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [pl]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [pt_BR]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [ru]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [zh_CN]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [zh_TW]
> 	 version: 10.7.1.1 - (1040133)
>            Reporter: Piotr Zgadzaj
>         Attachments: query.log
>
>
> Steps to reproduce:
> 1. Create database, connect to database with any JDBC client
> 2. create two tables:
> CREATE TABLE TEST_5 (
>        profile_id INTEGER NOT NULL,
>        group_ref INTEGER NOT NULL,
>        matched_count INTEGER NOT NULL
>    );
>    CREATE TABLE TEST_6 (
>        profile_id INTEGER NOT NULL,
>        group_ref INTEGER NOT NULL,
>        matched_count INTEGER NOT NULL
>    );
> 3. Insert two records for each table:
> insert into test_5 values (1, 10000,1);
> insert into test_5 values (2, 10000, 2);
> insert into test_6 values (1, 10000,1);
> insert into test_6 values (2, 10000, 2);
> 4. Run following statement
> SELECT *
> FROM
>  (SELECT ps1.group_ref,
>    COUNT(DISTINCT ps1.matched_count) AS matched_count
>  FROM test_5 ps1
>  GROUP BY ps1.group_ref,
>    ps1.profile_id
>  ) a,
>  (SELECT ps2.group_ref,
>    COUNT( DISTINCT ps2.matched_count) AS matched_count
>  FROM test_6 ps2
>  GROUP BY ps2.group_ref,
>    ps2.profile_id
>  ) b
> As a result I've got 3 records instead of 4 - at least Oracle 10g
> returns 4 records for this statement. Maybe i'm doing something wrong.
> Do you have any suggestions / possible workarounds for this problem

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message