db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bryan Pendleton (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5584) Select statement with subqueries with group by and count distinct statements returns wrong number of results
Date Tue, 24 Jan 2012 05:57:40 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191908#comment-13191908
] 

Bryan Pendleton commented on DERBY-5584:
----------------------------------------

Not only are there 3 rows instead of 4, but the 3rd row has strange values. When I run the
query I see:

GROUP_REF  |MATCHED_CO&|GROUP_REF  |MATCHED_CO&
-----------------------------------------------
10000      |1          |10000      |1          
10000      |1          |10000      |1          
10000      |1          |10000      |2          

Given that the two subqueries, separately, each return

GROUP_REF  |MATCHED_CO&
-----------------------
10000      |1          
10000      |1          

I think that the resulting cartesian product 4-row answer should have been:

GROUP_REF  |MATCHED_CO&|GROUP_REF  |MATCHED_CO&
-----------------------------------------------
10000      |1          |10000      |1          
10000      |1          |10000      |1          
10000      |1          |10000      |1          
10000      |1          |10000      |1          

That is, it doesn't seem like we're losing a row; rather, it seems like we're combining two
rows into one.

So perhaps the problem has something to do with temporary data management; that is, perhaps
during the execution of the cartesian produce we are closing and re-opening the inner table-expression,
and when we do that, instead of re-computing and returning two rows (10000,1), (10000,1),
we are instead returning a single row (10000,2)

I tried running a slight variation of the problematic query:

SELECT *
FROM
 (SELECT ps1.group_ref, ps1.profile_id,
   COUNT(DISTINCT ps1.matched_count) AS matched_count
 FROM test_5 ps1
 GROUP BY ps1.group_ref,
   ps1.profile_id
 ) a,
 (SELECT ps2.group_ref, ps2.profile_id,
   COUNT( DISTINCT ps2.matched_count) AS matched_count
 FROM test_6 ps2
 GROUP BY ps2.group_ref,
   ps2.profile_id
 ) b 

(note that I included 'profile_id' explicitly in the select list of each sub-query), and the
results were:

GROUP_REF  |PROFILE_ID |MATCHED_CO&|GROUP_REF  |PROFILE_ID |MATCHED_CO&
-----------------------------------------------------------------------
10000      |1          |1          |10000      |1          |1          
10000      |1          |1          |10000      |2          |1          
10000      |2          |1          |10000      |1          |2          

I also extended the test case a bit by additionally doing:

insert into test_6 values (3, 10000, 3 );

after which the query produced:

GROUP_REF  |PROFILE_ID |MATCHED_CO&|GROUP_REF  |PROFILE_ID |MATCHED_CO&
-----------------------------------------------------------------------
10000      |1          |1          |10000      |1          |1          
10000      |1          |1          |10000      |2          |1          
10000      |1          |1          |10000      |3          |1          
10000      |2          |1          |10000      |1          |3          

and then

insert into test_6 values (4, 10000, 4 );

after which the query produced:

GROUP_REF  |PROFILE_ID |MATCHED_CO&|GROUP_REF  |PROFILE_ID |MATCHED_CO&
-----------------------------------------------------------------------
10000      |1          |1          |10000      |1          |1          
10000      |1          |1          |10000      |2          |1          
10000      |1          |1          |10000      |3          |1          
10000      |1          |1          |10000      |4          |1          
10000      |2          |1          |10000      |1          |4          

And, lastly,

insert into test_5 values (3, 10000, 3);

and got 

GROUP_REF  |PROFILE_ID |MATCHED_CO&|GROUP_REF  |PROFILE_ID |MATCHED_CO&
-----------------------------------------------------------------------
10000      |1          |1          |10000      |1          |1          
10000      |1          |1          |10000      |2          |1          
10000      |1          |1          |10000      |3          |1          
10000      |1          |1          |10000      |4          |1          
10000      |2          |1          |10000      |1          |4          
10000      |3          |1          |10000      |4          |4          

By this point it becomes quite clear that the problem is that, on the first pass through the
inner table,
the cartesian product observes all the rows of the inner table, but on all subsequent passes,
the
inner table's rows are collapsed into a single row (perhaps because we are losing track of
the 'profile_id' column in that inner table somehow?)

I'm not sure how much more time I'll have to play with this right away, but wanted to record
these findings.

                
> Select statement with subqueries with group by and count distinct statements returns
wrong number of results
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5584
>                 URL: https://issues.apache.org/jira/browse/DERBY-5584
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Server
>    Affects Versions: 10.7.1.1
>         Environment: Output from sysinfo
> java.specification.name: Java Platform API Specification
> java.specification.version: 1.6
> java.runtime.version: 1.6.0_20-b02
> --------- Derby Information --------
> JRE - JDBC: Java SE 6 - JDBC 4.0
> [/home/piotrz/Desktop/db-derby-10.7.1.1-bin/lib/derby.jar] 10.7.1.1 - (1040133)
> [/home/piotrz/Desktop/db-derby-10.7.1.1-bin/lib/derbytools.jar] 10.7.1.1 - (1040133)
> [/home/piotrz/Desktop/db-derby-10.7.1.1-bin/lib/derbynet.jar] 10.7.1.1 - (1040133)
> [/home/piotrz/Desktop/db-derby-10.7.1.1-bin/lib/derbyclient.jar] 10.7.1.1 - (1040133)
> ------------------------------------------------------
> ----------------- Locale Information -----------------
> Current Locale :  [English/United States [en_US]]
> Found support for locale: [cs]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [de_DE]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [es]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [fr]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [hu]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [it]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [ja_JP]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [ko_KR]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [pl]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [pt_BR]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [ru]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [zh_CN]
> 	 version: 10.7.1.1 - (1040133)
> Found support for locale: [zh_TW]
> 	 version: 10.7.1.1 - (1040133)
>            Reporter: Piotr Zgadzaj
>
> Steps to reproduce:
> 1. Create database, connect to database with any JDBC client
> 2. create two tables:
> CREATE TABLE TEST_5 (
>        profile_id INTEGER NOT NULL,
>        group_ref INTEGER NOT NULL,
>        matched_count INTEGER NOT NULL
>    );
>    CREATE TABLE TEST_6 (
>        profile_id INTEGER NOT NULL,
>        group_ref INTEGER NOT NULL,
>        matched_count INTEGER NOT NULL
>    );
> 3. Insert two records for each table:
> insert into test_5 values (1, 10000,1);
> insert into test_5 values (2, 10000, 2);
> insert into test_6 values (1, 10000,1);
> insert into test_6 values (2, 10000, 2);
> 4. Run following statement
> SELECT *
> FROM
>  (SELECT ps1.group_ref,
>    COUNT(DISTINCT ps1.matched_count) AS matched_count
>  FROM test_5 ps1
>  GROUP BY ps1.group_ref,
>    ps1.profile_id
>  ) a,
>  (SELECT ps2.group_ref,
>    COUNT( DISTINCT ps2.matched_count) AS matched_count
>  FROM test_6 ps2
>  GROUP BY ps2.group_ref,
>    ps2.profile_id
>  ) b
> As a result I've got 3 records instead of 4 - at least Oracle 10g
> returns 4 records for this statement. Maybe i'm doing something wrong.
> Do you have any suggestions / possible workarounds for this problem

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message