hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Julian Hyde (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5873) SubQuery: In subquery Count Bug
Date Mon, 13 Feb 2017 22:09:42 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864552#comment-15864552
] 

Julian Hyde commented on HIVE-5873:
-----------------------------------

This is fixed in Calcite as part of CALCITE-365. It's worth re-evaluating when Hive picks
up Calcite 1.12.

By the way, it's ambiguous in [~rhbutani]'s description above, but the query should return
two rows (yes, including PNum=8). I confirmed on Postgres:

{noformat}
> create table Part (PNum int, OrderOnHand int);
CREATE TABLE
> insert into Part values (3,6),(10,1),(8,0);
INSERT 0 3
> create table Supply (PNum int, Qty int);
CREATE TABLE
> insert into Supply values (3,4),(3,2),(10,1);
INSERT 0 3
> select pnum                                  
from Part p
where orderOnHand
     in (select count(*) from Supply s
          where s.pnum = p.pnum
         );
 pnum 
------
   10
    8
(2 rows)
{noformat}

> SubQuery: In subquery Count Bug
> -------------------------------
>
>                 Key: HIVE-5873
>                 URL: https://issues.apache.org/jira/browse/HIVE-5873
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Harish Butani
>
> This is from the Optimization of Nested SQl Queries Revisited paper: http://dl.acm.org/citation.cfm?id=38723
> Consider Part table having:
> {noformat}
> PNum OrderOnHand
> -------- ------------------
> 3          6
> 10        1
> 8          0
> {noformat}
> Supply table having:
> {noformat}
> PNum  Qty  
> 3          4
> 3          2
> 10        1
> {noformat}
> The query:
> {noformat}
> select pnum
> from parts p
> where orderOnHand
>      in (select count(*) from supply s
>           where s.pnum = p.pnum
>          )
> {noformat}
> should return the row with PNum=8.
> But a transformation to a semi-join would eliminate this row, as there are no rows in
supply table with PNum=8.
> AS shown in the paper the soln is to transform to:
> {noformat}
> select pnum
> from parts p semijoin
>         (select p1.pnum, count(*) as c
>           from (select distinct pnum from parts) p1 join supply s
>           where s.pnum = p1.pnum
>          ) sq on p.pnum = sq.pnum and p.orderOnHand = sq.c
> {noformat}
> The additional distinct query within the SubQuery is to handle duplicates in the outer
query on the joining columns.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message