hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pengcheng Xiong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15758) Allow correlated scalar subqueries with aggregates which has non-equi join predicates
Date Sat, 03 Jun 2017 00:05:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035656#comment-16035656
] 

Pengcheng Xiong commented on HIVE-15758:
----------------------------------------

Here is my 2 cents. I use a CTE to describe how i want to rewrite. The basic idea is to give
it a row number. In the first join, we use left join to keep all the row_nums. In the second
join, we are trying to match exactly the same "row" as we want.
{code}
with part_rewrite as
(select *, ROW_NUMBER() over () as row_num from part)
select p1.p_size, p1.p_type from part_rewrite p1,
(select count(pp.p_size) as cnt, row_num from part_rewrite left outer join part pp on part_rewrite.p_type
<> pp.p_type group by row_num)p2
where p1.p_size<>p2.cnt and p1.row_num=p2.row_num;
{code}

Here are the rewrite test results:
{code}
horton=# select * from part;
 p_size | p_type
--------+--------
      1 |      1
      1 |
      1 |      2
      1 |      3
      2 |      3
      2 |     32
    233 |      2
        |      2
        |
        |      3
    233 |      2
(11 rows)

horton=# select * from part where p_size <> (select count(p_size) from part pp where
part.p_type <> pp.p_type);
 p_size | p_type
--------+--------
      1 |      1
      1 |
      1 |      2
      1 |      3
      2 |      3
      2 |     32
    233 |      2
    233 |      2
(8 rows)

horton=# with part_rewrite as
horton-# (select *, ROW_NUMBER() over () as row_num from part)
horton-# select p1.p_size, p1.p_type from part_rewrite p1,
horton-# (select count(pp.p_size) as cnt, row_num from part_rewrite left outer join part pp
on part_rewrite.p_type <> pp.p_type group by row_num)p2
horton-# where p1.p_size<>p2.cnt and p1.row_num=p2.row_num;
 p_size | p_type
--------+--------
      1 |      1
      1 |
      1 |      2
      1 |      3
      2 |      3
      2 |     32
    233 |      2
    233 |      2
(8 rows)
{code}


> Allow correlated scalar subqueries with aggregates which has non-equi join predicates
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-15758
>                 URL: https://issues.apache.org/jira/browse/HIVE-15758
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Logical Optimizer
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>              Labels: sub-query
>
> Queries such as 
> {code} select * from part where p_size <> (select count(p_size) from part pp where
part.p_type <> pp.p_type); {code} are currently not allowed since HIVE doesn't know
how to rewrite such queries to preserve the correctness for cases when there is zero row



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message