hive-issues mailing list archives

From "Vineet Garg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6348) Order by/Sort by in subquery
Date Sun, 11 Jun 2017 19:46:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16046080#comment-16046080 ]

Vineet Garg commented on HIVE-6348:
-----------------------------------

[~ashutoshc] The plan generated after the subquery remove rule/decorrelation doesn't contain a HiveSortLimit on top of another HiveSortLimit. For example, for the query
{code:sql}
select * from part where p_size IN (select p_size from part p where p.p_type <> part.p_name order by p_size)
{code}
the plan just after decorrelation looks like
{code:sql}
HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
  HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8], BLOCK__OFFSET__INSIDE__FILE=[$9], INPUT__FILE__NAME=[$10], ROW__ID=[$11])
    LogicalJoin(condition=[AND(<>($1, $13), =($5, $12))], joinType=[inner])
      HiveTableScan(table=[[default.part]], table:alias=[part])
      HiveAggregate(group=[{0, 1}])
        HiveProject(p_size=[$0], p_type0=[$1])
          HiveProject(p_size=[$0], p_type0=[$13])
            HiveSortLimit(sort0=[$0], dir0=[ASC-nulls-first])
              HiveProject(p_size=[$5], p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size1=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8], block__offset__inside__file=[$9], input__file__name=[$10], row__id=[$11], p_type0=[$4])
                LogicalFilter(condition=[IS NOT NULL($4)])
                  HiveTableScan(table=[[default.part]], table:alias=[p])
{code}
So there is a single sort limit, and it sits on the right side of the join. One possible rule: if the top project doesn't project any column/expression from the right side, remove the HiveSortLimit from the right side of the join.
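
To make the proposed rule concrete, here is a minimal sketch on a toy plan model. It is not Hive or Calcite code: `Node`, `strip_sorts` and `remove_right_sort` are hypothetical names, the plan is reduced to operator names and column counts, and the rule only matches a project sitting directly on a join. It also assumes that only a sort without a limit is safe to drop, so a sort carrying a limit is kept along with everything below it.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Toy plan node (hypothetical; not a Calcite RelNode)."""
    op: str                      # "Project", "Join", "Sort", "Aggregate", "Filter", "Scan"
    inputs: tuple = ()           # child nodes
    width: int = 0               # number of output columns
    refs: tuple = ()             # Project only: input column indexes it reads
    fetch: Optional[int] = None  # Sort only: LIMIT value, None for a pure ORDER BY


def strip_sorts(node):
    """Drop pure sorts (no LIMIT) from a chain of single-input operators."""
    if node.op == "Sort":
        if node.fetch is not None:
            return node          # a LIMIT changes the result set: keep it and its subtree
        return strip_sorts(node.inputs[0])
    if len(node.inputs) == 1:
        return replace(node, inputs=(strip_sorts(node.inputs[0]),))
    return node                  # scans and multi-input operators are left alone


def remove_right_sort(project):
    """Apply the rule to Project(Join(left, right)); return the input if it does not match."""
    if project.op != "Project" or len(project.inputs) != 1:
        return project
    join = project.inputs[0]
    if join.op != "Join":
        return project
    left, right = join.inputs
    # Join output columns are the left columns followed by the right columns.
    if any(i >= left.width for i in project.refs):
        return project           # something from the right side is projected
    new_join = replace(join, inputs=(left, strip_sorts(right)))
    return replace(project, inputs=(new_join,))


# Simplified shape of the plan above: the sort sits under the aggregate on the right side.
scan_part = Node("Scan", width=12)
scan_p = Node("Scan", width=12)
sort = Node("Sort", (scan_p,), width=12)
right = Node("Aggregate", (Node("Project", (sort,), width=2, refs=(5, 4)),), width=2)
join = Node("Join", (scan_part, right), width=14)
top = Node("Project", (join,), width=9, refs=tuple(range(9)))

optimized = remove_right_sort(top)
```

In this toy plan the top project reads only columns 0 to 8, all from the left input, so the sort under the aggregate is removed. A project that read column 12 or 13 would leave the plan untouched.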

> Order by/Sort by in subquery
> ----------------------------
>
>                 Key: HIVE-6348
>                 URL: https://issues.apache.org/jira/browse/HIVE-6348
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Rui Li
>            Priority: Minor
>              Labels: sub-query
>         Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch, HIVE-6348.3.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in hive sorts the data set twice. The optimizer should probably remove any order by/sort by in the sub query unless you use 'limit'. Could even go so far as barring it at the semantic level.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
