hive-dev mailing list archives

From "Phabricator (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
Date Wed, 06 Feb 2013 08:59:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572273#comment-13572273 ]

Phabricator commented on HIVE-2340:
-----------------------------------

navis has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:138 ok.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787 I wish
I could, but CommonJoinResolver is a physical optimizer, which means there is no RS-RS operator
tree that could be merged at that stage.

  I'm thinking of disabling this optimization if the user has set hive.auto.convert.join=true
or hive.auto.convert.join.noconditionaltask=true.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:251 I'll add
more explanation to hive-default.xml.template.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:99 For rules
with the same cost, DefaultRuleDispatcher selects the last one, with logic something like this:
  {code}
  if ((cost >= 0) && (cost <= minCost)) {
      minCost = cost;
      rule = r;
  }
  {code}
  So R2 will be selected.
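
  To make the tie-breaking concrete, here is a minimal, self-contained sketch of the same
comparison. TieBreakSketch and select are hypothetical names for illustration, not Hive's
actual classes, and the map of rule names to costs stands in for the real rule objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the dispatcher's selection loop; not Hive's actual class.
final class TieBreakSketch {

    // Returns the name of the rule chosen when rules are visited in iteration order.
    // Pass an ordered map (e.g. LinkedHashMap) so that "last" is well defined.
    static String select(Map<String, Integer> costsInOrder) {
        int minCost = Integer.MAX_VALUE;
        String rule = null;
        for (Map.Entry<String, Integer> e : costsInOrder.entrySet()) {
            int cost = e.getValue();
            // A negative cost is skipped by the guard below.
            // "<=" (not "<") lets a later rule with an equal cost replace the earlier one.
            if ((cost >= 0) && (cost <= minCost)) {
                minCost = cost;
                rule = e.getKey();
            }
        }
        return rule;
    }
}
```

  With R1 and R2 registered in that order and both reporting the same cost, select returns
R2. Changing the comparison to a strict "<" would keep R1 instead.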
  conf/hive-default.xml.template:1034 It's commented on https://issues.apache.org/jira/browse/HIVE-2340?focusedCommentId=13568361&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13568361

  This optimization merges two RSs by moving the key/parts/num-reducers of the child RS to the
parent RS, which means that if the num-reducers of the child RS is fixed (order by or forced
bucketing) and small, it can result in a very slow, single MR job. To prevent this, the
configuration sets a minimum threshold for applying this optimization. It's not good enough,
but I cannot think of a better idea.
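
  The threshold check described above could look roughly like the sketch below. This is an
illustration under assumptions, not the code in the patch: MinReducerGuardSketch, mayMerge and
minReducer are hypothetical names, and treating a negative reducer count as "not fixed" is an
assumed convention.

```java
// Hypothetical helper illustrating the minimum-threshold guard; not the patch's code.
final class MinReducerGuardSketch {

    // childNumReducers: reducer count requested by the child RS; a negative value is
    //                   assumed here to mean "not fixed".
    // minReducer:       the configured minimum threshold (hypothetical parameter name).
    static boolean mayMerge(int childNumReducers, int minReducer) {
        // A non-fixed count leaves parallelism to the planner, so merging is allowed.
        if (childNumReducers < 0) {
            return true;
        }
        // A fixed but small count would push the whole merged job onto few reducers.
        return childNumReducers >= minReducer;
    }
}
```

  For example, with a threshold of 4, a child RS fixed to a single reducer (as with order by)
would not be merged, while one with no fixed reducer count would be.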

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn, njain

                
> optimize orderby followed by a groupby
> --------------------------------------
>
>                 Key: HIVE-2340
>                 URL: https://issues.apache.org/jira/browse/HIVE-2340
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>              Labels: perfomance
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch,
ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch,
ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch,
HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch,
testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer (cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
