Date: Wed, 6 Feb 2013 08:59:14 +0000 (UTC)
From: "Phabricator (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

    [ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572273#comment-13572273 ]

Phabricator commented on HIVE-2340:
-----------------------------------

navis has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:138 ok.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787 I wish I could, but CommonJoinResolver is a physical optimizer, which means there is no RS-RS operator tree that could be merged at that stage. I'm thinking of disabling this optimization if the user has configured hive.auto.convert.join=true or hive.auto.convert.join.noconditionaltask=true.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:251 I'll add more explanation to hive-default.xml.template.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:99 For rules with the same cost, DefaultRuleDispatcher selects the last one. The code is something like this:
  {code}
  if ((cost >= 0) && (cost <= minCost)) {
    minCost = cost;
    rule = r;
  }
  {code}
  So R2 will be selected.
  conf/hive-default.xml.template:1034 This was discussed in https://issues.apache.org/jira/browse/HIVE-2340?focusedCommentId=13568361&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13568361

  This optimization merges two RSs by moving the key/parts/num-reducers of the child RS to the parent RS, which means that if the num-reducers of the child RS is fixed (order by or forced bucketing) and small, the merge can result in a single, very slow MR job. To prevent this, the configuration sets a minimum threshold for applying this optimization. It's not good enough, but I cannot think of a better idea.
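
  To illustrate the tie-breaking noted above for ReduceSinkDeDuplication.java:99, here is a minimal, self-contained sketch. It is not the actual DefaultRuleDispatcher code: the class name TieBreakSketch, the method selectRule, and the use of a map from rule name to cost are all hypothetical, and treating a negative cost as "rule does not match" is an assumption. It only shows why a "<=" comparison keeps the last of several equal-cost rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the dispatcher's selection loop; not Hive code.
class TieBreakSketch {

    // Returns the name of the rule chosen by a "<=" comparison.
    // Rules are visited in the map's iteration order; among rules that
    // share the lowest non-negative cost, the one visited last wins.
    static String selectRule(Map<String, Integer> costs) {
        int minCost = Integer.MAX_VALUE;
        String rule = null;
        for (Map.Entry<String, Integer> e : costs.entrySet()) {
            int cost = e.getValue();
            // Assumption: a negative cost means the rule did not match.
            if ((cost >= 0) && (cost <= minCost)) {
                minCost = cost;
                rule = e.getKey();
            }
        }
        return rule;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, so R1 is visited before R2.
        Map<String, Integer> costs = new LinkedHashMap<>();
        costs.put("R1", 3);
        costs.put("R2", 3);
        System.out.println(selectRule(costs)); // prints R2
    }
}
```

  With a strict "<" in place of "<=", the same input would select R1 instead, so the outcome for equal-cost rules depends on both the comparison operator and the order in which rules are visited.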
REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn, njain

> optimize orderby followed by a groupby
> --------------------------------------
>
>                 Key: HIVE-2340
>                 URL: https://issues.apache.org/jira/browse/HIVE-2340
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>              Labels: perfomance
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer (cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira