Date: Thu, 31 Mar 2011 12:05:05 +0000 (UTC)
From: "Amareshwari Sriramadasu (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Message-ID: <184499456.24167.1301573105809.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <851514033.3984.1300206449788.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HIVE-2056) Generate single MR job for multi groupby query.
[ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013938#comment-13013938 ]

Amareshwari Sriramadasu commented on HIVE-2056:
-----------------------------------------------

For a query of the form

  "From table T
   insert overwrite table test1 select col1, count(distinct colx) group by col1
   insert overwrite table test2 select col2, count(distinct colx) group by col2;"

it is not possible to generate a single M/R job, because the input rows cannot be partitioned by both col1 and col2 in a single stage.

If the group-by keys are such that one keyset is a subset of the other, i.e. the query is of the following form:

  "From table T
   insert overwrite table test1 select col1, count(distinct colx) group by col1
   insert overwrite table test2 select col1, col2, count(distinct colx) group by col1, col2;"

we can run it in a single MR job by spraying over the common group-by keyset (i.e. col1).

I will implement this and see if it reduces query execution time. Thoughts?

> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
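
To illustrate the partitioning argument in the comment above, here is a minimal Python sketch that simulates one shuffle stage. It is not Hive code: the helper names (`count_distinct`, `partition`, `single_job`), the sample rows, and the toy partitioner (sum of integer key values modulo the reducer count) are all assumptions made for the illustration. It sprays rows on col1 and evaluates every group-by inside each reducer.

```python
from collections import defaultdict

# Hypothetical sample data; col1 and col2 are integers so the toy partitioner works.
ROWS = [
    {"col1": 0, "col2": 7, "colx": "a"},
    {"col1": 0, "col2": 7, "colx": "b"},
    {"col1": 0, "col2": 8, "colx": "a"},
    {"col1": 1, "col2": 7, "colx": "a"},
    {"col1": 1, "col2": 7, "colx": "c"},
    {"col1": 2, "col2": 8, "colx": "a"},
]


def count_distinct(rows, key_cols, distinct_col):
    """Reference result: count(distinct distinct_col) grouped by key_cols."""
    seen = defaultdict(set)
    for row in rows:
        seen[tuple(row[c] for c in key_cols)].add(row[distinct_col])
    return {key: len(values) for key, values in seen.items()}


def partition(rows, spray_cols, num_reducers):
    """Assign each row to a reducer using only the spray columns (toy partitioner)."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        key = tuple(row[c] for c in spray_cols)
        buckets[sum(key) % num_reducers].append(row)
    return buckets


def single_job(rows, spray_cols, groupings, distinct_col, num_reducers=3):
    """Evaluate every grouping inside each reducer and concatenate reducer outputs.

    Returns one list of (group_key, distinct_count) pairs per grouping.
    """
    outputs = [[] for _ in groupings]
    for bucket in partition(rows, spray_cols, num_reducers):
        for i, key_cols in enumerate(groupings):
            outputs[i].extend(count_distinct(bucket, key_cols, distinct_col).items())
    return outputs


# Subset case: both keysets contain col1, so spraying on col1 keeps each group on one reducer.
by_col1, by_col1_col2 = single_job(ROWS, ["col1"], [["col1"], ["col1", "col2"]], "colx")

# Disjoint case: grouping by col2 while spraying on col1 splits a col2 group across reducers.
(by_col2,) = single_job(ROWS, ["col1"], [["col2"]], "colx")
```

In the subset case each group key is emitted by exactly one reducer and the counts match the reference computation. In the disjoint case the group col2 = 7 is emitted by two reducers with a partial count of 2 each, while the true distinct count is 3, so the partial results cannot simply be concatenated or summed.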