hadoop-common-dev mailing list archives

From "Ashish Thusoo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4139) [Hive] multi group by statement is not optimized
Date Wed, 10 Sep 2008 16:16:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629845#action_12629845 ]

Ashish Thusoo commented on HADOOP-4139:
---------------------------------------

I should be done reviewing this in a couple of hours.

A few minor comments though:

1. The tests should drop the destination tables they create. At some point we want to ensure
that the cleanup code for a test is isolated within that test. (This is minor; I am OK with
it as is for now.)
2. Can the check that disallows different distincts be moved up, potentially even before
we generate the groupbyPlan? There is no point going through the entire processing if we can
reject the query right up front.
3. A comment describing the algorithm somewhere would also be great.
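
To illustrate comment 2, here is a minimal Python sketch of a fail-fast check. It is not
the code in the patch: SemanticError, check_same_distincts, compile_query and the
build_plan callback are hypothetical names, and the distinct expressions are modeled as
plain strings purely for illustration.

```python
class SemanticError(Exception):
    """Hypothetical error type for queries rejected during analysis."""


def check_same_distincts(distinct_exprs):
    """Raise if the INSERT clauses do not all use the same distinct expression.

    distinct_exprs holds one expression string per INSERT clause
    (an assumption made for this sketch).
    """
    unique = set(distinct_exprs)
    if len(unique) > 1:
        raise SemanticError(
            "different distinct expressions are not supported: "
            + ", ".join(sorted(unique))
        )


def compile_query(distinct_exprs, build_plan):
    """Validate first, then hand off to the (hypothetical) plan builder."""
    check_same_distincts(distinct_exprs)  # cheap check, done up front
    return build_plan(distinct_exprs)     # expensive work happens only if valid
```

The point of the ordering is that build_plan is never invoked for a query that would be
rejected anyway.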


> [Hive] multi group by statement is not optimized
> ------------------------------------------------
>
>                 Key: HADOOP-4139
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4139
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hive
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: patch1
>
>
> A simple multi-group by statement is not optimized. A simple statement like:
> FROM SRC
> INSERT OVERWRITE TABLE DEST1 SELECT SRC.key, count(distinct SUBSTR(SRC.value,4)) GROUP BY SRC.key
> INSERT OVERWRITE TABLE DEST2 SELECT SRC.key, count(distinct SUBSTR(SRC.value,4)) GROUP BY SRC.key;
> results in making 2 copies of the data (SRC). Instead, the data can be first partially
> aggregated on the distinct value and then aggregated.
> The first step can be common to all group bys.
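
The two-step idea in the description can be sketched in plain Python. This is only an
illustration of the technique under stated assumptions, not Hive's implementation: rows
are modeled as (key, value) tuples, and substr_from, partial_distinct, count_per_key and
multi_group_by are hypothetical helper names.

```python
from collections import defaultdict


def substr_from(value, start):
    """Mimic SUBSTR(value, start): 1-based start, runs to the end of the string."""
    return value[start - 1:]


def partial_distinct(rows):
    """Shared first step: reduce SRC to unique (key, distinct-expression) pairs."""
    return {(key, substr_from(value, 4)) for key, value in rows}


def count_per_key(pairs):
    """Second step: count the already-distinct values for each key."""
    counts = defaultdict(int)
    for key, _ in pairs:
        counts[key] += 1
    return dict(counts)


def multi_group_by(rows, n_destinations=2):
    """Read SRC once, then run the final aggregation once per destination."""
    pairs = partial_distinct(rows)  # common to all group bys
    return [count_per_key(pairs) for _ in range(n_destinations)]


rows = [("a", "val_1"), ("a", "val_1"), ("a", "val_2"), ("b", "val_1")]
dest1, dest2 = multi_group_by(rows)
print(dest1)  # {'a': 2, 'b': 1}
```

Because the deduplicated pairs are computed once, each destination only has to run the
cheap final count instead of re-reading the source data.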

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

