hadoop-common-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-4415) [Hive] group by over a subquery with a cluster by not optimized
Date Wed, 15 Oct 2008 06:25:44 GMT
[Hive] group by over a subquery with a cluster by not optimized
---------------------------------------------------------------

                 Key: HADOOP-4415
                 URL: https://issues.apache.org/jira/browse/HADOOP-4415
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/hive
            Reporter: Namit Jain
            Assignee: Namit Jain


Consider the following query:


select x.a, count(x.b) from (select ...... cluster by a) x group by x.a


Even though the user has explicitly asked to cluster by a, the group by
still runs 2 more map-reduce jobs, sorting first by a random number and
then by a. So the query runs a total of 3 map-reduce jobs, sorting by a,
by a random number, and by a again. This should be optimized to take
advantage of the clustering on a that the subquery already provides.
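As a rough illustration, the Python sketch below models the two plans with an in-memory shuffle. It is a simplified model, not Hive's execution engine. `shuffle`, `unoptimized_plan` and `optimized_plan` are hypothetical helpers. `count(x.b)` is modeled as counting the non-null values of b. The random-number first stage follows the description above rather than Hive's source. Both plans return the same counts. They differ only in how many shuffles (map-reduce jobs) they run.

```python
import random
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def shuffle(rows, key_fn, num_reducers):
    """Model one map-reduce shuffle: hash-partition rows on key_fn, then
    sort each reducer's input on the same key."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(key_fn(row)) % num_reducers].append(row)
    return [sorted(parts[r], key=key_fn) for r in range(num_reducers)]


def unoptimized_plan(rows, num_reducers=3, seed=0):
    """Three jobs: CLUSTER BY a, then a two-stage GROUP BY (random, then a)."""
    rng = random.Random(seed)
    # Job 1: the subquery's CLUSTER BY a.
    clustered = [row for part in shuffle(rows, itemgetter(0), num_reducers)
                 for row in part]
    # Job 2: tag each row with a random number, shuffle on it, and compute
    # partial counts of non-null b per value of a inside each reducer.
    tagged = [(rng.random(), a, b) for a, b in clustered]
    partials = []
    for part in shuffle(tagged, itemgetter(0), num_reducers):
        counts = defaultdict(int)
        for _, a, b in part:
            counts[a] += int(b is not None)
        partials.extend(counts.items())
    # Job 3: shuffle the partial counts on a and add them up.
    result = {}
    for part in shuffle(partials, itemgetter(0), num_reducers):
        for a, group in groupby(part, key=itemgetter(0)):
            result[a] = sum(c for _, c in group)
    return result, 3


def optimized_plan(rows, num_reducers=3):
    """One job: the CLUSTER BY a shuffle already puts every row with the same
    a in one reducer, sorted, so the count can be computed right there."""
    result = {}
    for part in shuffle(rows, itemgetter(0), num_reducers):
        for a, group in groupby(part, key=itemgetter(0)):
            result[a] = sum(1 for _, b in group if b is not None)
    return result, 1


rows = [("x", 1), ("y", 2), ("x", 3), ("z", None), ("y", 5)]
slow, slow_jobs = unoptimized_plan(rows)
fast, fast_jobs = optimized_plan(rows)
assert slow == fast
print(sorted(fast.items()), slow_jobs, fast_jobs)
# → [('x', 2), ('y', 2), ('z', 0)] 3 1
```

The model tracks only results and job counts. It does not capture Hive's operator tree or any map-side aggregation.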

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

