hive-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2056) Generate single MR job for multi groupby query.
Date Thu, 31 Mar 2011 12:05:05 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013938#comment-13013938 ]

Amareshwari Sriramadasu commented on HIVE-2056:
-----------------------------------------------

For a query of the form
"From table T
 insert overwrite table test1 select col1, count(distinct colx) group by col1
 insert overwrite table test2 select col2, count(distinct colx) group by col2;"
it is not possible to generate a single M/R job. A single shuffle stage can partition the
input rows by only one keyset, and each count(distinct colx) needs all rows of a group on the
same reducer, so partitioning by both col1 and col2 in one stage does not work here.
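
To illustrate the problem, here is a minimal Python sketch (not Hive code) that simulates a
shuffle on col1 only. The sample rows, the two-reducer setup, and the helper names
(partition_by_col1, distinct_colx_by_col2, sum_partials) are all assumptions made for this
illustration. Rows of the same col2 group land on different reducers, so per-reducer distinct
counts for col2 cannot simply be added up.

```python
from collections import defaultdict

# Hypothetical rows of T as (col1, col2, colx); not taken from the issue.
ROWS = [
    ("a", "x", 1),
    ("b", "x", 1),
    ("b", "x", 2),
    ("a", "y", 3),
]


def partition_by_col1(rows, num_reducers=2):
    """Route each row to a reducer using col1 only (one shuffle, one key)."""
    reducers = defaultdict(list)
    for row in rows:
        # Deterministic stand-in for a hash partitioner on col1.
        reducers[sum(map(ord, row[0])) % num_reducers].append(row)
    return reducers


def distinct_colx_by_col2(rows):
    """count(distinct colx) group by col2 over the rows one task can see."""
    seen = defaultdict(set)
    for _col1, col2, colx in rows:
        seen[col2].add(colx)
    return {key: len(values) for key, values in seen.items()}


def sum_partials(rows, num_reducers=2):
    """Naively add the per-reducer distinct counts for each col2 value."""
    totals = defaultdict(int)
    for part in partition_by_col1(rows, num_reducers).values():
        for key, count in distinct_colx_by_col2(part).items():
            totals[key] += count
    return dict(totals)


if __name__ == "__main__":
    print("true counts:      ", distinct_colx_by_col2(ROWS))
    print("summed per-reducer:", sum_partials(ROWS))
```

In this sample the group col2 = "x" is split across both reducers and the value colx = 1
appears on each side, so the summed partial counts overstate the true distinct count.
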
If the groupby keys are such that one keyset is a subset of the other, i.e. a query of the
following form:
"From table T
 insert overwrite table test1 select col1, count(distinct colx) group by col1
 insert overwrite table test2 select col1, col2, count(distinct colx) group by col1, col2;"
we can run it in a single MR job by spraying over the common groupby keyset (i.e. col1). I will
implement this and see if it reduces query execution time.
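
As a rough illustration of the idea, and not the planned Hive implementation, the Python
sketch below simulates one shuffle keyed on col1. Every (col1, col2) group is contained in
exactly one col1 group, so a reducer that holds all rows for a col1 value can compute both
aggregations locally. The sample rows and the helper names (shuffle_on_col1,
reduce_partition, run_single_job) are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical rows of T as (col1, col2, colx); not taken from the issue.
ROWS = [
    ("a", "x", 1),
    ("a", "x", 1),
    ("a", "y", 2),
    ("b", "x", 1),
    ("b", "x", 3),
]


def shuffle_on_col1(rows, num_reducers=2):
    """Spray rows over reducers using only the common groupby key, col1."""
    parts = defaultdict(list)
    for row in rows:
        # Deterministic stand-in for a hash partitioner on col1.
        parts[sum(map(ord, row[0])) % num_reducers].append(row)
    return parts


def reduce_partition(rows):
    """Compute both count(distinct colx) aggregations on one reducer."""
    by_col1 = defaultdict(set)
    by_col1_col2 = defaultdict(set)
    for col1, col2, colx in rows:
        by_col1[col1].add(colx)
        by_col1_col2[(col1, col2)].add(colx)
    test1 = {key: len(values) for key, values in by_col1.items()}
    test2 = {key: len(values) for key, values in by_col1_col2.items()}
    return test1, test2


def run_single_job(rows, num_reducers=2):
    """Merge reducer outputs; groups never span reducers, so no re-aggregation."""
    test1, test2 = {}, {}
    for part in shuffle_on_col1(rows, num_reducers).values():
        part_test1, part_test2 = reduce_partition(part)
        test1.update(part_test1)
        test2.update(part_test2)
    return test1, test2


if __name__ == "__main__":
    out1, out2 = run_single_job(ROWS)
    print("test1:", out1)
    print("test2:", out2)
```

The result does not depend on the number of reducers in this sketch, because all rows with
the same col1 value always reach the same reducer.
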

Thoughts? 



> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
