hive-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-2056) Generate single MR job for multi groupby query.
Date Tue, 15 Mar 2011 16:29:29 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006995#comment-13006995 ]

Amareshwari Sriramadasu commented on HIVE-2056:
-----------------------------------------------

Here is a request from one of our customers:

Here is a real example of the need for a multi group-by query that runs in one M/R job.
In the query below, two aggregates are generated from a single fact table. The first
aggregate generates a unique count by date, and the second generates a unique count by
date and gender. We have a lot of these aggregates to build. We would like this to be
done in one M/R job instead of the three launched below. Is it possible to do this in Hive?

// created two intermediate tables

hive> create table test_1 (dt string, bc_cnt bigint);
OK
Time taken: 9.004 seconds
hive> create table test_2 (dt string, gender string, bc_cnt bigint);
OK

// multi group by in insert statement

hive> from fact_table f
    > insert overwrite table test_1 select dt, count(distinct id) group by dt
    > insert overwrite table test_2 select dt,gender,count(distinct id) group by dt,gender;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 999
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>



Thanks

Sudhish
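
To make the request concrete, the following is a minimal Python sketch of the result the two inserts are expected to produce when both aggregates are computed in a single pass over the fact rows. It is not Hive's implementation or the approach proposed in HIVE-2056; the function name, the in-memory sets, and the sample rows are all hypothetical and only illustrate the semantics of the query above.

```python
from collections import defaultdict


def multi_group_by_single_pass(rows):
    """Compute both distinct-count aggregates in one scan.

    rows: iterable of (dt, gender, id) tuples standing in for fact_table
    (hypothetical layout, assumed for illustration only).
    """
    ids_by_dt = defaultdict(set)         # feeds test_1: group by dt
    ids_by_dt_gender = defaultdict(set)  # feeds test_2: group by dt, gender

    for dt, gender, row_id in rows:
        # Each row is read once and contributes to both aggregates.
        ids_by_dt[dt].add(row_id)
        ids_by_dt_gender[(dt, gender)].add(row_id)

    test_1 = sorted((dt, len(ids)) for dt, ids in ids_by_dt.items())
    test_2 = sorted(
        (dt, gender, len(ids)) for (dt, gender), ids in ids_by_dt_gender.items()
    )
    return test_1, test_2


# Made-up sample data; the duplicate first row is counted only once.
rows = [
    ("2011-03-14", "F", 1),
    ("2011-03-14", "F", 1),
    ("2011-03-14", "M", 2),
    ("2011-03-15", "M", 2),
    ("2011-03-15", "M", 3),
]
test_1, test_2 = multi_group_by_single_pass(rows)
```

The sketch keeps every distinct id in memory, which would not scale to a real fact table; it only shows that both outputs can be derived from one read of the input.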



> Generate single MR job for multi groupby query.
> -----------------------------------------------
>
>                 Key: HIVE-2056
>                 URL: https://issues.apache.org/jira/browse/HIVE-2056
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
