hive-user mailing list archives

From Abhishek <abhishek.dod...@gmail.com>
Subject Re: How to optimize a group by query
Date Wed, 26 Sep 2012 17:58:10 GMT
Thanks, Bejoy.

Regards
Abhi

Sent from my iPhone

On Sep 26, 2012, at 1:42 PM, Bejoy KS <bejoy_ks@yahoo.com> wrote:

> Hi Abhishek
> 
> From the MapReduce logs you can see whether the data processed by one reducer is much
> more than that of the other reducers. In short, one reducer takes considerably longer to
> complete than the others.
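> 
> As a complementary check before running the job, a rough sketch is to count rows per
> group key. The table name `events` and column `user_id` below are hypothetical
> placeholders, not names from this thread. If a few keys have counts far above the
> rest, the reducers that receive those keys will likely process much more data.
> 
> ```sql
> -- Hypothetical table and column; substitute the GROUP BY column(s) of the slow query.
> -- Lists the 20 most frequent keys so heavily skewed keys stand out.
> SELECT user_id, COUNT(*) AS row_count
> FROM events
> GROUP BY user_id
> ORDER BY row_count DESC
> LIMIT 20;
> ```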
> 
> In addition to my previous mail, one more optimization is possible for GROUP BY if your
> table is bucketed, or bucketed and sorted. It applies when the GROUP BY columns are the
> same as the bucketing columns, or are a subset of the sorted bucketing columns. This
> optimization is enabled by 'hive.optimize.groupby', which is true by default.
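> 
> A minimal sketch of such a layout, assuming a hypothetical source table `events` with
> columns `user_id` and `amount` (illustrative names only, not from this thread), and the
> bucketing enforcement settings available in older Hive releases:
> 
> ```sql
> -- Hypothetical table, bucketed and sorted on the column used in GROUP BY.
> CREATE TABLE events_bucketed (user_id BIGINT, amount DOUBLE)
> CLUSTERED BY (user_id) SORTED BY (user_id) INTO 32 BUCKETS;
> 
> -- Make the insert honor the declared bucketing and sort order.
> SET hive.enforce.bucketing = true;
> SET hive.enforce.sorting = true;
> 
> INSERT OVERWRITE TABLE events_bucketed
> SELECT user_id, amount FROM events;
> 
> -- A query of the shape the optimization targets:
> -- the GROUP BY column matches the bucketing and sort column.
> SET hive.optimize.groupby = true;
> SELECT user_id, SUM(amount) AS total_amount
> FROM events_bucketed
> GROUP BY user_id;
> ```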
>  
> Regards,
> Bejoy KS
> 
> From: Abhishek <abhishek.dodda1@gmail.com>
> To: "user@hive.apache.org" <user@hive.apache.org> 
> Cc: "user@hive.apache.org" <user@hive.apache.org> 
> Sent: Wednesday, September 26, 2012 10:59 PM
> Subject: Re: How to optimize a group by query
> 
> Hi Bejoy,
> 
> Thanks for the reply. How can I tell whether there is data skew among the reducers?
> 
> Regards
> Abhi
> 
> Sent from my iPhone
> 
> On Sep 26, 2012, at 1:20 PM, Bejoy KS <bejoy_ks@yahoo.com> wrote:
> 
>> Hi Abhishek
>> 
>> Group by performance can be improved by the following:
>> 1) Enabling map-side aggregation. In the latest versions it is enabled by default.
>> SET hive.map.aggr = true;
>> 
>> 2) Is there data skew in some of the reducers?
>> If so, better performance can be obtained by setting the following property:
>> SET hive.groupby.skewindata=true;
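>> 
>> Put together, a session might look like the sketch below. The table `events` and
>> column `user_id` are hypothetical. Note that with hive.groupby.skewindata enabled,
>> Hive plans the GROUP BY as two MapReduce jobs: the first spreads rows randomly across
>> reducers for partial aggregation, and the second combines the partial results by key,
>> so it likely pays off only when skew is actually present.
>> 
>> ```sql
>> -- Partial aggregation in the mappers.
>> SET hive.map.aggr = true;
>> -- Two-stage plan that spreads skewed keys across reducers.
>> SET hive.groupby.skewindata = true;
>> 
>> -- Hypothetical query; replace with the slow GROUP BY query.
>> SELECT user_id, COUNT(*) AS row_count
>> FROM events
>> GROUP BY user_id;
>> ```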
>> 
>>  
>> Regards,
>> Bejoy KS
>> 
>> From: Abhishek <abhishek.dodda1@gmail.com>
>> To: Hive <user@hive.apache.org> 
>> Sent: Wednesday, September 26, 2012 10:31 PM
>> Subject: How to optimize a group by query 
>> 
>> Hi all,
>> 
>> I have written a query with a GROUP BY clause, and it is taking a lot of time. Is there
>> any way to optimize it, such as a configuration property or something similar?
>> 
>> Regards 
>> Abhi
>> 
>> 
>> Sent from my iPhone
> 
> 
