hive-user mailing list archives

From Abhishek <abhishek.dod...@gmail.com>
Subject Re: Hive configuration property
Date Wed, 26 Sep 2012 15:39:55 GMT
Thanks bejoy, I will try that.

Regards 
Abhi

Sent from my iPhone

On Sep 26, 2012, at 11:34 AM, Bejoy KS <bejoy_ks@yahoo.com> wrote:

> Hi Abhishek
> 
> Based on my experience, you can always set the number of reduce tasks (mapred.reduce.tasks)
> based on the data volume your query handles. It can yield better performance numbers.
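> 
> For example (a rough sketch; the table name and numbers here are purely illustrative,
> not from this thread): for a query that scans roughly 20 GB, you might try something
> on the order of 20 reducers:
> 
>   -- illustrative only: size the reducer count to the data the query handles
>   set mapred.reduce.tasks=20;
>   SELECT dept, COUNT(*) FROM sales_20gb GROUP BY dept;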

>  
> Regards,
> Bejoy KS
> 
> From: Abhishek <abhishek.dodda1@gmail.com>
> To: "user@hive.apache.org" <user@hive.apache.org> 
> Cc: "user@hive.apache.org" <user@hive.apache.org> 
> Sent: Wednesday, September 26, 2012 7:04 PM
> Subject: Re: Hive configuration property
> 
> Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max" property.
> 
> Regards
> Abhi
> 
> 
> 
> Sent from my iPhone
> 
> On Sep 26, 2012, at 9:23 AM, bharath vissapragada <bharathvissapragada1990@gmail.com> wrote:
> 
>> 
>> I'm no expert in hive, but here are my 2 cents. 
>> 
>> By default, Hive schedules one reducer for every 1 GB of data (you can change that value
>> by modifying hive.exec.reducers.bytes.per.reducer). If your input data is huge, there will
>> be a large number of reducers, which might be unnecessary. (Sometimes a large number of
>> reducers slows down the job because their number exceeds the total task slots and they keep
>> waiting for their turn. Not to forget the initialization overhead for each task, JVM startup, etc.)
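>> 
>> For instance (a rough sketch; the figures are illustrative, not measured): raising the
>> per-reducer target shrinks the number of reducers Hive schedules for a large input:
>> 
>>   -- default is ~1 GB per reducer, so ~100 GB of input => ~100 reducers
>>   set hive.exec.reducers.bytes.per.reducer=4000000000;  -- ~4 GB per reducer
>>   -- the same ~100 GB input would now get roughly 25 reducers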
>> 
>> Overall, I think there cannot be any single optimum value for a cluster. It depends on
>> the type of queries, the size of your inputs, and the size of the map outputs of the jobs
>> (intermediate outputs). So you can try various values and see which one works best. From my
>> experience, setting "hive.exec.reducers.max" to the total number of reduce slots in your
>> cluster gives decent performance, since all the reducers complete in a single wave. (This may
>> or may not work for you, but it is worth a try.)
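>> 
>> As a sketch (assuming, purely for illustration, a cluster with 40 reduce slots):
>> 
>>   -- cap reducers at the cluster's reduce slots so they finish in one wave
>>   set hive.exec.reducers.max=40;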
>> 
>> 
>> On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <abhishek.dodda1@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> I have a doubt regarding the properties below. Is it a good practice to override these
>> properties in Hive?
>> 
>> If yes, what are the optimal values for the following properties?
>> 
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> 
>> Regards
>> Abhi
>> 
>> Sent from my iPhone
>> 
>> 
>> 
>> -- 
>> Regards,
>> Bharath .V
>> w:http://researchweb.iiit.ac.in/~bharath.v
> 
> 
