hive-user mailing list archives

From Abhishek <abhishek.dod...@gmail.com>
Subject Re: Hive query optimization
Date Tue, 24 Jul 2012 12:47:26 GMT
Hi Tatarinov,

Thanks for the reply. If I understand correctly, do you mean setting the number of reduce
tasks equal to the number of reduce slots in the cluster?

Regards
Abhi



On Jul 24, 2012, at 12:51 AM, Igor Tatarinov <igor@decide.com> wrote:

> Here is my 2 cents.
> The parameters you are looking at are quite specific. Unless you know what you are doing,
> it might be hard to set them exactly right, and they shouldn't make that much of a
> difference - again, unless you know the specifics.
> 
> What worked for me is using a single "wave" of reducers. Basically, you want to set the
> number of reduce tasks to be equal to the number of reduce slots (assuming your job will
> run by itself).
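> 
> A minimal sketch of that setting, assuming a hypothetical cluster with 40 reduce
> slots (the figure is illustrative, not taken from this thread):
> 
> -- assumed: 40 reduce slots cluster-wide, with the job running alone
> set mapred.reduce.tasks=40;
> 
> With this override Hive uses the fixed count instead of estimating the number of
> reducers from the input size.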
> 
> It might also help to re-arrange your joins so that the larger table is streamed (https://cwiki.apache.org/Hive/languagemanual-joins.html).
> That seems especially important with map joins, since those fail if there is not enough
> memory and have to be rerun as regular joins.
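> 
> As a sketch of the join advice, using hypothetical tables big_t and small_t (the
> names are assumptions for illustration): Hive streams the last table in a join and
> buffers the others, so either list the larger table last or name it in a
> STREAMTABLE hint.
> 
> -- big_t is listed last, so it is the streamed table
> SELECT s.val, b.val FROM small_t s JOIN big_t b ON (s.key = b.key);
> 
> -- the hint streams big_t regardless of its position in the join
> SELECT /*+ STREAMTABLE(b) */ s.val, b.val FROM big_t b JOIN small_t s ON (b.key = s.key);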
> 
> Hope this helps.
> 
> On Mon, Jul 23, 2012 at 6:54 PM, abhiTowson cal <abhishek.dodda1@gmail.com> wrote:
> Hi all,
> 
> Some queries in Hive are taking too long to execute, so I have overridden
> some parameters. For some queries, performance improved sharply
> when I overrode these properties; for others, there was no change in
> performance. Can anyone
> tell me about any other optimizations in Hive apart from partitions and
> buckets?
> 
> set io.sort.mb=512;
> set io.sort.factor=100;
> set mapred.reduce.parallel.copies=40;
> set hive.map.aggr=true;
> set hive.exec.parallel=true;
> set hive.groupby.skewindata=true;
> set mapred.job.reuse.jvm.num.tasks=-1;
> 
> default values were
> 
> io.sort.mb=256;
> io.sort.factor=10;
> mapred.reduce.parallel.copies=10;
> 
> Thanks
> Abhishek
> 
