hadoop-user mailing list archives

From "Bejoy KS" <bejoy.had...@gmail.com>
Subject Re: guessing number of reducers.
Date Wed, 21 Nov 2012 16:50:32 GMT
Hi Sasha

In general, the number of reduce tasks is chosen mainly based on the volume of data going into
the reduce phase. In tools like Hive and Pig, by default there is one reducer for every 1 GB of
map output, so if you have 100 GB of map output you get 100 reducers.
If your tasks are more CPU-intensive, then you need a smaller volume of data per reducer for
better performance.
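
As a rough sketch of that rule of thumb (the helper name `estimate_reducers` is made up for illustration and is not a Hive or Pig API; the 1 GB figure is the one quoted above, and the real defaults and what they measure depend on the tool and version):

```python
GB = 10**9  # assumed size of "1 GB"; the real default unit depends on the tool and version


def estimate_reducers(data_bytes, bytes_per_reducer=GB):
    """Hypothetical helper: one reducer per `bytes_per_reducer`, never fewer than one."""
    if data_bytes < 0 or bytes_per_reducer <= 0:
        raise ValueError("data_bytes must be >= 0 and bytes_per_reducer must be > 0")
    # Integer ceiling division, so a partial chunk still gets its own reducer.
    return max(1, (data_bytes + bytes_per_reducer - 1) // bytes_per_reducer)


print(estimate_reducers(100 * GB))           # 100 GB at 1 GB per reducer -> 100
print(estimate_reducers(100 * GB, GB // 4))  # CPU-heavy job, 250 MB per reducer -> 400
```

For a CPU-heavy job you would pass a smaller `bytes_per_reducer`, which raises the reducer count for the same data volume.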

In general, it is better to keep the number of reduce tasks slightly below the number of
available reduce slots in the cluster, so that all reducers can run in a single wave.
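
A minimal sketch of that cap, assuming you already know how many reduce slots the cluster has (the helper name `cap_to_slots` and the 95% headroom are illustrative choices, not Hadoop settings):

```python
def cap_to_slots(desired_reducers, reduce_slots, headroom_percent=95):
    """Hypothetical helper: keep the reducer count slightly below the available slots."""
    if desired_reducers < 1 or reduce_slots < 1:
        raise ValueError("desired_reducers and reduce_slots must be at least 1")
    # Leave a little headroom so every reducer can start in the first wave.
    limit = max(1, reduce_slots * headroom_percent // 100)
    return min(desired_reducers, limit)


print(cap_to_slots(100, 200))  # fewer reducers than slots: unchanged -> 100
print(cap_to_slots(500, 200))  # more reducers than slots: capped -> 190
```

The result is what you would then set as the job's reducer count; if the data volume calls for more reducers than the cap allows, the extra work is spread over the reducers you have.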

Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: jamal sasha <jamalshasha@gmail.com>
Date: Wed, 21 Nov 2012 11:38:38 
To: user@hadoop.apache.org<user@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: guessing number of reducers.

By default the number of reducers is set to 1.
Is there a good way to guess the optimal number of reducers?
Let's say I have TBs worth of data and the mappers are of the order of 5000 or so,
but ultimately I am calculating, let's say, some average over the whole data,
say the average transaction occurring.
Now the output will be just one line in one "part"; the rest of them will be
empty. So I am guessing I need loads of reducers, but then most of them will
be empty, while at the same time one reducer won't suffice.
What's the best way to solve this?
How do I guess the optimal number of reducers?
