hadoop-hdfs-user mailing list archives

From bejoy.had...@gmail.com
Subject Re: How to balance reduce job
Date Wed, 17 Apr 2013 06:18:53 GMT
Yes, that is a valid point.

The partitioner might distribute keys non-uniformly, so reducers can be unevenly loaded.

But this doesn't change the number of reducers or their distribution across nodes. The underlying
issue, as I understand it, is that his reduce tasks are scheduled on just a few nodes.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: Ajay Srivastava <Ajay.Srivastava@guavus.com>
Date: Wed, 17 Apr 2013 06:02:30 
To: <user@hadoop.apache.org>; <bejoy.hadoop@gmail.com>
Reply-To: user@hadoop.apache.org
Cc: Mohammad Tariq<dontariq@gmail.com>
Subject: Re: How to balance reduce job

Tariq probably meant the distribution of keys in the <key, value> pairs emitted by the mapper.
The partitioner distributes these pairs to different reducers based on the key. If the data is
such that the keys are skewed, then most of the records may go to the same reducer.
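
To make the skew effect concrete, here is a minimal sketch in Python, not Hadoop code. It assumes a partitioner that takes a hash of the key modulo the number of reducers, which is the general idea behind Hadoop's default HashPartitioner. The CRC32 hash, the key names, and the 90% hot-key ratio are illustrative assumptions.

```python
import zlib
from collections import Counter


def partition(key: str, num_reducers: int) -> int:
    # Stand-in for a hash partitioner: stable hash of the key modulo reducer count.
    # CRC32 is an assumption here; Hadoop uses the key's own hashCode().
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(keys, num_reducers):
    # Count how many records each reducer would receive.
    counts = Counter(partition(key, num_reducers) for key in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


# Hypothetical skewed input: one hot key accounts for 900 of 1000 records.
skewed_keys = ["hot"] * 900 + [f"key{i}" for i in range(100)]
loads = reducer_loads(skewed_keys, 8)
print(loads)
```

Whatever reducer the hot key hashes to receives at least 900 of the 1000 records, so adding reducers alone does not even out the load when keys are skewed.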



Regards,
Ajay Srivastava


On 17-Apr-2013, at 11:08 AM, <bejoy.hadoop@gmail.com> wrote:


Uniform data distribution across HDFS is one of the factors that ensures map tasks are uniformly
distributed across nodes. But reduce task placement doesn't depend on data distribution; it is
purely based on slot availability.
Regards
Bejoy KS

Sent from remote device, Please excuse typos
________________________________
From: Mohammad Tariq <dontariq@gmail.com>
Date: Wed, 17 Apr 2013 10:46:27 +0530
To: user@hadoop.apache.org; Bejoy Ks <bejoy.hadoop@gmail.com>
Subject: Re: How to balance reduce job

Just to add to Bejoy's comments, it also depends on the data distribution. Is your data properly
distributed across the HDFS?

Warm Regards,
Tariq
https://mtariq.jux.com/
http://cloudfront.blogspot.com/


On Wed, Apr 17, 2013 at 10:39 AM, <bejoy.hadoop@gmail.com> wrote:
Hi Rauljin

A few things to check here.
What is the number of reduce slots in each TaskTracker (TT)? What is the number of reduce tasks
for your job?
Reduce tasks are scheduled on TTs based on the availability of slots.

You can do the following:
Set the number of reduce tasks to 8 or more.
Adjust the number of slots (though tweaking this at the job level is not very advisable).

The reducers are scheduled purely based on slot availability, so it won't be that easy
to ensure that all TTs are evenly loaded with the same number of reducers.
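
The effect of slot-based placement can be illustrated with a toy model in Python. This is not the actual JobTracker scheduler, which assigns tasks as TaskTrackers send heartbeats. The model simply assumes trackers report in a fixed order and each takes pending reduce tasks up to its free slots. The figure of 2 reduce slots per node is an assumption, as is the function name.

```python
def assign_reduce_tasks(num_nodes: int, slots_per_node: int, num_tasks: int):
    """Toy model: each node, in order, takes tasks up to its free reduce slots."""
    assigned = [0] * num_nodes
    remaining = num_tasks
    for node in range(num_nodes):
        take = min(slots_per_node, remaining)
        assigned[node] = take
        remaining -= take
    # Any tasks still remaining would wait until a slot frees up.
    return assigned


# 8 nodes with an assumed 2 reduce slots each.
two_tasks = assign_reduce_tasks(num_nodes=8, slots_per_node=2, num_tasks=2)
eight_tasks = assign_reduce_tasks(num_nodes=8, slots_per_node=2, num_tasks=8)
sixteen_tasks = assign_reduce_tasks(num_nodes=8, slots_per_node=2, num_tasks=16)

for label, plan in [("2", two_tasks), ("8", eight_tasks), ("16", sixteen_tasks)]:
    busy = sum(1 for n in plan if n > 0)
    print(f"{label} reduce tasks -> {busy} busy nodes: {plan}")
```

In this model a job with 2 reduce tasks can never occupy more than 2 nodes, and even 8 tasks may land on only 4 nodes if each has 2 free slots. Only when the task count reaches the total slot count is every node guaranteed work. In Hadoop 1.x the reduce count is normally set with the `mapred.reduce.tasks` property or `Job.setNumReduceTasks`, and the per-TaskTracker slot count with `mapred.tasktracker.reduce.tasks.maximum`.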
Regards
Bejoy KS

Sent from remote device, Please excuse typos
________________________________
From: rauljin <liujin666jin@sina.com>
Date: Wed, 17 Apr 2013 12:53:37 +0800
To: user@hadoop.apache.org
ReplyTo: user@hadoop.apache.org
Subject: How to balance reduce job

I have 8 datanodes in my Hadoop cluster, but when running a reduce job only 2 datanodes
run it.

I want to use all 8 datanodes to run the reduce job, so I can balance the I/O pressure.

Any ideas?

Thanks.

________________________________
rauljin


