hadoop-hdfs-user mailing list archives

From Tony Burton <TBur...@SportingIndex.com>
Subject RE: How to balance reduce job
Date Tue, 07 May 2013 15:13:14 GMT

The typical Partitioner method for assigning reducer r from reducers R is

r = hash(key) % count(R)

However, if you find your partitioner is assigning your data to too few reducers, or to just
one, I found that changing count(R) to the next odd number or (even better) the next prime
number above count(R) is a good rule of thumb to follow.
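
To illustrate why the choice of count(R) matters in the formula above, here is a minimal
sketch in plain Python (not Hadoop code). It assumes a hypothetical key set whose hash
values are all multiples of 4, and compares 8 reducers with 11, the next prime above 8. The
names partition and reducer_load are made up for this example.

```python
from collections import Counter

def partition(key_hash: int, num_reducers: int) -> int:
    # r = hash(key) % count(R); the mask keeps the hash non-negative.
    return (key_hash & 0x7FFFFFFF) % num_reducers

def reducer_load(key_hashes, num_reducers):
    # Count how many keys land on each reducer index.
    return Counter(partition(h, num_reducers) for h in key_hashes)

# Assumption: 1000 keys whose hash values are all multiples of 4.
key_hashes = range(0, 4000, 4)

load_8 = reducer_load(key_hashes, 8)    # 8 shares the factor 4 with every hash
load_11 = reducer_load(key_hashes, 11)  # 11 is prime, so it shares no factor

print(sorted(load_8.items()))  # [(0, 500), (4, 500)]
print(len(load_11))            # 11
```

With 8 reducers only 2 of them receive data, because every hash shares a factor with the
reducer count. With 11 reducers all 11 receive between 90 and 91 keys each. This only helps
when the hash values have that kind of regular pattern.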


From: bejoy.hadoop@gmail.com [mailto:bejoy.hadoop@gmail.com]
Sent: 17 April 2013 07:19
To: user@hadoop.apache.org
Cc: Mohammad Tariq
Subject: Re: How to balance reduce job

Yes, that is a valid point.

The partitioner might produce a non-uniform distribution, and reducers can be unevenly loaded.

But this doesn't change the number of reducers or their distribution across nodes. The underlying
issue, as I understand it, is that his reduce tasks are scheduled on just a few nodes.
Bejoy KS

Sent from remote device, Please excuse typos
From: Ajay Srivastava <Ajay.Srivastava@guavus.com>
Date: Wed, 17 Apr 2013 06:02:30 +0000
To: <user@hadoop.apache.org>
ReplyTo: user@hadoop.apache.org
Cc: Mohammad Tariq <dontariq@gmail.com>
Subject: Re: How to balance reduce job

Tariq probably meant the distribution of keys in the <key, value> pairs emitted by the mapper.
The partitioner distributes these pairs to different reducers based on the key. If the keys
are skewed, then most of the records may go to the same reducer.

Ajay Srivastava
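
The key skew described above can be sketched as follows. This is plain Python, not Hadoop
code, and it assumes a hypothetical data set in which 900 of 1000 records share the key
"hot". The hash function imitates the style of Java's String.hashCode; java_string_hash,
partition and busiest_share are names invented for this example.

```python
from collections import Counter

def java_string_hash(s: str) -> int:
    # 32-bit hash in the style of Java's String.hashCode: h = 31*h + char.
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

def partition(key: str, num_reducers: int) -> int:
    # Mask to a non-negative value, then take the remainder.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers

def busiest_share(keys, num_reducers):
    # Fraction of all records received by the most heavily loaded reducer.
    load = Counter(partition(k, num_reducers) for k in keys)
    return max(load.values()) / len(keys)

# Assumption: 900 records with one hot key, 100 records with distinct keys.
records = ["hot"] * 900 + [f"key-{i}" for i in range(100)]

share_8 = busiest_share(records, 8)
share_11 = busiest_share(records, 11)
print(share_8, share_11)
```

Both shares are at least 0.9, because all records with the same key must go to the same
reducer. Changing the number of reducers does not help with this kind of skew.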

On 17-Apr-2013, at 11:08 AM, <bejoy.hadoop@gmail.com> wrote:

Uniform data distribution across HDFS is one of the factors that ensure map tasks are uniformly
distributed across nodes. But reduce task placement doesn't depend on data distribution; it is
purely based on slot availability.
Bejoy KS

Sent from remote device, Please excuse typos
From: Mohammad Tariq <dontariq@gmail.com>
Date: Wed, 17 Apr 2013 10:46:27 +0530
To: user@hadoop.apache.org; Bejoy Ks <bejoy.hadoop@gmail.com>
Subject: Re: How to balance reduce job

Just to add to Bejoy's comments, it also depends on the data distribution. Is your data evenly
distributed across HDFS?

Warm Regards,

On Wed, Apr 17, 2013 at 10:39 AM, <bejoy.hadoop@gmail.com> wrote:
Hi Rauljin

A few things to check here:
What is the number of reduce slots in each TaskTracker (TT)? What is the number of reduce tasks
for your job?
Reduce tasks are scheduled on TTs based on the availability of slots.

You can do the following:
1. Set the number of reduce tasks to 8 or more.
2. Play with the number of slots (though tweaking this at the job level is not very advisable).

The reducers are scheduled purely based on slot availability, so it won't be easy to ensure
that all TTs are evenly loaded with the same number of reducers.
Bejoy KS
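
The slot arithmetic behind this advice can be sketched as follows. This is a toy calculation
in Python, not the actual scheduler logic. It assumes that all reduce tasks of the job run at
the same time and that every TaskTracker has the same number of reduce slots. The function
name and the value of 2 slots per node are hypothetical.

```python
import math

def nodes_running_reducers(num_reduce_tasks, num_nodes, reduce_slots_per_node):
    # Return (fewest, most) nodes that can host the tasks when all run at once.
    capacity = num_nodes * reduce_slots_per_node
    if num_reduce_tasks > capacity:
        raise ValueError("more tasks than slots; they cannot all run concurrently")
    # Fewest nodes: the scheduler fills every slot of a node before using the next.
    fewest = math.ceil(num_reduce_tasks / reduce_slots_per_node)
    # Most nodes: every task lands on a different node, as far as nodes allow.
    most = min(num_reduce_tasks, num_nodes)
    return fewest, most

# Assumption: 8 nodes with 2 reduce slots each.
print(nodes_running_reducers(2, 8, 2))   # (1, 2)
print(nodes_running_reducers(8, 8, 2))   # (4, 8)
print(nodes_running_reducers(16, 8, 2))  # (8, 8)
```

Under these assumptions, a job with 2 reduce tasks can never use more than 2 nodes, and a job
with 8 reduce tasks may still end up on only 4 nodes. Only when the tasks fill every slot is
it certain that all 8 nodes run reducers.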

Sent from remote device, Please excuse typos
From: rauljin <liujin666jin@sina.com>
Date: Wed, 17 Apr 2013 12:53:37 +0800
To: user@hadoop.apache.org
ReplyTo: user@hadoop.apache.org
Subject: How to balance reduce job

There are 8 datanodes in my Hadoop cluster, but when running a reduce job, only 2 datanodes
run the job.

I want to use all 8 datanodes to run the reduce job, so that I can balance the I/O pressure.

Any ideas?


