hadoop-mapreduce-user mailing list archives

From shashwat shriparv <dwivedishash...@gmail.com>
Subject Re: How to balance reduce job
Date Tue, 07 May 2013 15:58:03 GMT
The number of reducers running depends on the data available.

Thanks & Regards

∞
Shashwat Shriparv



On Tue, May 7, 2013 at 8:43 PM, Tony Burton <TBurton@sportingindex.com> wrote:

> The typical Partitioner method for assigning reducer r from reducers R is
>
> r = hash(key) % count(R)
>
> However, if you find your partitioner is assigning your data to too few
> reducers (or just one), I found that changing count(R) to the next odd
> number, or (even better) the next prime number above count(R), is a good
> rule of thumb to follow.
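To illustrate why the odd/prime trick can help, here is a standalone sketch (not code from the thread; the keys sharing a common factor of 10 are a hypothetical worst case):

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionSkewDemo {
    // Simulate r = hash(key) % count(R) for integer keys that all share
    // a common factor of 10, and count how many reducers receive data.
    static Map<Integer, Integer> distribute(int numReducers) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int id = 0; id < 1000; id++) {
            int key = id * 10;  // every key is a multiple of 10
            int r = (Integer.hashCode(key) & Integer.MAX_VALUE) % numReducers;
            counts.merge(r, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // With 10 reducers, every key lands on reducer 0 (10 divides each key).
        System.out.println("R=10: " + distribute(10).size() + " reducers used");
        // With 11 (prime), the same keys spread across all reducers,
        // because 11 shares no factor with the key stride.
        System.out.println("R=11: " + distribute(11).size() + " reducers used");
    }
}
```

The prime count works because a prime shares no divisor with whatever stride or common factor the keys happen to have.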
>
>
> Tony
>
>
> *From:* bejoy.hadoop@gmail.com [mailto:bejoy.hadoop@gmail.com]
> *Sent:* 17 April 2013 07:19
> *To:* user@hadoop.apache.org
> *Cc:* Mohammad Tariq
>
> *Subject:* Re: How to balance reduce job
>
>
> Yes, that is a valid point.
>
> The partitioner might distribute keys non-uniformly, so the reducers can be
> unevenly loaded.
>
> But this doesn't change the number of reducers or their distribution across
> nodes. The underlying issue, as I understand it, is that his reduce tasks
> are scheduled on just a few nodes.
>
> Regards
> Bejoy KS
>
> Sent from remote device, please excuse typos
> ------------------------------
>
> *From:* Ajay Srivastava <Ajay.Srivastava@guavus.com>
>
> *Date:* Wed, 17 Apr 2013 06:02:30 +0000
>
> *To:* user@hadoop.apache.org; bejoy.hadoop@gmail.com
>
> *Reply-To:* user@hadoop.apache.org
>
> *Cc:* Mohammad Tariq <dontariq@gmail.com>
>
> *Subject:* Re: How to balance reduce job
>
>
> Tariq probably meant the distribution of the keys from the <key, value>
> pairs emitted by the mapper.
>
> The partitioner distributes these pairs to different reducers based on the
> key. If the data is such that the keys are skewed, then most of the records
> may go to the same reducer.
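Hadoop's default hash partitioner uses exactly this keyed assignment; the self-contained mimic below (the 90%-hot-key stream is a made-up illustration, not data from the thread) shows how a skewed key sends nearly all records to one reducer:

```java
import java.util.HashMap;
import java.util.Map;

public class KeySkewDemo {
    // Same logic as Hadoop's default HashPartitioner:
    // (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Hypothetical skewed stream: 90% of the records share one key.
        Map<Integer, Integer> load = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            String key = (i < 90) ? "hot-key" : "key-" + i;
            load.merge(getPartition(key, 8), 1, Integer::sum);
        }
        // Whichever partition "hot-key" hashes to receives at least 90 records.
        System.out.println(load);
    }
}
```

No choice of reducer count fixes this kind of skew; it calls for a custom partitioner or a different key design.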
>
>
> Regards,
>
> Ajay Srivastava
>
> On 17-Apr-2013, at 11:08 AM, <bejoy.hadoop@gmail.com> wrote:
>
> Uniform data distribution across HDFS is one of the factors that ensures
> map tasks are uniformly distributed across nodes. But reduce tasks don't
> depend on data distribution; their scheduling is purely based on slot
> availability.
>
> Regards
> Bejoy KS
>
> Sent from remote device, please excuse typos
> ------------------------------
>
> *From:* Mohammad Tariq <dontariq@gmail.com>
>
> *Date:* Wed, 17 Apr 2013 10:46:27 +0530
>
> *To:* user@hadoop.apache.org; Bejoy Ks <bejoy.hadoop@gmail.com>
>
> *Subject:* Re: How to balance reduce job
>
>
> Just to add to Bejoy's comments, it also depends on the data distribution.
> Is your data properly distributed across the HDFS?
>
> Warm Regards,
>
> Tariq
>
> https://mtariq.jux.com/
>
> cloudfront.blogspot.com
>
>
> On Wed, Apr 17, 2013 at 10:39 AM, <bejoy.hadoop@gmail.com> wrote:
>
> Hi Rauljin
>
> A few things to check here.
> What is the number of reduce slots in each TaskTracker? What is the number
> of reduce tasks for your job?
> Based on the availability of slots, the reduce tasks are scheduled on TTs.
>
> You can do the following:
> Set the number of reduce tasks to 8 or more.
> Play with the number of slots (not very advisable to tweak this at a job
> level).
>
> The reducers are scheduled purely based on slot availability, so it won't
> be that easy to ensure that all TTs are evenly loaded with the same number
> of reducers.
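The first suggestion can be sketched as a driver-side fragment (MRv1-era API from this thread's time frame; the driver class and job name are hypothetical, and this needs the Hadoop jars on the classpath, so it is illustrative rather than a complete runnable program):

```java
// Inside a hypothetical MRv1-era driver, after creating the Job:
Job job = new Job(new Configuration(), "balance-demo");
job.setNumReduceTasks(8);  // request 8 reduce tasks, one per datanode

// Or at submission time, without recompiling (if the driver uses ToolRunner):
//   hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=8 /input /output
```

Note this only sets how many reduce tasks exist; as the reply says, where they run still depends on slot availability.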
>
> Regards
> Bejoy KS
>
> Sent from remote device, please excuse typos
> ------------------------------
>
> *From:* rauljin <liujin666jin@sina.com>
>
> *Date:* Wed, 17 Apr 2013 12:53:37 +0800
>
> *To:* user@hadoop.apache.org
>
> *Reply-To:* user@hadoop.apache.org
>
> *Subject:* How to balance reduce job
>
>
> There are 8 datanodes in my Hadoop cluster; when running a reduce job, only
> 2 datanodes run the job.
>
> I want to use all 8 datanodes to run the reduce job, so I can balance the
> I/O pressure.
>
> Any ideas?
>
> Thanks.
>
> ------------------------------
>
> rauljin
>
