From: Bejoy Ks <bejoy.hadoop@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 3 Sep 2012 21:16:59 +0530
Subject: Re: knowing the nodes on which reduce tasks will run

Hi Abhay

You need to change this value before you submit your job, and the TaskTracker has to be restarted for it to take effect. Modifying the value mid-run won't affect jobs that are already running.

On Mon, Sep 3, 2012 at 9:06 PM, Abhay Ratnaparkhi <abhay.ratnaparkhi@gmail.com> wrote:

> How can I set 'mapred.tasktracker.reduce.tasks.maximum' to "0" on a
> running tasktracker?
> It seems that I need to restart the tasktracker, and in that case I'll lose
> the map output already produced by that tasktracker.
>
> Can I change 'mapred.tasktracker.reduce.tasks.maximum' to "0" without
> restarting the tasktracker?
>
> ~Abhay
>
> On Mon, Sep 3, 2012 at 8:53 PM, Bejoy Ks <bejoy.hadoop@gmail.com> wrote:
>
>> Hi Abhay
>>
>> The TaskTrackers on which reduce tasks are launched are chosen at
>> random, based on reduce slot availability. So if you don't want reduce
>> tasks scheduled on particular nodes, set
>> 'mapred.tasktracker.reduce.tasks.maximum' to 0 on those nodes. The
>> limitation here is that this property is not a per-job setting; you have
>> to set it at the cluster level, node by node.
>>
>> A cleaner approach is to configure each of your nodes with the right
>> number of map and reduce slots based on the resources available on each
>> machine.
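As a rough sketch of the per-node change being described (assuming Hadoop 1.x / MRv1, where this property lives in mapred-site.xml; a TaskTracker restart is still required for it to take effect):

```xml
<!-- mapred-site.xml on each node that should run NO reduce tasks.
     Sketch only: mapred.tasktracker.reduce.tasks.maximum controls the
     number of reduce slots this TaskTracker advertises; 0 means the
     JobTracker will never schedule reduces here. -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>0</value>
</property>
```

Map slots on the same node are controlled separately by `mapred.tasktracker.map.tasks.maximum`, so a node configured this way can still run map tasks.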
>>
>> On Mon, Sep 3, 2012 at 7:49 PM, Abhay Ratnaparkhi <abhay.ratnaparkhi@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> How can one find out which nodes the reduce tasks will run on?
>>>
>>> One of my jobs is running and has almost completed all of its map tasks.
>>> My map tasks write a lot of intermediate data, and the intermediate
>>> directory is filling up on all the nodes.
>>> If a reduce task is assigned to any of the existing nodes in the cluster,
>>> it will try to copy the map output onto those same full disks and will
>>> eventually fail with disk-space-related exceptions.
>>>
>>> I have added a few more tasktracker nodes to the cluster and now want to
>>> run the reducers on the new nodes only.
>>> Is it possible to choose the node on which a reducer will run? What
>>> algorithm does Hadoop use to pick a node for a reducer?
>>>
>>> Thanks in advance.
>>>
>>> Bye
>>> Abhay