Subject: Re: Draining/Decommisioning a tasktracker
From: rishi pathak <mailmaverick666@gmail.com>
To: Koji Noguchi
Cc: common-user@hadoop.apache.org, mapreduce-user@hadoop.apache.org, phil.wills.young@gmail.com, qwertymaniac@gmail.com
Date: Mon, 31 Jan 2011 23:02:19 +0530
List: mapreduce-user@hadoop.apache.org

Still need to figure out whether a queue can be associated with a TT, i.e. a
TT ACL for a queue, such that tasks submitted to that queue are relayed only
to the TTs in the ACL list for that queue.

On Mon, Jan 31, 2011 at 10:51 PM, rishi pathak wrote:
> Hi Koji,
>         Thanks for opening the feature request. For the purpose stated
> earlier, I have upgraded Hadoop to 0.21 and am trying to see if creating
> individual leaf-level queues for every tasktracker, and changing their
> state to 'stopped' before expiry of the walltime, will do. Seems like it
> will work for now.
>
> P.S. - What credentials are required for commenting on an issue in Jira?
>
> On Mon, Jan 31, 2011 at 10:22 PM, Koji Noguchi <knoguchi@yahoo-inc.com> wrote:
>
>> Rishi,
>>
>> > Using exclude list for TT will not help as Koji has already mentioned
>> >
>> It'll help a bit, in the sense that no more tasks are assigned to that
>> TaskTracker once it is excluded.
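[For reference, the exclude-list mechanism discussed here is driven by JobTracker configuration. A minimal sketch — the exclude-file path and hostname are hypothetical choices, and the runtime refresh assumes a 0.21-era `mradmin`; older JTs may need a restart instead:]

```shell
# mapred-site.xml on the JobTracker (sketch):
#   <property>
#     <name>mapred.hosts.exclude</name>
#     <value>/etc/hadoop/mapred.exclude</value>
#   </property>

# Add the TaskTracker's host to the exclude file, then have the JT re-read it:
echo tt-node-01 >> /etc/hadoop/mapred.exclude
hadoop mradmin -refreshNodes
```

[This only stops new task assignments to the node; what happens to the map outputs already produced there is the open question.]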
>>
>> As for TT decommissioning and map-output handling, I opened a Jira for
>> further discussion:
>> https://issues.apache.org/jira/browse/MAPREDUCE-2291
>>
>> Koji
>>
>> On 1/29/11 5:37 AM, "rishi pathak" wrote:
>>
>> Hi,
>>     Here is a description of what we are trying to achieve (whether it
>> is possible or not is still not clear):
>> We have large computing clusters used mainly for MPI jobs. We use
>> PBS/Torque and Maui for resource allocation and scheduling.
>> At most times utilization is very high, except for very small resource
>> pockets of, say, 16 cores for 2-5 hrs. We are trying to establish the
>> feasibility of using these small (but fixed-size) resource pockets for
>> Nutch crawls. Our configuration is:
>>
>> # Hadoop 0.20.2 (packaged with Nutch)
>> # Lustre parallel filesystem for data storage
>> # No HDFS
>>
>> We have a JT running on one of the login nodes at all times. A request
>> for resources (nodes=16, walltime=05 hrs.) is made through the batch
>> system, and TTs are provisioned as part of the job. The problem is that
>> when a job expires, user processes are cleaned up and thus the TT gets
>> killed. With that, completed and running map/reduce tasks for the Nutch
>> job are killed and get rescheduled. The solutions as we see them:
>>
>> 1. As the filesystem is shared (and persistent), restart tasks on
>> another TT and make the intermediate task data available, i.e. a sort
>> of checkpointing.
>> 2. TT draining - based on a speculative time for task completion, a TT
>> whose walltime is nearing expiry goes into draining mode, i.e. no new
>> tasks are scheduled on that TT.
>>
>> For '1', it is very far-fetched (we are no Hadoop experts);
>> '2' seems the more sensible approach.
>>
>> Using an exclude list for the TT will not help, as Koji has already
>> mentioned. We looked into the capacity scheduler but didn't find any
>> pointers. Phil, what version of Hadoop has these hooks in the scheduler?
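[The per-queue 'stopped' state Rishi is experimenting with lives in `conf/mapred-queues.xml` in Hadoop 0.21. A sketch — the queue name is hypothetical, and note that stopping a queue refuses new *job submissions* to that queue; whether a queue can actually be tied to a specific TT is exactly the open question at the top of this mail:]

```xml
<!-- conf/mapred-queues.xml (Hadoop 0.21 format); queue name is hypothetical -->
<queues>
  <queue>
    <name>tt-node-01-queue</name>
    <!-- "running" accepts new jobs; "stopped" refuses them while
         already-submitted work continues -->
    <state>stopped</state>
  </queue>
</queues>
```

[The state can be flipped at runtime with `hadoop mradmin -refreshQueues`, without restarting the JT.]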
>>
>> On Sat, Jan 29, 2011 at 3:34 AM, phil young <phil.wills.young@gmail.com> wrote:
>>
>> There are some hooks available in the schedulers that could be useful
>> also. I think they were expected to be used to let you schedule tasks
>> based on load average on the host, but I'd expect you can customize
>> them for your purpose.
>>
>> On Fri, Jan 28, 2011 at 6:46 AM, Harsh J <qwertymaniac@gmail.com> wrote:
>>
>> > Moving discussion to the MapReduce-User list:
>> > mapreduce-user@hadoop.apache.org
>> >
>> > Reply inline:
>> >
>> > On Fri, Jan 28, 2011 at 2:39 PM, rishi pathak
>> > <mailmaverick666@gmail.com> wrote:
>> > > Hi,
>> > >         Is there a way to drain a tasktracker? What we require is
>> > > not to schedule any more map/red tasks onto a tasktracker (mark it
>> > > offline), but the running tasks should not be affected.
>> >
>> > You could simply shut the TT down. MapReduce was designed with faults
>> > in mind, and thus tasks that were running on a particular TaskTracker
>> > can be re-run elsewhere if they failed. Is this not usable in your
>> > case?
>> >
>> > --
>> > Harsh J
>> > www.harshj.com
>
> --
> Rishi Pathak
> National PARAM Supercomputing Facility
> C-DAC, Pune, India

--
Rishi Pathak
National PARAM Supercomputing Facility
C-DAC, Pune, India
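[Rishi's option '2' (TT draining keyed to walltime expiry) reduces to a simple time comparison that a scheduler hook or wrapper script could make. A minimal sketch under stated assumptions — the helper and its inputs are invented for illustration and are not part of Hadoop or PBS:]

```shell
#!/bin/sh
# Hypothetical TT-side drain check (names invented for illustration).
# Decide whether a TaskTracker running inside a PBS job should accept
# another task, given the job's walltime expiry and a speculative
# per-task running time.

should_drain() {
    now=$1             # current time, epoch seconds
    walltime_end=$2    # epoch seconds at which the PBS job expires
    task_estimate=$3   # speculative seconds a new task would need
    remaining=$(( walltime_end - now ))
    if [ "$remaining" -le "$task_estimate" ]; then
        echo drain     # too little walltime left: stop taking new tasks
    else
        echo accept
    fi
}

should_drain 1000 1500 600    # 500s left <= 600s needed -> prints "drain"
should_drain 1000 1500 300    # 500s left >  300s needed -> prints "accept"
```

[In practice `now` would come from `date +%s` and the expiry would be recorded when the batch job starts, from its requested walltime.]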