Subject: Re: Draining/Decommisioning a tasktracker
From: rishi pathak <mailmaverick666@gmail.com>
To: Koji Noguchi
Cc: common-user@hadoop.apache.org, mapreduce-user@hadoop.apache.org, phil.wills.young@gmail.com, qwertymaniac@gmail.com
Date: Mon, 31 Jan 2011 23:02:19 +0530
List: mapreduce-user@hadoop.apache.org

Still need to figure out whether a queue can be associated with a TT, i.e. a
TT ACL for a queue, such that tasks submitted to that queue are relayed only
to the TTs in the ACL list for that queue.

On Mon, Jan 31, 2011 at 10:51 PM, rishi pathak wrote:
> Hi Koji,
>         Thanks for opening the feature request. For the purpose stated
> earlier, I have upgraded Hadoop to 0.21 and am trying to see if creating
> individual leaf-level queues for every tasktracker, and changing their
> state to 'stopped' before expiry of the walltime, will do. Seems like it
> will work for now.
>
> P.S. - What credentials are required for commenting on an issue in Jira?
>
> On Mon, Jan 31, 2011 at 10:22 PM, Koji Noguchi <knoguchi@yahoo-inc.com> wrote:
>
>> Rishi,
>>
>> > Using exclude list for TT will not help as Koji has already mentioned
>> >
>> It'll help a bit, in the sense that no more tasks are assigned to that
>> TaskTracker once it is excluded.
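[For reference, the exclude-list mechanism discussed here is driven by JobTracker configuration. A minimal sketch — the exclude-file path and hostname are hypothetical choices, and the runtime refresh assumes a 0.21-era `mradmin`; older JTs may need a restart instead:]

```shell
# mapred-site.xml on the JobTracker (sketch):
#   <property>
#     <name>mapred.hosts.exclude</name>
#     <value>/etc/hadoop/mapred.exclude</value>
#   </property>

# Add the TaskTracker's host to the exclude file, then have the JT re-read it:
echo tt-node-01 >> /etc/hadoop/mapred.exclude
hadoop mradmin -refreshNodes
```

[This only stops new task assignments to the node; what happens to the map outputs already produced there is the open question.]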
>>
>> As for TT decommissioning and map-output handling, I opened a Jira for
>> further discussion:
>> https://issues.apache.org/jira/browse/MAPREDUCE-2291
>>
>> Koji
>>
>> On 1/29/11 5:37 AM, "rishi pathak" wrote:
>>
>> Hi,
>>     Here is a description of what we are trying to achieve (whether it
>> is possible or not is still not clear):
>> We have large computing clusters used mainly for MPI jobs. We use
>> PBS/Torque and Maui for resource allocation and scheduling.
>> At most times utilization is very high, except for very small resource
>> pockets of, say, 16 cores for 2-5 hrs. We are trying to establish the
>> feasibility of using these small (but fixed-size) resource pockets for
>> Nutch crawls. Our configuration is:
>>
>> # Hadoop 0.20.2 (packaged with Nutch)
>> # Lustre parallel filesystem for data storage
>> # No HDFS
>>
>> We have a JT running on one of the login nodes at all times. A request
>> for resources (nodes=16, walltime=05 hrs.) is made through the batch
>> system, and TTs are provisioned as part of the job. The problem is that
>> when a job expires, user processes are cleaned up and thus the TT gets
>> killed. With that, completed and running map/reduce tasks for the Nutch
>> job are killed and get rescheduled. The solutions as we see them:
>>
>> 1. As the filesystem is shared (and persistent), restart tasks on
>> another TT and make the intermediate task data available, i.e. a sort
>> of checkpointing.
>> 2. TT draining - based on a speculative time for task completion, a TT
>> whose walltime is nearing expiry goes into draining mode, i.e. no new
>> tasks are scheduled on that TT.
>>
>> For '1', it is very far-fetched (we are no Hadoop experts);
>> '2' seems the more sensible approach.
>>
>> Using an exclude list for the TT will not help, as Koji has already
>> mentioned. We looked into the capacity scheduler but didn't find any
>> pointers. Phil, what version of Hadoop has these hooks in the scheduler?
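[The per-queue 'stopped' state Rishi is experimenting with lives in `conf/mapred-queues.xml` in Hadoop 0.21. A sketch — the queue name is hypothetical, and note that stopping a queue refuses new *job submissions* to that queue; whether a queue can actually be tied to a specific TT is exactly the open question at the top of this mail:]

```xml
<!-- conf/mapred-queues.xml (Hadoop 0.21 format); queue name is hypothetical -->
<queues>
  <queue>
    <name>tt-node-01-queue</name>
    <!-- "running" accepts new jobs; "stopped" refuses them while
         already-submitted work continues -->
    <state>stopped</state>
  </queue>
</queues>
```

[The state can be flipped at runtime with `hadoop mradmin -refreshQueues`, without restarting the JT.]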
>>
>> On Sat, Jan 29, 2011 at 3:34 AM, phil young <phil.wills.young@gmail.com> wrote:
>>
>> There are some hooks available in the schedulers that could be useful
>> also. I think they were expected to be used to let you schedule tasks
>> based on load average on the host, but I'd expect you can customize
>> them for your purpose.
>>
>> On Fri, Jan 28, 2011 at 6:46 AM, Harsh J <qwertymaniac@gmail.com> wrote:
>>
>> > Moving discussion to the MapReduce-User list:
>> > mapreduce-user@hadoop.apache.org
>> >
>> > Reply inline:
>> >
>> > On Fri, Jan 28, 2011 at 2:39 PM, rishi pathak
>> > <mailmaverick666@gmail.com> wrote:
>> > > Hi,
>> > >         Is there a way to drain a tasktracker? What we require is
>> > > not to schedule any more map/red tasks onto a tasktracker (mark it
>> > > offline), but the running tasks should not be affected.
>> >
>> > You could simply shut the TT down. MapReduce was designed with faults
>> > in mind, and thus tasks that were running on a particular TaskTracker
>> > can be re-run elsewhere if they failed. Is this not usable in your
>> > case?
>> >
>> > --
>> > Harsh J
>> > www.harshj.com
>
> --
> Rishi Pathak
> National PARAM Supercomputing Facility
> C-DAC, Pune, India

--
Rishi Pathak
National PARAM Supercomputing Facility
C-DAC, Pune, India
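[Rishi's option '2' (TT draining keyed to walltime expiry) reduces to a simple time comparison that a scheduler hook or wrapper script could make. A minimal sketch under stated assumptions — the helper and its inputs are invented for illustration and are not part of Hadoop or PBS:]

```shell
#!/bin/sh
# Hypothetical TT-side drain check (names invented for illustration).
# Decide whether a TaskTracker running inside a PBS job should accept
# another task, given the job's walltime expiry and a speculative
# per-task running time.

should_drain() {
    now=$1             # current time, epoch seconds
    walltime_end=$2    # epoch seconds at which the PBS job expires
    task_estimate=$3   # speculative seconds a new task would need
    remaining=$(( walltime_end - now ))
    if [ "$remaining" -le "$task_estimate" ]; then
        echo drain     # too little walltime left: stop taking new tasks
    else
        echo accept
    fi
}

should_drain 1000 1500 600    # 500s left <= 600s needed -> prints "drain"
should_drain 1000 1500 300    # 500s left >  300s needed -> prints "accept"
```

[In practice `now` would come from `date +%s` and the expiry would be recorded when the batch job starts, from its requested walltime.]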