hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4085) Kill task attempts longer than a configured queue max time
Date Fri, 30 Mar 2012 14:46:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242391#comment-13242391 ]

Robert Joseph Evans commented on MAPREDUCE-4085:
------------------------------------------------

I can see the need for something like this, to ensure that new jobs can run and meet their
SLAs, but I think it would be better to make it part of a preemption-like mechanism, where
we let tasks run until some other task/container (for MRv2) is requested. Once those
resources are needed, and only if the current task/container has gone over the configured
limit, the JT/RM can tell the TT/NM on the next heartbeat to kill the task/container.
The Fair Scheduler already supports preemption, so perhaps this could be added there.
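
The heartbeat-time decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not JobTracker/ResourceManager code: `RunningTask`, `tasks_to_preempt`, the per-queue limit dictionary, and the "oldest over-limit task first, at most one kill per pending request" policy are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunningTask:
    # Hypothetical record of a running task attempt (not a Hadoop class).
    attempt_id: str
    queue: str
    start_time_s: float


def tasks_to_preempt(running: List[RunningTask],
                     queue_limit_s: Dict[str, float],
                     pending_requests: int,
                     now_s: float) -> List[str]:
    """Return the attempt ids to kill on this heartbeat.

    Over-limit tasks are killed only when other work is waiting for
    resources, and at most one task per pending request, oldest first.
    """
    if pending_requests <= 0:
        # Nobody needs the slots: let over-limit tasks keep running.
        return []
    over_limit = [
        t for t in running
        if t.queue in queue_limit_s
        and now_s - t.start_time_s > queue_limit_s[t.queue]
    ]
    # Preempt the longest-running tasks first.
    over_limit.sort(key=lambda t: t.start_time_s)
    return [t.attempt_id for t in over_limit[:pending_requests]]
```

For example, with a 600-second limit on a hypothetical `sla` queue, a task started at t=0 is killed at t=1000 only if something is pending; a task started at t=500 is still within its limit, and a task in a queue with no limit is never chosen.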

MAPREDUCE-3938 was filed to add preemption to the Capacity Scheduler for 2.0 and it might
be good to add this in as part of the design there.

I don't really like the idea of having a hard limit on the runtime. What is more, if there
is a hard limit on how long a task can run, I see very little benefit in rescheduling it
more than once. If it was a slow node, then OK, we can pick another node and it might finish
in time; but unless the cluster is very heterogeneous, the task is just going to run to the
maximum time limit 4 times and then the job will be failed.
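
The cost of rescheduling under a hard limit can be made concrete with a small simulation. This is a sketch under assumptions: `run_with_hard_limit` is a hypothetical helper, each entry of `runtimes_s` is the time an attempt would need on the node it lands on, and `max_attempts=4` mirrors the 4 attempts mentioned in the comment.

```python
from typing import List, Tuple


def run_with_hard_limit(runtimes_s: List[float],
                        limit_s: float,
                        max_attempts: int = 4) -> Tuple[bool, float]:
    """Simulate a task rescheduled under a hard per-attempt time limit.

    runtimes_s[i] is how long attempt i would need on its node.
    Returns (succeeded, slot_seconds_consumed).
    """
    used = 0.0
    for runtime in runtimes_s[:max_attempts]:
        if runtime <= limit_s:
            # This attempt finishes inside the limit.
            return True, used + runtime
        # Attempt is killed when it hits the limit; its slot time is wasted.
        used += limit_s
    return False, used
```

On a homogeneous cluster where the task needs 900 s everywhere and the limit is 600 s, every attempt is killed and the job fails after 4 × 600 = 2400 slot-seconds. Only when the first attempt was unlucky (for example 900 s on a slow node, then 500 s elsewhere) does the retry help, finishing after 600 + 500 = 1100 slot-seconds.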
                
> Kill task attempts longer than a configured queue max time
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-4085
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4085
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: task
>            Reporter: Allen Wittenauer
>
> For some environments, it is desirable to have certain queues have an SLA with regard
> to task turnover (i.e., a slot will be free in X minutes and scheduled to the appropriate
> job). Queues should have a 'task time limit' that would cause task attempts over this time
> to be killed. This leaves open the possibility that if the task was on a bad node, it could
> still be rescheduled up to max.task.attempt times.

--
This message is automatically generated by JIRA.
