hadoop-yarn-dev mailing list archives

From "Anubhav Dhoot (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-2175) Container localization has no timeouts and tasks can be stuck there for a long time
Date Tue, 17 Jun 2014 21:09:07 GMT
Anubhav Dhoot created YARN-2175:
-----------------------------------

             Summary: Container localization has no timeouts and tasks can be stuck there for a long time
                 Key: YARN-2175
                 URL: https://issues.apache.org/jira/browse/YARN-2175
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: Anubhav Dhoot


There are no timeouts that can be used to limit the time taken by various container startup
operations. Localization, for example, could take a long time, and there is no way to kill a
task if it is stuck in these states. The delay may have nothing to do with the task itself and
could be an issue within the platform.

Ideally there should be configurable time limits for the various container states within the
NodeManager. The RM does not care about most of these; they matter only between the AM and the NM.
We can start by making these global configurable defaults, and in the future we can make it
fancier by letting the AM override them in the start container request.
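
The layering described above (a global default that an AM may later override per request) could
be resolved roughly as follows. This is only a sketch: the property name
yarn.nodemanager.localization.timeout-ms, its default value, and the AM override parameter are
hypothetical and not taken from the YARN code base, and java.util.Properties stands in for the
Hadoop configuration so the example is self-contained.

```java
import java.util.OptionalLong;
import java.util.Properties;

// Hypothetical helper: resolves the localization timeout for one container.
final class LocalizationTimeouts {
    // Assumed property name and default value, for illustration only.
    static final String TIMEOUT_KEY = "yarn.nodemanager.localization.timeout-ms";
    static final long DEFAULT_TIMEOUT_MS = 600_000L;

    private LocalizationTimeouts() {}

    // Global default: read from the NM configuration, falling back to the built-in value.
    static long globalTimeoutMs(Properties nmConf) {
        String raw = nmConf.getProperty(TIMEOUT_KEY);
        return raw == null ? DEFAULT_TIMEOUT_MS : Long.parseLong(raw.trim());
    }

    // An override supplied by the AM (if any) wins over the global value.
    static long effectiveTimeoutMs(Properties nmConf, OptionalLong amOverrideMs) {
        return amOverrideMs.orElse(globalTimeoutMs(nmConf));
    }
}
```

Starting with only globalTimeoutMs keeps the first change small; the override parameter can be
wired to the start container request later without changing callers that pass an empty value.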

This JIRA will be used to limit localization time; we can open others if we feel we need to
limit other operations.
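
One possible shape for such a limit, shown with plain java.util.concurrent rather than the
actual NodeManager localizer code: the localization step runs on a worker thread and is
cancelled if it does not finish within the configured time. The class BoundedLocalizer and
its method are hypothetical names used only for this sketch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper: runs one localization step under a time limit.
final class BoundedLocalizer {
    private BoundedLocalizer() {}

    // Returns true if the step finished in time, false if it was cancelled.
    static boolean runWithTimeout(Callable<Void> localize, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Void> result = pool.submit(localize);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the stuck step
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            result.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("localization failed", e.getCause());
        } finally {
            pool.shutdownNow(); // release the worker thread in every case
        }
    }
}
```

Cancellation here relies on thread interruption, so it only helps if the stuck operation
responds to interrupts; a step blocked in non-interruptible I/O would need a different
mechanism, such as killing a separate localizer process.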



--
This message was sent by Atlassian JIRA
(v6.2#6252)
