Date: Tue, 17 Jun 2014 21:09:06 +0000 (UTC)
From: "Anubhav Dhoot (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Created] (YARN-2175) Container localization has no timeouts and tasks can be stuck there for a long time

Anubhav Dhoot created YARN-2175:
-----------------------------------

             Summary: Container localization has no timeouts and tasks can be stuck there for a long time
                 Key: YARN-2175
                 URL: https://issues.apache.org/jira/browse/YARN-2175
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: Anubhav Dhoot


There are no timeouts that can limit the time taken by the various container startup operations. Localization, for example, can take a long time, and there is no way to kill a task if it is stuck in one of these states.
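The following minimal Java sketch illustrates what bounding localization could look like. It is not NodeManager code. The class and method names (`LocalizationTimeoutSketch`, `localizeWithTimeout`, `resolveTimeoutMs`) are hypothetical, and localization is simulated by a `Callable`. The limit resolution assumes a model with a node-wide default that a per-request value from the AM may override.

The sketch uses only `java.util.concurrent`:

1. The work is submitted to an executor.
2. `Future.get` waits no longer than the limit.
3. On timeout, the task is cancelled (interrupted) so the container can be failed instead of waiting indefinitely.

```java
import java.util.concurrent.*;

/**
 * Hypothetical sketch: bound a container's localization step with a timeout.
 * None of these names belong to the YARN NodeManager API.
 */
public class LocalizationTimeoutSketch {

    /** Result of one bounded localization attempt. */
    enum Outcome { LOCALIZED, TIMED_OUT, FAILED }

    /**
     * Choose the effective limit: a positive per-request value (assumed to come
     * from the AM's start-container request) wins over the node-wide default.
     */
    static long resolveTimeoutMs(Long requestOverrideMs, long nodeDefaultMs) {
        return (requestOverrideMs != null && requestOverrideMs > 0)
                ? requestOverrideMs
                : nodeDefaultMs;
    }

    /** Run the localization work and wait at most timeoutMs for it to finish. */
    static Outcome localizeWithTimeout(ExecutorService pool, Callable<Void> work, long timeoutMs) {
        Future<Void> future = pool.submit(work);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.LOCALIZED;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck download
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return Outcome.FAILED; // the work itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            long limit = resolveTimeoutMs(null, 200L); // no override, so the 200 ms default applies
            // Simulated resource download that hangs far longer than the limit.
            Outcome slow = localizeWithTimeout(pool, () -> { Thread.sleep(10_000); return null; }, limit);
            // Simulated resource download that completes immediately.
            Outcome fast = localizeWithTimeout(pool, () -> null, limit);
            System.out.println(slow + " " + fast);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Running `main` simulates one hung download and one quick one. The hung call returns `TIMED_OUT` after roughly the 200 ms limit instead of blocking for 10 seconds.

Cancelling through interruption only helps if the download code responds to interrupts. A real implementation would likely also need to clean up partially downloaded resources.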
These delays may have nothing to do with the task itself and could instead be caused by an issue within the platform. Ideally, the NodeManager should enforce configurable time limits on its various container states. The RM does not care about most of these states; they concern only the AM and the NM. We can start by making these limits global configurable defaults. Later, we can make this more flexible by letting the AM override them in the start container request.

This JIRA will be used to limit localization time. We can open others if we find that other operations also need to be limited.

--
This message was sent by Atlassian JIRA
(v6.2#6252)