Return-Path: X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8B311104A2 for ; Sat, 20 Apr 2013 00:35:17 +0000 (UTC) Received: (qmail 70619 invoked by uid 500); 20 Apr 2013 00:35:17 -0000 Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org Received: (qmail 70568 invoked by uid 500); 20 Apr 2013 00:35:17 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: yarn-issues@hadoop.apache.org Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 70560 invoked by uid 99); 20 Apr 2013 00:35:17 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 20 Apr 2013 00:35:17 +0000 Date: Sat, 20 Apr 2013 00:35:17 +0000 (UTC) From: "Vinod Kumar Vavilapalli (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (YARN-542) Change the default global AM max-attempts value to be not one MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637049#comment-13637049 ] Vinod Kumar Vavilapalli commented on YARN-542: ---------------------------------------------- +1, this looks good. Checking it in. > Change the default global AM max-attempts value to be not one > ------------------------------------------------------------- > > Key: YARN-542 > URL: https://issues.apache.org/jira/browse/YARN-542 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Vinod Kumar Vavilapalli > Assignee: Zhijie Shen > Attachments: YARN-542.1.patch > > > Today, the global AM max-attempts is set to 1 which is a bad choice. AM max-attempts accounts for both AM level failures as well as container crashes due to localization issue, lost nodes etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retires. > I propose we change it to atleast two. Can change it to 4 to match other retry-configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira