Date: Fri, 27 Jun 2014 19:14:26 +0000 (UTC)
From: "Xuan Gong (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

    [ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046300#comment-14046300 ]

Xuan Gong commented on YARN-614:
--------------------------------

I am not sure why org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions fails; it passed on my local machine.
The org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter failure is not related.
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart fails because of a time-out.
I added more logic to that test case, so I need to increase its time-out. I have submitted a new patch to kick off Jenkins again.

> Separate AM failures from hardware failure or YARN error and do not count them to AM retry count
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-614
>                 URL: https://issues.apache.org/jira/browse/YARN-614
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>             Fix For: 2.5.0
>
>         Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, YARN-614.10.patch, YARN-614.11.patch, YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind.

--
This message was sent by Atlassian JIRA
(v6.2#6252)