Date: Tue, 3 Feb 2015 08:09:35 +0000 (UTC)
From: "Rohith (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery

    [ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302924#comment-14302924 ]

Rohith commented on YARN-3094:
------------------------------

The patch looks good overall. Nit: the test class has an unused import, org.apache.hadoop.yarn.util.Clock.
This can be removed.

> reset timer for liveness monitors after RM recovery
> ---------------------------------------------------
>
>                 Key: YARN-3094
>                 URL: https://issues.apache.org/jira/browse/YARN-3094
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-3094.2.patch, YARN-3094.3.patch, YARN-3094.patch
>
>
> When the RM restarts, it recovers RMAppAttempts and registers them with the AMLivenessMonitor if they are not in a final state. AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps).
> In our system, we found that the recovery process took about 3 minutes, and all AMs timed out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
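The failure mode in the issue description above can be illustrated with a small Java sketch. It is a hypothetical, simplified monitor. The class name `SimpleLivenessMonitor`, its methods, and the 60-second interval used below are assumptions for illustration only. They are not Hadoop's actual `AMLivenessMonitor` API or its default settings. Each registered key records its last ping time, and `expire()` drops keys that have been silent longer than the interval. When attempts are registered at the start of a slow recovery, the recovery time counts against them. Resetting every timestamp once recovery finishes restarts the window.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical, simplified liveness monitor (NOT Hadoop's AbstractLivenessMonitor).
// Tracks the last ping time per key and expires keys that stay silent too long.
class SimpleLivenessMonitor<K> {
    private final Map<K, Long> lastPing = new HashMap<>();
    private final long expireIntervalMs;
    private final LongSupplier clock; // injectable clock so tests can control time

    SimpleLivenessMonitor(long expireIntervalMs, LongSupplier clock) {
        this.expireIntervalMs = expireIntervalMs;
        this.clock = clock;
    }

    // Start tracking a key, e.g. an app attempt recovered after an RM restart.
    public synchronized void register(K key) {
        lastPing.put(key, clock.getAsLong());
    }

    // Record a heartbeat from a key that is already being tracked.
    public synchronized void receivedPing(K key) {
        if (lastPing.containsKey(key)) {
            lastPing.put(key, clock.getAsLong());
        }
    }

    // Restart every tracked key's timeout window from "now". Calling this once
    // recovery is complete keeps the time spent recovering from counting
    // against the recovered entries.
    public synchronized void resetTimer() {
        long now = clock.getAsLong();
        lastPing.replaceAll((key, ignored) -> now);
    }

    // Remove and return every key whose last ping is older than the interval.
    public synchronized List<K> expire() {
        long now = clock.getAsLong();
        List<K> expired = new ArrayList<>();
        lastPing.entrySet().removeIf(e -> {
            boolean dead = now - e.getValue() > expireIntervalMs;
            if (dead) {
                expired.add(e.getKey());
            }
            return dead;
        });
        return expired;
    }
}
```

For example, with a 60-second interval, an attempt registered at t=0 is reported by `expire()` at t=180 s (a 3-minute recovery). If `resetTimer()` is called when recovery ends, the attempt instead survives until just past t=240 s, assuming it never pings. The clock is passed in as a `LongSupplier` so the timing can be driven deterministically in a test.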