Date: Tue, 21 May 2013 00:53:17 +0000 (UTC)
From: "Xuan Gong (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-513) Create common proxy client for communicating with RM

[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662524#comment-13662524 ]

Xuan Gong commented on YARN-513:
--------------------------------

Could you test the latest patch on a single running cluster? After YARN-628, we throw IOException. In the latest patch, we use retryByException and register ConnectException.class in the exceptionToPolicyMap.
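The class-keyed lookup this patch relies on can be illustrated with a small standalone sketch. It is not Hadoop code: the class ExceptionPolicyLookup, the method policyFor, and the string policy names are hypothetical stand-ins for RetryPolicies.retryByException and its RetryPolicy objects. The sketch assumes, as the comment describes, that the policy is found via exception.getClass(), so only an exception whose runtime class is exactly ConnectException picks up the registered policy.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of an exception-class -> retry-policy map.
// Policies are plain strings here instead of Hadoop RetryPolicy objects.
public class ExceptionPolicyLookup {

    // Hypothetical policy names used only for this sketch.
    static final String DEFAULT_POLICY = "TRY_ONCE_THEN_FAIL";
    static final String CONNECT_POLICY = "RETRY_UP_TO_MAX";

    // Maps an exception class to the policy applied when that class is caught.
    static final Map<Class<? extends Exception>, String> EXCEPTION_TO_POLICY = new HashMap<>();

    static {
        // Register ConnectException, as the latest patch is described as doing.
        EXCEPTION_TO_POLICY.put(ConnectException.class, CONNECT_POLICY);
    }

    // Looks up the policy by the exception's exact runtime class. An IOException
    // that merely wraps a ConnectException does not match the ConnectException entry.
    static String policyFor(Exception e) {
        String policy = EXCEPTION_TO_POLICY.get(e.getClass());
        return policy != null ? policy : DEFAULT_POLICY;
    }

    public static void main(String[] args) {
        // Thrown directly: matches the registered entry.
        System.out.println(policyFor(new ConnectException("Connection refused")));
        // Wrapped in a plain IOException: falls back to the default policy.
        System.out.println(policyFor(new IOException(new ConnectException("Connection refused"))));
    }
}
```

Running main prints RETRY_UP_TO_MAX for the bare ConnectException and TRY_ONCE_THEN_FAIL for the wrapped one, which is why it matters which exception class actually reaches the retry logic.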
retryByException works as follows: when we catch an exception that is registered in exceptionToPolicyMap, we apply the corresponding retry policy. (In the source code, RetryPolicies.java compares exceptions using exception.getClass(), so the thrown class must match the registered class exactly.) Could you double-check whether we actually catch ConnectException? We will definitely catch IOException.

Could you run the following test:

1. Start the NameNode, DataNode, and NodeManager.
2. Check whether the NM retries.
3. Start the ResourceManager.
4. Run a simple MR example.
5. Shut down the RM.
6. Check whether the NM retries.
7. Start the RM.
8. Run a simple MR example.

> Create common proxy client for communicating with RM
> ----------------------------------------------------
>
>                 Key: YARN-513
>                 URL: https://issues.apache.org/jira/browse/YARN-513
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch
>
>
> When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.