Date: Mon, 24 Feb 2014 10:56:20 +0000 (UTC)
From: "Rohith (JIRA)"
To: yarn-issues@hadoop.apache.org
Reply-To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-1686) NodeManager.resyncWithRM() does not handle exception which cause NodeManger to Hang.


     [ https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith updated YARN-1686:
-------------------------

    Attachment: YARN-1686.2.patch

Thank you, Vinod, for reviewing the patch. I have updated the patch to address all of your comments. Please review the new patch.

Jian He, thanks for the motivation. :-)

> NodeManager.resyncWithRM() does not handle exception which cause NodeManger to Hang.
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-1686
>                 URL: https://issues.apache.org/jira/browse/YARN-1686
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Rohith
>            Assignee: Rohith
>             Fix For: 3.0.0
>
>         Attachments: YARN-1686.1.patch, YARN-1686.2.patch
>
>
> During NodeManager startup, if registration with the ResourceManager throws an exception, the NodeManager shuts down.
> Consider the case where NM-1 is registered with the RM and the RM issues a resync to the NM. If any exception is thrown in "resyncWithRM" (which starts a new thread that does not handle exceptions) during the RESYNC event, that thread is lost and the NodeManager enters a hung state.
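
The failure mode described in this issue can be illustrated with a minimal, self-contained Java sketch. It is not the attached patch and does not use any Hadoop classes. The names `ResyncSketch`, `ResyncAction`, `ShutdownHook`, and `resyncWithHandling` are hypothetical and exist only for this illustration. The sketch assumes the intended behaviour is that a failure inside the resync thread is caught there and handed to a shutdown path, rather than being lost with the thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ResyncSketch {

    /** Hypothetical stand-in for the work done during a resync. */
    interface ResyncAction {
        void run() throws Exception;
    }

    /** Hypothetical stand-in for whatever shuts the node down. */
    interface ShutdownHook {
        void shutDown(Throwable cause);
    }

    /**
     * Starts the resync on a new thread. An exception thrown by the action
     * is caught inside that thread and forwarded to the shutdown hook, so
     * the failure stays visible to the rest of the process.
     */
    static Thread resyncWithHandling(ResyncAction action, ShutdownHook hook) {
        Thread t = new Thread(() -> {
            try {
                action.run();
            } catch (Exception e) {
                hook.shutDown(e);
            }
        }, "resync-sketch");
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean shutDownRequested = new AtomicBoolean(false);

        // Simulate a resync whose re-registration step fails.
        Thread t = resyncWithHandling(
            () -> { throw new IllegalStateException("registration with RM failed"); },
            cause -> {
                System.out.println("Resync failed, shutting down: " + cause.getMessage());
                shutDownRequested.set(true);
            });

        t.join();
        System.out.println("shutdown requested = " + shutDownRequested.get());
    }
}
```

Without the try/catch inside the thread body, the exception would end only the resync thread, and nothing else in the process would learn that the resync had failed. That matches the hung state reported above.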