Date: Wed, 14 Aug 2013 18:57:47 +0000 (UTC)
From: "Omkar Vinit Joshi (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.

    [ https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740048#comment-13740048 ]

Omkar Vinit Joshi commented on YARN-1061:
-----------------------------------------

How can the NM wait indefinitely? What is your connection timeout set to? Can you add the parameters below to your log4j.properties and check whether it actually times out or waits indefinitely for the RM? Also, can you attach those logs once you simulate it?
{code}
log4j.logger.org.apache.hadoop.ipc.Server=DEBUG
log4j.logger.org.apache.hadoop.ipc.Client=DEBUG
{code}

These configuration keys from *CommonConfigurationKeysPublic* are also helpful:

{code}
public static final String IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY =
    "ipc.client.connection.maxidletime";
/** Default value for IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY */
public static final int IPC_CLIENT_CONNECTION_MAXIDLETIME_DEFAULT = 10000; // 10s
/** See core-default.xml */
public static final String IPC_CLIENT_CONNECT_TIMEOUT_KEY =
    "ipc.client.connect.timeout";
/** Default value for IPC_CLIENT_CONNECT_TIMEOUT_KEY */
public static final int IPC_CLIENT_CONNECT_TIMEOUT_DEFAULT = 20000; // 20s
{code}

> NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1061
>                 URL: https://issues.apache.org/jira/browse/YARN-1061
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.5-alpha
>            Reporter: Rohith Sharma K S
>
> It is observed that, in one scenario, the NodeManager waits indefinitely for the nodeHeartbeat response from the ResourceManager while the ResourceManager is in a hung state.
> The NodeManager should get a timeout exception instead of waiting indefinitely.
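
To illustrate why a connect timeout alone may not cover the reported case, here is a minimal sketch using plain java.net sockets. It is not Hadoop's IPC code, and the class and method names (HungServerSketch, readTimesOut) are hypothetical. It assumes a "hung" peer can be modeled as a listener that completes the TCP handshake but never writes a response. The 20000 ms connect timeout mirrors IPC_CLIENT_CONNECT_TIMEOUT_DEFAULT quoted above; the read timeout is an arbitrary value chosen for the demonstration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class HungServerSketch {

    /**
     * Connects to a local listener that never replies, then attempts one read.
     * Returns true if the read gave up with a SocketTimeoutException.
     */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The OS completes the handshake from the listen backlog, but nothing
        // ever accepts or writes: a stand-in for a hung server (assumption).
        try (ServerSocket hungServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // The connect timeout only bounds connection establishment.
            client.connect(
                new InetSocketAddress(loopback, hungServer.getLocalPort()),
                connectTimeoutMs);
            // The read timeout bounds each blocking read; 0 means wait forever.
            client.setSoTimeout(readTimeoutMs);
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // data or EOF arrived; not expected here
            } catch (SocketTimeoutException e) {
                return true; // the read was abandoned after readTimeoutMs
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(20000, 500));
    }
}
```

In this sketch the connection succeeds immediately, so the connect timeout never fires; only the separate read timeout ends the wait. Whether the NodeManager's heartbeat call is bounded in a comparable way is what the DEBUG logs requested above should help establish.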