Return-Path: X-Original-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 6F64F9970 for ; Sat, 25 Feb 2012 13:58:35 +0000 (UTC) Received: (qmail 3042 invoked by uid 500); 25 Feb 2012 13:58:35 -0000 Delivered-To: apmail-hadoop-mapreduce-issues-archive@hadoop.apache.org Received: (qmail 3002 invoked by uid 500); 25 Feb 2012 13:58:35 -0000 Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: mapreduce-issues@hadoop.apache.org Delivered-To: mailing list mapreduce-issues@hadoop.apache.org Received: (qmail 2993 invoked by uid 99); 25 Feb 2012 13:58:35 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 25 Feb 2012 13:58:35 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 25 Feb 2012 13:58:32 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 3486733AE7B for ; Sat, 25 Feb 2012 13:57:51 +0000 (UTC) Date: Sat, 25 Feb 2012 13:57:51 +0000 (UTC) From: "Hudson (Commented) (JIRA)" To: mapreduce-issues@hadoop.apache.org Message-ID: <860609046.19626.1330178271216.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <238708097.78868.1327536758342.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (MAPREDUCE-3730) Allow restarted NM to rejoin cluster before RM expires it MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/MAPREDUCE-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216458#comment-13216458 ] Hudson commented on MAPREDUCE-3730: ----------------------------------- Integrated in Hadoop-Mapreduce-trunk #1001 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1001/]) MAPREDUCE-3730. Modified RM to allow restarted NMs to be able to join the cluster without waiting for expiry. Contributed by Jason Lowe. (Revision 1293436) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293436 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeReconnectEvent.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java > Allow restarted NM to rejoin cluster before RM expires it > --------------------------------------------------------- > > Key: MAPREDUCE-3730 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3730 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2, resourcemanager > Affects Versions: 0.23.1, 0.24.0 > Reporter: Jason Lowe > Assignee: Jason Lowe > Priority: Minor > Fix For: 0.23.2 > > Attachments: MAPREDUCE-3730.patch, MAPREDUCE-3730.patch, MAPREDUCE-3730.patch > > > When a node in the RUNNING state (healthy or unhealthy) is rebooted, the resourcemanager rejects the nodemanager's registration request as a duplicate because it is convinced that the nodemanager is already running on that node. It won't allow that node to rejoin the cluster until the node expiration time elapses which is 10min+ by default. We should allow the NM to rejoin the cluster if it re-registers within the expiration timeout. > Note that this problem occurs with NMs that are configured to specific ports. If ephemeral ports are used then a NM reboot "works" because the RM thinks the NM registration is for a new node. See the discussions in MAPREDUCE-3070 and MAPREDUCE-3363. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira