hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hudson (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3730) Allow restarted NM to rejoin cluster before RM expires it
Date Fri, 24 Feb 2012 21:57:49 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215968#comment-13215968
] 

Hudson commented on MAPREDUCE-3730:
-----------------------------------

Integrated in Hadoop-Common-0.23-Commit #589 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/589/])
    MAPREDUCE-3730. Modified RM to allow restarted NMs to be able to join the cluster without
waiting for expiry. Contributed by Jason Lowe.
svn merge --ignore-ancestry -c 1293436 ../../trunk (Revision 1293439)

     Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293439
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeReconnectEvent.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java

                
> Allow restarted NM to rejoin cluster before RM expires it
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-3730
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3730
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2, resourcemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Minor
>             Fix For: 0.23.2
>
>         Attachments: MAPREDUCE-3730.patch, MAPREDUCE-3730.patch, MAPREDUCE-3730.patch
>
>
> When a node in the RUNNING state (healthy or unhealthy) is rebooted, the resourcemanager
rejects the nodemanager's registration request as a duplicate because it is convinced that
the nodemanager is already running on that node.  It won't allow that node to rejoin the cluster
until the node expiration time elapses which is 10min+ by default.  We should allow the NM
to rejoin the cluster if it re-registers within the expiration timeout.
> Note that this problem occurs with NMs that are configured to specific ports.  If ephemeral
ports are used then a NM reboot "works" because the RM thinks the NM registration is for a
new node.  See the discussions in MAPREDUCE-3070 and MAPREDUCE-3363.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message