hadoop-yarn-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1686) NodeManager.resyncWithRM() does not handle exception which cause NodeManger to Hang.
Date Tue, 25 Feb 2014 11:13:33 GMT

    [ https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911478#comment-13911478 ]

Hudson commented on YARN-1686:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #492 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/492/])
YARN-1686. Fixed NodeManager to properly handle any errors during re-registration after a
RESYNC and thus avoid hanging. Contributed by Rohith Sharma. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1571474)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java


> NodeManager.resyncWithRM() does not handle exception which cause NodeManger to Hang.
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-1686
>                 URL: https://issues.apache.org/jira/browse/YARN-1686
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Rohith
>            Assignee: Rohith
>             Fix For: 2.4.0
>
>         Attachments: YARN-1686.1.patch, YARN-1686.2.patch, YARN-1686.3.patch
>
>
> During NodeManager startup, if registration with the ResourceManager throws an exception,
> the NodeManager shuts down.
> Consider the case where NM-1 is registered with the RM and the RM issues a RESYNC to the NM.
> If any exception is thrown in "resyncWithRM" (which starts a new thread that does not handle
> exceptions) during the RESYNC event, that thread is lost and the NodeManager hangs.
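
The failure mode described above (work started on a new thread whose exceptions nobody handles) can be illustrated with a minimal Java sketch. This is not the code from the YARN-1686 patch: the class ResyncSketch, the interface Reregistration, and the onFailure callback are hypothetical names chosen for illustration. The sketch only assumes that the re-registration step may throw, and shows one way to catch the failure inside the thread and report it so the caller can react (for example by shutting down) instead of hanging.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class ResyncSketch {

    /** Hypothetical stand-in for the re-registration work done after a RESYNC. */
    interface Reregistration {
        void run() throws Exception;
    }

    /**
     * Runs the task on a new thread. A failure is caught inside the thread and
     * passed to onFailure, rather than silently terminating the thread.
     */
    static Thread resync(Reregistration task, Consumer<Exception> onFailure) {
        Thread t = new Thread(() -> {
            try {
                task.run();
            } catch (Exception e) {
                // Report the failure so the owner can decide what to do next.
                onFailure.accept(e);
            }
        }, "resync-sketch");
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<String> state = new AtomicReference<>("RUNNING");
        Thread t = resync(
            () -> { throw new IllegalStateException("re-registration failed"); },
            e -> state.set("SHUTDOWN"));
        t.join();
        System.out.println(state.get()); // prints SHUTDOWN
    }
}
```

Without the try/catch, the exception would end the thread and the state would stay "RUNNING", which mirrors the hang reported in this issue.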



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)