hadoop-yarn-issues mailing list archives

From "Xianyin Xin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-3639) It takes too long time for RM to recover all apps if the original active RM and namenode is deployed on the same node.
Date Wed, 13 May 2015 10:04:59 GMT
Xianyin Xin created YARN-3639:
---------------------------------

             Summary: It takes too long time for RM to recover all apps if the original active
RM and namenode is deployed on the same node.
                 Key: YARN-3639
                 URL: https://issues.apache.org/jira/browse/YARN-3639
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
            Reporter: Xianyin Xin
            Assignee: Xianyin Xin


If the node hosting the active RM dies and the active namenode is running on the same
node, the new RM takes a long time to recover all apps. After analysis, we found that the
root cause is the renewal of HDFS tokens during recovery. The HDFS client created by the
renewer first tries to connect to the original namenode, which times out after 10~20s,
and only then tries to connect to the new namenode. In our test, the entire recovery cost
about 15 * #apps seconds.
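
The cost described above can be illustrated with a small back-of-the-envelope model. This is
only a sketch, not YARN or HDFS code: it assumes one token renewal per app, that renewals run
one after another, and that each failed connection attempt costs a fixed 15s (the midpoint of
the 10~20s reported). The names NameNode, renew_token_cost and recovery_cost are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NameNode:
    """Hypothetical stand-in for a namenode endpoint in this model."""
    name: str
    alive: bool


def renew_token_cost(namenodes, connect_timeout_s, rpc_cost_s=0.0):
    """Seconds spent on one renewal.

    The client tries namenodes in the given order and pays the full
    connect timeout for every dead one before reaching a live one.
    """
    elapsed = 0.0
    for nn in namenodes:
        if nn.alive:
            return elapsed + rpc_cost_s
        elapsed += connect_timeout_s
    raise ConnectionError("no live namenode")


def recovery_cost(num_apps, namenodes, connect_timeout_s):
    """Total seconds if each app's renewal runs sequentially (assumption)."""
    return sum(
        renew_token_cost(namenodes, connect_timeout_s) for _ in range(num_apps)
    )


# Original namenode is down and is tried first; the new one is tried second.
after_failover = [NameNode("nn1", alive=False), NameNode("nn2", alive=True)]
total = recovery_cost(1000, after_failover, connect_timeout_s=15.0)
print(f"estimated recovery time: {total:.0f}s")
```

Under these assumptions the timeout is paid once per app, which is why the total grows
linearly with the number of apps rather than being a one-time failover cost.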



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
