hadoop-yarn-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
Date Mon, 09 Feb 2015 22:39:37 GMT

    [ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313040#comment-14313040 ]

Hudson commented on YARN-3094:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7053/])
YARN-3094. Reset timer for liveness monitors after RM recovery. Contributed by Jun Gong (jianhe: rev 0af6a99a3fcfa4b47d3bcba5e5cc5fe7b312a152)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestAMLivelinessMonitor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* hadoop-yarn-project/CHANGES.txt


> reset timer for liveness monitors after RM recovery
> ---------------------------------------------------
>
>                 Key: YARN-3094
>                 URL: https://issues.apache.org/jira/browse/YARN-3094
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>             Fix For: 2.7.0
>
>         Attachments: YARN-3094.2.patch, YARN-3094.3.patch, YARN-3094.4.patch, YARN-3094.5.patch, YARN-3094.patch
>
>
> When the RM restarts, it recovers RMAppAttempts and registers them with the AMLivelinessMonitor if they are not in a final state. An AM will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps).
> In our system, the recovery process took about 3 minutes, and all AMs timed out.
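
The failure mode described above can be illustrated with a minimal sketch. This is not the committed patch and not the Hadoop AbstractLivelinessMonitor API: the class name SketchLivelinessMonitor, its methods, and the injectable clock are all hypothetical, chosen only to show why resetting every registered entry's timestamp once recovery finishes could avoid the spurious expiry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Illustrative only: a hypothetical, simplified liveness monitor.
 * It is NOT the Hadoop AbstractLivelinessMonitor and does not mirror its API.
 */
class SketchLivelinessMonitor {
    private final long expireIntervalMs;
    private final LongSupplier clock; // injectable time source (assumption, eases testing)
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    SketchLivelinessMonitor(long expireIntervalMs, LongSupplier clock) {
        this.expireIntervalMs = expireIntervalMs;
        this.clock = clock;
    }

    /** Start tracking an entry; its timer begins at the current clock value. */
    synchronized void register(String id) {
        lastHeartbeat.put(id, clock.getAsLong());
    }

    /**
     * Set every tracked entry's timestamp to "now". Calling this once recovery
     * completes means the time spent recovering does not count against entries
     * that were registered early in the recovery.
     */
    synchronized void resetTimer() {
        long now = clock.getAsLong();
        lastHeartbeat.replaceAll((id, ts) -> now);
    }

    /** Return the ids whose last timestamp is older than the expiry interval. */
    synchronized List<String> findExpired() {
        long now = clock.getAsLong();
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > expireIntervalMs) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }
}
```

In this sketch, an entry registered at the start of a 3-minute recovery would be reported by findExpired() under any shorter expiry interval, whereas calling resetTimer() at the end of recovery restarts the interval for every entry.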



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
