hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore
Date Tue, 21 Jul 2015 21:54:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635876#comment-14635876 ]

Junping Du commented on YARN-2019:
----------------------------------

If so, I think we should at least differentiate the RM and NM policies: a user could be conservative
about RM state store failures but aggressive about NM state store failures. Maybe use "yarn.resourcemanager.fail-fast"
here? Then we can add "yarn.nodemanager.fail-fast" later, and possibly equivalents for other daemons (timeline
service, etc.).
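
To make the per-daemon idea concrete, here is a minimal sketch of how separate fail-fast keys
could be resolved. It is only an illustration under stated assumptions: the two property names
are the ones proposed in this comment, while the FailFastPolicy class, its methods, the default
of false, and the "EXIT" / "LOG_AND_CONTINUE" outcomes are hypothetical and are not taken from
any patch. A plain Map stands in for the Hadoop configuration so the sketch is self-contained.

```java
import java.util.Map;

public class FailFastPolicy {
    // Keys as proposed in the comment; whether they exist in a given release is an assumption.
    static final String RM_FAIL_FAST = "yarn.resourcemanager.fail-fast";
    static final String NM_FAIL_FAST = "yarn.nodemanager.fail-fast";

    /** Reads a boolean policy for one daemon, falling back to defaultValue when the key is unset. */
    static boolean shouldFailFast(Map<String, String> conf, String key, boolean defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    /** Hypothetical decision point: what a daemon does when its state store reports an error. */
    static String onStateStoreError(Map<String, String> conf, String key) {
        // The default of false (keep running) is an assumption made for this sketch only.
        return shouldFailFast(conf, key, false) ? "EXIT" : "LOG_AND_CONTINUE";
    }

    public static void main(String[] args) {
        // Arbitrary example values showing that the two daemons can be configured independently.
        Map<String, String> conf = Map.of(RM_FAIL_FAST, "false", NM_FAIL_FAST, "true");
        System.out.println("RM: " + onStateStoreError(conf, RM_FAIL_FAST));
        System.out.println("NM: " + onStateStoreError(conf, NM_FAIL_FAST));
    }
}
```

Keeping one key per daemon means each policy can be changed without affecting the others, and
further daemons would only need an additional key.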

> Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2019
>                 URL: https://issues.apache.org/jira/browse/YARN-2019
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Junping Du
>            Assignee: Jian He
>            Priority: Critical
>              Labels: ha
>         Attachments: YARN-2019.1-wip.patch
>
>
> Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal exception
> that crashes the RM. As shown in YARN-1924, the cause could be an internal bug in RM HA
> itself rather than a truly fatal condition. We should revisit this decision, since the HA
> feature is designed to protect key components, not to disrupt them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
