hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore
Date Thu, 19 Jun 2014 18:50:25 GMT

    [ https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037664#comment-14037664 ]

Junping Du commented on YARN-2019:
----------------------------------

[~kasha], sorry that I missed your comments; my email/company changed around that time.
My thought on the right behavior is:
If there is a problem on the ZK cluster side (it is distributed and should be fairly robust,
but it can still go down because of a bug or bad configuration), we can let the active RM
continue to run as in the non-HA case. In addition, we should report to the admin that HA is
not working properly, and let the admin decide when is the right time to bring down the RM
and reconfigure HA. Does that make sense?
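
A minimal sketch of the behavior proposed above, for illustration only. It does not use the real ZKRMStateStore or RM APIs: StateStore, AdminNotifier, and DegradingStateStore are hypothetical names, and the real change would have to deal with RM internals that this sketch leaves out. The idea is that a store failure flips the RM into a degraded, non-HA mode and warns the admin once, instead of raising a fatal event.

```java
import java.io.IOException;

// Hypothetical stand-in for a ZooKeeper-backed state store.
// This is NOT the real ZKRMStateStore interface.
interface StateStore {
    void store(String appId, String state) throws IOException;
}

// Hypothetical channel for telling the admin that HA is degraded
// (in practice this might be a log line, a metric, or an alert).
interface AdminNotifier {
    void warn(String message);
}

// Wraps a state store so that a failure degrades HA instead of
// crashing the process.
class DegradingStateStore {
    private final StateStore delegate;
    private final AdminNotifier notifier;
    private boolean haDegraded = false;

    DegradingStateStore(StateStore delegate, AdminNotifier notifier) {
        this.delegate = delegate;
        this.notifier = notifier;
    }

    // Returns true if the state was persisted, false if the RM is
    // running without HA protection.
    synchronized boolean store(String appId, String state) {
        if (haDegraded) {
            // Already in non-HA mode: skip the store and do not warn again.
            return false;
        }
        try {
            delegate.store(appId, state);
            return true;
        } catch (IOException e) {
            // Do not crash: remember the degraded state and tell the admin.
            haDegraded = true;
            notifier.warn("State store failed, continuing without HA: "
                    + e.getMessage());
            return false;
        }
    }

    synchronized boolean isHaDegraded() {
        return haDegraded;
    }
}
```

In this sketch the admin has to restart or reconfigure the RM to leave degraded mode, which matches the suggestion of letting the admin pick the time; automatic recovery would be a separate design choice.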

> Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2019
>                 URL: https://issues.apache.org/jira/browse/YARN-2019
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Junping Du
>            Priority: Critical
>              Labels: ha
>         Attachments: YARN-2019.1-wip.patch
>
>
> Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal exception
> that crashes the RM. As shown in YARN-1924, the cause could be an internal bug in RM HA
> itself rather than a truly fatal condition. We should revisit this decision, since the HA
> feature is designed to protect a key component, not to disrupt it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
