hadoop-yarn-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3668) Long run service shouldn't be killed even if Yarn crashed
Date Mon, 18 May 2015 21:00:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549218#comment-14549218 ]

Steve Loughran commented on YARN-3668:

[~sandflee]: I know you are using something else; I was just describing what we do to deal
with failures.

If it is purely AM failure you care about, then setting the restart bit at launch time is
enough for YARN to bring things back. If the AM fails too many times in the failure window
then the app will fail, for which there is one fix: don't fail as often.
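
As a rough illustration of the failure-window behaviour described above, here is a simplified Python model. It is not YARN's code: the class name and the `max_attempts` and `window_s` parameters are hypothetical stand-ins for the configured attempt limit and the length of the failure window, and the exact boundary handling is an assumption.

```python
from collections import deque


class AttemptFailureWindow:
    """Simplified model of AM failure counting; not YARN's actual code."""

    def __init__(self, max_attempts, window_s):
        self.max_attempts = max_attempts  # hypothetical: failures tolerated inside the window
        self.window_s = window_s          # hypothetical: failure window length in seconds
        self._failures = deque()          # timestamps of failures still inside the window

    def record_failure(self, now_s):
        """Record an AM failure at time now_s; return True if a restart is still allowed."""
        self._failures.append(now_s)
        # Forget failures that have aged out of the window.
        while self._failures and now_s - self._failures[0] > self.window_s:
            self._failures.popleft()
        return len(self._failures) < self.max_attempts


window = AttemptFailureWindow(max_attempts=3, window_s=600)
print(window.record_failure(0))     # first failure in the window
print(window.record_failure(100))   # second failure in the window
print(window.record_failure(1000))  # the two earlier failures have aged out
```

In this model, failures that are spread out never exhaust the limit, which is the sense in which "don't fail as often" is the fix.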

I'd actually like a failure code to tell YARN to restart us without counting it as a failure;
this would help us do live updates more safely.
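
A minimal sketch of what such a failure code could mean for the accounting, assuming the AM signals the request through its exit code. The `RESTART_REQUESTED` value and the `failures_counted` helper are invented for illustration and do not correspond to any existing YARN exit status or API.

```python
# Hypothetical exit code an AM could use to request a restart that is not counted.
RESTART_REQUESTED = 201  # made-up value, not a real YARN exit status


def failures_counted(exit_codes, uncounted=(RESTART_REQUESTED,)):
    """Count only the exits that should count against the failure limit."""
    return sum(1 for code in exit_codes if code != 0 and code not in uncounted)


# An AM that crashes once and then restarts itself twice for a live update.
history = [1, RESTART_REQUESTED, RESTART_REQUESTED]
print(failures_counted(history))
```

Under this assumption, only the genuine crash counts against the limit, so a rolling update would not push the application toward failure.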

> Long run service shouldn't be killed even if Yarn crashed
> ---------------------------------------------------------
>                 Key: YARN-3668
>                 URL: https://issues.apache.org/jira/browse/YARN-3668
>             Project: Hadoop YARN
>          Issue Type: Wish
>            Reporter: sandflee
> For a long-running service, the application shouldn't be killed even if all YARN components crash;
> with work-preserving RM restart and NM restart, YARN could take over the applications again.

This message was sent by Atlassian JIRA
