hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1430) InvalidStateTransition exceptions are ignored in state machines
Date Thu, 21 Nov 2013 19:11:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829229#comment-13829229 ]

Vinod Kumar Vavilapalli commented on YARN-1430:
-----------------------------------------------

There are pros and cons to both approaches.

If we completely ignore the errors, nobody knows about the problem. One solution is to
have these invalid transitions bubble up to the UI (the RM UI, the AM UI, etc.) in wild, bold,
red text.

On the other hand, I agree that crashing the RM all the time is going to be more and more painful
in production environments.

As for tests, I think we SHOULD clearly crash the tests, so that we can catch as many of these
errors as quickly as possible.

But as of today, we are treating them inconsistently. An invalid event to the scheduler crashes
the RM, but an invalid event in RMNode doesn't. We need to be consistent.
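
For illustration only, here is a minimal sketch of what a single, consistent policy could look like: every state machine reports invalid transitions to one shared handler, which fails fast when configured for tests and otherwise logs and counts the error so it can be surfaced (for example on a UI). The class `InvalidTransitionHandler`, its `failFast` flag, and its methods are hypothetical names invented for this sketch; none of them are part of YARN, and the state and event strings are placeholders.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical shared policy for invalid state transitions.
 * Not a YARN class; all names here are assumptions for illustration.
 */
final class InvalidTransitionHandler {
    // true in tests (crash immediately), false in production (log and count)
    private final boolean failFast;
    // running total that a UI or metrics page could display
    private final AtomicLong invalidCount = new AtomicLong();

    InvalidTransitionHandler(boolean failFast) {
        this.failFast = failFast;
    }

    /** Called by any state machine that receives an event it cannot handle. */
    void onInvalidTransition(String component, String state, String event) {
        invalidCount.incrementAndGet();
        String msg = "Invalid event " + event + " in state " + state
                + " of " + component;
        if (failFast) {
            // Test mode: surface the bug right away.
            throw new IllegalStateException(msg);
        }
        // Production mode: keep running, but make the problem visible.
        System.err.println("ERROR: " + msg);
    }

    long getInvalidCount() {
        return invalidCount.get();
    }
}
```

With something along these lines, the scheduler and RMNode would go through the same code path, so whether an invalid event crashes the process becomes one configuration decision rather than a per-component accident.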

> InvalidStateTransition exceptions are ignored in state machines
> ---------------------------------------------------------------
>
>                 Key: YARN-1430
>                 URL: https://issues.apache.org/jira/browse/YARN-1430
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>
> We have all state machines ignoring InvalidStateTransitions. These exceptions will get
> logged but will not crash the RM / NM. We definitely should crash it as they move the system
> into some invalid / unacceptable state.
> * Places where we hide this exception :-
> ** JobImpl
> ** TaskAttemptImpl
> ** TaskImpl
> ** NMClientAsyncImpl
> ** ApplicationImpl
> ** ContainerImpl
> ** LocalizedResource
> ** RMAppAttemptImpl
> ** RMAppImpl
> ** RMContainerImpl
> ** RMNodeImpl
> thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)
