hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1842) InvalidApplicationMasterRequestException raised during AM-requested shutdown
Date Mon, 17 Mar 2014 13:47:43 GMT

    [ https://issues.apache.org/jira/browse/YARN-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937809#comment-13937809 ]

Jason Lowe commented on YARN-1842:

I wonder whether this is a case where the NM or AM somehow failed to heartbeat and expired from
the RM's point of view. At that point the RM will ask the NM to kill all containers when
it resyncs, and it will already have cleaned up its bookkeeping for the AM (hence the unknown app attempt).
The RM log should shed some light on what happened there.

Normally when an AM is told to "go away" by the RM, a subsequent AM attempt will follow
(assuming there are app attempt retries left). In those cases the AM attempt should
leave without causing any damage to subsequent attempts (e.g., it should not clean up staging areas,
since that would prevent subsequent attempts from launching). However, if the attempt is the last one,
it should go ahead and perform its normal shutdown cleanup, as there will not be any subsequent
attempts to clean up the mess.
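
As a rough illustration of that rule, here is a minimal sketch in plain Java. It does not use any YARN API: the class AmShutdownPolicy, the Action enum, and the method names are all hypothetical, and it assumes attempt numbers are 1-based and that the AM knows the configured maximum number of attempts.

```java
// Hypothetical sketch only: none of these names come from YARN itself.
final class AmShutdownPolicy {

    /** What an AM attempt should do with shared state (e.g. staging areas) as it exits. */
    enum Action { LEAVE_STATE_FOR_NEXT_ATTEMPT, FULL_CLEANUP }

    private AmShutdownPolicy() {}

    /** Assumption: attempts are numbered from 1, so attempt == maxAttempts is the last one. */
    static boolean isLastAttempt(int attemptNumber, int maxAttempts) {
        return attemptNumber >= maxAttempts;
    }

    /** Decide how to exit after the RM has told this attempt to go away. */
    static Action onToldToGoAway(int attemptNumber, int maxAttempts) {
        if (isLastAttempt(attemptNumber, maxAttempts)) {
            // Nobody will run after us, so do the normal shutdown cleanup.
            return Action.FULL_CLEANUP;
        }
        // A later attempt may still need the staging area; leave it alone.
        return Action.LEAVE_STATE_FOR_NEXT_ATTEMPT;
    }

    public static void main(String[] args) {
        System.out.println(onToldToGoAway(1, 2));
        System.out.println(onToldToGoAway(2, 2));
    }
}
```

How a real AM learns its attempt number and the retry limit is deliberately left out here; the sketch only captures the decision itself.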

> InvalidApplicationMasterRequestException raised during AM-requested shutdown
> ----------------------------------------------------------------------------
>                 Key: YARN-1842
>                 URL: https://issues.apache.org/jira/browse/YARN-1842
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Steve Loughran
> Report of the RM raising a stack trace [https://gist.github.com/matyix/9596735] during
> AM-initiated shutdown. The AM could just swallow this and exit, but it could be a sign of
> a race condition YARN-side, or maybe just in the RM client code/AM dual signalling the shutdown.
>
> I haven't replicated this myself; maybe the stack will help track down the problem. Otherwise:
> what is the policy YARN apps should adopt for AMs handling errors on shutdown? Go straight
> to an exit(-1)?

This message was sent by Atlassian JIRA
