hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
Date Wed, 08 Mar 2017 16:59:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901553#comment-15901553 ]

Jason Lowe commented on YARN-4051:

Thanks for updating the patch!  In the future, please don't delete patches and re-upload them
with the same name.  It can lead to very confusing cases where Jenkins comments on a patch
that happens to have the same name as one of the current attachments but isn't actually the
patch that was tested.

The following code won't actually cause the FINISH_APPS event to be ignored.  The {{continue}}
in the for loop is degenerate: it only advances to the next container, so all this does is
log warnings.  Otherwise the logic is semantically the same as before:
        for (Container container : app.getContainers().values()) {
          if (container.isRecovering()) {
            LOG.warn("drop FINISH_APPS event to " + appID + "because container "
                + container.getContainerId() + "is recovering");

Also, this shouldn't be a warning, since nothing is actually wrong when this happens, correct?
Similarly, the warn log emitted when ignoring the FINISH_CONTAINERS event seems like it should
be an info log at best.
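
To illustrate the point about the degenerate {{continue}}, here is a minimal standalone sketch. It does not use the NodeManager classes: FinishAppsSketch, Container, and both method names are hypothetical stand-ins, and it assumes the {{continue}} is the last statement in the loop body, as the comment above implies. The second method shows one possible way to really skip the event, not necessarily what the patch should do.

```java
import java.util.Arrays;
import java.util.List;

final class FinishAppsSketch {
    /** Hypothetical stand-in for an NM container; only the recovering flag matters here. */
    static final class Container {
        final String id;
        final boolean recovering;

        Container(String id, boolean recovering) {
            this.id = id;
            this.recovering = recovering;
        }
    }

    /** Assumed shape of the reviewed loop: continue is the last statement in the body. */
    static boolean processedWithDegenerateContinue(List<Container> containers) {
        for (Container c : containers) {
            if (c.recovering) {
                System.out.println("recovering: " + c.id);
                continue; // no-op: the loop would move on to the next container anyway
            }
        }
        return true; // control always reaches here, so the event is still processed
    }

    /** One way to actually drop the event: leave the handler before processing it. */
    static boolean processedWithEarlyReturn(List<Container> containers) {
        for (Container c : containers) {
            if (c.recovering) {
                return false; // event is dropped
            }
        }
        return true; // no container is recovering, so process the event
    }

    public static void main(String[] args) {
        List<Container> containers = Arrays.asList(
            new Container("c1", false), new Container("c2", true));
        System.out.println(processedWithDegenerateContinue(containers));
        System.out.println(processedWithEarlyReturn(containers));
    }
}
```

With one recovering container in the list, the first method still reports the event as processed, while the second reports it as dropped.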

I'm also wondering about the scenario where the kill event is coming in from an AM and not
the RM.  If a container is still in the recovering state when we open up the client service
for new requests, it seems a client (e.g., an AM) could come in and ask for a still-recovering
container to be killed.  I think the container process will be orphaned if that occurs, since
the NM will mistakenly believe the container has not been launched yet.
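
The orphaning concern can be made concrete with a toy model. This is only a sketch under stated assumptions: RecoveringKillSketch, its State enum, and naiveKill are hypothetical and far simpler than the real NM container state machine. It assumes a kill received in the NEW state is handled as "never launched", so no signal is sent to any process.

```java
final class RecoveringKillSketch {
    /** Hypothetical, simplified container states. */
    enum State { NEW, RUNNING, DONE }

    /** Hypothetical container: NM-side state plus whether an OS process really exists. */
    static final class Container {
        State state;
        boolean processAlive;

        Container(State state, boolean processAlive) {
            this.state = state;
            this.processAlive = processAlive;
        }
    }

    /** Assumed naive handling: a kill in NEW skips signalling, since nothing was "launched". */
    static void naiveKill(Container c) {
        if (c.state == State.NEW) {
            c.state = State.DONE; // bookkeeping only; any live process is left untouched
            return;
        }
        c.processAlive = false;   // stands in for signalling the container process
        c.state = State.DONE;
    }

    /** The NM considers the container finished, yet its process is still running. */
    static boolean isOrphaned(Container c) {
        return c.state == State.DONE && c.processAlive;
    }

    public static void main(String[] args) {
        // Recovered after an NM restart: still NEW from the NM's view, but the process is alive.
        Container recovering = new Container(State.NEW, true);
        naiveKill(recovering);
        System.out.println(isOrphaned(recovering));

        Container running = new Container(State.RUNNING, true);
        naiveKill(running);
        System.out.println(isOrphaned(running));
    }
}
```

In this model the still-recovering container ends up orphaned, while the container the NM already knows to be running is cleaned up normally.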

> ContainerKillEvent is lost when container is  In New State and is recovering
> ----------------------------------------------------------------------------
>                 Key: YARN-4051
>                 URL: https://issues.apache.org/jira/browse/YARN-4051
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: sandflee
>            Assignee: sandflee
>            Priority: Critical
>         Attachments: YARN-4051.01.patch, YARN-4051.02.patch, YARN-4051.03.patch, YARN-4051.04.patch, YARN-4051.05.patch, YARN-4051.06.patch
>
> As in YARN-4050, when the NM event dispatcher is blocked and a container is in the NEW state,
> finishing the application leaves the container alive even after the NM event dispatcher is unblocked.
