hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
Date Fri, 02 Oct 2015 15:07:27 GMT

    [ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941233#comment-14941233 ]

Jason Lowe commented on YARN-4051:

Thanks for the patch!  Sorry for the delay, as I missed this when it was originally filed.

I'm lukewarm on an event buffering approach, since we would have to track the buffered event
and remember to propagate it at all the appropriate times, which is a maintenance burden.
Would it be simpler to prevent kill requests from arriving before we're done recovering
containers? We could return a "try again" response, similar to what we do for container start
requests while still recovering, and postpone finish application processing until after the
containers are recovered.

However we decide to fix this, there should be a unit test to cover the scenario.

> ContainerKillEvent is lost when container is In New State and is recovering
> ----------------------------------------------------------------------------
>                 Key: YARN-4051
>                 URL: https://issues.apache.org/jira/browse/YARN-4051
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: sandflee
>            Assignee: sandflee
>            Priority: Critical
>         Attachments: YARN-4051.01.patch, YARN-4051.02.patch, YARN-4051.03.patch
> As in YARN-4050, the NM event dispatcher is blocked and the container is in the New state.
> When we finish the application, the container is still alive even after the NM event
> dispatcher is unblocked.

This message was sent by Atlassian JIRA
