hadoop-yarn-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-72) NM should handle cleaning up containers when it shuts down ( and kill containers from an earlier instance when it comes back up after an unclean shutdown )
Date Wed, 10 Oct 2012 19:13:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473483#comment-13473483 ]

Sandy Ryza commented on YARN-72:
--------------------------------

Code already exists to kill existing containers when the resource manager requests it.  Three
events are dispatched to make this happen: the ContainerManager handles a COMPLETED_CONTAINERS
event by dispatching a KILL_CONTAINER event for each container to be killed; each ContainerImpl
handles its KILL_CONTAINER event by dispatching a CLEANUP_CONTAINER event; and the
ContainersLauncher handles the CLEANUP_CONTAINER events by trying to kill the containers.
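
To make the chain concrete, here is a minimal toy model of it. Only the three event names and the three handler roles come from the description above; the dispatcher, the Event class and the log strings are hypothetical stand-ins, not the actual nodemanager classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

class KillChainSketch {
    // Event names as described above; everything else here is a toy model.
    enum EventType { COMPLETED_CONTAINERS, KILL_CONTAINER, CLEANUP_CONTAINER }

    static final class Event {
        final EventType type;
        final List<String> containerIds;
        Event(EventType type, List<String> containerIds) {
            this.type = type;
            this.containerIds = containerIds;
        }
    }

    // Hypothetical single-threaded dispatcher: handlers enqueue follow-up events.
    private final Queue<Event> queue = new ArrayDeque<>();
    private final List<String> log = new ArrayList<>();

    void dispatch(Event e) {
        queue.add(e);
    }

    void drain() {
        Event e;
        while ((e = queue.poll()) != null) {
            switch (e.type) {
                case COMPLETED_CONTAINERS:
                    // ContainerManager role: one KILL_CONTAINER per container.
                    for (String id : e.containerIds) {
                        log.add("ContainerManager: KILL_CONTAINER for " + id);
                        dispatch(new Event(EventType.KILL_CONTAINER,
                                Collections.singletonList(id)));
                    }
                    break;
                case KILL_CONTAINER:
                    // ContainerImpl role: request cleanup of its own container.
                    log.add("ContainerImpl: CLEANUP_CONTAINER for " + e.containerIds.get(0));
                    dispatch(new Event(EventType.CLEANUP_CONTAINER, e.containerIds));
                    break;
                case CLEANUP_CONTAINER:
                    // ContainersLauncher role: this is where the process would be killed.
                    log.add("ContainersLauncher: kill " + e.containerIds.get(0));
                    break;
            }
        }
    }

    // Runs the whole chain for the given container ids and returns the handler log.
    static List<String> run(List<String> containerIds) {
        KillChainSketch sketch = new KillChainSketch();
        sketch.dispatch(new Event(EventType.COMPLETED_CONTAINERS, containerIds));
        sketch.drain();
        return sketch.log;
    }

    public static void main(String[] args) {
        for (String line : run(java.util.Arrays.asList("c1", "c2"))) {
            System.out.println(line);
        }
    }
}
```

Note that nothing in this model reports back when the last CLEANUP_CONTAINER has been handled, which is the gap shutdown code would have to fill.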

Does it make more sense to use this chain of events or to call the kill code directly?
For the former, the issue is how to know when the cleanup has completed.  It looks like
ContainerImpls have their state changed when their containers are killed, so the shutdown
code could monitor them until they all reach the correct state, but a fair bit of plumbing
would be required for the shutdown code to get to them.  For the latter, similar plumbing
would be required for the shutdown code to reach the ContainerImpls, and the other issue
would be circumventing the event system, which might have consequences that I'm not able
to foresee.
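
As a rough illustration of the monitoring option, the sketch below polls a set of containers until they all reach a terminal state or a timeout expires. The TrackedContainer class, the State values and the awaitAllDone method are hypothetical names made up for this example; the real ContainerImpl state machine and how shutdown code would obtain references to it are exactly the open plumbing question.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class ShutdownWaitSketch {
    // Hypothetical states; the real ContainerImpl states may differ.
    enum State { RUNNING, KILLING, DONE }

    // Hypothetical stand-in for whatever view of a container shutdown code can reach.
    static final class TrackedContainer {
        final String id;
        final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
        TrackedContainer(String id) {
            this.id = id;
        }
    }

    // Polls until every container is DONE (returns true) or the timeout
    // elapses or the thread is interrupted (returns false).
    static boolean awaitAllDone(Collection<TrackedContainer> containers,
                                long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            boolean allDone = true;
            for (TrackedContainer c : containers) {
                if (c.state.get() != State.DONE) {
                    allDone = false;
                    break;
                }
            }
            if (allDone) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        TrackedContainer a = new TrackedContainer("c1");
        TrackedContainer b = new TrackedContainer("c2");
        // Stand-in for the event chain finishing on another thread.
        new Thread(() -> {
            a.state.set(State.DONE);
            b.state.set(State.DONE);
        }).start();
        System.out.println("all done: " + awaitAllDone(Arrays.asList(a, b), 1000, 10));
    }
}
```

A false return would be the point at which shutdown code either gives up or falls back to a more aggressive kill.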

This is my first foray into nodemanager code, so maybe someone who understands it better can
provide some perspective?
                
> NM should handle cleaning up containers when it shuts down ( and kill containers from
an earlier instance when it comes back up after an unclean shutdown )
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-72
>                 URL: https://issues.apache.org/jira/browse/YARN-72
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Hitesh Shah
>
> Ideally, the NM should wait for a limited amount of time when it gets a shutdown signal
for existing containers to complete and kill the containers ( if we pick an aggressive approach
) after this time interval. 
> For NMs which come up after an unclean shutdown, the NM should look through its directories
for existing container.pids and try to kill any existing containers matching the pids found.
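
A minimal sketch of the pid-file scan from the description, under loud assumptions: that pid files end in ".pid", that each holds a single numeric pid, and that they sit somewhere under one root directory. None of this is confirmed by the issue, and the class and method names are invented for illustration. The sketch only collects the pids and leaves the actual kill to the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StalePidScanSketch {
    // Walks root and returns the pids parsed from every "*.pid" file found.
    // The ".pid" suffix and one-number-per-file layout are assumptions.
    static List<Long> findStalePids(Path root) {
        List<Path> pidFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            pidFiles = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".pid"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        List<Long> pids = new ArrayList<>();
        for (Path file : pidFiles) {
            try {
                String text = new String(Files.readAllBytes(file),
                        StandardCharsets.UTF_8).trim();
                pids.add(Long.parseLong(text));
            } catch (IOException | NumberFormatException e) {
                // Unreadable or malformed pid file: skip it rather than guess.
            }
        }
        return pids;
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println(findStalePids(Paths.get(args[0])));
        }
    }
}
```

Before killing anything, a caller would probably want to confirm that each pid still belongs to a container process, since the operating system can reuse pids after an unclean shutdown.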


