hadoop-yarn-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-72) NM should handle cleaning up containers when it shuts down ( and kill containers from an earlier instance when it comes back up after an unclean shutdown )
Date Mon, 19 Nov 2012 12:28:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500195#comment-13500195 ]

Tom White commented on YARN-72:
-------------------------------

Sandy, this looks like a good start, hooking in the code for container cleanup. I would focus
on cleaning up on shutdown in this patch, and tackle cleanup on startup in YARN-73.

As Bikas mentioned, there needs to be a timeout on waiting for the containers to shut down.
The shutdown process waits for up to yarn.nodemanager.process-kill-wait.ms for the PID to
appear, sends a SIGTERM, and then waits yarn.nodemanager.sleep-delay-before-sigkill.ms before
sending a SIGKILL if the process hasn't died - see ContainerLaunch#cleanupContainer. Waiting
for a little longer than the sum of these two durations would be sufficient.
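
To make the arithmetic concrete, here is a minimal sketch of how such a timeout could be derived and applied. It is not the NodeManager code: it uses java.util.Properties in place of Hadoop's Configuration so that it runs standalone, and the class name, the SLACK_MS margin, the polling helper, and the values set in main are all illustrative assumptions. Only the two property names come from the comment above.

```java
import java.util.Properties;
import java.util.function.BooleanSupplier;

public class ShutdownWaitSketch {
    // Property names as mentioned in the comment above.
    static final String PROCESS_KILL_WAIT_MS =
        "yarn.nodemanager.process-kill-wait.ms";
    static final String SLEEP_DELAY_BEFORE_SIGKILL_MS =
        "yarn.nodemanager.sleep-delay-before-sigkill.ms";

    // Hypothetical extra margin so the wait is "a little longer" than the sum.
    static final long SLACK_MS = 1000L;

    // Sum of the two durations plus the margin. Both properties must be set.
    static long shutdownWaitMs(Properties conf) {
        long killWait = Long.parseLong(conf.getProperty(PROCESS_KILL_WAIT_MS));
        long sigkillDelay =
            Long.parseLong(conf.getProperty(SLEEP_DELAY_BEFORE_SIGKILL_MS));
        return killWait + sigkillDelay + SLACK_MS;
    }

    // Polls allDone until it reports true or timeoutMs elapses.
    // Returns true if the containers finished in time, false on timeout
    // or if the waiting thread is interrupted.
    static boolean awaitContainers(BooleanSupplier allDone,
                                   long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!allDone.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Illustrative values only; check your own configuration.
        conf.setProperty(PROCESS_KILL_WAIT_MS, "2000");
        conf.setProperty(SLEEP_DELAY_BEFORE_SIGKILL_MS, "250");
        long waitMs = shutdownWaitMs(conf);
        System.out.println("shutdown wait (ms): " + waitMs);
        // Stand-in check that reports "all containers done" immediately.
        System.out.println("finished in time: "
            + awaitContainers(() -> true, waitMs, 100L));
    }
}
```

With a bounded wait like this, a container that ignores SIGTERM cannot hold up NM shutdown beyond the point where SIGKILL would already have been sent.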

Regarding testing, you could have a test like the one in TestContainerLaunch#testDelayedKill
to check that containers are correctly cleaned up after stopping an NM.
                
> NM should handle cleaning up containers when it shuts down ( and kill containers from
> an earlier instance when it comes back up after an unclean shutdown )
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-72
>                 URL: https://issues.apache.org/jira/browse/YARN-72
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Hitesh Shah
>            Assignee: Sandy Ryza
>         Attachments: YARN-72.patch
>
>
> Ideally, when the NM gets a shutdown signal, it should wait for a limited amount of time
> for existing containers to complete, and kill the containers ( if we pick an aggressive
> approach ) after this time interval.
> For NMs which come up after an unclean shutdown, the NM should look through its directories
> for existing container.pids and try to kill any existing containers matching the pids found.
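
The startup half of the description (scan for leftover pid files, then kill the matching processes) could be sketched roughly as follows. This is an illustration under stated assumptions, not the NodeManager implementation: the class and method names are hypothetical, pid files are assumed to end in ".pid" and to contain a single decimal PID, and ProcessHandle requires Java 9 or later. A real implementation would also need to guard against PID reuse, since a stale pid file may name a process that is no longer a container.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StalePidScanSketch {
    // Recursively collects PIDs from files ending in ".pid" under root.
    // Files that cannot be read or parsed are skipped.
    static List<Long> findPids(Path root) {
        List<Path> pidFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            pidFiles = paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".pid"))
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<Long> pids = new ArrayList<>();
        for (Path f : pidFiles) {
            try {
                String text = new String(Files.readAllBytes(f),
                                         StandardCharsets.UTF_8).trim();
                pids.add(Long.parseLong(text));
            } catch (IOException | NumberFormatException e) {
                // Unreadable or malformed pid file: ignore it in this sketch.
            }
        }
        return pids;
    }

    // Requests normal termination of the process if it still exists.
    // destroyForcibly() would be the forcible follow-up after a delay.
    static void terminate(long pid) {
        ProcessHandle.of(pid).ifPresent(ProcessHandle::destroy);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: StalePidScanSketch <dir>");
            return;
        }
        // Only lists the PIDs; calling terminate() is left to the caller.
        for (long pid : findPids(Paths.get(args[0]))) {
            System.out.println("found pid: " + pid);
        }
    }
}
```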


