hadoop-mapreduce-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4213) nodemanager should cleanup running containers when shutdown
Date Tue, 01 May 2012 17:58:49 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265942#comment-13265942 ]

Bikas Saha commented on MAPREDUCE-4213:

Once this jira is done, in which scenarios do you see the NM also terminating tasks upon startup?

For RM restart, I think it might be OK for NMs to terminate running tasks upon shutdown, as
long as the NM gives the RM some time to come back up. If the RM comes back up within that
time, it can take over control of the tasks as if nothing had happened. If it does not,
then I think it's best for the NM to terminate the resource utilization it is responsible
for and leave the node in the state it was in upon startup. Thoughts?
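
To make the proposal above concrete, here is a minimal sketch of the grace-period logic. It is
not NodeManager code: the class, the Container stand-in, the Outcome enum, and the rmIsBack
callback are all hypothetical names invented for illustration, and a real implementation would
have to go through the NM's actual container and RM-heartbeat machinery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch only; none of these names exist in the Hadoop code base. */
class GracefulShutdownSketch {

    /** Result of the shutdown decision. */
    public enum Outcome { HANDED_BACK_TO_RM, CONTAINERS_KILLED }

    /** Minimal stand-in for a running container. */
    public static class Container {
        public final String id;
        public boolean running = true;
        public Container(String id) { this.id = id; }
        public void kill() { running = false; }
    }

    /**
     * Polls rmIsBack for up to graceMillis. If the RM returns in time, the
     * containers are left running; otherwise every container is killed so the
     * node ends up in the state it was in at startup.
     */
    public static Outcome shutDown(List<Container> containers, BooleanSupplier rmIsBack,
                                   long graceMillis, long pollMillis) {
        long graceNanos = graceMillis * 1_000_000L;
        long start = System.nanoTime();
        while (System.nanoTime() - start < graceNanos) {
            if (rmIsBack.getAsBoolean()) {
                return Outcome.HANDED_BACK_TO_RM; // RM can take over the tasks
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                break;                              // stop waiting, fall through to cleanup
            }
        }
        for (Container c : containers) {
            c.kill();
        }
        return Outcome.CONTAINERS_KILLED;
    }

    /** Counts containers that are still marked as running. */
    public static int countRunning(List<Container> containers) {
        int n = 0;
        for (Container c : containers) {
            if (c.running) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Container> containers = new ArrayList<>();
        containers.add(new Container("container_1"));
        containers.add(new Container("container_2"));
        // Simulate an RM that never comes back within a 50 ms grace period.
        Outcome outcome = shutDown(containers, () -> false, 50, 10);
        System.out.println(outcome + ", still running: " + countRunning(containers));
    }
}
```

The open question in this sketch is the length of the grace period: too short and a slow RM
restart kills work needlessly, too long and misbehaving containers linger on the node.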

> nodemanager should cleanup running containers when shutdown
> -----------------------------------------------------------
>                 Key: MAPREDUCE-4213
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4213
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
> Currently the nodemanager doesn't clean up running containers when it gets restarted.
> This can cause containers to get lost and stick around forever. We've seen this happen
> multiple times when the RM is restarted. When the RM is brought back up, it doesn't know
> what was running on the cluster, so it tells the NMs to reboot, and when an NM reboots it
> loses track of what it had running. If any containers are behaving badly, there is no one
> left that knows about them to kill them.
> We should try to kill any running containers when the node manager is shutting down.
> We should also check when the nodemanager is being brought back up - but that will be a separate
> This might change a bit when RM restart is implemented if tasks can actually survive
> across RM/NM being rebooted, but that can be addressed at that point.


