hadoop-yarn-issues mailing list archives

From "Eric Badger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
Date Thu, 04 Oct 2018 20:43:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638840#comment-16638840 ]

Eric Badger commented on YARN-7644:

Personally, I think that {{org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher}}
is the more appropriate location for the new class because it is related to the container
launch cycle of events. Based on the name of the class ({{CleanupContainer}}), it probably
should be in the deletion package, but based on what the implementation actually does, I
think it belongs in launcher. There are pros and cons to each, and I agree that it gets a
little messy since we have to involve a deletion task to actually remove the docker
containers, but I think that is the deviation and that we should maintain course in this
case.

Overall, I think the patch looks good. +1 (non-binding) from me. [~jlowe], do you have any

> NM gets backed up deleting docker containers
> --------------------------------------------
>                 Key: YARN-7644
>                 URL: https://issues.apache.org/jira/browse/YARN-7644
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Eric Badger
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-7644.001.patch, YARN-7644.002.patch
> We send a {{docker stop}} to the docker container with a timeout of 10 seconds
when we shut down a container. If the container does not stop within 10 seconds, we force
kill it. However, the {{docker stop}} command is a blocking call, so in cases where lots of
containers don't go down with the initial SIGTERM, we have to wait 10+ seconds for the
{{docker stop}} to return. This ties up the ContainerLaunch handler, so these kill events
back up. It also appears to back up new container launches.
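
To illustrate the backlog described above, here is a minimal Java sketch of one general way to keep a blocking stop call off the event-handling thread: hand each stop to a worker pool so the dispatching thread returns immediately. This is not the YARN-7644 patch or any NodeManager code. The class {{AsyncStopSketch}}, the methods {{blockingStop}} and {{stopAll}}, the pool size of 4, and the sleep that stands in for {{docker stop}} are all assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncStopSketch {

    // Hypothetical stand-in for a blocking "docker stop": it only sleeps to
    // simulate a container that does not exit on the initial SIGTERM.
    static void blockingStop(String containerId, long waitMillis) throws InterruptedException {
        Thread.sleep(waitMillis);
    }

    // Submits one stop per container to a worker pool (size 4 is an arbitrary
    // assumption). Returns {millis the dispatching thread was busy, number of
    // containers stopped}.
    static long[] stopAll(int containers, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger stopped = new AtomicInteger();

        long start = System.nanoTime();
        for (int i = 0; i < containers; i++) {
            final String id = "container_" + i;
            pool.submit(() -> {
                try {
                    blockingStop(id, waitMillis);
                    stopped.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // The dispatching thread is free again here, before any stop finishes.
        long handlerBusyMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] {handlerBusyMillis, stopped.get()};
    }

    public static void main(String[] args) {
        long[] result = stopAll(8, 200);
        System.out.println("dispatch thread busy (ms): " + result[0]);
        System.out.println("containers stopped: " + result[1]);
    }
}
```

If the loop called {{blockingStop}} directly instead, the dispatching thread would be held for the sum of all the waits, which is the backlog the issue describes. Offloading trades that for a bounded number of worker threads, so the pool size becomes the new limit on concurrent stops.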

This message was sent by Atlassian JIRA
