hadoop-yarn-issues mailing list archives

From "Eric Badger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
Date Thu, 04 Oct 2018 20:43:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638840#comment-16638840 ]

Eric Badger commented on YARN-7644:
-----------------------------------

Personally, I think that {{org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher}}
is the more appropriate location for the new class because it is related to the container
launch cycle of events. Based on the name of the class ({{CleanupContainer}}), it probably
should be in the deletion package. But based on what the implementation actually does, I
think it belongs in launcher. There are pros and cons to each, and I agree that it gets a
little messy since we have to involve a deletion task to actually remove the docker
containers, but I think the deletion task is the deviation here and that we should stay the
course and keep the class in launcher.

Overall, I think the patch looks good. +1 (non-binding) from me. [~jlowe], do you have any
comments? 

> NM gets backed up deleting docker containers
> --------------------------------------------
>
>                 Key: YARN-7644
>                 URL: https://issues.apache.org/jira/browse/YARN-7644
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Eric Badger
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-7644.001.patch, YARN-7644.002.patch
>
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 seconds
> when we shut down a container. If the container does not stop after 10 seconds, then we
> force kill it. However, the {{docker stop}} command is a blocking call. So in cases where
> lots of containers don't go down with the initial SIGTERM, we have to wait 10+ seconds for
> each {{docker stop}} to return. This ties up the ContainerLaunch handler, and so these kill
> events back up. It also appears to be backing up new container launches.
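
To illustrate the backlog described above, here is a minimal, self-contained sketch. It is not
the YARN-7644 patch and does not use any NodeManager classes. The class name
AsyncCleanupSketch, its methods, and the use of Thread.sleep as a stand-in for a blocking
docker stop are all assumptions made for illustration. It compares how long a single handler
thread stays busy when it performs each blocking stop itself versus when it hands each stop to
a separate thread pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncCleanupSketch {

    // Hypothetical stand-in for a blocking "docker stop": sleeps for the grace period.
    static void blockingStop(String containerId, long graceMillis) {
        try {
            Thread.sleep(graceMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The handler thread performs every stop itself.
    // Returns how long the handler thread was tied up, in milliseconds.
    static long dispatchInline(int containers, long graceMillis) {
        long start = System.nanoTime();
        for (int i = 0; i < containers; i++) {
            blockingStop("container_" + i, graceMillis);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // The handler thread only submits cleanup tasks to a pool.
    // Returns how long the handler thread spent submitting, in milliseconds.
    static long dispatchOffloaded(int containers, long graceMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(containers);
        CountDownLatch done = new CountDownLatch(containers);
        long start = System.nanoTime();
        for (int i = 0; i < containers; i++) {
            final String id = "container_" + i;
            pool.execute(() -> {
                try {
                    blockingStop(id, graceMillis);
                } finally {
                    done.countDown();
                }
            });
        }
        long handlerBusy = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        // Wait for the background stops so the sketch exits cleanly;
        // this wait is deliberately excluded from the measured handler time.
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return handlerBusy;
    }

    public static void main(String[] args) {
        System.out.println("inline handler busy ms:    " + dispatchInline(4, 50));
        System.out.println("offloaded handler busy ms: " + dispatchOffloaded(4, 50));
    }
}
```

In the inline case the handler is occupied for roughly the sum of all grace periods, which is
the behavior the description attributes to the ContainerLaunch handler. In the offloaded case
the handler returns almost immediately and the waiting happens on other threads. How the real
patch schedules the cleanup work is defined by the attached patches, not by this sketch.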



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

