hadoop-yarn-issues mailing list archives

From "Rayman (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (YARN-9192) Deletion Taks will be picked up to delete running containers
Date Mon, 01 Apr 2019 22:16:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-9192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rayman updated YARN-9192:
    Comment: was deleted

(was: I'm observing a similar issue when running Samza over YARN.
 When an NM is bounced, the NM being killed writes LevelDB state for the deletion service to act on.
 The "new" NM reads that state and acts on it, but ends up deleting the directories of running containers.
 This happens when containers are long-running and are placed on a fixed host.

I also observed this in the log:
*[INFO] [shutdown-hook-0] containermanager.ContainerManagerImpl.cleanUpApplicationsOnNMShutDown(ContainerManagerImpl.java:718) - Waiting for Applications to be Finished*)

> Deletion Taks will be picked up to delete running containers
> ------------------------------------------------------------
>                 Key: YARN-9192
>                 URL: https://issues.apache.org/jira/browse/YARN-9192
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications
>    Affects Versions: 2.9.1
>            Reporter: Sihai Ke
>            Priority: Major
> I suspect there is a bug in the YARN deletion task service. Below are my repro steps:
>  # First, set yarn.nodemanager.delete.debug-delay-sec=3600, which means that when an app finishes, its binary/container folder will be deleted after 3600 seconds.
>  # While the application App1 (a long-running service) is running on machine1, machine1 shuts down. ContainerManagerImpl#serviceStop() is called -> ContainerManagerImpl#cleanUpApplicationsOnNMShutDown, an ApplicationFinishEvent is sent, and some deletion tasks are created. These tasks are stored in the DB and will be picked up for execution 3600 seconds later.
>  # 100 seconds later, machine1 comes back and the same app is assigned to run on this machine; a container is created and works well.
>  # The deletion tasks created in step 2 are later picked up and delete the containers created in step 3.
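
The timeline in the repro steps can be sketched as a toy simulation. This is only an illustration of the suspected race, not NodeManager code: the `Node` and `DeletionTask` classes, the `/appcache/app1` path, and the container names are all hypothetical, and the sketch assumes the persisted task targets the application directory, so any container later placed under that directory is caught by it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class StaleDeletionSketch {

    /** Hypothetical stand-in for a deletion task persisted in the NM state store. */
    static final class DeletionTask {
        final String appDir;
        final long dueAtSec;

        DeletionTask(String appDir, long dueAtSec) {
            this.appDir = appDir;
            this.dueAtSec = dueAtSec;
        }
    }

    /** Toy model of one host; directories and persisted tasks survive a restart. */
    static final class Node {
        final TreeSet<String> appDirs = new TreeSet<>();
        final TreeSet<String> containerDirs = new TreeSet<>();
        final List<DeletionTask> stateStore = new ArrayList<>();

        void startContainer(String appDir, String containerId) {
            appDirs.add(appDir);
            containerDirs.add(appDir + "/" + containerId);
        }

        /** Step 2: on shutdown, persist one delayed deletion task per known app. */
        void shutdown(long nowSec, long debugDelaySec) {
            for (String appDir : appDirs) {
                stateStore.add(new DeletionTask(appDir, nowSec + debugDelaySec));
            }
            appDirs.clear();
        }

        /** Runs every persisted task that is due and returns the deleted directories. */
        List<String> runDueTasks(long nowSec) {
            List<String> deleted = new ArrayList<>();
            List<DeletionTask> pending = new ArrayList<>();
            for (DeletionTask task : stateStore) {
                if (task.dueAtSec > nowSec) {
                    pending.add(task);
                    continue;
                }
                // No check that the app is running again: this is the suspected bug.
                for (String dir : new ArrayList<>(containerDirs)) {
                    if (dir.startsWith(task.appDir + "/")) {
                        containerDirs.remove(dir);
                        deleted.add(dir);
                    }
                }
            }
            stateStore.clear();
            stateStore.addAll(pending);
            return deleted;
        }
    }

    static List<String> simulate() {
        Node machine1 = new Node();
        machine1.startContainer("/appcache/app1", "container_1"); // App1 running
        machine1.shutdown(0, 3600);                               // step 2, delay from step 1
        machine1.startContainer("/appcache/app1", "container_2"); // step 3, around t=100
        return machine1.runDueTasks(3600);                        // step 4
    }

    public static void main(String[] args) {
        // prints [/appcache/app1/container_1, /appcache/app1/container_2]
        System.out.println(simulate());
    }
}
```

In this model the stale task removes `container_2` even though it was created after the restart. A guard that skips or cancels persisted tasks for applications that are active again would avoid that, though whether that is the right fix for YARN is not established here.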

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
