From: "Jason Lowe (JIRA)"
To: yarn-issues@hadoop.apache.org
Date: Fri, 21 Jul 2017 16:25:00 +0000 (UTC)
Subject: [jira] [Updated] (YARN-6846) Nodemanager can fail to fully delete application local directories when applications are killed

    [ https://issues.apache.org/jira/browse/YARN-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-6846:
-----------------------------
    Attachment: YARN-6846.001.patch

Attaching a patch that makes the container-executor more tolerant of paths that have already been deleted when removing a hierarchy. It also makes the deletion code best-effort: it attempts to delete the remaining entries even if unlinking one of them fails.
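For reference, a minimal sketch of the kind of tolerant, best-effort traversal the patch describes (illustrative only, not the actual container-executor code; the function and variable names are invented): ENOENT is treated as success, since a concurrent deletion task may have removed the path first, and an error on one entry does not stop the deletion of its siblings.

/*
 * Sketch only (not the YARN-6846 patch): best-effort recursive delete
 * that tolerates entries vanishing underneath it.
 */
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int delete_tree(const char *path) {
    int ret = 0;
    struct stat sb;

    if (lstat(path, &sb) != 0) {
        /* already gone: another deletion task beat us to it, not an error */
        return errno == ENOENT ? 0 : -1;
    }

    if (S_ISDIR(sb.st_mode)) {
        DIR *dir = opendir(path);
        if (dir == NULL) {
            return errno == ENOENT ? 0 : -1;
        }
        struct dirent *entry;
        while ((entry = readdir(dir)) != NULL) {
            if (strcmp(entry->d_name, ".") == 0 ||
                strcmp(entry->d_name, "..") == 0) {
                continue;
            }
            char child[PATH_MAX];
            snprintf(child, sizeof(child), "%s/%s", path, entry->d_name);
            /* best-effort: record the failure but keep deleting siblings */
            if (delete_tree(child) != 0) {
                ret = -1;
            }
        }
        closedir(dir);
        if (rmdir(path) != 0 && errno != ENOENT) {
            ret = -1;
        }
    } else {
        if (unlink(path) != 0 && errno != ENOENT) {
            ret = -1;
        }
    }
    return ret;
}

The key design point is that an entry vanishing between readdir() and unlink() is expected under the race described in the quoted report below, so ENOENT must not be treated as a failure.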
> Nodemanager can fail to fully delete application local directories when applications are killed
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-6846
>                 URL: https://issues.apache.org/jira/browse/YARN-6846
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.1
>            Reporter: Jason Lowe
>            Priority: Critical
>         Attachments: YARN-6846.001.patch
>
>
> When an application is killed, all of its running containers are killed, and the app waits for the containers to complete before cleaning up. As each container completes, its container directory is deleted via the DeletionService. After all containers have completed, the app completes and the app directory is deleted. If the app completes quickly enough, the deletion of the container and app directories can race against each other. If the container deletion executor deletes a file just before the application deletion executor does, the application deletion executor can fail, leaving the remaining entries in the application directory behind.
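To make the failure mode concrete, here is a hypothetical reproduction of the race in the report above (the paths and directory layout are invented for illustration, and it reuses delete_tree() from the earlier sketch): two deletion tasks run concurrently over overlapping trees, just as the container-directory and app-directory cleanup tasks do. A strict deleter that aborts on the first unlink() failure can lose this race and leave siblings behind; the tolerant version keeps going.

#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>

extern int delete_tree(const char *path);  /* from the sketch above */

static void *delete_task(void *arg) {
    const char *path = arg;
    if (delete_task != NULL && delete_tree(path) != 0) {
        fprintf(stderr, "best-effort delete of %s hit errors\n", path);
    }
    return NULL;
}

int main(void) {
    /* hypothetical layout standing in for the NM local dirs */
    mkdir("/tmp/app_0001", 0755);
    mkdir("/tmp/app_0001/container_01", 0755);
    FILE *f = fopen("/tmp/app_0001/container_01/stdout", "w");
    if (f) fclose(f);

    /* container cleanup and app cleanup race, as in the report */
    pthread_t t1, t2;
    pthread_create(&t1, NULL, delete_task, "/tmp/app_0001/container_01");
    pthread_create(&t2, NULL, delete_task, "/tmp/app_0001");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Compile both files together (e.g. cc race.c delete_tree.c -lpthread); with the ENOENT-tolerant delete, both tasks finish cleanly regardless of which one unlinks a given entry first.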