hadoop-yarn-issues mailing list archives

From "Eric Badger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6846) Nodemanager can fail to fully delete application local directories when applications are killed
Date Tue, 01 Aug 2017 22:38:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109918#comment-16109918 ]

Eric Badger commented on YARN-6846:

bq. If I'm reading the man pages correctly for geteuid(), seteuid(), and readdir(), they don't generate ENOENT
{{geteuid()}} and {{seteuid()}} aren't the calls that set {{errno}} in the first code change referenced (1837):
-    if (rmdir(path) != 0) {
+    if (rmdir(path) != 0 && errno != ENOENT) {
{{rmdir(path)}} is what sets {{errno}} here, and it can fail with {{ENOENT}}.
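
As a minimal sketch of the pattern in that diff, the check can be wrapped in a small helper that treats an already-missing directory as success. The helper name {{rmdir_ignore_enoent}} is hypothetical and is not taken from the patch; only {{rmdir()}}, {{errno}}, and {{ENOENT}} come from the code under discussion.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the YARN-6846 patch): remove a directory,
 * treating "already gone" as success. rmdir() returns -1 on failure and
 * sets errno; ENOENT means some other actor removed the path first.
 */
static int rmdir_ignore_enoent(const char *path) {
  if (rmdir(path) != 0 && errno != ENOENT) {
    return -1; /* real failure; errno still holds rmdir()'s error */
  }
  return 0; /* removed by us, or already removed by someone else */
}
```

Calling it twice on the same path should succeed both times, since the second call only sees {{ENOENT}}.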

As far as {{readdir()}} goes, it looks like POSIX lists {{ENOENT}} as a possible error, while the
Linux man page doesn't. I think it's better to go with POSIX here, but I'll defer to [~jlowe] on that.
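
For illustration, here is one way a directory walk could follow the POSIX reading and tolerate {{ENOENT}} from {{readdir()}}. This is a sketch under assumptions, not the patch itself: the function name {{count_entries}} is hypothetical, and returning 0 for a missing directory is a choice made for this example. Since {{readdir()}} returns NULL both at end-of-stream and on error, {{errno}} is cleared before each call to tell the two apart.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the entries in a directory (excluding "." and
 * ".."), treating ENOENT from opendir() or readdir() as "nothing left".
 * Returns -1 on any other error, with errno set.
 */
static int count_entries(const char *path) {
  DIR *dir = opendir(path);
  if (dir == NULL) {
    return errno == ENOENT ? 0 : -1; /* directory already deleted */
  }
  int count = 0;
  for (;;) {
    errno = 0; /* so a NULL return can be classified afterwards */
    struct dirent *ent = readdir(dir);
    if (ent == NULL) {
      break; /* end of stream (errno == 0) or error (errno != 0) */
    }
    if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0) {
      count++;
    }
  }
  int saved = errno; /* closedir() may overwrite errno */
  closedir(dir);
  if (saved != 0 && saved != ENOENT) {
    errno = saved;
    return -1;
  }
  return count;
}
```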

> Nodemanager can fail to fully delete application local directories when applications are killed
> -----------------------------------------------------------------------------------------------
>                 Key: YARN-6846
>                 URL: https://issues.apache.org/jira/browse/YARN-6846
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.1
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: YARN-6846.001.patch, YARN-6846.002.patch, YARN-6846.003.patch
> When an application is killed all of the running containers are killed and the app waits
> for the containers to complete before cleaning up.  As each container completes the container
> directory is deleted via the DeletionService.  After all containers have completed the app
> completes and the app directory is deleted.  If the app completes quickly enough then the
> deletion of the container and app directories can race against each other.  If the container
> deletion executor deletes a file just before the application deletion executor then it can
> cause the application deletion executor to fail, leaving the remaining entries in the application
> directory lingering.
