hadoop-hdfs-issues mailing list archives

From "Wenxin He (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering
Date Thu, 09 Nov 2017 06:56:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenxin He updated HDFS-10323:
-----------------------------
    Attachment: HDFS-10323.003.patch

> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> ------------------------------------------------------------------------
>
>                 Key: HDFS-10323
>                 URL: https://issues.apache.org/jira/browse/HDFS-10323
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: federation
>    Affects Versions: 2.6.0, 2.7.4, 3.0.0-beta1
>            Reporter: Ben Podgursky
>            Assignee: Wenxin He
>         Attachments: HDFS-10323.001.patch, HDFS-10323.002.patch, HDFS-10323.003.patch
>
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem swallows the underlying error, it is difficult to be sure what it is, but I believe the ViewFileSystem’s child FileSystems are being close()’d before the ViewFileSystem itself, because ClientFinalizer closes FileSystems in no guaranteed order. When the ViewFileSystem then tries to close(), it forwards the pending delete() calls to the appropriate child, and these fail because the child is already closed.
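To make the suspected mechanism concrete, here is a minimal, self-contained Java simulation of it. It does not use Hadoop at all: `ChildFs`, `ViewFs`, and `pathSurvives` are hypothetical stand-ins, and `IllegalStateException` merely stands in for whatever error a closed filesystem actually raises. The sketch assumes, as the report suspects, that the view keeps its own deleteOnExit set and forwards each delete to its child only when the view itself is closed, and that a failure during that step is swallowed and logged.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class CloseOrderingDemo {

    // Hypothetical stand-in for a child FileSystem (one mount target behind the view).
    static class ChildFs {
        final Set<String> existing = new LinkedHashSet<>();
        private boolean closed = false;

        boolean delete(String path) {
            if (closed) {
                throw new IllegalStateException("Filesystem closed");
            }
            return existing.remove(path);
        }

        void close() {
            closed = true;
        }
    }

    // Hypothetical stand-in for ViewFileSystem: it keeps its own deleteOnExit set
    // and forwards each delete to its single child only when the view is closed.
    static class ViewFs {
        private final ChildFs child;
        private final Set<String> deleteOnExit = new LinkedHashSet<>();

        ViewFs(ChildFs child) {
            this.child = child;
        }

        void deleteOnExit(String path) {
            deleteOnExit.add(path);
        }

        void close() {
            for (String path : deleteOnExit) {
                try {
                    child.delete(path);
                } catch (IllegalStateException e) {
                    // Swallowed and logged, like the INFO message in the report.
                    System.out.println("Ignoring failure to deleteOnExit for path " + path);
                }
            }
            deleteOnExit.clear();
        }
    }

    // Returns true if the temporary path is still present after the simulated shutdown.
    static boolean pathSurvives(boolean childClosedFirst) {
        String tmp = "/tmp/delete_on_exit_test";
        ChildFs child = new ChildFs();
        child.existing.add(tmp);
        ViewFs view = new ViewFs(child);
        view.deleteOnExit(tmp);
        // A shutdown hook that closes filesystems in no guaranteed order can hit either sequence.
        if (childClosedFirst) {
            child.close();
            view.close();
        } else {
            view.close();
            child.close();
        }
        return child.existing.contains(tmp);
    }

    public static void main(String[] args) {
        System.out.println("child closed first, path survives: " + pathSurvives(true));
        System.out.println("view closed first, path survives: " + pathSurvives(false));
    }
}
```

In this model the "Ignoring failure" line appears only for the child-first ordering, and only then does the path survive; closing the view first deletes it. That fits the intermittent failure described above, although the simulation only shows the hypothesis is self-consistent, not that it is exactly what Hadoop does.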
> I’m unsure how to write an actual Hadoop test to reproduce this, since it involves testing behavior on actual JVM shutdown. However, I can verify that while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem) fs1;
> // Register the path directly with each child FileSystem that contains it.
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first glance I see two ways to fix this behavior:
> 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child FileSystem and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all other FileSystems.
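The two proposed fixes can be compared in a similar self-contained simulation. Nothing here is Hadoop API: `Fs`, `ChildFs`, `ViewFs`, `closeAll`, and `pathSurvives` are hypothetical, and `closeAll` is only a loose analogue of `FileSystem.Cache.closeAll`. The child is deliberately listed first in the simulated cache to reproduce the unlucky ordering; the `forward` flag models option 1 and the `viewsFirst` flag models option 2.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CloseOrderingFixes {

    interface Fs { void close(); } // hypothetical common interface for the simulated filesystems

    // Hypothetical child filesystem that processes its own deleteOnExit set in close().
    static class ChildFs implements Fs {
        final Set<String> existing = new LinkedHashSet<>();
        private final Set<String> deleteOnExit = new LinkedHashSet<>();
        private boolean closed = false;

        boolean delete(String path) {
            if (closed) throw new IllegalStateException("Filesystem closed");
            return existing.remove(path);
        }

        void deleteOnExit(String path) {
            deleteOnExit.add(path);
        }

        @Override
        public void close() {
            // Pending deletes run while this filesystem is still open.
            for (String path : deleteOnExit) {
                delete(path);
            }
            deleteOnExit.clear();
            closed = true;
        }
    }

    // Hypothetical view filesystem with a single child; `forward` selects option 1.
    static class ViewFs implements Fs {
        private final ChildFs child;
        private final boolean forward;
        private final Set<String> deleteOnExit = new LinkedHashSet<>();

        ViewFs(ChildFs child, boolean forward) {
            this.child = child;
            this.forward = forward;
        }

        void deleteOnExit(String path) {
            if (forward) {
                child.deleteOnExit(path); // option 1: the child owns the pending delete
            } else {
                deleteOnExit.add(path); // original behavior: the view owns it
            }
        }

        @Override
        public void close() {
            for (String path : deleteOnExit) {
                try {
                    child.delete(path);
                } catch (IllegalStateException e) {
                    System.out.println("Ignoring failure to deleteOnExit for path " + path);
                }
            }
            deleteOnExit.clear();
        }
    }

    // Closes every cached filesystem; `viewsFirst` applies option 2 via a stable sort.
    static void closeAll(List<Fs> cache, boolean viewsFirst) {
        List<Fs> order = new ArrayList<>(cache);
        if (viewsFirst) {
            order.sort(Comparator.comparing((Fs fs) -> !(fs instanceof ViewFs)));
        }
        for (Fs fs : order) {
            fs.close();
        }
    }

    // Returns true if the temporary path is still present after the simulated shutdown.
    static boolean pathSurvives(boolean forward, boolean viewsFirst) {
        String tmp = "/tmp/delete_on_exit_test";
        ChildFs child = new ChildFs();
        child.existing.add(tmp);
        ViewFs view = new ViewFs(child, forward);
        view.deleteOnExit(tmp);
        // The child is listed first to reproduce the unlucky close order.
        closeAll(List.of(child, view), viewsFirst);
        return child.existing.contains(tmp);
    }

    public static void main(String[] args) {
        System.out.println("no fix:   " + pathSurvives(false, false));
        System.out.println("option 1: " + pathSurvives(true, false));
        System.out.println("option 2: " + pathSurvives(false, true));
    }
}
```

In this model either flag alone is enough to get the path deleted. Option 1 removes the dependence on close order altogether, because the pending delete lives in the filesystem that performs it; option 2 keeps the view's own bookkeeping but only works as long as the close order is actually enforced.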
 
> I would appreciate any thoughts on whether this seems accurate, and any thoughts (or help) on the fix.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
