hadoop-hdfs-issues mailing list archives

From "Jinglun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering
Date Mon, 12 Aug 2019 04:50:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904843#comment-16904843 ]

Jinglun commented on HDFS-10323:
--------------------------------

Hi [~vincent he], [~bpodgursky], thanks for working on this. ViewFileSystem has another problem:
because the child filesystems are shared and ViewFileSystem.close() does nothing but call
super.close(), it breaks the semantics of FileSystem.newInstance(). See HADOOP-15565.

I'm considering adding an inner cache to ViewFileSystem, which would also solve this deleteOnExit
issue. The patch is ready in HADOOP-15565 now. Do you have time to take a look? Let me know
your thoughts, thanks.
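
For readers following along, here is a minimal sketch of one way to read the inner-cache idea. It is a toy model, not the HADOOP-15565 patch: ToyFs, ToyViewFs, and childFor are hypothetical names, and the assumption is that each view instance creates and owns its own children, so closing one view cannot close children that another view is still using.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: not the HADOOP-15565 patch and not Hadoop's FileSystem API.
class InnerCacheSketch {

    /** Hypothetical stand-in for a child file system. */
    static class ToyFs {
        final String uri;
        boolean closed = false;

        ToyFs(String uri) {
            this.uri = uri;
        }

        void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for a view that keeps a private cache of children. */
    static class ToyViewFs {
        // Per-instance cache: children created here are never shared with other views.
        private final Map<String, ToyFs> innerCache = new HashMap<>();

        ToyFs childFor(String uri) {
            // Reuse the child within this view, create it on first use.
            return innerCache.computeIfAbsent(uri, ToyFs::new);
        }

        void close() {
            // The view owns its children, so it closes them itself.
            for (ToyFs fs : innerCache.values()) {
                fs.close();
            }
            innerCache.clear();
        }
    }

    public static void main(String[] args) {
        ToyViewFs a = new ToyViewFs();
        ToyViewFs b = new ToyViewFs();
        ToyFs childOfA = a.childFor("hdfs://ns1");
        ToyFs childOfB = b.childFor("hdfs://ns1");
        a.close();
        System.out.println("distinct children: " + (childOfA != childOfB));
        System.out.println("child of b still open: " + !childOfB.closed);
    }
}
```

Under the same assumption, a view that owns its children could also process its deleteOnExit paths before closing them, which is how this idea would relate to the issue quoted below.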

> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> ------------------------------------------------------------------------
>
>                 Key: HDFS-10323
>                 URL: https://issues.apache.org/jira/browse/HDFS-10323
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: federation
>    Affects Versions: 2.6.0, 2.7.4, 3.0.0-beta1
>            Reporter: Ben Podgursky
>            Assignee: Wenxin He
>            Priority: Major
>         Attachments: HDFS-10323.001.patch, HDFS-10323.002.patch, HDFS-10323.003.patch
>
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began failing frequently,
> displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem eats the error involved, it is difficult to be sure what the error is,
> but I believe what is happening is that the ViewFileSystem’s child FileSystems are being
> close()’d before the ViewFileSystem, due to the random order ClientFinalizer closes FileSystems;
> so then when the ViewFileSystem tries to close(), it tries to forward the delete() calls to
> the appropriate child, and fails because the child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it involves
> testing behavior on actual JVM shutdown.  However, I can verify that while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem) fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first glance I
> see two ways to fix this behavior:
> 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child FileSystem,
> and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all other FileSystems.
> Would appreciate any thoughts of whether this seems accurate, and thoughts (or help)
> on the fix.
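
The suspected failure mode and the two proposed fixes can be illustrated with a small toy model. This is a sketch under stated assumptions, not Hadoop code: ToyFs, ToyViewFs, and fileLeftBehind are hypothetical names, and the model assumes only what the report describes, namely that a file system processes its deleteOnExit paths in close() and that delete() on an already closed child fails.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: these classes imitate the behaviour described in the report
// and are NOT Hadoop's FileSystem or ViewFileSystem API.
class CloseOrderSketch {

    /** Hypothetical stand-in for a child file system. */
    static class ToyFs {
        final Set<String> files = new LinkedHashSet<>();
        final Set<String> deleteOnExitPaths = new LinkedHashSet<>();
        boolean closed = false;

        boolean delete(String path) {
            // A closed file system can no longer serve delete().
            return !closed && files.remove(path);
        }

        void deleteOnExit(String path) {
            deleteOnExitPaths.add(path);
        }

        void close() {
            // Process pending deleteOnExit paths first, then mark as closed.
            for (String path : deleteOnExitPaths) {
                delete(path);
            }
            deleteOnExitPaths.clear();
            closed = true;
        }
    }

    /** Hypothetical stand-in for a view over a single child. */
    static class ToyViewFs extends ToyFs {
        final ToyFs child;
        final boolean forwardDeleteOnExit;

        ToyViewFs(ToyFs child, boolean forwardDeleteOnExit) {
            this.child = child;
            this.forwardDeleteOnExit = forwardDeleteOnExit;
        }

        @Override
        boolean delete(String path) {
            // Forwarded to the child; returns false if the child is already closed.
            return child.delete(path);
        }

        @Override
        void deleteOnExit(String path) {
            if (forwardDeleteOnExit) {
                child.deleteOnExit(path); // option 1: the child owns the path
            } else {
                super.deleteOnExit(path); // reported behaviour: the view owns the path
            }
        }
    }

    /** Returns true if the file is still present after both file systems are closed. */
    static boolean fileLeftBehind(boolean closeChildFirst, boolean forwardDeleteOnExit) {
        ToyFs child = new ToyFs();
        child.files.add("/tmp/example");
        ToyViewFs view = new ToyViewFs(child, forwardDeleteOnExit);
        view.deleteOnExit("/tmp/example");
        if (closeChildFirst) {
            child.close();
            view.close();
        } else {
            view.close(); // option 2: always close views before their children
            child.close();
        }
        return child.files.contains("/tmp/example");
    }

    public static void main(String[] args) {
        System.out.println("child closed first, view owns path: " + fileLeftBehind(true, false));
        System.out.println("view closed first, view owns path:  " + fileLeftBehind(false, false));
        System.out.println("child closed first, path forwarded: " + fileLeftBehind(true, true));
        System.out.println("view closed first, path forwarded:  " + fileLeftBehind(false, true));
    }
}
```

In this model only the first combination leaves the file behind, which is consistent with a failure that depends on close order. Either forwarding the path to the child (option 1) or closing views first (option 2) removes the order dependence in the toy; whether the same holds in Hadoop depends on details the model does not capture.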



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

