hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
Date Wed, 02 Oct 2013 22:51:42 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784585#comment-13784585 ]

Colin Patrick McCabe commented on HADOOP-9984:

bq. My understanding is that a breaking change will be done in 2.3.0 for HADOOP-9972, regardless
of what happens in this patch. Is that not the case? Do we expect to implement those new APIs
fully in the base class without requiring anything new of subclasses?

HADOOP-9972 is not going to be an incompatible change, or require anything new from subclasses.

bq. What do you think of this as a compromise? It helps control some of the bad consequences
discussed earlier.

I think I'm missing something in this whole discussion.  You seem to want to break the API
after Hadoop 2 goes GA, but breaking the API is exactly what is not supposed to happen after
GA, according to Arun.

I also don't understand the comments about "giving them more time."  Proprietary or out-of-tree
filesystems are *not* part of the Hadoop release, by definition.  What are the "downstream
projects" you're referring to?  I suppose Ceph, QFS, and GlusterFS are three examples of out-of-tree
FileSystems.   Are we delaying our release or reducing its quality for them?  If so, why?

Making {{listLinkStatus}} an abstract function is actually better for these out-of-tree implementors
anyway.  It will bring to their attention the fact that the semantics of {{listStatus}} have
changed, rather than sweeping it under the rug.  Allowing the code to silently compile and
do the wrong thing doesn't seem like it's doing anyone any favors.  I can say firsthand that
no matter which option we choose, the Ceph Hadoop plugin will need to be updated (I worked
on that at one point).

Finally, you can't implement symlink resolution in the subclasses of AbstractFileSystem. 
For FileContext, symlink resolution has to happen in FC.  So that means either AbstractFileSystem#listStatus
is going to be the equivalent of FileSystem#listLinkStatus, or you have to completely redesign
FC.  Neither of those options seems like a good idea.  I think this, more than anything else,
convinced me to take the path I did.
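
As a rough illustration of why resolution ends up in the FileContext layer rather than in each AbstractFileSystem subclass, the sketch below models a single file system that can only report "this is a link" and a wrapper that follows the link, possibly onto another file system. Every name here ({{LinkResolutionSketch}}, {{SingleFs}}, {{Context}}, {{UnresolvedLink}}) is hypothetical and none of it is Hadoop code. It also assumes that a link target may live on a different file system than the link itself, which is one plausible reason a subclass cannot finish the job alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model only; these are not Hadoop classes.
final class LinkResolutionSketch {

    // Signal raised by a single file system when the path is a symlink.
    static final class UnresolvedLink extends Exception {
        final String target;
        UnresolvedLink(String target) {
            super("unresolved link to " + target);
            this.target = target;
        }
    }

    // Stand-in for one concrete file system. It knows its own files and
    // links but cannot reach any other file system.
    static final class SingleFs {
        private final Map<String, String> files = new HashMap<>();
        private final Map<String, String> links = new HashMap<>();

        SingleFs file(String path, String data) { files.put(path, data); return this; }
        SingleFs link(String path, String target) { links.put(path, target); return this; }

        String read(String path) throws UnresolvedLink {
            String target = links.get(path);
            if (target != null) {
                throw new UnresolvedLink(target); // hand the problem upward
            }
            String data = files.get(path);
            if (data == null) {
                throw new IllegalArgumentException("no such path: " + path);
            }
            return data;
        }
    }

    // Stand-in for the wrapper layer: it owns the scheme-to-file-system
    // mapping, so it is the only place that can follow a cross-FS link.
    static final class Context {
        static final int MAX_LINKS = 32; // assumed bound to stop link cycles
        private final Map<String, SingleFs> byScheme = new HashMap<>();

        Context mount(String scheme, SingleFs fs) { byScheme.put(scheme, fs); return this; }

        String read(String path) {
            String current = path;
            for (int hops = 0; hops <= MAX_LINKS; hops++) {
                int colon = current.indexOf(':');
                if (colon < 0) {
                    throw new IllegalArgumentException("no scheme in path: " + current);
                }
                SingleFs fs = byScheme.get(current.substring(0, colon));
                if (fs == null) {
                    throw new IllegalArgumentException("unknown scheme in path: " + current);
                }
                try {
                    return fs.read(current);
                } catch (UnresolvedLink e) {
                    current = e.target; // retry against the link target
                }
            }
            throw new IllegalStateException("too many links while resolving " + path);
        }
    }

    // A link on file system "a" that points at a file on file system "b".
    static Context sample() {
        SingleFs a = new SingleFs().link("a:/data", "b:/real");
        SingleFs b = new SingleFs().file("b:/real", "payload");
        return new Context().mount("a", a).mount("b", b);
    }

    public static void main(String[] args) {
        System.out.println(sample().read("a:/data"));
    }
}
```

In this model {{SingleFs}} has no way to reach {{b:/real}}, so any listing or read it returns on its own is necessarily the unresolved, link-level view.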

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
> ----------------------------------------------------------------------------------
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch,
> HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch,
> HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch
>
> During the process of adding symlink support to FileSystem, we realized that many existing
> HDFS clients would be broken by listStatus and globStatus returning symlinks.  One example
> is applications that assume that !FileStatus#isFile implies that the inode is a directory.
> As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
> resolved paths.
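
The broken assumption mentioned in the description can be sketched as follows. This is a minimal model, not Hadoop code: {{Entry}} and {{Kind}} are hypothetical stand-ins for a status object with {{isFile}}, {{isDirectory}}, and {{isSymlink}} checks, and the sample listing is invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model only; Entry is not the real FileStatus class.
final class SymlinkListingSketch {

    enum Kind { FILE, DIRECTORY, SYMLINK }

    static final class Entry {
        final String path;
        final Kind kind;
        Entry(String path, Kind kind) { this.path = path; this.kind = kind; }
        boolean isFile() { return kind == Kind.FILE; }
        boolean isDirectory() { return kind == Kind.DIRECTORY; }
        boolean isSymlink() { return kind == Kind.SYMLINK; }
    }

    // Legacy client logic: anything that is not a file is assumed to be a directory.
    static long countDirectoriesLegacy(List<Entry> listing) {
        return listing.stream().filter(e -> !e.isFile()).count();
    }

    // Explicit check: only entries that really are directories are counted.
    static long countDirectoriesExplicit(List<Entry> listing) {
        return listing.stream().filter(Entry::isDirectory).count();
    }

    // An unresolved listing: one file, one directory, one symlink.
    static List<Entry> sampleListing() {
        return Arrays.asList(
            new Entry("/data/part-0", Kind.FILE),
            new Entry("/data/logs", Kind.DIRECTORY),
            new Entry("/data/latest", Kind.SYMLINK));
    }

    public static void main(String[] args) {
        List<Entry> listing = sampleListing();
        System.out.println(countDirectoriesLegacy(listing));   // prints 2: the symlink is miscounted
        System.out.println(countDirectoriesExplicit(listing)); // prints 1
    }
}
```

If the listing call returns resolved entries instead, no {{SYMLINK}} entry reaches the caller and the legacy check keeps working unchanged.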

This message was sent by Atlassian JIRA
