hadoop-common-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
Date Wed, 25 Sep 2013 18:18:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777852#comment-13777852 ]

Andrew Wang commented on HADOOP-9984:

Cross-posting some of my review feedback from HADOOP-9981 that Colin plans to address in this
JIRA instead:

* I think we have an existing bug in the paths of the returned FileStatus objects. When going
through a glob, the code sets the path to the built-up path, which can include symlinks, while
for a non-glob it uses getFileStatus, which returns a resolved path. I'm pretty sure a FileStatus
is supposed to have a resolved path. This is complicated by how PathFilter still needs to compare
against the complete built-up path; maybe we could do something like:
    if (filter.accept(new Path(prefix, status.getPath().getName()))) {
* Our symlink resolution right now is inconsistent: listStatus does not resolve its results, but
getFileStatus does. Shouldn't this be getFileLinkStatus? Or are we waiting to fix this again in
HADOOP-9877 when it gets recommitted? I know HADOOP-9972 with the new APIs is coming down the
pipe, so I just wanted to bring this up.
* I'd like to see tests that would have caught these correctness concerns: that resolved paths
are returned correctly (with and without a wildcard), that PathFilters are matching against
built-up paths as expected (with and without wildcards), and the looping /a/b -> .. symlink
case you mentioned in a comment. Whether it's a terminal or intermediate wildcard also matters
here. There are unfortunately a lot of edge cases.
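
To make the first bullet concrete, here is a minimal, self-contained Java sketch of the idea: run the filter against the built-up path (prefix plus child name) while returning the resolved path to the caller. It deliberately does not use the Hadoop classes; `Status`, `acceptChildren`, and the `/link -> /data` symlink are hypothetical stand-ins chosen for illustration, and `Predicate<String>` plays the role of a PathFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class GlobFilterSketch {
    /** Hypothetical stand-in for a FileStatus: it only carries the resolved path. */
    static final class Status {
        final String resolvedPath;

        Status(String resolvedPath) {
            this.resolvedPath = resolvedPath;
        }

        /** Final path component, analogous to status.getPath().getName(). */
        String getName() {
            return resolvedPath.substring(resolvedPath.lastIndexOf('/') + 1);
        }
    }

    /**
     * Tests each child against the filter using the built-up path
     * (prefix + "/" + name), but returns the resolved paths of the matches.
     */
    static List<String> acceptChildren(String prefix, List<Status> children,
                                       Predicate<String> filter) {
        List<String> accepted = new ArrayList<>();
        for (Status status : children) {
            String builtUp = prefix + "/" + status.getName();
            if (filter.test(builtUp)) {
                accepted.add(status.resolvedPath);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Assumption for this example: /link is a symlink to /data.
        List<Status> children = Arrays.asList(
                new Status("/data/a.txt"),
                new Status("/data/b.log"));
        // The filter only knows about the path the user typed (/link/...).
        Predicate<String> filter = p -> p.startsWith("/link/") && p.endsWith(".txt");
        System.out.println(acceptChildren("/link", children, filter)); // prints [/data/a.txt]
    }
}
```

One limitation of the sketch: it takes the child name from the resolved path, so it does not cover a final component that is itself a symlink whose target has a different name.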
> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
> ----------------------------------------------------------------------------------
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch
> During the process of adding symlink support to FileSystem, we realized that many existing
> HDFS clients would be broken by listStatus and globStatus returning symlinks.  One example
> is applications that assume that !FileStatus#isFile implies that the inode is a directory.
> As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
> resolved paths.
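
The client assumption described in the issue can be illustrated with a small Java sketch. It uses a hypothetical `Status` stand-in rather than the real FileStatus class, with a three-way `Kind` enum assumed for illustration; the point is only that "not a file" stops meaning "directory" once a listing can contain unresolved symlinks.

```java
public class IsFileAssumptionSketch {
    /** Hypothetical inode kinds for this example. */
    enum Kind { FILE, DIRECTORY, SYMLINK }

    /** Hypothetical stand-in for a FileStatus. */
    static final class Status {
        final Kind kind;

        Status(Kind kind) {
            this.kind = kind;
        }

        boolean isFile() { return kind == Kind.FILE; }
        boolean isDirectory() { return kind == Kind.DIRECTORY; }
        boolean isSymlink() { return kind == Kind.SYMLINK; }
    }

    /** The legacy client assumption: anything that is not a file is a directory. */
    static boolean legacyTreatsAsDirectory(Status s) {
        return !s.isFile();
    }

    /** An explicit check that stays correct when symlinks are present. */
    static boolean explicitlyDirectory(Status s) {
        return s.isDirectory();
    }

    public static void main(String[] args) {
        Status link = new Status(Kind.SYMLINK);
        // The legacy check misclassifies the unresolved symlink as a directory.
        System.out.println(legacyTreatsAsDirectory(link)); // prints true
        System.out.println(explicitlyDirectory(link));     // prints false
    }
}
```

If listings return resolved entries by default, as the issue proposes, the legacy check never sees the `SYMLINK` case and keeps working unchanged.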

