hadoop-common-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
Date Fri, 15 May 2015 21:59:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546242#comment-14546242 ]

Jason Lowe commented on HADOOP-9984:

bq. In the case of globStatus, things are even worse if you choose to resolve symlinks, since
then you can glob for '*foo' and get back 'bar'. A lot of software breaks if globs return
back file names that the glob doesn't match.

As I understand it, globStatus is simply listStatus with filtering applied to the results.
If that's the case then globStatus should do whatever listStatus does with respect to symlinks,
and that would be to resolve the symlink _except_ for the path in the resulting FileStatus.
This goes back to the readdir() + stat() analogy -- everything in the resulting FileStatus
needs to be about where the symlink points _except_ the path. The path would still be the
path to the link, since that's what readdir() would see as well. Every other field in FileStatus
has to do with what stat() would return, so those fields should be reflective of what the
symlink references. So globStatus should not lead to surprises where "foo*" returns "bar"
even in the presence of symlinks.
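
To make the readdir() + stat() analogy concrete, here is a minimal sketch using java.nio.file on a local filesystem rather than the Hadoop FileSystem API. The class SymlinkListingSketch, its Entry type, and the method globResolved are hypothetical names made up for illustration; Entry only stands in for FileStatus and is not the real class, and this is not what any of the attached patches do.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

final class SymlinkListingSketch {

    /** Hypothetical stand-in for FileStatus (illustration only). */
    static final class Entry {
        final Path path;           // the entry as listed, i.e. the link itself (readdir() view)
        final boolean isDirectory; // taken from the link target (stat() view)
        final long length;         // taken from the link target (stat() view)

        Entry(Path path, boolean isDirectory, long length) {
            this.path = path;
            this.isDirectory = isDirectory;
            this.length = length;
        }
    }

    /**
     * Lists entries of dir whose file names match glob. The glob is matched
     * against the names in the directory, so a link named "foo1" pointing at
     * "bar" matches "foo*" and is reported as "foo1".
     */
    static List<Entry> globResolved(Path dir, String glob) throws IOException {
        List<Entry> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                // No LinkOption.NOFOLLOW_LINKS is passed, so symlinks are followed
                // and the attributes describe the target. A dangling link would
                // throw here; this sketch does not handle that case.
                BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
                result.add(new Entry(p, attrs.isDirectory(), attrs.size()));
            }
        }
        return result;
    }
}
```

With a directory holding a regular file "bar" and a symlink "foo1" pointing at it, globResolved(dir, "foo*") yields a single entry whose path ends in "foo1" while its length and type come from "bar", so the glob never hands back a name it does not match.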

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
> ----------------------------------------------------------------------------------
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>              Labels: BB2015-05-TBR
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch,
>                      HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch,
>                      HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch,
>                      HADOOP-9984.014.patch, HADOOP-9984.015.patch
>
> During the process of adding symlink support to FileSystem, we realized that many existing
> HDFS clients would be broken by listStatus and globStatus returning symlinks. One example
> is applications that assume that !FileStatus#isFile implies that the inode is a directory.
> As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
> resolved paths.

This message was sent by Atlassian JIRA