hadoop-common-issues mailing list archives

From "Binglin Chang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
Date Fri, 20 Sep 2013 02:02:53 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772566#comment-13772566 ]

Binglin Chang commented on HADOOP-9972:

We are discussing two issues here. The first is the new API:

bq. The discussion about whether HDFS should replace listStatus with something more like POSIX
readdir seems like a tangent.
I think there is some confusion here: I didn't propose using POSIX readdir. The API name readdir
probably caused the confusion, so I changed it to listLinkStatus. Its semantics
are the same as the current HDFS listStatus, which doesn't resolve links.

bq. To prevent this scenario, we want to change FileStatus#listStatus and FileStatus#globStatus
to resolve all symlinks
I am fully aware of this, and my proposal does not break it.

Frankly, I don't see any conflict between the two proposals. In order to implement listStatus(Path,
PathOption), a listLinkStatus (or something with the same semantics) primitive/core API is
required, and it is mostly there already (in HDFS; other file systems don't support symlinks, except LocalFS).
Since there is no conflict from my side, I think you can just submit the patch or give the
implementation details of listStatus(Path, PathOption) first.
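
To make the layering concrete, here is a minimal sketch of a resolving listing built on top of a non-resolving primitive. It uses a toy in-memory namespace rather than Hadoop: the class LinkListingSketch, its Status and Kind types, and the rule that a resolved entry keeps the link's own path are all assumptions made for illustration, not the actual or proposed Hadoop API. Dangling links are dropped, as the issue description suggests for the default behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory namespace. Every name here is hypothetical, not a Hadoop API. */
final class LinkListingSketch {
    enum Kind { FILE, DIRECTORY, SYMLINK }

    /** Minimal stand-in for a file status entry. */
    static final class Status {
        final String path;
        final Kind kind;
        final String target; // link target, null unless kind == SYMLINK

        Status(String path, Kind kind, String target) {
            this.path = path;
            this.kind = kind;
            this.target = target;
        }
    }

    // Flat map from absolute path to entry; insertion order is preserved.
    private final Map<String, Status> namespace = new LinkedHashMap<>();

    void add(String path, Kind kind, String target) {
        namespace.put(path, new Status(path, kind, target));
    }

    /** Primitive: lists the children of dir as stored, without resolving links. */
    List<Status> listLinkStatus(String dir) {
        List<Status> out = new ArrayList<>();
        for (Status s : namespace.values()) {
            int slash = s.path.lastIndexOf('/');
            if (slash >= 0 && s.path.substring(0, slash).equals(dir)) {
                out.add(s);
            }
        }
        return out;
    }

    /** Derived: resolves each link via the primitive's output and drops dangling ones. */
    List<Status> listStatus(String dir) {
        List<Status> out = new ArrayList<>();
        for (Status s : listLinkStatus(dir)) {
            Status resolved = resolve(s);
            if (resolved != null) {
                // Keep the link's own path but report the target's kind (a choice made for this sketch).
                out.add(new Status(s.path, resolved.kind, null));
            }
        }
        return out;
    }

    // Follows link chains; a missing target or an over-long chain counts as dangling.
    private Status resolve(Status s) {
        for (int hops = 0; s != null && s.kind == Kind.SYMLINK; hops++) {
            if (hops >= 32) {
                return null;
            }
            s = namespace.get(s.target);
        }
        return s;
    }

    public static void main(String[] args) {
        LinkListingSketch fs = new LinkListingSketch();
        fs.add("/data/a", Kind.FILE, null);
        fs.add("/data/link", Kind.SYMLINK, "/other/real");
        fs.add("/data/dead", Kind.SYMLINK, "/missing");
        fs.add("/other/real", Kind.DIRECTORY, null);
        for (Status s : fs.listLinkStatus("/data")) {
            System.out.println("raw      " + s.path + " " + s.kind);
        }
        for (Status s : fs.listStatus("/data")) {
            System.out.println("resolved " + s.path + " " + s.kind);
        }
    }
}
```

The point of the sketch is only that the resolving variant needs nothing beyond the non-resolving primitive plus a lookup of link targets.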

The other issue is that globbing doesn't follow Linux practice:
It is probably a tangent; it was brought up only because of the example about the usage of PathErrorHandler.
My point is that Linux shell globbing ignores all errors, so the example can be solved by following
Linux practice. If we decide not to follow Linux practice and solve it another way, that is
OK, although I prefer Linux practice.
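
As an illustration of the "ignore errors while globbing" behavior, the following sketch expands a pattern one path component at a time against the local file system with java.nio.file and silently skips anything that cannot be listed. It is not Hadoop's globStatus and does not use PathErrorHandler; the class LenientGlobSketch and its methods are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of shell-style lenient globbing on the local file system. */
final class LenientGlobSketch {

    /** Expands one glob per path component under root, skipping unlistable entries. */
    static List<Path> glob(Path root, String... components) {
        List<Path> current = new ArrayList<>();
        current.add(root);
        for (String component : components) {
            List<Path> next = new ArrayList<>();
            for (Path dir : current) {
                try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, component)) {
                    for (Path p : stream) {
                        next.add(p);
                    }
                } catch (IOException | DirectoryIteratorException e) {
                    // Not a directory, unreadable, vanished, ...: skip it and keep going.
                }
            }
            current = next;
        }
        Collections.sort(current);
        return current;
    }

    /** Builds a small temporary tree and returns the matches relative to its root. */
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("glob-sketch");
            Files.createDirectories(root.resolve("a"));
            Files.createDirectories(root.resolve("b"));
            Files.createFile(root.resolve("a").resolve("x.txt"));
            Files.createFile(root.resolve("c")); // plain file: listing it as a directory fails
            List<String> names = new ArrayList<>();
            for (Path p : glob(root, "*", "*.txt")) {
                names.add(root.relativize(p).toString().replace('\\', '/'));
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Here the plain file c matches the first "*" but cannot be listed as a directory; the resulting exception is swallowed and the expansion continues with the remaining candidates.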

> new APIs for listStatus and globStatus to deal with symlinks
> ------------------------------------------------------------
>                 Key: HADOOP-9972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9972
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.1.1-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
> Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with
> symlinks.  The issue is that code has been written which is incompatible with the existence
> of things which are not files or directories.  For example,
> there is a lot of code out there that looks at FileStatus#isFile, and
> if it returns false, assumes that what it is looking at is a
> directory.  In the case of a symlink, this assumption is incorrect.
>
> It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}}
> be fully resolving symlinks, and ignoring dangling ones.  This will prevent incompatibility
> with existing MR jobs and other HDFS users.  We should also add new versions of listStatus
> and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks.
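
The "not a file, therefore a directory" assumption described above can be reproduced on a local file system. This sketch uses java.nio.file as a stand-in for Hadoop's FileStatus, with links deliberately not followed; the class SymlinkKindSketch and its methods are hypothetical, and it assumes a platform where symbolic links can be created.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

/** Hypothetical sketch: classifying an unresolved symlink, naively and correctly. */
final class SymlinkKindSketch {

    /** The risky pattern: anything that is not a file is assumed to be a directory. */
    static String naiveKind(Path p) {
        return Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS) ? "file" : "directory";
    }

    /** Symlink-aware variant: checks for a link before deciding file vs. directory. */
    static String awareKind(Path p) {
        if (Files.isSymbolicLink(p)) {
            return "symlink";
        }
        if (Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS)) {
            return "file";
        }
        if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
            return "directory";
        }
        return "other";
    }

    /** Creates a file and a link to it, then classifies the link both ways. */
    static String[] demo() {
        try {
            Path dir = Files.createTempDirectory("symlink-sketch");
            Path target = Files.createFile(dir.resolve("target.txt"));
            Path link = Files.createSymbolicLink(dir.resolve("link"), target);
            return new String[] { naiveKind(link), awareKind(link) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] kinds = demo();
        System.out.println("naive: " + kinds[0] + ", aware: " + kinds[1]);
    }
}
```

The naive check mislabels the link as a directory even though it points at a regular file, which is the kind of breakage that resolving links by default is meant to avoid.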

