hadoop-common-issues mailing list archives

From "Binglin Chang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
Date Fri, 20 Sep 2013 23:29:52 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773584#comment-13773584 ]

Binglin Chang commented on HADOOP-9972:

bq. Hmm. We could have a convenience method called listLinkStatus which just called into listStatus
with the correct PathOptions. I sort of lean towards fewer APIs rather than more, but maybe
it makes sense.
I mean that listStatus(Path, PathOption) should call into listLinkStatus (which is HDFS::listStatus,
a primitive RPC call), not the other way around. How could we implement listStatus(Path,
PathOption) without the listLinkStatus(Path) primitive?
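
To make the layering concrete, here is a minimal sketch in Java. It assumes a toy in-memory namespace: `ListingSketch`, `Entry`, `Kind`, the `PathOption` values, and the 32-hop limit are all hypothetical and are not Hadoop classes or constants. It only shows the shape of the idea, with listLinkStatus as the primitive and listStatus(Path, PathOption) built on top of it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; none of these types are Hadoop classes.
final class ListingSketch {
    enum Kind { FILE, DIRECTORY, SYMLINK }
    enum PathOption { RESOLVE_SYMLINKS, KEEP_SYMLINKS }

    static final class Entry {
        final String path;
        final Kind kind;
        final String target; // non-null only for symlinks

        Entry(String path, Kind kind, String target) {
            this.path = path;
            this.kind = kind;
            this.target = target;
        }
    }

    private final Map<String, Entry> byPath = new HashMap<>();
    private final Map<String, List<Entry>> children = new HashMap<>();

    void add(String dir, Entry e) {
        byPath.put(e.path, e);
        children.computeIfAbsent(dir, k -> new ArrayList<>()).add(e);
    }

    // The primitive: one raw listing, symlinks returned as symlinks.
    List<Entry> listLinkStatus(String dir) {
        List<Entry> raw = children.get(dir);
        if (raw == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(raw);
    }

    // Built on the primitive: optionally resolve links and skip dangling ones.
    List<Entry> listStatus(String dir, PathOption option) {
        List<Entry> raw = listLinkStatus(dir);
        if (option == PathOption.KEEP_SYMLINKS) {
            return raw;
        }
        List<Entry> resolved = new ArrayList<>();
        for (Entry e : raw) {
            Entry r = resolve(e);
            if (r != null) {
                resolved.add(r);
            }
        }
        return resolved;
    }

    // Follows link targets; returns null for dangling links or suspected loops.
    // Note: this toy version returns the target's entry, not the link's path.
    private Entry resolve(Entry e) {
        int hops = 0;
        while (e != null && e.kind == Kind.SYMLINK) {
            if (++hops > 32) { // arbitrary limit chosen for this sketch
                return null;
            }
            e = byPath.get(e.target);
        }
        return e;
    }

    // Small fixture: one file, one good link, one dangling link.
    static ListingSketch demo() {
        ListingSketch fs = new ListingSketch();
        fs.add("/d", new Entry("/d/f", Kind.FILE, null));
        fs.add("/d", new Entry("/d/good", Kind.SYMLINK, "/d/f"));
        fs.add("/d", new Entry("/d/dangling", Kind.SYMLINK, "/d/missing"));
        return fs;
    }

    public static void main(String[] args) {
        ListingSketch fs = demo();
        System.out.println(fs.listLinkStatus("/d").size());
        System.out.println(fs.listStatus("/d", PathOption.RESOLVE_SYMLINKS).size());
    }
}
```

In this sketch the resolving variant cannot be written without the raw listing, which is the point of the question above.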

bq. Shell globbing doesn't ignore all errors
By globbing I mean only shell wildcard substitution, and that does ignore all errors: the glob
just replaces a string containing wildcards with the matching names.
drwxr-xr-x  2 decster  staff  68 Sep 19 17:09 aa
drwxr-xr-x  2 decster  staff  68 Sep 19 17:12 bb
decster:~/projects/test> echo *
aa bb
decster:~/projects/test> echo */cc

In your example:

cmccabe@keter:~/mydir> ls b/c
ls: cannot access b/c: Permission denied
# this error is thrown by ls, not globbing

cmccabe@keter:~/mydir> ls *
ls: cannot open directory b: Permission denied
# "ls *" first becomes "ls a c"
# then ls throws the error when it processes c
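
The two-step view argued above (substitution first, then the command runs and reports its own errors) can be sketched in Java. This is an illustration under assumptions, not how any shell is implemented: `GlobSketch`, `expand`, and the simulated `ls` are hypothetical, and the directory names are made up. The only library call used is the JDK's `FileSystems.getDefault().getPathMatcher("glob:...")`.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of "expand first, run the command second".
final class GlobSketch {
    // Step 1: pure substitution over names. No directory is opened here,
    // so no permission error can occur in this step.
    static List<String> expand(String pattern, List<String> names) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                out.add(name);
            }
        }
        return out;
    }

    // Step 2: a simulated command. Errors are produced here, per argument.
    static List<String> ls(List<String> args, Set<String> unreadable) {
        List<String> messages = new ArrayList<>();
        for (String arg : args) {
            if (unreadable.contains(arg)) {
                messages.add("ls: cannot open directory " + arg
                    + ": Permission denied");
            } else {
                messages.add(arg + ":");
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c"); // made-up names
        Set<String> unreadable = new HashSet<>(Arrays.asList("b"));
        List<String> expanded = expand("*", names);
        System.out.println(expanded);
        System.out.println(ls(expanded, unreadable));
    }
}
```

Here `expand` returns the unreadable name like any other, and only the simulated command reports the failure.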
> new APIs for listStatus and globStatus to deal with symlinks
> ------------------------------------------------------------
>                 Key: HADOOP-9972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9972
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.1.1-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
> Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with
> symlinks.  The issue is that code has been written which is incompatible with the existence
> of things which are not files or directories.  For example,
> there is a lot of code out there that looks at FileStatus#isFile, and
> if it returns false, assumes that what it is looking at is a
> directory.  In the case of a symlink, this assumption is incorrect.
> It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}}
> be fully resolving symlinks, and ignoring dangling ones.  This will prevent incompatibility
> with existing MR jobs and other HDFS users.  We should also add new versions of listStatus
> and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks.
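
The isFile assumption described in the issue can be shown with a small Java sketch. It uses a hypothetical `Kind` enum in place of a real FileStatus, so it runs without Hadoop; `KindCheckSketch` and both method names are invented for illustration.

```java
// Hypothetical stand-in for the three kinds of entry a listing may return.
final class KindCheckSketch {
    enum Kind { FILE, DIRECTORY, SYMLINK }

    // Legacy pattern: anything that is not a file is assumed to be a directory.
    static String legacyClassify(Kind kind) {
        return kind == Kind.FILE ? "file" : "directory";
    }

    // Symlink-aware pattern: each kind is handled explicitly.
    static String symlinkAwareClassify(Kind kind) {
        switch (kind) {
            case FILE:
                return "file";
            case DIRECTORY:
                return "directory";
            default:
                return "symlink";
        }
    }

    public static void main(String[] args) {
        System.out.println(legacyClassify(Kind.SYMLINK));
        System.out.println(symlinkAwareClassify(Kind.SYMLINK));
    }
}
```

The legacy check mislabels a symlink as a directory, which is why the issue proposes that the default listing resolve symlinks before callers see them.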

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
