hadoop-common-issues mailing list archives

From "Binglin Chang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
Date Tue, 24 Sep 2013 22:54:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776870#comment-13776870 ]

Binglin Chang commented on HADOOP-9972:
---------------------------------------

bq. Also, if we want to add more options in the future, we don't want to create listLinkStatusWithFoo
and listLinkStatusWithFooAndBar. Just listStatus(Path, PathOption).
That is exactly why I propose implementing listStatus(Path, PathOption) in FileSystem on top of
the more primitive listLinkStatus(Path): then if we add an option, we don't end up modifying the
code of every FileSystem subclass.
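A minimal Java sketch of the layering proposed above, assuming toy stand-ins rather than Hadoop's real classes: `Kind`, `PathOption` (with hypothetical `RESOLVE_LINKS` and `SKIP_DANGLING` flags), `Status`, `SketchFileSystem`, `linkStatus`, and `InMemoryFileSystem` are all illustrative names, and paths are plain strings. Each concrete file system supplies only two non-following primitives; option handling lives once in the base class, so adding a flag touches no subclass.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-ins for Hadoop's FileStatus and a PathOption flag set.
enum Kind { FILE, DIRECTORY, SYMLINK }

enum PathOption { RESOLVE_LINKS, SKIP_DANGLING }

record Status(String path, Kind kind, String target) {}

abstract class SketchFileSystem {
    private static final int MAX_HOPS = 32; // guard against symlink cycles

    // Primitive 1: list a directory's entries without following symlinks.
    abstract List<Status> listLinkStatus(String dir) throws IOException;

    // Primitive 2 (hypothetical): status of one path without following symlinks; null if absent.
    abstract Status linkStatus(String path) throws IOException;

    // Options are interpreted only here, so subclasses never change when a flag is added.
    final List<Status> listStatus(String dir, EnumSet<PathOption> opts) throws IOException {
        List<Status> out = new ArrayList<>();
        for (Status s : listLinkStatus(dir)) {
            if (s.kind() != Kind.SYMLINK || !opts.contains(PathOption.RESOLVE_LINKS)) {
                out.add(s); // ordinary entry, or caller wants links reported as links
                continue;
            }
            Status target = resolve(s);
            if (target != null) {
                // Keep the link's own path but report the type of what it points at.
                out.add(new Status(s.path(), target.kind(), null));
            } else if (!opts.contains(PathOption.SKIP_DANGLING)) {
                out.add(s); // dangling link: surface it as a symlink
            }
        }
        return out;
    }

    // Follow a chain of links; returns null if the chain ends at a missing path.
    private Status resolve(Status link) throws IOException {
        Status cur = link;
        for (int hops = 0; cur != null && cur.kind() == Kind.SYMLINK; hops++) {
            if (hops == MAX_HOPS) {
                throw new IOException("too many levels of symbolic links: " + link.path());
            }
            cur = linkStatus(cur.target());
        }
        return cur;
    }
}

// Toy in-memory file system: only the two primitives are implemented.
class InMemoryFileSystem extends SketchFileSystem {
    private final Map<String, Status> entries = new TreeMap<>();

    InMemoryFileSystem add(String path, Kind kind, String target) {
        entries.put(path, new Status(path, kind, target));
        return this;
    }

    @Override
    List<Status> listLinkStatus(String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<Status> out = new ArrayList<>();
        for (Status s : entries.values()) {
            String p = s.path();
            // Direct children only: has the prefix and no further '/' after it.
            if (p.startsWith(prefix) && p.length() > prefix.length()
                    && p.indexOf('/', prefix.length()) < 0) {
                out.add(s);
            }
        }
        return out;
    }

    @Override
    Status linkStatus(String path) {
        return entries.get(path);
    }
}
```

Here a resolved link keeps its own path but reports the target's type, roughly as stat(2) does for a path the caller named; whether Hadoop should report the link path or the target path is a design choice this sketch does not settle.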

bq. we don't want to create listLinkStatusWithFoo and listLinkStatusWithFooAndBar. Just listStatus(Path,
PathOption).
I am not against the listStatus(Path, PathOption) API, only its implementation details; this issue
can be solved by listStatus(Path, PathOption).

bq. Hadoop and HDFS exist in an environment where there are unreliable networks.
I don't think "ignore all errors" should include network issues. They are like disk failures or
temporarily unreadable files in Linux, which globbing can't ignore either; in that case the error
should just be passed all the way up to the user. Most users don't want to handle such errors in
an ErrorHandler either.
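The Linux-style policy argued for here (skip a link whose target is missing, but let every other failure reach the caller) can be sketched with `java.nio.file` on a local file system. This is a stand-in, not HDFS code: `LenientListing` and `listResolved` are hypothetical names. The local provider happens to report a missing link target as `NoSuchFileException`, which is exactly the distinction that may be unavailable if dead-link errors and network errors are reported the same way.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

final class LenientListing {
    private LenientListing() {}

    // Lists dir with symlinks followed. Only a dangling link is skipped;
    // other I/O failures (permissions, link loops, device errors) reach the caller.
    static List<Path> listResolved(Path dir) throws IOException {
        List<Path> out = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                try {
                    // Follows links; a missing target surfaces as NoSuchFileException.
                    Files.readAttributes(entry, BasicFileAttributes.class);
                    out.add(entry);
                } catch (NoSuchFileException e) {
                    if (!Files.isSymbolicLink(entry)) {
                        throw e; // the entry itself vanished: not a dangling link
                    }
                    // Dangling symlink: the one error deliberately ignored.
                }
            }
        }
        out.sort(null); // natural Path order, for deterministic output
        return out;
    }
}
```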

bq. So if globStatus swallows unresolved symlink errors.
Are you saying a network issue can cause an unresolved symlink error? If dead-link errors are
already mixed up with network errors, then together with the compatibility concerns, I agree
with you: we can't follow the Linux practice.


                
> new APIs for listStatus and globStatus to deal with symlinks
> ------------------------------------------------------------
>
>                 Key: HADOOP-9972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9972
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.1.1-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with
> symlinks.  The issue is that code has been written which is incompatible with the existence
> of things which are not files or directories.  For example,
> there is a lot of code out there that looks at FileStatus#isFile, and
> if it returns false, assumes that what it is looking at is a
> directory.  In the case of a symlink, this assumption is incorrect.
> It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}}
> fully resolve symlinks and ignore dangling ones.  This will prevent incompatibility
> with existing MR jobs and other HDFS users.  We should also add new versions of listStatus
> and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks.
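The isFile pitfall described above can be reproduced with `java.nio.file`, whose `BasicFileAttributes` exposes `isRegularFile`, `isDirectory`, and `isSymbolicLink` much as Hadoop's `FileStatus` exposes `isFile`, `isDirectory`, and `isSymlink`. `EntryKind` and its methods are hypothetical helpers for illustration only. Reading attributes with `LinkOption.NOFOLLOW_LINKS` shows a link as itself, so the naive check calls it a directory; reading them the default way follows the link, which is why resolving symlinks by default keeps such older code working.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

final class EntryKind {
    private EntryKind() {}

    // The fragile pattern: "not a regular file, so it must be a directory".
    static String naive(BasicFileAttributes a) {
        return a.isRegularFile() ? "file" : "directory";
    }

    // Symlink-aware classification: test each kind explicitly.
    static String aware(BasicFileAttributes a) {
        if (a.isSymbolicLink()) return "symlink";
        if (a.isRegularFile()) return "file";
        if (a.isDirectory()) return "directory";
        return "other";
    }

    // Attributes of the entry itself, not of whatever a link points to.
    static BasicFileAttributes linkAttrs(Path p) throws IOException {
        return Files.readAttributes(p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
    }
}
```

Note that the default, link-following read throws `NoSuchFileException` on a dangling link, so a resolving listing still has to decide what to do with those entries.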

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
