hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
Date Fri, 20 Sep 2013 21:46:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773471#comment-13773471 ]

Colin Patrick McCabe commented on HADOOP-9972:

bq. Just to be clear, what happens if the error handler does not rethrow the exception?

If the error handler doesn't rethrow the exception, the listStatus / globStatus operation
continues normally and returns the remaining results.  (We can't return the result that had
the error.)  Unresolved symlinks are one type of error.  Whether to handle {{UnresolvedLinkException}}
differently than other exceptions is up to the {{PathErrorHandler}} you provide.
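
A minimal sketch of the semantics described above, using stand-in types rather than the real Hadoop classes. The {{handleError}} method name, the {{DanglingLinkException}} stand-in for {{UnresolvedLinkException}}, and the {{listWithHandler}} helper are all hypothetical; the proposal does not fix a signature.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the handler either returns (the entry is skipped)
// or rethrows (the whole listing is aborted).
interface PathErrorHandler {
    void handleError(String path, IOException e) throws IOException;
}

// Stand-in for Hadoop's UnresolvedLinkException so the sketch has no dependencies.
class DanglingLinkException extends IOException {
    DanglingLinkException(String msg) { super(msg); }
}

class ListingSketch {
    // Pretend resolution step: names ending in "@dangling" fail to resolve.
    static String resolve(String path) throws IOException {
        if (path.endsWith("@dangling")) {
            throw new DanglingLinkException("cannot resolve " + path);
        }
        return path;
    }

    // Lists entries, delegating every per-entry error to the handler.
    static List<String> listWithHandler(List<String> entries, PathErrorHandler handler)
            throws IOException {
        List<String> results = new ArrayList<>();
        for (String entry : entries) {
            try {
                results.add(resolve(entry));
            } catch (IOException e) {
                // If this call returns, the failed entry is omitted and the loop
                // continues; if it throws, the exception propagates to the caller.
                handler.handleError(entry, e);
            }
        }
        return results;
    }
}
```

With a handler that simply returns, a listing of {{a}}, {{b@dangling}}, {{c}} yields {{a}} and {{c}}; with a handler that rethrows, the caller sees the exception instead.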

bq. I'm not sure if the difference between "log exception and continue" vs. "ignore it completely"
is a different return code from the error handler method or just whether the handler logs
or not.

I was proposing that the logging happen inside the {{PathErrorHandler}}.  From the point of
view of FileSystem / FileContext, all we care about is whether the {{PathErrorHandler}} rethrows
the exception or not.  (We can provide a class implementing {{PathErrorHandler}} that logs to
FileSystem#LOG if that is a common use case.)
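
As a rough illustration of "log and continue" versus "ignore completely", here is a sketch with two handlers that differ only in whether they log. The interface shape and the class names are hypothetical, and {{java.util.logging}} stands in for FileSystem#LOG to keep the example dependency-free.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical handler shape: return to continue, throw to abort.
interface PathErrorHandler {
    void handleError(String path, IOException e) throws IOException;
}

// Logs every error and returns, so the listing continues.
class LoggingPathErrorHandler implements PathErrorHandler {
    private final Logger log;
    private int errorCount = 0;

    LoggingPathErrorHandler(Logger log) {
        this.log = log;
    }

    @Override
    public void handleError(String path, IOException e) {
        errorCount++;
        log.log(Level.WARNING, "Skipping " + path, e);
    }

    int getErrorCount() {
        return errorCount;
    }
}

// Swallows every error silently. From the caller's side this behaves the same
// as the logging handler, because neither one rethrows.
class IgnoringPathErrorHandler implements PathErrorHandler {
    @Override
    public void handleError(String path, IOException e) {
        // intentionally empty
    }
}
```

Under this design no extra return code is needed: the listing code only observes whether the handler threw.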

bq. I suppose one could derive a new interface from PathFilter that becomes PathOptions and
listStatus(Path, PathFilter) could check internally if it's actually got a PathOptions instead
of a PathFilter and behave differently. However, I think an explicit, separate API would be
preferable, simply for clarity of what the API expects from callers.

Yeah, I was proposing adding a new type, {{PathOptions}}, which could contain an instance
of {{PathFilter}}.  We could add new methods to {{PathFilter}}, but since it's a public/stable
interface rather than an abstract class, that would be an incompatible change.
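
One possible shape for such a type, sketched with a stand-in {{PathFilter}} that takes a String instead of a Hadoop Path. The setter names and the {{resolveSymlinks}} flag are assumptions for illustration, not part of any agreed API.

```java
import java.util.Objects;

// Stand-in for Hadoop's PathFilter. It stays untouched, so existing
// implementations keep compiling.
interface PathFilter {
    boolean accept(String path);
}

// Hypothetical options holder that contains a PathFilter instead of extending it.
final class PathOptions {
    private PathFilter filter = path -> true;  // accept everything by default
    private boolean resolveSymlinks = true;    // assumed default: resolve links

    PathOptions setFilter(PathFilter filter) {
        this.filter = Objects.requireNonNull(filter);
        return this;
    }

    PathOptions setResolveSymlinks(boolean resolveSymlinks) {
        this.resolveSymlinks = resolveSymlinks;
        return this;
    }

    PathFilter getFilter() {
        return filter;
    }

    boolean getResolveSymlinks() {
        return resolveSymlinks;
    }
}
```

Because {{PathOptions}} is a class, more options can be added later without breaking callers, whereas a new abstract method on the {{PathFilter}} interface would break every existing implementer.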
> new APIs for listStatus and globStatus to deal with symlinks
> ------------------------------------------------------------
>                 Key: HADOOP-9972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9972
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.1.1-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
> Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with
> symlinks.  The issue is that code has been written which is incompatible with the existence
> of things which are not files or directories.  For example,
> there is a lot of code out there that looks at FileStatus#isFile, and
> if it returns false, assumes that what it is looking at is a
> directory.  In the case of a symlink, this assumption is incorrect.
> It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}}
> be to fully resolve symlinks and to ignore dangling ones.  This will prevent incompatibility
> with existing MR jobs and other HDFS users.  We should also add new versions of listStatus
> and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks.
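
The faulty assumption in the issue description can be illustrated with a small sketch. {{StatusSketch}} is a hypothetical stand-in for FileStatus that models only the isFile, isDirectory, and isSymlink checks, and both classifier methods are illustrative.

```java
// Hypothetical stand-in for FileStatus: an entry is exactly one of three kinds.
final class StatusSketch {
    enum Kind { FILE, DIRECTORY, SYMLINK }

    private final Kind kind;

    StatusSketch(Kind kind) {
        this.kind = kind;
    }

    boolean isFile() { return kind == Kind.FILE; }
    boolean isDirectory() { return kind == Kind.DIRECTORY; }
    boolean isSymlink() { return kind == Kind.SYMLINK; }
}

final class Classifier {
    // Pattern found in existing code: anything that is not a file is
    // assumed to be a directory, which misclassifies symlinks.
    static String fragile(StatusSketch s) {
        return s.isFile() ? "file" : "directory";
    }

    // Symlink-aware variant: checks each kind explicitly.
    static String symlinkAware(StatusSketch s) {
        if (s.isSymlink()) {
            return "symlink";
        }
        return s.isFile() ? "file" : "directory";
    }
}
```

Given an unresolved symlink entry, {{fragile}} reports a directory while {{symlinkAware}} reports a symlink. Resolving links by default in listStatus and globStatus keeps such entries away from code written in the first style.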
