hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
Date Tue, 17 Sep 2013 16:25:54 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769653#comment-13769653 ]

Colin Patrick McCabe commented on HADOOP-9972:

Proposed new APIs (in FileSystem and FileContext):
FileStatus[] listStatus(Path path, PathOptions options) throws IOException;
FileStatus[] globStatus(Path path, PathOptions options) throws IOException;

The {{PathOptions}} class will contain three fields:
  private PathFilter pathFilter;
  private PathErrorHandler errorHandler;
  private Boolean resolveSymlinks;
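A minimal sketch of how such an options class might look follows. Only the three field names come from the proposal; the fluent setters, the getter names, and the defaults (accept every path, rethrow every error, resolve symlinks) are assumptions. {{PathFilter}} and {{PathErrorHandler}} are declared here as stand-ins, and {{java.nio.file.Path}} replaces Hadoop's {{org.apache.hadoop.fs.Path}} only so that the example compiles on its own.

```java
import java.io.IOException;
import java.nio.file.Path;

// Stand-in for org.apache.hadoop.fs.PathFilter (same single method, different Path type).
interface PathFilter {
    boolean accept(Path path);
}

// Hypothetical shape of the proposed error callback.
interface PathErrorHandler {
    void handleError(Path path, IOException e) throws IOException;
}

// Sketch of the proposed options holder; the defaults below are assumptions.
final class PathOptions {
    private PathFilter pathFilter = path -> true;                      // keep every path
    private PathErrorHandler errorHandler = (path, e) -> { throw e; }; // propagate errors
    private Boolean resolveSymlinks = Boolean.TRUE;                    // resolve symlinks

    PathOptions setPathFilter(PathFilter pathFilter) {
        this.pathFilter = pathFilter;
        return this;
    }

    PathOptions setErrorHandler(PathErrorHandler errorHandler) {
        this.errorHandler = errorHandler;
        return this;
    }

    PathOptions setResolveSymlinks(boolean resolveSymlinks) {
        this.resolveSymlinks = resolveSymlinks;
        return this;
    }

    PathFilter getPathFilter() { return pathFilter; }

    PathErrorHandler getErrorHandler() { return errorHandler; }

    boolean getResolveSymlinks() { return resolveSymlinks; }
}
```

Under these assumptions, a caller that wants symlinks returned as symlinks would pass {{new PathOptions().setResolveSymlinks(false)}} and keep the other defaults.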

{{PathFilter}} serves the same purpose that it currently does: filtering out paths from the results.

{{PathErrorHandler}} has a {{handleError}} function taking a {{Path}} and an {{IOException}}. This function is invoked whenever an {{IOException}} occurs. It can choose to rethrow the exception, log it and continue, or simply ignore it.
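The three choices described above could be expressed as ready-made handlers. This is a sketch under assumptions: the {{PathErrorHandler}} signature is inferred from the description, {{PathOperation}}, {{ErrorHandlers}} and {{HandlerDemo.applyAll}} are hypothetical helpers standing in for the per-path work a listing or glob would do, and {{java.nio.file.Path}} stands in for Hadoop's {{Path}}.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of the proposed callback (java.nio.file.Path stands in for Hadoop's Path).
interface PathErrorHandler {
    void handleError(Path path, IOException e) throws IOException;
}

// The three behaviors described in the proposal, as ready-made handlers.
final class ErrorHandlers {
    private ErrorHandlers() {}

    // Rethrow: processing aborts at the first failure.
    static final PathErrorHandler RETHROW = (path, e) -> { throw e; };

    // Ignore: the failing path is silently skipped.
    static final PathErrorHandler IGNORE = (path, e) -> { };

    // Log and continue: record the failure, then skip the path.
    static PathErrorHandler logAndContinue(List<String> log) {
        return (path, e) -> log.add(path + ": " + e.getMessage());
    }
}

// Hypothetical per-path operation that may fail, e.g. a stat call.
interface PathOperation {
    String apply(Path path) throws IOException;
}

final class HandlerDemo {
    private HandlerDemo() {}

    // Applies op to every path, routing each IOException through the handler.
    static List<String> applyAll(List<Path> paths, PathOperation op, PathErrorHandler handler)
            throws IOException {
        List<String> results = new ArrayList<>();
        for (Path path : paths) {
            try {
                results.add(op.apply(path));
            } catch (IOException e) {
                handler.handleError(path, e); // may rethrow, log, or swallow
            }
        }
        return results;
    }
}
```

For example, passing {{ErrorHandlers.logAndContinue(log)}} to {{applyAll}} skips each path whose operation fails and records one message per failure in {{log}}, while {{ErrorHandlers.RETHROW}} aborts on the first failure.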

{{resolveSymlinks}} determines whether we fully resolve every symlink we come across. If it is true, neither {{listStatus}} nor {{globStatus}} will ever return a FileStatus for a symlink.
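To make these semantics concrete, here is a sketch of a directory listing with and without symlink resolution, over a hypothetical in-memory namespace rather than a real {{FileSystem}}. The {{Node}}, {{Kind}} and {{ToyNamespace}} types, the 32-hop loop guard, reporting a resolved link under its own path with its target's type, and dropping dangling or looping links (borrowed from the default behavior proposed in the issue description) are all assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory namespace used only to illustrate resolveSymlinks.
enum Kind { FILE, DIRECTORY, SYMLINK }

final class Node {
    final String path;
    final Kind kind;
    final String target; // only meaningful for SYMLINK

    Node(String path, Kind kind, String target) {
        this.path = path;
        this.kind = kind;
        this.target = target;
    }

    @Override
    public String toString() { return path + "=" + kind; }
}

final class ToyNamespace {
    private static final int MAX_HOPS = 32; // arbitrary loop guard for this sketch

    private final Map<String, Node> nodes = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();

    void add(String parent, Node node) {
        nodes.put(node.path, node);
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(node.path);
    }

    // Follows a chain of symlinks; returns null if the chain dangles or loops.
    private Node resolve(Node node) {
        Node current = node;
        for (int hops = 0; current != null && current.kind == Kind.SYMLINK; hops++) {
            if (hops == MAX_HOPS) return null;
            current = nodes.get(current.target);
        }
        return current;
    }

    List<Node> listStatus(String dir, boolean resolveSymlinks) {
        List<Node> result = new ArrayList<>();
        for (String child : children.getOrDefault(dir, Collections.emptyList())) {
            Node node = nodes.get(child);
            if (!resolveSymlinks || node.kind != Kind.SYMLINK) {
                result.add(node); // symlink-aware callers see links as links
                continue;
            }
            Node target = resolve(node);
            if (target != null) {
                // Keep the link's own path but report the target's type.
                result.add(new Node(node.path, target.kind, null));
            } // dangling or looping links are dropped (assumption)
        }
        return result;
    }
}
```

With {{resolveSymlinks}} true, a symlink to a directory comes back typed as a directory and a dangling link disappears from the result; with it false, every link comes back as {{Kind.SYMLINK}}.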

We can add more fields to {{PathOptions}} later if it becomes necessary.
> new APIs for listStatus and globStatus to deal with symlinks
> ------------------------------------------------------------
>                 Key: HADOOP-9972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9972
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.1.1-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
> Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with
> symlinks. The issue is that code has been written which assumes that everything is either
> a file or a directory. For example, there is a lot of code out there that looks at
> FileStatus#isFile and, if it returns false, assumes that what it is looking at is a
> directory. In the case of a symlink, this assumption is incorrect.
> It seems reasonable to make {{FileSystem#listStatus}} and {{FileSystem#globStatus}} fully
> resolve symlinks by default and ignore dangling ones. This will prevent incompatibility
> with existing MR jobs and other HDFS users. We should also add new versions of listStatus
> and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks.
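The incompatibility described in the issue can be illustrated with a minimal stand-in for {{FileStatus}}, of which only {{isFile()}}, {{isDirectory()}} and {{isSymlink()}} are modeled. The {{Status}} and {{Classify}} classes are hypothetical; the point is that the legacy two-way check misreports a symlink as a directory, while the explicit check does not.

```java
// Minimal stand-in for org.apache.hadoop.fs.FileStatus; only the three
// type predicates are modeled.
final class Status {
    enum Type { FILE, DIRECTORY, SYMLINK }

    private final Type type;

    Status(Type type) { this.type = type; }

    boolean isFile() { return type == Type.FILE; }

    boolean isDirectory() { return type == Type.DIRECTORY; }

    boolean isSymlink() { return type == Type.SYMLINK; }
}

final class Classify {
    private Classify() {}

    // The fragile pattern: anything that is not a file is treated as a directory.
    static String legacy(Status s) {
        return s.isFile() ? "file" : "directory";
    }

    // A symlink-aware version checks each case explicitly.
    static String symlinkAware(Status s) {
        if (s.isFile()) return "file";
        if (s.isDirectory()) return "directory";
        if (s.isSymlink()) return "symlink";
        throw new IllegalStateException("unknown file type");
    }
}
```

This is also why resolving symlinks by default keeps legacy callers correct: if {{listStatus}} never returns a symlink, the two-way check is never handed the case it gets wrong.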

