hadoop-common-issues mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
Date Wed, 29 Apr 2015 19:09:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519971#comment-14519971 ]

Sanjay Radia commented on HADOOP-9984:

bq. The problem with dereferencing all symlinks in listStatus is that it's disastrously inefficient

# In the proposal, listStatus2 is the new API that replaces listStatus.
# All our libraries need to be changed to use listStatus2 (see item 3 in the proposal).
# Customers who have old code that calls the old listStatus and cannot convert that code immediately
can disable symlinks, not use symlinks, or use symlinks sparingly. In practice I don't think
there will be dirs with even tens of symlinks (but symlink2 addresses the problem going forward).
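To make the efficiency concern concrete, here is a minimal sketch under simplifying assumptions. It models a hypothetical in-memory namespace (not Hadoop's FileSystem API) where a directory listing costs one round trip and dereferencing each symlink costs one more. The names `Namespace`, `Entry`, `listResolved` and `listUnresolved` are illustrative only, and `listUnresolved` assumes that the proposed listStatus2 hands links back unresolved so callers can resolve them on demand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, NOT Hadoop's FileSystem API: it only counts
// simulated round trips to show why eager symlink resolution gets expensive.
class SymlinkListingCost {

    // Hypothetical directory entry; linkTarget is null for non-symlinks.
    record Entry(String name, String linkTarget) {
        boolean isSymlink() { return linkTarget != null; }
    }

    // Hypothetical namespace that counts every simulated call.
    static final class Namespace {
        private final Map<String, List<Entry>> dirs;
        int roundTrips = 0;

        Namespace(Map<String, List<Entry>> dirs) { this.dirs = dirs; }

        List<Entry> list(String dir) {
            roundTrips++;                              // one call for the listing itself
            return dirs.getOrDefault(dir, List.of());
        }

        String resolve(Entry link) {
            roundTrips++;                              // assumed: one extra call per link
            return link.linkTarget();                  // single hop; link chains are ignored
        }
    }

    // Old listStatus semantics as described in this JIRA: dereference every link.
    static List<String> listResolved(Namespace ns, String dir) {
        List<String> paths = new ArrayList<>();
        for (Entry e : ns.list(dir)) {
            paths.add(e.isSymlink() ? ns.resolve(e) : dir + "/" + e.name());
        }
        return paths;
    }

    // Assumed listStatus2-style semantics: return links as-is; callers resolve on demand.
    static List<Entry> listUnresolved(Namespace ns, String dir) {
        return ns.list(dir);
    }

    public static void main(String[] args) {
        Namespace ns = new Namespace(Map.of("/data", List.of(
                new Entry("a", null),
                new Entry("b", "/archive/b"),
                new Entry("c", "/archive/c"))));

        listResolved(ns, "/data");
        System.out.println("resolved listing: " + ns.roundTrips + " round trips");   // 3

        ns.roundTrips = 0;
        listUnresolved(ns, "/data");
        System.out.println("unresolved listing: " + ns.roundTrips + " round trips"); // 1
    }
}
```

With one regular file and two links, the resolved listing costs three simulated round trips against one for the unresolved listing, so the cost grows with the number of symlinks per directory. Real costs would also include link chains and cross-filesystem targets, which this model deliberately ignores.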

bq.  isSymlink is broken for dangling symlinks, FileSystem#rename is broken for symlinks,
the behavior of symlinks in globStatus is controversial, distCp doesn't support it, ...
These are fixable. I think this jira itself was attempting to fix some of these when we ran
into the design flaw of the original listStatus.

bq.  cross-filesystem symlinks ...
As I pointed out, this needs to be discussed. Let me make a separate comment that summarizes the
cross-namespace issues that have been presented in the various comments in this and other jiras.

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
> ----------------------------------------------------------------------------------
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch,
> HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch,
> HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch
> During the process of adding symlink support to FileSystem, we realized that many existing
> HDFS clients would be broken by listStatus and globStatus returning symlinks.  One example
> is applications that assume that !FileStatus#isFile implies that the inode is a directory.
> As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
> resolved paths.
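The client breakage described above can be sketched in a few lines. This is a minimal illustration, assuming a simplified, hypothetical stand-in for FileStatus (Hadoop's real FileStatus does expose isFile(), isDirectory() and isSymlink(), but the classes here are not it). `legacyShouldRecurse`, `safeShouldRecurse` and `resolve` are hypothetical helpers; `resolve` models the resolve-by-default behaviour this issue proposes, with a single hop and no dangling-link handling.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for Hadoop's FileStatus; not the real class.
class NotFileImpliesDir {

    enum Kind { FILE, DIRECTORY, SYMLINK }

    record Status(String path, Kind kind) {
        boolean isFile()      { return kind == Kind.FILE; }
        boolean isDirectory() { return kind == Kind.DIRECTORY; }
        boolean isSymlink()   { return kind == Kind.SYMLINK; }
    }

    // Legacy client logic written when listings only held files and directories.
    static boolean legacyShouldRecurse(Status s) {
        return !s.isFile();   // bug once symlinks appear: a link is "not a file" either
    }

    // Symlink-aware logic asks the positive question instead.
    static boolean safeShouldRecurse(Status s) {
        return s.isDirectory();
    }

    // Resolve-by-default: the caller only ever sees the target's status.
    // Single hop only; a dangling link yields null here, which real code must handle.
    static Status resolve(Status s, Map<String, Status> linkTargets) {
        return s.isSymlink() ? linkTargets.get(s.path()) : s;
    }

    public static void main(String[] args) {
        Status link = new Status("/data/latest", Kind.SYMLINK);
        Map<String, Status> targets =
                Map.of("/data/latest", new Status("/data/part-00000", Kind.FILE));

        System.out.println(legacyShouldRecurse(link));                   // true (wrong)
        System.out.println(safeShouldRecurse(link));                     // false
        System.out.println(legacyShouldRecurse(resolve(link, targets))); // false
    }
}
```

Resolving before returning means the unchanged legacy check sees only FILE or DIRECTORY and behaves correctly, which is the compatibility argument for making resolution the default.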

This message was sent by Atlassian JIRA
