hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
Date Tue, 28 Apr 2015 18:29:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517600#comment-14517600 ]

Colin Patrick McCabe commented on HADOOP-9984:

Hi Sanjay,

The problem with dereferencing all symlinks in listStatus is that it's disastrously inefficient.
In a directory with 100 symlinks, it leads to 101 RPCs to the NameNode: 1 to do the listStatus,
and 100 to dereference the symlinks.  RPC load on the NameNode is already a concern for us.
A scheme like this is just not practical.
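
To make the arithmetic concrete, here is a minimal sketch of where the 101 RPCs come from.
It does not use the real Hadoop client; CountingNameNode and its methods are hypothetical
stand-ins that only count calls, assuming one RPC for the listing and one per symlink.

```java
import java.util.ArrayList;
import java.util.List;

public class SymlinkRpcCount {
    /** Hypothetical stand-in for a NameNode; it only counts the RPCs it receives. */
    static final class CountingNameNode {
        int rpcs = 0;
        private final int symlinkCount;

        CountingNameNode(int symlinkCount) {
            this.symlinkCount = symlinkCount;
        }

        /** One RPC returns every entry in the directory, symlinks unresolved. */
        List<String> listStatus() {
            rpcs++;
            List<String> entries = new ArrayList<>();
            for (int i = 0; i < symlinkCount; i++) {
                entries.add("link" + i);
            }
            return entries;
        }

        /** One additional RPC per symlink to look up its target (assumed cost model). */
        String resolve(String link) {
            rpcs++;
            return "/target/" + link;
        }
    }

    /** Lists a directory, dereferences every symlink, and returns the total RPC count. */
    static int rpcsForResolvedListing(int symlinkCount) {
        CountingNameNode nn = new CountingNameNode(symlinkCount);
        for (String link : nn.listStatus()) {
            nn.resolve(link);
        }
        return nn.rpcs;
    }

    public static void main(String[] args) {
        // 1 listing RPC + 100 resolution RPCs
        System.out.println(rpcsForResolvedListing(100)); // prints 101
    }
}
```

The cost grows linearly with the number of symlinks in the directory, whereas an unresolved
listing stays at a single RPC in this model.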

I understand the concerns that led to this idea.  People are unsure whether their software can
handle symlinks in the listStatus return value.  But in my opinion a better solution to this
is for people to keep symlinks disabled until they can test them with their software.

I also want to clarify that there are a lot of blocker issues in HADOOP-10019.  There are
at least 5 or 6 other JIRAs we would need to implement to get symlinks anywhere near usable.
For example, cross-filesystem symlinks are even more controversial than this JIRA (some people
want to get rid of them altogether), isSymlink is broken for dangling symlinks, FileSystem#rename
is broken for symlinks, the behavior of symlinks in globStatus is controversial, DistCp doesn't
support them, etc.  The application-level security issues are even worse (I will post a follow-up
about them).

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
> ----------------------------------------------------------------------------------
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch,
HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch,
HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch
> During the process of adding symlink support to FileSystem, we realized that many existing
HDFS clients would be broken by listStatus and globStatus returning symlinks.  One example
is applications that assume that !FileStatus#isFile implies that the inode is a directory.
 As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
resolved paths.
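
A minimal sketch of the kind of client assumption described above. The Kind enum is a
hypothetical, simplified stand-in for what a listing entry can be; in real client code the
corresponding checks would be FileStatus#isFile, FileStatus#isDirectory and FileStatus#isSymlink.

```java
public class IsFileAssumption {
    /** Hypothetical, simplified stand-in for the kinds of entry a listing may return. */
    enum Kind { FILE, DIRECTORY, SYMLINK }

    /** Fragile check: treats everything that is not a file as a directory. */
    static boolean looksLikeDirectoryFragile(Kind kind) {
        return kind != Kind.FILE;
    }

    /** Explicit check: only a directory counts as a directory. */
    static boolean isDirectoryExplicit(Kind kind) {
        return kind == Kind.DIRECTORY;
    }

    public static void main(String[] args) {
        // An unresolved symlink is misclassified by the fragile check.
        System.out.println(looksLikeDirectoryFragile(Kind.SYMLINK)); // prints true
        System.out.println(isDirectoryExplicit(Kind.SYMLINK));       // prints false
    }
}
```

Code written in the first style silently starts treating symlinks as directories once a
listing can return them unresolved.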

This message was sent by Atlassian JIRA
