From: "Sanjay Radia (JIRA)"
To: common-issues@hadoop.apache.org
Date: Wed, 29 Apr 2015 19:09:07 +0000 (UTC)
Subject: [jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default

    [ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519971#comment-14519971 ]

Sanjay Radia commented on HADOOP-9984:
--------------------------------------

bq.
The problem with dereferencing all symlinks in listStatus is that it's disastrously inefficient

# In the proposal, listStatus2 is the new API that replaces listStatus.
# All our libraries need to be changed to use listStatus2 (see item 3 in the proposal).
# Customers who have old code that calls the old listStatus and cannot convert that code immediately can disable symlinks, not use symlinks, or use symlinks sparingly. In practice I don't think there will be dirs with even tens of symlinks (but listStatus2 addresses the problem going forward).

bq. isSymlink is broken for dangling symlinks, FileSystem#rename is broken for symlinks, the behavior of symlinks in globStatus is controversial, distCp doesn't support it, ...

These are fixable. I think this jira itself was attempting to fix some of these when we ran into the design flaw of the original listStatus.

bq. cross-filesystem symlinks ...

As I pointed out, this needs to be discussed. Let me make a separate comment that summarizes the cross-namespace issues that have been presented in the various comments in this and other jiras.

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks.
> One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths.
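
To make the compatibility problem in the description concrete, here is a minimal sketch in Java. It deliberately does not use Hadoop: Entry, Kind, listRaw, listResolved, resolve, MAX_HOPS and lookups are hypothetical stand-ins invented for illustration (Hadoop's real FileStatus does expose isFile(), isDirectory() and isSymlink()), and listResolved is not the proposed listStatus2. It shows old client code that treats !isFile() as "is a directory" misclassifying a symlink in a raw listing, the same code behaving correctly on a resolving listing, and the price of resolution modeled as one extra lookup per symlink hop, which is the inefficiency concern quoted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of a namespace with symlinks.
 * Not Hadoop's API: Entry is only a stand-in for org.apache.hadoop.fs.FileStatus.
 */
public class SymlinkListingDemo {
    enum Kind { FILE, DIRECTORY, SYMLINK }

    /** Stand-in for a status entry; target is non-null only for symlinks. */
    static final class Entry {
        final String path;
        final Kind kind;
        final String target;

        Entry(String path, Kind kind, String target) {
            this.path = path;
            this.kind = kind;
            this.target = target;
        }

        boolean isFile() { return kind == Kind.FILE; }
        boolean isDirectory() { return kind == Kind.DIRECTORY; }
        boolean isSymlink() { return kind == Kind.SYMLINK; }
    }

    static final int MAX_HOPS = 32; // hypothetical limit that stops symlink cycles

    private final Map<String, Entry> inodes = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();
    int lookups = 0; // extra lookups spent following symlinks

    void add(String parentDir, Entry e) {
        inodes.put(e.path, e);
        children.computeIfAbsent(parentDir, k -> new ArrayList<>()).add(e.path);
    }

    /** Raw listing: symlinks come back as symlinks (the behavior that broke old clients). */
    List<Entry> listRaw(String dir) {
        List<Entry> out = new ArrayList<>();
        for (String p : children.getOrDefault(dir, List.of())) {
            out.add(inodes.get(p));
        }
        return out;
    }

    /** Follows a chain of symlinks; returns null if the chain ends at a missing path. */
    Entry resolve(Entry e) {
        Entry cur = e;
        int hops = 0;
        while (cur != null && cur.isSymlink()) {
            if (++hops > MAX_HOPS) {
                throw new IllegalStateException("too many symlink hops from " + e.path);
            }
            lookups++; // one extra lookup per hop
            cur = inodes.get(cur.target);
        }
        return cur;
    }

    /** Resolving listing: each symlink takes its target's type but keeps the link's path. */
    List<Entry> listResolved(String dir) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : listRaw(dir)) {
            if (!e.isSymlink()) {
                out.add(e);
                continue;
            }
            Entry target = resolve(e);
            // Dangling link: keep the raw symlink entry (one possible policy, chosen for the sketch).
            out.add(target == null ? e : new Entry(e.path, target.kind, null));
        }
        return out;
    }

    /** Pre-symlink client assumption from the issue description: !isFile() means "directory". */
    static boolean naiveIsDirectory(Entry e) {
        return !e.isFile();
    }

    public static void main(String[] args) {
        SymlinkListingDemo fs = new SymlinkListingDemo();
        fs.add("/", new Entry("/data", Kind.DIRECTORY, null));
        fs.add("/data", new Entry("/data/part-0", Kind.FILE, null));
        fs.add("/data", new Entry("/data/latest", Kind.SYMLINK, "/data/part-0"));

        for (Entry e : fs.listRaw("/data")) {
            System.out.println("raw      " + e.path + " naiveIsDirectory=" + naiveIsDirectory(e));
        }
        for (Entry e : fs.listResolved("/data")) {
            System.out.println("resolved " + e.path + " naiveIsDirectory=" + naiveIsDirectory(e));
        }
        System.out.println("extra lookups: " + fs.lookups);
    }
}
```

Running main reports naiveIsDirectory=true for the raw /data/latest entry (a link to a file) and false once the listing is resolved, at the cost of one extra lookup. Returning a dangling link as a raw symlink entry is just one policy picked for this sketch; as noted above, dangling-symlink handling is itself one of the open bugs.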