Date: Fri, 15 May 2015 21:59:04 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default

    [ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546242#comment-14546242 ]

Jason Lowe commented on HADOOP-9984:
------------------------------------

bq. In the case of globStatus, things are even worse if you choose to resolve symlinks, since then you can glob for '*foo' and get back 'bar'. A lot of software breaks if globs return back file names that the glob doesn't match.

As I understand it, globStatus is simply listStatus with filtering applied to the results.
If that's the case then globStatus should do whatever listStatus does with respect to symlinks, and that would be to resolve the symlink _except_ for the path in the resulting FileStatus.
This goes back to the readdir() + stat() analogy -- everything in the resulting FileStatus needs to be about where the symlink points _except_ the path. The path would still be the path to the link, since that's what readdir() would see as well. Every other field in FileStatus has to do with what stat() would return, so those fields should be reflective of what the symlink references.
So globStatus should not lead to surprises where "foo*" returns "bar" even in the presence of symlinks.

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>              Labels: BB2015-05-TBR
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
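
For illustration only, the readdir() + stat() analogy from the comment above can be sketched against a local POSIX filesystem in Python. This is not Hadoop code: `Status`, `list_status`, `glob_status`, and `demo` are hypothetical names invented for the sketch, and it assumes a platform where `os.symlink` is permitted. The path comes from the directory listing (so it stays the link's path), while every other field comes from a symlink-following `stat()`.

```python
import fnmatch
import os
import stat
import tempfile
from typing import NamedTuple


class Status(NamedTuple):
    """Hypothetical stand-in for FileStatus, reduced to three fields."""
    path: str     # as seen by the listing: the link's own path
    is_dir: bool  # from stat(): describes what the link points to
    size: int     # from stat(): describes what the link points to


def list_status(directory):
    """readdir() + stat(): names from the listing, attributes from the target."""
    result = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # os.stat follows symlinks (os.lstat would not). A dangling link
        # raises FileNotFoundError here; this sketch does not handle that.
        st = os.stat(path)
        result.append(Status(path, stat.S_ISDIR(st.st_mode), st.st_size))
    return result


def glob_status(directory, pattern):
    """list_status plus a filter on the listed (unresolved) names."""
    return [s for s in list_status(directory)
            if fnmatch.fnmatchcase(os.path.basename(s.path), pattern)]


def demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "bar")
        with open(target, "w") as f:
            f.write("hello")
        os.symlink(target, os.path.join(d, "foo_link"))
        return [(os.path.basename(s.path), s.is_dir, s.size)
                for s in glob_status(d, "foo*")]
```

Under these assumptions, globbing for "foo*" yields an entry named `foo_link` (never `bar`), while its type and size are those of the file the link references.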