Date: Fri, 18 Oct 2013 20:46:51 +0000 (UTC)
From: "Sanjay Radia (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default

[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799499#comment-13799499 ]

Sanjay Radia commented on HADOOP-9984:
--------------------------------------

*Background:*
Applications often call listStatus and then call fileStatus.isDir() on each of the returned children to decide whether a node is a dir or a file. Such code can break if any of the children are symlinks. This jira proposed that listStatus should follow any child symlinks and return a resolved list of children.
Note that symlinks occurring in the pathname passed to listStatus are always followed transparently and are not an issue. Also note that when symlinks were introduced, isDir() was deprecated and isDirectory(), isFile(), and isSymlink() were added.

*Compare with Posix:*
Posix has separate readDir and stat/lstat. While readDir does not return the full status of each child, it does return the file type in the struct dirent (i.e. regular file, dir, symlink, etc.).

*Issue with following child symlinks*
This jira's proposed solution (follow the child symlinks) has an issue. Comments [by daryn|https://issues.apache.org/jira/browse/HADOOP-9984?focusedCommentId=13786431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13786431] and [Oct9th|https://issues.apache.org/jira/browse/HADOOP-9984?focusedCommentId=13790972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13790972] in this jira show potential problems with following child symlinks, the most egregious being the duplicate entry.

*New Proposed Solution*
listStatus should NOT follow child symlinks. Fix all internal utilities, hive, pig, map reduce, yarn, etc. to not use isDir() and to understand that a directory may contain symlinks.

We have two choices for isDir() (which, btw, has already been deprecated):
a) isDir() returns the file type of the child without following the symlink (this is the code in trunk).
b) isDir() returns the file type of the child after following the symlink (unless the link is dangling).

My own preference is (a). The argument in favor of (b) is that it would provide greater compatibility. I think regardless of which choice we pick we will break some apps; in that case I would rather pick the cleaner solution, (a).
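
To make the difference between choices (a) and (b) concrete, here is a minimal sketch that uses java.nio.file on the local filesystem as a stand-in. It is an analogy only, not Hadoop's FileSystem/FileStatus API: the class SymlinkAwareListing and its methods typeNoFollow, typeFollow, and list are hypothetical names made up for this illustration, and it assumes a platform where the process may create symbolic links.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; local-filesystem analogy, not Hadoop code.
public class SymlinkAwareListing {

    /** Analogue of choice (a): report the child's own type, never following a symlink. */
    public static String typeNoFollow(Path child) {
        if (Files.isSymbolicLink(child)) return "symlink";
        if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) return "dir";
        if (Files.isRegularFile(child, LinkOption.NOFOLLOW_LINKS)) return "file";
        return "other";
    }

    /** Analogue of choice (b): report the type of the link target, unless the link is dangling. */
    public static String typeFollow(Path child) {
        // Files.exists follows links by default, so it is false for a dangling link.
        if (Files.isSymbolicLink(child) && !Files.exists(child)) return "symlink";
        if (Files.isDirectory(child)) return "dir";
        if (Files.isRegularFile(child)) return "file";
        return "other";
    }

    /** Lists the children of dir and maps each child name to its reported type. */
    public static Map<String, String> list(Path dir, boolean follow) throws IOException {
        Map<String, String> result = new TreeMap<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                String type = follow ? typeFollow(child) : typeNoFollow(child);
                result.put(child.getFileName().toString(), type);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("symlink-demo");
        Files.createFile(root.resolve("data.txt"));
        Files.createDirectory(root.resolve("sub"));
        Files.createSymbolicLink(root.resolve("link-to-sub"), root.resolve("sub"));
        Files.createSymbolicLink(root.resolve("dangling"), root.resolve("missing"));

        System.out.println("(a) no-follow: " + list(root, false));
        System.out.println("(b) follow:    " + list(root, true));
    }
}
```

In this sketch the child link-to-sub is reported as a symlink under (a) and as a dir under (b), while the dangling link is reported as a symlink under both. A caller that only distinguishes "dir" from "not dir" would treat link-to-sub differently depending on which choice is in effect, which is why callers need a three-way check (directory, file, symlink) either way.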

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-9984
> URL: https://issues.apache.org/jira/browse/HADOOP-9984
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs
> Affects Versions: 2.1.0-beta
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Blocker
> Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would be broken by listStatus and globStatus returning symlinks. One example is applications that assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning resolved paths.

--
This message was sent by Atlassian JIRA
(v6.1#6144)