Date: Fri, 20 Sep 2013 23:29:52 +0000 (UTC)
From: "Binglin Chang (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks

[ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773584#comment-13773584 ]

Binglin Chang commented on HADOOP-9972:
---------------------------------------

bq. Hmm. We could have a convenience method called listLinkStatus which just called into listStatus with the correct PathOptions. I sort of lean towards fewer APIs rather than more, but maybe it makes sense.

I mean that listStatus(Path, PathOption) should call into listLinkStatus (which is HDFS::listStatus, a primitive RPC call), not the other way around.
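The following minimal Java sketch illustrates this layering. It runs over a hypothetical in-memory namespace and is not Hadoop's real FileSystem API.

* `listLinkStatus` stands in for the single listing RPC and never follows links.
* `listStatus(dir, PathOption)` is built on top of it. With the hypothetical `RESOLVE_LINKS` option, it resolves links and silently drops dangling ones, which is the default behavior the issue description proposes.

The `PathOption` values, the `Status` record, and the restriction to absolute link targets are all assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model for illustration only; NOT Hadoop's FileSystem API. */
public class LinkAwareListing {
  enum Kind { FILE, DIRECTORY, SYMLINK }

  /** Hypothetical options, loosely modeled on the PathOption discussed in the comment. */
  enum PathOption { RESOLVE_LINKS, KEEP_LINKS }

  /** target is the (absolute) link target for SYMLINK entries, null otherwise. */
  record Status(String path, Kind kind, String target) {}

  // Every entry keyed by its full path; children are found by path prefix.
  private final Map<String, Status> entries = new HashMap<>();

  void add(Status s) { entries.put(s.path(), s); }

  /** Primitive: direct children of dir, with symlinks reported as symlinks. */
  List<Status> listLinkStatus(String dir) {
    List<Status> out = new ArrayList<>();
    String prefix = dir.endsWith("/") ? dir : dir + "/";
    for (Status s : entries.values()) {
      String rest = s.path().startsWith(prefix) ? s.path().substring(prefix.length()) : null;
      if (rest != null && !rest.isEmpty() && !rest.contains("/")) {
        out.add(s);
      }
    }
    out.sort((a, b) -> a.path().compareTo(b.path()));
    return out;
  }

  /** Layered on the primitive: optionally resolves links and skips dangling ones. */
  List<Status> listStatus(String dir, PathOption option) {
    List<Status> out = new ArrayList<>();
    for (Status s : listLinkStatus(dir)) {
      if (s.kind() != Kind.SYMLINK || option == PathOption.KEEP_LINKS) {
        out.add(s);
        continue;
      }
      Status resolved = resolve(s, 0);
      if (resolved != null) {
        // Report the resolved type under the link's own name.
        out.add(new Status(s.path(), resolved.kind(), null));
      }
      // else: dangling (or looping) link is silently skipped
    }
    return out;
  }

  /** Follows a chain of links; returns null if dangling or the chain is too long. */
  private Status resolve(Status s, int depth) {
    if (s.kind() != Kind.SYMLINK) return s;
    if (depth > 32) return null;
    Status next = entries.get(s.target());
    return next == null ? null : resolve(next, depth + 1);
  }

  public static void main(String[] args) {
    LinkAwareListing fs = new LinkAwareListing();
    fs.add(new Status("/d/file", Kind.FILE, null));
    fs.add(new Status("/d/sub", Kind.DIRECTORY, null));
    fs.add(new Status("/d/good", Kind.SYMLINK, "/d/sub"));
    fs.add(new Status("/d/dangling", Kind.SYMLINK, "/missing"));
    System.out.println(fs.listStatus("/d", PathOption.RESOLVE_LINKS));
    System.out.println(fs.listStatus("/d", PathOption.KEEP_LINKS));
  }
}
```

In this sketch the resolving listing returns three entries: `good` is reported as a directory and `dangling` is omitted. The raw listing returns all four entries. Both views come from the same primitive, which is the point of keeping listLinkStatus at the bottom.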
I wonder how we could implement listStatus(Path, PathOption) without the listLinkStatus(Path) primitive.

bq. Shell globbing doesn't ignore all errors

What I mean by globbing is just shell wildcard substitution, and that does ignore all errors. Globbing only replaces a string containing wildcards with the matching strings, or leaves it unchanged when nothing matches.
http://www.linuxjournal.com/content/bash-extended-globbing
http://tldp.org/LDP/abs/html/globbingref.html

{code}
drwxr-xr-x 2 decster staff 68 Sep 19 17:09 aa
drwxr-xr-x 2 decster staff 68 Sep 19 17:12 bb
decster:~/projects/test> echo *
aa bb
decster:~/projects/test> echo */cc
*/cc
{code}

In your example:
{code}
cmccabe@keter:~/mydir> ls b/c
ls: cannot access b/c: Permission denied
# this error is reported by ls, not by globbing
cmccabe@keter:~/mydir> ls *
a:
c
ls: cannot open directory b: Permission denied
# "ls *" first becomes "ls a b"
# then ls reports the error when it processes b
{code}

> new APIs for listStatus and globStatus to deal with symlinks
> ------------------------------------------------------------
>
>                 Key: HADOOP-9972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9972
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.1.1-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with symlinks. The issue is that code has been written which is incompatible with the existence of things which are not files or directories. For example,
> there is a lot of code out there that looks at FileStatus#isFile, and
> if it returns false, assumes that what it is looking at is a
> directory. In the case of a symlink, this assumption is incorrect.
> It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}} be fully resolving symlinks, and ignoring dangling ones. This will prevent incompatibility with existing MR jobs and other HDFS users.
> We should also add new versions of listStatus and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks.
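The following minimal Java sketch makes the comment's point about shell globbing concrete. It performs wildcard substitution over a hypothetical in-memory directory tree. It does not touch the real filesystem and is not Hadoop's globStatus.

The sketch roughly follows bash's default behavior as described in the comment:

* Directories that cannot be looked into are skipped without any error.
* A pattern with no matches is passed through unchanged, as in `echo */cc`.

Only relative patterns with `*` and `?` are handled. Bracket expressions and shell options such as nullglob are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical sketch of shell-style wildcard substitution; not Hadoop's globStatus. */
public class ShellGlobSketch {
  /** Tree node; children == null means a regular file. */
  static final class Node {
    final boolean accessible; // false models a directory we are not allowed to look into
    final Map<String, Node> children;

    private Node(boolean isDir, boolean accessible) {
      this.accessible = accessible;
      this.children = isDir ? new TreeMap<>() : null; // TreeMap keeps names sorted, like the shell
    }

    static Node dir(boolean accessible) { return new Node(true, accessible); }
    static Node file() { return new Node(false, true); }

    /** Adds a child and returns this node, for chaining. */
    Node put(String name, Node child) { children.put(name, child); return this; }
  }

  /** Translates one path segment with * and ? into a regex; other characters are literal. */
  static Pattern toRegex(String segment) {
    StringBuilder re = new StringBuilder();
    for (char c : segment.toCharArray()) {
      if (c == '*') re.append("[^/]*");
      else if (c == '?') re.append("[^/]");
      else re.append(Pattern.quote(String.valueOf(c)));
    }
    return Pattern.compile(re.toString());
  }

  /** Expands a relative pattern; an unmatched pattern is returned literally. */
  static List<String> expand(Node root, String pattern) {
    List<String> matches = new ArrayList<>();
    walk(root, "", pattern.split("/"), 0, matches);
    if (matches.isEmpty()) matches.add(pattern); // default bash behavior (no nullglob)
    return matches;
  }

  private static void walk(Node node, String prefix, String[] segs, int i, List<String> out) {
    if (i == segs.length) { out.add(prefix); return; }
    // A file or an inaccessible directory just yields no matches: no error is reported.
    if (node.children == null || !node.accessible) return;
    Pattern p = toRegex(segs[i]);
    for (Map.Entry<String, Node> e : node.children.entrySet()) {
      String name = e.getKey();
      if (name.startsWith(".") && !segs[i].startsWith(".")) continue; // hidden files need an explicit dot
      if (p.matcher(name).matches()) {
        walk(e.getValue(), prefix.isEmpty() ? name : prefix + "/" + name, segs, i + 1, out);
      }
    }
  }

  public static void main(String[] args) {
    Node test = Node.dir(true).put("aa", Node.dir(true)).put("bb", Node.dir(true));
    System.out.println(expand(test, "*"));    // [aa, bb]
    System.out.println(expand(test, "*/cc")); // [*/cc]

    Node mydir = Node.dir(true)
        .put("a", Node.dir(true).put("c", Node.file()))
        .put("b", Node.dir(false).put("c", Node.file()));
    System.out.println(expand(mydir, "*"));   // [a, b]
    System.out.println(expand(mydir, "*/c")); // [a/c]
  }
}
```

With trees shaped like the two shell sessions, `expand(test, "*/cc")` comes back unchanged as `[*/cc]`. `expand(mydir, "*/c")` yields only `[a/c]`: the inaccessible `b` contributes nothing and raises no error, and any "Permission denied" would come from the command that later receives the expanded words.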