Return-Path:
X-Original-To: apmail-hadoop-common-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-common-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8C334106B5 for ; Fri, 20 Sep 2013 20:58:54 +0000 (UTC)
Received: (qmail 35812 invoked by uid 500); 20 Sep 2013 20:58:53 -0000
Delivered-To: apmail-hadoop-common-issues-archive@hadoop.apache.org
Received: (qmail 35769 invoked by uid 500); 20 Sep 2013 20:58:53 -0000
Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: common-issues@hadoop.apache.org
Delivered-To: mailing list common-issues@hadoop.apache.org
Received: (qmail 35716 invoked by uid 99); 20 Sep 2013 20:58:53 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 20 Sep 2013 20:58:53 +0000
Date: Fri, 20 Sep 2013 20:58:53 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: common-issues@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

[ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773417#comment-13773417 ]

Jason Lowe commented on HADOOP-9972:
------------------------------------

+1 to the idea of having a new API where symlink resolution and a per-entry error handler can be specified. That should allow the client to cover all three scenarios based on how the handler reacts to errors.

Just to be clear, what happens if the error handler does not rethrow the exception? Is the entry removed from the listStatus results, returned as a raw symlink, or ...?
Is it controllable by the error handler? I'm not sure whether the difference between "log exception and continue" and "ignore it completely" is a different return code from the error handler method or just a matter of whether the handler logs.

bq. At first glance, I like extending the PathFilters.

That's a twist on the approach; I'm not sure it's been proposed before. I suppose one could derive a new interface from PathFilter, call it PathOptions, and have listStatus(Path, PathFilter) check internally whether it has actually been given a PathOptions instead of a plain PathFilter and behave differently. However, I think an explicit, separate API would be preferable, simply for clarity about what the API expects from callers.

> new APIs for listStatus and globStatus to deal with symlinks
> ------------------------------------------------------------
>
>                 Key: HADOOP-9972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9972
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.1.1-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with symlinks. The issue is that code has been written which is incompatible with the existence of things which are not files or directories. For example,
> there is a lot of code out there that looks at FileStatus#isFile, and
> if it returns false, assumes that what it is looking at is a
> directory. In the case of a symlink, this assumption is incorrect.
> It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}} be fully resolving symlinks, and ignoring dangling ones. This will prevent incompatibility with existing MR jobs and other HDFS users. We should also add new versions of listStatus and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
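
To make the error-handler question in the comment concrete, here is a rough sketch of one possible answer, in which the handler's return code decides what happens to an entry whose symlink cannot be resolved. Everything in it is hypothetical: SymlinkListingSketch, Action, EntryErrorHandler, Entry and this listStatus signature are invented for illustration and are not part of Hadoop or of any patch on this issue. Under these assumptions, rethrowing aborts the listing, SKIP drops the entry, RETURN_RAW_SYMLINK keeps it unresolved, and "log and continue" versus "ignore completely" differ only in whether the handler logs before returning.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; none of these types exist in Hadoop. */
public class SymlinkListingSketch {

    /** What the listing does with an entry after the handler swallows the error. */
    enum Action { SKIP, RETURN_RAW_SYMLINK }

    /** Per-entry error handler: rethrow to abort the listing, or return an Action. */
    interface EntryErrorHandler {
        Action onError(String path, IOException cause) throws IOException;
    }

    /** Stand-in for a directory entry; a symlink with a null target is dangling. */
    static final class Entry {
        final String path;
        final boolean isSymlink;
        final String target;

        Entry(String path, boolean isSymlink, String target) {
            this.path = path;
            this.isSymlink = isSymlink;
            this.target = target;
        }
    }

    /** Resolves symlinks and consults the handler for each entry that fails to resolve. */
    static List<String> listStatus(List<Entry> entries, EntryErrorHandler handler)
            throws IOException {
        List<String> results = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.isSymlink) {
                results.add(e.path);            // plain file or directory
            } else if (e.target != null) {
                results.add(e.target);          // symlink resolved to its target
            } else {
                IOException cause = new IOException("dangling symlink: " + e.path);
                Action action = handler.onError(e.path, cause); // may rethrow
                if (action == Action.RETURN_RAW_SYMLINK) {
                    results.add(e.path + " (symlink)");
                }
                // Action.SKIP: the entry is left out of the results
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        List<Entry> dir = Arrays.asList(
                new Entry("/d/file", false, null),
                new Entry("/d/good", true, "/data/real"),
                new Entry("/d/bad", true, null));

        // Ignore dangling links completely.
        System.out.println(listStatus(dir, (p, c) -> Action.SKIP));
        // Hand dangling links back as raw symlinks.
        System.out.println(listStatus(dir, (p, c) -> Action.RETURN_RAW_SYMLINK));
        // Rethrow: the whole listing fails.
        try {
            listStatus(dir, (p, c) -> { throw c; });
        } catch (IOException ex) {
            System.out.println("aborted: " + ex.getMessage());
        }
    }
}
```

An explicit handler argument like this also keeps the caller's intent visible in the signature, which is the clarity argument made above for a separate API over an instanceof check on PathFilter.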