hadoop-common-issues mailing list archives

From "Binglin Chang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
Date Wed, 18 Sep 2013 06:04:54 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770471#comment-13770471 ]

Binglin Chang commented on HADOOP-9972:
---------------------------------------

Hi Colin, 
Regarding the globStatus example: if we follow Linux practice, then
globStatus(p) = glob(pattern).map(path => getFileStatus(path)), where glob is:
String[] glob(pattern):
  if nothing matches, return the pattern itself
  else return the matched paths
  ignore all exceptions
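
The pseudocode above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: it uses java.nio.file on the local file system as a stand-in for Hadoop's FileSystem, it handles a glob in a single directory only, and the class name ShellStyleGlob and both method signatures are hypothetical, not part of any existing or proposed Hadoop API.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: local-file-system stand-in for the glob/getFileStatus split.
class ShellStyleGlob {

    // Expands a glob such as "*.txt" inside dir. Never throws: errors during
    // expansion are swallowed, and an unmatched pattern is returned literally.
    static List<String> glob(Path dir, String pattern) {
        List<String> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, pattern)) {
            for (Path p : stream) {
                matches.add(p.toString());
            }
        } catch (IOException | RuntimeException e) {
            // "ignore all exceptions": an unreadable directory just yields no matches
        }
        if (matches.isEmpty()) {
            // "if nothing matches, return the pattern itself"
            return Collections.singletonList(dir.toString() + File.separator + pattern);
        }
        Collections.sort(matches);
        return matches;
    }

    // glob(pattern).map(path => getFileStatus(path)): errors surface here,
    // at the status lookup, rather than during expansion.
    static List<String> globStatus(Path dir, String pattern) throws IOException {
        List<String> out = new ArrayList<>();
        for (String s : glob(dir, pattern)) {
            BasicFileAttributes a = Files.readAttributes(Paths.get(s), BasicFileAttributes.class);
            out.add(s + (a.isDirectory() ? " [dir]" : " [file]"));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("glob-demo");
        Files.createFile(dir.resolve("a.txt"));
        Files.createFile(dir.resolve("b.txt"));
        System.out.println(globStatus(dir, "*.txt"));
        System.out.println(glob(dir, "*.log")); // unmatched: the literal pattern comes back
        try {
            globStatus(dir, "*.log");
        } catch (IOException e) {
            // Comparable to "ls: *.log: No such file or directory"
            System.out.println("status lookup failed: " + e);
        }
    }
}
```

With this split, expansion itself never fails; any error is reported by the per-path status lookup, which is how the shell and ls divide the work.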

I ran some experiments. As you can see, ls * does print an error message, but ls */stuff
(ls */cc in the session below) does not.
{code}
[root@master01 test]# mkdir -p aa/cc/foo
[root@master01 test]# mkdir -p bb/cc/foo
[root@master01 test]# chmod 700 bb
[root@master01 test]# ll /home/serengeti/.bash
[root@master01 test]# su serengeti
[serengeti@master01 test]$ ll
total 8
drwxr-xr-x 3 root root 4096 Sep 18 08:30 aa
drwx------ 3 root root 4096 Sep 18 08:31 bb
[serengeti@master01 test]$ ls *
aa:
cc
ls: bb: Permission denied
[serengeti@master01 test]$ ls */cc
foo
{code}

Separating globStatus into glob and getFileStatus seems a more proper way of implementing
globStatus than adding new classes/interfaces and a callback handler. It also follows Linux
practice, so it should be more robust.

> new APIs for listStatus and globStatus to deal with symlinks
> ------------------------------------------------------------
>
>                 Key: HADOOP-9972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9972
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.1.1-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with
> symlinks.  The issue is that code has been written which is incompatible with the existence
> of things which are not files or directories.  For example,
> there is a lot of code out there that looks at FileStatus#isFile, and
> if it returns false, assumes that what it is looking at is a
> directory.  In the case of a symlink, this assumption is incorrect.
> It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}}
> be fully resolving symlinks, and ignoring dangling ones.  This will prevent incompatibility
> with existing MR jobs and other HDFS users.  We should also add new versions of listStatus
> and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks.
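
The "not a file, so it must be a directory" assumption described in the issue text can be illustrated with a small sketch. It assumes java.nio.file attributes read with NOFOLLOW_LINKS as a stand-in for an unresolved Hadoop FileStatus; the class EntryKind and its two methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch: why a two-way file/directory check misclassifies symlinks.
class EntryKind {

    // The fragile pattern: anything that is not a regular file is treated as a directory.
    static String naiveKind(BasicFileAttributes a) {
        return a.isRegularFile() ? "file" : "directory";
    }

    // Symlink-aware classification: check for a link before drawing conclusions.
    static String kind(BasicFileAttributes a) {
        if (a.isSymbolicLink()) return "symlink";
        if (a.isRegularFile()) return "file";
        if (a.isDirectory()) return "directory";
        return "other";
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("kind-demo");
        Path target = Files.createFile(dir.resolve("target"));
        // Creating symlinks may require extra privileges on some platforms.
        Path link = Files.createSymbolicLink(dir.resolve("link"), target);
        // NOFOLLOW_LINKS reports on the link itself rather than on its target.
        BasicFileAttributes a =
            Files.readAttributes(link, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        System.out.println("naive: " + naiveKind(a));
        System.out.println("aware: " + kind(a));
    }
}
```

Resolving symlinks by default, as the description proposes, keeps code written in the naive style working, while a symlink-aware variant would let newer callers see the link itself.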

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
