hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
Date Fri, 27 Sep 2013 20:39:03 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780350#comment-13780350 ]

Colin Patrick McCabe commented on HADOOP-9984:

bq. Both listStatus and listLinkStatus must return file stats with the exact same paths. Those
paths must be constructed from the "unresolved" paths. The only difference is whether the
stat is for the symlink itself, or the stat of the resolved path.

Yes, exactly.
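
To make the quoted contract concrete, here is a minimal in-memory sketch. It is not the Hadoop implementation: the ListingSketch class, its Stat record, and the toy namespace are hypothetical stand-ins, and only the method names listStatus and listLinkStatus come from this discussion. Both listings report the same unresolved paths, and only the stat attached to each path differs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ListingSketch {
    /** Hypothetical stat record for illustration; not Hadoop's FileStatus. */
    public static final class Stat {
        public final String path;       // always the unresolved path, as listed
        public final boolean symlink;   // true only when the stat describes the link itself
        public final boolean directory;

        Stat(String path, boolean symlink, boolean directory) {
            this.path = path;
            this.symlink = symlink;
            this.directory = directory;
        }
    }

    /** Toy inode: a file, a directory, or a symlink to another path. */
    private static final class Inode {
        final boolean directory;
        final String target; // non-null means this inode is a symlink

        Inode(boolean directory, String target) {
            this.directory = directory;
            this.target = target;
        }
    }

    private final Map<String, Inode> inodes = new LinkedHashMap<>();

    public void addFile(String path) { inodes.put(path, new Inode(false, null)); }
    public void addDir(String path) { inodes.put(path, new Inode(true, null)); }
    public void addSymlink(String path, String target) { inodes.put(path, new Inode(false, target)); }

    /** Stats each child without following links (the stat is for the link itself). */
    public List<Stat> listLinkStatus(String dir) { return list(dir, false); }

    /** Stats each child after following links (the stat is for the resolved target). */
    public List<Stat> listStatus(String dir) { return list(dir, true); }

    private List<Stat> list(String dir, boolean resolve) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<Stat> out = new ArrayList<>();
        for (Map.Entry<String, Inode> e : inodes.entrySet()) {
            String path = e.getKey();
            // Keep direct children of dir only.
            if (!path.startsWith(prefix) || path.indexOf('/', prefix.length()) >= 0) {
                continue;
            }
            Inode node = e.getValue();
            int hops = 0;
            // Follow the link chain; the hop limit of 32 is an arbitrary choice for this sketch.
            while (resolve && node.target != null) {
                Inode next = inodes.get(node.target);
                if (next == null || ++hops > 32) {
                    throw new IllegalStateException("cannot resolve " + path);
                }
                node = next;
            }
            // The reported path is the listed path in both modes, never the target's path.
            out.add(new Stat(path, node.target != null, node.directory));
        }
        return out;
    }
}
```

With a symlink /data/old pointing at the directory /archive, both calls return an entry whose path is /data/old. listLinkStatus marks it as a symlink, while listStatus marks it as a directory.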

bq. Let's say a privileged server blocks access to certain schemes like "file" but happily
accepts "hdfs" paths. As a devious user, I now create a symlink in hdfs back to local filesystem.
I use this link to steal your keytab or maybe scribble over your config.

I could be ignorant here, but as far as I know, we don't have any mechanisms that "block access
to certain schemes."  If you are running code on the machine as Fred, you get access to all
the stuff Fred can do.  So in your hypothetical scenario, Fred could just open localFS himself.
Plus, Kerberos doesn't store its keytab files as world-readable.  Your privileges are not
escalated one iota by symlink resolution-- hence the discussion about people creating symlinks
other people couldn't follow, etc.

With regard to data loss as a result of apps that can't handle schemes other than "hdfs"--
that problem already exists today with federation, as you yourself admitted.  With regard
to delegation tokens, you will get an access control failure in the job, fix it, and move
on.  Or you'll have to go after the MR job author for trying to access stuff that he's not
supposed to.  This seems very similar to the situation today, where MR jobs can try to talk
to any FS they want.

Why don't we include a configuration option that disables cross-filesystem symlink resolution
on the client?
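
As a rough sketch of what such a client-side check might look like: the class name CrossFsSymlinkPolicy and the allowCrossFs flag below are hypothetical, not an existing Hadoop class or configuration key. It treats a link as cross-filesystem when the target's scheme or authority differs from the link's, which would also cover links between namenodes under federation.

```java
import java.net.URI;
import java.util.Objects;

public class CrossFsSymlinkPolicy {
    /**
     * True when the target lives on a different filesystem than the link,
     * judged by scheme and authority. The link URI must be fully qualified.
     */
    public static boolean crossesFileSystem(URI link, URI target) {
        if (!link.isAbsolute()) {
            throw new IllegalArgumentException("link must carry a scheme: " + link);
        }
        if (!target.isAbsolute()) {
            // A target without a scheme stays on the link's own filesystem.
            return false;
        }
        return !link.getScheme().equalsIgnoreCase(target.getScheme())
            || !Objects.equals(link.getAuthority(), target.getAuthority());
    }

    /**
     * Resolves the target against the link, refusing cross-filesystem
     * targets unless allowCrossFs (a hypothetical client setting) is true.
     */
    public static URI resolve(URI link, URI target, boolean allowCrossFs) {
        if (!allowCrossFs && crossesFileSystem(link, target)) {
            throw new UnsupportedOperationException(
                "cross-filesystem symlink resolution is disabled: " + link + " -> " + target);
        }
        return link.resolve(target);
    }
}
```

With the flag off, a link at hdfs://nn1/user/fred/link pointing to /archive/x still resolves on the same namenode, while a target of file:///etc/passwd is rejected on the client before any access is attempted.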

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
> ----------------------------------------------------------------------------------
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, HADOOP-9984.005.patch
>
> During the process of adding symlink support to FileSystem, we realized that many existing
> HDFS clients would be broken by listStatus and globStatus returning symlinks.  One example
> is applications that assume that !FileStatus#isFile implies that the inode is a directory.
> As we discussed in HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
> resolved paths.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira