hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9780) Filesystem and FileContext methods that follow symlinks should return unresolved paths
Date Mon, 07 Oct 2013 22:35:42 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788653#comment-13788653 ]

Colin Patrick McCabe commented on HADOOP-9780:

So, the problem with returning unresolved paths right now is that each time the server
encounters a symlink, it throws an exception, which triggers the client to do another RPC
(actually two RPCs, due to an implementation quirk; see HDFS-5293).  Returning unresolved
paths would cause the client to keep redoing these path resolution RPCs over and over.  This
doesn't scale: it multiplies the load on the NameNode by at least 3x, and possibly more,
depending on the number of links.
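To make the scaling argument concrete, here is a back-of-the-envelope cost model. It is a hypothetical sketch, not HDFS code: it assumes that each symlink hit costs the failed call plus two follow-up RPCs (our reading of the two-RPC quirk mentioned above), and that the lookup crosses {{links}} same-namespace symlinks. {{RpcCostModel}} and its methods are names made up for illustration.

```java
// Hypothetical back-of-the-envelope model (not HDFS code): NameNode RPCs
// needed for one lookup whose path crosses `links` same-namespace symlinks.
public final class RpcCostModel {
    // Assumption: each symlink hit costs two follow-up RPCs (fetch the link
    // target, then retry the original call), per the HDFS-5293 quirk.
    static final int EXTRA_RPCS_PER_LINK = 2;

    /** Client-side resolution: the call fails at every link and is re-driven. */
    public static int clientSideRpcs(int links) {
        return 1 + EXTRA_RPCS_PER_LINK * links;
    }

    /** Server-side resolution: the NameNode follows local links itself. */
    public static int serverSideRpcs(int links) {
        return 1; // independent of the number of local links
    }

    public static void main(String[] args) {
        for (int links = 0; links <= 3; links++) {
            System.out.printf("links=%d client-side=%d server-side=%d%n",
                    links, clientSideRpcs(links), serverSideRpcs(links));
        }
    }
}
```

Under these assumptions a single link already costs 3 RPCs instead of 1, which matches the "at least 3x" figure, and every additional link adds two more.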

To avoid this, I think we should resolve as much of the symlink chain as possible on the
NameNode.  The NameNode already knows which inodes are symlinks, and it knows what they point
to.  If what they point to is on the local NameNode (which should be the common case), we
should just resolve it then and there and keep going, rather than doing the "please make
another RPC to me" dance.

Obviously, this doesn't help in the case of cross-namespace symlinks.  However, it does help
a lot in the extremely common case of links to things on the same NameNode.
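A minimal sketch of the proposed NameNode-side resolution, assuming a toy in-memory namespace where a link target containing {{://}} means "another namespace". {{ServerSideResolver}}, {{CrossNamespaceLinkException}} and the {{MAX_LINKS}} cap are hypothetical names chosen for this example; the real NameNode code is structured differently. Local links are expanded in place and the walk continues; only a cross-namespace link bounces the remainder of the path back to the client.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not HDFS code): resolve symlinks on the "NameNode"
// as far as possible, returning to the client only for cross-namespace links.
public final class ServerSideResolver {
    /** Assumed cap on links followed per lookup, so cycles like /x -> /x terminate. */
    static final int MAX_LINKS = 32;

    /** Hypothetical signal that resolution must continue on another namespace. */
    public static final class CrossNamespaceLinkException extends RuntimeException {
        public CrossNamespaceLinkException(String remainingPath) { super(remainingPath); }
    }

    private final Map<String, String> links = new HashMap<>(); // link path -> target

    public void addLink(String path, String target) { links.put(path, target); }

    /** Resolves an absolute path, expanding every same-namespace link in place. */
    public String resolve(String path) {
        Deque<String> pending = new ArrayDeque<>(split(path)); // components still to walk
        Deque<String> resolved = new ArrayDeque<>();           // link-free prefix so far
        int followed = 0;
        while (!pending.isEmpty()) {
            String component = pending.removeFirst();
            if (component.equals(".")) continue;
            if (component.equals("..")) { resolved.pollLast(); continue; }
            resolved.addLast(component);
            String target = links.get(join(resolved));
            if (target == null) continue;                      // ordinary inode, keep walking
            if (++followed > MAX_LINKS) {
                throw new IllegalStateException("too many symlinks resolving " + path);
            }
            resolved.removeLast();                             // replace the link with its target
            if (target.contains("://")) {
                // Target lives on another namespace: hand the rest back to the client.
                StringBuilder rest = new StringBuilder(target);
                for (String c : pending) rest.append('/').append(c);
                throw new CrossNamespaceLinkException(rest.toString());
            }
            if (target.startsWith("/")) resolved.clear();      // absolute target restarts at root
            List<String> parts = split(target);
            for (int i = parts.size() - 1; i >= 0; i--) pending.addFirst(parts.get(i));
        }
        return join(resolved);
    }

    private static List<String> split(String p) {
        List<String> parts = new ArrayList<>();
        for (String s : p.split("/")) if (!s.isEmpty()) parts.add(s);
        return parts;
    }

    private static String join(Deque<String> parts) {
        return "/" + String.join("/", parts);
    }

    public static void main(String[] args) {
        ServerSideResolver nn = new ServerSideResolver();
        nn.addLink("/a", "b");                  // the /a -> b example from the description
        System.out.println(nn.resolve("/a/c")); // /b/c, in a single "RPC"
    }
}
```

With {{/a -> b}}, {{resolve("/a/c")}} returns {{"/b/c"}} in one call, while a link to {{hdfs://nn2/data}} would hand {{hdfs://nn2/data/x}} back to the client for a follow-up RPC, which is the only case where the extra round trip is unavoidable.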

In a way, this is similar to how {{LocalFileSystem}} already operates.  When you try to read
a local file, it resolves all the symlinks it can without throwing {{UnresolvedLinkException}},
and only fails if a symlink is dangling.  There's no reason to ask the client for help if you
don't need it.
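For comparison, the local behaviour can be seen with plain {{java.nio.file}}. This is the JDK, not Hadoop's {{LocalFileSystem}} itself, and it assumes a POSIX system where the JVM is allowed to create symlinks. The OS follows {{a -> b}} in the middle of a path without the caller noticing, and only a dangling link surfaces as an error ({{NoSuchFileException}}). The class name {{LocalLinkDemo}} is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Local analogue (java.nio, not Hadoop code): the OS resolves symlinks itself.
public final class LocalLinkDemo {
    /** Creates root/b/c and root/a -> b, then reads root/a/c through the link. */
    public static String readThroughLink(Path root) throws IOException {
        Path b = Files.createDirectory(root.resolve("b"));
        Files.write(b.resolve("c"), "hello".getBytes(StandardCharsets.UTF_8));
        Files.createSymbolicLink(root.resolve("a"), Paths.get("b")); // relative link a -> b
        // The link in the middle of the path is followed transparently.
        byte[] data = Files.readAllBytes(root.resolve("a").resolve("c"));
        return new String(data, StandardCharsets.UTF_8);
    }

    /** Returns true if reading through a dangling link fails with NoSuchFileException. */
    public static boolean danglingLinkFails(Path root) throws IOException {
        Files.createSymbolicLink(root.resolve("dangling"), Paths.get("missing"));
        try {
            Files.readAllBytes(root.resolve("dangling"));
            return false;
        } catch (NoSuchFileException e) {
            return true; // only the dangling case is reported to the caller
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("links");
        System.out.println(readThroughLink(root));
        System.out.println(danglingLinkFails(root));
    }
}
```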

> Filesystem and FileContext methods that follow symlinks should return unresolved paths
> --------------------------------------------------------------------------------------
>                 Key: HADOOP-9780
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9780
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Colin Patrick McCabe
>            Priority: Minor
> Currently, when you follow a symlink, you get back the resolved path, with all symlinks
> removed.  For compatibility reasons, we might want to have the returned path be an unresolved
> path.
> Example: if you have:
> {code}
> /a -> b
> /b
> /b/c
> {code}
> {{getFileStatus("/a/c")}} will return a {{FileStatus}} object with a {{Path}} of {{"/b/c"}}.
> If we returned the unresolved path, that would be {{"/a/c"}}.
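The resolved/unresolved distinction in the description can be reproduced locally with {{java.nio.file}} as an analogue (not the Hadoop {{FileStatus}} API): {{Path.toRealPath()}} plays the role of today's resolved path, while the path the caller passed in is the unresolved one. This assumes a POSIX system; {{ResolvedVsUnresolved}} is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Local analogue (java.nio, not Hadoop's FileStatus) of the /a -> b example.
public final class ResolvedVsUnresolved {
    /** Builds root/a -> b and root/b/c, then returns {unresolved, resolved} for root/a/c. */
    public static Path[] paths(Path root) throws IOException {
        Files.createDirectories(root.resolve("b").resolve("c"));
        Files.createSymbolicLink(root.resolve("a"), Paths.get("b")); // relative link a -> b
        Path unresolved = root.resolve("a").resolve("c"); // the path the caller asked for
        Path resolved = unresolved.toRealPath();          // every symlink followed
        return new Path[] {unresolved, resolved};
    }

    public static void main(String[] args) throws IOException {
        Path[] p = paths(Files.createTempDirectory("links"));
        System.out.println("unresolved: " + p[0]); // ends in a/c
        System.out.println("resolved:   " + p[1]); // ends in b/c
    }
}
```

Because {{toRealPath()}} also resolves any links in the temporary directory itself (e.g. {{/tmp}} on some systems), the resolved path should be compared against {{root.toRealPath()}}, not {{root}}.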

This message was sent by Atlassian JIRA
