hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4044) Create symbolic links in HDFS
Date Thu, 16 Oct 2008 17:01:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640224#action_12640224 ]

Doug Cutting commented on HADOOP-4044:

> I am almost certain that it won't affect any benchmark other than NNBench.

If that's really the case, what are we worried about here?

> What happens if a link or directory is changed between these two operations? open() fails
> though it should not.

The same thing that happens if that change is made just after a file is opened.  If you open
a file, then someone else deletes it, subsequent accesses to that file will fail.  The namenode
doesn't keep any state for files open for read, so a short-lived cache of block locations
doesn't change things fundamentally.

That said, the cache idea only works for open; it doesn't work for rename, delete, etc.
In these cases we don't want to pre-fetch a list of block locations.  So never mind the cache
idea anyway.

The current options on the table seem to be:
 - Dhruba's patch modified to use Nicholas's idea of LinkResult<T> style, to avoid defining
new return type classes for the SPI methods.
 - A less-invasive approach that requires two RPCs.  We may later optimize this by converting
FileSystem's API to use the above style, but we may not need to.  We do need to be careful
not to incompatibly change FileSystem's public API, but the SPI's not so constrained, since
all FileSystem implementations are currently in trunk and can be easily maintained in a coordinated
manner.  In the meantime, we can start using symbolic links in archives, etc. while we work
out if and how to better optimize them.

Does that sound right?
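
To make the first option concrete, here is a minimal sketch of what a LinkResult<T>-style return value could look like.  Everything in it is an assumption for illustration: the class shape, the LinkResolver helper, and the MAX_HOPS limit are hypothetical and are not taken from Dhruba's patch or from Nicholas's proposal.  The idea is that each operation returns either its real result or the target of a link it ran into, and the caller retries against that target.

```java
import java.util.function.Function;

/** Hypothetical wrapper: holds either an operation's result or a link target to follow. */
final class LinkResult<T> {
    private final T value;           // the result, when the path was not a link
    private final String linkTarget; // the link target, when the path was a link

    private LinkResult(T value, String linkTarget) {
        this.value = value;
        this.linkTarget = linkTarget;
    }

    static <T> LinkResult<T> of(T value) {
        return new LinkResult<>(value, null);
    }

    static <T> LinkResult<T> link(String target) {
        return new LinkResult<>(null, target);
    }

    boolean isLink() { return linkTarget != null; }
    T getValue() { return value; }
    String getLinkTarget() { return linkTarget; }
}

/** Hypothetical client-side loop that re-issues an operation until it stops hitting links. */
final class LinkResolver {
    static final int MAX_HOPS = 32; // assumed limit, guards against link cycles

    static <T> T resolve(String path, Function<String, LinkResult<T>> op) {
        String current = path;
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            LinkResult<T> result = op.apply(current);
            if (!result.isLink()) {
                return result.getValue();
            }
            current = result.getLinkTarget(); // retry against the link's target
        }
        throw new IllegalStateException("Too many levels of symbolic links: " + path);
    }
}
```

With this shape, a path that is not a link costs one call, and a single generic wrapper serves every SPI method instead of one new return class per method.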

I don't have a strong preference.  If I were implementing it myself I'd probably go for the
simple approach first, early in a release cycle, then benchmark things and optimize it subsequently
if needed.  The risk is not that great, since we already have good ideas of how to optimize
it.  But the optimization will clearly help scalability, so it wouldn't hurt to have it from
the outset either.

FYI, I tried implementing my patch above as a LinkedFileSystem subclass, to better contain
the changes.  This turned out to be messy, since a LinkedFileSystem can link to an unlinked
FileSystem.  With the subclass approach this must be explicitly handled by casts and 'instanceof',
whereas when FileSystem itself supports links, this can be handled by default method implementations.
So I am not convinced that a LinkedFileSystem subclass is a net win.
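
A rough sketch of the "default method implementations" point, using made-up classes rather than the real FileSystem API (SketchFileSystem, PlainFs, LinkingFs, and LinkAwareReader are all hypothetical names, and only a single link hop is followed).  When the base class itself declares the link query with a default that reports "not a link", callers can treat linked and unlinked file systems uniformly, with no casts or 'instanceof'.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical base class: link support is part of the base API, with a safe default. */
abstract class SketchFileSystem {
    /** Default: this file system has no links, so no path is ever a link. */
    String getLinkTarget(String path) {
        return null;
    }

    abstract String read(String path);
}

/** A file system with no link support; it simply inherits the default. */
class PlainFs extends SketchFileSystem {
    private final Map<String, String> files = new HashMap<>();

    void put(String path, String data) { files.put(path, data); }

    @Override
    String read(String path) { return files.get(path); }
}

/** A link-aware file system; it overrides only the link query. */
class LinkingFs extends PlainFs {
    private final Map<String, String> links = new HashMap<>();

    void link(String from, String to) { links.put(from, to); }

    @Override
    String getLinkTarget(String path) { return links.get(path); }
}

final class LinkAwareReader {
    /** Works for any SketchFileSystem without casts; follows at most one link. */
    static String readFollowingLink(SketchFileSystem fs, String path) {
        String target = fs.getLinkTarget(path);
        return fs.read(target != null ? target : path);
    }
}
```

With a separate LinkedFileSystem subclass, readFollowingLink would instead have to test the runtime type of each file system it touches before asking about links.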

> Create symbolic links in HDFS
> -----------------------------
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch,
>                      symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
> HDFS should support symbolic links. A symbolic link is a special type of file that contains
> a reference to another file or directory in the form of an absolute or relative path and that
> affects pathname resolution. Programs which read or write to files named by a symbolic link
> will behave as if operating directly on the target file. However, archiving utilities can
> handle symbolic links specially and manipulate them directly.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
