hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4044) Create symbolic links in HDFS
Date Thu, 09 Oct 2008 17:06:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638330#action_12638330 ]

Doug Cutting commented on HADOOP-4044:

Owen> I am not buying (yet) Doug's argument of saying that not-throwing exceptions inside
the namenode is a better programming style.

One could in theory use exceptions to handle any condition.  For example, when you attempt
to open a directory, we could return the directory listing in an exception.  But, rather, we
prefer to use data structures unless exceptions are clearly more appropriate.  So, the question
is, when are exceptions appropriate?
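To make the reductio concrete, here is a minimal sketch (hypothetical code, not anything
in Hadoop) of what "return the directory listing in an exception" would actually look like:

```java
// Hypothetical sketch only -- NOT real Hadoop code.  It shows the absurd
// endpoint of using exceptions for any condition: the "result" of opening
// a directory is smuggled out through the exception object.
import java.util.Arrays;
import java.util.List;

class DirectoryListingException extends RuntimeException {
    final List<String> entries;
    DirectoryListingException(List<String> entries) { this.entries = entries; }
}

public class ExceptionAsData {
    // Toy open(): "throws" the listing when the path looks like a directory.
    static String open(String path) {
        if (path.endsWith("/")) {
            throw new DirectoryListingException(Arrays.asList("a.txt", "b.txt"));
        }
        return "contents of " + path;
    }

    public static void main(String[] args) {
        try {
            open("/data/");                     // a directory: data arrives via catch
        } catch (DirectoryListingException e) {
            System.out.println(e.entries);      // control flow abused as a data channel
        }
    }
}
```

Nothing stops this from compiling or working; the objection is purely one of design.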

A primary advantage of exceptions is their long-distance throwing.  The maxim is "throw early,
catch late", since in most cases it's better to pass exceptions through and let the highest
layers deal with them.  So one sign that exceptions are appropriate is when you don't know
who should handle them.  That is not the case here.  There's only one place where this is
intended to be caught, in boilerplate methods of FileSystem.java.
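For context, the exception-based proposal amounts to something like the following sketch
(the names, including UnresolvedLinkException, are illustrative of the proposal, not a
settled API):

```java
// Sketch of the exception-based proposal (illustrative names, not a final API):
// the SPI method throws when it hits a link, and one boilerplate loop in
// FileSystem catches it and retries against the link target.
import java.util.HashMap;
import java.util.Map;

class UnresolvedLinkException extends Exception {
    final String target;
    UnresolvedLinkException(String target) { this.target = target; }
}

public class LinkViaException {
    static final Map<String, String> links = new HashMap<>();
    static { links.put("/fs1/link", "/fs2/real"); }

    // SPI method: signals "this path is a link" via an exception.
    static String openRaw(String path) throws UnresolvedLinkException {
        String target = links.get(path);
        if (target != null) throw new UnresolvedLinkException(target);
        return "data@" + path;
    }

    // The single, fixed catch site.  The thrower and catcher are adjacent
    // and known in advance, so the long-distance transfer that justifies
    // exceptions buys nothing here.
    static String open(String path) {
        while (true) {
            try {
                return openRaw(path);
            } catch (UnresolvedLinkException e) {
                path = e.target;   // retry with the link target
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(open("/fs1/link"));
    }
}
```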

In http://www.onjava.com/pub/a/onjava/2003/11/19/exceptions.html they list just three cases
where exceptions are appropriate: programming errors, client code errors and resource failures.
 Links are clearly not programming errors or client code errors.  But are they resource failures?
 Examples of resource failures are things like network timeouts, out of memory, lack of permission,
etc.  These generally concern resources that are out of the direct control of the application,
and hence present unexpected outcomes.  Reinforcing that interpretation, one finds sentences
like, "This is a misuse of the idea of exceptions, which are meant only for defects or for
items outside the direct control of the program.", in one of the links I provided earlier.
 So, are links a condition outside the direct control of the program?  Are they external resources?
 I don't see that.  The same component that implements open() also directly implements the
storage of the link.  So the fact that something is a link is entirely within the filesystem's
domain.  So, again, using exceptions does not seem justified here.

Sanjay has argued that it is abnormal for a filesystem to link to a different filesystem.
 This misses the intended sense of  "normal".  The value of a link is normal data, under the
control of the filesystem implementation, regardless of whether it points to a different filesystem.
 As I tried to elaborate above, "abnormal", with regards to exceptions, is defined as out
of the direct control of the module, or in error.  One could reasonably call directories
"abnormal" in some sense--they're not files--but they are not errors nor are they conditions
out of the control of the application, nor are they issues that are best handled at some unknown
spot higher on the stack.  Directories, like links, are thus best modelled directly by the
API, not as exceptions.
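That is how the existing API already handles directories: a status object reports
directory-ness as ordinary data, and callers branch on it rather than catching anything.
A minimal sketch in that spirit (toy code, not actual Hadoop source):

```java
// Minimal sketch of modelling "directory" as data in the return type, in the
// spirit of Hadoop's FileStatus.isDir() -- not actual Hadoop source.
public class StatusAsData {
    static final class Status {
        final boolean isDir;
        Status(boolean isDir) { this.isDir = isDir; }
    }

    static Status getStatus(String path) {
        return new Status(path.endsWith("/"));   // toy rule for the sketch
    }

    public static void main(String[] args) {
        // Callers branch on ordinary data; no exception is needed for a
        // perfectly normal, in-module condition.
        System.out.println(getStatus("/data/").isDir);
        System.out.println(getStatus("/data/f").isDir);
    }
}
```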

The primary argument folks have provided for the superiority of exceptions is that they better
preserve the current service provider interface, and that this interface more obviously describes
the actions in question.  It does describe the non-link part of these actions more concisely,
but it hides the link part, so preserving that API is not a clear win.

This proposed use of exceptions in fact seems to me a poster child for how *not* to use exceptions.
 If we go down this route, I don't see how we'll be able to argue against exceptions in other
places.  Non-local control is not required.  No error is involved.  External conditions are
not involved.  None of the standard reasons for using exceptions are met.  Exceptions are
used here purely as a convenient control flow mechanism.  It's effectively setjmp/longjmp,
whose use we should not encourage.

Owen> we aren't changing the FileSystem API at all.

We're not changing the public API much, but we are changing the service provider API substantially.
 We'd like most FileSystems to support links, and to do so they'll need to change what many
methods return.  It would be nice to be able to do this back-compatibly, without altering
the existing API much.  But I don't see a way to do that and still achieve our performance
goals without setting a bad precedent for the appropriate use of exceptions.
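The return-type approach, by contrast, would look something like this sketch (again
illustrative names only, not the final API): the SPI returns a small result object that
is either the answer or an as-yet-unresolved link, and the caller loops.  The control
flow is identical to the exception version, but the link case travels as data.

```java
// Sketch of the return-type alternative (illustrative, not Hadoop's final API):
// each SPI method's changed return type carries the link case as data.
import java.util.HashMap;
import java.util.Map;

public class LinkViaReturn {
    // Result is either resolved data or a link target still to chase.
    static final class OpenResult {
        final String data;     // non-null when resolved
        final String linkTo;   // non-null when the path was a link
        OpenResult(String data, String linkTo) { this.data = data; this.linkTo = linkTo; }
    }

    static final Map<String, String> links = new HashMap<>();
    static { links.put("/fs1/link", "/fs2/real"); }

    // SPI method: the link case is an ordinary return value, not a throw.
    static OpenResult openRaw(String path) {
        String target = links.get(path);
        if (target != null) return new OpenResult(null, target);
        return new OpenResult("data@" + path, null);
    }

    // Boilerplate resolution loop in FileSystem: plain data-driven control flow.
    static String open(String path) {
        OpenResult r = openRaw(path);
        while (r.linkTo != null) {
            r = openRaw(r.linkTo);
        }
        return r.data;
    }

    public static void main(String[] args) {
        System.out.println(open("/fs1/link"));
    }
}
```

The cost is visible in the sketch: every affected SPI method must grow a wrapper type
like OpenResult, which is exactly the "big, disruptive change" conceded below.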

I'm not trying to be an uncompromising ass here.  ("You don't have to try, it comes naturally",
I hear you say.)  If you look at the early history of this issue, you'll see that I only with
reluctance and skepticism originally outlined the return type approach, because it is clearly
a big, disruptive change that's hard to justify.  But I really can see no other way that doesn't
set precedents that we cannot live with as a project.  I'd love for someone to provide a "hail
mary" solution, that leaves the API's return types mostly alone, performs well and doesn't
set a bad example, but until that happens I don't see an alternative.  (Or for someone to
convince me that this is actually a reasonable, appropriate, intended use of exceptions.)
 But until then, I think we just need to accept that FileSystem's SPI must get more complex
in order to efficiently support links.

> Create symbolic links in HDFS
> -----------------------------
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch,
symLink6.patch, symLink8.patch, symLink9.patch
> HDFS should support symbolic links. A symbolic link is a special type of file that contains
a reference to another file or directory in the form of an absolute or relative path and that
affects pathname resolution. Programs which read or write to files named by a symbolic link
will behave as if operating directly on the target file. However, archiving utilities can
handle symbolic links specially and manipulate them directly.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
