hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4044) Create symbolic links in HDFS
Date Tue, 07 Oct 2008 05:42:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637384#action_12637384 ]

dhruba borthakur commented on HADOOP-4044:

1. "symlink" vs "link":
   I think it makes sense to call the current implementation "links" instead of "symlinks".
None of the existing file system implementations supports links of any kind.
It is ok for the first implementation to refer to this new construct as a generic "link".
HDFS implements it as a symbolic link, but some other file system may implement "links" as
hard links.

  +1 for calling this construct "link" instead of "symlink".

2. Exceptions vs Objects-as-return-status in a public API (FileSystem or ClientProtocol API)
   Exceptions and objects-as-return-values are two ways of communicating a certain
piece of information to the user of the API.
   (a) One goal is to discuss how we can make that API somewhat future-proof.
If we consider our current Hadoop RPC, the only way to serialize/deserialize an exception
object is to serialize its message string. The client side can deserialize this string and
reconstruct an exception object. If the return status needs to contain several different pieces
of information, then serializing/deserializing them as a string inside the exception object
is not very elegant. Many other RPC systems (e.g. Thrift) allow versioning objects (adding new
fields) but many might not allow adding new exceptions to a pre-existing method call. Thus, making
API calls return objects-as-return-types seems to be more future-proof than adding exceptions.
   (b) The rule of thumb that we have been following is that exceptions are generated when an
abnormal situation occurs. If an exception is thrown by the ClientProtocol, it is logged by
the RPC subsystem into an error log. This is a good characteristic to have in a distributed
system: it makes debugging easy because a scan of the error logs
pinpoints the exceptions raised by the API. Access control checks or disk-full conditions
raise exceptions, and they are logged by the RPC subsystem. We do not want every call to "traverse
a symbolic link" to log an exception message in the error logs, do we? (Of course, we can
special-case it and say that we will not log UnresolvedPathException; but by
special-casing it, we are acknowledging that this exception does not represent abnormal behaviour).

 +1 for RenameResult rename(Path src, Path dst) throws IOException;
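A minimal sketch of what point 2 could look like, assuming a result object that reports a link
hit as ordinary data while real failures still raise IOException. Only the name RenameResult and
the "throws IOException" clause come from the signature above; the fields, the factory methods,
ToyFileSystem and the use of String paths instead of Path are hypothetical and are not taken
from any of the attached patches.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RenameResultDemo {
    public static void main(String[] args) throws IOException {
        ToyFileSystem fs = new ToyFileSystem();
        fs.createFile("/data/a");
        fs.createLink("/alias", "/data");

        RenameResult direct = fs.rename("/data/a", "/data/b");
        System.out.println("direct rename succeeded: " + direct.isSuccess());

        // Going through the link is a normal outcome, so nothing is thrown.
        RenameResult viaLink = fs.rename("/alias/b", "/alias/c");
        System.out.println("hit link, retry at: " + viaLink.getLinkTarget());
    }
}

/** Hypothetical result object; new fields could be added later without new exceptions. */
final class RenameResult {
    private final boolean success;
    private final String linkTarget; // null unless a link was encountered

    private RenameResult(boolean success, String linkTarget) {
        this.success = success;
        this.linkTarget = linkTarget;
    }

    static RenameResult done() { return new RenameResult(true, null); }
    static RenameResult linkEncountered(String target) { return new RenameResult(false, target); }

    boolean isSuccess() { return success; }
    boolean hitLink() { return linkTarget != null; }
    String getLinkTarget() { return linkTarget; }
}

/** Toy in-memory stand-in for a file system; for illustration only. */
final class ToyFileSystem {
    private final Set<String> files = new HashSet<>();
    private final Map<String, String> links = new HashMap<>();

    void createFile(String path) { files.add(path); }
    void createLink(String link, String target) { links.put(link, target); }

    RenameResult rename(String src, String dst) throws IOException {
        // Expected situation: a leading path component is a link. Report it as data.
        for (Map.Entry<String, String> e : links.entrySet()) {
            String prefix = e.getKey() + "/";
            if (src.startsWith(prefix)) {
                String rest = src.substring(e.getKey().length());
                return RenameResult.linkEncountered(e.getValue() + rest);
            }
        }
        // Abnormal situation: still an exception, as argued in (b).
        if (!files.remove(src)) {
            throw new IOException("No such file: " + src);
        }
        files.add(dst);
        return RenameResult.done();
    }
}
```

A caller would check hitLink() and retry against getLinkTarget(), while a missing source file
still surfaces as an IOException.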

3. Exceptions vs Object-as-return-status inside the NameNode
   (a) Different file system implementations can have very different code and a
very different set of developers. For example, HDFS might implement code in such a way that
traversing a link returns an object-status, whereas S3 or KFS throws an exception (internal to
the implementation). If we write a file system implementation for Ceph, we are unlikely to
rewrite the Ceph code to stop using exceptions (or vice versa). I would like to draw the
distinction that this issue is not related to what is decided in case (2) above.
   (b) The primary focus in designing the internal methods of the NameNode is not future-proofing
for backward compatibility. Also, there isn't any requirement to serialize/deserialize any
exception objects as long as that object is used only inside the NameNode. Thus, exceptions could
be used here. This keeps most of the HDFS code clean and elegant.

  +1 for using exceptions inside the NameNode internal methods.
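One way the positions in (2) and (3) could coexist, sketched under assumptions: internal
methods throw, and the RPC-facing method catches the exception and converts it into a status
object, so the exception never has to be serialized or logged by the RPC layer. The name
UnresolvedPathException appears in the discussion above; its field, LookupStatus, ToyNameNode
and getFileInfo as written here are hypothetical and do not reflect the actual NameNode code.

```java
import java.util.HashMap;
import java.util.Map;

class InternalExceptionDemo {
    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.addLink("/alias", "/data");
        System.out.println(nn.getFileInfo("/data/a").describe());
        System.out.println(nn.getFileInfo("/alias/a").describe());
    }
}

/** Internal-only exception; it never crosses the wire, so no serialization is needed. */
final class UnresolvedPathException extends Exception {
    private final String resolvedPath;

    UnresolvedPathException(String resolvedPath) {
        super("Unresolved link, retry at " + resolvedPath);
        this.resolvedPath = resolvedPath;
    }

    String getResolvedPath() { return resolvedPath; }
}

/** Hypothetical status object returned to clients. */
final class LookupStatus {
    final boolean resolved;
    final String path;

    LookupStatus(boolean resolved, String path) {
        this.resolved = resolved;
        this.path = path;
    }

    String describe() {
        return (resolved ? "resolved: " : "link, retry at: ") + path;
    }
}

/** Toy stand-in for the NameNode; for illustration only. */
final class ToyNameNode {
    private final Map<String, String> links = new HashMap<>();

    void addLink(String link, String target) { links.put(link, target); }

    // Internal method: throwing keeps the intermediate call chain free of status checks.
    private String resolve(String path) throws UnresolvedPathException {
        for (Map.Entry<String, String> e : links.entrySet()) {
            if (path.startsWith(e.getKey() + "/")) {
                String rest = path.substring(e.getKey().length());
                throw new UnresolvedPathException(e.getValue() + rest);
            }
        }
        return path;
    }

    // RPC-facing method: the exception is converted to a return value at the boundary.
    LookupStatus getFileInfo(String path) {
        try {
            return new LookupStatus(true, resolve(path));
        } catch (UnresolvedPathException e) {
            return new LookupStatus(false, e.getResolvedPath());
        }
    }
}
```

Because the catch sits in the boundary method, a link traversal produces no entry in the RPC
error log, which is the concern raised in 2(b).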

> Create symbolic links in HDFS
> -----------------------------
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch,
> symLink6.patch, symLink8.patch, symLink9.patch
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains
> a reference to another file or directory in the form of an absolute or relative path and that
> affects pathname resolution. Programs which read or write to files named by a symbolic link
> will behave as if operating directly on the target file. However, archiving utilities can
> handle symbolic links specially and manipulate them directly.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
