hadoop-common-dev mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4044) Create symbolic links in HDFS
Date Mon, 20 Oct 2008 21:56:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641174#action_12641174 ]
Sanjay Radia commented on HADOOP-4044:

> I would like to avoid a design that incurs the overhead of an additional RPC every time a link is traversed.

> +1. This will affect not only NNBench but all benchmarks, including and especially DFSIO. GridMix and Sort will probably be less affected, but will suffer too.

+1. I would also like to avoid the extra RPC, since avoiding one is straightforward.

Doug > What did you think about my suggestion above that we might use a cache to avoid this? First, we implement the naive approach, benchmark it, and, if it's too slow, optimize it with a pre-fetch cache of block locations.
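For readers following the thread, the cache Doug suggests might look something like the sketch below: a small client-side LRU map from path to block locations, filled opportunistically so a symlink traversal can avoid the second round trip to the NameNode on a hit. All names here are illustrative, not actual HDFS APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a client-side pre-fetch cache of block locations,
// keyed by path. A hit avoids the extra RPC to the NameNode; a miss falls
// back to the normal resolution path. Not real HDFS code.
class BlockLocationCache {
    private final int capacity;
    private final Map<String, String[]> cache;

    BlockLocationCache(int capacity) {
        this.capacity = capacity;
        // accessOrder=true gives LRU eviction via removeEldestEntry.
        this.cache = new LinkedHashMap<String, String[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String[]> e) {
                return size() > BlockLocationCache.this.capacity;
            }
        };
    }

    // Store locations pre-fetched alongside an earlier response.
    void put(String path, String[] locations) {
        cache.put(path, locations);
    }

    // Returns cached locations, or null on a miss.
    String[] get(String path) {
        return cache.get(path);
    }
}
```

The bound matters: an unbounded cache on a busy client would grow without limit, and staleness (locations changing after blocks move) is exactly the kind of secondary complexity Sanjay objects to below.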

Clearly your cache solution deals with the extra-RPC issue.
Generally, I see a cache as a way of improving the performance of an already good design or algorithm. I don't like using a cache as part of a design just to make an algorithm work when alternate good designs are available that don't need one. Would we have come up with this design if we hadn't had such an emotionally charged discussion on exceptions?

We have a good design in which, if resolution fails due to a symlink, we return this information to the caller. It does not require a cache.
We are divided over how to return this information: use the return status, or use an exception.
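The two alternatives under discussion can be sketched side by side. This is an illustrative toy, not the HDFS client API: a simulated namespace where opening a path that lands on a symlink either throws, or reports the link in the return value, and either way the client loops until resolution completes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting the two ways of reporting a symlink hit
// during path resolution. Class and method names are illustrative only.
class Resolver {
    static class UnresolvedLinkException extends Exception {
        final String target;
        UnresolvedLinkException(String target) {
            super("unresolved link -> " + target);
            this.target = target;
        }
    }

    // Simulated namespace: an entry is either a symlink ("link:<target>")
    // or plain file data.
    private final Map<String, String> namespace = new HashMap<>();

    void addFile(String path, String data) { namespace.put(path, data); }
    void addLink(String path, String target) { namespace.put(path, "link:" + target); }

    // Alternative 1: signal the symlink with an exception; the caller
    // catches it, reads the target, and retries against that path.
    String openThrowing(String path) throws UnresolvedLinkException {
        String entry = namespace.get(path);
        if (entry != null && entry.startsWith("link:")) {
            throw new UnresolvedLinkException(entry.substring(5));
        }
        return entry;
    }

    // Either style leads to the same client-side loop: follow links
    // until resolution lands on real data (or fails).
    String resolve(String path) {
        String entry = namespace.get(path);
        while (entry != null && entry.startsWith("link:")) {
            entry = namespace.get(entry.substring(5));
        }
        return entry;
    }
}
```

In both cases the server does one resolution step per RPC and the client drives the loop; the disagreement is purely about whether the "I hit a link" signal travels as a checked exception or as part of the return value.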

The cache solution is a way to avoid making a painful, emotionally charged decision for the Hadoop community.
I don't want to explain to Hadoop developers, again and again down the road, why we use the cache.
We should not avoid the decision, but make it.
A couple of weeks ago I was confident that a compromise vote would pass. I hope the same is true now.

> Create symbolic links in HDFS
> -----------------------------
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path, and that affects pathname resolution. Programs which read or write files named by a symbolic link behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
