hadoop-common-dev mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4044) Create symbolic links in HDFS
Date Mon, 20 Oct 2008 21:56:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641174#action_12641174 ]

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

> I would like to avoid a design that incurs the overhead of an additional RPC every time a link is traversed.

> +1. This will affect not only NNBench but all benchmarks, including DFSIO and especially NNThroughputBenchmark.
> GridMix and Sort will probably be less affected, but will suffer too.

+1. I would also like to avoid an extra RPC, since avoiding one is straightforward.

Doug > What did you think about my suggestion above that we might use a cache to avoid this? First, we implement the naive approach, benchmark it, and, if it's too slow, optimize it with a pre-fetch cache of block locations.

Clearly your cache solution deals with the extra-RPC issue.
Generally, I see a cache as a way of improving the performance of an already good design or algorithm. I don't like using a cache as part of a design to make an algorithm work when alternate good designs that don't need one are available. Would we have come up with this design if we hadn't had such an emotionally charged discussion about exceptions?
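For concreteness, the pre-fetch cache Doug describes might look roughly like the sketch below: a small client-side LRU cache of block locations keyed by path. This is a hedged illustration only; `BlockLocationCache` and its methods are hypothetical names, not HDFS API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: none of these names are actual HDFS classes.
// A small LRU cache of block locations keyed by file path, so that a
// lookup whose locations were pre-fetched can be served without a
// second trip to the namenode.
class BlockLocationCache {
    private final int capacity;
    private final Map<String, String> cache; // path -> block locations (serialized)

    BlockLocationCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true makes LinkedHashMap iterate in LRU order,
        // so evicting the eldest entry evicts the least recently used.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > BlockLocationCache.this.capacity;
            }
        };
    }

    String get(String path) {
        return cache.get(path);
    }

    void put(String path, String locations) {
        cache.put(path, locations);
    }
}
```

The catch, as argued above, is that such a cache papers over the extra RPC rather than removing it from the design.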

We have a good design: if resolution fails because of a symlink, we return this information to the caller. It does not require a cache.
We are divided over how to return this information - use the return status or use an exception.
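The exception variant of "return this information to the caller" might be sketched as below. This is a hypothetical illustration, not actual HDFS code: the server reports the symlink target instead of chasing it, and the client retries against the resolved path.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; these are not the actual HDFS classes.
class UnresolvedLinkException extends Exception {
    final String target; // where the symlink points
    UnresolvedLinkException(String target) { this.target = target; }
}

class MockNameNode {
    private final Map<String, String> links = new HashMap<>();
    private final Set<String> files = new HashSet<>();

    void addFile(String path) { files.add(path); }
    void addLink(String path, String target) { links.put(path, target); }

    // Server side: the lookup fails fast on a symlink, handing the target
    // back to the caller rather than resolving it internally.
    String open(String path) throws UnresolvedLinkException {
        if (links.containsKey(path)) {
            throw new UnresolvedLinkException(links.get(path));
        }
        if (files.contains(path)) {
            return "blocks-of:" + path;
        }
        throw new IllegalArgumentException("no such path: " + path);
    }
}

class SymlinkClient {
    // Client side: each catch corresponds to one extra round trip in a
    // real system, which is exactly the overhead being debated.
    static String openResolving(MockNameNode nn, String path) {
        while (true) {
            try {
                return nn.open(path);
            } catch (UnresolvedLinkException e) {
                path = e.target;
            }
        }
    }
}
```

The return-status variant would instead have `open` return a result object with an is-link flag and the target path; either way the same information crosses the wire, and only the mechanism differs.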

The cache solution is a way to avoid making a painfully emotionally charged decision for the Hadoop community.
I don't want to have to explain to Hadoop developers, again and again down the road, why we use the cache.
We should not avoid the decision, but make it.
A couple of weeks ago I was confident that a compromise vote would pass. I am hoping the same is true now.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains
a reference to another file or directory in the form of an absolute or relative path and that
affects pathname resolution. Programs which read or write to files named by a symbolic link
will behave as if operating directly on the target file. However, archiving utilities can
handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

