hadoop-hdfs-dev mailing list archives

From Daryn Sharp <da...@yahoo-inc.com>
Subject Re: relative symbolic links in HDFS
Date Mon, 31 Oct 2011 16:46:33 GMT
Universal support for FileContext & symlinks in all commands should be coming "soon".
A few jiras that removed complications were recently committed or are in the process of
being committed.  Copy commands will require some extra parameters to control whether
symlinks are dereferenced.

Daryn


On Oct 31, 2011, at 11:27 AM, Charles Baker wrote:

> Hey guys. Thanks for the replies. Fully qualified symbolic links are
> problematic in that when we wish to restore a directory structure containing
> symlinks from HDFS to local filesystem, the relativity is lost. For instance:
> 
> /user/cbaker/foo/
>                link1 -> ../../cbaker
> 
> The current behavior of getFileLinkStatus() results in the path for link1
> being:
> 
> /user/cbaker
> 
> Not:
> 
> ../../cbaker
> 
> 
> Also, some symlinks may point to non-existent locations within HDFS which
> only have relevance to the local filesystem. This appears as though it could
> (though I haven't tested yet) result in an exception when the attempt is made
> to qualify it. If I get a chance, I'll try it out later today.
> 
> FileContext.getLinkTarget() doesn't work for this case since it returns only
> the final component of the target, not the complete relative path. But even
> if it did return the relative path, it seems counter-intuitive to me. I agree
> with Daryn and expect the behavior of getFileLinkStatus() to return the
> symlink as is and not presume that I wanted it qualified. If I wanted a
> qualified path for a symlink, I would expect to call Path.makeQualified() to
> do so. 
> 
> Insofar as porting FsShell to FileContext, I've only modified it to support
> our use-case. I haven't gone to the extent of fully porting it to
> FileContext. Though I'd love to, unfortunately I'm too busy right now to
> contribute :(
> 
> Thanks!
> 
> -Chuck
> 
> 
> 
> -----Original Message-----
> From: Daryn Sharp [mailto:daryn@yahoo-inc.com] 
> Sent: Monday, October 31, 2011 7:46 AM
> To: hdfs-dev@hadoop.apache.org
> Subject: Re: relative symbolic links in HDFS
> 
> It's generally been a problem that filesystem operations mangle paths to be
> something other than what the user provided.  FsShell has to go to some
> (unnecessary, imho) lengths to independently track the user's given path so
> the output paths will match what the user provided.  Not displaying the
> user-given path makes it difficult/impossible for scripts to accurately parse
> the output for the results of an operation on the given paths.
> 
> I like getLinkTarget returning the exact target, but I'd also like a
> FileStatus to return the given path both in the case of a normal path and a
> symlink.  If the user needs a fully qualified path for an operation, my
> opinion is they should request it?
> 
> Daryn
> 
> 
> On Oct 29, 2011, at 9:02 PM, Eli Collins wrote:
> 
>> Hey Chuck,
>> 
>> Why is it problematic for your use that the symlink is stored in
>> FileStatus fully qualified - you'd like FileStatus#getSymlink to
>> return the same Path that you used as the target in createSymlink?
>> 
>> The current behavior is so getFileLinkStatus is consistent with
>> getFileStatus(new Path("/some/file")) which returns a fully qualified
>> path (eg hdfs://myhost:123/some/file).   Note that you can use
>> FileContext#getLinkTarget to return the path used when creating the
>> link. Some more background is in the design doc:
>> https://issues.apache.org/jira/secure/attachment/12434745/design-doc-v4.txt
>> 
>> There's a jira for porting FsShell to FileContext (HADOOP-6424), if
>> you have a patch (even partial) feel free to post it to the jira.
>> Note that since symlinks are not implemented in FileSystem, clients
>> that use FileSystem to access paths with symlinks will fail.
>> 
>> Btw when looking at the code you pointed out I noticed a bug in link
>> resolution (HADOOP-7783), thanks!
>> 
>> Thanks,
>> Eli
>> 
>> 
>> On Fri, Oct 28, 2011 at 9:46 AM, Charles Baker <cbaker@sdl.com> wrote:
>>> Hey guys. We are in the early stages of planning and evaluating a hadoop
>>> 'cold-storage' cluster for medium to long term storage of mixed data (small
>>> to large files, zips, tar, etc...) and tons of symlinks. We do realize that
>>> small files aren't ideal in HDFS but it's for long-term storage and beats the
>>> cost of more NetApps by potentially several hundred thousand dollars by
>>> leveraging existing equipment. We are already successfully using Hadoop and
>>> the MapReduce framework in a different project and have developed quite a bit
>>> of in-house expertise when it comes to Hadoop.
>>> 
>>> 
>>> 
>>> Since this use-case is preserving and restoring an arbitrary directory
>>> structure, I have been evaluating 0.21.0's support of symlinks and found that
>>> although it happily creates relative symlinks, the code that is called to
>>> retrieve the symlink 'FileContext.getFileLinkStatus()' always converts the
>>> relative Path object to an absolute one through the use of the
>>> qualifySymlinkTarget() method. Though I was easily able to work around this
>>> limitation by changing the one line of code that calls this function from:
>>> 
>>> 
>>> 
>>> fi.setSymlink(qualifySymlinkTarget(fs, p, fi.getSymlink()));
>>> 
>>> 
>>> 
>>> to:
>>> 
>>> 
>>> 
>>> fi.setSymlink(fi.getSymlink());
>>> 
>>> 
>>> 
>>> It has made us curious as to why the decision was made to always return the
>>> absolute path of a symlink in the first place. Is it that attempts to open
>>> targets to relative symlinks throw exceptions and it saves having the user do
>>> the work to construct the absolute path since that's the general use-case? Or
>>> does this workaround violate some internal assumptions of the code or ideas
>>> about how a URI should behave (even though relative paths are implicitly
>>> supported by URI object)? Any insight you guys can shed on this would be
>>> great. I've tested the above change by adding support for symlinks (into and
>>> out of HDFS) into FsShell.copyToLocal() and copyFromLocal() using a mixed bag
>>> of relative and absolute symlinks and symlinks->symlinks and have so far
>>> found no ill effects.
>>> 
>>> 
>>> 
>>> Thanks!
>>> 
>>> 
>>> 
>>> -Chuck
>>> 
>>> 
>>> 
> 

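The difference discussed in this thread between a link target as stored and its qualified form can be reproduced without Hadoop. The following is only a sketch using java.net.URI from the JDK; it is not the Hadoop code path. The class and method names are made up for illustration, and the host myhost:123 is borrowed from Eli's example. It assumes that qualifying a relative target amounts to resolving it against the link's own location, which is consistent with the /user/cbaker result Chuck reports for link1.

```java
import java.net.URI;

public class LinkTargetDemo {
    // Hypothetical helper: resolve a stored link target against the
    // location of the link itself (an assumption about what
    // "qualifying" means, not Hadoop's actual implementation).
    static URI qualify(URI linkLocation, String storedTarget) {
        return linkLocation.resolve(storedTarget);
    }

    // Hypothetical helper: hand back the target exactly as it was
    // given when the link was created.
    static String asStored(String storedTarget) {
        return storedTarget;
    }

    public static void main(String[] args) {
        URI link = URI.create("hdfs://myhost:123/user/cbaker/foo/link1");
        String target = "../../cbaker";
        System.out.println(qualify(link, target));  // hdfs://myhost:123/user/cbaker
        System.out.println(asStored(target));       // ../../cbaker
    }
}
```

Once the target has been resolved, the original "../../cbaker" cannot be recovered from the qualified form alone, which is why a restore to the local filesystem needs the target as stored.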

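Daryn's remark that copy commands will need a parameter controlling whether symlinks are dereferenced has a close analogue in the JDK's local-filesystem API. The sketch below uses java.nio.file only; it is not FsShell and says nothing about what the eventual Hadoop options will look like. The class name, the copy() helper and its dereference flag are hypothetical. It assumes a platform where the process is allowed to create symbolic links (typically any Unix-like system).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CopyLinkDemo {
    // Hypothetical helper: copy src to dst. With dereference=true the
    // link is followed and the target's contents are copied; with
    // dereference=false the symbolic link itself is copied.
    static void copy(Path src, Path dst, boolean dereference) throws IOException {
        if (dereference) {
            Files.copy(src, dst);
        } else {
            Files.copy(src, dst, LinkOption.NOFOLLOW_LINKS);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("linkdemo");
        Files.write(dir.resolve("data"), "hello".getBytes());
        // Relative link: link1 -> data
        Path link = Files.createSymbolicLink(dir.resolve("link1"), Paths.get("data"));

        copy(link, dir.resolve("copy-deref"), true);
        copy(link, dir.resolve("copy-link"), false);

        // The dereferenced copy is a regular file; the other copy is
        // still a link whose stored target can be read back.
        System.out.println(Files.isSymbolicLink(dir.resolve("copy-deref")));
        System.out.println(Files.readSymbolicLink(dir.resolve("copy-link")));
    }
}
```

Copying without following the link is the behavior the restore use case in this thread depends on, since it keeps the link's stored target rather than replacing the link with the file it points to.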