hadoop-hdfs-dev mailing list archives

From "Charles Baker" <cba...@sdl.com>
Subject RE: relative symbolic links in HDFS
Date Fri, 28 Oct 2011 16:56:03 GMT
Oh, and sorry about the signature. The mail server injects that.


-----Original Message-----
From: Charles Baker [mailto:cbaker@sdl.com] 
Sent: Friday, October 28, 2011 9:47 AM
To: hdfs-dev@hadoop.apache.org
Subject: relative symbolic links in HDFS

Hey guys. We are in the early stages of planning and evaluating a Hadoop
'cold-storage' cluster for medium- to long-term storage of mixed data (small
to large files, zips, tars, etc.) and tons of symlinks. We do realize that
small files aren't ideal in HDFS, but this is for long-term storage, and by
leveraging existing equipment it beats the cost of more NetApps by
potentially several hundred thousand dollars. We are already successfully
using Hadoop and the MapReduce framework in a different project and have
developed quite a bit of in-house expertise when it comes to Hadoop.


Since this use case is preserving and restoring an arbitrary directory
structure, I have been evaluating 0.21.0's support of symlinks and found that
although it happily creates relative symlinks, the code that is called to
retrieve the symlink, FileContext.getFileLinkStatus(), always converts the
relative Path object to an absolute one through the use of the
qualifySymlinkTarget() method. I was easily able to work around this
limitation by changing the one line of code that calls this function from:


fi.setSymlink(qualifySymlinkTarget(fs, p, fi.getSymlink()));






It has made us curious as to why the decision was made to always return the
absolute path of a symlink in the first place. Is it that attempts to open
the targets of relative symlinks throw exceptions, so qualifying the path
saves the user the work of constructing the absolute path, since that's the
general use case? Or does this workaround violate some internal assumptions
of the code or ideas about how a URI should behave (even though relative
paths are implicitly supported by the URI object)? Any insight you guys can
shed on this would be great. I've tested the above change by adding support
for symlinks (into and out of HDFS) to FsShell.copyToLocal() and
copyFromLocal() using a mixed bag of relative and absolute symlinks and
symlinks->symlinks, and have so far found no ill effects.
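
To make the relative-versus-absolute distinction concrete, here is a minimal
sketch using only java.net.URI. It is not the Hadoop code: the qualify()
helper, the namenode host and port, and the paths are all hypothetical. It
only illustrates the general idea of resolving a relative link target against
the directory that contains the link, and shows that the URI class accepts a
relative path as-is.

```java
import java.net.URI;

public class SymlinkTargetSketch {

    /**
     * Hypothetical helper (not a Hadoop API): qualifies a symlink target
     * against the directory that contains the link. Targets that already
     * carry a scheme or authority are returned unchanged.
     */
    static URI qualify(URI linkParentDir, URI target) {
        if (target.getScheme() != null || target.getAuthority() != null) {
            return target;
        }
        // Standard URI reference resolution: "../" segments are collapsed
        // relative to the parent directory's path.
        return linkParentDir.resolve(target);
    }

    public static void main(String[] args) {
        // Assumed example values; the trailing slash marks a directory.
        URI parent = URI.create("hdfs://namenode:8020/archive/projectA/links/");
        URI relativeTarget = URI.create("../data/report.zip");

        // A URI without a scheme is a valid, non-absolute URI.
        System.out.println(relativeTarget.isAbsolute()); // false

        // Qualifying it yields a fully specified location.
        System.out.println(qualify(parent, relativeTarget));
        // hdfs://namenode:8020/archive/projectA/data/report.zip
    }
}
```

Restoring a directory tree faithfully needs the first form (the target as it
was stored), while opening the target needs the second. That is why always
returning the qualified form gets in the way of this use case.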






