From: Eli Collins
To: hdfs-dev@hadoop.apache.org
Date: Mon, 31 Oct 2011 15:41:16 -0700
Subject: Re: relative symbolic links in HDFS
References: <88C1907F-C975-4261-9740-502088C648D1@yahoo-inc.com>

On Mon, Oct 31,
2011 at 2:19 PM, Charles Baker wrote:
> I did a hasty test initially of getLinkTarget() but forgot to also use the
> same for the input path to FileContext#createSymlink() so yeah, turns out it
> does indeed work. Sorry about that. Looks like I won't need to modify
> FileContext after all which is good :)
>
> The rationale of keeping things consistent so as not to break compatibility
> makes sense, it just isn't that intuitive coming at it from a 'fresh'
> perspective. Was the original idea to return the symlink information in
> getFileStatus() instead of having to access it via getFileLinkStatus()?
> Maybe it's naive but it seems like you could just rename getFileLinkStatus()
> to getFileStatus() and none would be the wiser...

You need both, getFileStatus is like stat(2) and getFileLinkStatus is
like lstat(2). getFileStatus resolves all symlinks in the path, ie you
want the FileStatus of the file that the path points to (regardless of
links), while getFileLinkStatus, if called on a symlink, will give you
the FileStatus of the link (not what it points to). Only applications
that are link-aware need to use getFileLinkStatus (otherwise links are
resolved transparently to the caller).

>
> Regardless, I do think it makes sense to have a convenience method to get the
> raw path that was supplied at symlink creation. The first thing I tried was
> Path#toString() so that I guess is pretty intuitive but I can't comment on
> whether that would break compatibility.
>

Would a method Path#getPathPart that returns just the path part of the
URI be sufficient? This would be similar to java's URI#getPath
(remember in Hadoop Path == URI) which just returns the path part of a
URI. (It's unfortunate that the Path class is named "Path" since now
we don't have a good name for just the path part).

Thanks,
Eli

> Thanks!
>
> -Chuck
>
>
> -----Original Message-----
> From: Eli Collins [mailto:eli@cloudera.com]
> Sent: Monday, October 31, 2011 11:45 AM
> To: hdfs-dev@hadoop.apache.org
> Subject: Re: relative symbolic links in HDFS
>
> On Mon, Oct 31, 2011 at 9:27 AM, Charles Baker wrote:
>> Hey guys. Thanks for the replies. Fully qualified symbolic links are
>> problematic in that when we wish to restore a directory structure containing
>> symlinks from HDFS to local filesystem, the relativity is lost. For instance:
>>
>> /user/cbaker/foo/
>>                link1 -> ../../cbaker
>>
>> The current behavior of getFileLinkStatus() results in a path for link 1
>> being:
>>
>> /user/cbaker
>>
>> Not:
>>
>> ../../cbaker
>>
>>
>> Also, some symlinks may point to non-existent locations within HDFS which
>> only have relevance to the local filesystem. This appears as though it could
>> (though I haven't tested yet) result in an exception when the attempt is made
>> to qualify it. If I get a chance, I'll try it out later today.
>>
>> FileContext.getLinkTarget() doesn't work for this case since it returns only
>> the final component of the target, not the complete relative path.
>
> Really?  FC#getLinkTarget should return the target verbatim, as
> specified by the user when creating the link:
>
> Eg see test testCreateLinkToDotDotPrefix:
>  fc.createSymlink(new Path("../file"), link, false);
>  ...
>  assertEquals(new Path("../file"), fc.getLinkTarget(link));
>
>
>> But even
>> if it did return the relative path, it seems counter-intuitive to me. I agree
>> with Daryn and expect the behavior of getFileLinkStatus() to return the
>> symlink as is and not presume that I wanted it qualified. If I wanted a
>> qualified path for a symlink, I would expect to call Path.makeQualified() to
>> do so.
>
> It does this because getFileStatus always returns fully qualified
> paths in HDFS, and we don't want to make callers check the type and
> care about the method that was used to obtain the FileStatus, eg to
> know whether it contains a fully qualified path or not.
>
> I think the original rationale for why FileStatus objects always
> have fully qualified paths is so they can be passed around w/o callers
> having to do further work to access them, ie we didn't want to
> disassociate the path from the file system it exists on. Note that in
> Hadoop "Paths" are actually URIs, vs file system paths (a subset of URIs).
>
> Regardless of the rationale, changing getFileStatus to return objects
> w/o fully qualified paths would break compatibility with a lot of
> existing programs. It would also hinder people porting to FileContext
> which tries to be consistent with FileSystem.
>
> Would a new method on FileStatus or Path that returns the unqualified
> version of the path (ie w/o the scheme and authority, and w/o
> resolving relative paths relative to the FileContext) work?  Ie the
> FileStatus could return the contents of the HdfsFileStatus w/o making
> it fully qualified.
>
> Thanks,
> Eli
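
For readers following the stat(2)/lstat(2) analogy and the URI#getPath
comparison in this thread, here is a minimal sketch using only the JDK, not
the Hadoop API. It assumes a POSIX-like local file system where symlinks can
be created. The class name SymlinkStatDemo, its helper methods, and the
"hdfs://nn:8020" authority are hypothetical and exist only for illustration;
note also that Path here is java.nio.file.Path, not Hadoop's Path class.
Reading attributes without NOFOLLOW_LINKS plays the role of getFileStatus
(links resolved), reading them with NOFOLLOW_LINKS plays the role of
getFileLinkStatus, and readSymbolicLink returns the target as it was given at
creation, roughly what the thread expects from getLinkTarget.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical demo class: a JDK-only analogue, not Hadoop code.
class SymlinkStatDemo {

    // Path part of a URI, without scheme or authority (java.net.URI#getPath).
    static String pathPart(String uri) {
        return URI.create(uri).getPath();
    }

    // Creates <tmp>/file and <tmp>/sub/link -> ../file, and returns the link.
    static Path makeRelativeLink() {
        try {
            Path dir = Files.createTempDirectory("symlink-demo");
            Files.createFile(dir.resolve("file"));
            Path sub = Files.createDirectory(dir.resolve("sub"));
            return Files.createSymbolicLink(sub.resolve("link"), Paths.get("../file"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // stat(2)-like: links are followed, so the attributes describe the target.
    static boolean isLinkFollowing(Path p) {
        try {
            return Files.readAttributes(p, BasicFileAttributes.class).isSymbolicLink();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // lstat(2)-like: the final link is not followed, so the attributes
    // describe the link itself.
    static boolean isLinkNoFollow(Path p) {
        try {
            return Files.readAttributes(p, BasicFileAttributes.class,
                    LinkOption.NOFOLLOW_LINKS).isSymbolicLink();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The link target exactly as stored, with no qualification or resolution.
    static String rawTarget(Path link) {
        try {
            return Files.readSymbolicLink(link).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path link = makeRelativeLink();
        System.out.println("follow links, is symlink:   " + isLinkFollowing(link));
        System.out.println("no-follow, is symlink:      " + isLinkNoFollow(link));
        System.out.println("raw target:                 " + rawTarget(link));
        System.out.println("path part of qualified URI: "
                + pathPart("hdfs://nn:8020/user/cbaker/foo"));
    }
}
```

On a typical Linux machine the follow-links check reports false and the
no-follow check reports true for the same path, which is the distinction
between the two FileStatus calls discussed above. Whether a Hadoop
Path#getPathPart would behave like pathPart here is an open question in the
thread, not something this sketch settles.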