hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Colin McCabe <cmcc...@alumni.cmu.edu>
Subject Re: symlink support in Hadoop 2 GA
Date Thu, 19 Sep 2013 17:40:22 GMT
What we're trying to get to here is a consensus on whether
FileSystem#listStatus and FileSystem#globStatus should return symlinks
__as_symlinks__.  If 2.1-beta goes out with these semantics, I think
we are not going to be able to change them later.  That is what will
happen in the "do nothing" scenario.

Also see Jason Lowe's comment here:


On Wed, Sep 18, 2013 at 5:11 PM, J. Rottinghuis <jrottinghuis@gmail.com> wrote:
> However painful protobuf version changes are at build time for Hadoop
> developers, at runtime with multiple clusters and many Hadoop users this is
> a total nightmare.
> Even upgrading clusters from one protobuf version to the next is going to
> be very difficult. The same users will run jobs on, and/or read&write to
> multiple clusters. That means that they will have to fork their code, run
> multiple instances? Or in the very least they have to do an update to their
> applications. All in sync with Hadoop cluster changes. And these are not
> doable in a rolling fashion.
> All Hadoop and HBase clusters will all upgrade at the same time, or we'll
> have to have our users fork / roll multiple versions ?
> My point is that these things are much harder than just fix the (Jenkins)
> build and we're done. These changes are massively disruptive.
> There is a similar situation with symlinks. Having an API that lets users
> create symlinks is very problematic. Some users create symlinks and as Eli
> pointed out, somebody else (or automated process) tries to copy to / from
> another (Hadoop 1.x?) cluster over hftp. What will happen ?
> Having an API that people should not use is also a nightmare. We
> experienced this with append. For a while it was there, but users were "not
> allowed to use it" (or else there were large #'s of corrupt blocks). If
> there is an API to create a symlink, then some of our users are going to
> use it and others are going to trip over those symlinks. We already know
> that Pig does not work with symlinks yet, and as Steve pointed out, there
> is tons of other code out there that assumes that !isDir() means isFile().
> I like symlink functionality, but in our migration to Hadoop 2.x this is a
> total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
> a) Not uprev until symlink support is figured out up and down the stack,
> and we've been able to migrate all our 1.x (equivalent) clusters to 2.x
> (equivalent). Or
> b) rip out the API altogether. Or
> c) change the implementation to throw an UnsupportedOperationException
> I'm not sure yet which of these I like least.
> Thanks,
> Joep
> On Wed, Sep 18, 2013 at 9:48 AM, Arun C Murthy <acm@hortonworks.com> wrote:
>> On Sep 16, 2013, at 6:49 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>> > Hi all,
>> >
>> > I wanted to broadcast plans for putting the FileSystem symlinks work
>> > (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I
>> think
>> > it's pretty important we get it in since it's not a compatible change; if
>> > it misses the GA train, we're not going to have symlinks until the next
>> > major release.
>> Just catching up, is this an incompatible change, or not? The above reads
>> 'not an incompatible change'.
>> Arun
>> >
>> > However, we're still dealing with ongoing issues revealed via testing.
>> > There's user-code out there that only handles files and directories and
>> > will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
>> > for a nice example where globStatus returning symlinks broke Pig; some of
>> > us had a conference call to talk it through, and one definite conclusion
>> > was that this wasn't solvable in a generally compatible manner.
>> >
>> > There are also still some gaps in symlink support right now. For example,
>> > the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
>> > resolution, and tooling like the FsShell and Distcp still need to be
>> > updated as well.
>> >
>> > So, there's definitely work to be done, but there are a lot of users
>> > interested in the feature, and symlinks really should be in GA. Would
>> > appreciate any thoughts/input on the matter.
>> >
>> > Thanks,
>> > Andrew
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>> --
>> NOTICE: This message is intended for the use of the individual or entity to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.

View raw message