hadoop-hdfs-dev mailing list archives

From Andrew Wang <andrew.w...@cloudera.com>
Subject Re: symlink support in Hadoop 2 GA
Date Wed, 18 Sep 2013 19:13:35 GMT
It's an incompatible change. Existing APIs like listStatus and globStatus
need to be symlink aware now, which can break assumptions of user code.
We've had FileStatus#isSymlink() since the early days, but lots of user
code hasn't been updated to use it.
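
To make the kind of broken assumption concrete, here is a minimal sketch. It assumes a listing call can return an unresolved link entry, which is exactly the behavior under discussion. Entry is a hypothetical stand-in for FileStatus, reduced to the three type predicates so the example runs without Hadoop on the classpath; the paths and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.fs.FileStatus (an assumption,
// not the real class): only the three type predicates are modeled.
final class Entry {
    enum Kind { FILE, DIRECTORY, SYMLINK }

    private final String path;
    private final Kind kind;

    Entry(String path, Kind kind) {
        this.path = path;
        this.kind = kind;
    }

    String getPath() { return path; }
    boolean isFile() { return kind == Kind.FILE; }
    boolean isDirectory() { return kind == Kind.DIRECTORY; }
    boolean isSymlink() { return kind == Kind.SYMLINK; }
}

final class SymlinkAwareListing {
    // Legacy pattern: anything that is not a directory is treated as a file,
    // so an unresolved symlink is silently handled as a regular file.
    static List<String> filesLegacy(List<Entry> listing) {
        List<String> out = new ArrayList<>();
        for (Entry e : listing) {
            if (!e.isDirectory()) {
                out.add(e.getPath());
            }
        }
        return out;
    }

    // Symlink-aware pattern: links are skipped here so the caller can decide
    // explicitly whether and how to resolve them.
    static List<String> filesSymlinkAware(List<Entry> listing) {
        List<String> out = new ArrayList<>();
        for (Entry e : listing) {
            if (e.isSymlink()) {
                continue;
            }
            if (e.isFile()) {
                out.add(e.getPath());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> listing = Arrays.asList(
            new Entry("/data/a", Entry.Kind.FILE),
            new Entry("/data/link", Entry.Kind.SYMLINK),
            new Entry("/data/dir", Entry.Kind.DIRECTORY));
        System.out.println(filesLegacy(listing));       // [/data/a, /data/link]
        System.out.println(filesSymlinkAware(listing)); // [/data/a]
    }
}
```

The legacy loop is the pattern I'd expect to find in a lot of existing user code: it compiles and runs unchanged, and only misbehaves once a link shows up in a listing.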

I think Eli's earlier email did a good job of laying out the current state
and our options. I didn't realize this before, but most of HADOOP-8040 is
already in branch-2.1-beta, while many of the subsequent changes are not
(e.g. HADOOP-9417, HADOOP-9817, HADOOP-9652). This means the current state
of symlink support in branch-2.1-beta is half-baked, which is why "do
nothing" is not a good option.

With that in mind, perhaps Eli's proposals (abbreviated here) make more
sense:

1) Delay 2.2 GA and put in some more effort to fix API issues like
HADOOP-9912 / HADOOP-9972. Undoubtedly, more issues will still fall out of
this post-GA, but we can do our best to fix these issues compatibly in 2.3.
2) Revert symlinks from branch-2.1-beta and leave it all for 2.3, but that
makes 2.3 a pretty big jump from GA. Since symlinks have already appeared
in the 2.1.0 release, it'd also technically make 2.2 a regression from
2.1.0.
3) Wait for 3.0, which I don't think anyone wants.




On Wed, Sep 18, 2013 at 10:05 AM, Steve Loughran <stevel@hortonworks.com> wrote:

> the main change is whatever APIs are going to be provided (and implicitly:
> supported for a long time) to handle symlinks separately from directories
>
>
> On 18 September 2013 17:24, Eli Collins <eli@cloudera.com> wrote:
>
> > On Wed, Sep 18, 2013 at 5:45 AM, Steve Loughran <stevel@hortonworks.com> wrote:
> >
> > > On 18 September 2013 12:53, Alejandro Abdelnur <tucu@cloudera.com> wrote:
> > >
> > > > On Wed, Sep 18, 2013 at 11:29 AM, Steve Loughran <stevel@hortonworks.com> wrote:
> > > >
> > > > > > I'm reluctant for this as while delaying the release, because we are
> > > > > > going to find problems all the way up the stack -which will require a
> > > > > > choreographed set of changes. Given the grief of the protobuf update, I
> > > > > > don't want to go near that just before the final release.
> > > > >
> > > >
> > > > > Well, I would use the exact same argument used for protobuf (whose only
> > > > > complication was getting protoc 2.5.0 onto the Jenkins boxes and telling
> > > > > developers to do the same; other than that we didn't hit any other issue
> > > > > AFAIK) ...
> > > >
> > >
> > > > protobuf was traumatic at build time, as I recall, because it was neither
> > > > forwards nor backwards compatible. Those of us trying to build different
> > > > branches had to choose which version to have on the path, or set up scripts
> > > > to do the switching. HBase needed rebuilding, so did other things. And I
> > > > still have the pain of downloading and installing protoc on all Linux VMs I
> > > > build up going forward, until apt-get and yum have protoc 2.5 artifacts.
> > >
> > > > This means it was very painful for developers and added a lot of
> > > > late-breaking pain, but it had one key feature that gave it an edge: it
> > > > was immediately obvious where you had a problem, as things didn't compile
> > > > or classload without linkage problems. No latent bugs, unless protobuf 2.5
> > > > has them internally -for which we have to rely on Google's release testing
> > > > to have found.
> > >
> > > > That is a lot simpler to regression test than adding any new feature to
> > > > HDFS and seeing what breaks -as that is something that only surfaces out in
> > > > the field. Which is why I think it's too late in the 2.1 release timetable
> > > > to add symlinks. We've had a 2.1-beta out there, we've got feedback. Fix
> > > > those problems that are show stoppers, but don't add more stuff. Which is
> > > > precisely why I have not been pushing in any of my recent changes. I may
> > > > seem ruthless arguing against symlinks -but I'm not being inconsistent with
> > > > my own commit history. The only two things I've put in branch-2.1 since
> > > > beta-1 were a separate log for the Configuration deprecation warnings and a
> > > > patch to the POM for a java7 build on OSX: and they weren't even my
> > > > patches.
> > >
> > >
> > > -Steve
> > >
> > > > (One of these days I should volunteer to be the release manager and it'll
> > > > be obvious that Arun is being quite amenable to all the other developers)
> > >
> > >
> > >
> > > >
> > > > > IMO, it makes more sense to do this change during the beta rather than at
> > > > > GA. That gives us more flexibility to iron out things if necessary.
> > > >
> > > >
> > > > I'm arguing this change can go into the beta of the successor to 2.1 -not
> > > > GA.
> > >
> > >
> > What does "this change" refer to?  Symlinks are already in 2.1, and the
> > existing semantics create problems for programs (e.g. see the Pig
> > example in HADOOP-9912) that we need to resolve.  I don't think "do
> > nothing" is an option for 2.2 GA.
> > Thanks,
> > Eli
> >
> > > --
> > > CONFIDENTIALITY NOTICE
> > > NOTICE: This message is intended for the use of the individual or entity to
> > > which it is addressed and may contain information that is confidential,
> > > privileged and exempt from disclosure under applicable law. If the reader
> > > of this message is not the intended recipient, you are hereby notified that
> > > any printing, copying, dissemination, distribution, disclosure or
> > > forwarding of this communication is strictly prohibited. If you have
> > > received this communication in error, please contact the sender immediately
> > > and delete it from your system. Thank You.
> > >
> >
>
>
>
> --
> Steve Loughran
> Hortonworks Inc
> stevel@hortonworks.com
> skype: steve_loughran
> tel: +1 408 400 3721
>
