hadoop-hdfs-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: symlink support in Hadoop 2 GA
Date Wed, 18 Sep 2013 09:29:28 GMT
On 17 September 2013 23:05, Eli Collins <eli@cloudera.com> wrote:

> (Looping in Arun since this impacts 2.x releases)
> I updated the versions on HADOOP-8040 and sub-tasks to reflect where
> the changes have landed. All of these changes (modulo HADOOP-9417)
> were merged to branch-2.1 and are in the 2.1.0 release.
> While symlinks are in 2.1.0 I don't think we can really claim they're
> ready until issues like HADOOP-9912 are resolved, and they are
> supported in the shell, distcp and WebHDFS/HttpFS/Hftp (these are not
> esoteric!).  Someone can create a symlink with FileSystem causing
> someone else's distcp job to fail. Unlikely given they're not exposed
> outside the Java API but still not great.   Ideally this work would
> have been done on a feature branch and then merged when complete, but
> that's water under the bridge.
> I see the following options:
> 1. Fixup the current symlink support so that symlinks are ready for
> 2.2 (GA), or at least the public APIs. This means the APIs will be in
> GA from the get go so while the functionality might not be fully baked we
> don't have to worry about incompatible changes like FileStatus#isDir()
> changing behavior in 2.3 or a later update.  The downside is this will
> take at least a couple weeks (to resolve HADOOP-9912 and potentially
> implement the remaining pieces) and so may impact the 2.2 release
> timing. This option means 2.2 won't remove the new APIs introduced in
> 2.1.  We'd want to spin a 2.1.2 beta with the new API changes so we
> don't introduce new APIs in the beta to GA transition.

I'm reluctant to do this. As well as delaying the release, we are going
to find problems all the way up the stack, which will require a
choreographed set of changes. Given the grief of the protobuf update, I
don't want to go near that just before the final release.

We already have lots of 1.x-era code that assumes !isDir() == isFile(); I
know that from spending lots of time in the FS specification layer. That's
something which is going to break with symlinks, irrespective of when the
feature is rolled out.
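
To illustrate the assumption above, here is a minimal sketch of how a
two-way "directory or file" check misclassifies a symlink. StubStatus is a
hypothetical stand-in written for this example, not Hadoop's FileStatus; it
assumes a model in which a symlink is neither a file nor a directory.

```java
/** Hypothetical stand-in for a file status record; not Hadoop's FileStatus. */
final class StubStatus {
    final String path;
    final boolean dir;
    final boolean symlink;

    StubStatus(String path, boolean dir, boolean symlink) {
        this.path = path;
        this.dir = dir;
        this.symlink = symlink;
    }

    boolean isDirectory() { return dir; }
    boolean isSymlink()   { return symlink; }
    // Assumption of this model: a symlink is neither a file nor a directory.
    boolean isFile()      { return !dir && !symlink; }
}

final class SymlinkAwareClassifier {
    /** Older-style assumption: anything that is not a directory is a file. */
    static String legacyClassify(StubStatus s) {
        return s.isDirectory() ? "dir" : "file";
    }

    /** Three-way check that handles symlinks as their own case. */
    static String classify(StubStatus s) {
        if (s.isSymlink()) {
            return "symlink";
        }
        return s.isDirectory() ? "dir" : "file";
    }

    public static void main(String[] args) {
        StubStatus link = new StubStatus("/data/current", false, true);
        // The legacy check reports a file even though isFile() is false here.
        System.out.println("legacy=" + legacyClassify(link)
            + " aware=" + classify(link)
            + " isFile=" + link.isFile());
    }
}
```

Any caller that branches only on "is it a directory?" takes the file path
for a symlink, which is the breakage described above.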

The other thing we have to do is push back the API changes into 1.x, at
least at the FileSystem interface layer, so that code which uses
isDirectory, isSymlink, etc. does not need to be edited to compile & run
against both versions. I know Chris Nauroth has been doing this, but I think
we need to make sure it is all there. This will let things like Pig compile
against all versions with symlink-ready code.
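
For contrast, one workaround when the newer methods are not present in
every version is to probe for them reflectively. The sketch below shows the
idea with two hypothetical stand-in classes (OldStatus, NewStatus) invented
for this example; it is not taken from Hadoop or Pig, and it is exactly the
kind of shim a backported API would make unnecessary.

```java
import java.lang.reflect.Method;

final class SymlinkProbe {
    /** Hypothetical older status class with no symlink support. */
    static final class OldStatus {
        public boolean isDir() { return false; }
    }

    /** Hypothetical newer status class that knows about symlinks. */
    static final class NewStatus {
        public boolean isDirectory() { return false; }
        public boolean isSymlink() { return true; }
    }

    /**
     * Calls isSymlink() if the runtime class declares it; otherwise
     * reports false, since an API without symlinks cannot return one.
     */
    static boolean isSymlinkIfSupported(Object status) {
        try {
            Method m = status.getClass().getMethod("isSymlink");
            return (Boolean) m.invoke(status);
        } catch (NoSuchMethodException e) {
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("isSymlink() could not be invoked", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isSymlinkIfSupported(new OldStatus()));
        System.out.println(isSymlinkIfSupported(new NewStatus()));
    }
}
```

With the same method names available on both lines, callers can invoke
isSymlink() directly and drop the reflection.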

The other issue is that it increases the pressure to get other features
in there: "hey, we've got 2 more weeks! let's add X!" (where for me,
X := {HADOOP-8545, some restrictions on valid names of app types & instance
names for YARN, ...}).

My vote then: freeze and ship. We're happy with the wire formats, the API
has added knowledge of symlinks, and FileSystem features can evolve
afterwards, with the layers above handling the changes.

> 2. Revert symlinks from branch-2.1-beta and branch-2. Finish up the
> work in trunk (or a feature branch) and merge for a subsequent 2.x
> update.  While this helps get us to GA faster it would be preferable
> to get an API change like this in for 2.2 GA since they may be
> disruptive to introduce in an update (eg see example in #1). And of
> course our users would like symlinks functionality in the GA release.
> This option would mean 2.2 is incompatible with 2.1 because it's
> dropping the new APIs, not ideal for a beta to GA transition.

Why not just ship as is, with a note "symlinks not live yet, leave alone"?
That's what's been in the betas to date.

> 3. Revert and punt symlinks to 3.x.  IMO should be the last resort.

I'd prefer it in 2.3, which is where I'm targeting all my feature creep.

IMO 2.1 is frozen except for bug fixes.
