hadoop-general mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: [DISCUSS] secure 0.20-based branch
Date Tue, 27 Apr 2010 21:15:52 GMT

On Apr 23, 2010, at 10:38 PM, Chris Douglas wrote:

>> I'm not proposing any more back- or forward-porting than will be done anyway.
> Because trunk is the shared repository that contains the security
> work. And a working append. And dozens of smaller, but important
> features including the 1.0 APIs. Symlinks. Optimizations to the
> shuffle. Splittable bzip compression. Stability and scalability fixes
> to the NameNode and JobTracker. Unicorns and happiness.

I'm for anything that gets all the goodies above out in a release.  I don't care if they all
get in one release or if they're spread out over 2 or 3.
Right now, about 1/4 of the above (e.g. happiness, but no unicorns) is in CDH2/3.  Trunk has
stalled; getting new -- CORE -- features requires using other branches.

Although I would like to see the changes that these other branches have in Apache's SVN, they
belong in trunk.  0.20 is old already.  It's the old, stable branch now, and new stuff should
go into newer releases.  I've been waiting for things like the Shuffle refactor (30% performance
improvement for some of my job flows) for a long time.

Just because Y! is not going to upgrade their deployment past their branch for a long time
does not mean the rest of the community has to wait.  I lived on 0.19.2 in production until
very recently -- it became a solid branch without Y! or Facebook.  Without the same testing
muscle, it might take one or two more minor releases to stabilize, but the community's release
schedule IMO desperately needs to become more independent of the biggest players.  

Trunk should be moved forward and incorporate Cloudera's and Yahoo's improvements aggressively.
It's OK to have a 0.x.0 release that isn't completely stable yet, or backed by the biggest
users.   It is important to incorporate improvements made by productive contributors into
actual releases in a timely fashion, or else those contributors will roll their own versions
and eventually diverge significantly from the community rather than wait to get value from
their work.

> Stabilizing, packaging, and testing trunk is drudgery, but it can be shared.
> I can see the value in restarting collaboration between major
> contributors by reestablishing a common branch, and 0.20 will probably
> be more successful in that respect, at least earlier. However, I
> continue to oppose sinking combined energy into 0.20 at the expense of
> trunk, for reasons already discussed at length. -C

I would love to see an Apache release with new, useful features and enhancements.  That could
be a 0.20 with all or most of the Y! and Cloudera stuff in there.  However, if any such effort
slows down progress on trunk -- forget it.  Get a 0.21 or 0.22 out with whatever features
are ready, and move the ball forward on trunk.   We should not encourage 0.20 to live forever.

0.21 and 0.22 should be releases that are compelling enough for Y!, Cloudera, and anyone else
with their own customizations to want to move to for their own sake.

>> Doug
