hadoop-general mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Streamlining the Hadoop release process
Date Thu, 25 Apr 2013 15:28:50 GMT
I like the idea.  +1

I would just make a few tweaks to it, but I am not married to the tweaks
and would be happy to adopt the proposal as is. I would prefer a trunk
release every quarter.  I am fine if a release manager wants to release
more often, but I think a regular release cadence makes it simpler for
downstream projects and customers to plan for testing and integrating with
these new releases.

I would also like to have a formal process for adopting long-term
stabilization branches and for retiring them. The problem with branch-0.23
is that it really is just being stabilized by one group.  This in and of
itself isn't bad, but if the majority of the community is on the same base
it will stabilize more quickly, and downstream projects/customers can know
that for the next N months this branch will be available and receive
bug/security fixes.  At the end of N months the community can decide to
continue supporting the branch for another period of time or to stop.
That way customers who adopt a branch can be confident about how long it
will be around.

Also I would not really want 0.23.7 to be marked as the latest stable
release, unless
1) the community decides that we will maintain it going forward for at
least 6 to 12 more months,
2) we have a nice long explanation about why 0.23.X is actually a higher
release than 1.X is, and
3) 0.23 gets the changes to the YARN APIs backported to it.
The 1.X line has support from several organizations to maintain it for at
least the next year or so and there is no need to confuse people about
version numbering.  (Putting on my Yahoo! hat now) we at Yahoo! are still
trying to figure out the exact time frame as to when and how we can move
off of 0.23 and on to 2.0. I don't know whether we are going to support
0.23 beyond critical bug fixes, nor really for how long we would be doing
that.  It will probably be for the next 6 months or so, but I really don't
know.  It depends on how fast 2.0 becomes stable.


On 4/24/13 10:17 PM, "Konstantin Shvachko" <shv.hadoop@gmail.com> wrote:

> Hi everybody,
>There have been, and still are, a number of discussions about Hadoop
>version compatibility, feature porting, stability. I think that many
>problems of Hadoop are the result of our flawed release processes and can
>be solved by streamlining the releases.
>It is a fact that current trunk turned into a junk-yard of partly
>implemented ideas at different stages of the development. I am saying this
>not because trunk should not evolve, but mostly because there are no plans
>on the horizon to release anything from trunk and therefore nobody cares.
>Thus now, in order for a feature to make it into a release, it has to be
>ported into an earlier branch. And that becomes a controversy because
>a) most major features contain incompatible changes, and
>b) they introduce massive changes, which break reliability
>This destabilizes the release, canceling former efforts to fix bugs and
>provide a working environment for the upstream projects. I mean stabilizing
>and adding features are mutually exclusive activities. This is in part why
>the Hadoop 2 stabilization effort is perpetual.
>The solution imho is to release instead of backporting. That is, produce a
>new release for every new major feature, or every few. Say, snapshots would
>be a new release, Windows support another, InodeIDs or local reads
>optimization would have been the next, etc. As a community we should
>release even if that particular release is not planned to be stable, which
>we do anyway.
>Stabilization requires meticulous work and a rather stable code base. This
>was done for Hadoop 1 and Hadoop 0.23 as the most recent examples.
>In fact we cannot predict in advance which branch will become stable until
>it is somewhere in production, which is beyond the scope of control of
>My practical suggestions are:
>1. Produce a series of feature releases to catch up branch-2 with trunk.
>   We can prioritize features in general or let the release manager
>decide which feature to pick up from trunk.
>   Version numbering is also up for discussion. I would call them 2.x and
>reserve the minor numbers for subsequent stabilization bug-fix-releases.
>2. Build new features in dev-branches until they are done. We do it now,
>but should enforce more.
>3. After branch-2 is caught up with trunk, release from trunk by merging a
>few new dev-branches. Releasing with 2-3 new features should be the golden
>   I would call these releases 3.x
>4. Disallow backporting features that have not been released from trunk.
>   This is VERY important for forward going release process.
>5. I'd like to propose to move the latest stable version from 1.0.X to
>   I think that running a release in production for 9 months on 40K nodes
>is consistent with the definition of stable.
>Some ideas for discussion.
>P.S. This is not to preempt discussion on stabilizing 2.0.5 started by
>Roman. I am just not sure why we call it stabilization and 2.0.5 and
>if incompatible and new features has already been committed to the branch.
>BTW as an illustration to my observations above.
