hadoop-general mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Streamlining the Hadoop release process
Date Mon, 29 Apr 2013 18:53:45 GMT
Sorry I have not responded sooner.  I feel like I have had nothing but
meetings for the past two weeks or so :) My responses are inline below.

On 4/25/13 5:06 PM, "Konstantin Shvachko" <shv.hadoop@gmail.com> wrote:

>Bobby,
>
>Thanks for sharing your opinion.
>
>On Thu, Apr 25, 2013 at 8:28 AM, Robert Evans <evans@yahoo-inc.com> wrote:
>>
>> I like the idea.  +1
>>
>> I would just make a few tweaks to it, but I am not married to the tweaks
>> and would be happy to adopt the proposal as is. I would prefer a trunk
>> release every quarter.  I am fine if a release manager wants to release
>> more often, but I think a regular release cadence makes it simpler for
>> downstream projects and customers to plan for testing and integrating
>> with these new releases.
>
>Well put.
>
>> I would also like to have a formal process for adopting long term
>> stabilization branches and for retiring them.
>
>What do you mean by that?
>Branches are adopted if a group of people decides to use and support
>one. Just the way it happened with branch 0.23, but did not with
>0.21 or 0.22. The latter naturally retired, not to say it was a lost
>cause.
>I don't think there is a way to tell people to use or not to use one
>branch or another.
>So the purpose of the community as I see it is to create the ground
>for stabilizing the releases we produce.

I agree that we cannot say "you are not allowed to use branch-X", even for
very old releases.  It is just nice if the community can agree to some
degree about which branches are going to receive bug fixes/support long
term, and which branches are not.  It would be very nice if we could
reduce the number of branches being used, but I would at least like some
sort of minor commitment from the community that serious bugs on branches
X and Y ... are going to be fixed.  Maybe we don't have to worry about it
because the distribution vendors will provide their own long term support.

>
>> The problem with branch-0.23
>> is that it really is just being stabilized by one group.  This in and of
>> itself isn't bad but if the majority of the community is on the same
>> base it will stabilize more quickly, and downstream projects/customers
>> can know that for the next N-months this branch will be available and
>> receive bug/security fixes.  At the end of N-months the community can
>> decide to continue supporting the branch for another period of time or
>> to stop. That way customers can have confidence that when they adopt a
>> branch they know how long it will be around for.
>
>True. Good or bad this is how the community works.
>Anybody can build and propose a release or a change.
>Just as anybody can drop a feature in the middle of implementation and
>never come back. Don't even need a vote on that.
>That is why we should develop them in branches and release upon
>completion.
>So that our customers could be at least sure that the branch they picked
>up is not going to change and they can contribute to its stabilization as
>they go.
>
>It is really great that the Yahoo team (with you as RM) yet again helped
>Hadoop move forward. Not talking about 0.23 only here.


OK, to clarify: I am -0 on marking 0.23.7 stable. This is my
personal opinion (no Yahoo! hat involved).

>
>> Also I would not really want 0.23.7 to be marked as the latest stable
>> release, unless
>
>Not sure I understood. Are you saying 0.23.7 could be marked stable
>under three conditions? What are they?
>
>> 1) the community decides that we will maintain it going forward for at
>> least 6 to 12 more months
>
>Are you asking for help or are you concerned that 0.23 branch will
>somehow disappear? Don't think the latter is possible.
>Look at branch 1, previously known as 0.20. It's been there for
>many years.

This depends on how quickly Yahoo will be off of branch-0.23.  If Yahoo
is the only one maintaining it, then after that it will not receive any
bug fixes.  Does that make branch-0.23 any less stable? No.  Does it make
it less desirable for people to move to? Yes.

>
>> 2) we have a nice long explanation about why 0.23.X is actually a higher
>> release than 1.X is and
>
>I know people are confused about version numbers.
>
>> 3) 0.23 gets the changes to the YARN APIs back ported to it.
>
>Interesting, so you want your customers to keep updating their
>applications to adapt to new APIs? Usually it is the other way around.

We stabilized 0.23 with the primary goal of stabilizing MapReduce
running on YARN.  I don't have a lot of visibility into other applications
that are using YARN.  I don't really want YARN applications to need to put
in hacks like pig and hive have to do in order to run on different
versions.  But if others feel strongly about this I am fine with relenting.

>
>> The 1.X line has support from several organizations to maintain it for
>> at least the next year or so and there is no need to confuse people about
>> version numbering.  (Putting on my Yahoo! hat now) we at Yahoo! are
>> still trying to figure out the exact time frame as to when and how we can
>> move off of 0.23 and on to 2.0. I don't know if we are going to be
>> supporting 0.23 beyond critical bug fixes nor really for how long we
>> would be doing that.  It will probably be for the next 6 months or so,
>> but I really don't know.  It depends on how fast 2.0 becomes stable.
>> --Bobby
>
>I hope that moving 0.23 to stable will accelerate stabilization of
>branch 2.
>
>Thanks,
>Konstantin
>
>>
>> On 4/24/13 10:17 PM, "Konstantin Shvachko" <shv.hadoop@gmail.com> wrote:
>>
>> > Hi everybody,
>> >
>> >There have been, and still are, a number of discussions about Hadoop
>> >version compatibility, feature porting, and stability. I think that many
>> >problems of Hadoop are the result of our flawed release processes and can
>> >be solved by streamlining the releases.
>> >
>> >It is a fact that current trunk has turned into a junk-yard of partly
>> >implemented ideas at different stages of development. I am saying this
>> >not because trunk should not evolve, but mostly because there are no
>> >plans on the horizon to release anything from trunk and therefore nobody
>> >cares.
>> >
>> >Thus now in order for A feature to make it into A release the former
>> >should be ported into an earlier branch. And that becomes a controversy
>> >because
>> >
>> >a) most major features contain incompatible changes, and
>> >b) they introduce massive changes, which break reliability
>> >
>> >This destabilizes the release, canceling former efforts to fix bugs and
>> >provide a working environment for the upstream projects. I mean
>> >stabilizing and adding features are mutually exclusive activities. This
>> >is in part why the Hadoop 2 stabilization effort is perpetual.
>> >
>> >The solution imho is to release instead of backporting. That is,
>> >produce a new release for every one or a few new major features. Say,
>> >snapshots would be a new release, Windows support another, InodeIDs or
>> >local reads optimization would have been the next, etc. As the community
>> >we should release even if that particular release is not planned to be
>> >stable, which we do anyway.
>> >
>> >Stabilization requires meticulous work and a rather stable code base.
>> >This was done for Hadoop 1 and Hadoop 0.23 as the most recent examples.
>> >In fact we cannot predict in advance which branch will become stable
>> >until it is somewhere in production, which is beyond the scope of control
>> >of this community.
>> >
>> >My practical suggestions are:
>> >1. Produce a series of feature releases to catch up branch-2 with
>> >   trunk. We can prioritize features in general or let the release
>> >   manager decide which feature to pick up from trunk.
>> >   Version numbering is also up for discussion. I would call them 2.x
>> >   and reserve the minor numbers for subsequent stabilization
>> >   bug-fix releases.
>> >2. Build new features in dev-branches until they are done. We do it
>> >   now, but should enforce it more.
>> >3. After branch-2 is caught up with trunk, release from trunk by
>> >   merging a few new dev-branches. Releasing with 2-3 new features
>> >   should be the golden rule.
>> >   I would call these releases 3.x
>> >4. Disallow backporting features that have not been released from
>> >   trunk.
>> >   This is VERY important for a forward-going release process.
>> >5. I'd like to propose to move the latest stable version from 1.0.X to
>> >   0.23.7.
>> >   I think that running a release in production for 9 months on 40K
>> >   nodes is consistent with the definition of stable.
>> >
>> >Some ideas for discussion.
>> >Thanks,
>> >--Konstantin
>> >
>> >P.S. This is not to preempt discussion on stabilizing 2.0.5 started by
>> >Roman. I am just not sure why we call it stabilization and 2.0.5 and
>> >beta, if incompatible and new features have already been committed to
>> >the branch.
>> >BTW as an illustration to my observations above.
>>

