hadoop-common-dev mailing list archives

From Vinod Kumar Vavilapalli <vino...@apache.org>
Subject Re: [DISCUSS] Branches and versions for Hadoop 3
Date Mon, 28 Aug 2017 20:03:53 GMT
+1 to Andrew’s proposal for 3.x releases.

We had fairly elaborate threads on this branching & compatibility topic before. One of
them is here: [1]

+1 to what Jason said.
 (a) Incompatible changes are not to be treated lightly. We need to stop breaking stuff and
"just dumping it on trunk".
 (b) Major versions are expensive. We should hesitate before asking our users to move from
2.0 to 3.0 or 3.0 to 4.0 (with incompatible changes) *without* any other major value proposition.

Some of the incompatible changes can clearly wait, while others cannot and so may mandate a major
release. What are some of the common types of incompatible changes?
 - Renaming APIs, removing deprecated APIs, renaming configuration properties, changing the
default value of a configuration, changing shell output / logging etc:
    — Today, we do this on trunk even though the actual effort involved is very minimal
compared to the overhead it forces in maintaining an incompatible trunk.
 - Dependency library updates: updating Guava, protobuf, etc. in Hadoop breaks downstream
applications. I am assuming Classpath Isolation [2] is still a blocker for 3.0 GA.
 - JDK upgrades: we tried two different approaches with JDK 7 and JDK 8; we need a formal policy
on this.
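Most of the renames in the first bucket can ship compatibly by keeping the old name alive as an alias of the new one. Hadoop itself does this for configuration properties via Configuration.addDeprecation; the standalone class below is only an illustrative sketch of that idea, not Hadoop code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a renamed configuration property keeps answering to its old
// name, so the rename does not have to wait for a major release.
public class DeprecatedKeys {
    private final Map<String, String> deprecations = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();

    public void addDeprecation(String oldKey, String newKey) {
        deprecations.put(oldKey, newKey);
    }

    public void set(String key, String value) {
        // Writes through a deprecated name land on the new name.
        values.put(deprecations.getOrDefault(key, key), value);
    }

    public String get(String key) {
        // Reads through either name resolve to the same value.
        return values.get(deprecations.getOrDefault(key, key));
    }
}
```

With a mapping like addDeprecation("dfs.umaskmode", "fs.permissions.umask-mode") in place, old configs keep working while new code uses the new key.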
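On the dependency point, the usual mitigation is to relocate (shade) Hadoop's copies of third-party libraries so they cannot collide with whatever versions downstream applications bring. A sketch of a maven-shade-plugin relocation for Guava (the shadedPattern prefix here is illustrative; HADOOP-11656 tracks the actual shaded-client work):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite Guava's packages so Hadoop's copy is private to Hadoop. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hadoop.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```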

If we can manage the above common breaking changes, we can cause less pain to our end users.

Here’s what we can do for 3.x / 4.x specifically.
 - Stay on trunk-based 3.x releases
 - Avoid all incompatible changes as much as possible
 - If we run into a bunch of minor incompatible changes that have to be done, we either (a) make
the incompatible behavior optional or (b) just park them, say with a parked-incompatible-change
label, if making it optional is not possible
 - We create a 4.0 only when (a) we hit the first major incompatible change because a major
next step for Hadoop needs it (e.g., Erasure Coding), and/or (b) the number of parked incompatible
changes passes a certain threshold. Unlike Jason, I don’t see the threshold to be 1 for
cases that don’t fit (a).
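Option (a), making an incompatible behavior optional, can be as simple as gating the new behavior behind a flag that defaults to the old behavior. A minimal sketch, where the property name is hypothetical and not a real Hadoop key:

```java
import java.util.Properties;

// Sketch: ship an incompatible behavior in a 3.x minor release as opt-in,
// so existing users see no change until they flip the flag.
public class OptInBehavior {
    static final String FLAG = "hadoop.example.strict-output.enabled";

    static String render(Properties conf, String value) {
        boolean strict = Boolean.parseBoolean(conf.getProperty(FLAG, "false"));
        // Old (compatible) behavior by default; new behavior only on opt-in.
        return strict ? value.trim() : value;
    }
}
```

When 4.0 eventually ships, the flag's default can flip (or the flag can be removed) as the actual incompatible change.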

References
 [1] Looking to a Hadoop 3 release: http://markmail.org/thread/2daldggjaeewdmdf#query:+page:1+mid:m6x73t6srlchywsn+state:results
 [2] Classpath isolation for downstream client: https://issues.apache.org/jira/browse/HADOOP-11656

Thanks
+Vinod

> On Aug 25, 2017, at 1:23 PM, Jason Lowe <jlowe@oath.com.INVALID> wrote:
> 
> Allen Wittenauer wrote:
> 
> 
>> Doesn't this place an undue burden on the contributor with the first
>> incompatible patch to prove worthiness?  What happens if it is decided that
>> it's not good enough?
> 
> 
> It is a burden for that first, "this can't go anywhere else but 4.x"
> change, but arguably that should not be a change done lightly anyway.  (Or
> any other backwards-incompatible change for that matter.)  If it's worth
> committing then I think it's perfectly reasonable to send out the dev
> announce that there's reason for trunk to diverge from 3.x, cut branch-3,
> and move on.  This is no different than Andrew's recent announcement that
> there's now a need for separating trunk and the 3.0 line based on what's
> about to go in.
> 
> I do not think it makes sense to pay for the maintenance overhead of two
> nearly-identical lines with no backwards-incompatible changes between them
> until we have the need.  Otherwise if past trunk behavior is any
> indication, it ends up mostly enabling people to commit to just trunk,
> forgetting that the thing they are committing is perfectly valid for
> branch-3.  If we can agree that trunk and branch-3 should be equivalent
> until an incompatible change goes into trunk, why pay for the commit
> overhead and potential for accidentally missed commits until it is really
> necessary?
> 
> How many will it take before the dam will break?  Or is there a timeline
>> going to be given before trunk gets set to 4.x?
> 
> 
> I think the threshold count for the dam should be 1.  As soon as we have a
> JIRA that needs to be committed to move the project forward and we cannot
> ship it in a 3.x release then we create branch-3 and move trunk to 4.x.
> As for a timeline going to 4.x, again I don't see it so much as a "baking
> period" as a "when we need it" criteria.  If we need it in a week then we
> should cut it in a week.  Or a year then a year.  It all depends upon when
> that 4.x-only change is ready to go in.
> 
> Given the number of committers that openly ignore discussions like this,
>> who is going to verify that incompatible changes don't get in?
>> 
> 
> The same entities who are verifying other bugs don't get in, i.e.: the
> committers and the Hadoop QA bot running the tests.  Yes, I know that means
> it's inevitable that compatibility breakages will happen, and we can and
> should improve the automation around compatibility testing when possible.
> But I don't think there's a magic bullet for preventing all compatibility
> bugs from being introduced, just like there isn't one for preventing
> general bugs.  Does having a trunk branch separate but essentially similar
> to branch-3 make this any better?
> 
> Longer term:  what is the PMC doing to make sure we start doing major
>> releases in a timely fashion again?  In other words, is this really an
>> issue if we shoot for another major in (throws dart) 2 years?
>> 
> 
> If we're trying to do semantic versioning then we shouldn't have a regular
> cadence for major releases unless we have a regular cadence of changes that
> break compatibility.  I'd hope that's not something we would strive
> towards.  I do agree that we should try to be better about shipping
> releases, major or minor, in a more timely manner, but I don't agree that
> we should cut 4.0 simply based on a duration since the last major release.
> The release contents and community's desire for those contents should
> dictate the release numbering and schedule, respectively.
> 
> Jason
> 
> 
> On Fri, Aug 25, 2017 at 2:16 PM, Allen Wittenauer <aw@effectivemachines.com>
> wrote:
> 
>> 
>>> On Aug 25, 2017, at 10:36 AM, Andrew Wang <andrew.wang@cloudera.com>
>> wrote:
>> 
>>> Until we need to make incompatible changes, there's no need for
>>> a Hadoop 4.0 version.
>> 
>> Some questions:
>> 
>>        Doesn't this place an undue burden on the contributor with the
>> first incompatible patch to prove worthiness?  What happens if it is
>> decided that it's not good enough?
>> 
>>        How many will it take before the dam will break?  Or is there a
>> timeline going to be given before trunk gets set to 4.x?
>> 
>>        Given the number of committers that openly ignore discussions like
>> this, who is going to verify that incompatible changes don't get in?
>> 
>>        Longer term:  what is the PMC doing to make sure we start doing
>> major releases in a timely fashion again?  In other words, is this really
>> an issue if we shoot for another major in (throws dart) 2 years?
>> 
>> 

