hadoop-general mailing list archives

From Giridharan Kesavan <gkesa...@hortonworks.com>
Subject Re: [DISCUSS] stabilizing Hadoop releases wrt. downstream
Date Wed, 06 Mar 2013 06:05:02 GMT
Thanks Bobby. I've set up a unit test job
<https://builds.apache.org/view/Hadoop/job/Hadoop-Common-2-Build/> to
execute unit tests on branch-2 on a daily basis.
I'm happy to help with build setup. Let me know.

-Giri


On Tue, Mar 5, 2013 at 9:02 PM, Konstantin Boudnik <cos@apache.org> wrote:

> Great start, Bobby! I can certainly jump in and fix something quickly if
> needed as well (I'm not an RE person either, but CI is truly a dev tool!)
>
> Thanks!
>   Cos
>
> On Tue, Mar 05, 2013 at 07:18AM, Robert Evans wrote:
> > That is a great point.  I have been meaning to set up the Jenkins build
> > for branch-2 for a while, so I took the 10 mins and just did it.
> >
> > https://builds.apache.org/job/Hadoop-Common-2-Commit/
> >
> > Don't let the name fool you, it publishes not just common, but HDFS, YARN,
> > MR, and tools too.  You should now have branch-2 SNAPSHOTS updated on each
> > commit to branch-2.  Feel free to bug me if you need more integration
> > points.  I am not an RE guy, but I can hack it to make things work :)
> >
> > --Bobby
> >
> > On 3/5/13 12:15 AM, "Konstantin Boudnik" <cos@apache.org> wrote:
> >
> > >Arun,
> > >
> > >first of all, I don't think anyone is trying to put the blame on someone
> > >else. E.g. I had a similar experience with Oozie being broken because of
> > >certain released changes in the upstream.
> > >
> > >I am sure that most people in BigTop community - especially those who
> > >share the committer-ship privilege in BigTop and other upstream
> > >projects, including Hadoop, - would be happy to help with the
> > >stabilization of the Hadoop base. The issue that a downstream
> > >integration project is likely to have is - for one - the absence of
> > >regularly published development artifacts. In light of "it didn't
> > >happen if there's no picture", here are a couple of examples:
> > >
> > >  - 2.0.2-SNAPSHOT artifacts weren't published at all; only release
> > >2.0.2-alpha artifacts were
> > >  - 2.0.3-SNAPSHOT artifacts weren't published until Feb 29, 2013 (it
> > >happened just once)
> > >
> > >So, technically speaking, unless an integration project is willing to
> > >build and maintain its own artifacts, it is impossible to do any
> > >preventive validation.
> > >
> > >Which brings me to my next question: how do you guys address
> > >"Integration is high on the list of *every* release"? Again, please
> > >don't get me wrong - I am not looking to lay blame on or corner
> > >anyone - I am really curious and would appreciate the input.
> > >
> > >
> > >Vinod:
> > >
> > >> As you yourself noted later, the pain is part of the 'alpha' status
> > >> of the release. We are targeting one of the immediate future
> > >> releases to be a beta and so these troubles are really only the
> > >> short term.
> > >
> > >I don't really want to get into the discussion about what
> > >constitutes an alpha and how it has delayed the adoption of the Hadoop 2
> > >line. However, I want to point out that it is especially important for
> > >an "alpha" platform to work nicely with downstream consumers of the said
> > >platform. For quite obvious reasons, I believe.
> > >
> > >> I think there is a fundamental problem with the interaction of
> > >> Bigtop with the downstream projects, if nothing else, with
> > >
> > >BigTop is as downstream as it can get, because BigTop essentially
> > >consumes all other component releases in order to produce a viable
> > >stack. Technicalities aside...
> > >
> > >> Hadoop. We never formalized on the process, will BigTop step in
> > >> after an RC is up for vote or before? As I see it, it's happening
> > >
> > >Bigtop essentially can give any component, including Hadoop, and
> > >better yet - the set of components - certain guarantees about
> > >compatibility and dependencies being included. A case in point is
> > >the commons libraries missing from the 1.0.1 release, which essentially
> > >prevented HBase from working properly.
> > >
> > >> after the vote is up, so no wonder we are in this state. Shall we
> > >> have a pre-notice to Bigtop so that it can step in before?
> > >
> > >The above contradicts the earlier statement that "Integration
> > >is high on the list of *every* release". If BigTop isn't used for
> > >integration testing, then how is said integration testing performed?
> > >Is it some sort of test-patch process as Luke referred to earlier?  And
> > >why does it leave room for integration issues to go uncaught?
> > >Again, I am genuinely interested to know.
> > >
> > >> these short term pains. I'd rather we swim through these now
> > >> instead of supporting broken APIs and features in our beta, having seen
> > >> this very thing happen with 1.*.
> > >
> > >I think you're mixing up integration with downstream and
> > >being in an alpha phase of development. The former isn't about
> > >supporting "broken APIs" - it is about being consistent and avoiding
> > >breaking the downstream applications without letting said applications
> > >accommodate the platform changes first.
> > >
> > >Changes in the API, after all, can be relatively easily traced by
> > >integration validation - this is the whole point of integration
> > >testing. And BigTop does the job better than anything around, simply
> > >because there's nothing else around to do it.
> > >
> > >If you stay in a shape-shifting "alpha" that doesn't integrate well for
> > >a very long time, you risk losing downstream customers' interest,
> > >because they might get tired of waiting until the next stable API is
> > >ready for them.
> > >
> > >> Let's fix the way the release related communication is happening
> > >> across our projects so that we can all work together and make 2.X a
> > >> success.
> > >
> > >This is a very good point indeed! Let's start a separate discussion
> > >thread on how we can improve the release model for coming Hadoop
> > >releases, where we - as the community - can provide better guarantees
> > >of the inter-component compatibility (sorry for an overused word).
> > >
> > >Cos
> > >
> > >On Fri, Mar 01, 2013 at 10:58AM, Arun C Murthy wrote:
> > >> I feel this is being blown out of proportion.
> > >>
> > >> Integration is high on the list of *every* release. In future, if anyone
> > >> or bigtop wants to help, running integration tests on a hadoop RC and
> > >> providing feedback would be very welcome. I'm pretty sure I will stop an
> > >> RC if it breaks Oozie or HBase or Pig or Hive and re-spin it. E.g. see
> > >> recent efforts to do a 2.0.4-alpha.
> > >>
> > >> With hadoop-2.0.3-alpha we discovered 3 *bugs* - making it sound like we
> > >> intentionally disregard integration issues is very harsh.
> > >>
> > >> Please also see the other thread where we discussed stabilizing APIs,
> > >> protocols etc. for the next 'beta' release.
> > >>
> > >> Arun
> > >>
> > >> On Feb 26, 2013, at 5:43 PM, Roman Shaposhnik wrote:
> > >>
> > >> > Hi!
> > >> >
> > >> > for the past couple of releases of the Hadoop 2.X code line,
> > >> > integration between Hadoop and its downstream projects has
> > >> > become quite a thorny issue. The poster child here is Oozie, where
> > >> > every release of Hadoop 2.X seems to be breaking the compatibility
> > >> > in various unpredictable ways. At times other components (such
> > >> > as HBase for example) also seem to be affected.
> > >> >
> > >> > Now, to be extremely clear -- I'm NOT talking about the *latest* version
> > >> > of Oozie working with the *latest* version of Hadoop, instead
> > >> > my observations come from running previous *stable* releases
> > >> > of Bigtop on top of Hadoop 2.X RCs.
> > >> >
> > >> > As many of you know Apache Bigtop aims at providing a single
> > >> > platform for integration of Hadoop and Hadoop ecosystem projects.
> > >> > As such we're uniquely positioned to track compatibility between
> > >> > different Hadoop releases with regard to the downstream components
> > >> > (things like Oozie, Pig, Hive, Mahout, etc.). With every single RC
> > >> > we've been pretty diligent at trying to provide integration-level
> > >> > feedback on the quality of the upcoming release, but it seems that our
> > >> > efforts don't quite suffice in stabilizing Hadoop 2.X.
> > >> >
> > >> > Of course, one could argue that while the Hadoop 2.X code line was
> > >> > designated 'alpha', perfect integration and compatibility
> > >> > was NOT what the Hadoop community was
> > >> > focusing on. I can appreciate that view, but what I'm interested in
> > >> > is the future of Hadoop 2.X, not its past. Hence, here's my question
> > >> > to all of you as the Hadoop community at large:
> > >> >
> > >> > Do you guys think that the project has reached a point where integration
> > >> > and compatibility issues should be prioritized really high on the list
> > >> > of things that make or break each future release?
> > >> >
> > >> > The good news is that Bigtop's charter is in big part *exactly* about
> > >> > providing you with this kind of feedback. We can easily tell you when
> > >> > Hadoop behavior, with regard to downstream components, changes
> > >> > between a previous stable release and the new RC (or even branch/trunk).
> > >> > What we can NOT do is submit patches for all the issues. We are simply
> > >> > too small a project and we need your help with that.
> > >> >
> > >> > I truly believe that we owe it to the downstream projects, and in the
> > >> > second half of this email I will try to convince you of that.
> > >> >
> > >> > We all know that integration projects are impossible to pull off
> > >> > unless there's a general consensus between all of the projects involved
> > >> > that they indeed need to work with each other. You can NOT force
> > >> > that notion, but you can always try to influence. This relationship
> > >> > goes both ways.
> > >> >
> > >> > Consider a question in front of the downstream communities
> > >> > of  whether or not to adopt Hadoop 2.X as the basis. To answer
> > >> > that question each downstream project has to be reasonably
> > >> > sure that their concerns will NOT fall on deaf ears and that
> > >> > Hadoop developers are, essentially, 'ready' for them to pick
> > >> > up Hadoop 2.X. I would argue that so far the Hadoop community
> > >> > has gone out of its way to signal that the 2.X codeline is NOT
> > >> > ready for the downstream.
> > >> >
> > >> > I would argue that moving forward this is a really unfortunate
> > >> > situation that may end up undermining the long term success
> > >> > of Hadoop 2.X if we don't start addressing the problem. Think
> > >> > about it -- 90% of unit tests that run downstream on Apache
> > >> > infrastructure are still exercising Hadoop 1.X underneath.
> > >> > In fact, if you were to forcefully make, let's say, HBase's
> > >> > unit tests run on top of Hadoop 2.X, quite a few of them
> > >> > are going to fail. The Hadoop community is, in effect, cutting
> > >> > itself off from the biggest source of feedback -- its downstream
> > >> > users. This in turn:
> > >> >
> > >> >   * leaves Hadoop project in a perpetual state of broken
> > >> >     windows syndrome.
> > >> >
> > >> >   * leaves Apache Hadoop 2.X releases in a state considerably
> > >> >     inferior to the releases *including* Apache Hadoop done by the
> > >> >     vendors. The users have no choice but to align themselves
> > >> >     with vendor offerings if they wish to utilize the latest Hadoop
> > >> >     functionality. The artifact that is known as Apache Hadoop 2.X
> > >> >     stopped being a viable choice, thus fracturing the user community
> > >> >     and reducing the benefits of a commonly deployed codebase.
> > >> >
> > >> >   * leaves downstream projects of Hadoop in a jaded state where
> > >> >     they legitimately get very discouraged and frustrated and
> > >> >     eventually give up, thinking that -- well, we work with one release
> > >> >     of Hadoop (the stable one, Hadoop 1.X) and we shall wait for the
> > >> >     Hadoop community to get their act together.
> > >> >
> > >> > In my view (shared by quite a few members of Apache Bigtop) we
> > >> > can definitely do better than this if we all agree that the proposed
> > >> > first 'beta' release of Hadoop 2.0.4 is the right time for it to happen.
> > >> >
> > >> > It is about time the Hadoop 2.X community won back all those end users
> > >> > and downstream projects that got left behind during the alpha
> > >> > stabilization phase.
> > >> >
> > >> > Thanks,
> > >> > Roman.
> > >>
> > >> --
> > >> Arun C. Murthy
> > >> Hortonworks Inc.
> > >> http://hortonworks.com/
> > >>
> > >>
> >
>
