hadoop-general mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Date Fri, 31 Aug 2012 06:42:00 GMT
If Apache Hadoop -- as an umbrella or sum of its parts -- isn't practical
to develop end applications or downstream projects on, the community will
disappear. I don't follow your logic. I deal with the technical realities
of actually trying to use an Apache Hadoop distribution, the pieces
released as source from the ASF, directly in production, and your position
is dismissive if not hostile to my concerns as an end user. What
"community" do you mean then? Vendors? Academics? People who like to tinker
with things they can't actually use?

And you can't just hand-wave that this will all work out if done RIGHT
NOW, especially with something as inelegant as an SVN copy.

On Friday, August 31, 2012, Mattmann, Chris A (388J) wrote:

> Hi Andrew,
> How many new Apache Foundation *members* has the Hadoop PMC added over
> the past 3-4 years, and by whom (the answer to this question might
> surprise you)? The thing you and others continue not to see is that the
> ASF isn't about the most superior technical solutions, or the best
> refactorings to prevent Google Guava dependencies; the ASF is about
> *community* _over_ *code*. Period. The metrics that the Foundation and
> its members are interested in are the metrics that demonstrate the
> health of the project. Technical prowess and market share are great, as
> are diverse, hungry, downstream user communities. But the ASF is here
> to create communities, communities that work together to develop code
> for the public good at no charge to the public. Scope out Board
> resolutions to create projects and read the repetitive text in them --
> there's a pattern there that elucidates this.
> Also, the project members and community members here could slice and
> dice the project into 50 different Top Level Projects, but that
> wouldn't mean Hadoop had reached its "ending".
> Cheers,
> Chris
> On Aug 30, 2012, at 11:02 PM, Andrew Purtell wrote:
> > Looking at the voting, it appears YARN wants to become a TLP RIGHT
> > NOW, but at the price of the complete decoherence of the Apache
> > Hadoop platform. For all of us who have invested in the Apache Hadoop
> > platform, how does this benefit us? Certainly our interests seem to
> > get little consideration with this plan to just blow everything up
> > tomorrow.
> >
> > How does a downstream project that imports HDFS and MapReduce
> > coordinate the shared dependencies with those new projects? For
> > example, Guava. One could have a multi-way library incompatibility
> > problem; this has already happened in the large with HDFS, HBase, and
> > Pig. It's DLL hell magnified 3 or 4 times just in the smoking ruins
> > of "core". The obvious answer is: once these pieces are moving on
> > different trajectories at different rates, end users and downstream
> > projects will be forced to negotiate with many parties, and those
> > parties explicitly won't care about the issues concerning another,
> > according to this discussion. YARN must have broken our
> > minicluster-based MapReduce tests 5 times over the last year. HDFS
> > took up a certain version of Guava and this required us to refactor
> > some code to match that version. We had a coherent group of
> > committers to assist us then, but that would go away. Proponents of
> > the split seem to want exactly this situation. BigTop was suggested
> > as a vehicle for addressing that concern but then explicitly rejected
> > on this thread. A commercial vendor looking to torpedo the ability of
> > anyone to build something on Apache Hadoop directly couldn't come up
> > with a better plan, because only a full-time operation can be
> > expected to have the resources to harmonize the pieces plus all of
> > their dependencies with build patches, code wrangling, testing,
> > testing, testing. Volunteer contributor and committer time is a
> > precious gift. I wonder if the many professional full-time Hadoop
> > devs voting here have lost sight of this. Pushing your integration
> > work downstream doesn't mean resources will be there to pick it up.
> > Downstream projects could be forced to reluctantly abandon working
> > with Apache releases for a commercial distribution such as CDH, or
> > the MapR platform. Or, they will be unable to move from a "known
> > good" combination in the face of a combinatorial explosion of
> > dependency changes, so their general utility to the end user steadily
> > declines. Maybe the consensus is that that is acceptable, but I would
> > find that kind of a sad ending to this remarkable project.
> >
> > On Friday, August 31, 2012, Devaraj Das wrote:
> >
> >> Andrew's points are fair IMHO. In general, I think it makes sense
> >> to have the TLPs, but we aren't there yet (as others have pointed
> >> out). I'd propose that we should think about the timelines (maybe an
> >> appropriate time is when we have Hadoop-2.0 GA'ed).
> >>
> >> On Aug 30, 2012, at 7:11 AM, Andrew Purtell wrote:
> >>
> >>> As a direct Apache software product consumer and sometimes
> >>> contributor, I also experienced firsthand the pain of the project
> >>> splits. It was not possible to build an installable release. It may
> >>> have been many days or weeks before that was cured by a re-merge. I
> >>> gave up after burning too many hours on it, went back to the 1.0
> >>> code base, and came back only after the damage was repaired.
> >>>
> >>> It's also frustrating to hear, even if just one person's proposal,
> >>> that we have spent months preparing to stabilize our next
> >>> production deployment based on the 2.0 branch, with the expectation
> >>> that it will be the new stable, but now maybe 0.23 will be the new
> >>> stable. 0.23 is quite backwards in comparison and missing all of
> >>> the critical HA HDFS work.
> >>>
> >>> This thread seems to be becoming a competition for which is the
> >>> more radical proposal to snatch defeat from the jaws of success.
> >>>
> >>> These proposals seem to be made with a total lack of care for the
> >>> end user.
> >>>
> >>> From my point of view, things were going reasonably well until this
> >>> sudden turn into lunacy. I am positive this kind of "foundation" /
> >>> PMC / project / administrivia tinkering is what will fragment or
> >>> disband the Hadoop community of users and contributors.
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
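The multi-way Guava clash described in the thread is the kind of conflict
downstream builds typically paper over by pinning a single version in a
Maven dependencyManagement block -- a minimal sketch, assuming a
Maven-based downstream project; the version number is purely illustrative:

```xml
<!-- Hypothetical fragment of a downstream project's pom.xml: force one
     Guava version across the HDFS, MapReduce, and project code paths so
     the build resolves a single jar. Version shown is illustrative. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>11.0.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Pinning only masks the problem: if two upstream projects adopt mutually
incompatible Guava APIs, no single pinned version satisfies both, which is
exactly the multi-way incompatibility the thread warns about.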
