hadoop-general mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Date Fri, 31 Aug 2012 06:02:26 GMT
Looking at the voting, it appears YARN wants to become a TLP RIGHT NOW but
at the price of the complete decoherence of the Apache Hadoop platform. For
all of us who have invested in the Apache Hadoop platform, how does this
benefit us? Certainly our interests seem to get little consideration with
this plan to just blow everything up tomorrow.

How does a downstream project that imports HDFS and MapReduce coordinate
the shared dependencies with those new projects? For example, Guava. One
could have a multi-way library incompatibility problem; this has already
happened in the large with HDFS, HBase, and Pig. It's DLL hell magnified 3
or 4 times just in the smoking ruins of "core". The obvious answer is: Once
these pieces are moving in different trajectories at different rates, end
users and downstream projects will be forced to negotiate with many
parties, and those parties explicitly won't care about issues concerning
one another, according to this discussion. YARN must have broken our
minicluster-based MapReduce tests 5 times over the last year. HDFS took up
a certain version of Guava and this required us to refactor some code to
match that version. We had a coherent group of committers to assist us then
but that would go away. Proponents of the split seem to want exactly this
situation. BigTop was suggested as a vehicle for addressing that concern
but then explicitly rejected on this thread. A commercial vendor looking to
torpedo the ability of anyone to build something on Apache Hadoop directly
couldn't come up with a better plan, because only a full time operation can
be expected to have the resources to harmonize the pieces plus all of their
dependencies with build patches, code wrangling, testing, testing, testing.
Volunteer contributor and committer time is a precious gift. I wonder if
the many professional full time Hadoop devs voting here have lost sight of
this. Pushing your integration work downstream doesn't mean resources will
be there to pick it up. Downstream projects could be forced to reluctantly
abandon working with Apache releases for a commercial distribution such as
CDH, or the MapR platform. Or, they will be unable to move from a "known
good" combination in the face of a combinatorial explosion of dependency
changes, so their general utility to the end user steadily declines. Maybe
the consensus is that is acceptable, but I would find that kind of a sad
ending to this remarkable project.

On Friday, August 31, 2012, Devaraj Das wrote:

> Andrew's points are fair IMHO. In general, I think it makes sense to have
> the TLPs but we aren't there yet (as others have pointed out). I'd propose
> that we should think about the timelines (maybe an appropriate time is when
> we have Hadoop-2.0 GA'ed).
> On Aug 30, 2012, at 7:11 AM, Andrew Purtell wrote:
> > As a direct Apache software product consumer and sometimes contributor, I
> > also experienced firsthand the pain of the project splits. It was not
> > possible to build an installable release. It may have been many days or
> > weeks before that was cured by a re-merge. I gave up after burning too
> > many hours on it, went back to the 1.0 code base, and came back only
> > after the damage was repaired.
> >
> > It's also frustrating to hear, even if it is just one person's proposal,
> > that we have spent months preparing to stabilize our next production
> > deployment based on the 2.0 branch, with the expectation that it will be
> > the new stable, but now maybe 0.23 will be the new stable. 0.23 is quite
> > backwards in comparison and missing all of the critical HA HDFS work.
> >
> > This thread seems to be becoming a competition for which is the more
> > radical proposal to snatch defeat from the jaws of success.
> >
> > These proposals seem to be made with a total lack of care for the end
> > user.
> >
> > From my point of view, things were going reasonably well until there was
> > this sudden turn into lunacy. I am positive this kind of
> > "foundation" / PMC / project / administrivia tinkering is what will
> > fragment or disband the Hadoop community of users and contributors, not
> > disagreements between committers. A Hadoop competitor couldn't be happier.
> >
> > On Thu, Aug 30, 2012 at 1:12 PM, Konstantin Shvachko
> > <shv.hadoop@gmail.com> wrote:
> >
> >> On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J)
> >> <chris.a.mattmann@jpl.nasa.gov> wrote:
> >>> OK I lied and said I wouldn't reply :)
> >>
> >> Long thread. I just picked one of Chris's emails (as the initiator) at
> >> random to reply to.
> >>
> >> Chris,
> >> You are basically saying there's been a history of community problems
> >> in Hadoop project,
> >> and proposing a technical solution to split the project by replicating
> >> the source base under three new names,
> >> implying that this will solve the community problems we (the Hadoop
> >> community) are facing.
> >>
> >> I see several issues.
> >>
> >> 1. There are other ways to split the project.
> >> We essentially have a "natural" split of the project already in place.
> >> Hadoop 1, Hadoop 2, Hadoop 0.23, the Trunk
> >> are in a sense competing projects by themselves, with their own contributors
> >> and release cycles.
> >>
> >> 2. From technical (not community) viewpoint your "svn copy" is an ugly
> >> approach,
> >> as it creates a lot of code duplication and will result in a
> >> maintenance nightmare and/or
> >> will require many man-months to fix. My point is that you cannot
> >> neglect "technical issues" when you solve community problems.
> >>
> >> 3. I am as skeptical as Todd that the community problems will be
> >> solved by simply TLP-ing the three projects.
> >> Two years ago Hadoop was in crisis as vendors were producing their own
> >> releases calling it Hadoop.
> >> I think this was solved, but "poor community behavior" and contentions
> >> remained, embrace them or not.
> >>
> >> 4. Having said the above, separating the projects seems reasonable.
> >> (See timing though)
> >> HDFS will inevitably have to inherit and maintain most of Common.
> >> I totally understand the frustration of people who just put a huge effort
> >> into merging
> >> the sources back under a common root.
> >>
> >> 5. Timing is important.
> >> Waiting until Hadoop 2 is stable, as Arun suggested earlier, would
> >> probably be too long.
> >> Doing it next week, without discussing and solving the technical issues
> >> listed in the thread, would be premature.
> >> I think the Hadoop 0.23.3 release, backed by Yahoo production, has the
> >> potential to become
> >> the next stable version, letting the project move ahead from the
> >> four-year-old code base.
> >> We should help that happen first, and make the necessary preparations for
> >> the split in the meantime.
> >>
> >> Thanks,
> >> --Konstantin
> >>
> >
> >
> >
> > --
> > Best regards,
> >

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
