hadoop-general mailing list archives

From Ian Holsman <had...@holsman.net>
Subject Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
Date Tue, 01 Feb 2011 14:04:15 GMT
Congratulations, Eric.
This is fantastic news.
On Jan 31, 2011, at 10:27 PM, Eric Baldeschwieler wrote:

> Hi Folks,
> I'm pleased to announce that after some reflection, Yahoo! has decided to discontinue
"The Yahoo Distribution of Hadoop" and focus on Apache Hadoop.  We plan to remove all
references to a Yahoo distribution from our website (developer.yahoo.com/hadoop), close our
github repo (yahoo.github.com/hadoop-common) and focus on working more closely with the Apache
community.  Our intent is to return to helping Apache produce binary releases of Apache Hadoop
that are so bulletproof that Yahoo and other production Hadoop users can run them unpatched
on their clusters.
> Until Hadoop 0.20, Yahoo committers worked as release masters to produce binary Apache
Hadoop releases that the entire community used on their clusters.    As the community grew,
we experimented with using the "Yahoo! Distribution of Hadoop" as the vehicle to share
our work.  Unfortunately, Apache is no longer the obvious place to go for Hadoop releases.
 The Yahoo! team wants to return to a world where anyone can download and directly use releases
of Hadoop from Apache.  We want to contribute to the stabilization and testing of those releases.
 We also want to share our regular program of sustaining engineering that backports minor
feature enhancements into new dot releases on a regular basis, so that the world sees regular
improvements coming from Apache every few months, not years.
> Recently the Apache Hadoop community has been very turbulent.  Over the last few months
we have been developing Hadoop enhancements in our internal git repository while doing a complete
review of our options. Our commitment to open sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd),
but the future of the "Yahoo distribution of Hadoop" was far from clear.  We've concluded
that focusing on Apache Hadoop is the way forward.  We believe that more focus on communicating
our goals to the Apache Hadoop community, and more willingness to compromise on how we get
to those goals, will help us get back to making Hadoop even better.
> Unfortunately, we now have to sort out how to contribute several person-years' worth of
work to Apache to let us unwind the Yahoo! git repositories.  We currently run two lines of
Hadoop development, our sustaining program (hadoop-0.20-sustaining) and hadoop-future.  Hadoop-0.20-sustaining
is the stable version of Hadoop we currently run on Yahoo's 40,000 nodes.  It contains a series
of fixes and enhancements that are all backwards compatible with our "Hadoop 0.20 with security".
 It is our most stable and high performance release of Hadoop ever.  We've expended a lot
of energy finding and fixing bugs in it this year. We have initiated the process of contributing
this work to Apache in the branch: hadoop/common/branches/branch-0.20-security.  We've proposed
calling this the 20.100 release.  Once folks have had a chance to try this out and we've had
a chance to respond to their feedback, we plan to create 20.100 release candidates and ask
the community to vote on making them Apache releases. 
> Hadoop-future is our new feature branch.  We are working on a set of new features for
Hadoop to improve its availability, scalability and interoperability to make Hadoop more usable
in mission critical deployments. You're going to see another burst of email activity from
us as we work to get hadoop-future patches socialized, reviewed and checked in.  These bulk
checkins are exceptional.  They are the result of us striving to be more transparent.  Once
we've merged our hadoop-future and hadoop-0.20-sustaining work back into Apache, folks can
expect us to return to our regular development cadence.  Looking forward, we plan to socialize
our roadmaps regularly, actively synchronize our work with other active Hadoop contributors
and develop our code collaboratively, directly in Apache.
> In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop" is a commitment
to working more effectively with the Apache Hadoop community.  Our goal is to make Apache
Hadoop THE open source platform for big data.
> Thanks,
> E14
> --
> PS Here is a draft list of key features in hadoop-future:
> * HDFS-1052 - Federation, the ability to support much more storage per Hadoop cluster.
> * HADOOP-6728 - A new metrics framework
> * MAPREDUCE-1220 - Optimizations for small jobs
> ---
> PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W
