hadoop-general mailing list archives

From Todd Papaioannou <to...@yahoo-inc.com>
Subject Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
Date Tue, 01 Feb 2011 22:02:51 GMT
Yes. We have been and continue to be firm believers in Apache and the value of Open Source
software, as you can see from our track record to date of contributing heavily to Hadoop and
donating Pig, ZooKeeper, Avro, etc. We are excited about their potential and we hope others
will find them useful too.


On 1/31/11 7:44 PM, "Jeff Hammerbacher" <hammer@cloudera.com>

Excellent news! Will you also make Howl, Oozie, and Yarn Apache projects as

On Mon, Jan 31, 2011 at 7:27 PM, Eric Baldeschwieler

Hi Folks,

I'm pleased to announce that after some reflection, Yahoo! has decided to
discontinue "The Yahoo Distribution of Hadoop" and focus on Apache
Hadoop.  We plan to remove all references to a Yahoo distribution from our
website (developer.yahoo.com/hadoop), close our github repo
(yahoo.github.com/hadoop-common) and focus on working more closely with the
Apache community.  Our intent is to return to helping Apache produce binary
releases of Apache Hadoop that are so bulletproof that Yahoo and other
production Hadoop users can run them unpatched on their clusters.

Until Hadoop 0.20, Yahoo committers worked as release masters to produce
binary Apache Hadoop releases that the entire community used on their
clusters.  As the community grew, we experimented with using the
"Yahoo! Distribution of Hadoop" as the vehicle to share our work.
Unfortunately, Apache is no longer the obvious place to go for Hadoop
releases.  The Yahoo! team wants to return to a world where anyone can
download and directly use releases of Hadoop from Apache.  We want to
contribute to the stabilization and testing of those releases.  We also want
to share our regular program of sustaining engineering that backports minor
feature enhancements into new dot releases on a regular basis, so that the
world sees regular improvements coming from Apache every few months, not

Recently the Apache Hadoop community has been very turbulent.  Over the
last few months we have been developing Hadoop enhancements in our internal
git repository while doing a complete review of our options. Our commitment
to open sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd),
but the future of the "Yahoo distribution of Hadoop" was far from clear.
We've concluded that focusing on Apache Hadoop is the way forward.  We
believe that more focus on communicating our goals to the Apache Hadoop
community, and more willingness to compromise on how we get to those goals,
will help us get back to making Hadoop even better.

Unfortunately, we now have to sort out how to contribute several
person-years' worth of work to Apache to let us unwind the Yahoo! git
repositories.  We currently run two lines of Hadoop development: our
sustaining program (hadoop-0.20-sustaining) and hadoop-future.
Hadoop-0.20-sustaining is the stable version of Hadoop we currently run on
Yahoo's 40,000 nodes.  It contains a series of fixes and enhancements that
are all backwards compatible with our "Hadoop 0.20 with security".  It is
our most stable and high performance release of Hadoop ever.  We've expended
a lot of energy finding and fixing bugs in it this year. We have initiated
the process of contributing this work to Apache in the branch:
hadoop/common/branches/branch-0.20-security.  We've proposed calling this
the 20.100 release.  Once folks have had a chance to try this out and we've
had a chance to respond to their feedback, we plan to create 20.100 release
candidates and ask the community to vote on making them Apache releases.

Hadoop-future is our new feature branch.  We are working on a set of new
features for Hadoop to improve its availability, scalability and
interoperability to make Hadoop more usable in mission-critical deployments.
You're going to see another burst of email activity from us as we work to
get hadoop-future patches socialized, reviewed and checked in.  These bulk
check-ins are an exception, not the norm.  They are the result of us striving to be more
transparent.  Once we've merged our hadoop-future and hadoop-0.20-sustaining
work back into Apache, folks can expect us to return to our regular
development cadence.  Looking forward, we plan to socialize our roadmaps
regularly, actively synchronize our work with other active Hadoop
contributors and develop our code collaboratively, directly in Apache.

In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop"
is a commitment to working more effectively with the Apache Hadoop
community.  Our goal is to make Apache Hadoop THE open source platform for
big data.




PS Here is a draft list of key features in hadoop-future:

* HDFS-1052 - Federation, the ability to support much more storage per
Hadoop cluster.

* HADOOP-6728 - The new metrics framework

* MAPREDUCE-1220 - Optimizations for small jobs

PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W
