hadoop-general mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
Date Tue, 01 Feb 2011 16:06:22 GMT
We will be proposing Howl as an Incubator project soon.


On Jan 31, 2011, at 7:44 PM, Jeff Hammerbacher wrote:

> Excellent news! Will you also make Howl, Oozie, and Yarn Apache  
> projects as
> well?
> On Mon, Jan 31, 2011 at 7:27 PM, Eric Baldeschwieler
> <eric14@yahoo-inc.com> wrote:
>> Hi Folks,
>> I'm pleased to announce that after some reflection, Yahoo! has  
>> decided to
>> discontinue "The Yahoo Distribution of Hadoop" and focus on  
>> Apache
>> Hadoop.  We plan to remove all references to a Yahoo distribution  
>> from our
>> website (developer.yahoo.com/hadoop), close our github repo (
>> yahoo.github.com/hadoop-common) and focus on working more closely  
>> with the
>> Apache community.  Our intent is to return to helping Apache  
>> produce binary
>> releases of Apache Hadoop that are so bulletproof that Yahoo and  
>> other
>> production Hadoop users can run them unpatched on their clusters.
>> Until Hadoop 0.20, Yahoo committers worked as release masters to  
>> produce
>> binary Apache Hadoop releases that the entire community used on their
>> clusters.  As the community grew, we have experimented with using the
>> "Yahoo! Distribution of Hadoop" as the vehicle to share our work.
>> Unfortunately, Apache is no longer the obvious place to go for Hadoop
>> releases.  The Yahoo! team wants to return to a world where anyone  
>> can
>> download and directly use releases of Hadoop from Apache.  We want to
>> contribute to the stabilization and testing of those releases.  We  
>> also want
>> to share our regular program of sustaining engineering that  
>> backports minor
>> feature enhancements into new dot releases on a regular basis, so  
>> that the
>> world sees regular improvements coming from Apache every few  
>> months, not
>> years.
>> Recently the Apache Hadoop community has been very turbulent.  Over  
>> the
>> last few months we have been developing Hadoop enhancements in our  
>> internal
>> git repository while doing a complete review of our options. Our  
>> commitment
>> to open sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd),
>> but the future of the "Yahoo distribution of Hadoop" was far from  
>> clear.
>> We've concluded that focusing on Apache Hadoop is the way forward.   
>> We
>> believe that more focus on communicating our goals to the Apache  
>> Hadoop
>> community, and more willingness to compromise on how we get to  
>> those goals,
>> will help us get back to making Hadoop even better.
>> Unfortunately, we now have to sort out how to contribute several
>> person-years' worth of work to Apache to let us unwind the Yahoo! git
>> repositories.  We currently run two lines of Hadoop development, our
>> sustaining program (hadoop-0.20-sustaining) and hadoop-future.
>> Hadoop-0.20-sustaining is the stable version of Hadoop we currently  
>> run on
>> Yahoo's 40,000 nodes.  It contains a series of fixes and  
>> enhancements that
>> are all backwards compatible with our "Hadoop 0.20 with security".   
>> It is
>> our most stable and high performance release of Hadoop ever.  We've  
>> expended
>> a lot of energy finding and fixing bugs in it this year. We have  
>> initiated
>> the process of contributing this work to Apache in the branch:
>> hadoop/common/branches/branch-0.20-security.  We've proposed  
>> calling this
>> the 20.100 release.  Once folks have had a chance to try this out  
>> and we've
>> had a chance to respond to their feedback, we plan to create 20.100  
>> release
>> candidates and ask the community to vote on making them Apache  
>> releases.
>> Hadoop-future is our new feature branch.  We are working on a set  
>> of new
>> features for Hadoop to improve its availability, scalability and
>> interoperability to make Hadoop more usable in mission critical  
>> deployments.
>> You're going to see another burst of email activity from us as we  
>> work to
>> get hadoop-future patches socialized, reviewed and checked in.   
>> These bulk
>> checkins are exceptional.  They are the result of us striving to be  
>> more
>> transparent.  Once we've merged our hadoop-future and  
>> hadoop-0.20-sustaining
>> work back into Apache, folks can expect us to return to our regular
>> development cadence.  Looking forward, we plan to socialize our  
>> roadmaps
>> regularly, actively synchronize our work with other active Hadoop
>> contributors and develop our code collaboratively, directly in  
>> Apache.
>> In summary, our decision to discontinue the "Yahoo! Distribution of  
>> Hadoop"
>> is a commitment to working more effectively with the Apache Hadoop
>> community.  Our goal is to make Apache Hadoop THE open source  
>> platform for
>> big data.
>> Thanks,
>> E14
>> --
>> PS Here is a draft list of key features in hadoop-future:
>> * HDFS-1052 - Federation, the ability to support much more storage  
>> per
>> Hadoop cluster.
>> * HADOOP-6728 - The new metrics framework
>> * MAPREDUCE-1220 - Optimizations for small jobs
>> ---
>> PPS This is cross-posted on our blog: http://yhoo.it/i9Ww8W
