hadoop-general mailing list archives

From "Severance, Steve" <ssevera...@ebay.com>
Subject RE: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Date Tue, 18 Jan 2011 18:53:32 GMT
I want to thank Yahoo! for this release. At eBay we are very excited about the opportunity
to test a build of Hadoop that has already been extensively field tested on large clusters.
At eBay we are primarily concerned with cluster availability and throughput so having a build
like this available to the community is a huge win.

Hats off to Arun, Eric and everyone at Yahoo! for releasing this.


-----Original Message-----
From: Eric Baldeschwieler [mailto:eric14@yahoo-inc.com] 
Sent: Friday, January 14, 2011 10:25 AM
To: general@hadoop.apache.org
Cc: general@hadoop.apache.org
Subject: Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

Hi Ian,

Thanks for holding off on that last .5. I've been working on a big email giving more context
on this. Let me preview some issues. 

Our goal with this branch is twofold: 1) get the code out in a branch quickly so we can collaborate
on it with the community. 2) not change the character of the code. See testing below. We're
happy to compromise on any other dimension, as long as we can do 1&2 above. 

1) I agree this is not a good precedent. We don't support mega-patches in general. We are
doing this as part of discontinuing the "yahoo distribution of Hadoop".  We don't plan to
continue doing 30 person year projects outside apache and then merging them in!!

2) append is hard. It is so hard we rewrote the entire write pipeline (5 person-years work)
in trunk after giving up on the codeline you are suggesting we merge in. That work is what
distinguishes all post 20 releases from 20 releases in my mind. I don't trust the 20 append
code line. We've been hurt badly by it.  We did the rewrite only after losing a bunch of production
data a bunch of times with the previous code line.  I think the various 20 append patch lines
may be fine for specialized hbase clusters, but they don't have the rigor behind them to
bet your business on them.

3) I think having a very stable recent codeline available for teams coming into Hadoop who
want to run big business apps and contribute code back is very helpful. I've been talking
to folks in other orgs and they've expressed a huge amount of interest in this work, but begged
us to put it into apache, so their oversight bodies will let them use it. 

4) we're happy to incorporate ideas on how best to merge the work into trunk. Let's find
the most cost effective way to preserve the most devel data possible. 

5) testing. Ian, I think you do us a disservice when you talk about us just testing in our
environments. If you look at the history of the project, we've been the force behind every
stable release of apache Hadoop.  And all the non-apache Hadoop releases have been tracking
this patch set. We fully support the community developing independent testing capabilities.
 We plan to contribute to that effort.  But we are the organization with far and away the
best record for testing Hadoop. 

We are proud of this release, and we want to share it. Help us sort out how. 


E14 - via iPhone

On Jan 14, 2011, at 6:15 AM, "Ian Holsman" <hadoop@holsman.net> wrote:

> (with my Apache hat on)
> I'm -0.5 on doing this as one big mega-patch and not including append (as opposed to
a series of smaller patches).
> for the following reasons:
> 1. It encourages bad behavior. We want discussion (and development) to happen on the
lists, not in some office. By allowing these large code-dumps it condones this behavior, and
we will likely see it again and again. Like it or not, this is not the apache model of open
source governance. 
> 2. There is a risk that some code that is not in a JIRA or separate patch creeps in unwittingly.
This isn't a major deal per se, but we don't really have the proper paper trail, or the documentation
on what bug it fixed etc etc.
> 3. Other groups (Facebook for example) are running with their own set of patches. They
currently have the luxury of examining each individual patch to decide if they want to integrate
it (and test it) in their environment. We are forcing them to do the work of finding the bits
they want in this huge patch.
> 4. By not including the append patch, we are making this release unusable for a large
portion of our community who run hbase.
> 5. It makes it very hard to test. While it makes me comfortable that it has gone through
Yahoo!'s QA and is running in their environments, it doesn't mean that it will work in other
organizations who have different workload mixes and software running on them. With one huge
patch it makes it all or nothing.. either they take the code-drop and perform a large QA-integration
effort, or they forgo the whole patch altogether.
> **BUT** we have both the Yahoo! & Cloudera guys happy to do it, and to spend their
time doing it.. so I think having the code-drop will put us in a better place than where we
> BTW, I'd like to point out a discrepancy here:
> On another thread discussing hadoop-0.20-append as a separate branch, most people agreed
that new features shouldn't be added to 0.20, now we have a major feature and we are all gung
ho for it.. 
> --Ian
> On Jan 14, 2011, at 2:21 AM, Arun C Murthy wrote:
>> On Jan 13, 2011, at 10:59 PM, Stack wrote:
>>> (Man, it was looking good there for a second when 0.20.100 was about
>>> security+append!)
>>> Good luck w/ the release Arun.
>> Thanks!
>>> We might be following your 0.20.100 with a 0.20.200 append.
>> Super!
>> Arun
