hadoop-general mailing list archives

From Milind Bhandarkar <mbhandar...@linkedin.com>
Subject Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Date Fri, 14 Jan 2011 19:59:52 GMT

While I do not think that the releasability of a branch should be determined by the market cap
(either on NASDAQ or SecondMarket) of the contributing company, I think a well-tested release
is beneficial to the community.

So, I support two releases: 20.100 now, which has security, and 20.200 later, which incorporates
appends (depending on the 0.22+appends timeline). That way, a large percentage of the community
is covered in 2011.

The reasons are these:

1. The proposed 20.100 is perhaps the most tested at scale of all the 0.20 branches. In fact,
it may be the most tested of *all* Hadoop releases in the last 5 years. I know first-hand that it causes the least disruption
for users: the migration from 0.20 to 0.20.10x was the smoothest, while adding a valuable feature, security.

2. HBase (running on Hadoop 0.20 with append) has also been scale-tested at Y!, but on far
fewer than 4000 nodes, and certainly not with varied workloads (where bugs tend to surface).
(To my knowledge, the largest HBase instance in production is at Y!.)

3. For any release, operations folks first need some experience with raw Hadoop before deploying
other products on top of it and then handing the installation over to users. So there is
still time for HBase+0.20.100, and that can be addressed in a separate release.

4. It is not as if the community hasn't had a preview of this mega-patch already. A large
portion of the sub-patches are already in cdh3bx, and many of them have already been committed
one-by-one to 0.22.

- Milind

On Jan 14, 2011, at 11:24 AM, Dhruba Borthakur wrote:

>> 1) I agree this is not a good precedent. We don't support mega-patches in
>> general. We are doing this as part of discontinuing the "yahoo distribution
>> of Hadoop".  We don't plan to continue doing 30 person year projects outside
>> apache and then merging them in!!
> I think this is a very dangerous precedent and completely unwarranted.
> Mega-patches are bad and are totally not the Apache way to go. I think if you
> want to contribute it back to Apache, you should avoid the mega-patch
> completely.
>> I think the various 20 append patch lines may be fine for specialized
>> hbase clusters, but they don't have the rigor behind them to bet your
>> business on them.
> I think you are completely off-track here and jumping to conclusions. Big
> businesses are already betting on it. HBase is becoming a big user of Hadoop
> (dunno whether Y! uses HBase) and I completely agree with Ian that all
> businesses have to test their release themselves anyway before using it,
> otherwise you could end up with data loss like the type you mentioned.
> thanks,
> dhruba

Milind Bhandarkar
